A Python pipeline that keeps dddfanmap.com up to date. It scrapes Diners, Drive-Ins and Dives episode data from Wikipedia, geocodes each restaurant location using the Google Places API, stores the results in SQL Server, and exports a GeoJSON file for the web map.
- Scrape — `wiki_scrape.py` fetches the episode list from Wikipedia and outputs `ddd_episodes.csv`
- Diff — `refresh_data.ipynb` compares the CSV against the database, identifies new episodes, checks for changes in business status (OPERATIONAL, TEMPORARILY_CLOSED, PERMANENTLY_CLOSED), and updates the dataset
- Geocode — new restaurant locations are looked up via the Google Places API (`geocode.py`; see the sketch after this list) and inserted into the database
- Export — location data is exported from the database to `locations.json` (GeoJSON), which is consumed by the web map
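To make the geocode step concrete, here is a minimal sketch of a Places Text Search lookup; the function name, query format, and return shape are assumptions rather than the actual `geocode.py` interface.

```python
# Hedged sketch of a Places lookup (not the actual geocode.py code).
import requests

PLACES_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"

def lookup_place(name, city, api_key):
    """Return lat/lng and business status for the top result, or None."""
    resp = requests.get(
        PLACES_URL,
        params={"query": f"{name} {city}", "key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None
    top = results[0]
    return {
        "lat": top["geometry"]["location"]["lat"],
        "lng": top["geometry"]["location"]["lng"],
        # Google reports e.g. OPERATIONAL or CLOSED_PERMANENTLY; the pipeline
        # may normalize these to the status values listed above.
        "business_status": top.get("business_status"),
    }
```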
- Python 3.x
- SQL Server with ODBC Driver 17 (or a database of your choice)
- Google Maps API key (with Places API and Geocoding API enabled)
Install dependencies:

```
pip install requests beautifulsoup4 pandas geopandas sqlalchemy pyodbc geodatasets matplotlib python-dotenv
```

Then:

- Copy `.env.example` to `.env` and add your Google Maps API key (loaded at runtime as sketched below)
- Copy `database.ini.example` to `database.ini` and fill in your database connection details
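A minimal sketch of how the key is presumably read, given that `python-dotenv` is in the dependency list; the variable name `GOOGLE_MAPS_API_KEY` is an assumption, so check `.env.example` for the actual name:

```python
# Assumed .env loading; the real variable name is defined in .env.example.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment
api_key = os.environ["GOOGLE_MAPS_API_KEY"]  # hypothetical variable name
```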
Scrape Wikipedia only (generates `ddd_episodes.csv`):

```
python wiki_scrape.py
```

Full pipeline (scrape, sync to DB, geocode new locations, update business status, export GeoJSON): run all cells in `refresh_data.ipynb`.
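To illustrate the final export step, a GeoDataFrame built from the database can be written straight to GeoJSON. This is a hedged sketch, not the notebook's actual code: the connection URL, table name, and column names are all assumptions.

```python
# Sketch of the export step; table and column names are assumed, not the real schema.
import geopandas as gpd
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL; the project builds the real one via config.py.
engine = create_engine(
    "mssql+pyodbc://user:password@server/ddd?driver=ODBC+Driver+17+for+SQL+Server"
)
df = pd.read_sql("SELECT name, business_status, lat, lng FROM locations", engine)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["lng"], df["lat"]),
    crs="EPSG:4326",  # WGS84 lon/lat, the CRS GeoJSON expects
)
gdf.to_file("locations.json", driver="GeoJSON")
```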
| File | Description |
|---|---|
| `wiki_scrape.py` | Scrapes episode data from Wikipedia into a CSV |
| `geocode.py` | Google Places API helpers for geocoding and business status |
| `db_conn.py` | Database query and update functions |
| `config.py` | Reads `database.ini` and builds the DB connection string (sketched below) |
| `refresh_data.ipynb` | Notebook orchestrating the full data refresh pipeline |
| `.env.example` | Template for environment variables |
| `database.ini.example` | Template for database connection config |
| `ddd_episodes.csv` | Output CSV from `wiki_scrape.py` |
| `locations.json` | Final product, consumed by the dddfanmap.com map |
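For reference, `config.py` presumably parses `database.ini` with `configparser` and assembles a SQLAlchemy connection URL along these lines; the section and key names are assumptions, so match them to `database.ini.example`.

```python
# Minimal sketch of a config reader; section/key names are assumptions.
from configparser import ConfigParser

def build_connection_string(path="database.ini", section="sqlserver"):
    parser = ConfigParser()
    parser.read(path)
    cfg = dict(parser.items(section))
    return (
        f"mssql+pyodbc://{cfg['user']}:{cfg['password']}"
        f"@{cfg['server']}/{cfg['database']}"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
```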