Accompanying code for blog post: 38 Million Ecobici Trips in Mexico City
Also included are shapefiles for Mexico City streets and colonias (neighberhoods), filtered to only include those relevant to the parts of the city covered by Ecobici, as well as json file of Ecobici stations.
Streets:
/data/Streets/cdmx
Raw data comes from Open Streetmap data and was downloaded from http://download.geofabrik.de/osm/north-america/mexico-latest-free.shp.zip
Colonias:
/data/Colonias/df
Raw data was downloaded from http://shapesdemexico.wixsite.com/shapes/agebs
Stations:
/data/stations.json
Obtained from https://www.ecobici.cdmx.gob.mx/es/mapa-de-cicloestaciones
Getting set up
- Download data
python get_data.py
- Clean the start and end dates for some months (this is necessary because for most months the dates are in the format yyyy-mm-dd, however for three months in 2016 they are in the format dd/mm/yyyy)
python clean_dates.py
- Initialize the database and load the data
./load_trips.sh
Obtaining Google Maps Instructions
- Add the table for saving the individual trip legs from Google Maps:
psql ecobici -f create_trip_legs.sql
- Extract the relevant routes for obtaining directions from Google:
mkdir output
python extract_station_pairs.py
- Use the
google_instructions.py
script to obtain directions from Google Maps for each distinct route (free account only allows for 2,500 queries per day)
python google_instructions.py -i 'Starting Index' -t 'Num Trips to Process'
Analysis
- Prepare for analysis
psql ecobici -f prepare_analysis.sql
- Count the different legs based on the Google Maps directions to identify most popular routes:
psql ecobici -f count_trip_legs.sql
pgsql2shp -f "line_segments/line_segments" -h 'HOST' -u 'USER' ecobici "SELECT * FROM leg_trips_counts ORDER BY num_legs ASC;"
- Run analysis for Day in The Life and Popular Routes:
python day_in_the_life.py
- Create hexagon bin-tiling and extract shapefile with relevant hexagons:
psql ecobici -f hexagon_tiling.sql
mkdir hexagons
pgsql2shp -f "hexagons/hexagons" -h 'HOST' -u 'USER' ecobici "SELECT hexagon_id, ST_Force2D(the_geom) FROM hex_grid WHERE hexagon_id IN (SELECT DISTINCT(hexagon_id) FROM stations);"
- Core analysis:
python analysis.py