# Streetcar Delay Prediction - Geocode Bounding boxes

This project uses a dataset of Toronto Transit Commission (TTC) streetcar delays from 2014 to the present to predict future delays and to recommend ways of avoiding them.

Source dataset: https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#e8f359f0-2f47-3058-bf64-6ec488de52da

This notebook contains the steps to derive a geographic (latitude/longitude) bounding box for each streetcar route.

# Streetcar routes

From https://www.ttc.ca/PDF/Maps/TTC_StreetcarMap.pdf

<table style="border: none" align="left">
   <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/ryanmark1867/manning/master/ttc_sc_map.jpg" width="900" alt="Icon"> </th>
   </tr>
</table>


In [1]:
! pip install -U folium
import folium

Requirement already up-to-date: folium in /opt/conda/envs/fastai/lib/python3.6/site-packages (0.9.1)


# Load libraries

In [47]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# import seaborn as sns
import datetime
import os
import folium
from folium.plugins import MarkerCluster
from folium.plugins import HeatMap
import pixiedust

remove_bad_values = False
city_name = 'Toronto'
pickled_output_dataframe = 'bounding_box_df_july8'


In [48]:
# get the directory that this notebook is in
rawpath = os.getcwd()
print("raw path is",rawpath)

raw path is /storage/manning/notebooks


In [49]:
# data is in a directory called "data" that is a sibling to the directory containing the notebook
path = os.path.abspath(os.path.join(rawpath, '..', 'data')) + "/"
print("path is", path)

path is /storage/manning/data/


# Load dataset

In [33]:
url="https://raw.githubusercontent.com/ryanmark1867/manning/master/2014_2018_df_cleaned_keep_bad_loc_geocoded_apr23.csv"

df=pd.read_csv(url)
df.head()


Unnamed: 0.1,Unnamed: 0,Report Date,Route,Time,Day,Location,Incident,Min Delay,Min Gap,Direction,Vehicle,Report Date Time,lat_long,latitude,longitude
0,0,2016-01-01 00:00:00,505,00:00:00,Friday,dundas west stationt to broadview station,General Delay,7.0,14.0,w,4028,2016-01-01 00:00:00,"[0.0, 0.0]",0.0,0.0
1,1,2016-01-01 00:00:00,511,02:14:00,Friday,fleet st. and strachan,Mechanical,10.0,20.0,e,4018,2016-01-01 02:14:00,"[43.6362976, -79.4096351]",43.636298,-79.409635
2,2,2016-01-01 00:00:00,301,02:22:00,Friday,queen st. west and roncesvalles,Mechanical,9.0,18.0,w,4201,2016-01-01 02:22:00,"[43.64533489999999, -79.4131843]",43.645335,-79.413184
3,3,2016-01-01 00:00:00,301,03:28:00,Friday,lake shore blvd. and superior st.,Mechanical,20.0,40.0,e,4251,2016-01-01 03:28:00,"[43.61496169999999, -79.4886581]",43.614962,-79.488658
4,4,2016-01-01 00:00:00,501,14:28:00,Friday,roncesvalles to neville park,Mechanical,6.0,12.0,e,4242,2016-01-01 14:28:00,"[0.0, 0.0]",0.0,0.0


In [34]:
df.shape

(69603, 15)

# Scope the dataset down to valid locations
Use the boundaries of the streetcar network to limit the dataset to just the locations that are covered by the streetcar network.

In [35]:
# remove locations outside the portion of Toronto served by streetcar routes
# latitude runs north-south (higher is further north), longitude runs east-west (higher is further east)

# reference points (latitude, longitude) used for the boundaries:
# the network lies west of Queen and Victoria Park: 43.674280, -79.280260
# the network lies east and north of Lakeshore and Etobicoke Creek: 43.587350, -79.547860
# the network lies south of St Clair and Mt. Pleasant: 43.687840, -79.399800

# boundaries of streetcar network:
min_lat = 43.58735
max_lat = 43.687840
min_long = -79.547860
max_long = -79.280260

# keep only the rows inside the boundaries
df = df[df.latitude >= min_lat]
df = df[df.latitude <= max_lat]
df = df[df.longitude >= min_long]
df = df[df.longitude <= max_long]
df.head()



Unnamed: 0.1,Unnamed: 0,Report Date,Route,Time,Day,Location,Incident,Min Delay,Min Gap,Direction,Vehicle,Report Date Time,lat_long,latitude,longitude
1,1,2016-01-01 00:00:00,511,02:14:00,Friday,fleet st. and strachan,Mechanical,10.0,20.0,e,4018,2016-01-01 02:14:00,"[43.6362976, -79.4096351]",43.636298,-79.409635
2,2,2016-01-01 00:00:00,301,02:22:00,Friday,queen st. west and roncesvalles,Mechanical,9.0,18.0,w,4201,2016-01-01 02:22:00,"[43.64533489999999, -79.4131843]",43.645335,-79.413184
3,3,2016-01-01 00:00:00,301,03:28:00,Friday,lake shore blvd. and superior st.,Mechanical,20.0,40.0,e,4251,2016-01-01 03:28:00,"[43.61496169999999, -79.4886581]",43.614962,-79.488658
5,5,2016-01-01 00:00:00,505,15:42:00,Friday,broadview station loop,Investigation,4.0,10.0,w,4187,2016-01-01 15:42:00,"[43.677135, -79.35820799999999]",43.677135,-79.358208
6,6,2016-01-01 00:00:00,504,15:54:00,Friday,broadview and queen,Mechanical,6.0,12.0,e,4181,2016-01-01 15:54:00,"[43.6593626, -79.34769709999999]",43.659363,-79.347697


In [36]:
df.shape

(65049, 15)
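
The four chained filters above can also be written as a single boolean mask. The following is a minimal sketch rather than the notebook's own code: the helper name `filter_to_bounds` and the toy dataframe are hypothetical, while the boundary values are the ones defined above.

```python
import pandas as pd

def filter_to_bounds(frame, min_lat, max_lat, min_long, max_long):
    """Keep rows whose latitude and longitude fall inside the box (inclusive)."""
    in_lat = frame["latitude"].between(min_lat, max_lat)
    in_long = frame["longitude"].between(min_long, max_long)
    return frame[in_lat & in_long]

# toy data: one valid location, one ungeocoded [0.0, 0.0] row,
# and one point north of the streetcar network
toy = pd.DataFrame({
    "Route": [511, 505, 512],
    "latitude": [43.636298, 0.0, 43.75],
    "longitude": [-79.409635, 0.0, -79.40],
})

filtered = filter_to_bounds(toy, 43.58735, 43.687840, -79.547860, -79.280260)
print(filtered)
```

Only the first toy row survives the filter, which mirrors how the `[0.0, 0.0]` rows disappear from the real dataset.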

In [37]:
# clear out all the columns that aren't needed for bounding boxes
# (axis is passed by keyword; the positional form is not accepted by newer pandas releases)
df = df.drop(['Report Date','Time','Day','Location','lat_long','Incident','Min Delay','Min Gap','Direction','Vehicle','Report Date Time'], axis=1)

In [39]:
# remove unnamed column
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

In [40]:
df.head()


Unnamed: 0,Route,latitude,longitude
1,511,43.636298,-79.409635
2,301,43.645335,-79.413184
3,301,43.614962,-79.488658
5,505,43.677135,-79.358208
6,504,43.659363,-79.347697

# Define bounding boxes

In [41]:
# get the max and min lat and long for each route:
# sort by the coordinate, then keep the first row seen for each route
df_max_lat = df.sort_values('latitude',ascending=False).drop_duplicates(['Route'])
df_max_long = df.sort_values('longitude',ascending=False).drop_duplicates(['Route'])
df_min_lat = df.sort_values('latitude',ascending=True).drop_duplicates(['Route'])
df_min_long = df.sort_values('longitude',ascending=True).drop_duplicates(['Route'])
df_max_lat.head()

Unnamed: 0,Route,latitude,longitude
34464,512,43.687496,-79.393707
53557,501,43.687095,-79.393918
38485,506,43.687076,-79.489833
44458,511,43.687076,-79.489833
65371,510,43.687076,-79.489833


In [42]:
df_max_lat = df_max_lat.rename(columns = {'latitude':'max_lat'})
df_max_long = df_max_long.rename(columns = {'longitude':'max_long'})
df_min_lat = df_min_lat.rename(columns = {'latitude':'min_lat'})
df_min_long = df_min_long.rename(columns = {'longitude':'min_long'})

In [44]:
# combine the per-route maximum latitude and maximum longitude
df_max = pd.merge(df_max_lat,df_max_long, on='Route', how='left')
# drop the leftover coordinate columns that are not the maxima
df_max = df_max.drop(['longitude','latitude'], axis=1)
df_max.head(20)

Unnamed: 0,Route,max_lat,max_long
0,512,43.687496,-79.298018
1,501,43.687095,-79.28135
2,506,43.687076,-79.281542
3,511,43.687076,-79.312599
4,510,43.687076,-79.316565
5,504,43.686952,-79.281542
6,505,43.686952,-79.284859
7,502,43.686952,-79.281542
8,503,43.686952,-79.284053
9,306,43.686911,-79.286622


In [45]:
# combine the per-route minimum latitude and minimum longitude
df_min = pd.merge(df_min_lat,df_min_long, on='Route', how='left')
# drop the leftover coordinate columns that are not the minima
df_min = df_min.drop(['longitude','latitude'], axis=1)
df_min.head(20)

Unnamed: 0,Route,min_lat,min_long
0,501,43.588204,-79.546264
1,301,43.591972,-79.544865
2,bad route,43.591972,-79.543895
3,504,43.591972,-79.543895
4,502,43.591972,-79.543895
5,510,43.598886,-79.542885
6,511,43.60051,-79.53825
7,509,43.602653,-79.51908
8,304,43.618434,-79.539704
9,506,43.622834,-79.536421


In [46]:
# join the intermediate dataframes to get the df with the bounding boxes
df_bounding_box = pd.merge(df_min,df_max, on='Route', how='left')
df_bounding_box.head(20)

Unnamed: 0,Route,min_lat,min_long,max_lat,max_long
0,501,43.588204,-79.546264,43.687095,-79.28135
1,301,43.591972,-79.544865,43.680364,-79.281542
2,bad route,43.591972,-79.543895,43.684692,-79.281542
3,504,43.591972,-79.543895,43.686952,-79.281542
4,502,43.591972,-79.543895,43.686952,-79.281542
5,510,43.598886,-79.542885,43.687076,-79.316565
6,511,43.60051,-79.53825,43.687076,-79.312599
7,509,43.602653,-79.51908,43.68243,-79.287259
8,304,43.618434,-79.539704,43.67691,-79.322966
9,506,43.622834,-79.536421,43.687076,-79.281542
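
The sort, de-duplicate, rename and merge steps above build one row per route holding the extremes of its recorded locations. The same result can be sketched in a single grouped aggregation. This is an alternative under stated assumptions, not the notebook's method: it assumes pandas 0.25 or later (for named aggregation), and the toy dataframe is hypothetical.

```python
import pandas as pd

# toy data standing in for the filtered df (Route, latitude, longitude)
toy = pd.DataFrame({
    "Route": [501, 501, 501, 504, 504],
    "latitude": [43.60, 43.65, 43.67, 43.64, 43.66],
    "longitude": [-79.50, -79.40, -79.30, -79.45, -79.35],
})

# one row per route with the min and max of each coordinate
boxes = (
    toy.groupby("Route")
       .agg(min_lat=("latitude", "min"),
            min_long=("longitude", "min"),
            max_lat=("latitude", "max"),
            max_long=("longitude", "max"))
       .reset_index()
)
print(boxes)
```

The column order matches `df_bounding_box`, so the output could be pickled in the same way.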


In [50]:
# pickle the bounding box dataframe
file_name = path + pickled_output_dataframe
df_bounding_box.to_pickle(file_name)

In [51]:
dfn = pd.read_pickle(file_name)
dfn.head()

Unnamed: 0,Route,min_lat,min_long,max_lat,max_long
0,501,43.588204,-79.546264,43.687095,-79.28135
1,301,43.591972,-79.544865,43.680364,-79.281542
2,bad route,43.591972,-79.543895,43.684692,-79.281542
3,504,43.591972,-79.543895,43.686952,-79.281542
4,502,43.591972,-79.543895,43.686952,-79.281542
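
One way such a bounding box dataframe could be used is to check which routes a geocoded location is plausible for. The sketch below is illustrative only: the function name `routes_containing` is hypothetical, the two rows are copied from the bounding box output above (routes 501 and 304), and `Route` is assumed to be stored as text because the column also contains the value `bad route`.

```python
import pandas as pd

# two rows copied from the bounding box output (routes 501 and 304)
boxes = pd.DataFrame({
    "Route": ["501", "304"],
    "min_lat": [43.588204, 43.618434],
    "min_long": [-79.546264, -79.539704],
    "max_lat": [43.687095, 43.67691],
    "max_long": [-79.28135, -79.322966],
})

def routes_containing(boxes, lat, long):
    """Return the routes whose bounding box contains the point (inclusive)."""
    inside = (
        (boxes["min_lat"] <= lat) & (lat <= boxes["max_lat"]) &
        (boxes["min_long"] <= long) & (long <= boxes["max_long"])
    )
    return boxes.loc[inside, "Route"].tolist()

# coordinates geocoded for "fleet st. and strachan" in the dataset
both = routes_containing(boxes, 43.636298, -79.409635)
# a point south of the route 304 box
only_501 = routes_containing(boxes, 43.60, -79.50)
print(both, only_501)
```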


# Visualize using Folium: clustering delay incidents
Use Folium to display a clustered marker view of delay incidents.

In [None]:
# define centre of map
TOR_COORDINATES = (df['latitude'].mean(), df['longitude'].mean())
 
# cap the number of records plotted
MAX_RECORDS = 2500

# create empty map zoomed in on Toronto
map_tor = folium.Map(location=TOR_COORDINATES, zoom_start=12)

mc = MarkerCluster()

# iterate through dataset to create clusters
# Location was dropped from df in an earlier cell, so fall back to the route for the popup text
for row in df[0:MAX_RECORDS].itertuples():
    popup_text = str(getattr(row, 'Location', row.Route))
    mc.add_child(folium.Marker(location=[row.latitude, row.longitude],
                 popup=popup_text))

map_tor.add_child(mc)
display(map_tor)

# Visualize using Folium: heatmap of delay counts
Use Folium to display a heat map weighted by the number of delays at each location.

In [9]:
# define centre of map
TOR_COORDINATES = (df['latitude'].mean(), df['longitude'].mean())
 
  
# create empty map zoomed in on Toronto
map_tor = folium.Map(location=TOR_COORDINATES, zoom_start=12)
df['count'] = 1

# define heat map: sum the counts for each distinct location to get its weight
heat_data = df[['latitude', 'longitude', 'count']].groupby(['latitude', 'longitude']).sum().reset_index().values.tolist()
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(map_tor)


display(map_tor)
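
`HeatMap` accepts a list of `[latitude, longitude, weight]` entries. The sketch below uses a hypothetical three-row dataframe to show what the grouped sum produces: repeated coordinates collapse into one entry whose weight is the number of delays recorded at that location.

```python
import pandas as pd

# toy data: two delays at the same location and one at another
toy = pd.DataFrame({
    "latitude": [43.65, 43.65, 43.66],
    "longitude": [-79.40, -79.40, -79.35],
})
toy["count"] = 1

# one [latitude, longitude, weight] entry per distinct location
heat_data = (
    toy.groupby(["latitude", "longitude"])
       .sum()
       .reset_index()
       .values.tolist()
)
print(heat_data)
```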

# Visualize using Folium: heatmap of delay durations
Use Folium to display a heat map weighted by the summed delay duration at each location.

In [10]:
# define centre of map
TOR_COORDINATES = (df['latitude'].mean(), df['longitude'].mean())
 
  
# create empty map zoomed in on Toronto
map_tor = folium.Map(location=TOR_COORDINATES, zoom_start=12)

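# 'Min Delay' was dropped from df in an earlier cell, so reload the source data
# and reapply the streetcar network boundaries before summing delay durations
df_delay = pd.read_csv(url)
df_delay = df_delay[df_delay.latitude.between(min_lat, max_lat) & df_delay.longitude.between(min_long, max_long)]

# define heat map: weight each location by the sum of its 'Min Delay' values
heat_data = df_delay[['latitude', 'longitude', 'Min Delay']].groupby(['latitude', 'longitude']).sum().reset_index().values.tolist()
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(map_tor)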


display(map_tor)

# Tableau rendering of the same dataset

Here is an example of the same dataset rendered in Tableau:

<table style="border: none" align="left">
   <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/ryanmark1867/manning/master/tableau_smalldots.jpg" width="900" alt="Icon"> </th>
   </tr>
</table>

# Tableau rendering using size and colour

<table style="border: none" align="left">
   <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/ryanmark1867/manning/master/tableau_size_colour_zoom.jpg" width="900" alt="Icon"> </th>
   </tr>
</table>

This notebook derived a latitude/longitude bounding box for each streetcar route and demonstrated using Pixiedust and Folium to visualize a dataset that includes latitude and longitude values.