# Streetcar Delay Prediction - Geocode Visualization

This project uses a dataset of Toronto Transit Commission (TTC) streetcar delays from 2014 to the present to predict future delays and to recommend ways of avoiding them.

Source dataset: https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#e8f359f0-2f47-3058-bf64-6ec488de52da

This notebook contains the steps for a simple visualization of the cleaned and geocoded version of this dataset.

# Streetcar routes

From https://www.ttc.ca/PDF/Maps/TTC_StreetcarMap.pdf

<table style="border: none" align="left">
   <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/ryanmark1867/manning/master/ttc_sc_map.jpg" width="900" alt="Icon"> </th>
   </tr>
</table>


In [1]:
! pip install -U folium
import folium

Requirement already up-to-date: folium in /opt/conda/envs/fastai/lib/python3.6/site-packages (0.9.1)


# Load libraries

In [2]:
import datetime
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# import seaborn as sns
import folium
from folium.plugins import HeatMap, MarkerCluster
import pixiedust

remove_bad_values = False
city_name = 'Toronto'


Pixiedust database opened successfully


# Load dataset

In [3]:
url="https://raw.githubusercontent.com/ryanmark1867/manning/master/2014_2018_df_cleaned_keep_bad_loc_geocoded_apr23.csv"

df=pd.read_csv(url)
df.head()


Unnamed: 0.1,Unnamed: 0,Report Date,Route,Time,Day,Location,Incident,Min Delay,Min Gap,Direction,Vehicle,Report Date Time,lat_long,latitude,longitude
0,0,2016-01-01 00:00:00,505,00:00:00,Friday,dundas west stationt to broadview station,General Delay,7.0,14.0,w,4028,2016-01-01 00:00:00,"[0.0, 0.0]",0.0,0.0
1,1,2016-01-01 00:00:00,511,02:14:00,Friday,fleet st. and strachan,Mechanical,10.0,20.0,e,4018,2016-01-01 02:14:00,"[43.6362976, -79.4096351]",43.636298,-79.409635
2,2,2016-01-01 00:00:00,301,02:22:00,Friday,queen st. west and roncesvalles,Mechanical,9.0,18.0,w,4201,2016-01-01 02:22:00,"[43.64533489999999, -79.4131843]",43.645335,-79.413184
3,3,2016-01-01 00:00:00,301,03:28:00,Friday,lake shore blvd. and superior st.,Mechanical,20.0,40.0,e,4251,2016-01-01 03:28:00,"[43.61496169999999, -79.4886581]",43.614962,-79.488658
4,4,2016-01-01 00:00:00,501,14:28:00,Friday,roncesvalles to neville park,Mechanical,6.0,12.0,e,4242,2016-01-01 14:28:00,"[0.0, 0.0]",0.0,0.0
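
Two rows in the preview above have `lat_long` equal to `[0.0, 0.0]`, which appears to mark locations that could not be geocoded. A minimal sketch for counting such rows follows. It assumes that a zero latitude and longitude pair always indicates a failed geocode; the helper name `count_ungeocoded` and the toy frame are illustrative and not part of the original notebook.

```python
import pandas as pd


def count_ungeocoded(frame: pd.DataFrame) -> int:
    """Count rows whose coordinates are the (0.0, 0.0) placeholder."""
    # Assumption: (0.0, 0.0) is the placeholder written when geocoding failed.
    missing = (frame["latitude"] == 0.0) & (frame["longitude"] == 0.0)
    return int(missing.sum())


# Toy frame holding the first three coordinate pairs from the preview above.
sample = pd.DataFrame({
    "latitude": [0.0, 43.636298, 43.645335],
    "longitude": [0.0, -79.409635, -79.413184],
})
print(count_ungeocoded(sample))
```

Running the same helper on `df` would give a rough sense of how many records the location filter later in the notebook is expected to drop.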


# Visualize using Pixiedust
- Do a quick visualization using Pixiedust.
- In the Pixiedust display options, select chart type = map; keys = latitude, longitude; values = Min Delay.

In [4]:
# visualize using Pixiedust
! pip install pixiedust
import pixiedust
display(df)

In [5]:
df.shape

(69603, 15)

# Scope the dataset down to valid locations
Use the boundaries of the streetcar network to limit the dataset to just the locations that are covered by the streetcar network.

In [6]:
# remove locations outside the portion of Toronto covered by streetcar routes
# latitude runs north-south (higher = further north)
# longitude runs east-west (higher = further east)

# reference points that bound the network:
# eastern limit, Queen and Victoria Park: 43.674280, -79.280260
# southern and western limit, Lakeshore and Etobicoke Creek: 43.587350, -79.547860
# northern limit, St Clair and Mt. Pleasant: 43.687840, -79.399800

# boundaries of streetcar network:
min_lat = 43.587350
max_lat = 43.687840
min_long = -79.547860
max_long = -79.280260

df = df[df.latitude >= min_lat]
df = df[df.latitude <= max_lat]
df = df[df.longitude >= min_long]
df = df[df.longitude <= max_long]
df.head()



Unnamed: 0.1,Unnamed: 0,Report Date,Route,Time,Day,Location,Incident,Min Delay,Min Gap,Direction,Vehicle,Report Date Time,lat_long,latitude,longitude
1,1,2016-01-01 00:00:00,511,02:14:00,Friday,fleet st. and strachan,Mechanical,10.0,20.0,e,4018,2016-01-01 02:14:00,"[43.6362976, -79.4096351]",43.636298,-79.409635
2,2,2016-01-01 00:00:00,301,02:22:00,Friday,queen st. west and roncesvalles,Mechanical,9.0,18.0,w,4201,2016-01-01 02:22:00,"[43.64533489999999, -79.4131843]",43.645335,-79.413184
3,3,2016-01-01 00:00:00,301,03:28:00,Friday,lake shore blvd. and superior st.,Mechanical,20.0,40.0,e,4251,2016-01-01 03:28:00,"[43.61496169999999, -79.4886581]",43.614962,-79.488658
5,5,2016-01-01 00:00:00,505,15:42:00,Friday,broadview station loop,Investigation,4.0,10.0,w,4187,2016-01-01 15:42:00,"[43.677135, -79.35820799999999]",43.677135,-79.358208
6,6,2016-01-01 00:00:00,504,15:54:00,Friday,broadview and queen,Mechanical,6.0,12.0,e,4181,2016-01-01 15:54:00,"[43.6593626, -79.34769709999999]",43.659363,-79.347697


In [7]:
df.shape

(65049, 15)
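
Comparing the two shapes, the bounding-box filter removed 69603 - 65049 = 4554 rows. The same filter can be sketched as a single reusable helper. The function name `within_streetcar_bounds` and the toy frame are illustrative assumptions; the boundary values are the ones used in the cell above.

```python
import pandas as pd

# Boundaries copied from the filtering cell above.
MIN_LAT, MAX_LAT = 43.587350, 43.687840
MIN_LONG, MAX_LONG = -79.547860, -79.280260


def within_streetcar_bounds(frame: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows inside the streetcar network bounding box."""
    # Series.between is inclusive on both ends by default, matching >= and <=.
    in_box = (
        frame["latitude"].between(MIN_LAT, MAX_LAT)
        & frame["longitude"].between(MIN_LONG, MAX_LONG)
    )
    # Return a copy so later column assignments do not touch the original frame.
    return frame[in_box].copy()


# Toy frame: an ungeocoded row, a valid row, and a row north of the network.
toy = pd.DataFrame({
    "latitude": [0.0, 43.636298, 43.70],
    "longitude": [0.0, -79.409635, -79.40],
})
filtered = within_streetcar_bounds(toy)
print(len(filtered))
```

Building one boolean mask avoids creating three intermediate DataFrames, and returning a copy sidesteps pandas warnings when columns are added to the filtered frame afterwards.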

# Visualize using Folium: clustering delay incidents
Use Folium to display a clustered-marker view of delay incidents.

In [None]:
# define centre of map
TOR_COORDINATES = (df['latitude'].mean(), df['longitude'].mean())
 
# limit the number of records plotted as markers
MAX_RECORDS = 2500
  
# create empty map zoomed in on Toronto
map_tor = folium.Map(location=TOR_COORDINATES, zoom_start=12)

mc = MarkerCluster()

# iterate through dataset to create clusters

for row in df[0:MAX_RECORDS].itertuples():
    mc.add_child(folium.Marker(location=[row.latitude,  row.longitude],
                 popup=row.Location))

map_tor.add_child(mc)
display(map_tor)
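
The cell above plots the first `MAX_RECORDS` rows. If the file is ordered by date, as the preview suggests, those rows all come from a narrow time window. One possible alternative, sketched here under that assumption, is to draw a reproducible random sample instead. The helper name `sample_for_markers` and the `seed` parameter are illustrative and not part of the original notebook.

```python
import pandas as pd


def sample_for_markers(frame: pd.DataFrame, max_records: int = 2500,
                       seed: int = 0) -> pd.DataFrame:
    """Return at most max_records rows drawn at random without replacement."""
    # Never request more rows than the frame holds.
    n = min(max_records, len(frame))
    # A fixed random_state keeps the sample identical between runs.
    return frame.sample(n=n, random_state=seed)


# Toy frame with ten rows to show the capping behaviour.
toy = pd.DataFrame({
    "latitude": [43.60 + 0.01 * i for i in range(10)],
    "longitude": [-79.40] * 10,
})
print(len(sample_for_markers(toy, max_records=4)))
print(len(sample_for_markers(toy, max_records=100)))
```

The result could be passed to the marker loop in place of `df[0:MAX_RECORDS]`.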

# Visualize using Folium: heatmap of delay counts
Use Folium to display a heat map of the number of delay incidents at each location.

In [9]:
# define centre of map
TOR_COORDINATES = (df['latitude'].mean(), df['longitude'].mean())
 
  
# create empty map zoomed in on Toronto
map_tor = folium.Map(location=TOR_COORDINATES, zoom_start=12)
# count delay incidents at each distinct location
heat_data = (
    df.groupby(['latitude', 'longitude'])
    .size()
    .reset_index(name='count')
    .values.tolist()
)

# define heat map
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(map_tor)


display(map_tor)

# Visualize using Folium: heatmap of delay durations
Use Folium to display a heat map of total delay minutes at each location.

In [10]:
# define centre of map
TOR_COORDINATES = (df['latitude'].mean(), df['longitude'].mean())
 
  
# create empty map zoomed in on Toronto
map_tor = folium.Map(location=TOR_COORDINATES, zoom_start=12)

# define heat map weighted by the summed Min Delay at each location
heat_data = (
    df[['latitude', 'longitude', 'Min Delay']]
    .groupby(['latitude', 'longitude'])
    .sum()
    .reset_index()
    .values.tolist()
)
HeatMap(data=heat_data, radius=8, max_zoom=13).add_to(map_tor)


display(map_tor)
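
Both heat map cells build the same kind of input: a list of `[latitude, longitude, weight]` triples, weighted either by incident count or by summed `Min Delay`. A minimal sketch of a shared helper for that preparation step follows. The name `heatmap_points` and its `weight_col` parameter are illustrative assumptions, and the toy values are made up for demonstration.

```python
import pandas as pd


def heatmap_points(frame: pd.DataFrame, weight_col=None) -> list:
    """Build [latitude, longitude, weight] triples, one per distinct location.

    With weight_col=None the weight is the number of rows at the location;
    otherwise it is the sum of weight_col over those rows.
    """
    grouped = frame.groupby(["latitude", "longitude"])
    if weight_col is None:
        weights = grouped.size()
    else:
        weights = grouped[weight_col].sum()
    return weights.reset_index(name="weight").values.tolist()


# Toy frame: two incidents at one location and one incident at another.
toy = pd.DataFrame({
    "latitude": [43.6, 43.6, 43.7],
    "longitude": [-79.4, -79.4, -79.3],
    "Min Delay": [5.0, 7.0, 4.0],
})
counts = heatmap_points(toy)
minutes = heatmap_points(toy, weight_col="Min Delay")
print(counts)
print(minutes)
```

Either list could be passed as the `data` argument of `HeatMap` in the cells above.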

# Tableau rendering of the same dataset

Here is an example of the same dataset rendered in Tableau:

<table style="border: none" align="left">
   <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/ryanmark1867/manning/master/tableau_smalldots.jpg" width="900" alt="Icon"> </th>
   </tr>
</table>

# Tableau rendering using size and colour

<table style="border: none" align="left">
   <tr style="border: none">
       <th style="border: none"><img src="https://raw.githubusercontent.com/ryanmark1867/manning/master/tableau_size_colour_zoom.jpg" width="900" alt="Icon"> </th>
   </tr>
</table>

This notebook demonstrated how to use Pixiedust and Folium to visualize a dataset that includes latitude and longitude values.