# Visualizing Isochrone data
We'll look at the average walkable area and the population coverage around train stations. We'll first compare the different Klang Valley lines, then compare KL with other cities.
This notebook assumes that the rapidkl_isochrones notebook has been run and that isochrone data is available in CSV form,
or that isochrone data has been gathered by other means.

In [1]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from isochrones import * 
import folium


In [2]:
# Load the CSV files containing isochrone data for the KL, Singapore and Montreal metro systems into dataframes

file_kl = '../resources/data/klang_valley_stations_isochrones_2021-07-29.csv'
file_sg = '../resources/data/mrtsg_iso.csv'
file_mtl = '../resources/data/montreal_metro_iso.csv'

# each city gets its own dataframe
data_kl = pd.read_csv(file_kl)
data_sg = pd.read_csv(file_sg)
data_mtl = pd.read_csv(file_mtl)


# Data preprocessing
This step requires some understanding of the imported dataframes. <br>
We want to combine the individual dataframes into one so that we can easily compare and visualize the data. <br>
However, the stations don't carry any identifier that says which city they belong to. We will create a column called "City" and assign the correct city to each metro station.

In [3]:

print(data_kl.columns)
print(data_sg.columns)
print(data_mtl.columns)

#Assigning each city dataframe a column to represent their city
data_kl['City']="Kuala Lumpur"
data_sg['City']="Singapore"
data_mtl['City']="Montreal"

# only KL has the 'Service Provider Name' column, so we add one for Singapore and Montreal
data_sg['Service Provider Name']="SMRT"
data_mtl['Service Provider Name']="STM"

# we don't need every column in the combined dataframe, so we select the relevant ones
columns = ['Name','Route Name','Latitude','Longitude','Line Colour',
            '5 Minute Range Area', '10 Minute Range Area','15 Minute Range Area', 
            '5 Minute Reach Factor','10 Minute Reach Factor', '15 Minute Reach Factor',
            '5 Minute Population', '10 Minute Population', '15 Minute Population','City','Service Provider Name']

#combining the kl, singapore and montreal dataframes
data_all = pd.concat([data_kl[columns],data_sg[columns],data_mtl[columns]])

Index(['Unnamed: 0', 'Stop ID', 'Name', 'Service Provider Name', 'Latitude',
       'Longitude', 'ROUTE ID', 'Route Name', 'Line Number', 'Line Colour',
       'Colour Hex Code', 'iso', '5 Minute Range Area', '10 Minute Range Area',
       '15 Minute Range Area', '5 Minute Reach Factor',
       '10 Minute Reach Factor', '15 Minute Reach Factor',
       '5 Minute Population', '10 Minute Population', '15 Minute Population'],
      dtype='object')
Index(['OBJECTID', 'Name', 'STN_NO', 'X', 'Y', 'Latitude', 'Longitude',
       'Line Colour', 'Colour Hex Code', 'Route Name', 'Unnamed: 10', 'iso',
       '5 Minute Range Area', '10 Minute Range Area', '15 Minute Range Area',
       '5 Minute Reach Factor', '10 Minute Reach Factor',
       '15 Minute Reach Factor', '5 Minute Population', '10 Minute Population',
       '15 Minute Population'],
      dtype='object')
Index(['Unnamed: 0', 'Stop ID', 'Object ID', 'Name', 'Odonym', 'Namesake',
       'Opened', 'Latitude', 'Longitude', 'Route Name', '
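
One detail of the combining step is worth keeping in mind: `pd.concat` keeps each dataframe's original row index by default, so the combined frame contains repeated index values across cities. The sketch below uses made-up station names (not from the datasets) to show the default behaviour next to `ignore_index=True`, which produces a fresh index instead.

```python
import pandas as pd

# Made-up stations for illustration only; not taken from the datasets
a = pd.DataFrame({"Name": ["A1", "A2"]})
b = pd.DataFrame({"Name": ["B1", "B2"]})

# Tag each frame with its city before combining, as in the cell above
a["City"] = "City A"
b["City"] = "City B"

kept = pd.concat([a, b])                      # original indices are preserved
fresh = pd.concat([a, b], ignore_index=True)  # index is rebuilt from 0

print(list(kept.index))   # [0, 1, 0, 1]
print(list(fresh.index))  # [0, 1, 2, 3]
```

Repeated index values are harmless for the histograms in this notebook, but they matter for any later lookup by index label on the combined frame.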

## Visualize walkable area within different timeframes from KL train stations
We will visualize data in the three following aspects
- Area coverage 
- Population (ORS uses data from https://ghsl.jrc.ec.europa.eu/visLanding.php, an open data project by the European Union)
- Reach factor (how circular an isochrone is on a scale of 0 to 1, where 1 is a perfect circle, meaning maximum reachability)
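
As a rough check on the reach factor, the values in the data are consistent with dividing the isochrone area by the area of a circle whose radius is the distance walked in that time. The sketch below assumes a walking speed of 5 km/h; that speed is inferred from the numbers rather than stated in this notebook, and `reach_factor` is a hypothetical helper, not part of the `isochrones` module.

```python
import math

def reach_factor(area_km2, minutes, speed_kmh=5.0):
    """Ratio of an isochrone's area to the area of an ideal circle.

    speed_kmh=5.0 is an assumed walking speed, not taken from the notebook.
    """
    radius_km = speed_kmh * minutes / 60.0
    return area_km2 / (math.pi * radius_km ** 2)

# PWTC station, 5 minute isochrone area (km^2) from the KL dataset
print(round(reach_factor(0.337242, 5), 4))  # 0.6183
```

This matches the '5 Minute Reach Factor' recorded for PWTC in the data, which suggests the area columns are in km^2.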

In [4]:
# draws histograms for RapidKL lines' 5 minute walking area coverage
fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur') &(data_all["Service Provider Name"]=='Rapid KL')]
             ,x='Route Name'
             ,y='5 Minute Range Area'
             , barmode = 'group'
             , title="Area coverage within 5 minutes walk from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", '5 Minute Range Area' : "5 Minutes Walk Area(km^2)"}
            ).update_xaxes(categoryorder='total ascending')
fig.update_layout(showlegend=False)
fig.show() 

# draws histograms for RapidKL lines' 10 minute walking area coverage
fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur') &(data_all["Service Provider Name"]=='Rapid KL')]
             ,x='Route Name'
             ,y='10 Minute Range Area'
             , barmode = 'group'
             , title="Area coverage within 10 minutes walk from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", '10 Minute Range Area' : "10 Minutes Walk Area(km^2)"}
            ).update_xaxes(categoryorder='total ascending')
fig.update_layout(showlegend=False)
fig.show()

# draws histograms for RapidKL lines' 15 minute walking area coverage
fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur') &(data_all["Service Provider Name"]=='Rapid KL')]
             ,x='Route Name'
             ,y='15 Minute Range Area'
             , barmode = 'group'
             , title="Area coverage within 15 minutes walk from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", '15 Minute Range Area' : "15 Minutes Walk Area(km^2)"}
            ).update_xaxes(categoryorder='total ascending')
fig.update_layout(showlegend=False)
fig.show() 

# draws all coverage in one chart along with all train lines
fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur') ]
             ,x='Route Name'
             ,y=['5 Minute Range Area','10 Minute Range Area','15 Minute Range Area']
             , barmode = 'group'
             , title="Area coverage within walking times from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines",'value' : "Walk Area Coverage(km^2)"}
            ).update_xaxes(categoryorder='total ascending')

fig.show()       
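
The charts above rely on `histfunc = 'avg'` to average the stations of each line. When the numbers themselves are needed, the same per-line averages can be computed directly with a pandas `groupby`. This is a minimal sketch on made-up values (not taken from the dataset); only the column names follow the dataframes used in this notebook.

```python
import pandas as pd

# Made-up numbers for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "Route Name": ["Line A", "Line A", "Line B", "Line B"],
    "5 Minute Range Area": [0.30, 0.40, 0.10, 0.20],
    "15 Minute Range Area": [2.0, 3.0, 1.0, 2.0],
})

area_cols = ["5 Minute Range Area", "15 Minute Range Area"]

# Average every area column per line, then sort lines by their 15 minute average
line_means = (
    toy.groupby("Route Name")[area_cols]
    .mean()
    .sort_values("15 Minute Range Area")
)
print(line_means)
```

A table like this is also handy for quoting exact averages alongside the charts.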

In [5]:
# draws histograms for RapidKL lines' 5 minute walking population coverage

fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur')&(data_all["Service Provider Name"]=='Rapid KL')]
             ,x='Route Name'
             ,y='5 Minute Population'
             , barmode = 'group'
             , title="Population covered within 5 minutes walk from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", '5 Minute Population' : "Population Coverage"}
            ).update_xaxes(categoryorder='total ascending')
fig.update_layout(showlegend=False)
fig.show() 

# draws histograms for RapidKL lines' 10 minute walking population coverage

fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur')&(data_all["Service Provider Name"]=='Rapid KL')]
             ,x='Route Name'
             ,y='10 Minute Population'
             , barmode = 'group'
             , title="Population coverage within 10 minutes walk from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", '10 Minute Population' : "Population Coverage"}
            ).update_xaxes(categoryorder='total ascending')
fig.update_layout(showlegend=False)
fig.show() 

# draws histograms for RapidKL lines' 15 minute walking population coverage

fig = px.histogram(data_all[(data_all["City"]=='Kuala Lumpur')&(data_all["Service Provider Name"]=='Rapid KL')]
             ,x='Route Name'
             ,y='15 Minute Population'
             , barmode = 'group'
             , title="Population coverage within 15 minutes walk from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", '15 Minute Population' : "Population Coverage"}
            ).update_xaxes(categoryorder='total ascending')
fig.update_layout(showlegend=False)
fig.show() 

# draws 5, 10 and 15 minute walking population coverage in one chart for all KL train lines

fig = px.histogram(data_all[data_all["City"]=='Kuala Lumpur']
             ,x='Route Name'
             ,y=['5 Minute Population','10 Minute Population','15 Minute Population']
             , barmode = 'group'
             , title="Population coverage within walking times from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines",'value' : "Population Coverage"}
            ).update_xaxes(categoryorder='total ascending')

fig.show()       

In [6]:
# For fun: the 15 minute walking area coverage for each KL train station. Bukit Bintang is the best, Serdang the worst.
fig = px.histogram(data_all[data_all["City"]=='Kuala Lumpur']
             ,x='Name'
             ,y='15 Minute Range Area'
             #,color='Route Name'
             , labels={'Name': "Station", '15 Minute Range Area': "15 Minutes Walk Area(km^2)"}
             , title="Average Area covered by 15 minute Walk"
             , template='plotly' 
             , histfunc = 'avg'
            ).update_xaxes(categoryorder='total ascending')
        
fig.show()

# Comparing KL with other cities
The Singapore and Montreal datasets don't include an equivalent of KL's Komuter trains, so I thought it would be unfair to bundle the Komuter stations in, because coverage around those stations is generally lower due to low density. I excluded Komuter from the following averages so that we compare purely metro-style train stations.

In [7]:
data_kl['Service Provider Name'].unique()

array(['Keretapi Tanah Melayu', 'Rapid KL', 'Express Rail Link',
       'Rapid Bus'], dtype=object)

In [8]:
non_rapidkl =['Keretapi Tanah Melayu', 'Express Rail Link','Rapid Bus']
# keep only the metro-style operators: Rapid KL (KL), SMRT (Singapore) and STM (Montreal)
data_temp = data_all[data_all["Service Provider Name"].isin(['STM', 'Rapid KL', 'SMRT'])]

In [9]:
data_temp

Unnamed: 0,Name,Route Name,Latitude,Longitude,Line Colour,5 Minute Range Area,10 Minute Range Area,15 Minute Range Area,5 Minute Reach Factor,10 Minute Reach Factor,15 Minute Reach Factor,5 Minute Population,10 Minute Population,15 Minute Population,City,Service Provider Name
60,PWTC,Ampang Line,3.166563,101.693594,Orange,0.337242,1.093349,2.503207,0.6183,0.5012,0.5099,2186.0,7710.0,16554.0,Kuala Lumpur,Rapid KL
61,SULTAN ISMAIL,Ampang Line,3.161185,101.694127,Orange,0.154666,0.802314,2.187139,0.2836,0.3678,0.4456,845.0,5572.0,15104.0,Kuala Lumpur,Rapid KL
62,BANDARAYA,Ampang Line,3.155548,101.694406,Orange,0.331767,1.329914,2.975538,0.6083,0.6096,0.6062,2141.0,8555.0,20457.0,Kuala Lumpur,Rapid KL
63,TITIWANGSA,Ampang Line,3.173591,101.695273,Orange,0.377274,1.317302,3.249895,0.6917,0.6038,0.6621,3133.0,9015.0,21377.0,Kuala Lumpur,Rapid KL
64,SENTUL TIMUR,Ampang Line,3.185821,101.695335,Orange,0.159329,0.980199,2.301548,0.2921,0.4493,0.4689,847.0,7814.0,16574.0,Kuala Lumpur,Rapid KL
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,D'Iberville,Blue Line,45.553078,-73.602270,Blue,0.392088,1.532228,3.482008,0.7189,0.7023,0.7093,4854.0,16359.0,40443.0,Montreal,STM
69,Saint-Michel,Blue Line,45.559813,-73.599940,Blue,0.375009,1.528195,3.478716,0.6876,0.7005,0.7087,3847.0,16371.0,30414.0,Montreal,STM
70,Berri–UQAM,Yellow Line,45.515027,-73.561260,Yellow,0.395714,1.521756,3.442349,0.7255,0.6975,0.7013,1952.0,16004.0,33577.0,Montreal,STM
71,Jean-Drapeau,Yellow Line,45.512435,-73.533170,Yellow,0.357867,1.249158,2.073656,0.6561,0.5726,0.4224,0.0,0.0,0.0,Montreal,STM


In [25]:
# comparing average walking area coverage across cities; for KL only the Rapid KL stations kept in data_temp are included
fig = px.histogram(data_temp
             ,x='City'
             ,y=['5 Minute Range Area','10 Minute Range Area','15 Minute Range Area']
             , barmode = 'group'
             , title="Area coverage within walking times from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines", 'value':'Walkable Area Coverage(km^2)'}
            ).update_xaxes(categoryorder='total ascending')

fig.show() 

In [11]:
fig = px.histogram(data_temp
             ,x='City'
             ,y=['5 Minute Population','10 Minute Population','15 Minute Population']
             , barmode = 'group'
             , title="Population coverage within walking times from station"
             , template='plotly'
             , histfunc = 'avg'
             ,labels={'Route Name': "Lines",'value':'Population Coverage'}
            ).update_xaxes(categoryorder='total ascending')

fig.show() 

In [12]:
data_mtl.describe()

Unnamed: 0.1,Unnamed: 0,Object ID,Latitude,Longitude,5 Minute Range Area,10 Minute Range Area,15 Minute Range Area,5 Minute Reach Factor,10 Minute Reach Factor,15 Minute Reach Factor,5 Minute Population,10 Minute Population,15 Minute Population
count,73.0,73.0,73.0,73.0,73.0,73.0,73.0,73.0,73.0,73.0,73.0,73.0,73.0
mean,36.0,37.0,45.517212,-73.594474,0.358951,1.412011,3.182533,0.658122,0.647216,0.648342,3004.232877,12420.0,26623.205479
std,21.217131,21.217131,0.033642,0.043131,0.057985,0.159607,0.350956,0.106314,0.073157,0.0715,1852.730009,5592.401089,10386.848176
min,0.0,1.0,45.446238,-73.72153,0.083456,0.8052,1.877722,0.153,0.3691,0.3825,0.0,0.0,0.0
25%,18.0,19.0,45.494891,-73.62003,0.345859,1.36188,3.05484,0.6341,0.6242,0.6223,1626.0,8470.0,19066.0
50%,36.0,37.0,45.514946,-73.58206,0.377268,1.464512,3.311741,0.6917,0.6713,0.6747,2871.0,12373.0,28514.0
75%,54.0,55.0,45.541717,-73.56178,0.39478,1.52255,3.425047,0.7238,0.6979,0.6977,4028.0,16359.0,34692.0
max,72.0,73.0,45.596409,-73.52197,0.424272,1.589297,3.546407,0.7779,0.7285,0.7225,10093.0,26210.0,47317.0


# Conclusions
As suspected, we don't seem to get as much out of our train stations for their users as other cities do, such as Montreal, where the population is similar to ours, or Singapore, where the culture and weather are similar. The reasons could be analyzed further by looking at individual stations and understanding why this may be so.

## Appendix
### Maps with isochrones of each KL line, for reference

In [19]:
lines_kl = list(data_kl['Route Name'].unique())
maps_kl = []
location = data_kl['Latitude'].iloc[0],data_kl['Longitude'].iloc[0]
station_kl_dict = dictSetup(data_kl)
for line in lines_kl:
    map_temp = folium.Map(tiles='OpenStreetMap',location=location ,zoom_start=11)
    temp_line = list(data_kl[data_kl['Route Name']==line].index)
    temp_dict = stationSubset(station_kl_dict,temp_line)
    isoVisualizer(map_temp,temp_dict)
    maps_kl.append((map_temp,line))

Done!
Done!
Done!
Done!
Done!
Done!
Done!
Done!
Done!
Done!
Done!


In [20]:
maps_kl[0][0]

In [21]:
def saveMapMultiple(temp_map):
    """Save a (folium map, line name) tuple as HTML, then screenshot it to PNG."""
    import os
    import time
    from selenium import webdriver
    delay = 10  # seconds to wait for the map tiles to load
    map_name = '{}.png'.format(temp_map[1])
    html_name = '{}.html'.format(temp_map[1])
    # Save the map as an HTML file in the current working directory
    temp_map[0].save(html_name)
    tmpurl = 'file://{path}'.format(path=os.path.abspath(html_name))

    # Open a browser window (requires Firefox and a working Selenium driver setup)...
    browser = webdriver.Firefox()
    try:
        # ...that displays the map...
        browser.get(tmpurl)
        # Give the map tiles some time to load
        time.sleep(delay)
        # Grab the screenshot
        browser.save_screenshot(map_name)
    finally:
        # Close the browser even if loading or the screenshot fails
        browser.quit()

In [22]:
for kl_map in maps_kl:
    saveMapMultiple(kl_map)