Based on Henrikki Tenkanen’s online course, which includes lots of other good resources.

We also use data from the Helsinki Region Travel Time Matrix 2015: Toivonen, T., M. Salonen, H. Tenkanen, P. Saarsalmi, T. Jaakkola & J. Järvi (2014). Joukkoliikenteellä, autolla ja kävellen: Avoin saavutettavuusaineisto pääkaupunkiseudulla [By public transport, by car and on foot: open accessibility data in the capital region]. Terra 126(3), 127-136.




# Part 1 - Geocoding

We start with a text file of 10 addresses in the Calgary area. The fields are separated by semicolons, and the addresses are in the `addr` column.

In [None]:
import pandas as pd
import geopandas as gpd
from geopandas.tools import geocode
from shapely.geometry import Point
import matplotlib.pyplot as plt
%matplotlib inline

address_data = pd.read_csv("calgary_addresses.txt", sep=';')
address_data.head()

Next, we geocode the addresses with the Nominatim service. The result is a GeoDataFrame with the matched `address` and a `geometry` column of Shapely Points, which we can later export to a Shapefile.

In [None]:
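#Retry geocoding if the Nominatim service times out
from geopy.exc import GeocoderTimedOut
from time import sleep

def do_geocode(addresses, max_retries=3):
    # Nominatim requires an identifying user_agent (geopy >= 2.0);
    # 'calgary-geocoding-tutorial' is just an example name
    for attempt in range(max_retries):
        try:
            return geocode(addresses, provider='nominatim', user_agent='calgary-geocoding-tutorial')
        except GeocoderTimedOut:
            sleep(1)  # wait a moment before retrying
    raise RuntimeError("Geocoding still timed out after %d attempts" % max_retries)

geo = do_geocode(address_data['addr'])
geo.head()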
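The cell above retries after a timeout. A common, more robust pattern bounds the number of retries and doubles the wait after each failure (exponential backoff), so a service that stays down cannot cause endless retries. Below is a minimal stdlib sketch; `call_with_retries`, `FlakyServiceTimeout` and `fake_geocode` are hypothetical names used only for illustration. In the notebook you would pass `geocode` as `func` and `retry_on=(GeocoderTimedOut,)`.

```python
import time


class FlakyServiceTimeout(Exception):
    """Hypothetical stand-in for geopy's GeocoderTimedOut."""


def call_with_retries(func, *args, max_retries=3, base_delay=1.0,
                      retry_on=(TimeoutError,), **kwargs):
    """Call func(*args, **kwargs), retrying on the given exceptions.

    The wait doubles after each failed attempt (exponential backoff).
    After max_retries failed attempts the last exception is re-raised.
    """
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == max_retries - 1:
                raise  # give up and surface the last timeout
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical geocoder that times out on its first two calls
attempts = {"n": 0}


def fake_geocode(address):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FlakyServiceTimeout(address)
    return "geocoded: " + address


result = call_with_retries(fake_geocode, "123 Example St", base_delay=0.01,
                           retry_on=(FlakyServiceTimeout,))
# result == "geocoded: 123 Example St", reached on the third attempt
```

Keep the base delay at one second or more when talking to the public Nominatim server, which expects clients to limit how often they send requests.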

In [None]:
#Join the two dataframes; add the geometry column to the original columns
geo_join = geo.join(address_data)
geo_join.head()

In [None]:
geo_join.plot()

In [None]:
#Export to a Shapefile (for example, to use in Tableau)
geo_join.to_file("addresses_calgary.shp")

Let's add a map of Calgary to the plot of addresses, using a GeoJSON file of Calgary's neighborhood (community) boundaries.

From https://data.calgary.ca/Base-Maps/Community-Boundaries/ab7m-fwn6


In [None]:
calgary_geo = gpd.read_file("CalgaryBoundaries.geojson")
calgary_geo

In [None]:
#We can plot Calgary's neighborhoods - mapping sector to color
calgary_geo.plot(column="sector", cmap="Set2", edgecolor='black', linewidth=.2)


In [None]:
#Use the map of Calgary as the base, and add the points
base=calgary_geo.plot(color='white', edgecolor='black', linewidth=.2)
geo_join.plot(ax=base, marker='o', color='red', markersize=5)

# Part 2 - Retrieving OpenStreetMap data

Boeing, G. 2017. “OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks.” Computers, Environment and Urban Systems 65, 126-139. doi:10.1016/j.compenvurbsys.2017.05.004

In [None]:
#Import OSMnx
import osmnx as ox

place_name="Calgary, Alberta, Canada"

#Create a network of Calgary's streets
graph = ox.graph_from_place(place_name)

#Plot the street network
fig, ax = ox.plot_graph(graph)
plt.tight_layout()

In [None]:
#We can download building footprints in Calgary
#(older OSMnx versions used ox.footprints_from_place(place_name))
buildings = ox.features_from_place(place_name, tags={"building": True})
buildings.head()

In [None]:
#We can download a polygon that fits Calgary's boundaries
#(older OSMnx versions used ox.gdf_from_place(place_name))
area = ox.geocode_to_gdf(place_name)
area

In [None]:
#We can extract the nodes and edges that make up the street network

nodes, edges = ox.graph_to_gdfs(graph)
nodes.head()

In [None]:
edges.head()
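Conceptually, the OSMnx street graph is a directed multigraph: each node has x/y coordinates, and each edge is identified by its start node `u`, its end node `v` and a `key` that tells parallel edges apart. A two-way street is usually stored as two directed edges, one per direction. `graph_to_gdfs` splits this structure into the node and edge tables shown above. The stdlib sketch below mimics that split on a hypothetical three-node graph; the column names are modeled loosely on OSMnx output, and this is not OSMnx's own code.

```python
# Hypothetical toy street graph: node id -> (x, y) coordinates
toy_nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0)}

# Directed multigraph edges: (u, v, key, attributes)
toy_edges = [
    (1, 2, 0, {"name": "A St", "length": 100.0}),
    (2, 1, 0, {"name": "A St", "length": 100.0}),  # two-way street: one edge per direction
    (2, 3, 0, {"name": "B Ave", "length": 120.0}),
]


def graph_to_tables(nodes, edges):
    """Split a graph into a list of node rows and a list of edge rows."""
    node_rows = [{"osmid": n, "x": x, "y": y} for n, (x, y) in nodes.items()]
    edge_rows = [{"u": u, "v": v, "key": k, **attrs} for u, v, k, attrs in edges]
    return node_rows, edge_rows


node_rows, edge_rows = graph_to_tables(toy_nodes, toy_edges)
# Summing over directed edges counts each two-way street twice
total_length = sum(row["length"] for row in edge_rows)
```

The double counting in `total_length` is worth keeping in mind when summarizing the real `edges` table, for example when adding up street length.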

In [None]:
fig, ax = plt.subplots()
area.plot(ax=ax, facecolor='black')
edges.plot(ax=ax, linewidth=.1, edgecolor='#BC8F8F')
buildings.plot(ax=ax, facecolor='khaki', alpha=0.7)
plt.tight_layout()

# Part 3 - Visualizing Binned Data

For Part 3, we're going to visualize travel time data from Helsinki, Finland.

This data originally came from blogs.helsinki.fi (https://blogs.helsinki.fi/accessibility/helsinki-region-travel-time-matrix-2015/), where you can find descriptions of the attributes.

In [None]:
fp = "TravelTimes_to_5975375_RailwayStation_Helsinki.geojson"

acc = gpd.read_file(fp)
acc.head(5)

In [None]:
#Keep only cells with a valid rush-hour public transport travel time (no-data values are negative)
acc = acc.loc[acc['pt_r_tt'] >=0]

Let's plot the rush-hour public transport travel time to the destination (Helsinki Railway Station).

We plot the values in 9 classes, grouped with the Fisher-Jenks natural breaks method. "Jenks minimizes each cluster's average deviation from the mean, while maximizing deviation from the means of the other groups."

https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization
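Fisher-Jenks is an exact optimization: among all ways to split the sorted values into k contiguous classes, it picks the split with the smallest total within-class sum of squared deviations. The sketch below is a minimal, unoptimized dynamic-programming version in plain Python. The function name `fisher_jenks_breaks` is ours, and mapclassify's implementation differs in its details. Like mapclassify's `bins`, the result lists each class's upper bound.

```python
def fisher_jenks_breaks(values, k):
    """Return the k class upper bounds that minimize the total within-class
    sum of squared deviations (exact dynamic programming, O(k * n^2))."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give any segment's squared deviation in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest cost of splitting data[:j] into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # split[c][j]: start index of the last class in that best split
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last class is data[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk the split table backwards to recover each class's upper bound
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = split[c][j]
    return breaks[::-1]


example_breaks = fisher_jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)
# example_breaks == [3, 12, 22]
```

The triple loop makes this sketch slow for thousands of grid cells, so use mapclassify (or the `scheme` argument of `plot`) on the real data.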

In [None]:
acc.plot(column="pt_r_tt", scheme="Fisher_Jenks", k=9, cmap="RdYlBu", linewidth=0, legend=True)

plt.tight_layout()

In [None]:
#Plot walking distance to city center
acc.plot(column="walk_d", scheme="Fisher_Jenks", k=9, cmap="RdYlBu", linewidth=0, legend=True)

plt.tight_layout()

Let's use PySAL's mapclassify module to classify the public transport travel times into bins ourselves.

In [None]:
from pysal.viz import mapclassify

n_classes=9

In [None]:
#We need to make the classifier and apply it to the column
fj = mapclassify.FisherJenks.make(k=n_classes)

bins = acc[['pt_r_tt']].apply(fj)
bins.head()
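The new column holds class indices rather than travel times: with sorted upper bounds, index 0 is the class with the shortest travel times. A value belongs to the first class whose upper bound is greater than or equal to it. The stdlib sketch below shows that lookup with `bisect`; `assign_classes` and the example bounds are hypothetical and only illustrate the idea.

```python
from bisect import bisect_left


def assign_classes(values, upper_bounds):
    """Map each value to the index of the first class whose upper bound is >= value.

    upper_bounds must be sorted in ascending order.
    """
    result = []
    for v in values:
        idx = bisect_left(upper_bounds, v)
        if idx == len(upper_bounds):
            raise ValueError(f"{v} exceeds the largest upper bound")
        result.append(idx)
    return result


classes = assign_classes([5, 10, 11, 30], [10, 20, 30])
# classes == [0, 0, 1, 2]: a value equal to a bound falls in that bound's class
```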

In [None]:
#We should rename this new column to something unique
bins.columns = ['nb_pt_r_tt']

#and add it back to the original table
acc = acc.join(bins)
acc.head()

In [None]:
#Plot the new column of clustered travel times

acc.plot(column="nb_pt_r_tt", linewidth=0, legend=True)
plt.tight_layout()

# Creating a Custom Filter

Let's create a filter for this data. Assume we want to find locations whose walking distance to the city center is more than 4 km, but whose public transport travel time during rush hour is less than 20 minutes.

In [None]:
#Add a column for our suitable area
acc["Suitable_area"] = None

In [None]:
def customFilter(row, src_col1, src_col2, threshold1, threshold2, output_col):
    # 1. If the value in src_col1 is LOWER than threshold1
    # 2. AND the value in src_col2 is HIGHER than threshold2, give value 1, otherwise give 0

    if row[src_col1] < threshold1 and row[src_col2] > threshold2:
        # Both conditions hold: mark the row as suitable
        row[output_col] = 1
    else:
        # At least one condition fails: mark the row as not suitable
        row[output_col] = 0

    # Return the updated row
    return row

In [None]:
#Apply the filter row by row and store the result in the 'Suitable_area' column
acc = acc.apply(customFilter, src_col1='pt_r_tt', src_col2='walk_d', threshold1=20, threshold2=4000, output_col="Suitable_area", axis=1)
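Because the filter uses strict inequalities, a cell with a travel time of exactly 20 minutes or a walking distance of exactly 4000 m is not flagged. The plain-Python sketch below makes that boundary behavior explicit. `is_suitable` and the sample cells are hypothetical. In pandas the same rule can also be computed without a row-wise `apply`, as `((acc['pt_r_tt'] < 20) & (acc['walk_d'] > 4000)).astype(int)`, which is usually much faster.

```python
def is_suitable(pt_r_tt, walk_d, max_minutes=20, min_meters=4000):
    """Return 1 if the rush-hour transit time is strictly below max_minutes
    and the walking distance is strictly above min_meters, else 0."""
    return int(pt_r_tt < max_minutes and walk_d > min_meters)


# Hypothetical cells: (rush-hour transit minutes, walking distance in meters)
cells = [(15, 5000), (20, 5000), (15, 4000), (25, 8000)]
flags = [is_suitable(t, d) for t, d in cells]
# flags == [1, 0, 0, 0]: the values exactly on a threshold are excluded
```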

In [None]:
acc.plot(column="Suitable_area", linewidth=0);