## Choropleth Maps for Spatial Visualisation of Spatial Data (duh). 

#### Why? 
Bar plots give us a simple visualisation: rectangular bars whose length is directly proportional to the grouped values they represent. But since we are dealing with geo-spatial data, there's a better way to visualise it.

A choropleth map is a way of visualising geo-spatial data in which areas are shaded in proportion to the statistical measure being displayed on the map. Typically, areas with a darker shade have a higher value of the feature being represented. Choropleth maps give a more intuitive way of digesting the data and of seeing how it varies across geographic regions.
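To make the "shaded in proportion" idea concrete, here is a minimal sketch in plain Python. The area names, scores and the four-colour palette are assumptions made up for illustration, and equal-width bins are only one of several ways to class the values.

```python
# Hypothetical example values; the names and numbers are illustrative only.
scores = {"Area A": -1.5, "Area B": 0.2, "Area C": 2.6, "Area D": 0.9}

# A light-to-dark sequential palette (hex colours chosen for illustration).
palette = ["#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"]


def shade_for(value, lo, hi, colours):
    """Return the colour of the equal-width bin that `value` falls into."""
    if hi == lo:
        return colours[0]
    fraction = (value - lo) / (hi - lo)
    # The maximum value would land one past the last bin, so clamp it.
    index = min(int(fraction * len(colours)), len(colours) - 1)
    return colours[index]


lo, hi = min(scores.values()), max(scores.values())
shades = {area: shade_for(value, lo, hi, palette) for area, value in scores.items()}
```

The lowest-scoring area gets the lightest colour and the highest-scoring area the darkest, which is the convention described above.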

#### How?

For our project we used Folium, an open-source Python library that combines the data-manipulation capabilities of Python and pandas with Leaflet.js, an open-source JavaScript library for creating interactive maps.

Let's start by importing the necessary libraries.

_Note: Code explained in the previous notebook(s) might not be explained here again. Please visit the notebooks in the order they are presented in the Readme.md file._

_IMPORTANT NOTE: Due to IPython + git, it's not possible to view the resulting choropleth maps in the notebook itself. Please visit the choropleth

In [7]:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
import folium
import geopandas
# import json

First we load the CSV containing the SA1s in Inner Melbourne, their <i>Walkability</i> scores and related metrics.

In [8]:
dframe = pd.read_csv('data/innermelbourne.csv')
dframe.columns = dframe.columns.str.strip() #Taking care of formatting issues in column names.
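As a small illustration of why the column names are stripped, the sketch below reads a made-up CSV whose header cells carry stray spaces. The file contents are hypothetical and are not taken from `innermelbourne.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV with stray spaces around the header names.
raw = io.StringIO(" sa1_main11 , SumZScore \n20601110501,1.25\n20601110502,-0.40\n")
toy = pd.read_csv(raw)

before = list(toy.columns)             # names still carry the padding
toy.columns = toy.columns.str.strip()  # remove leading/trailing whitespace
after = list(toy.columns)              # names can now be used as plain keys
```

Without the strip, a lookup such as `toy['SumZScore']` would raise a `KeyError`, because the real column name would include the surrounding spaces.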

In [9]:
#Dropping columns deemed irrelevant
aframe = dframe.drop(['gcc_name11','gcc_code11','sa2_5dig11','sa1_7dig11','sa3_code11','sa4_code11','ste_code11','ste_name11'],axis=1)

Now we use pandas to group the SA1s by suburb (SA2) and by SA3, and then calculate the average Walkability score (`SumZScore`) for each group.

In [10]:
#Group by suburb
avg_sa2 = aframe[['sa2_name11','SumZScore']].groupby('sa2_name11').mean()

#Group by SA3
avg_sa3 = aframe[['sa3_name11','SumZScore']].groupby('sa3_name11').mean()
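The grouping step can be seen on a toy frame. The rows below are made up for illustration; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical SA1 rows; area names and scores are made up for illustration.
toy = pd.DataFrame({
    "sa2_name11": ["Carlton", "Carlton", "Brunswick", "Brunswick"],
    "SumZScore": [3.0, 2.0, -1.0, 0.0],
})

# One row per suburb, holding the mean of its SA1 scores.
toy_avg = toy[["sa2_name11", "SumZScore"]].groupby("sa2_name11").mean()
```

The result is indexed by suburb name, which is why the later cells call `reset_index()` before handing the data to Folium.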

In [11]:
#To get names of all sa2 suburbs, regardless of missing values
area_suburbs = geopandas.read_file('data/inner_melb_sa2.json')['area_name'].sort_values()

#Doing the same for SA1 codes.
sa1_codes = geopandas.read_file('data/inner_melb_sa1.json')['sa1_code'].sort_values()

Now we'll do some pre-processing on the aggregated data sets, including resetting the index and renaming columns, so that they are in the format Folium requires.

In [20]:
sa3_walk_avg = avg_sa3[['SumZScore']].reset_index()
sa3_walk_avg.columns = ['SA3 Suburb','Walkability Score']
sa3_walk_avg

|   | SA3 Suburb | Walkability Score |
|---|---|---|
| 0 | Brunswick - Coburg | -1.442297 |
| 1 | Darebin - South | -1.705241 |
| 2 | Essendon | -1.50101 |
| 3 | Melbourne City | 2.551466 |
| 4 | Port Phillip | 0.256828 |
| 5 | Stonnington - West | 0.053694 |
| 6 | Yarra | 0.446783 |


In [21]:
suburb_walk_avg = avg_sa2[['SumZScore']]
suburb_walk_avg = suburb_walk_avg.reindex(area_suburbs,fill_value=None).reset_index()
suburb_walk_avg.columns = ['SA2 Suburb','Walkability Score']
suburb_walk_avg

|   | SA2 Suburb | Walkability Score |
|---|---|---|
| 0 | Abbotsford | 0.847583 |
| 1 | Albert Park | 0.425471 |
| 2 | Alphington - Fairfield | -2.055192 |
| 3 | Armadale | -1.016522 |
| 4 | Ascot Vale | -1.463378 |
| 5 | Brunswick | -0.680594 |
| 6 | Brunswick East | -0.417797 |
| 7 | Brunswick West | -1.725236 |
| 8 | Carlton | 2.698953 |
| 9 | Carlton North - Princes Hill | -0.955973 |
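The `reindex` call is what keeps suburbs that have a boundary but no score. A minimal sketch on made-up data (the suburb list and values are assumptions) shows such suburbs coming back as `NaN`:

```python
import pandas as pd

# Hypothetical averages, indexed by suburb name (values are illustrative).
toy_avg = pd.DataFrame({"SumZScore": [2.5, -0.5]}, index=["Carlton", "Brunswick"])
toy_avg.index.name = "sa2_name11"

# Hypothetical full list of suburb names, as if taken from the boundary file.
all_suburbs = pd.Series(["Brunswick", "Carlton", "Docklands"], name="area_name")

# Reindexing keeps every suburb; those without data get NaN.
full = toy_avg.reindex(all_suburbs).reset_index()
full.columns = ["SA2 Suburb", "Walkability Score"]

# Suburbs that will have no score on the map.
missing = full.loc[full["Walkability Score"].isna(), "SA2 Suburb"].tolist()
```

Listing `missing` before plotting is a quick way to confirm that the gaps on the map are expected rather than the result of a naming mismatch.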


Everything works as expected so far, so let's start building the choropleths using Folium.

To build a Folium/Leaflet choropleth map, we need: the starting coordinates for the map, GeoJSON polygons representing the SA boundaries, the data set to visualise (walkability scores), the key column on which to bind the data, and some cosmetic parameters.
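Because the data is bound to the polygons through the key column, it can help to check the join before plotting. The sketch below is a plain-Python check on a made-up GeoJSON and score table. The `resolve` helper, the area names and the values are all hypothetical, and this is not how Folium itself performs the lookup.

```python
# Hypothetical GeoJSON and scores; property and area names are illustrative only.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"feature_name": "Yarra"}, "geometry": None},
        {"type": "Feature", "properties": {"feature_name": "Essendon"}, "geometry": None},
    ],
}
scores = {"Yarra": 0.45, "Melbourne City": 2.55}


def resolve(feature, dotted_key):
    """Follow a dotted path such as 'properties.feature_name' into a feature."""
    value = feature
    for part in dotted_key.split("."):
        value = value[part]
    return value


geo_keys = {resolve(f, "properties.feature_name") for f in geojson["features"]}
unshaded = sorted(geo_keys - set(scores))   # polygons with no data row
unmatched = sorted(set(scores) - geo_keys)  # data rows with no polygon
```

Anything in `unmatched` points to a spelling or formatting difference between the data and the boundary file.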

In [22]:
map_melb_sa3 = folium.Map(location=[-37.814,144.954],zoom_start = 12,max_zoom=15)

map_melb_sa3.choropleth(
    geo_path='data/inner_melb_sa3.json', #path to the geojson polygons for inner Melbourne SA3s, obtained from AURIN
    data = sa3_walk_avg, #data to bind to the choropleth
    key_on= 'properties.feature_name', #Key in the geojson file to map to the walkability scores
    columns=['SA3 Suburb', 'Walkability Score'], 
    fill_color='YlGnBu', #Colors to fill the choropleth
    line_weight=2, #Weight of the boundary line
)

map_melb_sa3.create_map('choropleth-maps/sa3melbourne.html') #Saving the maps to an html file
map_melb_sa3




And there it is! Our very own choropleth map built on top of our data. Let's build one for SA2s so that there's more to talk about.

In [23]:
map_melb_sa2 = folium.Map(location=[-37.814,144.954],zoom_start = 12,max_zoom=15)

map_melb_sa2.choropleth(
    geo_path='data/inner_melb_sa2.json',
    data = suburb_walk_avg,
    key_on= 'properties.area_name',
    columns=['SA2 Suburb', 'Walkability Score'],
    fill_color='YlGnBu',
    line_weight=2,
    legend_name = 'Walkability Score'
)

map_melb_sa2.create_map('choropleth-maps/sa2melbourne.html')
map_melb_sa2




Now we have an intuitively understandable visualisation for Inner Melbourne's suburbs.

Southbank is the most walkable region, and we can also see that the closer a suburb is to Melbourne's CBD, the more walkable it tends to be. This makes sense, because you would expect CBD regions to have better road connectivity and a richer land-use mix.

Let's make one last choropleth for SA1s.

In [24]:
#Code below for SA1 choropleths.
avg_sa1 = aframe[['sa1_main11','SumZScore']].groupby('sa1_main11').mean()
sa1_walk_avg = avg_sa1['SumZScore']

In [25]:
#Code below to set the data type of indexes to string. Important for future geoJson and choropleth operations.
sa1_walk_avg = sa1_walk_avg.reset_index()
sa1_walk_avg.columns = ['SA1 Code','Walkability Score']
sa1_walk_avg['SA1 Code'] = sa1_walk_avg['SA1 Code'].apply(str)
sa1_walk_avg.set_index('SA1 Code',inplace=True) 
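The string conversion matters because labels only match when their types match. The sketch below assumes the boundary file stores SA1 codes as strings while the CSV yields integers. The codes and scores are made up for illustration.

```python
import pandas as pd

# Hypothetical SA1 codes: integers as read from a CSV, strings as stored in GeoJSON.
toy = pd.DataFrame(
    {"Walkability Score": [1.2, -0.3]},
    index=[20601110501, 20601110502],
)
geo_codes = ["20601110501", "20601110502"]

# Integer labels never equal string labels, so every row comes back as NaN.
mismatched = toy.reindex(geo_codes)

# Converting the index to strings first makes the labels line up.
toy.index = toy.index.astype(str)
matched = toy.reindex(geo_codes)
```

A map that comes out entirely unshaded is often a sign of this kind of key type mismatch.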

In [26]:
sa1_walk_avg = sa1_walk_avg.reindex(sa1_codes,fill_value=None).reset_index()
sa1_walk_avg.columns = ['SA1 Code','Walkability Score']

In [28]:
#Not all SA1s have a walkability score, so some are left out of the map (explained in the report)
map_melb_sa1 = folium.Map(location=[-37.814,144.954],zoom_start = 13)

map_melb_sa1.choropleth(
    geo_path='data/inner_melb_sa1.json',
    data = sa1_walk_avg,
    key_on= 'properties.sa1_code',
    columns=['SA1 Code', 'Walkability Score'],
    fill_color='YlGnBu',
    line_weight=2,
    #threshold_scale = [-4,0,4,9,12]
)

map_melb_sa1.create_map('choropleth-maps/sa1melbourne.html')
map_melb_sa1
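The commented-out `threshold_scale` hints at setting the class breaks by hand. One way to choose them, assuming quantile breaks suit the data, is sketched below with the standard library. The scores are made up, and the name of the parameter that accepts such a list depends on the Folium release in use.

```python
import statistics

# Hypothetical walkability scores (illustrative values only).
scores = [-3.1, -1.7, -0.4, 0.2, 0.9, 1.5, 2.8, 4.6]

# Three interior cut points split the data into four equally populated classes.
cuts = statistics.quantiles(scores, n=4, method="inclusive")

# Add the minimum and maximum so every score falls inside the scale.
threshold_scale = [min(scores)] + cuts + [max(scores)]
```

Quantile breaks put roughly the same number of areas in each colour class, which tends to spread the palette more evenly than equal-width breaks when a few areas have extreme scores.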



In the SA1 breakdown, we see how individual SA1s contribute to their suburb's overall score. The colours of the SA1s and of the SA2s that contain them look similar across the two maps, which is as it should be.

The missing SA1 regions are parks, Melbourne University (hi!) and industrial areas, which are technically not SA1s because of the minimum resident population requirement.

And there we have it: all choropleths successfully built. Folium provides methods to save them as HTML files and to embed them in web applications.