# ONS Boundaries - Choropleth Demo

Fragments associated with grabbing boundaries data.

*Note that a full demo using all local authority districts kills the MyBinder demo.*

## Counties and Unitary Authorities (December 2015) Full Extent Boundaries in England and Wales

Get the boundaries for Local Authority areas:

http://geoportal.statistics.gov.uk/datasets/counties-and-unitary-authorities-december-2015-full-extent-boundaries-in-england-and-wales

In [1]:
import geopandas

#From the downloads area of the page, grab the link for the shapefile download
url='https://opendata.arcgis.com/datasets/0b09996863af4b5db78058225bac5d1b_1.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D'
gdf = geopandas.read_file(url)
gdf.head()

Unnamed: 0,ctyua15cd,ctyua15nm,ctyua15nmw,objectid,st_lengths,st_areasha,geometry
0,E06000001,Hartlepool,,1,65270.302085,98441690.0,"POLYGON ((447213.9000000004 537036.0999999996,..."
1,E06000002,Middlesbrough,,2,41055.846979,54553580.0,"POLYGON ((448489.9000000004 522071.8000000007,..."
2,E06000003,Redcar and Cleveland,,3,101208.779781,253890900.0,"POLYGON ((455834.0999999996 528110.5999999996,..."
3,E06000004,Stockton-on-Tees,,4,108085.159612,209730800.0,"POLYGON ((444157 527956.3000000007, 444165.900..."
4,E06000005,Darlington,,5,107206.323036,197475700.0,"POLYGON ((423496.5999999996 524724.3000000007,..."


Preview shapes from the first few rows using the `folium` package:

In [3]:
import folium

m = folium.Map(max_zoom=6, location=[53.9, 0.0])
folium.GeoJson(gdf.head()).add_to(m)
m

In [4]:
#You can download a copy of the actual boundary data file if required
# !wget https://opendata.arcgis.com/datasets/52182cdda64d4b15984f6446ca7ee7fd_1.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D -O wards_fullextent.zip
# !unzip wards_fullextent.zip

## Grab Data to Map Against

Let's get some data to use as the basis of a choropleth map.

We can use something from deprivation indices (we really should check we grabbed boundary files for the correct period...).

In [5]:
#https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015
#File 10: local authority district summaries

data_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464464/File_10_ID2015_Local_Authority_District_Summaries.xlsx'

Download the data into a `pandas` dataframe. The original file is an Excel spreadsheet with multiple sheets, so let's load them all and preview the sheet names:

In [6]:
import pandas as pd

df = pd.read_excel(data_url, sheet_name=None)

for k in df.keys():
    print(k)

Notes
IMD
Income
Employment
Education
Health
Crime
Barriers
Living
IDACI
IDAOPI


We can then preview the data in a single sheet:

In [7]:
df['Education'].head()

Unnamed: 0,Local Authority District code (2013),Local Authority District name (2013),"Education, Skills and Training - Average rank","Education, Skills and Training - Rank of average rank","Education, Skills and Training - Average score","Education, Skills and Training - Rank of average score","Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally","Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally"
0,E06000001,Hartlepool,20101.48,72,30.51,47,0.2069,37
1,E06000002,Middlesbrough,22728.01,24,40.64,3,0.4419,1
2,E06000003,Redcar and Cleveland,19185.28,95,27.875,71,0.1818,54
3,E06000004,Stockton-on-Tees,16660.09,150,24.637,110,0.175,59
4,E06000005,Darlington,16385.06,155,22.569,129,0.1385,75
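
As noted above, we should really check that the boundary file and the data cover the same areas. The boundaries are for counties and unitary authorities (December 2015), while the data is keyed on 2013 local authority district codes, so some codes may not line up. Here is a minimal sketch of such a check using plain Python sets. The `compare_codes` helper and the `X...` placeholder codes are made up for illustration; they are not part of either dataset.

```python
def compare_codes(boundary_codes, data_codes):
    """Report which area codes appear in both sources, and which in only one."""
    boundary, data = set(boundary_codes), set(data_codes)
    return {
        'matched': sorted(boundary & data),
        'boundary_only': sorted(boundary - data),
        'data_only': sorted(data - boundary),
    }

# Toy inputs: two codes from the previews above, plus one placeholder on each side
report = compare_codes(
    ['E06000001', 'E06000002', 'X00000001'],   # stand-in for the boundary codes
    ['E06000001', 'E06000002', 'X00000002'],   # stand-in for the data codes
)
print(report)
```

With the real data, we might call `compare_codes(gdf['ctyua15cd'], df['Education']['Local Authority District code (2013)'])` and look at the two unmatched lists before trusting the map.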


## Generate a Choropleth Map

It's easy enough to combine data from a `pandas` data frame with shape data in a `geopandas` dataframe.

The `geopandas` dataframe is used to create a geojson file that the `folium` package can render. Each column name in the `geopandas` dataframe is mapped onto a corresponding `feature.properties.COLUMN_NAME` in the created geojson file. (I'm not sure offhand how column names that include space or punctuation characters are handled: simple column names are easiest.)

The data frame is also passed in (`data=`), with the key and data columns identified as `columns=[KEYCOL, DATACOL]`.

The rendered choropleth map is then coloured accordingly.
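
To make the matching a bit more concrete, here is a rough sketch of the idea using only the standard library. It is not `folium`'s actual implementation: the hand-built `features` list stands in for the geojson created from the `geopandas` dataframe, the `ranks` dict stands in for the `[KEYCOL, DATACOL]` pair, and `resolve_key` is a hypothetical helper that walks a dotted `key_on` style path.

```python
from functools import reduce

# Stand-in for the generated geojson: each dataframe column ends up
# as an entry under feature["properties"] (geometry omitted here)
features = [
    {"type": "Feature", "geometry": None,
     "properties": {"ctyua15cd": "E06000001", "ctyua15nm": "Hartlepool"}},
    {"type": "Feature", "geometry": None,
     "properties": {"ctyua15cd": "E06000002", "ctyua15nm": "Middlesbrough"}},
]

# Stand-in for columns=[KEYCOL, DATACOL]: area code -> rank of average rank
ranks = {"E06000001": 72, "E06000002": 24}

def resolve_key(feature, key_on):
    """Walk a dotted path such as 'feature.properties.ctyua15cd'."""
    parts = key_on.split(".")[1:]  # drop the leading 'feature'
    return reduce(lambda node, part: node[part], parts, feature)

# Look up the data value for each shape via its key
joined = {}
for feature in features:
    code = resolve_key(feature, "feature.properties.ctyua15cd")
    joined[feature["properties"]["ctyua15nm"]] = ranks.get(code)

print(joined)
```

Shapes whose key has no match in the data come back as `None` in this sketch, which is one reason to check the codes line up before mapping.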

In [16]:
import folium

m = folium.Map(max_zoom=9, location=[54.5, -0.8])
folium.Choropleth(gdf.head(), key_on='feature.properties.ctyua15cd',
                  data=df['Education'],
                  columns=['Local Authority District code (2013)',
                           'Education, Skills and Training - Rank of average rank'],
                  fill_color='YlOrBr').add_to(m)
m

If we try to render the full set of boundaries, rather than just the first few rows, in a MyBinder session, things crash. (I think `geopandas` is quite heavy on resources.)

## Exploring the Data

We can exploit the notebook environment further by creating a simple application to explore the data more generally.

For example, within the `Education` data sheet, we can explore choropleth maps generated from other columns.

We can create tidied up names for the data selection that then refer back to the original column name:

In [25]:
#We can create a drop down list with values in the list that map onto column names
#A python dict is the data structure that lets us do this

#We use a technique called a dict comprehension to create the dict from a list of column names
#The split separates the column names on '-' elements into two parts
#The parts are referenced by a numerical index value, starting at 0
#Index value 1 is the second item in the split list
#The .strip() command gets rid of leading/trailing whitespace in the string
datacols = {c.split('-')[1].strip():c for c in df['Education'].columns if c.startswith('Education')}
datacols

{'Average rank': 'Education, Skills and Training - Average rank',
 'Rank of average rank': 'Education, Skills and Training - Rank of average rank',
 'Average score': 'Education, Skills and Training - Average score',
 'Rank of average score': 'Education, Skills and Training - Rank of average score',
 'Proportion of LSOAs in most deprived 10% nationally': 'Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally',
 'Rank of proportion of LSOAs in most deprived 10% nationally': 'Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally'}
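
One thing to watch with splitting on a bare `'-'` is that a label that itself contained a hyphen would be cut short. A slightly more defensive sketch splits once on `' - '` instead. The `label_map` helper is a made-up name, and the short `columns` list just reuses a few of the column names shown above rather than loading the sheet.

```python
columns = [
    'Local Authority District code (2013)',
    'Local Authority District name (2013)',
    'Education, Skills and Training - Average rank',
    'Education, Skills and Training - Rank of average rank',
]

def label_map(columns, prefix, sep=' - '):
    """Map tidy labels (the text after the first sep) back to full column names."""
    return {c.split(sep, 1)[1].strip(): c
            for c in columns
            if c.startswith(prefix) and sep in c}

labels = label_map(columns, 'Education')
print(labels)
```

The same helper could be pointed at `df['Education'].columns` in place of the toy list.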

In [27]:
from ipywidgets import interact

@interact(datacol=datacols)
def plotEducationChoropleth(datacol='Education, Skills and Training - Rank of average rank'):
    m = folium.Map(max_zoom=9, location=[54.5, -0.8])
    folium.Choropleth(gdf.head(), key_on='feature.properties.ctyua15cd',
                      data=df['Education'],
                      columns=['Local Authority District code (2013)', datacol],
                      fill_color='YlOrBr').add_to(m)
    return m
