# Getting started with maps

One of the most powerful - and intuitive - visualisation techniques for location-based data is to make use of maps.

The technical subject of Geographic Information Systems (GIS) is a very broad one, but at its core are systems specifically tailored for the storage, display, interrogation and management of geographic, location-related data. The common feature of all GIS is the central role that location plays in the data - acting as a key to draw together data from different sources and forming the main relationship between data elements.  

Latitude and longitude values can be used to determine a point on the Earth's surface, but they are not the only coordinate system that could be used. There is also the problem that the Earth's spherical shape makes it difficult to represent both area and shape accurately on a flat surface (screen or paper). For this reason, different projections are available that convert the spherical reality of the Earth's surface to the flat representation of our screens and atlases. Just as with choosing an appropriate visualisation for graphic data, choosing a suitable projection for mapping data can be important. So simply plotting latitude and longitude positions on a grid is not sufficient: a mathematical transformation to the chosen projection is required. These and other complex issues (handling height above sea level, mapping areas as well as points) are concerns in mapping and GIS.
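To make the idea of a projection concrete, here is a minimal sketch of the spherical Web Mercator projection, which is the default coordinate reference system in leaflet (and so in folium). It converts latitude and longitude in degrees into flat x/y coordinates in metres. The function name `web_mercator` is illustrative; the radius is the one Web Mercator uses.

```python
import math

# Sphere radius used by the Web Mercator projection (EPSG:3857), in metres
EARTH_RADIUS_M = 6378137.0

def web_mercator(lat_deg, lon_deg):
    """Project a latitude/longitude pair (degrees) to Web Mercator x/y (metres)."""
    # Longitude maps linearly onto the x axis
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    # Latitude is stretched more and more towards the poles
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# The Open University campus, Walton Hall
print(web_mercator(52.024696, -0.710710))

# Equal 10-degree steps of latitude become longer steps on the flat map
for lat in (0, 30, 60):
    _, y0 = web_mercator(lat, 0)
    _, y1 = web_mercator(lat + 10, 0)
    print(f'{lat} to {lat + 10} degrees north: {(y1 - y0) / 1000:.0f} km on the map')
```

The growing steps are the area distortion described above. folium and leaflet apply this transformation for us, which is why we can pass them 'raw' latitude and longitude values.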

In this introduction we're going to take a GIS-lite approach to maps.  We're going to use a library that has a single projection, and uses 'raw' latitude and longitude values for points on the map. We're also using maps to support the visualisation of data (rather than for capturing data), so we're interested in how to present data on the maps.

There are typically three - or four, depending on how you count them - ways of representing data on a map:

- points, or markers
- lines
- regions, or shapes.

The _fourth_ way is to use proportional symbol markers - that is, markers such as circles whose size is proportional to some quantity relevant to the marker location.

There are several libraries available that can help us work with maps. One approach is to generate base maps using graphic visualisation libraries such as matplotlib to form quick, static sketch maps. If we require interactive JavaScript style maps, we can use the mplleaflet library to convert these static map representations to embedded interactive ones using the leaflet JavaScript library. Another is to use a package such as *folium* to create a leaflet map directly.

We'll look at ways of representing location data with the _folium_ package, which links Python and _pandas_ data handling with interactive leaflet JavaScript maps. folium handles a lot of the underlying details, letting us get on with presenting our location-based data.

Using a package like folium is a compromise: it is much simpler to work with than the full leaflet library, but it does not give you access to all of the flexibility (and complexity) of the features that leaflet itself provides.

We won't be working directly with leaflet or matplotlib maps in this module.

In [None]:
import folium
import pandas as pd

#### A note on the need for an internet connection when working with folium output
folium doesn't embed the map tiles (the graphics used as the background display on a map) in the HTML document when you create the map. The tiles are not retrieved until the HTML is used to display the map. This keeps the map files small, but means that a network connection is required to display a map: if you are offline, the tiles cannot be retrieved.


It is possible that a map will display if the tiles you require are held in the cache. However, if you move or zoom on that map it is unlikely the cache will contain the required tiles and the display will be blanked or corrupted.

#### A note on the location of the output files
folium renders maps as `HTML`, which is displayed in the Notebook. When you save a map as an `HTML` file, a bare filename puts it in the Notebook's own folder, which can get a bit untidy. So, for these examples, saved maps go in a folder called `map-output-files`, which is included in the path passed to the map's `.save()` method.

You're obviously free to save your maps wherever you want but, in keeping with the aims of the module, do try to _manage_ them, rather than just _collect_ them!

# A simple map

Let's start by creating a simple map, supplying a default location for the initial centre of the map, and setting the size in pixels of the map widget display.

When you run the cell, an interactive map will be rendered as the cell output. Note that a map will not be displayed if you are offline: a network connection is required to retrieve the map tiles.

If you put your mouse over the map and scroll, you will see that the map view is zoomable.

In [None]:
# This map is centred on the Open University campus, Walton Hall.
# The location values are the latitude and longitude values for the location of interest.
# The default is to use the OpenStreetMap tiling (how the background looks).
map_OU = folium.Map(location=[52.024696,-0.710710], width=960, height=500)

# The map object returns an embedded interactive HTML map in an iframe
map_OU

We can save an HTML file that will render this map as a standalone file:

In [None]:
#First, check the output directory path exists and create it if it doesn't:
!mkdir -p map-output-files

#Now save the HTML file
map_OU.save('map-output-files/OU.html')
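The `!mkdir -p` line is an IPython shell command, so it only works inside a Notebook on a system with a Unix-style `mkdir`. A minimal pure-Python sketch of the same step uses the standard library's `pathlib`; the variable names are illustrative.

```python
from pathlib import Path

# Folder for saved map files: created if missing, no error if it already exists
output_dir = Path('map-output-files')
output_dir.mkdir(parents=True, exist_ok=True)

# Build the output file path; pass str(ou_map_path) to a map's .save() method
ou_map_path = output_dir / 'OU.html'
print(ou_map_path)
```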

We can further neaten this map: the Walton Hall campus is barely visible at this level of detail, and the map is a bit big for this demonstration. So let's open the map at a more zoomed-in view, using the `zoom_start` parameter to specify the level of detail to show when the map is first opened.

Zoom level `0` shows a whole-world view, and 18 appears to be the highest zoom level allowed by default. You can also limit the maximum zoom level using the `max_zoom` parameter.
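Each step up in zoom level doubles the map's width and height in pixels. Assuming the standard 256-pixel square tiles served by OpenStreetMap, the whole world at zoom level z is covered by a grid of 2^z by 2^z tiles. A minimal sketch, with an illustrative function name:

```python
TILE_SIZE_PX = 256  # standard OpenStreetMap tile size (other tilesets may differ)

def world_size(zoom):
    """Return (tiles along each side, total tiles, world width in pixels) at a zoom level."""
    tiles_per_side = 2 ** zoom
    return tiles_per_side, tiles_per_side ** 2, tiles_per_side * TILE_SIZE_PX

for z in (0, 14, 16, 18):
    side, total, width_px = world_size(z)
    print(f'zoom {z:2d}: {side} x {side} tiles ({total:,} in all), world {width_px:,} px wide')
```

So going from zoom 14 to zoom 16 means fetching 16 times as many tiles to cover the same area of ground.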

In [None]:
map_OU2 = folium.Map(location=[52.024696,-0.710710], width=600, height=300,
                     zoom_start=14, max_zoom=16)

map_OU2

# Adding markers to a map

Markers let us show what the map contains and draw attention to specific points of interest. For example, we can add a marker to the map:

In [None]:

# Add a marker at the location we want
folium.Marker([52.024697,-0.710711]).add_to(map_OU2)

map_OU2

Simply having a marker on its own isn't very informative. We can include pop-up text for the marker, which is displayed when you click on it:

In [None]:
# Set the marker popup text using: popup='our text here'
folium.Marker([52.025638,-0.702336], 
              popup='The East Campus of the Open University').add_to(map_OU2)

map_OU2

### Finding latitude and longitude values for locations
If you are manually building a map, as we have done above, you can find the latitude and longitude values interactively using a service such as [latlong.net](https://www.latlong.net/) or [getlatlong.net](https://getlatlong.net/). Alternatively, in Google Maps, right-click on a location and select "What's here?" to pop up a card at the bottom of the screen showing the corresponding co-ordinates.

Many electronic devices also record GPS information. If you have a GPS-enabled mobile phone, it's quite possible that any data you generate from it will be tagged with latitude and longitude information, and the metadata on photographs can include location.
 
Sometimes you will find datasets that contain latitude and longitude values as column items. In other cases, you may be presented with a geodata file, such as a _KML_ or _GeoJSON_ file, that explicitly encodes geographical features and their coordinates. You can ingest such files and use their coordinate values directly.

In other cases, you will have to _geocode_ the data from an address or other form of location, converting it into latitude and longitude values. One common approach is to use postcode data and a lookup service such as [postcodes.io](https://postcodes.io/). This site provides an API that allows you to look up UK postcode information programmatically; it also publishes an interactive map that lets you identify lat/long co-ordinates more generally (https://postcodes.io/explore). Postcode lookups typically return the centroid of the area covered by the postcode, which lets us plot a marker within several tens or hundreds of metres of the actual location of an address. We'll use a lookup table of postcode values in a later example.
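As a sketch of programmatic geocoding, the functions below call the postcodes.io single-postcode lookup endpoint (`https://api.postcodes.io/postcodes/<postcode>`) using only the Python standard library. They assume the response shape documented by postcodes.io: a JSON object whose `result` holds `latitude` and `longitude` fields. The function names are illustrative, and the parsing is kept separate from the network call so it can be checked offline.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

POSTCODES_IO = 'https://api.postcodes.io/postcodes/'

def postcode_url(postcode):
    """Build the postcodes.io lookup URL for a single postcode (spaces are URL-encoded)."""
    return POSTCODES_IO + quote(postcode.strip())

def latlon_from_response(payload):
    """Extract (latitude, longitude) from a decoded postcodes.io JSON response."""
    if payload.get('status') != 200 or payload.get('result') is None:
        raise ValueError(f"Lookup failed: {payload.get('error', 'unknown error')}")
    result = payload['result']
    return result['latitude'], result['longitude']

def lookup_postcode(postcode):
    """Look up a UK postcode; needs a network connection.

    urlopen raises urllib.error.HTTPError if the service responds with an
    error status (for example, for an unknown postcode).
    """
    with urlopen(postcode_url(postcode)) as response:
        return latlon_from_response(json.load(response))
```

With a network connection, a call such as `lookup_postcode('MK7 6AA')` should return a (latitude, longitude) pair for the postcode's centroid. For more than a handful of postcodes, the service's bulk lookup is kinder than one request per row.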

You can also use folium itself to give you the latitude and longitude values when you click on a map.

In [None]:
# You can switch on the ability to click anywhere on your map and have 
# folium display the location data, similar to the touchmap service.

folium.LatLngPopup().add_to(map_OU2)

map_OU2

## More interesting markers

We can also put circle markers and polygon markers on our maps.
We can control the radius, colour, outline colour and opacity of the markers drawn.  A polygon marker is a multi-sided shape (triangle, square, pentagon, etc.).  And we can mix these on the same map.

In [None]:
map_OU3 = folium.Map(location=[52.024696,-0.710710],
                    zoom_start=14, max_zoom=16)


# Simple markers
folium.Marker([52.024697,-0.710711]).add_to(map_OU3)
folium.Marker([52.025638,-0.702336], 
              popup='The East Campus of the Open University').add_to(map_OU3)


# Circle markers 
folium.CircleMarker(location=[52.0275,-0.7104], radius=40,
                    popup='The Cricket Pitch at the Open University', 
                    color='#3186ec', fill_color='#3186ec',
                    fill_opacity=0.6).add_to(map_OU3)

folium.Circle(location=[52.0275,-0.7104], radius=40,
                    popup='Within the Cricket Pitch at the Open University', 
                    color='red', fill_color='#3186ec',
                    fill_opacity=0.6).add_to(map_OU3)

# Polygon markers
folium.RegularPolygonMarker(location=[52.0250,-0.7060], 
                            popup='Visitor car park',
                            color='#2563e2', fill_color='#132b5e', 
                            fill_opacity=0.5,
                            number_of_sides=3, radius=20, rotation=20).add_to(map_OU3)

folium.RegularPolygonMarker(location=[52.0244,-0.7107], 
                            popup='3 Disabled parking spaces',
                            fill_color='yellow', 
                            number_of_sides=6, radius=20, rotation=10).add_to(map_OU3)

# Note the 'color' values are either hexadecimal colour values, 
# or a standard set of colour names which include: red, blue, 
# green, yellow, orange, grey, black and white.

map_OU3

Note what happens to the size of the markers if you zoom the map in and out.

In the case of `folium.CircleMarker()` [docs](https://python-visualization.github.io/folium/modules.html#folium.vector_layers.CircleMarker), circles are produced using an absolute pixel radius. As the map is zoomed in and out, the size of the blue `folium.CircleMarker` circle remains fixed while the map zoom level changes below it.

In the case of `folium.Circle()` [docs](https://python-visualization.github.io/folium/modules.html#folium.vector_layers.Circle), the circle radius is specified in metres on the ground. So the red circle, defined using `folium.Circle`, scales along with the zoom level of the map so that it always covers the same area on the ground.
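To see why the two circles behave differently, it helps to know how many metres one screen pixel represents. For Web Mercator maps with 256-pixel tiles, the ground resolution at latitude φ and zoom z is approximately 156543.03 × cos(φ) / 2^z metres per pixel, where 156543.03 is the Earth's equatorial circumference in metres divided by 256. The sketch below (function names are illustrative) estimates the on-screen radius of the 40 m `folium.Circle` at a few zoom levels; the `folium.CircleMarker` with `radius=40` stays at 40 pixels throughout.

```python
import math

EQUATORIAL_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0
TILE_SIZE_PX = 256

def metres_per_pixel(lat_deg, zoom):
    """Approximate ground distance covered by one pixel on a Web Mercator map."""
    return (EQUATORIAL_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg))
            / (TILE_SIZE_PX * 2 ** zoom))

def circle_radius_px(radius_m, lat_deg, zoom):
    """On-screen radius, in pixels, of a circle with a fixed ground radius in metres."""
    return radius_m / metres_per_pixel(lat_deg, zoom)

# The 40 m folium.Circle on the OU cricket pitch
for z in (14, 15, 16):
    print(f'zoom {z}: {circle_radius_px(40, 52.0275, z):.1f} px')
```

Each extra zoom level halves the metres per pixel, so the `folium.Circle` doubles in on-screen size while the `folium.CircleMarker` does not change.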

# Using proportional sized markers
Simple markers are fixed in size, but the other markers allow some variation in size. 

The _circle_ and _polygon_ markers have a `radius` parameter. This allows the size of the marker symbol to be set proportional to some quantity - the maps are then known as proportional symbol maps. A simple use for this might be to show the relative size of the car parks on the OU campus.

If we use `folium.Circle` (rather than `folium.CircleMarker`), the proportional marker covers a fixed area 'on the ground', so it stays in proportion to the underlying map as we zoom in and out.

Here's how to do this using a DataFrame holding the latitude and longitude values, and the number of parking spaces.

First, let's create the dataframe:

In [None]:
carparking_df = pd.DataFrame({'latitude': [52.026, 52.0253, 52.0266, 52.0242, 52.0233],
                              'longitude': [-0.7065,-0.7052, -0.7112, -0.7132, -0.7107],
                              'absolutesize': [100, 90, 10, 50, 40]})
carparking_df

We can now create a simple function to add a marker to a map.

Note that we can also define additional parameters that can be passed explicitly into the function from the dataframe `.apply()` method.

In [None]:
def add_marker(row, fmap):
    """Add a marker to a specific map."""
    
    lat = row['latitude']
    lon = row['longitude']
    numspaces = row['absolutesize']
    
    folium.Circle(location=[lat, lon], radius=numspaces,
                  popup=f'Car park with {numspaces} spaces.', 
                  color='orange', fill_color='yellow',
                  fill_opacity=0.8).add_to(fmap)

If we create a base map, we can then call the `add_marker()` function for each row in the carparks dataframe using the `.apply()` method. Observe how the declaration of an additional parameter not contained in the row allows us to pass the map object into the `add_marker()` function.
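The same pattern works for any extra argument, not just maps: `DataFrame.apply()` passes any keyword arguments it does not itself use on to the function it calls. In the sketch below, a plain Python list stands in for the map object so that the example runs without folium; `collect_label` is a hypothetical helper.

```python
import pandas as pd

carparking_df = pd.DataFrame({'latitude': [52.026, 52.0253, 52.0266, 52.0242, 52.0233],
                              'longitude': [-0.7065, -0.7052, -0.7112, -0.7132, -0.7107],
                              'absolutesize': [100, 90, 10, 50, 40]})

def collect_label(row, target):
    """Append a description of one car park to the 'target' container."""
    target.append(f"{int(row['absolutesize'])} spaces at "
                  f"({row['latitude']}, {row['longitude']})")

labels = []  # stand-in for the folium map object
# axis=1 passes each row to the function; target=labels is forwarded as a keyword argument
carparking_df.apply(collect_label, target=labels, axis=1)
print(labels)
```

Because the list is modified in place, it collects one entry per row, just as the folium map collects one marker per row.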

In [None]:
# Create our base map, centred on Walton Hall,
OU_carparks = folium.Map(location=[52.024696,-0.710710],
                         zoom_start=16, max_zoom=18)
# with a simple marker to say where it is.
folium.Marker([52.024696,-0.710710], 
              popup='The Open University, Walton Hall').add_to(OU_carparks)

# Handle car parks...
# We want to apply a function to each row of the dataframe so use the axis=1 setting 
carparking_df.apply(add_marker, fmap=OU_carparks, axis=1)

OU_carparks

Note that what's missing in the folium mapping is a legend indicating what the marker sizes represent. We'd really like to add some text saying 'The largest car park has 100 spaces, the smallest 10.' or some other description. (If you have a recipe to do this, please share it on the module forum or in OpenStudio. An approach described [here](https://gitmemory.com/issue/python-visualization/folium/1202/527971941) shows how to add a title to a map, sort of!)

### Note
In this example the absolute sizes of the car parks fit our chosen map size. However, imagine the OU also had a mega car park with 1000 spaces: using the absolute size for the marker radius would result in a circle marker the size of the map!

In this situation you would need to scale the circle sizes to something more manageable, or use a binning approach to choose circle radius (say four bins with the threshold values [50,200,500]).
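As a minimal pandas sketch of both options: for scaling, one common cartographic choice (an addition here, not what the car park recipe above does) is to make the circle's *area*, rather than its radius, proportional to the quantity, which means scaling the radius by the square root. For binning, `pd.cut()` assigns each car park to one of four bins using the thresholds above. The 1000-space car park, `max_radius` and the radius given to each bin are all hypothetical.

```python
import numpy as np
import pandas as pd

# Car park sizes, including a hypothetical 1000-space 'mega' car park
spaces = pd.Series([100, 90, 10, 50, 40, 1000], name='absolutesize')

# Option 1: scale so that the largest car park gets max_radius metres;
# the square root makes circle *area* proportional to the number of spaces
max_radius = 100
scaled_radius = max_radius * np.sqrt(spaces / spaces.max())

# Option 2: four size bands with thresholds [50, 200, 500];
# right=False puts a value equal to a threshold into the upper band
bins = [0, 50, 200, 500, np.inf]
binned_radius = pd.cut(spaces, bins=bins, labels=[10, 20, 40, 80], right=False).astype(int)

print(pd.DataFrame({'spaces': spaces,
                    'scaled': scaled_radius.round(1),
                    'binned': binned_radius}))
```

Either new column could then be used as the `radius` value in a marker function such as `add_marker()`.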

If you're interested in how to do these (in a handy, reusable, data preparation way) I've included short examples of scaling and binning after the _What's next?_ section below. (It's always good to collect data recipes for later use.)

## Putting lines on maps

Lines, or more usually sequences of connected lines, are sometimes useful on a map.  For example, tracing a route from location to location or connecting related points can be shown with lines. folium simply takes a list of coordinates and plots a straight line between them. Pop-up text is also permitted on folium lines.

However, there are limitations to using straight lines on maps - they can be misleading.

As a 2-dimensional (flat) map represents a projection from the 3-dimensional surface of the Earth, there will be some distortion depending on which projection is chosen. For example, two lines of the same visible length may represent different distances depending on whether they are near the poles or the equator, or whether they run parallel to lines of latitude or of longitude. Also, as the distances covered increase, the distortion due to the curve of the Earth may become more noticeable, resulting in straight lines that clearly don't follow the great-circle route. (For example, airline routes follow great circles, so on most 2D projections they don't look like the most direct route: the shortest distance between two points on the Earth's surface is a curve.) We won't be looking at how to adapt our maps for these distortions - but don't forget these limitations if you explore GIS systems further.
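A quick way to see the distortion is to compare great-circle distances. The sketch below uses the haversine formula, which treats the Earth as a sphere of mean radius 6371 km (an approximation), to measure one degree of longitude at the equator and at the latitude of Milton Keynes. The same one-degree step on a latitude/longitude grid covers very different distances on the ground.

```python
import math

MEAN_EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * MEAN_EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude, at the equator and near Milton Keynes
print(f'{haversine_km(0, 0, 0, 1):.1f} km at the equator')
print(f'{haversine_km(52, 0, 52, 1):.1f} km at 52 degrees north')
```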

In [None]:
# Our base map.
map_OU4 = folium.Map(location=[52.024696,-0.710710],
                    zoom_start=14, max_zoom=16)
# Some simple markers.
folium.Marker([52.024696,-0.710710],
              popup='Walton Hall, The Open University').add_to(map_OU4)
folium.Marker([52.025638,-0.702336],
              popup='The East Campus of the Open University').add_to(map_OU4)

# Now plot a connected series of lines, showing the walking route between the above locations
# that follows footpaths and underpasses.

# The locations parameter for PolyLine() is a list of location points.

folium.PolyLine(locations = [[52.025638,-0.702336], [52.0249,-0.7020], [52.0246,-0.7035],
                          [52.0270,-0.7056], [52.0265,-0.7072], [52.0243,-0.7062],
                          [52.0237,-0.7101 ], [52.0245,-0.7107], [52.024696,-0.710710]], 
                        color='red', weight=10, opacity=0.5,
                        popup='safe route').add_to(map_OU4)

map_OU4

### If we can draw lines, we can draw boundaries ...
#### ... and if we have boundaries we can show areas

A boundary around an area is simply a sequence of lines (or curves) that form a closed polygon (multi-sided space).  In our line example above, adding the first point to the end of the list of locations will 'close' the shape.  

Boundaries are useful as they can show the extent of an area, such as a postcode region, or an electoral or census boundary, or school catchment area, for example - some area of interest.  We'll see, in the next example, how to use GeoJSON files (specialist JSON formats) of boundaries to display them on a folium map.
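Closing a shape is just a matter of repeating the first point at the end of the list. The sketch below does that for the walking-route points, then wraps the result as a minimal GeoJSON `Polygon` feature. Watch out for one trap when moving between the two: folium takes locations as `[latitude, longitude]`, but the GeoJSON specification (RFC 7946) orders coordinates as `[longitude, latitude]`. The function names and the feature's `Name` property are illustrative.

```python
import json

# Points from the walking-route example, as [latitude, longitude] pairs (folium order)
route = [[52.025638, -0.702336], [52.0249, -0.7020], [52.0246, -0.7035],
         [52.0270, -0.7056], [52.0265, -0.7072], [52.0243, -0.7062],
         [52.0237, -0.7101], [52.0245, -0.7107], [52.024696, -0.710710]]

def close_ring(points):
    """Return a copy of points with the first point repeated at the end, if needed."""
    points = [list(p) for p in points]
    if points and points[0] != points[-1]:
        points.append(points[0])
    return points

def to_geojson_polygon(latlon_points, name):
    """Build a GeoJSON Feature for a closed ring, swapping to [longitude, latitude] order."""
    ring = [[lon, lat] for lat, lon in close_ring(latlon_points)]
    return {'type': 'Feature',
            'properties': {'Name': name},
            'geometry': {'type': 'Polygon', 'coordinates': [ring]}}

area = close_ring(route)
feature = to_geojson_polygon(route, 'Walking route area')
print(json.dumps(feature)[:120])
```

`folium.Polygon(locations=area, fill=True)` would draw the closed shape on a map. Leaflet closes polygons automatically, so the explicit repeat matters mainly for GeoJSON, where the first and last positions of a ring must be the same.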

Once we have an area we can relate statistical and other data to that area. Then we can label or shade the areas to summarise data across the map. A range of UK statistical, political and other boundaries are available in geoJSON,  shapefile (`.shp`) and KML formats from sources such as [data.gov.uk](https://data.gov.uk/data/search?theme-primary=Mapping&res_format=GeoJSON), [Edina](http://census.edina.ac.uk/bds.html) and the [ONS](http://geoportal.statistics.gov.uk/).

Maps that use bounded areas on which are shown statistical and other data (either with areas and markers, or shading of areas) are known as *choropleth maps*.

In [None]:
# We start with our standard map.
map_MKPostcode = folium.Map(location=[52.024696,-0.710710], zoom_start=10, max_zoom=16)

# Then overlay a GeoJSON formatted file using folium.Choropleth().
# In this case we've used the geo_data parameter to say where to find the GeoJSON file
# and used the default settings for the other parameters (we'll explore these in a later example.)

# This boundary map shows the postcode district boundaries for Milton Keynes
# (the MKxx bit of the Milton Keynes postcodes).

folium.Choropleth(geo_data='data/mk.geoJSON.txt').add_to(map_MKPostcode)

map_MKPostcode

#  A short worked example

The remainder of this Notebook works through a short example using postcode mapping information, using both marker-based information and then reworking this as a choropleth map.  It starts with data acquisition and preparation and finishes with appropriate mapped output. 

We can think of it as working through an exploration of the following scenario (although we don't offer any 'findings' based on the maps produced).

### The scenario and data capture 
The government compiles data on driving licences which is published under the OGL (Open Government Licence) at https://data.gov.uk/dataset/driving-licence-data.  These are snapshot data files, created at specific points in time, rather than representing real-time data.   The March 2015 spreadsheet bundle includes the DRL0102 dataset containing the number of Provisional and Full driving licences by postcode district.		

I wondered if there was a significant difference in the balance of Full and Provisional licences in the areas in and around Milton Keynes (specifically the MKxx postcode districts). And, since I'm looking at folium capabilities, I thought I'd see if I could map these. _The DRL0132 spreadsheet contains the number of 'Penalty Points' by postcode district, which might make an interesting second study._

There is a copy of the March 2015 driving licence data in the `data` folder for this Notebook, `Driving-licence-data-Mar15.xls`.  To avoid working with the full spreadsheet, and all six data sheets it contains, I've copied the _MKxx_ postcode district data to a new spreadsheet file `MKxx DRL0102 Driving-licence-data-Mar15.xls`; I've left the header information in the new file.

You can examine both these files in a spreadsheet tool, or via OpenRefine.

In [None]:
!ls data/

### The data preparation
A quick look at the _MKxx_ data will tell you that the spreadsheet data has 27 metadata header lines before we reach the column headers for the data we are interested in. Row 28 has the data header line, and rows 29-54 have the data rows.  This should give us sufficient information to import the spreadsheet data into a DataFrame, and then we can take a quick look at it.
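Row arithmetic like this is easy to get wrong, so here is the same logic applied to a tiny in-memory CSV. Its content is hypothetical, so it runs without the spreadsheet: three metadata lines, a header line, then data. `skiprows` skips the lines before the header so that the header becomes the column names, and `nrows` (also accepted by `pd.read_excel()`) limits how many data rows are read. This is an alternative to slicing the DataFrame afterwards.

```python
import io
import pandas as pd

# Hypothetical file: 3 metadata lines, header on line 4, data on lines 5-7, then a footnote
raw = io.StringIO(
    "Driving licence data\n"
    "Source: example\n"
    "Notes: hypothetical\n"
    "District,Full - Total,Provisional - Total\n"
    "MK01,100,20\n"
    "MK02,200,30\n"
    "MK03,300,40\n"
    "Footnote line\n"
)

# Header is on line 4, so skip the 3 lines before it; data runs from line 5 to line 7
df = pd.read_csv(raw, skiprows=3, nrows=7 - 4)
print(df)
```

For the spreadsheet, the equivalent call would pass `skiprows=27, nrows=54 - 28` to `pd.read_excel()`.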

In [None]:
licences_df = pd.read_excel('data/MKxx DRL0102 Driving-licence-data-Mar15.xls',
                            skiprows=27)[:(54-28)]

licences_df.head(5)

A quick check with the original file shows we've got all the rows and columns.
By inspection, the data looks sensible: no odd values or strings in the location cells, sensible column names and the districts are as we expected.

Since I'm interested in the relative number of full to provisional licences I'll add the percentage of full licences for each district.

In [None]:
# For each row calculate the full total as a percentage 
# of the overall total and add this as a new column:
licences_df['fullpercentage'] = (licences_df['Full - Total']/ 
                                (licences_df['Provisional - Total']+
                                         licences_df['Full - Total'])) * 100
licences_df.head(5)

This dataset uses postcode district for location information, but folium uses latitude and longitude. That means we need to find a usable conversion from postcode district to latitude and longitude. This is always going to be an approximate conversion, as a postcode district is an 'area' of irregular shape and size, whereas latitude and longitude describe a 'point'.

A quick hunt via a search engine identified the _freemaptools.com_ site which has a CSV file containing postcode districts (aka 'outcodes') and latitude and longitude values. I've copied this file, `postcode-outcodes src freemaptools 18-10-15.csv`, to the `data/` folder. 

A quick look with OpenRefine shows the first line in the file is the header line.  So, let's ingest the file and take a look.

In [None]:
outcodes_df = pd.read_csv('data/postcode-outcodes src freemaptools 18-10-15.csv')
outcodes_df.head(10)

Since I'm only interested in the MK districts, I went back to the spreadsheet at this point and simply removed all the non _MKxx_ district rows, and saved the result in a separate file (`data/postcode-outcodes MKxx src freemaptools 18-10-15.csv`). I could have done this programmatically - and would have done if this was going to be a regular task - but for a one-off exercise the fastest tool was the interactive spreadsheet.  So pragmatism won out!

In [None]:
mk_outcodes_df = pd.read_csv('data/postcode-outcodes MKxx src freemaptools 18-10-15.csv')
mk_outcodes_df.head(5)

The values in the latitude and longitude cells look sensible, but the outcodes are in a different format (or at least some of them are): MK1 should be MK01 to match the driving licence data.
For such a small batch of data, and the one-off nature of the exercise, I could do this by hand.
But instead let's use regular expression pattern matching to correct this.

In [None]:
# The following looks for the pattern: MK followed by one digit and the end of the string.
# It then does an inplace replacement putting MK0 followed by the matched digit. 

mk_outcodes_df.replace(r'(MK)(\d)$', r'MK0\2',
                       regex=True, inplace=True)
mk_outcodes_df.head(5)
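It's worth checking a pattern like this against the awkward cases before trusting it on a whole dataset. The sketch below uses plain `re`, with hypothetical helper names, and patterns anchored at both ends. It pads single-digit districts, undoes the padding, and confirms that two-digit districts such as MK10 are left alone.

```python
import re

def pad_outcode(code):
    """MK1 -> MK01; codes that already have two digits are unchanged."""
    return re.sub(r'^(MK)(\d)$', r'MK0\2', code)

def unpad_outcode(code):
    """MK01 -> MK1; MK10 and other two-digit codes are unchanged."""
    return re.sub(r'^MK0(\d)$', r'MK\1', code)

for code in ['MK1', 'MK9', 'MK10', 'MK46']:
    print(code, '->', pad_outcode(code), '->', unpad_outcode(pad_outcode(code)))
```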

We can now form an equality inner join of the two datasets, using the `District` and `outcode` columns
as the join keys.

In [None]:
licence_and_location_df = pd.merge(licences_df, mk_outcodes_df, 
                                   left_on=['District'], right_on=['outcode'] )
licence_and_location_df.head(5)
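An inner join silently drops any row whose key has no match on the other side, so a mismatch in formats (such as MK1 versus MK01) can make districts quietly disappear. A minimal check, shown here on small hypothetical tables, is to do an outer join with `indicator=True` and look for rows that are not marked `both`:

```python
import pandas as pd

# Hypothetical miniature versions of the two tables, with one unpadded outcode
licences = pd.DataFrame({'District': ['MK01', 'MK02', 'MK10']})
outcodes = pd.DataFrame({'outcode': ['MK1', 'MK02', 'MK10']})

check = pd.merge(licences, outcodes, left_on='District', right_on='outcode',
                 how='outer', indicator=True)

# '_merge' is 'left_only' or 'right_only' for keys that failed to match
unmatched = check[check['_merge'] != 'both']
print(unmatched[['District', 'outcode', '_merge']])
```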

## Now let's map this data
We want a zoomable map, centred on the MK region, with a marker for each postcode District location.

That sounds like a variation of the car park pop-up recipe I used earlier. I'll use that as a template by cutting and pasting it into the next cell, and editing it for this purpose (the following is the end result of that work process).

In [None]:
# Create our base map, centred on Milton Keynes:
licenceMap = folium.Map(location=[52.0218,-0.7599], zoom_start=10, max_zoom=15)

def add_license_marker(row, fmap):
    """Add a marker to a specific map."""
    
    lat = row['latitude']
    lon = row['longitude']
    district = row['District']
    
    totalProv = row['Provisional - Total']
    totalFull = row['Full - Total']
    fullPercentage = round(row['fullpercentage'],1)
 
    # Generate a caption using HTML, embedded within an iframe inside the popup
    html="""
        <h1> District: {}</h1>
        <ul><li>Full#: {}</li>
        <li>Provisional#: {}</li> 
        <li>Full%: {}%</li></ul> 
    """.format(district, totalFull, totalProv, fullPercentage)
    iframe = folium.IFrame(html=html, width=400, height=200)
    popup = folium.Popup(iframe, max_width=800)
    
    folium.Marker(location = [lat, lon], popup=popup).add_to(fmap)

# We want to apply a function to each row of the dataframe so use the axis=1 setting 
licence_and_location_df.apply(add_license_marker, fmap=licenceMap, axis=1)

licenceMap

Actually this looks wholly unsatisfactory - we've taken area data about a postcode district and mapped it to point markers, but we've no sense of the areas to which the markers relate. We also have to click on each marker to see the associated values - this doesn't give any sense of the variations between each area.

This looks like a good case in which to use choropleth mapping: we can show the boundary for each postcode area, then shade each postcode area based on the full licence percentage values.

### What do we need for a choropleth map?
We need a boundary file for the postcode districts - which is handy, as the `mk.geoJSON` file is just that.

We need the data used to shade our map - well, we have the DataFrame `licence_and_location_df` which has the percentage of full licences in each postcode district. So, it looks like we're ready to go.

A choropleth layer element is defined using the `folium.Choropleth()` function, and this layer is then added to the map in much the same way as we add a marker to a map (using the element's `.add_to()` method).

In the following example, the `geo_data` is the same file we used earlier, `mk.geoJSON`, but now we need some additional parameters:

- `data` is the DataFrame containing the data items to be mapped;
- `columns` names the two DataFrame columns to use: the first (here, `outcode`) is the key used to match each row to an area, and the second (here, `fullpercentage`) is the value used to colour that area;
- `key_on` tells the choropleth mapping function which property in the GeoJSON file's data structure should be joined with the `outcode` column (open the GeoJSON file in OpenRefine, or another tool capable of reading JSON, to identify the property name to use);
- and finally we have the fill colour palette, fill colour opacity and legend for the rendering.

The colour scheme can be one of the following *ColorBrewer* `colormap` scales: `BuGn`, `BuPu`, `GnBu`, `OrRd`, `PuBu`, `PuBuGn`, `PuRd`, `RdPu`, `YlGn`, `YlGnBu`, `YlOrBr`, `YlOrRd`, `RdYlBu`.

In [None]:
# Create our base map, centred on Milton Keynes.
LicenceMap1 = folium.Map(location=[52.1,-0.7599], zoom_start=10, max_zoom=15)

choropleth = folium.Choropleth(geo_data='data/mk.geoJSON.txt',
                               data=licence_and_location_df, 
                               columns=['outcode', 'fullpercentage'], 
                               key_on='feature.properties.Name',
                               fill_color='YlGnBu', fill_opacity=0.4, line_opacity=0.9,
                               legend_name='Full licence percentages' )

choropleth.add_to(LicenceMap1)

LicenceMap1

With a low value for opacity you get the underlying map details and colour showing through, which makes the relationship between the map colouring and the legend colours hard to gauge, particularly with some colour schemes (*that is, color map as assigned to `fill_color`*).

By increasing the opacity, more solid colours are possible - however, you do lose access to the detail underlying the colouring if it is too opaque. Changing the colormap used for the coloured areas (via the `fill_color`) may also make the map easier - or harder - to read as a choropleth map.

In [None]:
LicenceMap2 = folium.Map(location=[52.1,-0.7599], zoom_start=10, max_zoom=15)

choropleth = folium.Choropleth(geo_data='data/mk.geoJSON.txt',
                               data=licence_and_location_df, 
                               columns=['outcode', 'fullpercentage'], 
                               key_on='feature.properties.Name',
                               fill_color='BuGn', fill_opacity=1, line_opacity=0.9,
                               legend_name='Full licence percentages' )

choropleth.add_to(LicenceMap2)

LicenceMap2

If you want to retain the transparency level, but can't find a colour scheme that works effectively against the colouring of the map background (which is to say, the tileset used to render the map), you might want to use a different tileset.

This can be done by passing an appropriate value for the `tiles` argument to `folium.Map()`.

In [None]:
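#The following tilesets were available without requiring an API key when this Notebook was written;
#the list depends on your folium version (the Mapbox tilesets may no longer be included, and
#the Stamen tilesets are now hosted by Stadia Maps, which may require an API key)
#OpenStreetMap - the default
#Mapbox Bright or Mapbox Control Room (limited levels of zoom for free tiles), 
#Stamen (Terrain, Toner, and Watercolor)
#CartoDB (positron and dark_matter)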

LicenceMap3 = folium.Map(location=[52.1,-0.7599], zoom_start=10, max_zoom=15, tiles="Stamen Toner")

choropleth = folium.Choropleth(geo_data='data/mk.geoJSON.txt',
                               data=licence_and_location_df, 
                               columns=['outcode', 'fullpercentage'], 
                               key_on='feature.properties.Name',
                               fill_color='BuGn', fill_opacity=0.3, line_opacity=0.9,
                               legend_name='Full licence percentages' )

choropleth.add_to(LicenceMap3)

LicenceMap3

Using this map style in particular highlights an odd, uncoloured section in the middle of the map! Something went wrong here.

It turns out that I should have taken a closer look at the `mk.geoJSON` file - it uses the single-digit versions (MK1, MK2) of the MK01, MK02, etc. district names. So when `folium.Choropleth()` matched the `outcode` values against the `properties.Name` values, it found no match for the single-digit districts (MK01 is not MK1), and so had no `fullpercentage` value with which to colour those map areas.

In [None]:
import json
with open('data/mk.geoJSON.txt') as data_file:    
    jj = json.load(data_file)

areas= [f['properties']['Name'] for f in jj['features']]

print('GeoJSON keys: {} \nData keys: {}'.format(','.join(areas),
                                                 ','.join(licence_and_location_df['District'].unique()) ) )
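Comparing two long comma-separated lists by eye is error-prone. As a minimal sketch, a set difference reports exactly which keys fail to match on each side of the join. The helper name `unmatched_keys` and the miniature GeoJSON below are hypothetical, made up to reproduce the MK1/MK01 mismatch; only the `properties.Name` layout comes from the notebook's file.

```python
def unmatched_keys(geojson, data_keys, name_property="Name"):
    """Compare choropleth join keys on both sides of the join.

    Returns (keys in the data with no GeoJSON feature,
             GeoJSON feature names with no data row), both sorted.
    """
    feature_names = {f["properties"][name_property] for f in geojson["features"]}
    keys = set(data_keys)
    return sorted(keys - feature_names), sorted(feature_names - keys)


# Hypothetical miniature GeoJSON reproducing the MK1 / MK01 naming mismatch
sample_geojson = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"Name": "MK1"}, "geometry": None},
    {"type": "Feature", "properties": {"Name": "MK10"}, "geometry": None},
]}

no_feature, no_data = unmatched_keys(sample_geojson, ["MK01", "MK10"])
print("In the data but not the GeoJSON:", no_feature)  # ['MK01']
print("In the GeoJSON but not the data:", no_data)     # ['MK1']
```

Against the notebook's own data, the equivalent call would be something like `unmatched_keys(jj, licence_and_location_df['outcode'].unique())`.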

We have some more data preparation to do!

We need `outcode` values that match the district names used in the GeoJSON file. We also don't need the postcode-to-location dataset, since the GeoJSON file supplies the boundaries we display. So the licence information alone is sufficient for the choropleth map.

So, reusing the recipe from above and adjusting it for the new data file and preparation requirements:

In [None]:
# Reload the driving licence data for MK
licences2_df = pd.read_excel('data/MKxx DRL0102 Driving-licence-data-Mar15.xls',
                             skiprows=27)[:(54-28)]

# For each row calculate the full total as a percentage of the overall total 
#    and add this as a new column:
licences2_df['fullpercentage'] = (licences2_df['Full - Total']/ 
                                 (licences2_df['Provisional - Total']+
                                           licences2_df['Full - Total'])) * 100
licences2_df.head(5)

We can add the adjusted District values as an outcode value for each row:

In [None]:
# Strip the leading zero from single-digit districts (e.g. MK01 -> MK1)
licences2_df['outcode'] = licences2_df['District'].replace(r"(MK)(0)(\d)$", r'MK\3',
                                               regex=True, inplace=False)

licences2_df.head(5)
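To see what the regular expression does, here is a sketch of the same substitution using Python's `re` module on individual strings. The pattern captures `MK` as group 1, the zero as group 2 and a single final digit as group 3; the replacement `MK\3` keeps groups 1 and 3 and drops the zero. A code such as MK10 contains no `MK0`, so it is left alone. The helper name `normalise_district` is hypothetical, used only for illustration.

```python
import re

# The same pattern as used with pandas .replace() above
LEADING_ZERO = re.compile(r"(MK)(0)(\d)$")


def normalise_district(district):
    """Rewrite MK01..MK09 as MK1..MK9; leave other district codes unchanged."""
    # Group 3 is the single trailing digit, so "MK\3" drops the zero (group 2)
    return LEADING_ZERO.sub(r"MK\3", district)


for code in ["MK01", "MK09", "MK10", "MK17"]:
    print(code, "->", normalise_district(code))
# MK01 -> MK1
# MK09 -> MK9
# MK10 -> MK10
# MK17 -> MK17
```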

Now create an updated map using the newly formatted outcode:

In [None]:
# Create our base map, centred on Milton Keynes.
LicenceMap4 = folium.Map(location = [52.1,-0.7599],
                    zoom_start=10, max_zoom=15, tiles="Stamen Toner")

folium.Choropleth(geo_data='data/mk.geoJSON.txt',
                       data=licences2_df, 
                       columns=['outcode', 'fullpercentage'], 
                       key_on='feature.properties.Name',
                       fill_color='BuGn', fill_opacity=0.5, line_opacity=0.9,
                       legend_name='Full licence percentages' ).add_to(LicenceMap4)

LicenceMap4

One other thing to note is that the legend shows the data packed into *six* separate bins, with boundaries chosen by default. We can override this default by defining up to __five__ bins of our own, listing up to *six* threshold values that mark the points where the shading colour changes.

The maximum value of the specified range must be equal to or greater than the maximum value in the dataset to be plotted, although the minimum value may be set to any lesser value.
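Because a badly chosen threshold list fails silently on the map, it can help to check it before plotting. The following is a minimal sketch of such a check, written in plain Python: the function `check_threshold_scale` and the sample percentages are hypothetical, and the limit of six thresholds is taken from the text above rather than queried from `folium`.

```python
import math


def check_threshold_scale(thresholds, values, max_thresholds=6):
    """Sanity-check a custom list of choropleth thresholds against the data.

    Returns the number of colour bins the thresholds define.
    """
    if len(thresholds) < 2:
        raise ValueError("need at least two thresholds to define a bin")
    if len(thresholds) > max_thresholds:
        raise ValueError(f"at most {max_thresholds} thresholds are allowed")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Ignore missing values (None or NaN) when finding the data maximum
    present = [v for v in values if v is not None and not math.isnan(v)]
    if present and max(present) > thresholds[-1]:
        raise ValueError(f"largest threshold {thresholds[-1]} is below "
                         f"the data maximum {max(present)}")
    return len(thresholds) - 1


# Six thresholds define five bins
print(check_threshold_scale([-50, -10, 70, 80, 90, 100], [72.5, 88.0, float("nan")]))  # 5
```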

It is also worth noting that in version 0.5.0 of `folium` (as installed for the 2018J presentation of TM351), if we do use a custom colour scale, any areas with missing data values appear to be coloured using the lowest colour value. SO BE CAREFUL!

In [None]:
# Create our base map, centred on Milton Keynes.
LicenceMap5 = folium.Map(location = [52.1,-0.7599], zoom_start=10, max_zoom=15)

# This is the same map as plotted above, but with a custom 5-bin colour scale
# (six threshold values) and a Red-Yellow-Blue colour scheme.

choropleth = folium.Choropleth(geo_data='data/mk.geoJSON.txt',
                               data=licence_and_location_df, 
                               columns=['outcode', 'fullpercentage'], 
                               key_on='feature.properties.Name',
                               fill_color='RdYlBu', fill_opacity=0.4, line_opacity=0.9,
                               legend_name='Full licence percentages', 
                               #The maximum value must be set at least equal to the maximum data value
                               threshold_scale=[-50, -10, 70, 80, 90, 100])

choropleth.add_to(LicenceMap5)

LicenceMap5

In the above map, the red coloured items, which the legend suggests fall inside the -50 to -10 range, are the original missing values.

Setting bounds inappropriately can also lead to maps that don't show the information that you need.

## What next?

This Notebook contains a basic introduction to mapping using folium - a Python library that allows you to create interactive maps using the Leaflet JavaScript library and map tiles served from OpenStreetMap. If you have a pressing need to create maps as part of your data investigations *right now*, it provides you with a starting point (but don't get too distracted!)

If you are working through this Notebook as part of an inline exercise, return to the module materials now. If you are working through this set of Notebooks as a whole, move on to Notebook `05.3 Getting started with matplotlib`.

# Optional material

### From KML to GeoJSON
If you browse a repository of KML files, you'll find that they can be very large. One reason the http://census.edina.ac.uk/bds.html site is useful is that it lets you narrow the data down to the area you want to map without having to do so manually.

So my approach to generating the MK postcode boundaries file was to start at the above site, choosing the _England_, _Postal_, _2012_ boundary data and using the _List Area_ feature to choose just the Milton Keynes data.  When I'd picked the data I wanted, I used the _format_ tab to select KML format for the output file.  Once I'd downloaded the output file I needed to convert the KML to GeoJSON.

The interactive service http://ogre.adc4gis.com/ does this conversion; its output is displayed in your browser, but you can save it to a file. A quick check with OpenRefine (opening the file as a JSON format file) showed that it had parsed correctly, and allowed me to look up the property name used for the Districts - in this case `properties.Name`.

All this is very manual and rather clunky; but for this investigation/example it worked, and didn't take me more than 30 minutes of web searching to find the repository and the ogre service, with a couple of false starts.  Next time it'll take less than 5 minutes if the repository contains the boundary data I need.
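If you would rather script the conversion, the following is a minimal sketch using only Python's standard library, not the method used to build `mk.geoJSON.txt`. It assumes the KML uses the KML 2.2 namespace, that each district is a `Placemark` with a `name` and one or more `Polygon` elements, and it keeps only outer boundaries (holes, altitudes, points and lines are dropped). The function names and the inline sample are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the file uses the KML 2.2 namespace; older files may use another one
KML = "{http://www.opengis.net/kml/2.2}"


def parse_coordinates(text):
    """Turn a KML 'lon,lat[,alt] lon,lat[,alt] ...' string into [[lon, lat], ...]."""
    points = []
    for triple in text.split():
        lon, lat = triple.split(",")[:2]  # drop any altitude value
        points.append([float(lon), float(lat)])
    return points


def kml_to_geojson(kml_source):
    """Convert KML Placemarks with Polygon outer boundaries into a GeoJSON dict.

    Inner boundaries (holes), points and lines are ignored in this sketch.
    """
    root = ET.fromstring(kml_source)
    features = []
    for placemark in root.iter(KML + "Placemark"):
        name = placemark.findtext(KML + "name", default="")
        polygons = []
        for polygon in placemark.iter(KML + "Polygon"):
            coords = polygon.findtext(
                f"{KML}outerBoundaryIs/{KML}LinearRing/{KML}coordinates")
            if coords:
                # A GeoJSON polygon is a list of rings; we keep only the outer ring
                polygons.append([parse_coordinates(coords)])
        if not polygons:
            continue
        if len(polygons) == 1:
            geometry = {"type": "Polygon", "coordinates": polygons[0]}
        else:
            geometry = {"type": "MultiPolygon", "coordinates": polygons}
        features.append({"type": "Feature",
                         "properties": {"Name": name},
                         "geometry": geometry})
    return {"type": "FeatureCollection", "features": features}


# A tiny inline KML document standing in for a real boundary file
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>MK1</name><Polygon><outerBoundaryIs><LinearRing>
<coordinates>-0.76,52.04,0 -0.75,52.04,0 -0.75,52.05,0 -0.76,52.04,0</coordinates>
</LinearRing></outerBoundaryIs></Polygon></Placemark>
</Document></kml>"""

geojson = kml_to_geojson(SAMPLE_KML)
print(json.dumps(geojson["features"][0]["properties"]))  # {"Name": "MK1"}
```

To convert a downloaded file you could read it as bytes, for example `kml_to_geojson(open('mk_postcodes.kml', 'rb').read())` (a hypothetical file name), and then write the result out with `json.dump()`.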

If you're familiar with other repositories, or with converting boundary data (either manually, with services or with Python), why not share locations and recipes on the module forum or in OpenStudio for others to see.

### Using proportional markers: takes 2 and 3

Earlier we used proportional markers based on the absolute count of car park spaces.  We also suggested that in most cases you would either need to scale, or bin, data to create markers, especially when absolute values might overwhelm the map.

This is not particularly a mapping technique, but is a good example of two short data-preparation recipes and how mapping can then build on top of properly shaped datasets.

The following example rebuilds the DataFrame used earlier, but adjusts the car park sizes to include one very large car park.

In [None]:
carparking2_df = pd.DataFrame({'latitude': [52.026, 52.0253, 52.0266, 52.0242, 52.0233],
                               'longitude': [-0.7065,-0.7052, -0.7112, -0.7132, -0.7107],
                               'absolutesize': [1000, 255, 158, 50, 40]})

carparking2_df

Next we can see what a mess this makes of our map if we try to use absolute values as we did earlier.

In [None]:
# Create our base map, centred on Walton Hall.
OU_carparks2 = folium.Map(location=[52.024696,-0.710710], zoom_start=1, max_zoom=16)

# Add a simple marker to say where we are
folium.Marker([52.024696,-0.710710], popup='The Open University, Walton Hall').add_to(OU_carparks2)

#Create a simple function to add the marker
def add_proportional_circle_marker(row, fmap):
    """Add a proportional circle marker to a folium map."""
    
    lat = row['latitude']
    lon = row['longitude']
    numspaces = row['absolutesize']
    
    # CircleMarker radius is in pixels; here we use the absolute value directly
    folium.CircleMarker(location=[lat, lon], radius=numspaces,
                        popup=f'Car park with {numspaces} spaces.', 
                        color='#3186ec', fill_color='#3186ec',
                        fill_opacity=0.8).add_to(fmap)
    
carparking2_df.apply(add_proportional_circle_marker, fmap=OU_carparks2, axis=1)

OU_carparks2

#### Reshaping the data set with new columns
To our `carparking2_df` DataFrame we can add a column of scaled proportional data values, and then one of bin values.

The following recipe scales the absolute sizes into a new range, which I've chosen to have a maximum value of 100 and a minimum value of 10 (you could adjust these by replacing the values in the code below).

The code doesn't handle the edge case where every absolute size is the same: the actual range is then zero, so the scaling calculation divides by zero and produces NaN values.
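One way to guard against that edge case is sketched below in plain Python. It applies the same linear formula as the recipe, `scaled = min_scaled + (x - min_x) * (max_scaled - min_scaled) / (max_x - min_x)`, but checks for a zero range first. The function name `min_max_scale` is hypothetical, and mapping every value to the lower bound when all values are equal is simply one reasonable design choice.

```python
def min_max_scale(values, new_min=10, new_max=100):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max.

    Guards against the zero-range case (all values equal), which would otherwise
    divide by zero; here every value is then mapped to new_min (a design choice).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [float(new_min) for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


print(min_max_scale([1000, 255, 158, 50, 40]))  # [100.0, 30.15625, 21.0625, 10.9375, 10.0]
print(min_max_scale([120, 120, 120]))           # [10.0, 10.0, 10.0]
```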

In [None]:
import numpy as np

# Find the actual range
maxAbsoluteSize = carparking2_df['absolutesize'].max()
minAbsoluteSize = carparking2_df['absolutesize'].min()
rangeAbsoluteSize = (maxAbsoluteSize - minAbsoluteSize)

# Specify the desired range
maxScaledSize = 100
minScaledSize = 10
rangeScaledSize = (maxScaledSize - minScaledSize)

# Now add the scaled values to the carparking2_df DataFrame as a new column:
carparking2_df['scaledsize'] = ( (((carparking2_df['absolutesize'] - minAbsoluteSize) * 
                                   rangeScaledSize)
                                               / rangeAbsoluteSize) + minScaledSize)

# If the above code is correct, the largest absolute size will now have scaledsize of 100,
# and the smallest absolute size will have scaledsize 10. A quick check:
carparking2_df

Whilst the above code works, it is worth noting that this is quite a common transformation, and that thoroughly tested implementations of it are available in third-party data handling libraries.

For example, the `sklearn.preprocessing` package includes a `MinMaxScaler` class for implementing just this sort of scaling operation:

In [None]:
from sklearn.preprocessing import MinMaxScaler

# Create the scaler setting the desired scaled range limits
scaler = MinMaxScaler(feature_range=(minScaledSize, maxScaledSize))

# Fit the scaler to our data (i.e. ascertain the current range limits)
scaler.fit(carparking2_df[['absolutesize']])

# Scale the data and insert it into a new column; transform() returns a
# two-dimensional array with a single column, so flatten it with ravel()
carparking2_df['quick scaledsize'] = scaler.transform(carparking2_df[['absolutesize']]).ravel()
carparking2_df

Now let's add a column with bin numbers based on the car park size.

Previously, we suggested bins with boundaries [50, 200, 500]. We can add a lower bound (0) and an arbitrary upper bound (10000) to this list of bin thresholds.

We can use the *pandas* `cut()` function to assign values to bins directly:

In [None]:
carparking2_df['binvalue'] = pd.cut(carparking2_df['absolutesize'],
                                    bins=[0, 50, 200, 500, 10000],
                                    labels=[1, 2, 3, 4])
carparking2_df
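By default `pd.cut()` uses right-closed intervals (`right=True`) and excludes the lowest edge (`include_lowest=False`), so a car park with exactly 50 spaces lands in bin 1, (0, 50], and a value of exactly 0 gets no bin at all. The sketch below reproduces that assignment rule in plain Python with `bisect`, to make the boundary behaviour explicit; the helper name `bin_label` is hypothetical and this is not how pandas implements `cut()` internally.

```python
from bisect import bisect_left


def bin_label(value, edges):
    """Return the 1-based bin number for right-closed bins (edges[i-1], edges[i]].

    Values outside (edges[0], edges[-1]] get None, much as pd.cut returns NaN.
    """
    position = bisect_left(edges, value)  # first index with edges[index] >= value
    if 1 <= position < len(edges):
        return position
    return None


edges = [0, 50, 200, 500, 10000]
for size in [1000, 255, 158, 50, 40, 0]:
    print(size, bin_label(size, edges))
# 1000 4
# 255 3
# 158 2
# 50 1
# 40 1
# 0 None
```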

Now we can map the chosen values, using either the scaled size, or a function of bin number, to determine the radius of the proportional markers.

In [None]:
# Map using circle markers based on scaled bin number
# Create our base map, centred on Walton Hall.
OU_carparks3 = folium.Map(location=[52.024696,-0.710710],
                     zoom_start=16, max_zoom=18)
# Add a simple marker to say where it is.
folium.Marker([52.024696,-0.710710], 
                           popup='The Open University, Walton Hall').add_to(OU_carparks3)


#Create a simple function to add the marker
def add_proportional_circle_marker2(row, fmap):
    """Add a proportional circle marker to a folium map."""
    
    lat = row['latitude']
    lon = row['longitude']
    numspaces = row['absolutesize']
    binnumber = row['binvalue']
    
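    # folium.Circle takes its radius in metres (unlike CircleMarker, which uses
    # pixels), so these markers grow and shrink as you zoom the map
    folium.Circle(location=[lat, lon], radius=(binnumber * 10),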
                  popup=f'Car park with {numspaces} spaces.', 
                  color='red', fill_color='orange',
                  fill_opacity=0.8).add_to(fmap)
    
carparking2_df.apply(add_proportional_circle_marker2, fmap=OU_carparks3, axis=1)

OU_carparks3
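Unlike `folium.CircleMarker`, whose radius is given in pixels, `folium.Circle` takes its radius in metres, so the on-screen size of these markers depends on the zoom level. The sketch below estimates the conversion using the standard Web Mercator approximation, assuming 256-pixel tiles; the function names are hypothetical. At Walton Hall's latitude and zoom 16 this works out at roughly 1.5 metres per pixel, and each extra zoom level halves that figure, doubling the circles' apparent size.

```python
import math

# Web Mercator assumptions: 256-pixel tiles and the WGS84 equatorial circumference
EARTH_CIRCUMFERENCE_M = 40075016.686
TILE_SIZE_PX = 256


def metres_per_pixel(latitude, zoom):
    """Approximate ground distance covered by one screen pixel (Web Mercator)."""
    return (EARTH_CIRCUMFERENCE_M * math.cos(math.radians(latitude))
            / (TILE_SIZE_PX * 2 ** zoom))


def circle_radius_in_pixels(radius_m, latitude, zoom):
    """How many pixels a folium.Circle radius (in metres) spans at a given zoom."""
    return radius_m / metres_per_pixel(latitude, zoom)


# Walton Hall latitude, and the binnumber * 10 metre radii used above
for binnumber in [1, 2, 3, 4]:
    for zoom in [16, 18]:
        px = circle_radius_in_pixels(binnumber * 10, 52.024696, zoom)
        print(f"bin {binnumber}, zoom {zoom}: about {px:.0f} px")
```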