# Mapping accidents

This Notebook will take you through the basics of plotting geospatial data retrieved from a MongoDB collection onto a map.

First, we need to include some essential Python packages for connecting to the MongoDB and reshaping the data we might retrieve from it. We're also going to work with time as well as location data...

In [None]:
# Standard imports
import pandas as pd

import datetime

import seaborn as sns

The `folium` package, you might recall, is a package for generating interactive, embedded map displays.

In [None]:
import folium

We also need to set up a connection to MongoDB in the usual way:

## Setting up the document database 

In the notebooks for parts 14, 15 and 16, you will be using a document database to manage data. As with the relational database you looked at in previous sections, the data in the database is *persistent*. The document database, MongoDB, is described as "NoSQL" to reflect that it does not use the tabular format of the relational database to store data. However, many of the properties of a formal RDBMS apply to MongoDB, including the need to connect to the database server.

As with PostgreSQL, the MongoDB database server runs independently from the Jupyter notebook server. To interact with it, you need to set up an explicit connection.

### Setting your database credentials

In order to work with a database, we need to create a *connection* to the database. A connection allows us to manipulate the database, and query its contents (depending on what usage rights you have been granted). For the SQL notebooks in TM351, the details of your connection will depend upon whether you are using the OU-hosted server, accessed via [tm351.open.ac.uk](https:tm351.open.ac.uk), or whether you are using a version hosted on your own computer, which you should have set up using either Vagrant or Docker.

To set up the connection, you need a login name and a password. We will use the variables `DB_USER` and `DB_PWD` to hold the user name and password, respectively, that you will use to connect to the database. Run the appropriate cell to set your credentials in the following cells.

#### Connecting to the database on [tm351.open.ac.uk](https:tm351.open.ac.uk)

If you are using the Open University hosted server, you should execute the following cell, using your OUCU as the value of `DB_USER`, and the password you were given at the beginning of the module. Note that if the cell is in RAW NBconvert style, you will need to change its type to Code in order to execute it.

The variables `DB_USER` and `DB_PWD` are strings, and so you need to put them in quotes.

In this case, note that the connection string contains an additional option at the end: `?authsource=user-data`. For the MongoDB setup that we are using here, this option tells Mongo where to look for the authentication database.

#### Connecting to the database on a locally hosted machine

If you are running the Jupyter server on your own machine, via Docker or Vagrant, you should execute the following cell. Note that if the cell is in RAW NBconvert style, you will need to change its type to Code in order to execute it.

Note that the locally hosted versions of the environment give you full administrator rights, which is why you do not need to specify a user name or password. Obviously, these rights would not generally be granted on a multi-user database, unless you are the database administrator.

### Connecting to the database

We can now set up a connection to the database. As with PostgreSQL, we use a connection string:

In [None]:
print(MONGO_CONNECTION_STRING)

The connection string is made up of several parts:

- `mongodb` : tells `pymongo` that we will use MongoDB as our database engine
- Your user name and (character escaped) password, separated by a colon, if you are using the remote server. If you are using a local server, you will be logged on as an administrator, and do not need to specify a name or password.
- `localhost:27017` : the host and port on which the database engine is listening.
- A reference to the authentication database (`?authsource=user-data`), if you are using the remote server.
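
As a rough sketch of how those parts fit together, the following cell assembles a connection string from its components. The helper name `build_connection_string` and the sample credentials are made up for illustration, and `quote_plus` is just one common way of escaping special characters in a user name or password; the module's own setup cells may do this differently.

```python
from urllib.parse import quote_plus


def build_connection_string(host="localhost", port=27017,
                            user=None, password=None, authsource=None):
    """Assemble a MongoDB connection string (illustrative helper only)."""
    credentials = ""
    if user is not None and password is not None:
        # Escape characters such as '@', ':' and '/' in the credentials
        credentials = f"{quote_plus(user)}:{quote_plus(password)}@"
    options = f"/?authsource={authsource}" if authsource else "/"
    return f"mongodb://{credentials}{host}:{port}{options}"


# Local server: no credentials needed
local_string = build_connection_string()

# Remote server: made-up credentials, plus the authsource option
remote_string = build_connection_string(user="abc123", password="p@ss/word",
                                        authsource="user-data")
```

Printing `remote_string` shows the password with its `@` and `/` characters percent-encoded, which stops them being confused with the separators in the connection string itself.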

We now connect to the database with a `pymongo.MongoClient` object.

In [None]:
from pymongo import MongoClient

In [None]:
mongo_client=MongoClient(MONGO_CONNECTION_STRING)

You should now be connected to the MongoDB database server.

## The accidents database

The accidents database takes a long time to set up, so we have already imported it into a MongoDB database so that you can work with it. Note that on the remote VCE, the database is read-only, so you will not be able to alter its contents, although you can copy the contents into your own database space as discussed in the previous MongoDB notebooks, and alter that.

The cells in the earlier section, Setting up the document database, put the name of the accidents database into the variable `ACCIDENTS_DB_NAME`. Use this value to set up the connection to the `accidents` database and collections within it:

In [None]:
accidents_db=mongo_client[ACCIDENTS_DB_NAME]

We can look at the names of the collections in the database:

In [None]:
accidents_db.list_collection_names()

We will introduce some of the different collections in the rest of the materials, but let's start with the `accidents` collection:

In [None]:
accidents_collection=accidents_db['accidents']

This collection contains information on individual accidents. We can see how many examples it contains with the `.count_documents()` method:

In [None]:
accidents_collection.count_documents({})

We will also specify the `labels` collection:

In [None]:
labels=accidents_db['labels']

We'll be plotting some charts, so increase the default plot size to make things easier to read:

In [None]:
# Set a larger plot size than the default
sns.set(rc={'figure.figsize':(11.7,8.27)})

## Plotting some accidents

To start with, let's just plot some accidents on the map to see how it's done.

To plot a location, we need to get some location information.

Review a single record from the accidents database to see whether there is any data we can use:

In [None]:
accidents_collection.find_one({})

There are various bits of information we could use to locate an accident.
    
For example, the `LSOA_of_Accident_Location` identifies the lower layer super output area (LSOA) code for the geographical area in which the accident took place, which would allow us to generate something like a choropleth map identifying the number of accidents in a particular area.
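
To give a flavour of the per-area counts that a choropleth map would need, here is a minimal sketch that tallies accidents by LSOA code. The documents and the LSOA codes in them are made up; in practice they would come from a query on the accidents collection.

```python
from collections import Counter

# Made-up sample documents standing in for query results
sample_accidents = [
    {"Accident_Index": "A1", "LSOA_of_Accident_Location": "E01000001"},
    {"Accident_Index": "A2", "LSOA_of_Accident_Location": "E01000002"},
    {"Accident_Index": "A3", "LSOA_of_Accident_Location": "E01000001"},
]

# Count how many accidents fall within each LSOA
accidents_per_lsoa = Counter(doc["LSOA_of_Accident_Location"]
                             for doc in sample_accidents)
```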

For more detail, however, the Ordnance Survey Grid Reference (OSGR) Northing (`Location_Northing_OSGR`) and Easting (`Location_Easting_OSGR`) values, and the `Latitude` and `Longitude` co-ordinates, each provide us with a more exact location.

At the bottom of the document, the `loc` attribute also has a `coordinates` field, which returns the longitude and latitude values (in that order) rather more conveniently as a two-item list. (The `Point` type reference identifies the co-ordinates as a pair of values that can be used to represent a single, specific location.)

As many mapping tools work directly with latitude and longitude, let's use those.

We can grab them easily enough from an accidents document by projecting a query response onto the `loc.coordinates` field:

In [None]:
accidents_collection.find_one({}, ['loc.coordinates'])
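
Note the order of the values in the pair: GeoJSON-style `Point` data stores the longitude first, whereas `folium` expects locations as `[latitude, longitude]`. The sketch below shows the swap using a made-up document; the helper name `to_lat_lon` is hypothetical.

```python
def to_lat_lon(doc):
    """Return [latitude, longitude] from a document with a GeoJSON-style `loc`."""
    lon, lat = doc["loc"]["coordinates"]
    return [lat, lon]


# Made-up document with the same shape as the projected query result
sample_doc = {"loc": {"type": "Point", "coordinates": [-0.7090, 52.0252]}}

marker_location = to_lat_lon(sample_doc)
```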

You may recall from Part 5 how the `folium` library can be used to generate interactive maps overlaid with markers of various sorts.

The code in the following code cell will generate a map centered on a provided location at a specified zoom level using a selection of "randomly" selected accident locations.

The `NUMBER_OF_ACCIDENTS` variable determines the number of accidents retrieved from the database, and hence the number of locations we'll be trying to plot on the map.

Try running the cell several times using an increasing `NUMBER_OF_ACCIDENTS` and observe what happens. At some point, you may find your browser becomes rather sluggish as it tries to cope with the number of markers you are trying to plot. (If it does, restart the notebook kernel from the notebook `Kernel` menu.)

In [None]:
NUMBER_OF_ACCIDENTS = 100

m = folium.Map(location=[50, -3], width=500,
               height=800, zoom_start=5)

for a in accidents_collection.find({}, ['loc.coordinates'],
                        limit=NUMBER_OF_ACCIDENTS):
    
    folium.Marker(location=[a['loc']['coordinates'][1],
                            a['loc']['coordinates'][0]]).add_to(m)
    
m
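
Note that `limit` returns the first documents the server happens to find rather than a random selection. If a genuinely random sample is wanted, MongoDB's `$sample` aggregation stage is one option. The sketch below only builds the pipeline; the helper name `sample_pipeline` is made up, and the pipeline has not been run against the module database.

```python
def sample_pipeline(n):
    """Build an aggregation pipeline that picks n documents at random."""
    return [
        {"$sample": {"size": n}},
        # Keep only the co-ordinates (the _id field is included by default)
        {"$project": {"loc.coordinates": 1}},
    ]


pipeline = sample_pipeline(100)

# With a live connection, the pipeline might be run as:
# accidents_collection.aggregate(pipeline)
```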

In general, depending on which accidents MongoDB picks, you may need to move and zoom the map to see the points.

One way of getting round this might be to reduce the zoom level so that we are presented with a broader view over the possible area in which the markers might be located.

Another issue with plotting large numbers of markers is that the display can start to get very cluttered.

Neither situation is ideal, from the point of view of either user interaction design or information design.

What would be neater would be if we could somehow automatically zoom into the area where the markers are located, as well as managing the display of the markers a little better.

For example, rather than using the default marker, we might create a small, solid colour-filled circular marker:

```python
folium.Circle(location=LATLON,
              radius=50, color = 'red', fill=True, fill_opacity=1.0)
```

One approach to centering the map is to provide a central location based on the "average" latitude and longitude locations of the sampled accidents.

So let's separate out the query that retrieves the accident locations (if you previously set `NUMBER_OF_ACCIDENTS` to a browser-killing level, reduce it to a lower value now!):

In [None]:
accident_locations = accidents_collection.find({},
                                    ['loc.coordinates', 'Accident_Index'],
                                    limit=NUMBER_OF_ACCIDENTS)

accident_locations = pd.json_normalize(list(accident_locations))
# Set an appropriate index
accident_locations.set_index('Accident_Index', inplace=True)

# Split the latitude and longitude into separate columns
accident_locations = accident_locations['loc.coordinates'].apply(pd.Series)
accident_locations.columns = ['Lon', 'Lat']

accident_locations.head()

By way of finding an "average" location, we'll use the median latitude and median longitude:

In [None]:
# Convert to a plain [lat, lon] list, which folium accepts directly
AVERAGE_LOCATION = accident_locations[['Lat', 'Lon']].median().tolist()
AVERAGE_LOCATION
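
One reason for preferring the median to the mean is that it is less affected by a few far-flung points. The sketch below illustrates this with made-up co-ordinates, using the standard library rather than *pandas*.

```python
from statistics import mean, median

# Made-up [lat, lon] pairs: four clustered points and one distant outlier
points = [[52.0, -0.7], [52.1, -0.8], [52.2, -0.6], [52.3, -0.9], [57.1, -2.1]]

lats = [lat for lat, lon in points]
lons = [lon for lat, lon in points]

# The median stays within the cluster...
median_centre = [median(lats), median(lons)]

# ...whereas the mean is dragged towards the outlier
mean_centre = [mean(lats), mean(lons)]
```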

We can now plot the map centered on the actual accidents we wish to map using the *pandas* `.apply()` function to add the markers to the map from each row of the dataframe, along with a reference to the map we wish to add the markers to.

We might also choose to use a different marker style.

In [None]:
m = folium.Map(AVERAGE_LOCATION, width=500, height=800)

def add_marker(row, m):
    folium.Circle(location=[row['Lat'], row['Lon']],
                  color = 'red', radius=50, fill=True, fill_opacity=1.0).add_to(m)

accident_locations.apply(add_marker, m=m, axis=1)

m
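
As an alternative to centring the map on an "average" location, we could work out the bounding box of the points and ask the map to fit its view to that. The sketch below computes the box from made-up co-ordinates; the helper name `bounding_box` is hypothetical, and the `folium` call is shown only as a comment because it needs a map object to act on.

```python
def bounding_box(points):
    """Return [[south, west], [north, east]] for a list of [lat, lon] pairs."""
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# Made-up [lat, lon] pairs
points = [[52.0, -0.7], [52.3, -0.9], [51.8, -0.6]]

bounds = bounding_box(points)

# With a folium map m, the view could then be fitted with:
# m.fit_bounds(bounds)
```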

## Plotting Accidents on the UK Motorway Network

As you start to work with datasets, you often find that they reveal information about things that perhaps you had not intentionally aimed to collect.

For example, let's see if we can generate a map of the UK motorway network based on the location of accidents selected by road type by means of the `Road_Class`.

Let's see if we can find any road class label lookups in the `labels` collection:

In [None]:
labels.distinct('label', {"label":{"$regex": ".*Road_Class.*"}})

The road class labels can be found in the `1st_Road_Class` and `2nd_Road_Class` fields.

Remember, we can run a query that "ORs" several possible conditions using the `{"$or":[CONDITION1, CONDITION2, ...]}` query expression.
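
Because the query is an ordinary Python `dict`, it can also be generated rather than typed out by hand. The sketch below builds an `$or` expression from a list of field names and code values (the values 1 and 2 are arbitrary here), and mimics how such a query would match documents. Both helper names are made up, and `matches` is only a toy stand-in for what the database server does.

```python
def any_field_in(fields, codes):
    """Build an $or query matching documents where any field equals any code."""
    return {"$or": [{field: code} for code in codes for field in fields]}


def matches(doc, query):
    """Check a dict against a simple $or-of-equalities query (illustration only)."""
    return any(doc.get(field) == value
               for condition in query["$or"]
               for field, value in condition.items())


query_sketch = any_field_in(["1st_Road_Class", "2nd_Road_Class"], [1, 2])

# Made-up documents: the first satisfies one condition, the second none
hit = matches({"1st_Road_Class": 3, "2nd_Road_Class": 1}, query_sketch)
miss = matches({"1st_Road_Class": 3, "2nd_Road_Class": 6}, query_sketch)
```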

Let's preview the labels:

In [None]:
list(labels.find({"$or":[{"label":'1st_Road_Class'},
                         {"label":'2nd_Road_Class'}]},
                 {"label":1, "codes":1, "_id":0}))

This leaves us with a decision: do `A(M)` roads count as motorways? Let's say *yes*, and include them in the query.

But before we do that, lest we kill our browser trying to plot them all, how many accidents are there?

In [None]:
accidents_collection.count_documents({'$or': [{'1st_Road_Class': 1}, 
                                   {'2nd_Road_Class': 1},
                                   {'1st_Road_Class': 2}, 
                                   {'2nd_Road_Class': 2}]})

That's quite a lot. We don't need that many to show the idea. We can restrict the subset by date, picking just the accidents in January 2012. How many of those are there?

In [None]:
query = {'$or': [{'1st_Road_Class': 1},
                 {'2nd_Road_Class': 1},
                 {'1st_Road_Class': 2},
                 {'2nd_Road_Class': 2}],
         'Datetime': {'$gte': datetime.datetime(2012, 1, 1),
                      '$lt': datetime.datetime(2012, 2, 1)}}

accidents_collection.count_documents(query)
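
A month-long window is often expressed as a half-open interval: from the first instant of the month up to, but not including, the first instant of the next month. The helper below, `month_range`, is a made-up name; it simply returns a `dict` that could be used as the value of a `Datetime` condition.

```python
import datetime


def month_range(year, month):
    """Return a condition matching datetimes within the given calendar month."""
    start = datetime.datetime(year, month, 1)
    # First day of the following month (rolling over the year in December)
    if month == 12:
        end = datetime.datetime(year + 1, 1, 1)
    else:
        end = datetime.datetime(year, month + 1, 1)
    return {"$gte": start, "$lt": end}


january_2012 = month_range(2012, 1)
```

Using `$lt` with the start of the next month avoids having to know how many days the month has, and includes events that occur late on the final day.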

A better number. Let's plot them.

First get the motorway accident locations:

In [None]:
motorway_locations = pd.json_normalize(list(accidents_collection.find(query,
                                            ['loc.coordinates', 'Accident_Index'])))

# Set an appropriate index
motorway_locations.set_index('Accident_Index', inplace=True)

# Split the latitude and longitude co-ordinates into separate columns
motorway_locations = motorway_locations['loc.coordinates'].apply(pd.Series)
motorway_locations.columns = ['Lon', 'Lat']

motorway_locations.head(3)

Once again, let's take the "average" location as a midway point on which to center the map.

In [None]:
AVERAGE_LOCATION = motorway_locations[['Lat', 'Lon']].median().tolist()

Let's plot the map using this value as our central, starting location around which the map zoom is based.

In [None]:
m = folium.Map(AVERAGE_LOCATION, width=500, height=800, zoom_start=5)

motorway_locations.apply(add_marker, m=m, axis=1)

m

What other "unintended" bits of information do you think might be found from the accidents dataset, either from individual records, or from records taken selectively from across the dataset as a whole? Record them below and/or post your ideas to the module forums.

*Add your own thoughts here on other unintended or unexpected bits of information available from the dataset.*

## Add Tooltips and Pop-up Labels to Markers

There's often a lot of boilerplate code associated with adding markers to a map, particularly if we want to add a simple tooltip (which pops up when you hover your mouse cursor over a marker) or a richer HTML styled pop-up label (which is raised if you click on a marker).

Let's remind ourselves of how to create such labels.

In [None]:
latlong = [52.0252, -0.7090]
latlong2 = [52.027, -0.7085]

m = folium.Map(location=latlong, width=500, height=800, zoom_start=15)

# Add a pop-up box to a marker
popup = '''
<strong>The Open University</strong><br/><br/>
<em>Walton Hall campus, Milton Keynes.</em>'''

folium.Marker(latlong, popup=popup).add_to(m)

# Add a tooltip to a marker
folium.Circle(location=latlong2, radius=50, fill=True, color='red', fill_opacity=1.0,
              tooltip="OU tennis courts").add_to(m)
m

One thing to note is that clicking on a small circle marker to raise a pop-up can be quite fiddly; it's much easier to click on one of the default markers!

Before we create such pop-ups, let's remind ourselves of what information is available to us. We can then identify which fields contain information that might be usefully provided in a pop-up legend associated with each plotted point.

In [None]:
accidents_collection.find_one(query)

If we want to report the severity of the accident, we might use the `Number_of_Vehicles` and the `Number_of_Casualties` as part of our display as well as the road surface conditions.

Let's create a template for reporting those details:

In [None]:
report_template = '''
Number of vehicles: {veh}<br/><br/>
Number of casualties: {cas}<br/><br/>
Road conditions: {cond}'''

Here's a quick preview of the available road surface conditions:

In [None]:
road_surface_conditions = labels.find_one({'label': 'Road_Surface_Conditions'})['codes']
road_surface_conditions

We can test the template report using details from a single accident:

In [None]:
test_accident = accidents_collection.find_one(query)

# Replace the road surface condition code with a human readable label
test_accident['Road_Surface_Conditions'] = road_surface_conditions[str(test_accident['Road_Surface_Conditions'])]

report_template.format(veh = test_accident['Number_of_Vehicles'],
                       cas =  test_accident['Number_of_Casualties'],
                       cond = test_accident['Road_Surface_Conditions'])

If we define templated items using key values that appear in a `dict` used to format the template, we can more easily render the template for a set of values:

In [None]:
report_template = '''
Number of vehicles: {Number_of_Vehicles}<br/><br/>
Number of casualties: {Number_of_Casualties}<br/><br/>
Road conditions: {Road_Surface_Conditions}'''

# We can "unpack" the test_accident dict 
# and make key:value pairs available to the template
report_template.format(**test_accident)
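
Two properties of this unpacking approach are worth knowing: keys in the `dict` that the template does not mention are simply ignored, but a key that the template needs and the `dict` lacks raises a `KeyError`. The sketch below demonstrates both using a shortened template and made-up records.

```python
template = "Vehicles: {Number_of_Vehicles}, casualties: {Number_of_Casualties}"

# Made-up record with an extra key that the template does not use
record = {"Number_of_Vehicles": 2, "Number_of_Casualties": 1, "Speed_limit": 30}

# The unused key is ignored
report = template.format(**record)

# A record missing a required key raises KeyError
try:
    template.format(**{"Number_of_Vehicles": 2})
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```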

### Activity 1

Plot the locations of all *fatal* and *serious* accidents that occurred on motorways.

HINT: you will need to identify the appropriate road classes and accident severity codes when formulating your query.

Find the road codes associated with motorways.

In [None]:
# Enter your code in this cell

Find the accident severity class / classes.

In [None]:
# Enter your code in this cell

Give the appropriate mongo query.

In [None]:
# Enter your code in this cell

Plot the map.

In [None]:
# Enter your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

We already have the required road codes from a previous query:

```Javascript
{"$or":  [{"1st_Road_Class": 1},
          {"2nd_Road_Class": 1},
          {"1st_Road_Class": 2},
          {"2nd_Road_Class": 2}]}
```

But what about the fatality / serious accident codes?

In [None]:
labels.distinct('label', {"label":{"$regex": ".*Severity.*"}})

Let's preview them:

In [None]:
list(labels.find({"$or":[{"label":'Casualty_Severity'},
                        {"label":'Accident_Severity'}]},
                 {'label':1, 'codes': 1, '_id':0}))

So the accident severity classes we need for fatal and serious accidents are 1 and 2 respectively.

We can now formulate a query to retrieve the required data and cast it into a dataframe form that we can plot the markers from:

In [None]:
query = {'$or': 
         [{'1st_Road_Class': 1},
          {'2nd_Road_Class': 1},
          {'1st_Road_Class': 2},
          {'2nd_Road_Class': 2}],
         'Accident_Severity': {'$in': [1, 2]}}

serious_accidents = pd.json_normalize(list(accidents_collection.find(query,
                                    ['loc.coordinates', 'Accident_Index'])))

# Set an appropriate index
serious_accidents.set_index('Accident_Index', inplace=True)

# Split the latitude and longitude co-ordinates into separate columns
serious_accidents = serious_accidents['loc.coordinates'].apply(pd.Series)
serious_accidents.columns = ['Lon', 'Lat']

serious_accidents.head()

As before, we can apply markers based on the co-ordinates of each location in the dataframe, once again specifying a map origin centred on an "average" location derived from the data we want to map:

In [None]:
AVERAGE_LOCATION = serious_accidents[['Lat', 'Lon']].median().tolist()

serious_m = folium.Map(AVERAGE_LOCATION, width=500, height=800, zoom_start=5)

serious_accidents.apply(add_marker, m=serious_m, axis=1)

serious_m

####  End of Activity 1

-----------------------------------

## Using Coloured Markers to Communicate Additional Information

One way of increasing the "information density" of the map is to use coloured markers to communicate additional information, such as the severity of an accident at a particular location.

In order to do this, we need to run a query that returns the accident severity value as well as the location. Let's run a search to map the severity of accidents across the *A(M)* road network:

In [None]:
query = {'$or': 
         [{'1st_Road_Class': 2},
          {'2nd_Road_Class': 2}]}

accident_severities = pd.json_normalize(list(accidents_collection.find(query,
                                                         ['loc.coordinates',
                                                          'Accident_Severity',
                                                          'Accident_Index'])))

# Set an appropriate index to stash the columns we want to preserve
accident_severities.set_index(['Accident_Index', 'Accident_Severity'], inplace=True)

# Split the latitude and longitude co-ordinates into separate columns
accident_severities = accident_severities['loc.coordinates'].apply(pd.Series)
accident_severities.columns = ['Lon', 'Lat']

# Reset the index to unpack the stashed values
accident_severities.reset_index(inplace=True)

accident_severities.head()

To visualise this data, we will plot the different accident severities using differently coloured markers.

To create the colour mapping, we can create a simple dictionary keyed by the accident severity and associate a desired marker colour. We can then trivially retrieve a colour mapping for a particular accident severity.

In [None]:
# Try to pick colours that are color blind safe
severity_map = {1: 'orange', 2: 'blue', 3: "grey"}

severity_map[1]
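
Indexing the dictionary directly raises a `KeyError` if a record carries a code that is not in the mapping. If that is a concern, the `.get()` method allows a fallback colour to be supplied. The sketch below uses its own copy of the mapping; the helper name `colour_for` and the choice of black as the fallback are assumptions for illustration.

```python
severity_colours = {1: "orange", 2: "blue", 3: "grey"}


def colour_for(severity, default="black"):
    """Look up a marker colour, falling back to a default for unknown codes."""
    return severity_colours.get(severity, default)


known = colour_for(1)
unknown = colour_for(99)
```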

We can now plot the map against that coloured marker palette:

In [None]:
AVERAGE_LOCATION = accident_severities[['Lat', 'Lon']].median().tolist()

m = folium.Map(AVERAGE_LOCATION, zoom_start=5)

def severity_markers(row, m):
    """Add a marker to a map, colored by severity."""
    color = severity_map[row['Accident_Severity']]
    
    folium.Circle(location=[row['Lat'], row['Lon']],
                  color = color, radius=500, fill=True, fill_opacity=1.0).add_to(m)

accident_severities.apply(severity_markers, m=m, axis=1)
m

### Activity 2

Recall from a previous notebook that we can search for accidents within a particular local highway authority area given an area code (for example, the Milton Keynes code: *E06000042*):

```python
accidents.find({'Local_Authority_(Highway)': 'E06000042'})
```

Colour code the accidents by number of vehicles within this area (or another area of your own choosing).

Find an appropriate set of bins for the number of vehicles involved in an accident (about four should do). Plot the accidents in a region with the points colour coded to show the size of the accident.

Hint: for selecting the bins, you might categorise accident sizes as small, medium, large or very large according to the number of vehicles involved and colour them accordingly.

In [None]:
# Enter your code in this cell

#### Our solution

To reveal our solution, run this cell or click on the triangle symbol on the left-hand side of the cell.

Let's see how the accidents fare by number of vehicles for Milton Keynes.

We'll start by defining a query to retrieve the accidents recorded within the Milton Keynes local highway authority area.

In [None]:
query = {'Local_Authority_(Highway)': 'E06000042'}

accident_sizes = pd.json_normalize(list(accidents_collection.find(query,
                                                    ['loc.coordinates',
                                                     'Number_of_Vehicles',
                                                     'Accident_Index'])))

# Set an appropriate index to stash the columns we want to preserve
accident_sizes.set_index(['Accident_Index', 'Number_of_Vehicles'], inplace=True)

# Split the latitude and longitude co-ordinates into separate columns
accident_sizes = accident_sizes['loc.coordinates'].apply(pd.Series)
accident_sizes.columns = ['Lon', 'Lat']

# Reset the index to unpack the stashed values
accident_sizes.reset_index(inplace=True)

accident_sizes.head()

I'm going to define the following colour coded category bins around the number of vehicles involved in an accident:

- <=1 vehicles: grey
- 2-3 vehicles: blue
- 4-6 vehicles: red
- \>=7 vehicles: black

In [None]:
AVERAGE_LOCATION = accident_sizes[['Lat', 'Lon']].median().tolist()

# Create the base map
m = folium.Map(AVERAGE_LOCATION, zoom_start=12)

def vehicle_count_markers(row, m):
    """Add a marker to a map, coloured by the number of vehicles involved."""
    if row['Number_of_Vehicles'] <= 1:
        color = 'grey'
    elif row['Number_of_Vehicles'] > 1 and row['Number_of_Vehicles'] <= 3:
        color = 'blue'
    elif row['Number_of_Vehicles'] > 3  and row['Number_of_Vehicles'] < 7:
        color = 'red'
    else:
        color = 'black'
    
    folium.Circle(location=[row['Lat'], row['Lon']],
                  color = color, radius=50, fill=True, fill_opacity=1.0).add_to(m)


accident_sizes.apply(vehicle_count_markers, m=m, axis=1)
m
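
The chain of `if`/`elif` tests works well for a handful of bins. As the number of bins grows, one alternative is to keep the bin boundaries and colours in lists and look the colour up with the standard library's `bisect` module. The sketch below reproduces the same four bins; the names `UPPER_BOUNDS`, `COLOURS` and `colour_for_vehicle_count` are made up for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each bin except the last, open-ended one
UPPER_BOUNDS = [1, 3, 6]
COLOURS = ["grey", "blue", "red", "black"]


def colour_for_vehicle_count(n):
    """Return the marker colour for an accident involving n vehicles."""
    # bisect_left gives the index of the first bound that is >= n
    return COLOURS[bisect_left(UPPER_BOUNDS, n)]


colours = [colour_for_vehicle_count(n) for n in [1, 2, 3, 4, 6, 7, 12]]
```

Changing the bins then only requires editing the two lists, not the lookup logic.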

#### End of Activity 2

-----------------------------

## What next?
If you are working through this Notebook as part of an inline exercise, return to the module materials now.

If you are working through this set of Notebooks as a whole, move on to `15.2 Searching within a geographical area`.