In [1]:
# Import packages

import plotly.express as px
import numpy as np
import pandas as pd
import geopandas as gpd
import shapely

### Visualisation: Bubble Map of incidents

This section builds a bubble map of the number of incidents per zone. For the zones we use the COROP classification, which is finer-grained than provinces but coarser than municipalities. The COROP zoning is available on the CBS website.  
https://www.cbs.nl/nl-nl/achtergrond/2023/10/kaarten-regionale-indelingen-2023

In [2]:
# Reading incidents data
path_incidents = "../Advanced Data Science Data/Incidents_clean.csv"
df = pd.read_csv(path_incidents)

# Check if loaded correctly
display(df)

Unnamed: 0,type,starttime_new,endtime_new,vild_primair_wegnummer,primaire_locatie_lengtegraad,primaire_locatie_breedtegraad,duration
0,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A1,4.974663,52.346931,151160.933333
1,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A9,4.716725,52.514820,151160.933333
2,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A9,4.738364,52.609730,151160.933333
3,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A35,6.824692,52.204929,151160.933333
4,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A4,4.346407,52.041920,151160.933333
...,...,...,...,...,...,...,...
88846,vehicle_obstruction,2019-12-31 23:43:49,2019-12-31 23:43:49,,4.475721,51.893230,0.000000
88847,accident,2019-12-31 23:46:10,2019-12-31 23:46:10,N246,4.701406,52.443363,0.000000
88848,vehicle_obstruction,2019-12-31 23:47:01,2019-12-31 23:47:01,N31,5.922628,53.165226,0.000000
88849,vehicle_obstruction,2019-12-31 23:55:23,2019-12-31 23:55:23,A4,4.313686,51.442677,0.000000


The geographical data for the COROP zones was obtained from the website of Provincie Overijssel. The source can be found here.  
https://download.geoportaaloverijssel.nl/download/vector/539a949f-1315-45a7-b9a5-9fb00aaf9ecc

In [3]:
# Importing COROP shape file
path_gpd = "../Advanced Data Science Data/COROP/B14_COROP_gebieden/B14_COROP_gebiedenPolygon.shp"
COROP = gpd.read_file(path_gpd)

The original COROP shapefile uses a different coordinate reference system from the incident data. The cell below therefore reprojects the COROP shapefile into the system used by the incident data (EPSG:4326).

In [4]:
# Translate the geometry into epsg 4326 coordinates standard
COROP = COROP.to_crs(epsg=4326)
# Check if translated correctly
print(COROP.head(5))

      SHAPE_LENG RUBRIEK_XS      RUBRIEK JRSTATCODE STATNAAM_X  \
0  102265.901559       None  coropgebied   2018CR19       None   
1  190305.888230       None  coropgebied   2018CR37       None   
2  168527.142803       None  coropgebied   2018CR38       None   
3  158991.993399       None  coropgebied   2018CR39       None   
4  326651.059478       None  coropgebied   2018CR40       None   

              STATNAAM STATCODE_X STATCODE  OBJECTID  \
0  Alkmaar en omgeving       None     CR19      35.0   
1        Noord-Limburg       None     CR37      36.0   
2       Midden-Limburg       None     CR38      37.0   
3         Zuid-Limburg       None     CR39      38.0   
4            Flevoland       None     CR40      39.0   

                                            geometry  
0  POLYGON ((4.72522 52.69385, 4.72885 52.69140, ...  
1  POLYGON ((5.93277 51.74194, 5.93589 51.74103, ...  
2  POLYGON ((5.89041 51.31367, 5.91065 51.30771, ...  
3  POLYGON ((5.82368 51.06673, 5.82677 51.0558
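
To illustrate what the reprojection does, here is a minimal sketch on a single point. It assumes the source layer uses the Dutch RD New system (EPSG:28992); the notebook does not state this, so check `COROP.crs` before relying on it. The point is the RD New reference location near Amersfoort.

```python
import geopandas as gpd
from shapely.geometry import Point

# Assumption: source coordinates are in RD New (EPSG:28992), in metres
rd_point = gpd.GeoSeries([Point(155000, 463000)], crs="EPSG:28992")

# Reproject to WGS84 longitude/latitude, as done for the COROP layer
wgs84_point = rd_point.to_crs(epsg=4326)

lon = wgs84_point.iloc[0].x
lat = wgs84_point.iloc[0].y
print(rd_point.crs, "->", wgs84_point.crs)
print(lon, lat)
```

After reprojection the coordinates are in degrees, which makes them directly comparable with the longitude and latitude columns of the incident data.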

This part collects and organises the data needed to identify the zone of each incident.

In [5]:
# Get zone names
zone_names = [i for i in COROP["STATNAAM"]]

# Get the geometry of the COROP
g = [i for i in COROP.geometry]

# Get the coordinates of the geometry
list_poly_coords = [shapely.geometry.mapping(g[i])['coordinates'][0] for i in range(len(g))]

# Turn the coordinates into numpy arrays
list_arr_coords = [np.array(list_poly_coords[i]) for i in range(len(list_poly_coords))]

# Organising dimension of arrays
for i in range(len(list_arr_coords)):
    # some polygon coordinates form a 3D array whose first dimension is always 1, so drop it
    if list_arr_coords[i].ndim == 3:
        list_arr_coords[i] = list_arr_coords[i][0]

Here, the zone of each incident is identified.

In [6]:
# Identify in which zone each incident is located
for k in range(len(df)):
    # Build the incident point once per row (longitude, latitude)
    point = shapely.geometry.Point(df["primaire_locatie_lengtegraad"][k], df["primaire_locatie_breedtegraad"][k])
    for j in range(len(g)):
        if point.within(g[j]):
            df.loc[k, "zone"] = zone_names[j]
            break
    else:
        # No zone contains the point
        df.loc[k, "zone"] = "Unknown"
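
The nested loop tests every incident against every zone, which can be slow for tens of thousands of rows. A possible alternative is a spatial join. The sketch below assumes geopandas 0.10 or later (where `sjoin` accepts `predicate`) and uses hypothetical toy zones and points rather than the real data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical toy zones standing in for the COROP polygons
zones = gpd.GeoDataFrame(
    {"STATNAAM": ["West", "East"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)

# Hypothetical incidents; the last one lies outside every zone
incidents = pd.DataFrame({"lon": [0.5, 1.5, 5.0], "lat": [0.5, 0.5, 5.0]})
points = gpd.GeoDataFrame(
    incidents.copy(),
    geometry=gpd.points_from_xy(incidents["lon"], incidents["lat"]),
    crs="EPSG:4326",
)

# A left join keeps unmatched points; their zone name is missing (NaN)
joined = gpd.sjoin(points, zones[["STATNAAM", "geometry"]], how="left", predicate="within")
incidents["zone"] = joined["STATNAAM"].fillna("Unknown").to_numpy()
print(incidents)
```

With the real data, `points` would be built from `primaire_locatie_lengtegraad` and `primaire_locatie_breedtegraad`, and `zones` would be the reprojected `COROP` frame.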

In [7]:
# Check if the zone is correctly identified
display(df)

Unnamed: 0,type,starttime_new,endtime_new,vild_primair_wegnummer,primaire_locatie_lengtegraad,primaire_locatie_breedtegraad,duration,zone
0,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A1,4.974663,52.346931,151160.933333,Groot-Amsterdam
1,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A9,4.716725,52.514820,151160.933333,IJmond
2,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A9,4.738364,52.609730,151160.933333,Alkmaar en omgeving
3,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A35,6.824692,52.204929,151160.933333,Twente
4,vehicle_obstruction,2019-08-28 12:11:32,2019-12-11 11:32:28,A4,4.346407,52.041920,151160.933333,Agglomeratie 's-Gravenhage
...,...,...,...,...,...,...,...,...
88846,vehicle_obstruction,2019-12-31 23:43:49,2019-12-31 23:43:49,,4.475721,51.893230,0.000000,Groot-Rijnmond
88847,accident,2019-12-31 23:46:10,2019-12-31 23:46:10,N246,4.701406,52.443363,0.000000,Zaanstreek
88848,vehicle_obstruction,2019-12-31 23:47:01,2019-12-31 23:47:01,N31,5.922628,53.165226,0.000000,Noord-Friesland
88849,vehicle_obstruction,2019-12-31 23:55:23,2019-12-31 23:55:23,A4,4.313686,51.442677,0.000000,West-Noord-Brabant


The incidents are aggregated per zone to show their geographical distribution.

In [8]:
# Set up a dataframe for bubble map
columns_bubble = ["Zone", "Lat", "Lon", "Count"]
df_bubble = pd.DataFrame(columns=columns_bubble)

# input the zones and the coordinates of the centroid (centre of gravity)
df_bubble["Zone"] = zone_names
df_bubble["Lat"] = [list(shapely.geometry.Polygon(list_arr_coords[i]).centroid.coords)[0][1] for i in range(len(zone_names))]
df_bubble["Lon"] = [list(shapely.geometry.Polygon(list_arr_coords[i]).centroid.coords)[0][0] for i in range(len(zone_names))]

# Count the number of incidents in each zone
for i in range(len(zone_names)):
    df_bubble.loc[i, "Count"] = len(df[df["zone"] == zone_names[i]])


df_bubble["Count"] = df_bubble["Count"].astype(int)
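
The per-zone count can also be obtained without an explicit loop. The following is a minimal sketch using `value_counts` and `reindex`, with hypothetical toy values in place of the real `df` and `zone_names`.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's zone_names and df
zone_names = ["A", "B", "C"]
df = pd.DataFrame({"zone": ["A", "B", "A", "Unknown", "A"]})

# Count rows per zone, then align to the zone order;
# zones without incidents get 0 and "Unknown" is dropped
counts = df["zone"].value_counts().reindex(zone_names, fill_value=0)

df_bubble = pd.DataFrame({"Zone": zone_names, "Count": counts.to_numpy()})
print(df_bubble)
```

Because `reindex` is given `fill_value=0`, the counts stay integers and no later `astype(int)` conversion is needed.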

In [9]:
# Check if the dataframe is correctly set up for the bubble map
display(df_bubble)

Unnamed: 0,Zone,Lat,Lon,Count
0,Alkmaar en omgeving,52.639448,4.758368,496
1,Noord-Limburg,51.464509,6.035926,988
2,Midden-Limburg,51.200997,5.875367,1196
3,Zuid-Limburg,50.8838,5.862185,2175
4,Flevoland,52.523277,5.860541,1654
5,Agglomeratie Haarlem,52.377009,4.611586,468
6,Oost-Groningen,53.080286,7.048793,307
7,IJmond,52.464935,4.576024,1017
8,Delfzijl en omgeving,53.328281,6.840129,47
9,Overig Groningen,53.369105,6.196435,1501
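
The Lat and Lon columns are computed from a polygon rebuilt out of the first coordinate ring of each geometry. As a hedged alternative, shapely geometries expose a `centroid` attribute directly, which also works for multi-part zones. The shapes below are hypothetical and only illustrate the call.

```python
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical toy shapes, not real COROP zones
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
far_square = Polygon([(4, 0), (6, 0), (6, 2), (4, 2)])
two_parts = MultiPolygon([square, far_square])

# centroid is available on single and multi-part geometries alike
single_centroid = square.centroid
multi_centroid = two_parts.centroid

print(single_centroid.x, single_centroid.y)
print(multi_centroid.x, multi_centroid.y)
```

For a multi-part geometry the centroid is area-weighted over all parts, so it can differ from a centroid based on the first part alone.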


In [62]:
import folium
from folium.plugins import HeatMap
import branca

COROP["COUNT"] = df_bubble["Count"]

colors = branca.colormap.LinearColormap(
    ['green', 'yellow', 'red'], 
    vmin=df_bubble.Count.min(), 
    vmax=df_bubble.Count.max()
    )

def raster_choropleth(row):
    return {
        "fillColor": colors(row['properties']['COUNT']),
        "color": "white",
        "weight": 1,
        "fillOpacity": 0.75,
    }

# Make lists of Lat and Lon
lat = df_bubble.Lat.tolist()
lng = df_bubble.Lon.tolist()

#Create the Map
map = folium.Map(
    location=[52.5, 4.19],
    tiles='OpenStreetMap',
    zoom_start=7.4
)

# Add the COROP layer
gjson = folium.features.GeoJson(COROP,style_function=raster_choropleth,).add_to(map)

# Add legend
colors.caption = 'Number of Incidents'


# Add the color bar
map.add_child(colors)


map

The map above shows the number of incidents that occurred in each zone. The number varies significantly between zones: for instance, "Groot-Rijnmond" has 11575 incidents, while "Delfzijl en omgeving" has only 47. This geographical distribution is useful for deciding where road inspectors should be positioned with high priority, because it shows in which areas incidents occur most frequently.
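
The comparison between zones can be checked numerically. This is a small sketch that ranks a subset of zones by count; the three counts are taken from the figures reported in this notebook, while the variable names are illustrative only.

```python
import pandas as pd

# Subset of zones with the counts reported in this notebook
subset = pd.DataFrame(
    {
        "Zone": ["Zuid-Limburg", "Groot-Rijnmond", "Delfzijl en omgeving"],
        "Count": [2175, 11575, 47],
    }
)

# Rank zones from most to fewest incidents
ranked = subset.sort_values("Count", ascending=False).reset_index(drop=True)
busiest = ranked.iloc[0]
quietest = ranked.iloc[-1]

# How many times more incidents the busiest zone has than the quietest
ratio = busiest["Count"] / quietest["Count"]
print(ranked)
print(busiest["Zone"], quietest["Zone"], round(ratio))
```

Applied to the full `df_bubble`, the same ranking gives a priority list of zones for inspector placement.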

In [10]:
# Plot the bubble map
fig = px.scatter_mapbox(df_bubble, lat='Lat', lon='Lon',
                        size='Count', size_max=30, zoom=5.7, color='Count', 
                        color_continuous_scale=px.colors.sequential.Turbo,
                        mapbox_style='open-street-map', hover_name='Zone')

fig.update_layout(title='Number of incidents per COROP zone',
                  title_x=0.5, font=dict(size=12))
fig.show()

The bubble map shows the number of incidents recorded in each COROP zone. It can be observed that the concentration of incidents