# 911 Emergencies 

A US county wants to know which types of emergencies it should focus on to protect its citizens. It has hired you to make recommendations, and it has given you a dataset of all the 911 calls it received over the past few years.

1. Import common libraries (including plotly) 

In [1]:
import pandas as pd 
import numpy as np 

from sklearn.preprocessing import  OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "iframe_connected"

2. Import the dataset here 👉👉 <a href="https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Machine+Learning+non+Supervis%C3%A9/DBSCAN/Datasets/911.csv" target="_blank">911.csv</a>

In [2]:
data = pd.read_csv("https://full-stack-bigdata-datasets.s3.eu-west-3.amazonaws.com/Machine+Learning+non+Supervis%C3%A9/DBSCAN/Datasets/911.csv")
data.head()

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:10:52,NEW HANOVER,REINDEER CT & DEAD END,1
1,40.258061,-75.26468,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,2015-12-10 17:29:21,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 14:39:21,NORRISTOWN,HAWS AVE,1
3,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,2015-12-10 16:47:36,NORRISTOWN,AIRY ST & SWEDE ST,1
4,40.251492,-75.60335,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,2015-12-10 16:56:52,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1


3. The dataset is quite big, so take a random sample of 10,000 observations.

In [3]:
data_sample = data.sample(10000)
data_sample.head()

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
337789,40.069832,-75.316295,RAMP I76 WB TO MATSONFORD RD & SCHUYLKILL EXPY...,,EMS: VEHICLE ACCIDENT,2018-04-22 07:54:21,WEST CONSHOHOCKEN,RAMP I76 WB TO MATSONFORD RD & SCHUYLKILL EXPY WB,1
79229,40.142332,-75.240999,W BUTLER PIKE; WHITPAIN; Station 385; 2016-07...,19002.0,EMS: VEHICLE ACCIDENT,2016-07-05 07:18:39,WHITPAIN,W BUTLER PIKE,1
114687,40.100344,-75.293955,CHEMICAL RD & GALLAGHER RD; PLYMOUTH; 2016-10-...,19462.0,Traffic: VEHICLE ACCIDENT -,2016-10-04 12:09:12,PLYMOUTH,CHEMICAL RD & GALLAGHER RD,1
56311,40.244066,-75.614662,HIGH ST; POTTSTOWN; Station 329; 2016-05-06 @...,19464.0,EMS: ASSAULT VICTIM,2016-05-06 01:55:57,POTTSTOWN,HIGH ST,1
312642,40.074619,-75.151182,SHOPPERS LN & WASHINGTON LN; CHELTENHAM; Stat...,19027.0,EMS: CARDIAC EMERGENCY,2018-02-22 22:04:21,CHELTENHAM,SHOPPERS LN & WASHINGTON LN,1
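
Because `sample` draws rows at random, the rows shown above (and every cluster found later) will change from one run to the next. A minimal sketch of how the draw could be made reproducible, assuming a toy DataFrame in place of the real data and an arbitrary seed of `42`:

```python
import pandas as pd

# Toy stand-in for the 911 dataset (hypothetical values, not the real data)
toy = pd.DataFrame({
    "lat": [40.1, 40.2, 40.3, 40.4, 40.5],
    "lng": [-75.1, -75.2, -75.3, -75.4, -75.5],
})

# Passing random_state fixes the random draw; the value 42 is arbitrary
sample_a = toy.sample(3, random_state=42)
sample_b = toy.sample(3, random_state=42)

# Both draws contain exactly the same rows in the same order
print(sample_a.equals(sample_b))
```

On the real data, the equivalent call would be `data.sample(10000, random_state=42)`.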


5. Using plotly's `scatter_mapbox`, visualize your data points on a map. Use a different color for each `title`.

In [4]:
fig = px.scatter_mapbox(
        data_sample, 
        lat="lat", 
        lon="lng",
        color="title",
        mapbox_style="carto-positron"
)

fig.show()

6. With this many points the map is hard to read, so let's use DBSCAN to help us out. First, reduce `data_sample` to the `lat`, `lng` and `title` columns; they will be used to build `X`.

In [5]:
data_sample = data_sample.loc[:, ["lat", "lng", "title"]]
data_sample.head()

Unnamed: 0,lat,lng,title
337789,40.069832,-75.316295,EMS: VEHICLE ACCIDENT
79229,40.142332,-75.240999,EMS: VEHICLE ACCIDENT
114687,40.100344,-75.293955,Traffic: VEHICLE ACCIDENT -
56311,40.244066,-75.614662,EMS: ASSAULT VICTIM
312642,40.074619,-75.151182,EMS: CARDIAC EMERGENCY


7. Create dummy variables for the `title` column and standardize `lat` and `lng`.

In [6]:
# Numeric columns to standardize (positions in data_sample)
numeric_features = [0, 1]
numeric_transformer = StandardScaler()

# Categorical column to one-hot encode (position in data_sample)
categorical_features = [2]
categorical_transformer = OneHotEncoder(drop='first')

# Combine both transformers in a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Preprocess the sample
print("Preprocessing the sample...")
print(data_sample.head())
X = preprocessor.fit_transform(data_sample)
print('...Done.')
print(X[0:5, :])
print()

Preprocessing the sample...
              lat        lng                        title
337789  40.069832 -75.316295        EMS: VEHICLE ACCIDENT
79229   40.142332 -75.240999        EMS: VEHICLE ACCIDENT
114687  40.100344 -75.293955  Traffic: VEHICLE ACCIDENT -
56311   40.244066 -75.614662          EMS: ASSAULT VICTIM
312642  40.074619 -75.151182       EMS: CARDIAC EMERGENCY
...Done.
  (0, 0)	-0.4378400257210608
  (0, 1)	-0.00994390977028755
  (0, 52)	1.0
  (1, 0)	-0.07939192618219695
  (1, 1)	0.0359255005121839
  (1, 52)	1.0
  (2, 0)	-0.28698718836751924
  (2, 1)	0.003665475903821895
  (2, 86)	1.0
  (3, 0)	0.42358748690654885
  (3, 1)	-0.19170443987663652
  (3, 6)	1.0
  (4, 0)	-0.41417262821743117
  (4, 1)	0.09064083177312553
  (4, 12)	1.0



8. Now let's use DBSCAN: import it and fit it to your data. Use `eps=0.2`, `min_samples=100` and `metric="manhattan"` as parameters.

In [8]:
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.2, min_samples=100, metric="manhattan")

db.fit(X)

DBSCAN(eps=0.2, metric='manhattan', min_samples=100)
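
It helps to see what `eps=0.2` means once the titles are one-hot encoded. Under the Manhattan metric, two calls with different titles are at least 1 apart, however close they are on the map, so with this `eps` they can never be neighbors and each cluster should end up containing a single title. The sketch below checks this on three hypothetical preprocessed rows; the column layout is an assumption made for illustration.

```python
import numpy as np

# Hypothetical preprocessed rows: [lat_scaled, lng_scaled, title_B, title_C]
# Title A is the dropped first category, so it is encoded as all zeros.
call_title_a = np.array([0.1, -0.3, 0.0, 0.0])
call_title_b = np.array([0.1, -0.3, 1.0, 0.0])
call_title_c = np.array([0.1, -0.3, 0.0, 1.0])


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return float(np.abs(u - v).sum())


# Same location, different titles
d_ab = manhattan(call_title_a, call_title_b)  # dropped category vs. another
d_bc = manhattan(call_title_b, call_title_c)  # two non-dropped categories

eps = 0.2
print(d_ab, d_bc, min(d_ab, d_bc) > eps)
```

In other words, with these parameters DBSCAN looks for dense areas of calls of the same type, requiring at least 100 of them within a small standardized distance.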

9. Find out how many clusters DBSCAN created. 

In [9]:
np.unique(db.labels_)

array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])
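
In scikit-learn, the label `-1` marks noise points rather than a cluster, so the array above corresponds to nine clusters (labels 0 to 8) plus the outliers. A short sketch of how to count both, using a hypothetical label array in place of `db.labels_`:

```python
import numpy as np

# Hypothetical labels standing in for db.labels_
labels = np.array([-1, 0, 0, 1, 2, 2, 2, -1])

unique_labels = np.unique(labels)

# -1 is DBSCAN's noise label, so it is excluded from the cluster count
n_clusters = int((unique_labels != -1).sum())
n_noise = int((labels == -1).sum())

print(f"{n_clusters} clusters, {n_noise} noise points")
```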

10. Add a new column `"cluster"` to `data_sample` that holds, for each observation, the label of the cluster it belongs to.

In [10]:
data_sample["cluster"] = db.labels_
data_sample.head()

Unnamed: 0,lat,lng,title,cluster
337789,40.069832,-75.316295,EMS: VEHICLE ACCIDENT,4
79229,40.142332,-75.240999,EMS: VEHICLE ACCIDENT,4
114687,40.100344,-75.293955,Traffic: VEHICLE ACCIDENT -,0
56311,40.244066,-75.614662,EMS: ASSAULT VICTIM,-1
312642,40.074619,-75.151182,EMS: CARDIAC EMERGENCY,3


11. Visualize the clusters on a map, leaving out the points that DBSCAN labeled as outliers (cluster `-1`).

In [11]:
fig = px.scatter_mapbox(
        data_sample[data_sample.cluster != -1], 
        lat="lat", 
        lon="lng",
        color="cluster",
        mapbox_style="carto-positron"
)

fig.show()

12. Visualize all data points on a map except outliers using plotly. You should have different colors per `title`.

In [12]:
px.scatter_mapbox(
    data_sample.loc[data_sample.cluster != -1, :],
    lat="lat",
    lon="lng",
    color="title",
    mapbox_style="carto-positron"
)

13. What would your recommendations be for this US county's politicians?

**The map shows which types of emergencies are the most frequent and the main areas where these events occur. These are the emergency types and areas that the county's politicians should focus on.**
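
To back the recommendation with numbers rather than a visual impression, the clusters can also be summarized in a table. The sketch below is one possible approach, run on a hypothetical toy DataFrame shaped like `data_sample`: it counts the calls in each cluster, picks the most frequent title, and averages the coordinates to get an approximate center.

```python
import pandas as pd

# Toy stand-in for data_sample with cluster labels (hypothetical values)
toy_clusters = pd.DataFrame({
    "title": [
        "EMS: CARDIAC EMERGENCY",
        "EMS: CARDIAC EMERGENCY",
        "Traffic: VEHICLE ACCIDENT -",
        "Traffic: VEHICLE ACCIDENT -",
        "Traffic: VEHICLE ACCIDENT -",
        "EMS: DIZZINESS",
    ],
    "lat": [40.10, 40.12, 40.20, 40.21, 40.22, 40.40],
    "lng": [-75.30, -75.32, -75.10, -75.11, -75.12, -75.50],
    "cluster": [0, 0, 1, 1, 1, -1],
})

# Ignore the points DBSCAN labeled as noise
clustered = toy_clusters[toy_clusters["cluster"] != -1]

summary = (
    clustered.groupby("cluster")
    .agg(
        n_calls=("title", "size"),                          # calls in the cluster
        main_title=("title", lambda s: s.mode().iloc[0]),   # most frequent title
        center_lat=("lat", "mean"),                         # approximate center
        center_lng=("lng", "mean"),
    )
    .sort_values("n_calls", ascending=False)
)
print(summary)
```

Sorting by `n_calls` puts the largest hotspots first, which gives a natural order of priority to discuss with the county.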