![](https://i.insider.com/55f2ea869dd7cc16008b99d5?width=1136&format=jpeg)**THANK YOU 911, HERE'S A SMALL HELPER TOOL FOR YOU**

In [None]:
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, Polygon
import fiona
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

A few things first:
Thank you, 911 and all the officers, for helping those in need every time. We love you!
Here's a small tribute to help you better understand how many calls you take in just one county and how many people in distress you help. It should also help you plan fire stations, medical facilities, and traffic regulation changes!

Okay, that said, let us start looking at the data very quickly...

In [None]:
warnings.filterwarnings("ignore")

In [None]:
df = pd.read_csv("../input/montcoalert/911.csv")

In [None]:
print(df.describe())
print("the columns are: \n ",df.columns)
print("Sample Data: \n", df.head())

In [None]:
df1 = df[df["twp"]=="LOWER POTTSGROVE"]

In [None]:
df1

In [None]:
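# Fill missing zip codes for LOWER POTTSGROVE rows only, leaving every other row untouched
mask = (df["twp"]=="LOWER POTTSGROVE") & df["zip"].isna()
df.loc[mask, "zip"] = 19464.0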

Hmm... The data is interesting. We have also filled in missing zip values! Let's see what our dataframe looks like now.

How many Missing values do we have now?

In [None]:
df['zip'].isna().sum()
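The same idea can be generalised beyond one township. Below is a minimal sketch, assuming that the most common zip code within a township is an acceptable stand-in for a missing one. The helper name `fill_zip_by_township` and the toy values are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the 911 data (values are illustrative only).
toy_zip = pd.DataFrame({
    "twp": ["LOWER POTTSGROVE", "LOWER POTTSGROVE", "LOWER POTTSGROVE",
            "NORRISTOWN", "NORRISTOWN"],
    "zip": [19464.0, np.nan, 19464.0, 19401.0, np.nan],
})

def fill_zip_by_township(frame):
    """Fill missing zips with the most common zip seen in the same township."""
    def most_common(series):
        modes = series.mode()  # mode() skips NaN by default
        return modes.iloc[0] if not modes.empty else np.nan

    # transform() broadcasts each township's most common zip back to its rows
    fallback = frame.groupby("twp")["zip"].transform(most_common)
    out = frame.copy()
    out["zip"] = out["zip"].fillna(fallback)
    return out

filled = fill_zip_by_township(toy_zip)
print(filled["zip"].isna().sum())
```

Townships that have no known zip at all would stay missing under this approach, so the count printed at the end is worth checking on the real data.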

Let's make sure that we don't consider points that lie way outside our beautiful county.

In [None]:
df = df[(df['lng']>=-75.7) & (df['lng']<=-75.0)]
df = df[(df['lat']>=39.8) & (df['lat']<=40.5)]
print(df['lat'].max())
print(df['lat'].min())
print(df['lng'].max())
print(df['lng'].min())
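The bounding-box filter above can also be wrapped in a small reusable function. This is only a sketch: the function name `within_bbox` is hypothetical, the default limits simply repeat the numbers used in the cell above, and the toy coordinates are made up.

```python
import pandas as pd

def within_bbox(frame, lat_range=(39.8, 40.5), lng_range=(-75.7, -75.0)):
    """Keep only rows whose lat/lng fall inside the given box (bounds inclusive)."""
    lat_ok = frame["lat"].between(*lat_range)
    lng_ok = frame["lng"].between(*lng_range)
    return frame[lat_ok & lng_ok]

# Made-up points: one inside the box, one too far north, one too far west.
toy_points = pd.DataFrame({
    "lat": [40.12, 41.90, 40.30],
    "lng": [-75.35, -75.30, -80.00],
})
inside = within_bbox(toy_points)
print(len(inside))
```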

How many different descriptions do we have?

In [None]:
df['title'].nunique()

Time to categorize them and form larger chunks

In [None]:
df['Reason']=df['title'].apply(lambda x:x.split(':')[0])
df['Reason'].unique()

Looks much better now. Just 3 categories!
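For larger frames, the same split can be done with pandas' vectorised string methods instead of `apply` with a lambda. A small sketch, using made-up titles in the same `Reason: detail` shape as the `title` column:

```python
import pandas as pd

# Illustrative titles only; the real column has many more distinct values.
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
])

# Split once on the first colon and keep the part before it
reasons = titles.str.split(":", n=1).str[0].str.strip()
print(reasons.tolist())
```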

Time to plot everything!

In [None]:
f,ax=plt.subplots(1,2,figsize=(18,8))
df['Reason'].value_counts().plot.pie(explode=[0,0.1,0.1],autopct='%1.1f%%',ax=ax[0],shadow=True)
ax[0].set_title('Reason for Call')
ax[0].set_ylabel('Count')
sns.countplot(x='Reason',data=df,ax=ax[1],order=df['Reason'].value_counts().index)
ax[1].set_title('Count of Reason')
plt.show()

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])
df['Hour']=df['timeStamp'].apply(lambda x:x.hour)
df['Month']=df['timeStamp'].apply(lambda x:x.month)
df['DayOfWeek']=df['timeStamp'].apply(lambda x:x.dayofweek)
byMonth=df.groupby('Month').count()
byMonth['lat'].plot();
plt.title("line graph of 911 calls distribution per month")
sns.lmplot(x='Month',y='twp',data=byMonth.reset_index());
plt.title("linear Model of 911 calls distribution per month")
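The hour, month and weekday columns can also be pulled out with the `.dt` accessor, which avoids a Python-level lambda for every row. A minimal sketch on three made-up timestamps:

```python
import pandas as pd

# Made-up timestamps in the same format as the timeStamp column
stamps = pd.to_datetime(pd.Series([
    "2016-01-04 08:15:00",
    "2016-01-08 17:40:00",
    "2016-03-11 16:05:00",
]))

parts = pd.DataFrame({
    "Hour": stamps.dt.hour,
    "Month": stamps.dt.month,
    "DayOfWeek": stamps.dt.dayofweek,  # Monday=0 ... Sunday=6
})

# size() counts rows per month directly, without counting every column
calls_per_month = parts.groupby("Month").size()
print(calls_per_month.to_dict())
```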

In [None]:
df.head()

In [None]:
df_1 = df[df['Reason']=="EMS"].copy()  # copy so the column assignments below act on an independent frame
df_1['timeStamp']=pd.to_datetime(df_1['timeStamp'])
df_1['Hour']=df_1['timeStamp'].apply(lambda x:x.hour)
df_1['Month']=df_1['timeStamp'].apply(lambda x:x.month)
df_1['DayOfWeek']=df_1['timeStamp'].apply(lambda x:x.dayofweek)
byMonth_1=df_1.groupby('Month').count()
byMonth_1['lat'].plot();
plt.title("line graph of EMS calls distribution per month")
sns.lmplot(x='Month',y='twp',data=byMonth_1.reset_index());
plt.title("linear Model of EMS calls distribution per month")

In [None]:
df_2 = df[df['Reason']=="Fire"].copy()  # copy so the column assignments below act on an independent frame
df_2['timeStamp']=pd.to_datetime(df_2['timeStamp'])
df_2['Hour']=df_2['timeStamp'].apply(lambda x:x.hour)
df_2['Month']=df_2['timeStamp'].apply(lambda x:x.month)
df_2['DayOfWeek']=df_2['timeStamp'].apply(lambda x:x.dayofweek)
byMonth_2=df_2.groupby('Month').count()
byMonth_2['lat'].plot();
plt.title("line graph of Fire calls distribution per month")
sns.lmplot(x='Month',y='twp',data=byMonth_2.reset_index());
plt.title("linear Model of Fire calls distribution per month")

In [None]:
df_3 = df[df['Reason']=='Traffic'].copy()  # copy so the column assignments below act on an independent frame
df_3['timeStamp']=pd.to_datetime(df_3['timeStamp'])
df_3['Hour']=df_3['timeStamp'].apply(lambda x:x.hour)
df_3['Month']=df_3['timeStamp'].apply(lambda x:x.month)
df_3['DayOfWeek']=df_3['timeStamp'].apply(lambda x:x.dayofweek)
byMonth_3=df_3.groupby('Month').count()  # group the Traffic subset, not the full dataframe
byMonth_3['lat'].plot();
plt.title("line graph of Traffic calls distribution per month")
sns.lmplot(x='Month',y='twp',data=byMonth_3.reset_index());
plt.title("linear Model of Traffic calls distribution per month")

In [None]:
street_map = gpd.read_file(r"../input/map-files/tl_2018_42091_roads.shp")

In [None]:
df.drop(['title'], axis = 1, inplace= True)
df.head()

In [None]:
fig,ax = plt.subplots(figsize = (15,15))
street_map.plot(ax = ax)
df_new = df

This is the map of Montgomery County in PA

In [None]:
crs = "EPSG:4326"
#setting our coordinate system (WGS 84 latitude/longitude); the {'init': ...} dict form is deprecated

In [None]:
geometry = [Point(xy) for xy in zip(df['lng'], df['lat'])]
geometry[:3]

In [None]:
geo_df = gpd.GeoDataFrame(df,crs = crs, geometry = geometry)
geo_df.drop(['lat','lng', 'desc', 'addr', 'e', 'timeStamp', 'zip', 'twp'], axis = 1, inplace = True)
geo_df.head()
geo_df = geo_df.iloc[5000:10000,:]
# Taking a 5000-row slice (rows 5000 to 9999) to map; this is a contiguous block, not a random sample. Looks very untidy otherwise
len(geo_df)
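If a genuinely random subset is wanted for the map, `DataFrame.sample` is one option. The sketch below uses a made-up frame and an arbitrary seed; on the real data the call would be applied to `geo_df` with `n=5000`.

```python
import pandas as pd

# Stand-in frame with 100 fake calls
toy_calls = pd.DataFrame({"call_id": range(100)})

# random_state fixes the seed so the same rows are drawn on every run
subset = toy_calls.sample(n=10, random_state=42)
print(len(subset), subset["call_id"].is_unique)
```

Sampling is done without replacement by default, so no call appears twice in the subset.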

In [None]:
fig,ax = plt.subplots(figsize = (15,15))
street_map.plot(ax = ax,alpha = 0.4, color = "grey")
geo_df[geo_df['Reason']=='Fire'].plot(ax = ax, markersize=20, color = "orange", marker = "*",label = "Fire")
geo_df[geo_df['Reason']=='EMS'].plot(ax = ax, markersize=20, color = "green", marker = "+",label = "Medical")
geo_df[geo_df['Reason']=='Traffic'].plot(ax = ax, markersize=20, color = "blue", marker = "o",label = "Traffic")
plt.legend(prop = {'size' : 15})
plt.title("Distribution of all 5000 distress calls")

In [None]:
fig,ax = plt.subplots(figsize = (15,15))
street_map.plot(ax = ax,alpha = 0.4, color = "grey")
geo_df[geo_df['Reason']=='Fire'].plot(ax = ax, markersize=20, color = "orange", marker = "*",label = "Fire")
#geo_df[geo_df['Reason']=='EMS'].plot(ax = ax, markersize=20, color = "green", marker = "+",label = "Medical")
#geo_df[geo_df['Reason']=='Traffic'].plot(ax = ax, markersize=20, color = "blue", marker = "o",label = "Traffic")
plt.legend(prop = {'size' : 15})
plt.title("Distribution of Fire related distress calls")

In [None]:
fig,ax = plt.subplots(figsize = (15,15))
street_map.plot(ax = ax,alpha = 0.4, color = "grey")
#geo_df[geo_df['Reason']=='Fire'].plot(ax = ax, markersize=20, color = "orange", marker = "*",label = "Fire")
geo_df[geo_df['Reason']=='EMS'].plot(ax = ax, markersize=20, color = "red", marker = "+",label = "Medical")
#geo_df[geo_df['Reason']=='Traffic'].plot(ax = ax, markersize=20, color = "blue", marker = "o",label = "Traffic")
plt.legend(prop = {'size' : 15})
plt.title("Distribution of EMS related distress calls")

In [None]:
fig,ax = plt.subplots(figsize = (15,15))
street_map.plot(ax = ax,alpha = 0.4, color = "grey")
#geo_df[geo_df['Reason']=='Fire'].plot(ax = ax, markersize=20, color = "orange", marker = "*",label = "Fire")
#geo_df[geo_df['Reason']=='EMS'].plot(ax = ax, markersize=20, color = "green", marker = "+",label = "Medical")
geo_df[geo_df['Reason']=='Traffic'].plot(ax = ax, markersize=20, color = "blue", marker = "o",label = "Traffic")
plt.legend(prop = {'size' : 15})
plt.title("Distribution of Traffic related distress calls")

In [None]:
dayHour=df.groupby(by=['DayOfWeek','Hour']).count()['Reason'].unstack()
dayHour_1  = df_1.groupby(by=['DayOfWeek','Hour']).count()['Reason'].unstack()
dayHour_2  = df_2.groupby(by=['DayOfWeek','Hour']).count()['Reason'].unstack()
dayHour_3  = df_3.groupby(by=['DayOfWeek','Hour']).count()['Reason'].unstack()
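To make the shape of these day-by-hour tables concrete, here is a minimal sketch on five made-up calls. It uses `size()` with `unstack(fill_value=0)`, a slight variation on the cell above, so that day/hour combinations with no calls show up as 0 rather than NaN. That matters later because hierarchical clustering may reject tables containing missing values.

```python
import pandas as pd

# Five made-up calls: two on Monday at 08:00, three on Friday afternoon
toy_times = pd.DataFrame({
    "DayOfWeek": [0, 0, 4, 4, 4],
    "Hour": [8, 8, 16, 16, 17],
})

# Rows are weekdays, columns are hours, cells are call counts
grid = toy_times.groupby(["DayOfWeek", "Hour"]).size().unstack(fill_value=0)
print(grid)
```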

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayHour,cmap='viridis');
plt.title("Hour vs day of the week busy-ness")

In [None]:
#Thursday 3-5 a lot of calls are reported

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayHour,cmap='viridis',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per hour vs per day")

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayHour_1,cmap='viridis',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per hour vs per day for EMS")

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayHour_2,cmap='viridis',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per hour vs per day for Fire")

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayHour_3,cmap='viridis',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per hour vs per day for Traffic")

In [None]:
dayMonth=df.groupby(by=['DayOfWeek','Month']).count()['Reason'].unstack()
dayMonth_1 = df_1.groupby(by=['DayOfWeek','Month']).count()['Reason'].unstack()
dayMonth_2 = df_2.groupby(by=['DayOfWeek','Month']).count()['Reason'].unstack()
dayMonth_3 = df_3.groupby(by=['DayOfWeek','Month']).count()['Reason'].unstack()

In [None]:
# dayMonth (all calls) is the table computed above for this view
g = sns.clustermap(dayMonth,cmap='coolwarm',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per month vs day of the week")

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayMonth_1,cmap='coolwarm',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per month vs day of the week for EMS")

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayMonth_2,cmap='coolwarm',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per month vs day of the week for Fire")

In [None]:
# clustermap builds its own figure, so the size and title are set on that figure
g = sns.clustermap(dayMonth_3,cmap='coolwarm',figsize=(12,6))
g.fig.suptitle("Cluster map distribution busy-ness per month vs day of the week for Traffic")

Key insights from the data:
1. March Fridays are abnormally busy.
2. EMS calls make up the majority of the data, so they govern the overall distribution of 911 calls.
3. Most traffic calls come from road intersections (of course, since that is where the signals are).
4. In all cases, 911 calls drop significantly after the month of August.
5. In general, 911 calls for all emergencies drop significantly from 10 PM to 6 AM.
6. In general, Fridays from 3 PM to 5 PM are the busiest.
7. As a general trend, cases go up from September through December, drop from January to April, increase significantly from April through August, and then fall steeply from August to September.
8. The standard deviation of latitude is much lower than that of longitude, meaning that the cases are concentrated vertically and scattered horizontally across the county.
9. Fire is a huge problem in the month of June, in addition to the generally busy Fridays of March.
10. Most cases occur between 9 AM and 10 PM on weekdays (basically working hours! Interesting!).
11. EMS calls peak from 11 AM to 1 PM.
12. Traffic calls peak from 3 PM to 5 PM.
13. Fire calls peak from 4 PM to 6 PM.
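
Insights such as the latitude/longitude spread (point 8) and the peak hours per call type (points 11 to 13) can be checked numerically rather than read off the plots. The following is a rough sketch on made-up rows; the numbers are illustrative and do not reproduce the findings above. On the real data the same two calls would run on `df`.

```python
import pandas as pd

# Made-up rows with the same column names the notebook uses
toy_summary = pd.DataFrame({
    "lat": [40.10, 40.12, 40.11, 40.13],
    "lng": [-75.60, -75.20, -75.45, -75.05],
    "Reason": ["EMS", "EMS", "Traffic", "Traffic"],
    "Hour": [11, 11, 16, 16],
})

# Spread of the coordinates: a smaller value means tighter concentration
spread = toy_summary[["lat", "lng"]].std()

# Most frequent call hour for each reason
peak_hour = toy_summary.groupby("Reason")["Hour"].agg(lambda s: s.mode().iloc[0])

print(spread["lat"] < spread["lng"])
print(peak_hour.to_dict())
```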