# 911 Calls Analysis Project


* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)


## Data and Setup

____
**Import numpy and pandas**

In [None]:
import numpy as np
import pandas as pd

**Import visualization libraries and set %matplotlib inline.**

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Read in the csv file as a dataframe called df 

In [None]:
df = pd.read_csv('../input/montcoalert/911.csv')

In [None]:
df.head()

Check the info() of the df 

In [None]:
df.info()

<h4><b>Dealing with null values</b></h4>

In [None]:
df.isnull().sum()

In [None]:
df['zip'] = df['zip'].fillna(df.groupby('twp')['zip'].transform('max'))

In [None]:
df.isnull().sum()
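
To see what the fill step above does, here is a minimal sketch on a made-up frame (the township labels and zip codes are invented for illustration). `transform('max')` returns a Series aligned with the original rows, so `fillna` can take the group's largest zip code wherever `zip` is missing. A township with no known zip code at all stays null, which is why some nulls can remain after this step.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): township 'B' has no known zip code at all
toy = pd.DataFrame({
    'twp': ['A', 'A', 'A', 'B', 'B', 'C'],
    'zip': [19001.0, np.nan, 19002.0, np.nan, np.nan, 19003.0],
})

# Largest zip per township, repeated on every row of that township
group_max = toy.groupby('twp')['zip'].transform('max')

# Fill only the missing entries; existing zip codes are left untouched
toy['zip_filled'] = toy['zip'].fillna(group_max)
print(toy)
```

Using the maximum is an arbitrary but deterministic choice; the most frequent zip code per township would be another reasonable option.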

<h4><b>The remaining null values now total 462; we drop those rows directly</b></h4>

In [None]:
df.shape

In [None]:
df = df.dropna()

In [None]:
df.shape

In [None]:
df.isnull().sum()

In [None]:
df.head()

## Basic Questions

What are the top 5 zip codes from which the most 911 calls were received? 

In [None]:
df['zip'].value_counts().head()

What are the top 5 townships (twp) from which the most 911 calls were received? 

In [None]:
df['twp'].value_counts().head()

How many unique title codes are there? 

In [None]:
df['title'].nunique()

## Creating new features

In the title column there are "Reasons/Departments" specified before the title code. These are EMS, Fire, and Traffic. We will create a new column called "reason" that contains this string value. 

For example, if the title column value is EMS: BACK PAINS/INJURY , the reason column value would be EMS.

In [None]:
# Take the text before the first colon (e.g. 'EMS') as the call reason
df['reason'] = df['title'].apply(lambda x: x.split(':')[0])

In [None]:
# Take the text after the first colon as the sub reason and remove any ' -'.
# The leading space is kept, so values look like ' FIRE ALARM'.
df['sub_reasons'] = df['title'].apply(lambda x: x.split(':')[1].replace(' -', ''))
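
The same split can also be sketched with pandas' vectorized string methods. The titles below are illustrative samples in the `REASON: DETAIL` shape described above, and the names `reason_alt` and `sub_reason_alt` are hypothetical. Note one deliberate difference: this sketch strips surrounding whitespace, whereas the notebook's `sub_reasons` column keeps the leading space that later filters such as `' FIRE ALARM'` rely on.

```python
import pandas as pd

# Illustrative titles in the 'REASON: DETAIL' shape
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
])

# Split once on the first colon into two columns
parts = titles.str.split(':', n=1, expand=True)

reason_alt = parts[0]
# Remove the ' -' suffix, then trim whitespace (an assumption of this sketch)
sub_reason_alt = parts[1].str.replace(' -', '', regex=False).str.strip()

print(pd.DataFrame({'reason': reason_alt, 'sub_reason': sub_reason_alt}))
```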

What is the most common Reason for a 911 call based off of this new column? 

In [None]:
df['reason'].value_counts()

Create a countplot of 911 calls by reason.

In [None]:
plt.figure(figsize=(8,5),dpi=100)
sns.countplot(x ='reason',data= df)
plt.title('Reason count',fontsize=10)
plt.ylabel('Emergency reason count',fontsize=10)
plt.xlabel('Reasons',fontsize=10);

In [None]:
type(df['timeStamp'].iloc[0])

**We will create new columns for the day, hour, month, and day of week based on the timeStamp column.**
**We will map the day-of-week numbers to their string names:**

    dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])

In [None]:
# Pull calendar parts out of the datetime column with the .dt accessor
df['day'] = df['timeStamp'].dt.day
df['hour'] = df['timeStamp'].dt.hour
df['month'] = df['timeStamp'].dt.month
# dayofweek runs from 0 (Monday) to 6 (Sunday); map it to short names
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['day_of_week'] = df['timeStamp'].dt.dayofweek.map(dmap)
df.head()
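
Because `day_of_week` holds plain strings, plots and group-bys order the days alphabetically or by first appearance rather than Mon to Sun. One possible fix, sketched here on three invented timestamps, is to store the column as an ordered categorical. The frame `toy` is hypothetical and stands in for `df`.

```python
import pandas as pd

# Three made-up timestamps
ts = pd.to_datetime(pd.Series([
    '2016-01-03 09:15:00',   # a Sunday
    '2016-01-04 17:40:00',   # a Monday
    '2016-01-09 23:05:00',   # a Saturday
]))

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
toy = pd.DataFrame({
    'hour': ts.dt.hour,
    'month': ts.dt.month,
    'day_of_week': ts.dt.dayofweek.map(dmap),
})

# Ordered categorical: sorting and plotting now follow Mon..Sun
toy['day_of_week'] = pd.Categorical(
    toy['day_of_week'], categories=list(dmap.values()), ordered=True
)
print(toy.sort_values('day_of_week'))
```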

In [None]:
plt.figure(figsize=(12,5),dpi=200)
sns.countplot(x='day_of_week',data = df, hue = 'reason')
plt.title('Day of week vs. reason count on that day',fontsize=17)
plt.legend(loc=(1.05,0.5))
plt.ylabel('Emergency reason count',fontsize=13)
plt.xlabel('Day of week',fontsize=13);

<h5><b>EMS emergencies are high throughout the week, while Fire emergencies are lower and quite similar across all days</b></h5> 

**Now do the same for Month:**

In [None]:
plt.figure(figsize=(12,5),dpi=200)
sns.countplot(x='month',data=df,hue='reason')
plt.title('Month vs. reason count on that month',fontsize=17)
plt.legend(loc=(1.05,0.5))
plt.ylabel('Emergency reason count',fontsize=13)
plt.xlabel('Month',fontsize=13);

<h4><b>What are the emergency reason counts during the day and at night?</b></h4>
<ul>
    <li>Day hours: 7-19</li>
    <li>Night hours: 0-6 and 20-23</li>
</ul>

In [None]:
def day_night(x):
    # Hours 7 through 19 count as day; every other hour is night
    if x in range(7,20):
        return 'Day hour'
    else:
        return 'Night hour'

In [None]:
df['day_night']=df['hour'].apply(day_night)
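
As a quick check of the boundaries, the same labelling can be sketched without a Python-level function, using `Series.between` (inclusive on both ends by default) and `np.where`. The `hours` Series below is made up to cover the edge cases 6/7 and 19/20.

```python
import numpy as np
import pandas as pd

# Hours chosen to probe both boundaries of the day window
hours = pd.Series([0, 6, 7, 12, 19, 20, 23])

# True for 7 <= hour <= 19
is_day = hours.between(7, 19)

labels = pd.Series(np.where(is_day, 'Day hour', 'Night hour'), index=hours.index)
print(pd.DataFrame({'hour': hours, 'day_night': labels}))
```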

In [None]:
plt.figure(figsize=(12,5),dpi=200)
sns.countplot(x='day_night',data=df,hue='reason')
plt.title('Day and Night vs. Emergency reason count in day and night',fontsize=17)
plt.legend(loc=(1.05,0.5))
plt.ylabel('Emergency reason count',fontsize=13)
plt.xlabel('Day and Night hours',fontsize=13);

In [None]:
plt.figure(figsize=(18,6),dpi=200)
sns.countplot(x='hour',data=df,hue='reason')
plt.title('Hours vs. Emergency reason count in day and night',fontsize=17)
plt.legend(loc=(1.05,0.5))
plt.ylabel('Emergency reason count',fontsize=13)
plt.xlabel('Hours',fontsize=13);

<ul>
<li><h5><b>Emergency calls are more numerous in the daytime, likely because more people are out during prime time</b></h5></li>
<li><h5><b>Prime time is between 8:00 and 18:00, when most emergency calls occur</b></h5></li>
<li><h5><b>Prime time for Traffic emergencies is between 11:00 and 17:00, possibly due to office traffic</b></h5></li>
</ul>

<h3>Observations on emergency reasons and their sub reasons by township</h3>

<h4><b>Township with most Fire emergency</b></h4>

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[df['reason']=='Fire'].sort_values(by='twp'))
plt.title('Township vs. Fire emergency count',fontsize=20)
plt.ylabel('Fire Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

<h3><b>Sub reasons behind Fire emergencies</b></h3>

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='sub_reasons',data=df[df['reason']=='Fire'])
plt.title('Sub_reasons count for Fire reason',fontsize=20)
plt.ylabel('Sub reason count for Fire',fontsize=15)
plt.xlabel('Sub reasons',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[df['sub_reasons']==' FIRE ALARM'].sort_values(by='twp'))
plt.title('Township vs. FIRE ALARM count',fontsize=20)
plt.ylabel('FIRE ALARM Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

<ul>
<li><h5><b>Lower Merion</b> has the most Fire emergencies</h5></li>
<li><h5><b>Fire alarm</b> is the sub reason behind most Fire emergencies</h5></li>
<li><h5><b>Lower Merion and Abington</b> are the townships with the most fire alarms</h5></li>
</ul>
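
Rankings like these can also be read off as numbers rather than from bar heights. The sketch below uses `value_counts` on a small invented frame (`toy` is hypothetical, and its rows are not real call records); the same two lines could be applied to `df` for any reason.

```python
import pandas as pd

# Invented rows that mimic the reason / twp / sub_reasons columns
toy = pd.DataFrame({
    'reason': ['Fire', 'Fire', 'Fire', 'EMS', 'Fire', 'EMS'],
    'twp': ['LOWER MERION', 'ABINGTON', 'LOWER MERION',
            'NORRISTOWN', 'LOWER MERION', 'ABINGTON'],
    'sub_reasons': [' FIRE ALARM', ' FIRE ALARM', ' GAS-ODOR/LEAK',
                    ' FALL VICTIM', ' FIRE ALARM', ' FALL VICTIM'],
})

fire = toy[toy['reason'] == 'Fire']

# Counts sorted from most to least frequent
top_twp = fire['twp'].value_counts().head(5)
top_sub = fire['sub_reasons'].value_counts().head(5)

print(top_twp)
print(top_sub)
```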

<h4><b>Township with most EMS emergency</b></h4>

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[df['reason']=='EMS'].sort_values(by='twp'))
plt.title('Township vs. EMS emergency count',fontsize=20)
plt.ylabel('EMS Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[(df['sub_reasons']==' FALL VICTIM')].sort_values(by='twp'))
plt.title('Township vs. FALL VICTIM count',fontsize=20)
plt.ylabel(' FALL VICTIM Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[(df['sub_reasons']==' RESPIRATORY EMERGENCY')].sort_values(by='twp'))
plt.title('Township vs. RESPIRATORY EMERGENCY count',fontsize=20)
plt.ylabel(' RESPIRATORY EMERGENCY Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

<ul>
<li><h5><b>Norristown</b> has the most EMS emergencies</h5></li>
<li><h5><b>Fall victim and Respiratory emergency</b> are the sub reasons behind most EMS emergencies</h5></li>
<li><h5><b>Lower Providence, Abington and Lower Merion</b> are the townships with the most fall victims</h5></li>
<li><h5><b>Norristown and Lower Merion</b> are the townships with the most Respiratory emergencies, which may point to higher air pollution there than in other townships</h5></li>
</ul>

<h4><b>Township with most Traffic emergency</b></h4>

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[df['reason']=='Traffic'].sort_values(by='twp'))
plt.title('Township vs. traffic emergency count',fontsize=20)
plt.ylabel('Traffic Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='sub_reasons',data=df[df['reason']=='Traffic'])
plt.title('Sub_reasons count for Traffic reason',fontsize=20)
plt.ylabel('Sub reason count for Traffic',fontsize=15)
plt.xlabel('Sub reasons',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
sns.countplot(x='twp',data=df[df['sub_reasons']==' VEHICLE ACCIDENT'].sort_values(by='twp'))
plt.title('Township vs. VEHICLE ACCIDENT count',fontsize=20)
plt.ylabel(' VEHICLE ACCIDENT Count',fontsize=15)
plt.xlabel('Township',fontsize=15)
plt.xticks(rotation=90);

<ul>
<li><h5><b>Lower Merion and Upper Merion</b> have the most Traffic emergencies</h5></li>
<li><h5><b>Vehicle accident</b> is the sub reason behind most Traffic emergencies</h5></li>
<li><h5><b>Lower Merion</b> is the township with the most vehicle accidents</h5></li>
</ul>

<h3>Key observations for the townships</h3>

<ul>
<li><h5><b>Lower Merion and Abington</b> have the most emergencies.</h5></li>
<li><h5>These townships need better medical services and traffic control systems.</h5></li>
<li><h5>One option is to encourage people to use public transport, which would reduce the number of vehicles on the road and so lower both traffic emergencies and pollution.</h5></li>
<li><h5>Provide guidelines for maintaining good health and getting regular health checkups.</h5></li>
</ul>

**Create a new column called 'date' that contains the date from the timeStamp column.** 

In [None]:
# Keep only the calendar date (drops the time of day)
df['date'] = df['timeStamp'].dt.date
df.head()

<h3>Three separate plots, one per reason, of 911 calls grouped by date</h3>

In [None]:
plt.figure(figsize=(20,5),dpi=200)
df[df['reason']=='Fire'].groupby('date').count()['twp'].plot()
plt.title('Date vs. fire emergency count',fontsize=20)
plt.ylabel('Fire Count',fontsize=15)
plt.xlabel('Date',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
df[df['reason']=='EMS'].groupby('date').count()['twp'].plot()
plt.title('Date vs. EMS emergency count',fontsize=20)
plt.ylabel('EMS Count',fontsize=15)
plt.xlabel('Date',fontsize=15)
plt.xticks(rotation=90);

In [None]:
plt.figure(figsize=(20,5),dpi=200)
df[df['reason']=='Traffic'].groupby('date').count()['twp'].plot()
plt.title('Date vs. Traffic emergency count',fontsize=20)
plt.ylabel('Traffic Count',fontsize=15)
plt.xlabel('Date',fontsize=15)
plt.xticks(rotation=90);

<ul>
<li><h5><b>Fire and Traffic</b> emergencies peak between 2018 and 2019</h5></li>
<li><h5><b>EMS</b> emergencies have stayed quite steady since 2016</h5></li>
</ul>
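
Year-level statements like these are easier to verify from a table than from a dense daily line plot. A minimal sketch, assuming a frame with `timeStamp` and `reason` columns as above: group by calendar year and reason, then pivot the reasons into columns. The rows in `toy` are invented and say nothing about the real counts.

```python
import pandas as pd

# Invented call records spread over several years
toy = pd.DataFrame({
    'timeStamp': pd.to_datetime([
        '2016-03-01 10:00:00', '2016-07-15 11:30:00',
        '2018-02-02 08:00:00', '2018-03-02 18:45:00',
        '2018-11-20 14:10:00', '2019-01-05 09:00:00',
    ]),
    'reason': ['EMS', 'Fire', 'Traffic', 'Traffic', 'Fire', 'EMS'],
})

# One row per year, one column per reason; missing combinations become 0
yearly = (
    toy.groupby([toy['timeStamp'].dt.year, 'reason'])
       .size()
       .unstack(fill_value=0)
)
print(yearly)
```

When comparing years on the real data, keep in mind that the first and last years in the file may be only partially covered.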

<h3>Let's see the emergency count for each day of the week and hour</h3>

In [None]:
data1 = df.groupby(by=['day_of_week','hour']).count()['reason'].unstack()
data1

In [None]:
plt.figure(figsize=(15,10),dpi=200)
sns.heatmap(data1,linewidths=1)
plt.xlabel('Hours',fontsize=15)
plt.ylabel('Day of week',fontsize=15)
plt.title('Emergency count on specific day in specific time frame',fontsize=20);

<h4><b>It is clearly seen that most emergencies occur during day hours on every day of the week, and that weekends have fewer emergencies than any other day</b></h4>
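
The weekday-versus-weekend comparison is easier to read when the heatmap rows run Mon to Sun instead of alphabetically. One way to sketch this, on invented data, is to build the day-by-hour table with `pd.crosstab` and then `reindex` the rows. The names `toy`, `order` and `counts` are hypothetical; the resulting table could be passed to `sns.heatmap` in place of `data1`.

```python
import pandas as pd

# Invented (day_of_week, hour) pairs standing in for the real calls
toy = pd.DataFrame({
    'day_of_week': ['Mon', 'Mon', 'Sat', 'Wed', 'Sun', 'Mon'],
    'hour': [9, 9, 14, 17, 2, 17],
})

order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Count calls per (day, hour) cell, then put the rows in calendar order;
# days with no calls in the toy data are filled with zeros
counts = (
    pd.crosstab(toy['day_of_week'], toy['hour'])
      .reindex(index=order, fill_value=0)
)
print(counts)
```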