# **Emergency 911 Calls Analysis and Visualization**


### August 16, 2018

## About the 911 dataset:  
This dataset contains emergency (911) calls (Fire, Traffic, EMS) for Montgomery County, PA, the third most populous county in Pennsylvania.

<table>
  <tr>
    <th id="name">Variable Name</th>
    <th id="email">Type</th>
    <th id="phone">Description</th>
  </tr>
  <tr>
    <td headers="name">lat</td>
    <td headers="email">String</td>
    <td headers="phone">Latitude</td>
  </tr>
      <tr>
    <td headers="name">lng</td>
    <td headers="email">String</td>
    <td headers="phone">Longitude</td>
  </tr>
  <tr>
    <td headers="name">desc</td>
    <td headers="email">String</td>
    <td headers="phone">Description of the Emergency Call</td>
  </tr>
     <tr>
    <td headers="name">zip</td>
    <td headers="email">String</td>
    <td headers="phone">Zipcode</td>
  </tr>
      <tr>
    <td headers="name">title</td>
    <td headers="email">String</td>
    <td headers="phone">Title</td>
  </tr>
  <tr>
    <td headers="name">timeStamp</td>
    <td headers="email">String</td>
    <td headers="phone">YYYY-MM-DD HH:MM:SS (date and time of the call)</td>
  </tr>
      <tr>
    <td headers="name">twp</td>
    <td headers="email">String</td>
    <td headers="phone">Township</td>
  </tr>
   <tr>
    <td headers="name">addr</td>
    <td headers="email">String</td>
    <td headers="phone">Address</td>
    </tr>
</table>  

### **New Columns as Features**  

<table>
  <tr>
    <th id="name">Variable Name</th>
    <th id="email">Type</th>
    <th id="phone">Description</th>
  </tr>
  <tr>
    <td headers="name">Reason</td>
    <td headers="email">String</td>
    <td headers="phone">The reason behind the call</td>
  </tr>
      <tr>
    <td headers="name">Hour</td>
    <td headers="email">Numeric</td>
    <td headers="phone">Hour of the call</td>
  </tr>
  <tr>
    <td headers="name">Month</td>
    <td headers="email">Numeric</td>
    <td headers="phone">Month of the call</td>
  </tr>
     <tr>
    <td headers="name">Day of Week</td>
    <td headers="email">String</td>
    <td headers="phone">The day of the call</td>
  </tr>
  <tr>
    <td headers="name">Date</td>
    <td headers="email">Date</td>
    <td headers="phone">The date of the call</td>
  </tr>
</table>  

## Loading the dataset

In [None]:
import numpy as np 
import pandas as pd 
import os
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("../input/911.csv")
df.head()

### Display the dataframe information

In [None]:
df.info()

## Basic Questions

**What are the top 5 zipcodes for 911 calls?**

In [None]:
df['zip'].value_counts().head(5)

**What are the top 5 townships (twp) for 911 calls?**

In [None]:
df['twp'].value_counts().head(5)
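
Note that `value_counts()` skips missing values by default, so any call without a recorded zip code or township would not appear in these rankings. The following is a minimal sketch on a made-up Series (the values are illustrative and not taken from the dataset) showing how `dropna=False` and `normalize=True` change the result:

```python
import numpy as np
import pandas as pd

# Hypothetical zip codes, including one missing value
zips = pd.Series([19401.0, 19401.0, 19464.0, np.nan, 19401.0, 19464.0])

# Default: NaN is ignored and counts are sorted in descending order
top = zips.value_counts().head(2)

# Keep NaN as its own category to see how many entries lack a zip code
with_missing = zips.value_counts(dropna=False)

# Shares instead of raw counts (NaN still ignored)
shares = zips.value_counts(normalize=True)

print(top)
print(with_missing)
print(shares)
```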

## Creating new features  

In the title column, a "Reason/Department" is specified before the title code. The possible values are EMS, Fire, and Traffic.  
We use .apply() with a custom lambda expression to create a new column called "Reason" that holds this string value, which tells us the reason for the call.  
For example, if the title column value is EMS: BACK PAINS/INJURY, the Reason column value is EMS.  


In [None]:
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
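
As a side note, the same extraction can be written with the vectorized `.str` accessor. The sketch below uses a small made-up frame (the `toy` name and its titles are illustrative assumptions) and compares both approaches:

```python
import pandas as pd

# Toy titles in the same "Reason: detail" shape as the title column
toy = pd.DataFrame({"title": ["EMS: BACK PAINS/INJURY",
                              "Fire: GAS-ODOR/LEAK",
                              "Traffic: VEHICLE ACCIDENT -"]})

# Same logic as the lambda used in the cell above
toy["Reason_apply"] = toy["title"].apply(lambda title: title.split(":")[0])

# Vectorized equivalent; n=1 splits at the first colon only
toy["Reason_str"] = toy["title"].str.split(":", n=1).str[0]

print(toy[["Reason_apply", "Reason_str"]])
```

One practical difference: the `.str` version propagates missing titles as NaN, whereas the lambda would raise an error on a non-string value.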

The new "Reason" column now appears as the last column on the right

In [None]:
df.head()

**What is the most common Reason for a 911 call, based on this new column?**

In [None]:
df['Reason'].value_counts()

Now we will display the count plot of 911 calls by Reason

In [None]:
sns.countplot(x='Reason',data=df,palette='viridis')

Next we convert the timeStamp column from strings to datetime objects so that the time information can be used in the analysis.  
We then create three new columns: Hour, Month and Day of Week.  
The three new columns appear on the right of the dataframe.

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)
df.head()
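
The lambdas above can also be replaced by the `.dt` accessor, which exposes the same datetime attributes for a whole column at once. A minimal sketch on hypothetical timestamps (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical timestamps in the YYYY-MM-DD HH:MM:SS format
toy = pd.DataFrame({"timeStamp": ["2015-12-10 17:40:00",
                                  "2016-01-02 08:05:30",
                                  "2016-03-06 23:59:59"]})
toy["timeStamp"] = pd.to_datetime(toy["timeStamp"])

# Vectorized attribute access through the .dt accessor
toy["Hour"] = toy["timeStamp"].dt.hour
toy["Month"] = toy["timeStamp"].dt.month

# dayofweek is 0 for Monday through 6 for Sunday
dmap = {0: "Mon", 1: "Tue", 2: "Wed", 3: "Thu", 4: "Fri", 5: "Sat", 6: "Sun"}
toy["Day of Week"] = toy["timeStamp"].dt.dayofweek.map(dmap)

print(toy)
```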

Count plot of the Day of Week column with the hue based on the Reason column

In [None]:
sns.countplot(x='Day of Week',data=df,hue='Reason',palette='viridis')

# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Count plot of the Month column with the hue based on the Reason column

In [None]:
sns.countplot(x='Month',data=df,hue='Reason',palette='viridis')

# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

Line plot indicating the count of calls per month

In [None]:
byMonth = df.groupby('Month').count()
byMonth['twp'].plot()
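
A detail worth keeping in mind here: `count()` reports the number of non-null values in each column, so the result can differ from column to column, while `size()` reports the number of rows per group. The sketch below illustrates the difference on a made-up frame with one missing township (all values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy frame: one call in month 1 has no township recorded
toy = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2],
    "twp": ["A", np.nan, "B", "A", "C"],
    "title": ["EMS: X", "Fire: Y", "EMS: Z", "Traffic: W", "EMS: V"],
})

# count() gives non-null values per column, so columns can disagree
by_month_count = toy.groupby("Month").count()

# size() gives the number of rows per group, regardless of NaN
by_month_size = toy.groupby("Month").size()

print(by_month_count)
print(by_month_size)
```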

Using seaborn's lmplot() to fit and display a linear model of the number of calls per month

In [None]:
sns.lmplot(x='Month',y='twp',data=byMonth.reset_index())
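
With its default settings, the line drawn by lmplot() is an ordinary least squares fit. To read off the slope and intercept numerically, a rough equivalent is `np.polyfit` with degree 1. The sketch below uses made-up monthly counts (the `by_month` values are assumptions, not results from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly call counts
by_month = pd.DataFrame({"Month": [1, 2, 3, 4, 5],
                         "twp": [110, 120, 125, 140, 150]})

# Degree-1 least squares fit: twp is approximated by slope * Month + intercept
slope, intercept = np.polyfit(by_month["Month"], by_month["twp"], deg=1)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```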

Create a new Date column containing the date from the timeStamp column, then plot the number of calls per date, first overall and then for each Reason

In [None]:
df['Date']=df['timeStamp'].apply(lambda t: t.date())
df.groupby('Date').count()['twp'].plot()
plt.tight_layout()

In [None]:
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

In [None]:
df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()

In [None]:
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()
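
The three per-Reason series above can also be built in one step by grouping on both Date and Reason and unstacking the result. This is a sketch on made-up data (the `toy` frame is an illustrative assumption), not a replacement for the plots above:

```python
import pandas as pd

# Toy calls spread over two dates
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-01",
                            "2016-01-02", "2016-01-02"]).date,
    "Reason": ["EMS", "EMS", "Fire", "Traffic", "EMS"],
})

# Rows per (Date, Reason) pair, then pivot Reason into columns;
# combinations that never occur are filled with 0
per_reason = toy.groupby(["Date", "Reason"]).size().unstack(fill_value=0)

print(per_reason)
# per_reason.plot(subplots=True) would draw one line plot per Reason (needs matplotlib)
```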

## HeatMaps  
First we need to restructure the dataframe, so that the columns become the Hours and the Index becomes the Day of the Week.

In [None]:
dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHour.head()
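
Because the day labels are strings, groupby sorts them alphabetically (Fri, Mon, Sat, ...) rather than in calendar order. If calendar order is preferred for the heat map rows, one option is to reindex the table. A minimal sketch on a made-up frame (`toy` and `day_order` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    "Day of Week": ["Wed", "Mon", "Sun", "Mon", "Fri", "Wed"],
    "Hour": [9, 9, 17, 17, 9, 9],
    "Reason": ["EMS", "Fire", "EMS", "Traffic", "EMS", "EMS"],
})

day_hour = toy.groupby(["Day of Week", "Hour"]).count()["Reason"].unstack()

# groupby sorts string labels alphabetically
print(list(day_hour.index))

# Reindex to calendar order; days absent from the data become all-NaN rows
day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_hour_ordered = day_hour.reindex(day_order)

print(day_hour_ordered)
```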

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayHour)

Creating the cluster map based on the dayHour data

In [None]:
sns.clustermap(dayHour)

Creating the heat map and cluster map with Month as the columns

In [None]:
dayMonth = df.groupby(by=['Day of Week','Month']).count()['Reason'].unstack()
dayMonth.head()

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayMonth)

In [None]:
sns.clustermap(dayMonth)