# Analysis of 911 Calls

I will analyze 911 call data from [Kaggle](https://www.kaggle.com/mchirico/montcoalert). The data contains the following fields:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)

## Data and Setup

Setting up the environment

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

Loading the dataset

In [None]:
df = pd.read_csv('../input/montcoalert/911.csv')
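The cell above lets pandas infer every column type. This is a hedged alternative, not the author's loading code. `read_csv` can keep `zip` as a string, because a numeric column with missing values is otherwise upcast to float64. It can also parse `timeStamp` during the load. The two-row CSV below is made up and only mimics the column layout listed at the top.

```python
import io

import pandas as pd

# A tiny invented CSV with the same columns as 911.csv (all values are made up).
csv_text = """lat,lng,desc,zip,title,timeStamp,twp,addr,e
40.10,-75.30,SAMPLE DESCRIPTION A,19401,EMS: BACK PAINS/INJURY,2015-12-10 17:40:00,NORRISTOWN,SAMPLE ADDRESS A,1
40.20,-75.40,SAMPLE DESCRIPTION B,,Fire: GAS-ODOR/LEAK,2015-12-10 17:41:00,ABINGTON,SAMPLE ADDRESS B,1
"""

# dtype keeps zip as text (the blank zip in row 2 becomes NaN instead of forcing floats);
# parse_dates converts timeStamp to a datetime column while reading.
df_sample = pd.read_csv(io.StringIO(csv_text), dtype={'zip': str}, parse_dates=['timeStamp'])
print(df_sample.dtypes)
```

If the real file were loaded this way, `timeStamp` would already be a datetime column before the conversion step later in the notebook.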

In [None]:
df.info()

In [None]:
df.head()

## Exploring the dataset

What are the top 5 zipcodes for 911 calls?

In [None]:
df['zip'].value_counts().head(5)

What are the top 5 townships (twp) for 911 calls?

In [None]:
df['twp'].value_counts().head(5)

How many unique title codes are there?

In [None]:
df['title'].nunique()
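The three questions above rely on `value_counts()` and `nunique()`. By default, both skip missing values. That matters if `zip` or `twp` contain gaps, which `df.info()` will show. The sketch below uses a made-up Series of township names to illustrate this.

```python
import numpy as np
import pandas as pd

# Hypothetical township values, including one missing entry.
twp = pd.Series(['LOWER MERION', 'ABINGTON', 'LOWER MERION', np.nan, 'NORRISTOWN', 'LOWER MERION'])

# value_counts sorts by descending frequency and drops NaN by default.
top = twp.value_counts().head(2)

# dropna=False keeps missing values as their own category.
with_missing = twp.value_counts(dropna=False)

# nunique also ignores NaN unless dropna=False is passed.
n_distinct = twp.nunique()                        # 3
n_distinct_with_nan = twp.nunique(dropna=False)   # 4
```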

## Creating new features

The title column prefixes each title code with a reason (department): EMS, Fire, or Traffic. I will create a new column called "Reason" that holds this prefix.

**For example, if the title value is `EMS: BACK PAINS/INJURY`, the Reason value would be `EMS`.**

In [None]:
df.head()

In [None]:
df['Reason'] = df['title'].apply(lambda r: r.split(":")[0])
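The `apply` call above runs a Python lambda on every row. pandas also offers a vectorized string accessor, `.str`, that performs the same split. This is a sketch on a few made-up titles that follow the "REASON: DETAIL" format described above. It compares the two approaches.

```python
import pandas as pd

# Made-up titles in the "REASON: DETAIL" format.
titles = pd.Series(['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK', 'Traffic: VEHICLE ACCIDENT -'])

# Vectorized split: .str.split returns lists, and .str[0] takes the first element of each.
reasons = titles.str.split(':').str[0]

# The apply-based approach from the cell above, for comparison.
reasons_apply = titles.apply(lambda r: r.split(':')[0])
```

One practical difference: `.str` methods pass missing values through as NaN, while the lambda would raise an error on a NaN title.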

In [None]:
df.head()

What is the most common Reason for a 911 call, based on this new column?

In [None]:
df['Reason'].value_counts()

In [None]:
sns.countplot(x='Reason', data=df)

## Time-based analysis

Now let's focus on the time information. First, we check the type of the values in the timeStamp column to see how they can be used.

In [None]:
type(df['timeStamp'][0])

Converting the column from strings to datetime objects using [pd.to_datetime](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html):

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
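The conversion above lets pandas infer the date layout. As a hedged variant, `pd.to_datetime` also accepts an explicit `format`. This matches the `YYYY-MM-DD HH:MM:SS` layout listed for timeStamp and rejects strings that do not fit it. The example strings are made up.

```python
import pandas as pd

# Made-up strings in the documented YYYY-MM-DD HH:MM:SS layout.
raw = pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01'])

# An explicit format spells out the expected layout instead of relying on inference.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S')

# A string in a different layout is rejected with a ValueError.
try:
    pd.to_datetime(pd.Series(['10/12/2015 17:40']), format='%Y-%m-%d %H:%M:%S')
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```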

In [None]:
df.info()

In [None]:
df['timeStamp'].iloc[0].weekday()  # weekday() numbers the days from Monday=0 to Sunday=6

Now that the timeStamp values are datetime objects, we can create three new columns: Hour, Month, and Day of Week.

In [None]:
df['Hour'] = df['timeStamp'].apply(lambda h: h.hour)

In [None]:
df['Month'] = df['timeStamp'].apply(lambda m: m.month)

In [None]:
df['Day of Week'] = df['timeStamp'].apply(lambda d: d.weekday())
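Once a column holds datetime values, pandas exposes the same fields through the `.dt` accessor without a per-row lambda. This sketch uses two made-up timestamps: Thursday 2015-12-10 and Monday 2016-01-04. It shows that `.dt.dayofweek` uses the same Monday=0 convention as `Timestamp.weekday()`.

```python
import pandas as pd

# Two made-up timestamps: a Thursday and a Monday.
ts = pd.Series(pd.to_datetime(['2015-12-10 17:40:00', '2016-01-04 08:15:00']))

hours = ts.dt.hour          # 17, 8
months = ts.dt.month        # 12, 1
weekdays = ts.dt.dayofweek  # 3, 0 (Monday=0 ... Sunday=6)

# Same numbering as calling weekday() on each Timestamp, as the cells above do.
weekdays_apply = ts.apply(lambda d: d.weekday())
```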

In [None]:
df.head()

We can then use `.map()` with the following dictionary to replace the weekday numbers with day-name strings:

    dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
df['Day of Week'] = df['Day of Week'].map(dmap)
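A small sketch of what the `.map(dmap)` step does, using made-up timestamps. It also shows pandas' built-in `.dt.day_name()`, which returns full English names. Taking the first three characters of those names reproduces `dmap`'s abbreviations. Any value not found in the dictionary becomes NaN under `.map()`.

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Made-up timestamps: a Thursday and a Monday.
ts = pd.Series(pd.to_datetime(['2015-12-10 17:40:00', '2016-01-04 08:15:00']))

# Integer weekday mapped through the dictionary, as in the cell above.
via_map = ts.dt.dayofweek.map(dmap)

# day_name() gives 'Thursday', 'Monday'; the first three characters match dmap.
via_day_name = ts.dt.day_name().str[:3]

# A key missing from dmap (7 here) maps to NaN.
unmapped = pd.Series([7]).map(dmap)
```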

In [None]:
df.head()

Let's compare the reasons for 911 calls across the days of the week:

In [None]:
sns.countplot(x='Day of Week', hue='Reason', data=df)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')  # place the legend outside the plot area
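The bars in the count plot above are counts of (day, reason) pairs. `pd.crosstab` produces the same numbers as a table. Reindexing with a calendar-ordered list puts the days Monday to Sunday instead of alphabetically. This is a sketch on made-up rows; the `day_order` list is an illustrative assumption.

```python
import pandas as pd

# Made-up rows using the day abbreviations produced by dmap.
calls = pd.DataFrame({
    'Day of Week': ['Mon', 'Mon', 'Tue', 'Sun', 'Mon', 'Sun'],
    'Reason': ['EMS', 'Traffic', 'EMS', 'Fire', 'EMS', 'EMS'],
})

day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Count each (day, reason) pair, then order rows by calendar day;
# days with no calls (Wed here) become rows of zeros.
counts = pd.crosstab(calls['Day of Week'], calls['Reason']).reindex(day_order, fill_value=0)
```

The same `day_order` list could be passed to `sns.countplot` through its `order` parameter so the bars also run from Monday to Sunday.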

And the same breakdown by month:

In [None]:
sns.countplot(x='Month', hue='Reason', data=df)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')  # place the legend outside the plot area

The plot is missing some months. Let's see whether plotting the data another way helps: a simple line plot connects the available months with straight segments, so the overall trend stays visible across the gaps.

We can group the DataFrame by the Month column and aggregate with the count() method, storing the result in a DataFrame called byMonth.

In [None]:
byMonth = df.groupby('Month').count()
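`count()` counts non-missing values separately for each column. This means `byMonth['twp']` is the number of calls with a recorded township, which can differ from other columns if `twp` has gaps. `size()` counts every row per group instead. A sketch on made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up calls with some missing township and zip values.
calls = pd.DataFrame({
    'Month': [1, 1, 1, 2],
    'twp': ['ABINGTON', np.nan, 'CHELTENHAM', 'ABINGTON'],
    'zip': [19001.0, 19001.0, np.nan, np.nan],
})

# count(): non-missing values per column within each month.
per_column = calls.groupby('Month').count()

# size(): number of rows per month, regardless of missing values.
per_group = calls.groupby('Month').size()
```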

In [None]:
byMonth.head()

In [None]:
byMonth['twp'].plot()

In [None]:
byMonth.reset_index(inplace=True)
byMonth

In [None]:
sns.lmplot(x='Month', y='twp', data=byMonth)

Let's do some more time-based analysis, this time on calendar dates.

In [None]:
df['Date'] = df['timeStamp'].apply(lambda d: d.date())

In [None]:
df.head()

We can count the calls per date by grouping on the Date column and aggregating with count().

In [None]:
df.groupby('Date').count()
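Grouping on Date only yields rows for dates that appear in the data. If any day had no calls, the line plot would silently connect the neighbouring days. As an alternative sketch, `resample('D')` builds a continuous daily index and reports 0 for empty days. The timestamps below are made up, with 2016-01-02 deliberately left empty.

```python
import pandas as pd

# Made-up calls; no call falls on 2016-01-02.
calls = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2016-01-01 08:00', '2016-01-01 21:30', '2016-01-03 12:15']),
    'twp': ['ABINGTON', 'UPPER MERION', 'ABINGTON'],
})

# Grouping on the date part: only dates with calls appear.
by_date = calls.groupby(calls['timeStamp'].dt.date).size()

# Resampling to daily bins: every day in the range appears, empty days as 0.
by_day = calls.set_index('timeStamp').resample('D').size()
```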

In [None]:
byDate = df.groupby('Date').count()
byDate['twp'].plot(figsize=(10,3))

Let's check the daily trend per reason.

In [None]:
df[df['Reason']=='Traffic'].groupby('Date')['twp'].count().plot(figsize=(10,3),title='Traffic')


In [None]:
df[df['Reason']=='Fire'].groupby('Date')['twp'].count().plot(figsize=(10,3),title='Fire')

In [None]:
df[df['Reason']=='EMS'].groupby('Date')['twp'].count().plot(figsize=(10,3),title='EMS')
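The three cells above filter and group once per reason. A hedged alternative groups by date and reason together, then unstacks the result into one column per reason. With the real data, `daily_by_reason.plot(subplots=True, figsize=(10, 9))` would draw one panel per reason. The rows below are made up.

```python
import pandas as pd

# Made-up calls with a date and a reason.
calls = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-01', '2016-01-01', '2016-01-02'],
    'Reason': ['EMS', 'EMS', 'Traffic', 'Fire'],
})

# One row per date, one column per reason; missing (date, reason) pairs become 0.
daily_by_reason = calls.groupby(['Date', 'Reason']).size().unstack(fill_value=0)
```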

Now let's move on to creating heatmaps with seaborn. We'll first need to restructure the DataFrame so that the columns are the hours and the index is the day of the week.

In [None]:
df.head()

In [None]:
df_matrix_day_hour = df.groupby(['Day of Week','Hour']).count().pivot_table(index='Day of Week', columns='Hour', values='twp')
df_matrix_day_hour
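The day-by-hour matrix above is a table of call counts. `pd.crosstab` and `groupby(...).size().unstack()` are two other common ways to build the same layout. This sketch on made-up rows shows they agree. It uses `fill_value=0`, so (day, hour) pairs with no calls show up as 0. The pivot above would leave those cells as NaN.

```python
import pandas as pd

# Made-up calls with a day abbreviation and an hour.
calls = pd.DataFrame({
    'Day of Week': ['Mon', 'Mon', 'Tue', 'Mon', 'Tue'],
    'Hour': [8, 8, 8, 17, 23],
    'twp': ['ABINGTON', 'ABINGTON', 'NORRISTOWN', 'LOWER MERION', 'ABINGTON'],
})

# Rows = day of week, columns = hour, values = number of calls.
matrix_crosstab = pd.crosstab(calls['Day of Week'], calls['Hour'])

# Same table via groupby + unstack.
matrix_unstack = calls.groupby(['Day of Week', 'Hour']).size().unstack(fill_value=0)
```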

In [None]:
plt.figure(figsize=(10,5))
sns.heatmap(df_matrix_day_hour, cmap='viridis')

In [None]:
sns.clustermap(df_matrix_day_hour, cmap='viridis')

We can also recreate the heatmaps above with Month instead of Hour as the columns.

In [None]:
df_matrix_day_month = df.groupby(['Day of Week', 'Month']).count().pivot_table(
    index='Day of Week', columns='Month', values='twp')
df_matrix_day_month

In [None]:
plt.figure(figsize=(10,5))
sns.heatmap(df_matrix_day_month, cmap='viridis')

In [None]:
sns.clustermap(df_matrix_day_month, cmap='viridis')