# Emergency - 911 Calls Project

The data contains the following fields:

* lat: Float variable, Latitude
* lng: Float variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: Float variable, Zip code
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: Integer variable, Dummy variable (always 1)


# Evaluation
Throughout the notebook, try to answer the following questions:
* Which features are available in the dataset?
* How many rows and columns does the dataset have?
* Which features are categorical?
* Which features are numerical?
* Which features contain blank, null or empty values?
* What are the data types for various features?
* How many zip codes does the dataset have?
* What are the top 5 zip codes for 911 calls? 
* What are the top 5 townships (twp) for 911 calls? 
* How many unique title codes are there for EMS calls?
* What is the most common Reason for a 911 call, based on the Reason column derived from the title?

## Data and Setup

____
**Import numpy and pandas**

In [None]:
import numpy as np
import pandas as pd

**Import visualization libraries and set %matplotlib inline.**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
sns.set_style('whitegrid')

**Read in the CSV file as a DataFrame called df.**

In [None]:
df = pd.read_csv('../input/montcoalert/911.csv')

**Check the info() of df.**

**How many rows and columns does the dataset have?**

**Which features are available in the dataset?**

**Which features contain blank, null or empty values?**

In [None]:
df.info()

**Check the head of df.**

In [None]:
df.head()

## Basic Questions

**How many zip codes does the dataset have?**

In [None]:
df['zip'].value_counts().count()
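`value_counts().count()` and `nunique()` both skip missing values by default, so either one gives the number of distinct zip codes. The minimal sketch below checks this on a toy Series (the zip values are made up, with one missing entry added to show how NaN is treated) and shows `dropna=False` for counting NaN as a value of its own.

```python
import numpy as np
import pandas as pd

# Toy zip codes (made-up values); one entry is missing on purpose
zips = pd.Series([19401.0, 19464.0, 19401.0, np.nan, 18974.0], name='zip')

via_value_counts = zips.value_counts().count()  # distinct non-null zips
via_nunique = zips.nunique()                    # same count, NaN excluded by default
with_missing = zips.nunique(dropna=False)       # NaN counted as an extra value

print(via_value_counts, via_nunique, with_missing)  # 3 3 4
```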

**What are the top 5 zip codes for 911 calls?**

In [None]:
df['zip'].value_counts().head(5)

**What are the top 5 townships (twp) for 911 calls?**

In [None]:
df['twp'].value_counts().head(5)

**How many unique title codes are there in the 'title' column?**

In [None]:
df['title'].nunique()

## Creating new features

* In the title column, a "Reason/Department" (EMS, Fire, or Traffic) can be specified before the title code.
* Use a custom lambda expression to create a new column called "Reason" that contains this string value.
* For example, if the title column value is EMS: BACK PAINS/INJURY, then the Reason column value would be EMS.
* A second column, "type", holds the title code after the colon, up to the first "/" (here, BACK PAINS).

In [None]:
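# Reason: the text before the colon, e.g. 'EMS' from 'EMS: BACK PAINS/INJURY'
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
# type: the title code after the colon, up to the first '/', without the leading space
df['type'] = df['title'].apply(lambda title: title.split(':')[1].split('/')[0].strip())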
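The same split can be done without a Python lambda by using pandas' vectorized `.str` methods. This is a sketch on three toy titles in the "Reason: TITLE CODE" format (the Fire and Traffic titles are illustrative sample values); `.str.split` returns a list per row and `.str[i]` picks the i-th element of each list.

```python
import pandas as pd

# Toy titles in the "Reason: TITLE CODE" format (illustrative sample values)
df = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                             'Fire: GAS-ODOR/LEAK',
                             'Traffic: VEHICLE ACCIDENT -']})

# Split once on the first colon: element 0 is the Reason, element 1 the title code
parts = df['title'].str.split(':', n=1)
df['Reason'] = parts.str[0]
# Keep the code up to the first '/', dropping surrounding whitespace
df['type'] = parts.str[1].str.split('/').str[0].str.strip()

print(df[['Reason', 'type']])
```

Titles without a "/" (such as the Traffic example) keep their whole code, since `split('/')` then returns a single element.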

**What is the most common Reason for a 911 call, based on this new column?**

In [None]:
df['Reason'].value_counts()
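`value_counts()` sorts from the most to the least frequent value, so the answer is its first label; `idxmax()` returns that label directly, and `normalize=True` turns counts into shares. A minimal sketch on a toy Reason column (made-up values, not the real dataset):

```python
import pandas as pd

# Toy Reason column (made-up values)
reason = pd.Series(['EMS', 'Traffic', 'EMS', 'Fire', 'EMS', 'Traffic'], name='Reason')

counts = reason.value_counts()               # sorted by count, largest first
most_common = counts.idxmax()                # label with the highest count
share = reason.value_counts(normalize=True)  # fraction of calls per Reason

print(most_common)                           # EMS
print(round(share[most_common], 2))          # 0.5
```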

**How many unique title codes are there for EMS calls?**

In [None]:
df[df['Reason']=='EMS']['title'].apply(lambda title: title.split(':')[1]).nunique()

**Create a pivot table that counts 911 calls by Reason and type.**

In [None]:
# e is always 1, so summing it per cell counts the calls
table = pd.pivot_table(df,
                       values='e',
                       index=['Reason'],
                       columns=['type'],
                       aggfunc='sum')
table
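Because e is always 1, summing it per (Reason, type) cell is the same as counting rows, and `pd.crosstab` builds that count table directly without the dummy column. The sketch below compares the two on toy data (made-up rows); `fill_value=0` is added to the pivot table so that empty combinations show 0, as `crosstab` does, instead of NaN.

```python
import pandas as pd

# Toy data: e is the dummy column that is always 1
df = pd.DataFrame({
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
    'type':   ['FALL VICTIM', 'FALL VICTIM', 'FIRE ALARM', 'VEHICLE ACCIDENT -', 'SYNCOPE'],
    'e':      [1, 1, 1, 1, 1],
})

# Summing e per cell counts the calls; fill_value=0 replaces NaN in empty cells
table = pd.pivot_table(df, values='e', index='Reason', columns='type',
                       aggfunc='sum', fill_value=0)

# crosstab counts co-occurrences of the two columns, with 0 for empty cells
counts = pd.crosstab(df['Reason'], df['type'])

print((table.to_numpy() == counts.to_numpy()).all())  # True
```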

**Use seaborn to create a countplot of 911 calls by Reason.**

In [None]:
sns.countplot(x='Reason', data=df, palette='rainbow')

**What are the data types for various features?**

**Which features are numerical?**

In [None]:
type(df['lat'].iloc[0])

In [None]:
type(df['lng'].iloc[0])

In [None]:
type(df['zip'].iloc[0])

In [None]:
type(df['e'].iloc[0])

**Which features are categorical?**

In [None]:
type(df['desc'].iloc[0])

In [None]:
type(df['title'].iloc[0])

In [None]:
type(df['timeStamp'].iloc[0])

In [None]:
type(df['twp'].iloc[0])

In [None]:
type(df['addr'].iloc[0])
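Calling `type()` on one element inspects a single value; `df.dtypes` reports every column's dtype at once, and `select_dtypes` splits numeric from non-numeric columns. The sketch below uses a toy frame with a subset of the 911.csv fields and made-up values. It assumes "categorical" simply means non-numeric here, which is why `exclude='number'` is used; note that zip lands among the numeric columns even though it is really a code.

```python
import numpy as np
import pandas as pd

# Toy frame with a subset of the 911.csv fields (made-up values)
df = pd.DataFrame({
    'lat': [40.30, 40.26],
    'lng': [-75.58, -75.26],
    'zip': [19525.0, np.nan],          # a missing zip forces a float dtype
    'title': ['EMS: BACK PAINS/INJURY', 'EMS: DIABETIC EMERGENCY'],
    'twp': ['NEW HANOVER', 'HATFIELD TOWNSHIP'],
    'e': [1, 1],
})

print(df.dtypes)

# Numeric columns by dtype (zip included, even though it is semantically a code)
numerical = df.select_dtypes(include='number').columns.tolist()
# Everything else (object or string dtype) is treated as categorical text
categorical = df.select_dtypes(exclude='number').columns.tolist()

print(numerical)    # ['lat', 'lng', 'zip', 'e']
print(categorical)  # ['title', 'twp']
```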

**Convert the timeStamp column from strings to DateTime objects.**

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])

* The timeStamp column now contains DateTime objects.
* Create 3 new columns called Hour, Month, and Day of Week based on the timeStamp column.

In [None]:
df['Hour'] = df['timeStamp'].apply(lambda time:time.hour)
df['Month'] = df['timeStamp'].apply(lambda time:time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time:time.dayofweek)
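Once a column holds datetimes, pandas' `.dt` accessor extracts the same components for the whole column at once, without a per-row lambda. A minimal sketch on two toy timestamps in the YYYY-MM-DD HH:MM:SS format of the timeStamp field (the values are made up):

```python
import pandas as pd

# Toy timestamps in the YYYY-MM-DD HH:MM:SS format of the timeStamp field
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2016-01-01 08:05:00']),
                    format='%Y-%m-%d %H:%M:%S')

# The .dt accessor works on the whole column at once
hour = ts.dt.hour
month = ts.dt.month
day_of_week = ts.dt.dayofweek      # Monday=0 ... Sunday=6
day_name = ts.dt.day_name()        # full English day names

print(hour.tolist(), month.tolist(), day_of_week.tolist(), day_name.tolist())
# [17, 8] [12, 1] [3, 4] ['Thursday', 'Friday']
```

`day_name()` returns full names, so a mapping such as dmap is still the simplest way to get the three-letter labels used in this notebook.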

* The Day of Week column is an integer from 0 (Monday) to 6 (Sunday).
* Use a dictionary to map these integers to the day-name strings.

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
df['Day of Week'] = df['Day of Week'].map(dmap)

**Use seaborn to create a countplot of the Day of Week column with the hue based on the Reason column.**

In [None]:
sns.countplot(x='Day of Week', data=df, hue='Reason', palette='rainbow')

# Relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), 
           loc='upper left', 
           borderaxespad=0, 
           edgecolor='white')

**Now do the same for Month:**

In [None]:
sns.countplot(x='Month', data=df, hue='Reason', palette='rainbow')

# Relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), 
           loc='upper left', 
           borderaxespad=0, 
           edgecolor='white')

**Something is strange about the plot**
_____
* Some months are missing: 9, 10, and 11 do not appear.
* See whether this information can be filled in by plotting the data another way, for example a simple pandas line plot that fills in the missing months.
* Create a groupby object called byMonth that groups the DataFrame by the Month column.
* Use the count() method for aggregation.

In [None]:
byMonth = df.groupby('Month').count()
byMonth.head()
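`count()` counts non-null values per column, so a column such as twp undercounts calls in any group where that field is missing; `size()` counts rows regardless of missing values. The sketch below shows the difference on toy data (made-up rows, with one missing township); whether it matters for 911.csv depends on how many twp values are missing there.

```python
import numpy as np
import pandas as pd

# Toy calls: one row has a missing township
df = pd.DataFrame({
    'Month': [1, 1, 2, 2, 2],
    'twp':   ['LOWER MERION', np.nan, 'ABINGTON', 'NORRISTOWN', 'ABINGTON'],
    'e':     [1, 1, 1, 1, 1],
})

by_month = df.groupby('Month').count()   # non-null values per column
calls = df.groupby('Month').size()       # number of rows per group

print(by_month['twp'].tolist())  # [1, 3]  (the NaN township is not counted)
print(calls.tolist())            # [2, 3]
```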

**Create a simple plot from the DataFrame showing the count of calls per month.**

In [None]:
byMonth['twp'].plot()

**Use seaborn to create a linear fit on the number of calls per month. You may need to reset the index to a column.**

In [None]:
sns.lmplot(x='Month', y='twp', data=byMonth.reset_index())
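With its default settings, `lmplot` draws an ordinary least-squares straight line; `np.polyfit` with `deg=1` computes the slope and intercept of such a line numerically. A minimal sketch on hypothetical monthly counts (not taken from 911.csv):

```python
import numpy as np

# Hypothetical monthly call counts (not taken from 911.csv)
months = np.array([1, 2, 3, 4, 5])
calls = np.array([100, 96, 94, 90, 85])

# Least-squares straight line: calls ≈ slope * month + intercept
slope, intercept = np.polyfit(months, calls, deg=1)
print(round(slope, 2), round(intercept, 2))
```

A negative slope corresponds to a downward-sloping regression line in the lmplot.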

**Create a new column called 'Date' that contains the date from the timeStamp column.**

In [None]:
df['Date'] = df['timeStamp'].apply(lambda time:time.date())
byDate = df.groupby(['Date']).count()
byDate.head()

* byDate groups the DataFrame by the Date column with the count aggregate.
* Use it to create a plot of the daily counts of 911 calls.

In [None]:
plt.figure(figsize=(12, 8))
byDate['twp'].plot()
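A groupby on Date only produces rows for days that have calls, so a day with zero calls disappears from the line plot instead of showing as 0. Resampling a datetime-indexed Series with `resample('D')` yields one row per calendar day. A minimal sketch on toy timestamps (made up, with no calls on 2016-01-02), summing the always-1 e column:

```python
import pandas as pd

# Toy calls over three days, with no calls on 2016-01-02
df = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2016-01-01 08:00:00',
                                 '2016-01-01 17:30:00',
                                 '2016-01-03 09:15:00']),
    'e': [1, 1, 1],
})
df['Date'] = df['timeStamp'].dt.date

by_date = df.groupby('Date')['e'].sum()                     # only days that have calls
daily = df.set_index('timeStamp')['e'].resample('D').sum()  # every calendar day, 0 if empty

print(len(by_date), len(daily))  # 2 3
print(daily.tolist())            # [2, 0, 1]
```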
plt.tight_layout()

**Recreate this plot as 3 separate plots, one for each Reason for the 911 call.**

In [None]:
plt.figure(figsize=(12, 8))
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

In [None]:
plt.figure(figsize=(12, 8))
df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()

In [None]:
plt.figure(figsize=(12, 8))
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()
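The three filtered groupbys can also be built as one table with a column per Reason, which is then convenient to plot column by column. A sketch on toy data (made-up dates and reasons); unlike filtering first, `fill_value=0` keeps days on which a Reason had no calls as explicit zeros.

```python
import pandas as pd

# Toy calls (made-up dates and reasons)
df = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-01', '2016-01-01', '2016-01-02', '2016-01-02'],
    'Reason': ['EMS', 'Traffic', 'EMS', 'Fire', 'EMS'],
})

# One row per Date, one column per Reason; 0 where a Reason had no calls that day
per_reason = df.groupby(['Date', 'Reason']).size().unstack(fill_value=0)

print(per_reason)
# Each column can then be plotted in its own figure, e.g. per_reason['EMS'].plot()
```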

____
* Create heatmaps of the call data with seaborn.
* Restructure the DataFrame so that the columns become the Hours and the index becomes the Day of Week.
* Try combining groupby with the [unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) method.

In [None]:
# Solution 1
dayHour = df.groupby(['Day of Week', 'Hour']).count()['twp'].unstack(level=-1)
dayHour.head()
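Grouping by two keys gives a Series with a two-level index, and `unstack(level=-1)` moves the innermost level (Hour) into the columns. Because groupby sorts the keys, the day-name rows come out alphabetically (Fri, Mon, Sat, ...); reindexing with the Mon to Sun order of dmap makes the heatmap rows read in calendar order. A sketch on toy rows (made up), summing the always-1 e column as an equivalent way of counting:

```python
import pandas as pd

# Toy calls with day-name strings like those produced by dmap
df = pd.DataFrame({
    'Day of Week': ['Mon', 'Sun', 'Mon', 'Fri', 'Sun'],
    'Hour':        [8, 8, 17, 17, 8],
    'e':           [1, 1, 1, 1, 1],
})

# Series indexed by (Day of Week, Hour); unstack moves Hour into the columns
dayHour = df.groupby(['Day of Week', 'Hour'])['e'].sum().unstack(level=-1)
print(dayHour.index.tolist())   # alphabetical: ['Fri', 'Mon', 'Sun']

# Reindex rows into calendar order; days absent from the data become NaN rows
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
dayHour = dayHour.reindex(day_order)
print(dayHour.index.tolist())
```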

In [None]:
# Solution 2
# Summing the always-1 e column counts calls per (Day of Week, Hour) cell
table2 = pd.pivot_table(df,
                        values='e',
                        index=['Day of Week'],
                        columns=['Hour'],
                        aggfunc='sum')
table2

**Create a heatmap with the new DataFrame.**

In [None]:
plt.figure(figsize=(12, 6))
sns.heatmap(dayHour, cmap='coolwarm')

**Create a clustermap with the new DataFrame.**

In [None]:
sns.clustermap(dayHour, cmap='coolwarm')

**Repeat these same plots and operations for a DataFrame that shows the Month as the column.**

In [None]:
dayMonth = df.groupby(['Day of Week', 'Month']).count()['twp'].unstack(level=-1)
dayMonth.head()

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayMonth, cmap='coolwarm')

In [None]:
sns.clustermap(dayMonth, cmap='coolwarm')