# '911 Calls' EDA Project

The data contains the following fields (after checking the info of the dataframe):

* lat: Float variable, Latitude
* lng: Float variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: Float variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: Integer variable, Dummy variable (always 1)

## 1. Data and Setup

**Import numpy and pandas.**

In [None]:
import numpy as np
import pandas as pd

**Import the visualization libraries and set %matplotlib inline.**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

**Read in the csv file as a dataframe called df.**

In [None]:
df = pd.read_csv('../input/montcoalert/911.csv')

**Check the info() of the df.**

In [None]:
df.info()

**Check the head of df.**

In [None]:
df.head(3)

## 2. Exploratory Data Analysis (EDA) Questions

**What are the top 5 zipcodes for 911 calls?**

In [None]:
df['zip'].value_counts().head(5)

**What are the top 5 townships (twp) for 911 calls?**

In [None]:
df['twp'].value_counts().head(5)

**How many unique title codes are there?**

In [None]:
len(df['title'].unique())
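
As a side note, `len(series.unique())` and `series.nunique()` can disagree when a column has missing values. The sketch below uses a few made-up titles (not rows from the real file) to show the difference; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample titles, including one missing value (illustration only)
titles = pd.Series(['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK',
                    'EMS: BACK PAINS/INJURY', np.nan])

n_with_nan = len(titles.unique())   # unique() keeps NaN as its own entry
n_without_nan = titles.nunique()    # nunique() drops NaN by default
print(n_with_nan, n_without_nan)
```

If the title column has no missing values, both approaches give the same number.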

## 3. Creating new features/columns

**In the title column there are "Reasons/Departments" specified before the title code. These are EMS, Fire, and Traffic. Use .apply() with a custom lambda expression to create a new column called "Reason" that contains this string value.**

**For example, if the title column value is EMS: BACK PAINS/INJURY, the Reason column value would be EMS.**

In [None]:
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])
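
The exercise asks for .apply(), but the same split can probably also be written with the vectorized `.str` accessor. A minimal sketch on a made-up three-row frame (the `sample` name and its rows are assumptions for illustration):

```python
import pandas as pd

# Made-up titles in the "Reason: description" shape used by the dataset
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Fire: GAS-ODOR/LEAK',
                                 'Traffic: VEHICLE ACCIDENT -']})

# Split each title on the first colon and keep the part before it
sample['Reason'] = sample['title'].str.split(':', n=1).str[0]
print(sample['Reason'].tolist())
```

Unlike the lambda version, the `.str` accessor passes missing values through as NaN instead of raising an error.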

**What is the most common Reason for a 911 call based off of this new column?**

In [None]:
df['Reason'].value_counts()

**Now use seaborn to create a countplot of 911 calls by Reason.**

In [None]:
sns.countplot(x='Reason',data=df,palette='coolwarm')

**What is the data type of the objects in the timeStamp column?**

In [None]:
type(df['timeStamp'].iloc[0])

**You should have seen that these timestamps are still strings. Use pd.to_datetime to convert the column from strings to DateTime objects.**

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
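
If the file might contain malformed timestamps, one option is to pass the expected format explicitly and coerce failures to NaT. This is a sketch on made-up strings, assuming the YYYY-MM-DD HH:MM:SS layout listed in the field description:

```python
import pandas as pd

# Made-up raw values; the last one is deliberately invalid
raw = pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:45:30', 'not a timestamp'])

# An explicit format avoids guessing; errors='coerce' turns bad rows into NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')
n_bad = int(parsed.isna().sum())
print(parsed.dtype, n_bad)
```

Counting the NaT values afterwards gives a quick check of how many rows failed to parse.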

**Use .apply() to create 3 new columns called Hour, Month, and Day of Week based off of the timeStamp column.**

In [None]:
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
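
For comparison, the same three attributes are also exposed through the `.dt` accessor on a datetime column, which avoids a Python-level lambda per row. A small sketch with two made-up timestamps:

```python
import pandas as pd

# Made-up timestamps: a Thursday evening and a Sunday morning
sample = pd.DataFrame({'timeStamp': pd.to_datetime(['2015-12-10 17:40:00',
                                                    '2016-01-03 08:05:00'])})

sample['Hour'] = sample['timeStamp'].dt.hour
sample['Month'] = sample['timeStamp'].dt.month
sample['Day of Week'] = sample['timeStamp'].dt.dayofweek  # Monday=0 ... Sunday=6
print(sample)
```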

**Translate dayofweek from its numeric value (0-6, where Monday is 0) to the abbreviation (e.g. Wed):**

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'} # CREATE DICTIONARY

In [None]:
df['Day of Week'] = df['Day of Week'].map(dmap)
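
As a cross-check of the dictionary, pandas can also produce day names directly with `dt.day_name()`. The sketch below compares both routes on two made-up timestamps; it assumes the default English day names.

```python
import pandas as pd

# Made-up timestamps for illustration
stamps = pd.Series(pd.to_datetime(['2015-12-10 17:40:00', '2016-01-03 08:05:00']))

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
via_map = stamps.dt.dayofweek.map(dmap)       # dictionary lookup, as in the cell above
via_name = stamps.dt.day_name().str[:3]       # full English name cut to three letters
print(via_map.tolist(), via_name.tolist())
```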

**Use seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column.**

In [None]:
sns.countplot(x='Day of Week',data=df,hue='Reason',palette='viridis')

# TO PUT LEGEND OUTSIDE OF THE PLOT
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

**Now do the same for Month:**

In [None]:
sns.countplot(x='Month',data=df,hue='Reason',palette='viridis')

# TO PUT LEGEND OUTSIDE OF THE PLOT
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

**Notice that it is missing some months!**
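
Rather than reading the gaps off the plot, the absent months can be listed with a set difference. A minimal sketch on a made-up Month column (the values are not from the real data):

```python
import pandas as pd

# Made-up Month values with several months absent
months = pd.Series([1, 1, 2, 3, 5, 5, 12])

# Months 1..12 that never occur in the column
missing = sorted(set(range(1, 13)) - set(months.unique().tolist()))
print(missing)
```

Applied to the notebook's data, the same expression would use `df['Month']` in place of `months`.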

## 4. Line Plots of 911 Calls Frequency

**Now create a DataFrame called byMonth by grouping df by the Month column and applying the count() aggregate.**

In [None]:
byMonth = df.groupby('Month').count()
byMonth.head()
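
Note that count() tallies non-null values per column, so columns with missing entries (such as zip) can show smaller numbers than the true number of calls. The sketch below contrasts count() with size() on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up rows; one zip value is missing
sample = pd.DataFrame({'Month': [1, 1, 2],
                       'lat': [40.1, 40.2, 40.3],
                       'zip': [19401.0, np.nan, 19403.0]})

per_column = sample.groupby('Month').count()  # non-null values in each column
per_row = sample.groupby('Month').size()      # number of rows in each group
print(per_column)
print(per_row)
```

Picking a column without missing values, as the notebook does with lat, makes count() match the row count.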

**Create a simple plot of the dataframe indicating the count of calls per month.**

In [None]:
byMonth['lat'].plot()

**Use seaborn's lmplot() to create a linear fit on the number of calls per month.**

In [None]:
sns.lmplot(x='Month',y='lat',data=byMonth.reset_index())
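
lmplot draws the regression line but does not report its coefficients. To read off a slope and intercept for a straight-line least-squares fit, `np.polyfit` with degree 1 is one option. The counts below are made up for illustration:

```python
import numpy as np

# Made-up monthly call counts that rise by exactly 2 per month
months = np.array([1, 2, 3, 4])
calls = np.array([10, 12, 14, 16])

# Degree-1 polynomial fit: returns [slope, intercept]
slope, intercept = np.polyfit(months, calls, 1)
print(slope, intercept)
```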

**Create a new column called 'Date' that contains the date from the timeStamp column.**

In [None]:
df['Date']=df['timeStamp'].apply(lambda t: t.date())
df

**Groupby this Date column with the count() aggregate and create a plot of counts of 911 calls.**

In [None]:
df.groupby('Date').count()['lat'].plot()
plt.tight_layout()
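
Grouping by Date only yields days that have at least one call. If days with zero calls should appear as zeros, resampling on the timestamp index is one possible alternative. A sketch with made-up timestamps:

```python
import pandas as pd

# Made-up calls: two on Jan 1, none on Jan 2, one on Jan 3
sample = pd.DataFrame({'timeStamp': pd.to_datetime(['2016-01-01 10:00:00',
                                                    '2016-01-01 12:30:00',
                                                    '2016-01-03 09:15:00'])})

# Daily bins over the full date range; empty days get a count of 0
daily = sample.set_index('timeStamp').resample('D').size()
print(daily)
```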

**Create 3 separate plots with each plot representing a Reason for the 911 call.**

In [None]:
df[df['Reason']=='Traffic'].groupby('Date').count()['lat'].plot()
plt.title('Traffic')
plt.tight_layout()

In [None]:
df[df['Reason']=='Fire'].groupby('Date').count()['lat'].plot()
plt.title('Fire')
plt.tight_layout()

In [None]:
df[df['Reason']=='EMS'].groupby('Date').count()['lat'].plot()
plt.title('EMS')
plt.tight_layout()

## 5. Heatmaps of 911 Calls Frequency based on Day/Month/Hour

**Now create heatmaps with seaborn. We'll first need to restructure the dataframe so that the columns become the Hours and the index becomes the Day of Week.**

In [None]:
dayHourGrid = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHourGrid.head()
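
Because Day of Week now holds strings, groupby sorts the index alphabetically (Fri, Mon, Sat, ...). If calendar order is preferred for the heatmap rows, the grid can be reindexed. A sketch on a made-up frame; the `order` list is an assumption about the desired row order:

```python
import pandas as pd

# Made-up calls covering four different weekdays
sample = pd.DataFrame({'Day of Week': ['Mon', 'Mon', 'Wed', 'Sun', 'Fri'],
                       'Hour': [0, 1, 0, 1, 0],
                       'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS']})

grid = sample.groupby(by=['Day of Week', 'Hour']).count()['Reason'].unstack()

# Reorder rows Monday-first, keeping only days present in the data
order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
ordered = grid.reindex([d for d in order if d in grid.index])
print(ordered)
```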

**Now create a heatmap using this new DataFrame.**

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayHourGrid,cmap='viridis')

**Create a clustermap using this DataFrame.**

In [None]:
sns.clustermap(dayHourGrid,cmap='viridis')

**Repeat these same plots and operations for a DataFrame that shows the Month as the column.**

In [None]:
dayMonthGrid = df.groupby(by=['Day of Week','Month']).count()['Reason'].unstack()
dayMonthGrid.head()

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayMonthGrid,cmap='viridis')

In [None]:
sns.clustermap(dayMonthGrid,cmap='viridis')

## Completed the Exploratory Data Analysis (EDA) of 911 Calls History.