# 911 Calls Capstone Project

**By Dariana Vielma G.**

**In this project I work with a Kaggle data set of calls to 911. I clean the data to find the main reasons for the calls and the zip codes and townships that generate the most calls. I also create heat maps showing how the number of calls varies by day of the week and hour, and by day of the week and month.**

For this capstone project I will be analyzing some 911 call data from Kaggle. The data contains the following fields:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)

## Data and Setup

I will be using numpy and pandas to process the data, as well as matplotlib, seaborn and plotly for the graphs.

** Import libraries **

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
%matplotlib inline

** Import plotly and initialize notebook mode. **

In [None]:
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

In [None]:
init_notebook_mode(connected=True)

Using the pandas package, let's read the data from 911.csv and work with it as a DataFrame.

** Read in the csv file as a dataframe called df **

In [None]:
df = pd.read_csv('../input/911csv/911.csv')

In [None]:
df.info()

It is important to review the data we will be working with. To do this, we can quickly check the first rows.

** Check the head of df **

In [None]:
df.head()

## Basic Questions

** What are the top 5 zipcodes for 911 calls? **

In [None]:
df['zip'].value_counts().head(5)

** What are the top 5 townships (twp) for 911 calls? **

In [None]:
df['twp'].value_counts().head(5)

** Take a look at the 'title' column, how many unique title codes are there? **

In [None]:
df['title'].unique().size
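One detail worth keeping in mind when counting unique values: `unique().size` counts a missing value as one more distinct entry, while `nunique()` skips missing values by default. Here is a minimal sketch on a made-up Series (the titles below are illustrative, not rows taken from 911.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical sample, for illustration only
titles = pd.Series(['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK',
                    'EMS: BACK PAINS/INJURY', np.nan])

n_with_nan = titles.unique().size   # NaN counts as one distinct value
n_without_nan = titles.nunique()    # NaN is skipped by default (dropna=True)

print(n_with_nan, n_without_nan)
```

If the column has no missing values, both approaches give the same number.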

## Creating new features

** In the title column, a "Reason/Department" is specified before the title code: EMS, Fire, or Traffic. 
Let's apply a lambda expression to extract that information and create a new column that contains the reason for the call. **


In [None]:
reasons = df['title'].apply(lambda x: x.split(':')[0])

In [None]:
reasons

In [None]:
df['reason'] = reasons
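As a possible alternative to the lambda, the pandas `.str` accessor can do the same split without calling a Python function per row. This is only a sketch on a small made-up DataFrame (the `sample` name and its rows are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample with the same 'Reason: code' layout as the title column
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Fire: GAS-ODOR/LEAK',
                                 'Traffic: VEHICLE ACCIDENT -']})

# Same approach as above: split each string on ':' and keep the first part
via_apply = sample['title'].apply(lambda x: x.split(':')[0])

# Vectorised version using the .str accessor
via_str = sample['title'].str.split(':').str[0]

print(via_str.tolist())
```

Both versions should produce the same column; the `.str` version also passes missing values through instead of raising an error.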

In [None]:
df

** Based on the new column, the most common reason for calling 911 is EMS (Emergency medical services) **

In [None]:
df['reason'].value_counts()

## Visualization

** With the help of seaborn, let's create a countplot of 911 calls by Reason. **

In [None]:
sns.countplot(x='reason', data = df)

** Now let us begin to focus on time information. What is the data type of the objects in the timeStamp column? **

In [None]:
type(df['timeStamp'][0])

** The timestamps are still strings. Use [pd.to_datetime](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) to convert the column from strings to DateTime objects. **

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])
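To see what the conversion does, here is a small sketch on two made-up timestamp strings in the YYYY-MM-DD HH:MM:SS layout described in the field list. Passing `format` explicitly is optional; I include it here only to make the expected layout visible.

```python
import pandas as pd

# Hypothetical strings in the same layout as the timeStamp column
raw = pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01'])

parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S')
first = parsed[0]

print(type(raw[0]))     # plain Python str before the conversion
print(type(first))      # pandas Timestamp after the conversion
print(first.hour, first.dayofweek)
```

After the conversion, each value exposes attributes such as `hour`, `month` and `dayofweek`, which is what the next steps rely on.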

** Now that the timeStamp column holds DateTime objects, I will use .apply() to create 3 new columns called hour, month, and day of week, all derived from the timeStamp column. **

In [None]:
time = df['timeStamp'][0]

In [None]:
time.dayofweek

In [None]:
hour = df['timeStamp'].apply(lambda x: x.hour)

In [None]:
df['hour'] = hour

In [None]:
month = df['timeStamp'].apply(lambda x: x.month)

In [None]:
df['month'] = month

In [None]:
day = df['timeStamp'].apply(lambda x: x.dayofweek)

In [None]:
df['day of week'] = day
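The same three columns can probably be built more directly with the `.dt` accessor, which exposes the datetime attributes for a whole column at once. A minimal sketch, assuming a small made-up DataFrame called `sample`:

```python
import pandas as pd

# Hypothetical timestamps, for illustration only
ts = pd.to_datetime(pd.Series(['2015-12-10 17:40:00', '2016-01-02 08:05:30']))
sample = pd.DataFrame({'timeStamp': ts})

# .dt gives column-wide access to the same attributes used with .apply()
sample['hour'] = sample['timeStamp'].dt.hour
sample['month'] = sample['timeStamp'].dt.month
sample['day of week'] = sample['timeStamp'].dt.dayofweek

print(sample)
```

The result matches the `.apply()` version; the `.dt` form is mostly a matter of brevity and speed on large columns.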

In [None]:
df.head(1)

** Notice how the day of week is an integer from 0 to 6. Use .map() with this dictionary to map each integer to the name of the day: **

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
#df.replace({"day of week": dmap})
df['day of week'] = df['day of week'].map(dmap)

In [None]:
df.head(1)
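Once the days are mapped to names they are plain strings, so any sorting becomes alphabetical (Fri, Mon, Sat, ...). One way to keep the calendar order, sketched here on made-up values, is an ordered categorical built from the same dictionary:

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Hypothetical day-of-week integers, mapped the same way as above
days = pd.Series([4, 0, 6, 0]).map(dmap)

# Ordered categorical: the category order follows the dictionary, Mon to Sun
order = list(dmap.values())
days_cat = pd.Series(pd.Categorical(days, categories=order, ordered=True))

alphabetical = days.sort_values().tolist()
calendar = days_cat.sort_values().tolist()

print(alphabetical)
print(calendar)
```

This is optional for the analysis; it only affects the order in which the days appear.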

** Now using seaborn let's create a countplot of the Day of Week column with the hue based off of the Reason column. **

** Weekly calls: with this plot we can see that the days with the most calls are Friday, Monday and Tuesday. **

In [None]:
sns.countplot(x='day of week', data=df, hue='reason', palette='summer')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)

** Now do the same for Month: **

In [None]:
sns.countplot(x='month', data=df, hue='reason', palette='pastel')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)

** Now with groupby let's group the DataFrame by month, using count() for aggregation. **

In [None]:
byMonth = df.groupby('month').count()

In [None]:
byMonth.head()
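Note that `count()` counts the non-missing values of each column separately, so the columns of the grouped DataFrame can differ when some fields are empty. `size()` counts rows regardless of missing values. A small sketch with made-up data (the `sample` DataFrame is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one township is missing in month 1
sample = pd.DataFrame({'month': [1, 1, 2, 2, 2],
                       'twp': ['A', np.nan, 'B', 'C', 'D'],
                       'e': [1, 1, 1, 1, 1]})

by_month_count = sample.groupby('month').count()  # non-missing values per column
by_month_size = sample.groupby('month').size()    # rows per group

print(by_month_count)
print(by_month_size)
```

Here month 1 has two rows, but the twp count is 1 because one township is missing. For the plots, any column without missing values gives the number of calls.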

** Plot of the number of 911 calls per month, using the grouped DataFrame. **

In [None]:
byMonth['twp'].plot()

** Using lmplot to create a linear fit on the number of calls per month. **

In [None]:
sns.lmplot(y='twp', x='month', data=byMonth.reset_index())
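To get the numbers behind a straight-line fit like this one, a least-squares line can be computed with `np.polyfit`. This is only a sketch with made-up monthly counts that lie exactly on a line; it is not the output of the real data, and seaborn's own plotting code is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly counts, chosen to lie on the line 100 + 10 * month
monthly = pd.DataFrame({'month': [1, 2, 3, 4, 5],
                        'twp': [110, 120, 130, 140, 150]})

# Degree-1 polynomial fit: returns the slope first, then the intercept
slope, intercept = np.polyfit(monthly['month'].to_numpy(),
                              monthly['twp'].to_numpy(), deg=1)

print(round(slope, 3), round(intercept, 3))
```

A negative slope would indicate fewer calls in later months, a positive one more calls.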

** Now let's create a new column called "date" that holds only the date part of timeStamp. **

In [None]:
df['timeStamp'][0]

In [None]:
date = df['timeStamp'].apply(lambda x: x.date())

In [None]:
df['date']=date

In [None]:
df.head()

** Now groupby this Date column with the count() aggregate and create a plot of counts of 911 calls.**

In [None]:
byDate = df.groupby('date').count()

In [None]:
byDate['twp'].plot()
plt.tight_layout()

** Now recreate this plot as 3 separate plots, one for each Reason for the 911 call. **

In [None]:
df[df['reason']=='Traffic'].groupby('date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()

In [None]:
df[df['reason']=='EMS'].groupby('date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()

In [None]:
df[df['reason']=='Fire'].groupby('date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()
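The three filtered groupbys above can also be obtained in one step by grouping on both date and reason and unstacking the reason level. A minimal sketch on a made-up DataFrame (the `sample` rows are assumptions for illustration):

```python
import pandas as pd

# Hypothetical calls on two days
sample = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2015-12-10 17:40:00',
                                 '2015-12-10 18:00:00',
                                 '2015-12-11 09:15:00']),
    'reason': ['EMS', 'Traffic', 'EMS'],
})
sample['date'] = sample['timeStamp'].dt.date

# One row per date, one column per reason; missing combinations become 0
per_reason = sample.groupby(['date', 'reason']).size().unstack(fill_value=0)

print(per_reason)
# per_reason.plot(subplots=True) would then draw one panel per reason
```

Each column of `per_reason` corresponds to one of the separate plots.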

** Now let's move on to creating heatmaps with seaborn and our data. We'll first need to restructure the dataframe so that the columns become the Hours and the Index becomes the Day of the Week.**

In [None]:
dayHour = df.groupby(by=['day of week', 'hour']).count()['reason'].unstack()

In [None]:
dayHour
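Because the day names are strings, the rows of a table built this way come out in alphabetical order. If calendar order is preferred for the heatmap, the rows can be reordered with `reindex`. A sketch on made-up data (the `sample` DataFrame and the `week` list are assumptions for illustration):

```python
import pandas as pd

# Hypothetical calls with the same columns used for the heatmap
sample = pd.DataFrame({'day of week': ['Mon', 'Fri', 'Mon', 'Sun'],
                       'hour': [8, 8, 17, 8],
                       'reason': ['EMS', 'Fire', 'EMS', 'Traffic']})

day_hour = sample.groupby(by=['day of week', 'hour']).count()['reason'].unstack()

# Keep calendar order, restricted to the days that are present in the data
week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
ordered = day_hour.reindex([d for d in week if d in day_hour.index])

print(list(day_hour.index))
print(list(ordered.index))
```

Passing the reordered table to the heatmap would list the days from Monday to Sunday. A clustermap reorders rows on its own, so this only matters for the heatmap.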

** HeatMap using this new DataFrame. **

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayHour, cmap='YlGnBu')

** Now create a clustermap using this DataFrame. **

In [None]:
sns.clustermap(dayHour, cmap='YlGnBu')

** Now repeat the same plots and operations for a DataFrame that has the Month as the columns. **

In [None]:
dayMonth = df.groupby(by=['day of week', 'month']).count()['reason'].unstack()

In [None]:
dayMonth

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayMonth, cmap='YlGnBu')

In [None]:
sns.clustermap(dayMonth, cmap='YlGnBu')