## 911 Calls Capstone Project
#### By: Dana Cassidy

*This notebook is based on questions and instructions from the Python for Data Science and Machine Learning Bootcamp by Jose Portilla. **I solved all of the questions on my own merit and time.** I adjusted some of the markdown instructions to better fit my submission.*

## Data and Setup

____
** Import numpy and pandas **

In [None]:
import numpy as np
import pandas as pd

** Import visualization libraries and set %matplotlib inline. **

In [None]:
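import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Optional: plotly/cufflinks for interactive plots (not used in the cells below)
from plotly import __version__
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()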

** Read in the csv file as a dataframe called df **

In [None]:
df = pd.read_csv('../input/montcoalert/911.csv')

** Check the info() of the df **

In [None]:
df.info()

** Check the head of df **

In [None]:
df.head()

## Starting out

** What are the top 5 zipcodes for 911 calls? **

In [None]:
df['zip'].value_counts().head(5)

** What are the top 5 townships (twp) for 911 calls? **

In [None]:
df['twp'].value_counts().head(5)

** Looking at the 'title' column, how many unique title codes are there? **

In [None]:
df['title'].nunique()
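
As a quick sanity check, the sketch below uses a small made-up frame (the titles and zip codes are illustrative, not taken from the dataset) to show that counting the groups of a groupby and calling `Series.nunique()` give the same number of unique titles.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook reads these from 911.csv
sample = pd.DataFrame({
    'title': ['EMS: BACK PAINS/INJURY', 'Fire: GAS-ODOR/LEAK', 'EMS: BACK PAINS/INJURY'],
    'zip': [19401, 19446, 19401],
})

# One row per distinct title in the grouped result
via_groupby = len(sample.groupby('title').nunique())

# Direct count of distinct titles
via_nunique = sample['title'].nunique()

print(via_groupby, via_nunique)
```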

## Creating new features

** In the title column, a "Reason/Department" is specified before the title code: EMS, Fire, or Traffic. I will use .apply() with a custom lambda expression to create a new column called "Reason" that contains this string value. **

In [None]:
# The Reason is the text before the first colon in the title
df['Reason'] = df['title'].apply(lambda title: title.split(':')[0])

df.head()
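
The same split can also be written with pandas' vectorized string methods instead of `.apply()`. This is only a sketch on a made-up frame; the `sample` name and its titles are assumptions for illustration.

```python
import pandas as pd

# Hypothetical titles in the "Reason: title code" shape described above
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Fire: GAS-ODOR/LEAK',
                                 'Traffic: VEHICLE ACCIDENT -']})

# Split every title on ':' and keep the first piece
sample['Reason'] = sample['title'].str.split(':').str[0]

print(sample['Reason'].tolist())
```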

** What is the most common Reason for a 911 call based on this new column? **

In [None]:
df['Reason'].value_counts().head()

** Now use seaborn to create a countplot of 911 calls by Reason. **

In [None]:
import seaborn as sns

sns.countplot(x='Reason', data=df)


___
** Now let us begin to focus on time information. What is the data type of the objects in the timeStamp column? **

In [None]:
type(df['timeStamp'].iloc[0])

** These timestamps are still strings. I will use pd.to_datetime to convert the column from strings to DateTime objects. **

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
time = df['timeStamp'].iloc[3]
time.hour

** Now that the values in the timeStamp column are actually DateTime objects, I will use .apply() to create 3 new columns called Hour, Month, and Day of Week, all derived from the timeStamp column. **

In [None]:
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
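
For comparison, the `.dt` accessor on a datetime column exposes the same attributes without a lambda. The sketch below runs on two made-up timestamps, so the `sample` frame is an assumption and not part of the dataset.

```python
import pandas as pd

# Two hypothetical timestamps, already converted to datetime
sample = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2015-12-10 17:40:00', '2016-01-03 08:05:00'])
})

# Vectorized equivalents of the three .apply() calls
sample['Hour'] = sample['timeStamp'].dt.hour
sample['Month'] = sample['timeStamp'].dt.month
sample['Day of Week'] = sample['timeStamp'].dt.dayofweek  # Monday is 0, Sunday is 6

print(sample)
```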

** I will use .map() with this dictionary to replace the day-of-week integers with their string names: **

In [None]:
# Here is my dictionary for days of the week
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
df['Day of Week'] = df['Day of Week'].map(dmap)
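
A possible alternative to the dictionary, sketched on made-up timestamps: `dt.day_name()` returns the full English day name, and slicing the first three characters reproduces the same labels as `dmap`. The `stamps` series is an assumption for illustration.

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Two hypothetical timestamps (a Thursday and a Sunday)
stamps = pd.Series(pd.to_datetime(['2015-12-10 17:40:00', '2016-01-03 08:05:00']))

# Dictionary approach: integer day of week mapped to a label
via_map = stamps.dt.dayofweek.map(dmap)

# Alternative: full day name truncated to three letters
via_day_name = stamps.dt.day_name().str[:3]

print(via_map.tolist(), via_day_name.tolist())
```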

** Now I will use seaborn to create a countplot of the Day of Week column with the hue based on the Reason column. **

In [None]:
sns.countplot(x='Day of Week', data=df, hue='Reason')
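
Because the day labels are plain strings, they are not guaranteed to appear in calendar order in plots or grouped results. One possible fix, sketched here on a made-up series, is an ordered categorical; the `day_order` list and `days` series are assumptions for illustration.

```python
import pandas as pd

# Calendar order for the three-letter labels
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Hypothetical day labels in arbitrary order
days = pd.Series(['Thu', 'Mon', 'Sun', 'Mon'])

# An ordered categorical remembers the Mon..Sun order
days_cat = pd.Series(pd.Categorical(days, categories=day_order, ordered=True))

# With sort=False the counts follow the category order, not frequency
counts = days_cat.value_counts(sort=False)
print(counts)
```

Passing the same list through countplot's `order=` parameter is another way to control the x-axis order.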

** Now I will do the same for Month **

In [None]:
sns.countplot(x='Month', data=df, hue='Reason')

** I noticed the data was missing some months, so I will fill in the blanks by plotting the information in another way **

** I will create a groupby object called byMonth, where I group the DataFrame by the Month column and use the count() method for aggregation. **

In [None]:
byMonth = df.groupby('Month').count()
byMonth.head()
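
One detail worth keeping in mind: `count()` tallies the non-null values in each column separately, so the per-month numbers can differ from column to column, while `size()` counts rows. A small sketch on made-up data (the `sample` frame is an assumption) shows the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one township is missing in month 1
sample = pd.DataFrame({
    'Month': [1, 1, 2, 2, 2],
    'twp': ['A', np.nan, 'B', 'C', 'D'],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS'],
})

by_count = sample.groupby('Month').count()  # non-null values per column
by_size = sample.groupby('Month').size()    # rows per group

print(by_count)
print(by_size)
```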

** Now I will create a simple plot from the DataFrame showing the count of calls per month. **

In [None]:
byMonth['twp'].plot()

** Now I will use seaborn's lmplot() to create a linear fit on the number of calls per month. The index needs to be reset so that Month becomes a column. **

In [None]:
sns.lmplot(x='Month',y='twp',data=byMonth.reset_index())
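
To see the numbers behind a straight-line fit like this, a degree-1 least-squares fit can be computed with `np.polyfit`. This is only a sketch on made-up monthly counts; it is not the notebook's data and not necessarily how seaborn computes its line internally.

```python
import numpy as np

# Hypothetical monthly call counts that fall exactly on a line
months = np.array([1, 2, 3, 4, 5])
calls = np.array([100, 90, 80, 70, 60])

# Degree-1 polynomial fit returns (slope, intercept)
slope, intercept = np.polyfit(months, calls, 1)

print(slope, intercept)
```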

**I will create a new column called 'Date' that contains the date from the timeStamp column.** 

In [None]:
df['Date']= df['timeStamp'].apply(lambda time: time.date())
df.head()

** Now I will group by this Date column with the count() aggregate and plot the counts of 911 calls. **

In [None]:
byDate = df.groupby('Date').count()

In [None]:
byDate['twp'].plot()

** Now I will recreate this plot as 3 separate plots, one for each Reason for the 911 call. **

In [None]:
import matplotlib.pyplot as plt
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
plt.tight_layout()


In [None]:
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
plt.tight_layout()


In [None]:
df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
plt.tight_layout()
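
The three filtered groupbys above can also be computed in one pass by grouping on both Date and Reason. The sketch below uses a made-up frame (the `sample` rows are assumptions) and only builds the table of daily counts.

```python
from datetime import date

import pandas as pd

# Hypothetical calls over two days
sample = pd.DataFrame({
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
    'Date': [date(2016, 1, 1), date(2016, 1, 1), date(2016, 1, 1),
             date(2016, 1, 2), date(2016, 1, 2)],
})

# Rows per (Date, Reason) pair, with one column per Reason
per_reason = (sample.groupby(['Date', 'Reason'])
                    .size()
                    .unstack('Reason', fill_value=0))

print(per_reason)
```

Calling `per_reason.plot(subplots=True)` on a table like this would draw one panel per Reason.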


** Now I will create heatmaps with seaborn and the data. I will first restructure the DataFrame so that the columns become the Hours and the index becomes the Day of Week. **

In [None]:
dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reason'].unstack()
dayHour.head()
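
A cross-tabulation is another way to build this kind of day-by-hour table. The sketch below, on a made-up frame, compares it with the groupby/unstack approach; note that `pd.crosstab` fills missing combinations with 0, while unstack leaves them as NaN. The `sample` rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical calls with their day and hour
sample = pd.DataFrame({
    'Day of Week': ['Mon', 'Mon', 'Tue', 'Mon'],
    'Hour': [8, 8, 9, 17],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic'],
})

# Same pattern as the cell above: count, pick a column, pivot Hour into columns
via_groupby = sample.groupby(by=['Day of Week', 'Hour']).count()['Reason'].unstack()

# Cross-tabulation of the two columns
via_crosstab = pd.crosstab(sample['Day of Week'], sample['Hour'])

print(via_groupby)
print(via_crosstab)
```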

** Now I will create a HeatMap using this new DataFrame. **

In [None]:
plt.figure(figsize=(12,6))
heat = sns.heatmap(dayHour)

** Now I will create a clustermap using this DataFrame. **

In [None]:
# clustermap creates its own figure, so the size is passed directly
sns.clustermap(dayHour, figsize=(9,9))

** Now I will repeat the same plots and operations for a DataFrame that has Month as the columns. **

In [None]:
dayMonth = df.groupby(by=['Day of Week','Month']).count()['Reason'].unstack()
dayMonth.head()

In [None]:
plt.figure(figsize=(10,6))
heat = sns.heatmap(dayMonth)

In [None]:
# clustermap creates its own figure, so the size is passed directly
sns.clustermap(dayMonth, figsize=(8,6))