# 911 Calls Data Visualization


** Import libraries and set %matplotlib inline.  (Chart display within Jupyter Notebook)**
 * ** numpy  **
 * ** pandas  **
 * ** visualization  **
 * ** plotly   ** (offline use for notebook)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import plotly.graph_objs as go 
from plotly.offline import init_notebook_mode,iplot
init_notebook_mode(connected=True)

** Load the dataset with pandas' read_csv function into a DataFrame called callsDataFrame **

In [None]:
callsDataFrame = pd.read_csv('../input/911.csv')

** Check the info() of the callsDataFrame **

In [None]:
callsDataFrame.info()

** Check the head of callsDataFrame **

In [None]:
callsDataFrame.head()

** Let's find the top 5 zip codes for 911 calls. **

In [None]:
callsDataFrame['zip'].value_counts().head(5)

** Let's find the top 5 townships (twp) for 911 calls **

In [None]:
callsDataFrame['twp'].value_counts().head(5)

** Let's find the 5 most frequent latitude values and the 5 most frequent longitude values for 911 calls **

In [None]:
callsDataFrame['lat'].value_counts().head(5)

In [None]:
callsDataFrame['lng'].value_counts().head(5)
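
Counting 'lat' and 'lng' separately does not tell us which locations are most common, because a frequent latitude and a frequent longitude need not belong to the same point. A minimal sketch of counting coordinate pairs instead, using a small hypothetical table (`toy` and its values are made up for illustration; on the real data the same groupby would run on callsDataFrame):

```python
import pandas as pd

# Hypothetical toy data standing in for the 'lat' and 'lng' columns.
toy = pd.DataFrame({
    'lat': [40.1, 40.1, 40.2, 40.1, 40.2],
    'lng': [-75.3, -75.3, -75.4, -75.5, -75.4],
})

# Count each (lat, lng) pair and list the most frequent pairs first.
pair_counts = toy.groupby(['lat', 'lng']).size().sort_values(ascending=False)
print(pair_counts.head(5))

# For comparison: the most frequent latitude on its own.
top_lat_count = toy['lat'].value_counts().iloc[0]
```

Here the latitude 40.1 occurs three times, yet no single (lat, lng) pair occurs more than twice.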

** Take a look at the 'title' column. How many unique title codes are there? **
** There are two ways to find out. Let's see. **

In [None]:
callsDataFrame['title'].nunique()

In [None]:
len(callsDataFrame['title'].unique())
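
The two approaches agree only when the column has no missing values: nunique() skips NaN by default, while unique() returns NaN as one of the values. A small sketch on a hypothetical Series (the sample titles are illustrative, not taken from the dataset output):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value and one repeated title.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    np.nan,
    'EMS: BACK PAINS/INJURY',
])

n_distinct = titles.nunique()              # NaN is excluded by default
n_with_nan = len(titles.unique())          # NaN is counted as a value
n_explicit = titles.nunique(dropna=False)  # opt in to counting NaN

print(n_distinct, n_with_nan, n_explicit)
```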

## Creating new features

** In the titles column there are "Reasons/Departments" specified before the title code. <br/>These are EMS, Fire, and Traffic. <br/>Use <font size="3" color="red">.apply()</font> with a custom lambda expression to create a new column called "Reason" that contains this string value.** 

**For example, if the <font size="3" color="red">title</font> column value is <font size="3" color="red">EMS: BACK PAINS/INJURY</font> , the Reason column value would be <font size="3" color="red">EMS</font>. **

In [None]:
x = callsDataFrame['title'].iloc[0]

In [None]:
x

In [None]:
x.split(':')[0]

In [None]:
callsDataFrame['Reason'] = callsDataFrame['title'].apply(lambda title: title.split(':')[0])

In [None]:
callsDataFrame['Reason'].head()

In [None]:
callsDataFrame['Reason'].value_counts()
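
As an alternative to .apply() with a lambda, pandas string methods can do the same split for the whole column at once. A minimal sketch on hypothetical sample titles (the Series `titles` is an illustration, not the real column):

```python
import pandas as pd

# Hypothetical sample titles in the "Reason: description" format.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
])

# The approach used in this notebook: a Python-level lambda per row.
reason_apply = titles.apply(lambda title: title.split(':')[0])

# Equivalent using the .str accessor: split, then take the first piece.
reason_str = titles.str.split(':').str[0]

print(reason_str.tolist())
```

One practical difference: the .str version leaves missing titles as NaN, whereas the lambda would raise an error on a non-string value.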

** Let's use seaborn to create a countplot of 911 calls by Reason. **

In [None]:
sns.countplot(x='Reason', data=callsDataFrame)

** Let's use plotly to recreate the chart above of 911 calls by Reason. **

In [None]:
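# value_counts() sorts by frequency, so take the bar labels from its index
# instead of hard-coding them in a fixed order.
reasonCounts = callsDataFrame['Reason'].value_counts()
data = [go.Bar(
            x=reasonCounts.index,
            y=reasonCounts.values
    )]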

iplot(data, filename='basic-bar')


** Let's look at the <font color="red">timeStamp</font> column **

In [None]:
type(callsDataFrame['timeStamp'].iloc[0])

** Use [pd.to_datetime](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) to convert the column from strings to DateTime objects. **

In [None]:
callsDataFrame['timeStamp'] = pd.to_datetime(callsDataFrame['timeStamp'])

In [None]:
callsDataFrame['timeStamp'].iloc[0]
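
When the string layout is known, passing an explicit format to pd.to_datetime avoids format guessing, and errors='coerce' turns unparseable entries into NaT instead of raising. A sketch under the assumption that the timestamps look like 'YYYY-MM-DD HH:MM:SS' (the sample values below are hypothetical):

```python
import pandas as pd

# Hypothetical raw strings; the last one is deliberately malformed.
raw = pd.Series(['2015-12-10 17:40:00', '2015-12-10 17:40:01', 'not a date'])

# Explicit format plus errors='coerce': bad rows become NaT.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

n_bad = parsed.isna().sum()
print(parsed)
print('unparseable rows:', n_bad)
```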

** You can now grab specific attributes from a Datetime object by calling them. For example:**

    time = callsDataFrame['timeStamp'].iloc[0]
    time.hour

**You can use Jupyter's tab completion to explore the various attributes you can call. Now that the timeStamp column contains DateTime objects, use .apply() to create 3 new columns called Hour, Month, and Day of Week, all based on the timeStamp column. Reference the solutions if you get stuck on this step.**

In [None]:
time = callsDataFrame['timeStamp'].iloc[0]
time.hour

In [None]:
time.month

In [None]:
time.dayofweek

In [None]:
callsDataFrame['Hour'] = callsDataFrame['timeStamp'].apply(lambda time: time.hour)

In [None]:
callsDataFrame['Month'] = callsDataFrame['timeStamp'].apply(lambda time: time.month)

In [None]:
callsDataFrame['Day Of Week'] = callsDataFrame['timeStamp'].apply(lambda time: time.dayofweek)
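
The same three columns can also be derived without .apply(), through the .dt accessor that pandas provides on datetime columns. A minimal sketch on two hypothetical timestamps:

```python
import pandas as pd

# Hypothetical timestamps standing in for the timeStamp column.
stamps = pd.Series(pd.to_datetime(['2015-12-10 17:40:00', '2016-01-03 08:05:00']))

# Vectorized attribute access; Monday is 0 and Sunday is 6.
hour = stamps.dt.hour
month = stamps.dt.month
day_of_week = stamps.dt.dayofweek

# Same result as the per-row lambda used in this notebook.
hour_apply = stamps.apply(lambda time: time.hour)

print(hour.tolist(), month.tolist(), day_of_week.tolist())
```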

In [None]:
callsDataFrame.head(5)

** Notice how the Day of Week is an integer 0-6. Use .map() with this dictionary to map the integers to the actual string names of the days of the week: **

    dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
callsDataFrame['Day Of Week'] = callsDataFrame['Day Of Week'].map(dmap)
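
After mapping, the day names are plain strings, so sorting and grouping order them alphabetically rather than by calendar. One possible way to keep calendar order is an ordered categorical, sketched here on hypothetical day numbers:

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Hypothetical day-of-week integers.
day_numbers = pd.Series([6, 0, 3, 0])
day_names = day_numbers.map(dmap)

# Plain strings sort alphabetically.
alphabetical = day_names.sort_values().tolist()

# An ordered categorical sorts in the order given by `categories`.
ordered = pd.Categorical(day_names, categories=list(dmap.values()), ordered=True)
calendar_order = pd.Series(ordered).sort_values().astype(str).tolist()

print(alphabetical)
print(calendar_order)
```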

In [None]:
callsDataFrame.head()

** Now use seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column. **

In [None]:
sns.countplot(x = 'Day Of Week', data = callsDataFrame, hue = 'Reason', palette="rocket")

# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

**Now do the same for Month:**

In [None]:
sns.countplot(x = 'Month', data = callsDataFrame, hue = 'Reason', palette="rocket")

# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

** Notice that some months are missing from the plot. Let's see if we can fill in the gaps by plotting the data another way. **

** Now create a groupby object called byMonth, where you group the DataFrame by the Month column and use the count() method for aggregation.**

In [None]:
byMonth = callsDataFrame.groupby('Month').count()

In [None]:
byMonth.head()
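
Note that count() reports the number of non-null values in each column, so the per-month figure can differ from column to column when data is missing, while size() counts rows. A small sketch on a hypothetical table (`toy` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing township values.
toy = pd.DataFrame({
    'Month': [1, 1, 2, 2, 2],
    'lat': [40.1, 40.2, 40.3, 40.4, 40.5],
    'twp': ['A', np.nan, 'B', 'B', np.nan],
})

by_count = toy.groupby('Month').count()  # non-null values per column
by_size = toy.groupby('Month').size()    # rows per group

print(by_count)
print(by_size)
```

In this sketch 'lat' matches the row counts, but 'twp' undercounts both months.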

** Now create a simple plot off of the dataframe indicating the count of calls per month. **

In [None]:
byMonth['lat'].plot()

In [None]:
# No hue is used here, so there is no legend to place
sns.countplot(x='Month', data=callsDataFrame, palette='rocket')

** Now see if you can use seaborn's lmplot() to create a linear fit on the number of calls per month. Keep in mind you may need to reset the index to a column. **

In [None]:
## Reset the index so that Month becomes a regular column
byMonth.reset_index()

In [None]:
## Seaborn linear-fit plot; reset_index() supplies Month as a column in the data
sns.lmplot(x='Month', y='twp', data=byMonth.reset_index())
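
To see what the reset and the linear fit amount to numerically, here is a rough sketch that uses np.polyfit as a stand-in for the regression line (this is an assumption for illustration, not how lmplot is implemented; the monthly counts are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical per-month counts, indexed by Month like byMonth['twp'].
counts = pd.Series(
    [10, 12, 14, 16],
    index=pd.Index([1, 2, 3, 4], name='Month'),
    name='twp',
)

# reset_index() moves Month from the index into an ordinary column.
flat = counts.reset_index()

# Least-squares straight line through (Month, twp).
slope, intercept = np.polyfit(flat['Month'], flat['twp'], deg=1)

print(flat.columns.tolist())
print(slope, intercept)
```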

**Create a new column called 'Date' that contains the date from the timeStamp column. You'll need to use apply along with the .date() method. ** 

In [None]:
callsDataFrame['timeStamp'].iloc[0]

In [None]:
callsDataFrame['timeStamp'].iloc[0].date()

In [None]:
# Creating new 'Date' column using timeStamp column
callsDataFrame['Date'] = callsDataFrame['timeStamp'].apply(lambda ts : ts.date())

In [None]:
callsDataFrame['Date'].head()
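
The .dt accessor offers two related options: .dt.date gives the same Python date objects as the lambda, and .dt.normalize() keeps a datetime dtype with the time set to midnight. A minimal sketch on hypothetical timestamps:

```python
import datetime

import pandas as pd

# Hypothetical timestamps standing in for the timeStamp column.
stamps = pd.Series(pd.to_datetime(['2015-12-10 17:40:00', '2015-12-11 09:15:00']))

dates_apply = stamps.apply(lambda ts: ts.date())  # approach used in this notebook
dates_dt = stamps.dt.date                         # same values, no lambda
days = stamps.dt.normalize()                      # datetime64 values at midnight

print(dates_dt.tolist())
print(days.dtype)
```

Keeping a datetime dtype can be convenient later, for example for resampling, while date objects are enough for a simple groupby.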

In [None]:
## Here, we can see Date Column inside DataFrame
callsDataFrame.head()

** Now groupby this Date column with the count() aggregate and create a plot of counts of 911 calls.**

In [None]:
callsDataFrame.groupby('Date').count()['lat'].plot()
plt.tight_layout()

** Now recreate this plot but create 3 separate plots with each plot representing a Reason for the 911 call**

In [None]:
callsDataFrame[callsDataFrame['Reason'] == 'Traffic'].groupby('Date').count()['lat'].plot()
plt.title('Traffic')
plt.tight_layout()

In [None]:
callsDataFrame[callsDataFrame['Reason'] == 'Fire'].groupby('Date').count()['lat'].plot()
plt.title('Fire')
plt.tight_layout()

In [None]:
callsDataFrame[callsDataFrame['Reason'] == 'EMS'].groupby('Date').count()['lat'].plot()
plt.title('EMS')
plt.tight_layout()
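
The three filtered groupbys can also be expressed as one table with a column per Reason, by grouping on both Date and Reason and unstacking. A sketch on a hypothetical table (`toy` and its values are illustrative only):

```python
import pandas as pd

# Hypothetical calls with a Date and a Reason.
toy = pd.DataFrame({
    'Date': ['2015-12-10', '2015-12-10', '2015-12-11', '2015-12-11', '2015-12-11'],
    'Reason': ['EMS', 'Traffic', 'EMS', 'EMS', 'Fire'],
})

# Rows are dates, columns are reasons, cells are call counts.
daily = toy.groupby(['Date', 'Reason']).size().unstack(fill_value=0)

print(daily)
```

Calling daily.plot(subplots=True) on such a table should draw one panel per Reason.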

____
** Now let's move on to creating  heatmaps with seaborn and our data. We'll first need to restructure the dataframe so that the columns become the Hours and the Index becomes the Day of the Week. There are lots of ways to do this, but I would recommend trying to combine groupby with an [unstack](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.unstack.html) method. Reference the solutions if you get stuck on this!**

In [None]:
# Multilevel index count
callsDataFrame.groupby(by=['Day Of Week', 'Hour']).count().head()

In [None]:
callsDataFrame.groupby(by=['Day Of Week', 'Hour']).count()['Reason'].head()

In [None]:
# Matrix-style table (a pivot table is an alternative way to build this)
callsDataFrame.groupby(by=['Day Of Week', 'Hour']).count()['Reason'].unstack()

In [None]:
dayHour = callsDataFrame.groupby(by=['Day Of Week', 'Hour']).count()['Reason'].unstack()
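
For comparison, the pivot table route mentioned above can be sketched as follows, on a small hypothetical table (`toy` is made up for illustration). Both routes give a day-by-hour matrix of counts:

```python
import pandas as pd

# Hypothetical calls with the three columns used for the heatmap table.
toy = pd.DataFrame({
    'Day Of Week': ['Mon', 'Mon', 'Tue', 'Tue', 'Tue'],
    'Hour': [0, 1, 0, 0, 1],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS'],
})

# Route used in this notebook: groupby, count, then unstack the Hour level.
via_unstack = toy.groupby(by=['Day Of Week', 'Hour']).count()['Reason'].unstack()

# Alternative: pivot_table with a count aggregation.
via_pivot = toy.pivot_table(index='Day Of Week', columns='Hour',
                            values='Reason', aggfunc='count')

print(via_pivot)
```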

** Now create a HeatMap using this new DataFrame. **

In [None]:
## Day Of Week Vs Hour
sns.heatmap(dayHour)

** Now create a clustermap using this DataFrame. **

In [None]:
sns.clustermap(dayHour)

** Now repeat these same plots and operations, for a DataFrame that shows the Month as the column. **

In [None]:
dayMonth = callsDataFrame.groupby(by=['Day Of Week', 'Month']).count()['Reason'].unstack()

In [None]:
sns.heatmap(dayMonth)

In [None]:
sns.clustermap(dayMonth)

**That's It!**