Hello,

This is a walkthrough of the 911 calls dataset. As we go, we'll interactively visualize the data and try to find some interesting facts in it!

## Data and Setup

____
** Importing numpy and pandas **

In [None]:
import numpy as np
import pandas as pd

** Importing visualization libraries **

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

** Reading the csv file as a dataframe called df **

In [None]:
df= pd.read_csv('../input/911.csv')

** Checking the info() of the df **

In [None]:
df.info()

** Checking the head of df **

In [None]:
df.head()

## Let's retrieve some basic facts from the data

** What are the top 5 zipcodes for 911 calls? **

In [None]:
df['zip'].value_counts().head(5)

** What are the top 5 townships (twp) for 911 calls? **

In [None]:
df['twp'].value_counts().head(5)

** How many unique problems (titles) were calls received for? **

In [None]:
df['title'].nunique()
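
To make the three queries above concrete, here is a minimal sketch on a tiny made-up frame. The zip codes and titles are hypothetical sample values, not rows taken from the real file; only the column names `zip` and `title` come from the dataset.

```python
import pandas as pd

# Hypothetical sample: six calls with made-up zip codes and titles.
calls = pd.DataFrame({
    'zip': [19401, 19401, 19401, 19464, 19464, 19403],
    'title': ['EMS: FALL VICTIM', 'EMS: FALL VICTIM', 'Fire: FIRE ALARM',
              'Traffic: VEHICLE ACCIDENT', 'EMS: FALL VICTIM', 'Fire: FIRE ALARM'],
})

# value_counts() sorts by frequency (most common first); head(2) keeps the top two.
top_zips = calls['zip'].value_counts().head(2)

# nunique() counts distinct values, ignoring NaN.
n_titles = calls['title'].nunique()

print(top_zips)
print(n_titles)
```

`value_counts()` skips missing values by default, so rows without a zip code do not show up in the ranking.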

## Creating new features

** In the title column, a "Reason/Department" is specified before the title code. We can create a new feature 'reason' which stores the 3 major categories of incidents: 'EMS', 'Traffic', and 'Fire'. **

In [None]:


# Keep only the text before the first colon of each title
df['reason'] = df['title'].apply(lambda x: x.split(':')[0])
df
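
Here is a small sketch of the same extraction on a few sample titles, assuming every title follows the "Reason: description" pattern. The sample strings are illustrative, and the vectorized `.str` accessor is shown as an alternative to `apply`.

```python
import pandas as pd

# Illustrative titles, assumed to follow the "Reason: description" pattern.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
])

# Split each title on ':' and keep the first piece.
reasons = titles.str.split(':').str[0]

print(reasons.tolist())
```

Unlike the `lambda` version, the `.str` accessor passes missing titles through as NaN instead of raising an error.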

** What is the most common reason for a 911 call, based on this 'reason' column? **

In [None]:
df['reason'].value_counts()

** It is very clear that most 911 calls were for 'EMS', i.e. health emergencies, followed by 'Traffic' and 'Fire'. **

** Let's Visualize the data **

In [None]:
df['reason'].iplot(kind='histogram', size=8)

___
** Now let's extract some information about the day and time each call was made **

In [None]:
type(df['timeStamp'].iloc[0])

** You must have noticed that these timestamps are still strings. Let's convert the column from strings to DateTime objects. **

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])

In [None]:
df['hour'] = df['timeStamp'].dt.hour
df['month'] = df['timeStamp'].dt.month
df['dayofweek'] = df['timeStamp'].dt.dayofweek
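
The conversion and feature extraction can be checked on a couple of timestamps. This is a minimal sketch with two made-up timestamp strings; it uses the `.dt` accessor, which reads the same fields as calling `.hour`, `.month`, and `.dayofweek` on each value.

```python
import pandas as pd

# Two made-up timestamp strings in a 'YYYY-MM-DD HH:MM:SS' layout.
stamps = pd.Series(['2015-12-10 17:40:00', '2016-01-02 08:05:00'])
print(type(stamps.iloc[0]))   # still plain Python strings at this point

# Parse the strings into datetime64 values.
parsed = pd.to_datetime(stamps)

# Pull out the calendar fields; dayofweek counts from Monday=0 to Sunday=6.
parts = pd.DataFrame({
    'hour': parsed.dt.hour,
    'month': parsed.dt.month,
    'dayofweek': parsed.dt.dayofweek,
})
print(parts)
```

2015-12-10 was a Thursday and 2016-01-02 a Saturday, so `dayofweek` comes out as 3 and 5.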


** Notice that the day of week is an integer from 0 to 6. We'll map these integers to their corresponding day names using a dictionary like this: **

    dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}

In [None]:
df['dayofweek']=df['dayofweek'].map(dmap)
df
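
One thing to watch for in a notebook: running the mapping cell a second time maps the day names (which are not keys of `dmap`) to NaN. The sketch below shows this on three made-up dates, along with `dt.day_name()` as an alternative way to get the same labels straight from the timestamps.

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Three made-up dates: a Thursday, a Saturday and a Monday.
days = pd.Series(pd.to_datetime(['2015-12-10', '2015-12-12', '2015-12-14']))

# Integer day of week -> short name via the dictionary.
mapped = days.dt.dayofweek.map(dmap)

# Alternative: full English day names, truncated to three letters.
named = days.dt.day_name().str[:3]

# Mapping a second time finds no matching keys, so every value becomes NaN.
remapped = mapped.map(dmap)

print(mapped.tolist(), named.tolist(), remapped.isna().all())
```

If the column turns into NaN after a re-run, recompute `dayofweek` from `timeStamp` before mapping again.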

** Now let's see how many calls were made for each reason on each day of the week **

In [None]:
sns.countplot(x='dayofweek',hue='reason',data=df,palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2,borderaxespad=0)

**Now do the same for Month:**

In [None]:
sns.countplot(x='month',hue='reason',data=df,palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0)


**Did you notice something strange about the Plot?**

_____

** You may have noticed that some months are missing. Let's see whether plotting the data another way, such as a simple line plot, helps bridge the gap left by the missing months. **

In [None]:
callsByMonth=df.groupby('month').count()
callsByMonth.head()


** Now create a simple plot from the dataframe showing the count of calls per month. **

In [None]:
callsByMonth['reason'].iplot(kind='line')


**This looks much better. Keep in mind that the line plot does not recover the missing data; it simply connects the months on either side of the gap.**
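
To see exactly which months are absent rather than reading it off a plot, one option is to reindex the monthly counts over all twelve months. This is a sketch with hypothetical counts; the numbers and the gap are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly call counts; months 9, 10 and 11 have no rows at all.
counts = pd.Series(
    [130, 115, 110, 112, 114, 117, 120, 95, 80],
    index=[1, 2, 3, 4, 5, 6, 7, 8, 12],
    name='calls',
)

# Reindex over months 1..12; months with no data show up as NaN.
full = counts.reindex(range(1, 13))

# List the months that had no data.
missing = full[full.isna()].index.tolist()
print(missing)
```

Plotting `full` instead of `counts` leaves a visible break in the line where data is missing.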

**Let's create a new column called 'date' that contains the date from the timeStamp column.**

In [None]:
df['date']=df['timeStamp'].apply(lambda x: x.date())


** Let's now visually analyze the number of 911 calls made on each date **

In [None]:
callsByDate=df.groupby('date').count()['reason']
callsByDate.iplot(kind='line',size=8)


** Now let's look at the calls by date for each reason **

In [None]:
df[df['reason']=='Traffic'].groupby('date').count()['twp'].iplot(kind='line',size=8,title='Traffic')


In [None]:
df[df['reason']=='Fire'].groupby('date').count()['twp'].iplot(kind='line',size=8,title='Fire')


In [None]:
df[df['reason']=='EMS'].groupby('date').count()['twp'].iplot(kind='line',size=8,title='EMS')
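
The three filtered plots above can also be backed by a single table of daily counts per reason. Here is a minimal sketch using `pd.crosstab` on a made-up frame; the dates and reasons are hypothetical sample rows.

```python
import pandas as pd
from datetime import date

# Hypothetical sample: five calls over two days.
calls = pd.DataFrame({
    'date': [date(2016, 1, 1), date(2016, 1, 1), date(2016, 1, 1),
             date(2016, 1, 2), date(2016, 1, 2)],
    'reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
})

# One row per date, one column per reason, cells hold call counts.
daily = pd.crosstab(calls['date'], calls['reason'])
print(daily)
```

Unlike `count()['twp']`, this counts every row, including calls whose township is missing, and days with no calls for a reason show 0.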


____
**Let's see if heatmaps can give us some more insights about our data. To do so, we'll have to restructure our dataframe's index and columns first.**

In [None]:
newData= df.groupby(['dayofweek','hour']).count()['reason'].unstack()
newData
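
Here is a small sketch of what the `groupby` plus `unstack` step produces, on a made-up frame. It also shows that the day labels come out in alphabetical order, and one possible way to put them back into calendar order with `reindex`. The sample rows and the `week` list are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: six calls across three days and two hours.
calls = pd.DataFrame({
    'dayofweek': ['Mon', 'Mon', 'Tue', 'Sun', 'Sun', 'Sun'],
    'hour': [9, 9, 17, 9, 17, 17],
    'reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS', 'EMS'],
})

# Count calls per (day, hour), then pivot hours into columns.
grid = calls.groupby(['dayofweek', 'hour']).count()['reason'].unstack()
print(grid.index.tolist())   # group keys are sorted alphabetically

# Reorder rows Monday..Sunday; absent days and empty cells become 0.
week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
ordered = grid.reindex(week).fillna(0).astype(int)
print(ordered)
```

Reordering before plotting keeps the weekend rows next to each other, which makes weekday and weekend patterns easier to compare.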

In [None]:
newData.iplot(kind='heatmap',size=8,colorscale='RdYlBu',title='Total Calls received by hours')

** The heatmap above gives a clear picture: the number of calls decreases slightly on weekends and is generally higher in the afternoon and early evening, i.e. from 1200 hrs to 1800 hrs. **

** Now let's get a similar view, but this time comparing calls by day of week and month **

In [None]:
newData2= df.groupby(['dayofweek','month']).count()['reason'].unstack()
newData2


In [None]:
newData2.iplot(kind='heatmap',size=8,colorscale='RdYlBu',title='Total Calls received by days and months')

# Thanks for visiting!