**911 Calls Dataset - Basic Work**

In this notebook, I will work on the Emergency - 911 Calls data set. The study covers basic data mining and interpretation, plus data visualisation using various libraries.

Let's start by importing the libraries that we will use

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # the plotting API lives in the pyplot submodule
%matplotlib inline

Read datafile

In [None]:
df = pd.read_csv('../input/montcoalert/911.csv')

Take a quick look at data set

In [None]:
df.info()

Check the head of dataset

In [None]:
df.head()

Let's play with the data to make it ready for analysis

Firstly, I want to see all of the categories that exist under 'title'

In [None]:
df['title'].unique()

It seems that we have three main categories: 'EMS', 'Fire' and 'Traffic'. Each title has the form 'Category: detail', so I want to split it into two separate columns called 'Category' and 'Detail'. After that I want to delete the 'title' column

In [None]:
df['Category']=df['title'].apply(lambda x: x.split(':')[0])
df['Detail']=df['title'].apply(lambda x: x.split(':')[1])
df=df.drop(columns='title')  # the positional axis argument is no longer accepted by recent pandas
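As a side note, the same split can be sketched with the vectorised string methods instead of `apply`. The rows below are made-up values in the same 'Category: detail' shape, not taken from the data set, and the `toy` and `parts` names are only for illustration. This version also strips the space that follows the colon.

```python
import pandas as pd

# Illustrative rows shaped like the 'title' column (assumed values, not from the data set)
toy = pd.DataFrame({'title': ['EMS: CARDIAC EMERGENCY',
                              'Fire: FIRE ALARM',
                              'Traffic: VEHICLE ACCIDENT']})

# Split once on the first ':' and expand the two pieces into two columns
parts = toy['title'].str.split(':', n=1, expand=True)
toy['Category'] = parts[0].str.strip()
toy['Detail'] = parts[1].str.strip()

# Drop the original column by name
toy = toy.drop(columns='title')
print(toy)
```

Using `n=1` keeps any later colons inside the detail text instead of cutting it short.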

In [None]:
df['Category'].unique()

In [None]:
df['Detail'].unique()

I also want to keep the date and time parts in separate columns so that I can visualise them in the following sections. The first thing I'll do is check the type of the timeStamp column, and then break it into day, month, year, and hour

In [None]:
type(df['timeStamp'].iloc[0])

The type is string, so I need to use the pandas function that converts strings to datetimes, and then break the timestamp into four parts

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])

In [None]:
df['Month']=df['timeStamp'].apply(lambda x: x.month)
df['Day']=df['timeStamp'].apply(lambda x: x.dayofweek)
df['Hour']=df['timeStamp'].apply(lambda x: x.hour)
df['Year']=df['timeStamp'].apply(lambda x: x.year)
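For reference, here is a minimal sketch of the same breakdown using the `.dt` accessor, which works on the whole column at once instead of calling a lambda per row. The two timestamps are made-up examples and `toy` is a hypothetical frame, not the real data.

```python
import pandas as pd

# Two made-up timestamps stored as strings, like the raw 'timeStamp' column
toy = pd.DataFrame({'timeStamp': ['2015-12-10 17:40:00', '2016-03-04 08:15:00']})
toy['timeStamp'] = pd.to_datetime(toy['timeStamp'])

# The .dt accessor exposes the datetime components for the whole column
toy['Month'] = toy['timeStamp'].dt.month
toy['Day'] = toy['timeStamp'].dt.dayofweek  # Monday=0 ... Sunday=6
toy['Hour'] = toy['timeStamp'].dt.hour
toy['Year'] = toy['timeStamp'].dt.year
print(toy)
```

On a large data set this is usually noticeably faster than `apply`, and it gives the same integer columns.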

In [None]:
df.head()

Finally, let's map the day and month numbers to their names

In [None]:
days = {0: 'Mon', 1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
df['Day']=df['Day'].map(days)
months = {1:'Jan',2:'Feb',3:'Mar',4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sep',10:'Oct',11:'Nov',12:'Dec'}
df['Month']=df['Month'].map(months)
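One optional refinement, sketched below on toy values: once the numbers become plain strings, pandas sorts them alphabetically ('Fri' before 'Mon'). Storing the names as an ordered categorical keeps the calendar order in counts and group-bys. The `day_order` list and the `toy` frame are assumptions for illustration only.

```python
import pandas as pd

day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
days = dict(enumerate(day_order))  # {0: 'Mon', 1: 'Tue', ...}

# Made-up day-of-week numbers standing in for the real column
toy = pd.DataFrame({'Day': [6, 0, 4, 4, 2]})

# Map numbers to names, then fix the order of the names explicitly
toy['Day'] = pd.Categorical(toy['Day'].map(days), categories=day_order, ordered=True)

# Sorting by index now follows Mon..Sun instead of the alphabet
counts = toy['Day'].value_counts().sort_index()
print(counts)
```

Days that never occur still appear with a count of zero, which is handy when a plot should always show all seven days.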

I think I now have the data structure that I need for my analysis. I will continue with some basic analyses

**Analysis time!!**

Let's see the top 5 zip codes and townships by number of cases

In [None]:
df['zip'].value_counts().head(5)

In [None]:
df['twp'].value_counts().head(5)

Let's find out the days and months in which the most calls are made

In [None]:
df['Day'].value_counts()

In [None]:
df['Month'].value_counts()

Let's also find out the top call categories and details

In [None]:
df['Category'].value_counts()

In [None]:
df['Detail'].value_counts().head(5)

**Data visualisation time**

Number of calls per year - we have to first group the number of calls by year and then make a plot

In [None]:
byYear = df.groupby('Year').count()
byYear['twp'].plot()
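A small caveat on the grouping step, shown as a sketch on made-up rows: `count()` counts the non-null values in each column, so `byYear['twp']` is the number of calls with a known township, while `size()` counts every row. The `toy` frame and its township labels are invented for the example.

```python
import pandas as pd

# Made-up calls; one of the 2016 rows has no township recorded
toy = pd.DataFrame({'Year': [2015, 2016, 2016, 2017, 2017, 2017],
                    'twp': ['A', 'B', None, 'A', 'C', 'B']})

calls_per_year = toy.groupby('Year').size()         # every row counts
known_twp_per_year = toy.groupby('Year')['twp'].count()  # only non-null 'twp' values

print(calls_per_year)
print(known_twp_per_year)
```

If the column used for counting has missing values, `size()` is the safer choice for "number of calls".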

Let's take a look at yearly call categories

In [None]:
sns.countplot(x='Year',hue='Category',data=df)

Let's also look at monthly and daily split if we can see any trends there

In [None]:
sns.countplot(x='Day',hue='Category',data=df,order=['Mon','Tue','Wed','Thu','Fri','Sat','Sun'])

In [None]:
sns.countplot(x='Month',hue='Category',data=df,order=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])

The daily stats show that traffic-related calls increase towards the end of the working week and decrease during the weekend, while the other call reasons stay more or less the same. There are no obvious monthly trends, except for spikes in Jan, Mar, and Oct for traffic-related calls

As a final step, we will take a look at four different analyses using heatmaps:
* The call distribution per day and hour
* The call distribution per month and day
* The call distribution per hour and category
* The call distribution per day and category

This way, we will be able to see the likelihood of getting a call on a given day or at a given hour from 4 different perspectives.

For the following sections, we will first restructure the dataframe according to the two dimensions we are investigating
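The restructuring step can be sketched on a few made-up rows: group by the two dimensions, count the rows, and unstack the second dimension into columns so that each cell holds a call count. The `toy` frame, `grid`, and `day_order` are hypothetical names for this illustration; the reindex at the end is an optional extra that puts the rows in calendar order.

```python
import pandas as pd

day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

# Made-up calls with only the two dimensions of interest
toy = pd.DataFrame({'Day': ['Mon', 'Mon', 'Fri', 'Sun', 'Fri', 'Mon'],
                    'Hour': [8, 17, 17, 8, 17, 8]})

# Long format (one row per call) -> wide grid (rows = Day, columns = Hour)
grid = toy.groupby(['Day', 'Hour']).size().unstack(fill_value=0)

# Optional: order rows Mon..Sun and add missing days as rows of zeros
grid = grid.reindex(day_order, fill_value=0)
print(grid)
```

A grid in this shape is what `sns.heatmap` and `sns.clustermap` expect as input.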

**Day and Hour Heatmap and Clustermap**

In [None]:
dH = df.groupby(by=['Day','Hour']).count()['Category'].unstack()
dH.head()

In [None]:
sns.heatmap(dH)

In [None]:
sns.clustermap(dH)

So it seems that most 911 calls take place on working days between 15-17h

**Month and Day Heatmap and Clustermap**

In [None]:
dM = df.groupby(by=['Day','Month']).count()['Category'].unstack()
dM.head()

In [None]:
sns.heatmap(dM)

In [None]:
sns.clustermap(dM)

There is no significant observation here, except that Fridays in March seem to be a busy period

**Category and Hour Heatmap and Clustermap**

In [None]:
cH = df.groupby(by=['Category','Hour']).count()['Detail'].unstack()
cH.head()

In [None]:
sns.heatmap(cH)

In [None]:
sns.clustermap(cH)

While it seems that the calls for fire are not affected by the hour that much, we can see that calls regarding traffic are mostly made between 15-17h and EMS-related calls between 10-13h
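One possible cross-check, sketched here on invented numbers: a category with far fewer calls can look flat on a shared colour scale even if it follows the same hourly pattern. Dividing each row by its total turns counts into shares, so every category is compared on its own scale. The `counts` table below is made up and does not reflect the real data.

```python
import pandas as pd

# Invented counts: rows are categories, columns are hours
counts = pd.DataFrame([[10, 30, 60],
                       [1, 3, 6]],
                      index=['Traffic', 'Fire'],
                      columns=[8, 12, 17])

# Divide each row by its own total so every row sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```

In this toy table both rows have identical shares even though the raw counts differ by a factor of ten. Passing a table like `shares` to `sns.heatmap` would show whether the same holds for the real categories.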

**Day and Category Heatmap and Clustermap**

In [None]:
cC=df.groupby(by=['Category','Day']).count()['Detail'].unstack()
cC.head()

In [None]:
sns.heatmap(cC)

In [None]:
sns.clustermap(cC)

The trend we see here is even less significant in terms of interpretation. The number of calls received for fire events is not affected by the day, while we can see that most of the calls for EMS and Traffic are still made on weekdays

**Conclusion**

This notebook was created to take a look at 911 calls, and I used only very basic libraries and commands. I hope it will encourage those who are at the beginning of their journeys to take a step further and help them realise how much can be done even with a small data set and a small set of commands. Hope you liked it!