In [None]:
# Data and Setup

import numpy as np 
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')

#Reading in the CSV file

df = pd.read_csv('../input/911.csv')


In [None]:
# Checking the info of the 'df'

df.info()

**The output above shows that the 911 calls dataset has 9 columns and 423909 entries. It also shows the type of each column; for example, latitude ('lat') is a float.**

In [None]:
# Checking the head of the 911 calls dataset and asking for the first 5 results

df.head(5)

**Analysing some basic aspects of the dataset:**

In [None]:
# Top five Zip Codes for the 911 calls

df['zip'].value_counts().head(5)

19401.0 and 19464.0 were the top two zip codes, with 28656 and 27948 calls to 911 respectively.
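
The trailing `.0` suggests that pandas read the `zip` column as floats, which usually happens when a numeric column contains missing values. A minimal sketch of one way to display zip codes as whole numbers, assuming that is the cause; the `zips` series below is made-up illustration data, not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Illustrative values only: two zip codes and one missing entry.
zips = pd.Series([19401.0, np.nan, 19464.0], name='zip')

# 'Int64' (capital I) is pandas' nullable integer dtype, so missing values survive the cast.
zips_int = zips.astype('Int64')

print(zips_int.value_counts())
```

On the real data the same cast would be `df['zip'].astype('Int64')`, applied before calling `value_counts()`.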

In [None]:
# Top Five townships (twp) for the 911 calls

df['twp'].value_counts().head(5)

Lower Merion township had the most calls to 911 (36441 calls).

**New variables need to be created to dig deeper into the data. The title column of the 911 calls dataset combines the department the call was made to and the reason for the call (see the example below): 'EMS' is the department and 'Back Pains/Injury' is the reason. New columns are needed to separate the 'Reasons' from the 'Departments'.**

In [None]:
df['title'].iloc[0] # shows the first instance in the title column

**Creating the 'Reasons' Column in the 911 calls dataset**

In [None]:
df['Reasons'] = df['title'].apply(lambda title: title.split(':')[1])

In [None]:
df['Reasons'].value_counts().head(5)

Vehicle Accident was the most common reason for calls to 911.

**Creating the 'Departments' Column in the 911 calls dataset**

In [None]:
df['Departments'] = df['title'].apply(lambda title: title.split(':')[0])

In [None]:
df['Departments'].value_counts()

EMS (the emergency department) was the department most often called, as the result above shows.
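
The two `apply` calls can also be written as a single vectorized split. The sketch below assumes every title has the shape `Department: Reason`, as in the first row; the `sample` frame and its titles are illustrative, not rows from the dataset:

```python
import pandas as pd

# Toy titles in the "Department: Reason" shape (values are illustrative).
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Traffic: VEHICLE ACCIDENT -',
                                 'Fire: GAS-ODOR/LEAK']})

# Split once on the first colon; expand=True returns the two parts as columns 0 and 1.
parts = sample['title'].str.split(':', n=1, expand=True)

# strip() removes the space that follows the colon.
sample['Departments'] = parts[0].str.strip()
sample['Reasons'] = parts[1].str.strip()

print(sample[['Departments', 'Reasons']])
```

Stripping whitespace matters mainly for `Reasons`, since `title.split(':')[1]` keeps the leading space after the colon.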

**The above findings can also be visualised:**

In [None]:
sns.countplot(x='Departments',data=df,palette='coolwarm')
plt.tight_layout()

**Let's focus now on the time information given in the 911 calls dataset**

In [None]:
df['timeStamp'].iloc[0] # the timeStamp is a string

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp']) # converting timeStamp into a Datetime Object

In [None]:
time = df['timeStamp'].iloc[0] # extracting first entry of the timeStamp


In [None]:
time.hour # attributes such as the hour can be read directly from a datetime object; this is the hour of the first timeStamp entry

**Let's create 3 more columns - Hour, Month, and Day of Week from the 'timeStamp' datetime object in order to further analyse the data from the 'timeStamp' aspect.**

In [None]:
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'} # 'Day of Week' is an integer from 0 to 6, which needs to be mapped to the actual day names

In [None]:
df['Day of Week'] = df['Day of Week'].map(dmap)
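
As an alternative to `apply` with a lambda, the same three columns can be derived with the `.dt` accessor, which works on the whole column at once. This is only a sketch on two made-up timestamps; `sample` is not part of the dataset:

```python
import pandas as pd

# Two illustrative timestamps, stored as strings like the raw timeStamp column.
sample = pd.DataFrame({'timeStamp': ['2015-12-10 17:40:00', '2015-12-12 08:05:00']})
sample['timeStamp'] = pd.to_datetime(sample['timeStamp'])

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# .dt exposes datetime attributes for every row without a Python-level loop.
sample['Hour'] = sample['timeStamp'].dt.hour
sample['Month'] = sample['timeStamp'].dt.month
sample['Day of Week'] = sample['timeStamp'].dt.dayofweek.map(dmap)

print(sample)
```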

**Now we can visualize these newly created variables**

In [None]:
sns.countplot(x='Day of Week',data=df,hue='Departments',palette='coolwarm')
# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

**On Friday, most 911 calls were for the EMS department. Sunday had the fewest traffic-related calls, which is plausible given the lighter traffic on the roads. Traffic-related 911 calls peak on Friday.**

In [None]:
plt.figure(figsize=(8,6))
sns.countplot(x='Month',data=df,hue='Departments',palette='coolwarm') # now for the Month column

# To relocate the legend
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

**In January, EMS-related calls increased sharply. Did the festive period (Christmas and New Year) play a role in this spike, for instance through excessive drinking leading to health-related issues? This question cannot be answered with this dataset.**

In [None]:
byMonth = df.groupby('Month').count() # DataFrame of non-null counts per column for each month
byMonth.head()

In [None]:
# Could be any column
byMonth['twp'].plot()

**911 calls increased sharply in October and then dropped steeply in November. The reason for this sudden rise and fall is not clear.**
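
One caveat about counting calls with `count()`: it counts non-null values per column, so the monthly totals only match across columns if no column has missing values. The sketch below illustrates the difference from `size()`, which counts rows; the `sample` frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data with missing values in both 'twp' and 'zip'.
sample = pd.DataFrame({'Month': [1, 1, 2, 2, 2],
                       'twp': ['A', np.nan, 'B', 'C', 'D'],
                       'zip': [19401.0, 19464.0, np.nan, np.nan, 19401.0]})

# count(): non-null values per column, so 'twp' and 'zip' can disagree.
per_column = sample.groupby('Month').count()

# size(): number of rows per group, regardless of missing values.
per_row = sample.groupby('Month').size()

print(per_column)
print(per_row)
```

If the goal is the number of calls per month, `size()` is the safer choice because it does not depend on which column is picked.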

In [None]:

sns.lmplot(x='Month',y='twp',data=byMonth.reset_index())

**The lmplot shows the spike in October followed by a sharp decrease. The error grows steadily from mid-year (June) to the end of the year (December), as Seaborn indicates with the shaded area around the line. A linear model may not be the best choice for this data.**

**Heatmaps**

In [None]:
dayHour = df.groupby(by=['Day of Week','Hour']).count()['Reasons'].unstack()
dayHour.head()

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayHour,cmap='coolwarm')
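
Because 'Day of Week' now holds strings, `groupby` sorts the rows alphabetically (Fri, Mon, Sat, ...), which makes the heatmap harder to read. A minimal sketch of reordering the rows into calendar order before plotting; `sample`, `day_hour` and `day_order` are illustrative names, and the data is made up:

```python
import pandas as pd

# Illustrative rows: one per call, with the day and hour already extracted.
sample = pd.DataFrame({'Day of Week': ['Mon', 'Mon', 'Sun', 'Fri', 'Fri', 'Fri'],
                       'Hour': [0, 1, 0, 0, 0, 1]})

# Calls per (day, hour); unstack moves Hour into the columns, filling empty cells with 0.
day_hour = sample.groupby(['Day of Week', 'Hour']).size().unstack(fill_value=0)

# Reorder the rows from Monday to Sunday, keeping only days that are present.
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
day_hour = day_hour.reindex([d for d in day_order if d in day_hour.index])

print(day_hour)
```

The reordered frame can be passed to `sns.heatmap` in the same way as `dayHour`.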