This is a work-in-progress project; I am continuing to update this notebook.

**Setup**

Load Packages and Data

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('whitegrid')
%matplotlib inline
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

In [None]:
df = pd.read_csv('../input/911.csv')

**Part I: Data**

First, let us take a look at how many rows and columns the data has and what the data types are.

In [None]:
df.info()

In [None]:
df.head(5)

Notice that the 'timeStamp' column has the pandas object dtype (its values are Python str objects). Let us keep this in mind.

**Basic Questions**

What are the top 5 zipcodes for 911 calls?

In [None]:
df['zip'].value_counts().head()

What are the top 5 townships (twp) for 911 calls?

In [None]:
df['twp'].value_counts().head()
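
Both questions above rely on value_counts(), which sorts by count in descending order and drops missing values by default. The sketch below illustrates this on a made-up Series (the zip codes are invented for illustration and are not taken from 911.csv). If df.info() shows fewer non-null values than rows for a column, calls with a missing value are not reflected in its top 5.

```python
import numpy as np
import pandas as pd

# Made-up zip codes with one missing value (illustration only).
zips = pd.Series([19401.0, 19401.0, 19464.0, np.nan, 19401.0, 19464.0, 19403.0], name='zip')

# Default behaviour: sorted by count, NaN excluded.
top = zips.value_counts()

# dropna=False keeps the missing values as their own bucket.
with_missing = zips.value_counts(dropna=False)

print(top.head())
print(with_missing)
```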

**Creating new features**

Separate the reason code from the 'title' column and create a new column called 'Reason'.

In [None]:
df['Reason'] = df['title'].apply(lambda x: x.split(':')[0])

In [None]:
df['Reason'].value_counts()
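
To make the split step concrete, here is a minimal sketch on three made-up titles (assumed to follow a "Reason: detail" pattern; they are illustrations, not rows from 911.csv). It also shows the vectorized .str accessor as one possible alternative to apply with a lambda.

```python
import pandas as pd

# Made-up titles in an assumed "Reason: detail" format.
titles = pd.Series([
    'EMS: BACK PAINS/INJURY',
    'Fire: GAS-ODOR/LEAK',
    'Traffic: VEHICLE ACCIDENT -',
])

# Same approach as in the notebook: keep the text before the first colon.
via_apply = titles.apply(lambda x: x.split(':')[0])

# Vectorized alternative using the .str accessor.
via_str = titles.str.split(':').str[0]

print(list(via_apply))
print(list(via_str))
```

One difference worth knowing: the lambda raises an error on a missing title, while the .str version passes the missing value through.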

In [None]:
sns.countplot(x = 'Reason', data = df, palette = 'viridis')

Next, let us find out at what times and on which days of the week people made 911 calls.

In order to do that, we need to first convert the 'timeStamp' column from strings to DateTime objects.

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
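
As a quick check of what this conversion buys us, here is a minimal sketch on two made-up timestamp strings (the string format is an assumption, and the values are not taken from 911.csv). After pd.to_datetime, each value is a Timestamp with attributes such as hour and dayofweek.

```python
import pandas as pd

# Made-up timestamps stored as plain strings (illustration only).
sample = pd.DataFrame({'timeStamp': ['2015-12-10 17:40:00', '2015-12-11 08:05:30']})

# Before conversion the column does not hold datetimes.
is_dt_before = pd.api.types.is_datetime64_any_dtype(sample['timeStamp'])

sample['timeStamp'] = pd.to_datetime(sample['timeStamp'])
is_dt_after = pd.api.types.is_datetime64_any_dtype(sample['timeStamp'])

# Individual values are now Timestamp objects.
first = sample['timeStamp'].iloc[0]
print(is_dt_before, is_dt_after, first.hour, first.dayofweek)
```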

In [None]:
# The .dt accessor extracts datetime components for the whole column at once
df['Hour'] = df['timeStamp'].dt.hour
df['Month'] = df['timeStamp'].dt.month
df['Day of Week'] = df['timeStamp'].dt.dayofweek

In [None]:
df['Day of Week'].head()

In [None]:
dmap = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'}
df['Day of Week'] = df['Day of Week'].map(dmap)
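
The mapping above depends on pandas numbering the days from Monday = 0 to Sunday = 6. Here is a small sketch that checks this on three dates whose weekdays are known, and compares the result with dt.day_name() as one possible alternative (the dates are chosen only for illustration).

```python
import pandas as pd

# 2015-12-07 was a Monday, 2015-12-10 a Thursday, 2015-12-13 a Sunday.
ts = pd.Series(pd.to_datetime(['2015-12-07', '2015-12-10', '2015-12-13']))

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
mapped = ts.dt.dayofweek.map(dmap)

# Alternative: full English day names, shortened to three letters.
names = ts.dt.day_name().str[:3]

print(list(mapped))
print(list(names))
```

When plotting, passing order=list(dmap.values()) to sns.countplot would keep the bars in calendar order rather than in order of first appearance in the data.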

In [None]:
sns.countplot(x = 'Day of Week', data = df)
plt.title('911 Calls per Day of Week')

In [None]:
sns.countplot(x = 'Day of Week', data = df, hue = 'Reason')
plt.legend(bbox_to_anchor = (1,1))

We can see that Sunday has the fewest calls of any day of the week, for all three types of calls.

Now let us do the same for Month.

In [None]:
sns.countplot(x='Month',data = df)

In [None]:
sns.countplot(x='Month',data = df,hue = 'Reason')
plt.legend(bbox_to_anchor = (1,1))

In [None]:
byMonth = df.groupby('Month').count()
byMonth.head()

In [None]:
byMonth['twp'].plot()
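
A note on what byMonth['twp'] measures: groupby(...).count() counts the non-null values in each column, so the 'twp' line is the number of calls with a recorded township, not necessarily the number of rows. The sketch below shows the difference from size() on a made-up frame (the values are invented for illustration).

```python
import numpy as np
import pandas as pd

# Made-up calls with some missing townships and zip codes.
calls = pd.DataFrame({
    'Month': [1, 1, 1, 2, 2],
    'twp': ['A', None, 'B', 'A', 'C'],
    'zip': [19401.0, 19401.0, np.nan, np.nan, 19403.0],
})

# count(): non-null values per column, so columns can disagree.
by_month = calls.groupby('Month').count()

# size(): number of rows per group, regardless of missing values.
sizes = calls.groupby('Month').size()

print(by_month)
print(sizes)
```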

November and December have the lowest number of calls. Is there a linear relationship between month and the number of calls?

In [None]:
sns.lmplot(x = 'Month', y='twp', data = byMonth.reset_index())
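
The lmplot draws a fitted regression line; to put numbers on it, one option is to compute the slope and the Pearson correlation directly. This is a minimal sketch with made-up monthly counts (not the values in byMonth), using np.polyfit and np.corrcoef.

```python
import numpy as np

months = np.arange(1, 13)

# Made-up monthly call counts (illustration only).
counts = np.array([980, 965, 930, 925, 900, 870, 880, 840, 815, 800, 790, 760])

# Least-squares straight line: counts is approximated by slope * month + intercept.
slope, intercept = np.polyfit(months, counts, deg=1)

# Pearson correlation between month and count.
r = np.corrcoef(months, counts)[0, 1]

print(f'slope={slope:.1f} calls per month, r={r:.3f}')
```

With the real data, the same two lines could be run on byMonth.index and byMonth['twp'].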

Create a new column called "Date" that contains the date from the timeStamp column.

In [None]:
df['Date'] = df['timeStamp'].dt.date

In [None]:
df.head()

Group by the Date column and aggregate with count() to plot the number of 911 calls per day.

In [None]:
df.groupby('Date').count()['twp'].plot.line(figsize = (15,4))
plt.tight_layout()
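
One thing to keep in mind with grouping by Date: a day with no calls simply has no row, so the line skips over it. The sketch below, on three made-up calls, compares this with resample('D'), which is one possible alternative that reports such days as zero.

```python
import pandas as pd

# Three made-up calls; no call on 2015-12-11.
stamps = pd.to_datetime(['2015-12-10 17:40', '2015-12-10 18:00', '2015-12-12 09:15'])
calls = pd.DataFrame({'timeStamp': stamps, 'twp': ['A', 'B', 'A']})
calls['Date'] = calls['timeStamp'].dt.date

# Same approach as in the notebook: only days that have calls appear.
per_day = calls.groupby('Date').count()['twp']

# Alternative: resample on a datetime index, empty days count as 0.
per_day_full = calls.set_index('timeStamp')['twp'].resample('D').count()

print(per_day)
print(per_day_full)
```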

Separate the plot above into three plots, one for each Reason for 911 calls.

In [None]:
df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot.line(figsize = (15,4))
plt.title('Traffic')
plt.tight_layout()

In [None]:
df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot.line(figsize = (15,4))
plt.title('Fire')
plt.tight_layout()

In [None]:
df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot.line(figsize = (15,4))
plt.title('EMS')
plt.tight_layout()
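
The three cells above repeat the same filter, group and plot steps. As a sketch of one way to get all three series in a single table, the daily counts can be grouped by Date and Reason together and then unstacked. The frame below is made up for illustration and is not taken from 911.csv.

```python
from datetime import date

import pandas as pd

d1, d2 = date(2015, 12, 10), date(2015, 12, 11)

# Made-up calls (illustration only).
calls = pd.DataFrame({
    'Date': [d1, d1, d1, d2, d2],
    'Reason': ['EMS', 'EMS', 'Traffic', 'Fire', 'EMS'],
})

# One row per date, one column per reason; missing combinations become 0.
per_reason = calls.groupby(['Date', 'Reason']).size().unstack(fill_value=0)

print(per_reason)
```

With the real data, per_reason.plot(subplots=True, figsize=(15, 8)) would draw one panel per Reason. Note that size() counts rows, whereas the cells above count non-null 'twp' values.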