# 911 Calls Data Analysis

In this exploratory data analysis, we analyze 911 call data. The data contains the following fields:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address
* e: String variable, Dummy variable (always 1)

## Data and Setup
____

### Importing Libraries

In [None]:
# Import statements
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Suppress all warnings to keep the notebook output clean
warnings.filterwarnings('ignore')

### Reading Dataset

In [None]:
dataset = pd.read_csv('../input/911.csv')
dataset.head()

Let's check the column types and non-null counts with `info()`.

In [None]:
dataset.info()

- The dataset has 99,492 entries in total
- The columns zip, twp, and addr contain missing values
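`info()` reports non-null counts, so you have to work out the number of missing values per column by subtraction. Here is a minimal sketch that counts them directly with `isnull().sum()`. It uses a small hypothetical DataFrame in place of `911.csv`, and the values are invented, not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for 911.csv (values are invented)
calls = pd.DataFrame({
    "lat": [40.30, 40.26, 40.12, 40.11],
    "zip": [19525.0, np.nan, 19401.0, np.nan],
    "twp": ["TWP A", "TWP B", None, "TWP C"],
    "addr": ["ADDR 1", None, "ADDR 3", "ADDR 4"],
})

# isnull() marks missing cells (NaN or None); sum() counts them per column
missing = calls.isnull().sum()
print(missing)

# mean() of the same mask gives the fraction of rows missing each field
print(calls.isnull().mean())
```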

## Data Analysis
------

In [None]:
dataset.apply(lambda x:x.nunique())

- Calls are from 104 different zip codes
- Calls are made for 110 different reasons
- Calls are from 68 different Townships
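`nunique()` skips missing values by default. The zip code and township counts above therefore leave out the rows where those fields are blank.

The sketch below uses a hypothetical toy column to show:

- the default behavior;
- the `dropna=False` variant;
- that `DataFrame.nunique()` gives the same per-column counts as the `apply` call used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy zip column: one repeated value and one missing value
zips = pd.Series([19525.0, 19446.0, 19446.0, np.nan])

# Default: NaN is dropped before counting distinct values
print(zips.nunique())              # 2
# dropna=False counts NaN as one extra distinct value
print(zips.nunique(dropna=False))  # 3

df = pd.DataFrame({"zip": zips, "e": [1, 1, 1, 1]})
# DataFrame.nunique() matches applying Series.nunique to every column
same = df.nunique().tolist() == df.apply(lambda x: x.nunique()).tolist()
print(same)  # True
```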

Let's check the number of calls for each reason (title):

In [None]:
dataset.title.value_counts()

Each title begins with a category prefix followed by a colon. This divides the calls into three main categories:
- EMS
- Fire
- Traffic

Let's confirm this by extracting the prefix into a new `type` column and counting the values in each category.

In [None]:
dataset['type'] = dataset.title.apply(lambda x: x.split(':')[0])
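The split assumes every title has the form `CATEGORY: detail`. A title without a colon would simply be kept whole. Here is a sketch of the same step using pandas' vectorized string methods on hypothetical titles:

```python
import pandas as pd

# Hypothetical titles; the last one has no colon to show the fallback
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT",
    "UNKNOWN",
])

# Vectorized equivalent of titles.apply(lambda x: x.split(':')[0])
types = titles.str.split(":").str[0]
print(types.tolist())  # ['EMS', 'Fire', 'Traffic', 'UNKNOWN']
```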

In [None]:
dataset.type.value_counts()

- Most calls are for EMS (Emergency Medical Services)

In [None]:
plt.figure(figsize=(7,4),dpi=100)
sns.countplot(x=dataset.type)
plt.title("Call Distribution by type")

In [None]:
type(dataset.timeStamp[0])

The timestamps are stored as strings, so we convert them to datetime objects with `pd.to_datetime`.

In [None]:
dataset.timeStamp = pd.to_datetime(dataset.timeStamp)
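The format is known (YYYY-MM-DD HH:MM:SS), so it can be passed to `pd.to_datetime` explicitly instead of being inferred. The sketch uses hypothetical timestamp strings. It also shows `errors='coerce'`, which turns unparseable entries into `NaT` instead of raising an error:

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S"  # matches the YYYY-MM-DD HH:MM:SS layout

# Hypothetical timestamp strings
raw = pd.Series(["2015-12-10 17:40:00", "2016-01-01 00:05:01"])
ts = pd.to_datetime(raw, format=FMT)
print(ts)

# errors="coerce" replaces strings that do not match the format with NaT
bad = pd.to_datetime(pd.Series(["2016-01-01 00:05:01", "not a date"]),
                     format=FMT, errors="coerce")
print(bad.isna().tolist())  # [False, True]
```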

In [None]:
#create three new columns for Month, Hour, Day
dataset['Month'] = dataset.timeStamp.apply(lambda x:x.month)
dataset['Hour'] = dataset.timeStamp.apply(lambda x:x.hour)
dataset['Day'] = dataset.timeStamp.apply(lambda x:x.dayofweek)
dataset.head()
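The `apply(lambda ...)` calls above work row by row. pandas also exposes the same fields through the vectorized `.dt` accessor. Here is a sketch on two hypothetical timestamps:

```python
import pandas as pd

# Hypothetical timestamps: a Thursday and a Saturday
ts = pd.Series(pd.to_datetime(["2015-12-10 17:40:00", "2016-01-02 03:15:00"]))

# .dt extracts each field for the whole column at once
parts = pd.DataFrame({
    "Month": ts.dt.month,
    "Hour": ts.dt.hour,
    "Day": ts.dt.dayofweek,  # Monday=0 ... Sunday=6
})
print(parts)
```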

`dayofweek` is an integer (0 = Monday through 6 = Sunday), so we map it to abbreviated day names.

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
dataset['Day'] = dataset['Day'].map(dmap)
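Once mapped to strings, the day names sort alphabetically by default (Fri, Mon, Sat, ...). One option is to store them as an ordered categorical, sketched below on hypothetical day numbers. pandas then keeps Mon to Sun order in `value_counts(sort=False)`. seaborn should also follow the category order when no explicit `order=` is given.

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
day_num = pd.Series([3, 5, 0, 3])  # hypothetical dayofweek values

# map() replaces each integer with its label, as in the cell above
day = day_num.map(dmap)

# Ordered categorical: categories follow dmap's Mon..Sun insertion order
day = pd.Series(pd.Categorical(day, categories=list(dmap.values()), ordered=True))

# sort=False keeps category order and includes days with zero calls
counts = day.value_counts(sort=False)
print(counts)
```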

In [None]:
plt.figure(figsize=(7,4),dpi=100)
sns.countplot(x=dataset.Day)
plt.title('Number of Calls per Day of Week')

- Tuesday has the most calls
- Saturday and Sunday have relatively fewer calls

In [None]:
# Number of calls per day of week, split by call type
plt.figure(figsize=(7,4),dpi=100)
sns.countplot(x=dataset.Day,hue=dataset.type)
plt.title('Number of Calls per Day of Week by Type')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

- EMS is the most common call type on every day of the week
- Fire calls are roughly constant across all days
- Traffic calls drop on Saturday and Sunday
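The hue countplot is a picture of a day-by-type contingency table. The sketch below builds that table with `pd.crosstab` on hypothetical calls and normalizes each row. Row shares help separate "fewer Traffic calls on weekends" from "fewer calls overall on weekends".

```python
import pandas as pd

# Hypothetical calls with a day label and a call type
calls = pd.DataFrame({
    "Day": ["Mon", "Mon", "Sat", "Sat", "Sun"],
    "type": ["EMS", "Traffic", "EMS", "EMS", "Fire"],
})

# Counts behind the hue countplot: one row per day, one column per type
table = pd.crosstab(calls["Day"], calls["type"])
print(table)

# Divide each row by its total to get the share of each type per day
shares = table.div(table.sum(axis=1), axis=0)
print(shares)
```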

In [None]:
# No of calls by Month
plt.figure(figsize=(7,4),dpi=100)
sns.countplot(x=dataset.Month,hue=dataset.type)
plt.title('No of Calls by Month')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

- January has the most calls for EMS, Fire, and Traffic
- Call counts drop in August and December

**Some months have no bars in the plot above because the data contains no calls from those months. A simple line plot of the monthly counts connects the months on either side of the gap, which makes the trend easier to follow.**

In [None]:
byMonth = dataset.groupby('Month').count()
byMonth
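`groupby('Month').count()` counts non-null values in every column. `byMonth.lat` equals the number of calls only because `lat` has no missing values: the `info()` output above lists missing values only in zip, twp, and addr. By contrast, `size()` counts rows directly.

The line plot also connects the months that exist rather than marking the ones that are absent. The sketch below uses hypothetical data and reindexes over all twelve months, so the gaps appear explicitly as zeros:

```python
import pandas as pd

# Hypothetical calls covering only a few months of the year
calls = pd.DataFrame({
    "Month": [1, 1, 2, 8, 12, 12, 12],
    "lat": [40.1, 40.2, 40.3, 40.1, 40.2, 40.3, 40.4],
})

# size() counts rows per month, independent of missing values in any column
per_month = calls.groupby("Month").size()

# Reindex over months 1..12 so months with no calls show up as 0
full_year = per_month.reindex(range(1, 13), fill_value=0)
print(full_year)
```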

In [None]:
# Plotting Calls per month using Line Plot
plt.figure(figsize=(7,4),dpi=100)
plt.title('Number of Calls per Month')
byMonth.lat.plot()

Now let's fit a linear regression line to the number of calls per month.

In [None]:
# 'height' replaced the deprecated 'size' argument in seaborn 0.9
g = sns.lmplot(x='Month',y='lat',markers='x',data=byMonth.reset_index(),height=8)
plt.title('Linear Fit to the Number of Calls per Month')
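`lmplot` fits an ordinary least-squares line (degree 1 by default) and shades a bootstrapped confidence band around it. You can reproduce the line itself with `np.polyfit`. The sketch below uses hypothetical monthly counts, not the dataset's. With only the months present in the data, the fit rests on a handful of points, so the slope is a rough trend at best.

```python
import numpy as np

# Hypothetical monthly call counts (x = month number, y = number of calls)
months = np.array([1, 2, 3, 4, 5])
calls = np.array([130.0, 120.0, 110.0, 100.0, 90.0])

# Degree-1 least-squares fit: calls ~ slope * month + intercept
slope, intercept = np.polyfit(months, calls, 1)
print(slope, intercept)  # slope ~ -10, intercept ~ 140

# Predicted calls for month 6 from the fitted line
print(slope * 6 + intercept)  # ~ 80
```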

Let's create a `Date` column from the timestamp.

In [None]:
dataset['Date'] = dataset.timeStamp.apply(lambda x:x.date())
dataset.Date.value_counts()

Now group by the `Date` column, aggregate with `count()`, and plot the number of 911 calls per day.

In [None]:
plt.figure(figsize=(7,4),dpi=100)
dataset.groupby('Date').count().lat.plot()
plt.title('No. of calls by Date')
plt.tight_layout()

In [None]:
plt.figure(figsize=(7,4),dpi=100)
dataset[dataset['type'] == 'Traffic'].groupby('Date').count()['lat'].plot()
plt.title('Reason -> Traffic (by Dates)')
plt.tight_layout()

In [None]:
plt.figure(figsize=(7,4),dpi=100)
dataset[dataset['type'] == 'Fire'].groupby('Date').count()['lat'].plot()
plt.title('Reason -> Fire (by Dates)')
plt.tight_layout()

In [None]:
plt.figure(figsize=(7,4),dpi=100)
dataset[dataset['type'] == 'EMS'].groupby('Date').count()['lat'].plot()
plt.title('Reason -> EMS (by Dates)')
plt.tight_layout()
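The three cells above repeat the same filter-group-plot steps. The sketch below computes all three daily series in one pass on hypothetical data. It groups by `Date` and `type` together, then pivots `type` into columns, so each column holds one call type:

```python
import pandas as pd

# Hypothetical calls; Date holds datetime.date objects like dataset.Date
calls = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02",
                            "2016-01-02", "2016-01-02"]).date,
    "type": ["EMS", "Traffic", "EMS", "Fire", "EMS"],
})

# Count rows per (Date, type), then move type into columns; absent pairs become 0
daily = calls.groupby(["Date", "type"]).size().unstack(fill_value=0)
print(daily)
# daily.plot(subplots=True) would draw one panel per call type
```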

In [None]:
dayHour = dataset.groupby(by=['Day','Hour']).count()['type'].unstack()
dayHour
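`Day` holds strings, so `groupby` sorts the rows alphabetically (Fri, Mon, Sat, Sun, Thu, Tue, Wed). The heatmap below uses that same row order. The sketch below, on hypothetical calls, reindexes the table into Monday-to-Sunday order and fills empty cells with 0:

```python
import pandas as pd

# Hypothetical calls with day name, hour, and type
calls = pd.DataFrame({
    "Day": ["Sun", "Mon", "Mon", "Sat"],
    "Hour": [2, 9, 9, 23],
    "type": ["EMS", "Fire", "EMS", "Traffic"],
})

# Same construction as dayHour: rows are days, columns are hours
day_hour = calls.groupby(["Day", "Hour"]).count()["type"].unstack()

# Reindex rows into week order; fill empty cells with 0 so the heatmap has no blanks
week = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_hour = day_hour.reindex(week).fillna(0)
print(day_hour)
```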

A heatmap makes this table easier to read.

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayHour)

- Most calls occur between 8:00 and 18:00
- Between 0:00 and 6:00 (night time), call volume is relatively low, with fewer than 300 calls in each day-hour cell
- On Saturday and Sunday, there are more calls at night and fewer calls during the day than on weekdays

In [None]:
dayMonth = dataset.groupby(by=['Day','Month']).count()['type'].unstack()
dayMonth

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(dayMonth)

- Saturday in January has the highest call count of any day-month combination
- December has relatively few calls