# 911 Calls Exploratory Analysis

## Data and Set Up

###  ** Import libraries ** 

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

###  ** Read Data **

In [2]:
data = pd.read_csv('data.csv')

###  ** Checking the info of the DataFrame 'data' **


In [3]:

data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   lat        99492 non-null  float64
 1   lng        99492 non-null  float64
 2   desc       99492 non-null  object 
 3   zip        86637 non-null  float64
 4   title      99492 non-null  object 
 5   timeStamp  99492 non-null  object 
 6   twp        99449 non-null  object 
 7   addr       98973 non-null  object 
 8   e          99492 non-null  int64  
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB


#### The output above shows that the 911 calls dataset has 9 columns and 99492 entries, and lists the dtype of each column. The 'zip', 'twp' and 'addr' columns have fewer than 99492 non-null values, so they contain missing data.
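
From the non-null counts in the info output, the missing values work out to 99492 - 86637 = 12855 for 'zip', 99492 - 99449 = 43 for 'twp' and 99492 - 98973 = 519 for 'addr'. A minimal sketch of how these counts could be obtained directly is shown here on a small made-up frame; the `sample` rows are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the 911 data.
sample = pd.DataFrame({
    "zip": [19525.0, 19446.0, np.nan, 19401.0],
    "twp": ["NEW HANOVER", None, "NORRISTOWN", "NORRISTOWN"],
    "e": [1, 1, 1, 1],
})

# isna() flags missing cells; sum() then counts them per column.
missing_counts = sample.isna().sum()
print(missing_counts)
```

On the real DataFrame the same call would be `data.isna().sum()`.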

In [4]:
# Check head
data.head()

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:40:00,NEW HANOVER,REINDEER CT & DEAD END,1
1,40.258061,-75.26468,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,2015-12-10 17:40:00,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 17:40:00,NORRISTOWN,HAWS AVE,1
3,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,2015-12-10 17:40:01,NORRISTOWN,AIRY ST & SWEDE ST,1
4,40.251492,-75.60335,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,2015-12-10 17:40:01,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1


## Basic Questions

###  ** What are the top 5 zipcodes for 911 calls?** 

In [5]:
# value_counts() sorts by frequency, so head() would give the same result:
# data['zip'].value_counts().head()
data['zip'].value_counts().nlargest(5)


19401.0    6979
19464.0    6643
19403.0    4854
19446.0    4748
19406.0    3174
Name: zip, dtype: int64

###  ** What are the top 5 townships(twp) for 911 calls?**

In [6]:
data['twp'].value_counts().head()

LOWER MERION    8443
ABINGTON        5977
NORRISTOWN      5890
UPPER MERION    5227
CHELTENHAM      4575
Name: twp, dtype: int64

### ** How many unique title codes are there?**


In [7]:
data['title'].nunique()

110

#### The 'title' column of the 911 calls dataset gives the reason for each call, and there are 110 unique titles.

## Creating New Features

###  ** In the title column there are 'Reasons/Departments' specified before the title code. These are EMS, Fire, and Traffic. Use .apply() with a custom lambda expression to create a new column called 'reason' that contains this string value. **

### Creating the 'reason' column in the 911 calls dataset.

In [8]:
data['reason']=data['title'].apply(lambda x: x.split(':')[0])
data.head()

Unnamed: 0,lat,lng,desc,zip,title,timeStamp,twp,addr,e,reason
0,40.297876,-75.581294,REINDEER CT & DEAD END; NEW HANOVER; Station ...,19525.0,EMS: BACK PAINS/INJURY,2015-12-10 17:40:00,NEW HANOVER,REINDEER CT & DEAD END,1,EMS
1,40.258061,-75.26468,BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...,19446.0,EMS: DIABETIC EMERGENCY,2015-12-10 17:40:00,HATFIELD TOWNSHIP,BRIAR PATH & WHITEMARSH LN,1,EMS
2,40.121182,-75.351975,HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...,19401.0,Fire: GAS-ODOR/LEAK,2015-12-10 17:40:00,NORRISTOWN,HAWS AVE,1,Fire
3,40.116153,-75.343513,AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...,19401.0,EMS: CARDIAC EMERGENCY,2015-12-10 17:40:01,NORRISTOWN,AIRY ST & SWEDE ST,1,EMS
4,40.251492,-75.60335,CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...,,EMS: DIZZINESS,2015-12-10 17:40:01,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1,EMS
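
As a side note, the same split can be written with pandas' vectorized string methods instead of `.apply()`. The sketch below is only an illustration on a few titles: the first three appear in the output above, and the 'Traffic: EXAMPLE' entry is a hypothetical placeholder.

```python
import pandas as pd

# A few titles; the last one is a made-up placeholder.
titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "EMS: CARDIAC EMERGENCY",
    "Traffic: EXAMPLE",
])

# Approach used in this notebook: a lambda applied to each element.
reason_apply = titles.apply(lambda x: x.split(':')[0])

# Vectorized alternative: split every string, then take the first piece.
reason_str = titles.str.split(':').str[0]

print(reason_str.tolist())
```

Both approaches give the same column, so the choice is mostly a matter of style.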


###  ** What is the most common reason for a 911 call based on this new column?**

In [9]:
data['reason'].value_counts()

EMS        48877
Traffic    35695
Fire       14920
Name: reason, dtype: int64

####  EMS is the most common reason for calls to 911 (48877 of the 99492 calls), followed by Traffic and Fire.

###  ** Use seaborn to create a countplot of 911 calls by Reason**

In [None]:
sns.countplot(x = 'reason',data = data)

<AxesSubplot:xlabel='reason', ylabel='count'>

###  ** What is the data type of the objects in the timeStamp column?**

In [None]:
type(data['timeStamp'][0])


####  The objects in the timeStamp column are strings (str), which matches the 'object' dtype reported by data.info().

###  ** Convert timeStamp from strings to DateTime objects **

In [None]:
data['timeStamp'] = pd.to_datetime(data['timeStamp'])


In [None]:
data['timeStamp']

#### Now the data type of the 'timeStamp' column is datetime.

###  ** Now that the timeStamp column holds DateTime objects, use .apply() to create 3 new columns called Hour, Month, and Day of Week, based on the timeStamp column. **

### Let's create the 3 new columns: 'hour', 'month' and 'day of week'.

In [None]:
# Create hour column
data['hour'] = data['timeStamp'].apply(lambda x: x.hour)

In [None]:
# Create month column
data['month'] = data['timeStamp'].apply(lambda x: x.month)

In [None]:
# Create day of week
data['day of week'] = data['timeStamp'].apply(lambda x: x.dayofweek)

In [None]:
# Confirm that the new columns were added to the DataFrame
data.head()

#### Here we created 3 new columns: hour, month and day of week.
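
The same three columns can also be derived with the `.dt` accessor, which avoids calling a lambda on every row. This is a minimal sketch on two timestamps taken from the head of the data; the `stamps` and `features` names are only illustrative.

```python
import pandas as pd

# Two timestamps as they appear in the first rows of the dataset.
stamps = pd.to_datetime(pd.Series([
    "2015-12-10 17:40:00",
    "2015-12-10 17:40:01",
]))

# The .dt accessor exposes datetime components for the whole Series at once.
features = pd.DataFrame({
    "hour": stamps.dt.hour,
    "month": stamps.dt.month,
    "day of week": stamps.dt.dayofweek,  # Monday=0 ... Sunday=6
})
print(features)
```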

###  ** Notice how the Day of Week is an integer 0-6. Use .map() with a dictionary to map the actual string names to the day of the week, like this: {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'} **

In [None]:
data['day of week'] = data['day of week'].map({0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'})


In [None]:
data.head(5)
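
After mapping, the day names are plain strings, so they carry no calendar order of their own. One possible way to keep Mon to Sun order in later counts is an ordered categorical, sketched here on a few hypothetical day numbers (the `day_names`, `dmap` and `days` names are assumptions made for this example).

```python
import pandas as pd

day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
dmap = dict(enumerate(day_names))  # {0: 'Mon', 1: 'Tue', ...}

# Hypothetical day-of-week integers for five calls.
days = pd.Series([3, 3, 4, 0, 6]).map(dmap)

# An ordered categorical remembers the calendar order of the labels.
days = pd.Series(pd.Categorical(days, categories=day_names, ordered=True))

# sort_index() now sorts by calendar order rather than alphabetically.
counts = days.value_counts().sort_index()
print(counts)
```

For plots, passing `order=day_names` to `sns.countplot` is another way to fix the order of the bars.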

### ** Use seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column **

In [None]:
sns.countplot(x = 'day of week',hue = 'reason',data = data)
plt.legend(bbox_to_anchor=(1.1, 1), borderaxespad=0)  # controls the position of the legend

#### From the figure above, we can say that on Friday most of the calls to 911 were for EMS reasons. Sunday had the fewest traffic-related calls, which is expected given the lighter traffic on the roads, but they spike on Tuesday. Fire-related calls were roughly the same on all days.

###  ** Use seaborn to create a countplot of the Month column with the hue based off of the Reason column** 

In [None]:
sns.countplot(x='month', data=data, hue = 'reason')
plt.legend(bbox_to_anchor=(1,1))

#### In January and July, EMS-related calls spike, but in December they decrease.
#### In December, fire-related calls also decrease.
#### Traffic-related calls spike in January, possibly because there was more traffic on the roads that month.
#### From the plot above we can also observe that the months Sep, Oct and Nov are missing.

### ** Do you notice something strange about this plot? **

#### The plot is missing some months (Sep, Oct and Nov). We may need to plot this information another way, possibly as a simple line plot that fills in the missing data.
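
One way to make such gaps explicit before plotting is to reindex the monthly counts over all twelve months, so that absent months show up as NaN instead of being silently skipped. The sketch below uses a handful of hypothetical month numbers in which 9, 10 and 11 do not occur; the `months`, `per_month` and `full_year` names are illustrative.

```python
import pandas as pd

# Hypothetical month numbers for a few calls; 9, 10 and 11 never occur.
months = pd.Series([1, 1, 2, 7, 8, 8, 12])

# Calls per month, ordered by month number.
per_month = months.value_counts().sort_index()

# Reindex over 1..12 so months without any calls appear as NaN.
full_year = per_month.reindex(range(1, 13))
print(full_year)
```

A line plot of `full_year` would then show a visible break where data is absent, rather than joining August directly to December.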

###  ** Create an object called bymonth that groups the DataFrame by month and uses the count() method for aggregation. **

In [None]:
bymonth = data.groupby('month').count()
bymonth.head()

###  ** Create a simple plot off of the dataframe indicating the count of calls per month**

In [None]:
# 'lat' has no missing values, so its count equals the number of calls per month
bymonth['lat'].plot()

#### In July, 911 calls increased sharply and then dropped steeply from August. The reason for this sudden rise and fall is not clear.
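
One thing that may be worth checking, offered here only as an assumption and not as a finding, is whether the first and last months are fully covered by the data, since a partial month would naturally show fewer calls. A minimal sketch of such a check on hypothetical timestamps (only the first one appears in the dataset head):

```python
import pandas as pd

# Hypothetical timestamps; only the first is taken from the dataset head.
stamps = pd.to_datetime(pd.Series([
    "2015-12-10 17:40:00",
    "2015-12-31 09:00:00",
    "2016-01-15 12:00:00",
    "2016-01-20 08:30:00",
    "2016-02-03 23:10:00",
]))

# First and last timestamp show which months are only partially covered.
coverage = (stamps.min(), stamps.max())

# Counting by year-month period keeps months of different years apart.
per_period = stamps.dt.to_period("M").value_counts().sort_index()
print(coverage)
print(per_period)
```

On the real data the equivalent calls would use `data['timeStamp']` in place of `stamps`.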