# 911 Calls Data Preprocessing

#### The dataset is a record of emergency 911 calls over an interval of time. Each call is recorded as one instance (row) with the following features:

##### The first two features (lat, lng) represent the location as identified by the operator

1. lat : String variable, Latitude

2. lng: String variable, Longitude

3. desc: String variable, Description of the Emergency Call, reason and nature of emergency

4. zip: String variable, Zipcode of the reporter as provided by the caller

5. title: String variable, Title

6. timeStamp: String variable, YYYY-MM-DD HH:MM:SS

7. twp: String variable, Township

8. addr: String variable, Address

9. e: String variable, Dummy variable (always 1)

## Data and Set Up

In [None]:
# Import libraries
import numpy as np
import pandas as pd

# Import visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Read data
df = pd.read_csv('911.csv')

# Check dataframe info()
df.info()

In [None]:
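# Downcast latitude to save memory. float32 is used instead of float16:
# float16 keeps only about 3 significant digits, so latitudes near 40
# would be rounded to the nearest 0.03125 degrees
df['lat'] = df['lat'].astype('float32')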
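
float16 has a 10-bit mantissa, so for values between 32 and 64 (which covers latitudes near 40 degrees) neighbouring representable values are 0.03125 apart. A minimal sketch, using made-up sample latitudes rather than the real data, compares the worst-case rounding error of a float16 cast with a float32 cast:

```python
import numpy as np

# Hypothetical sample latitudes (degrees), similar in magnitude to the dataset's
lats = np.array([40.297876, 40.258061, 40.121182], dtype=np.float64)

as_f16 = lats.astype(np.float16)
as_f32 = lats.astype(np.float32)

# Largest absolute rounding error introduced by each cast
err_f16 = np.abs(as_f16.astype(np.float64) - lats).max()
err_f32 = np.abs(as_f32.astype(np.float64) - lats).max()

print(f"float16 max error: {err_f16:.6f} degrees")
print(f"float32 max error: {err_f32:.8f} degrees")
```

An error of roughly 0.015 degrees of latitude is more than a kilometre on the ground, while float32 stays in the micro-degree range.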

In [None]:
df.info()

In [None]:
# Store township as a categorical: few distinct names repeated many times
df['twp'] = df['twp'].astype('category')
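
The category dtype stores each distinct string once plus a small integer code per row, which is why it pays off for a column like twp. A minimal sketch on a hypothetical township column (not the real data) compares `memory_usage(deep=True)` before and after the cast:

```python
import pandas as pd

# Hypothetical township column: a few distinct names repeated many times
twp = pd.Series(["LOWER MERION", "ABINGTON", "NORRISTOWN", "UPPER MERION"] * 25_000)

# deep=True counts the Python string objects, not just the pointers to them
object_bytes = twp.memory_usage(deep=True)
category_bytes = twp.astype("category").memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
```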

In [None]:
df.info()

In [None]:
# Check first 5 entries
df.head()

In [None]:
# Check desc column
df.desc

## Basic Questions

In [None]:
#check columns
df.columns

In [None]:
# rename() returns a new DataFrame, so assign the result back to keep the change
df = df.rename(columns={'zip': 'zipcode'})
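
By default `DataFrame.rename` does not modify the frame it is called on; it returns a renamed copy. A minimal sketch on a hypothetical two-row frame shows the difference between the original and the returned object:

```python
import pandas as pd

# Hypothetical two-row frame with a 'zip' column
sample = pd.DataFrame({"zip": [19525.0, 19446.0], "twp": ["NEW HANOVER", "HATFIELD TOWNSHIP"]})

renamed = sample.rename(columns={"zip": "zipcode"})

print(sample.columns.tolist())   # ['zip', 'twp'] (original is unchanged)
print(renamed.columns.tolist())  # ['zipcode', 'twp'] (returned copy is renamed)
```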

In [None]:
# Count NaN values per column
df.isna().sum()

In [None]:
df.isna().sum().plot(kind='bar')

In [None]:
# Visualize where values are missing (rows on the y-axis, columns on the x-axis)
sns.heatmap(df.isna())
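
Raw NaN counts are easier to compare across columns as percentages. Because `isna()` yields booleans, their column mean is the fraction missing. A minimal sketch on a hypothetical frame (not the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing zip and twp values
sample = pd.DataFrame({
    "zip": [19525.0, np.nan, 19401.0, np.nan],
    "twp": ["ABINGTON", "HATFIELD", None, "ABINGTON"],
    "e": [1, 1, 1, 1],
})

# Mean of the boolean mask = fraction missing; scale to percent and round
missing_pct = sample.isna().mean().mul(100).round(1)
print(missing_pct.sort_values(ascending=False))
```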

In [None]:
# Check for unique titles
df.title.unique()

## Creating New Features

**In the title column, a 'Reason/Department' (EMS, Fire, or Traffic) is specified before the title code.  
Use .apply() with a custom lambda expression to create a new column called 'Reason' that contains this string value.**

In [None]:
# Select example
x = df['title'][0]
x

In [None]:
x.split(':')[0]

In [None]:
df['title'].apply(lambda title : title.split(':')[0])

In [None]:
# Create reason column
df['Reason']=df['title'].apply(lambda title : title.split(':')[0])
df['Reason'].head()
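
pandas also offers a vectorized alternative to `.apply()` for this kind of string splitting through the `.str` accessor. A minimal sketch on hypothetical titles in the same "Reason: code" layout, checking that both approaches agree:

```python
import pandas as pd

# Hypothetical titles in the "<Reason>: <code>" format
titles = pd.Series(["EMS: BACK PAINS/INJURY", "Fire: GAS-ODOR/LEAK", "Traffic: VEHICLE ACCIDENT -"])

# Vectorized: split every title on ':' and take the first piece
reason_vec = titles.str.split(":").str[0]

# Row-by-row equivalent used in the notebook
reason_apply = titles.apply(lambda title: title.split(":")[0])

print(reason_vec.tolist())
```

Both give the same values; the `.str` version avoids writing a lambda and handles missing titles by returning NaN instead of raising.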

## **What is the most common reason for a 911 call based on this new column?**

In [None]:
df.Reason

In [None]:
df['Reason'].value_counts()

In [None]:
df['Reason'].value_counts().plot(kind='pie',autopct='%.2f%%')

From the pie chart above, EMS is the most common reason, accounting for about 49% of calls. The breakdown across the three reasons is:

- EMS: 49%
- Traffic: 36%
- Fire: 15%
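
The same percentages can be read off as numbers instead of from the chart by passing `normalize=True` to `value_counts`, which returns proportions rather than counts. A minimal sketch on a hypothetical Reason column (in the notebook this would be `df['Reason']`):

```python
import pandas as pd

# Hypothetical Reason values standing in for df['Reason']
reason = pd.Series(["EMS", "EMS", "Traffic", "EMS", "Fire", "Traffic", "EMS", "Traffic"])

# normalize=True gives proportions; scale to percent and round
pct = reason.value_counts(normalize=True).mul(100).round(1)
print(pct)
```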

## **Use seaborn to create a countplot of 911 calls by Reason**

In [None]:
sns.countplot(x = 'Reason',data = df, palette = 'rainbow')

## **What is the data type of the objects in the timeStamp column?**

In [None]:
df.timeStamp #combination of Date and time

In [None]:
df.dtypes

In [None]:
type(df['timeStamp'].iloc[0])

## **Convert timeStamp from strings to DateTime objects**

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])
type(df['timeStamp'].iloc[0])
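
Since the feature list gives the timeStamp layout as YYYY-MM-DD HH:MM:SS, that layout can also be passed to `pd.to_datetime` explicitly. This makes the expected format visible in the code and raises an error if a row does not match. A minimal sketch on hypothetical timestamp strings:

```python
import pandas as pd

# Hypothetical timestamps in the YYYY-MM-DD HH:MM:SS layout
raw = pd.Series(["2015-12-10 17:40:00", "2015-12-10 17:40:01", "2015-12-11 08:15:00"])

# Explicit format; with the default errors='raise', a mismatched row raises ValueError
ts = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

print(ts.dtype)
print(ts.iloc[0].hour)  # 17
```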

In [None]:
df.timeStamp

**Now that the timeStamp column holds DateTime objects, use .apply() to create 3 new columns called Hour, Month, and Day of Week.  
Create these columns based on the timeStamp column.**

In [None]:
df['timeStamp'].iloc[0]

In [None]:
time = df['timeStamp'].iloc[0]
time

In [None]:
time.day

In [None]:
time.hour

In [None]:
time.day_name()

In [None]:
df['timeStamp'].apply(lambda time : time.hour)

In [None]:
# Create hour column
df['Hour'] = df['timeStamp'].apply(lambda time : time.hour)

In [None]:
df['Hour'].value_counts()

In [None]:
df[:2]

In [None]:
# Create month column
df['Month'] = df['timeStamp'].apply(lambda time : time.month)
df['Month'].value_counts()

In [None]:
# Create day of week
df['Day of Week'] = df['timeStamp'].apply(lambda time : time.dayofweek)
df['Day of Week'].value_counts()

In [None]:
day = df['timeStamp'].apply(lambda time : time.day_name())
day.value_counts()
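
For a datetime column, the `.dt` accessor exposes the same fields as the per-row `.apply(lambda time: ...)` calls above, computed for the whole column at once. A minimal sketch on a hypothetical two-row frame; the "Day Name" column is illustrative only and not part of the original notebook:

```python
import pandas as pd

# Hypothetical datetime column standing in for df['timeStamp']
sample = pd.DataFrame({
    "timeStamp": pd.to_datetime(["2015-12-10 17:40:00", "2016-01-02 03:05:00"])
})

# Vectorized equivalents of the .apply(lambda time: ...) calls
sample["Hour"] = sample["timeStamp"].dt.hour
sample["Month"] = sample["timeStamp"].dt.month
sample["Day of Week"] = sample["timeStamp"].dt.dayofweek   # Monday=0 ... Sunday=6
sample["Day Name"] = sample["timeStamp"].dt.day_name()     # hypothetical extra column

print(sample)
```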

## **Notice that Day of Week is an integer 0-6 (Monday=0, Sunday=6). Use .map() with a dictionary to map each integer to the actual day name**

In [None]:
df

In [None]:
df.columns

In [None]:
df['Day of Week'].unique()

In [None]:
# Create dictionary
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
dmap

In [None]:
# Map string names
df['Day of Week'] = df['Day of Week'].map(dmap)
df['Day of Week']
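
`Series.map` with a dictionary returns NaN for any value that is not a key. The mapping cell above is therefore not safe to re-run: on a second run the column already holds names like 'Mon', which are not keys of `dmap`. A minimal sketch on a hypothetical three-row series:

```python
import pandas as pd

dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}
dow = pd.Series([0, 4, 6])  # hypothetical day-of-week integers

once = dow.map(dmap)    # integers -> day names
twice = once.map(dmap)  # day names are not keys of dmap, so every value becomes NaN

print(once.tolist())       # ['Mon', 'Fri', 'Sun']
print(twice.isna().all())  # True
```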

In [None]:
df['Day of Week'][:4]

In [None]:
df['Day of Week'].value_counts()
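
`value_counts()` sorts by frequency, so the weekdays come out in an arbitrary order. One option, sketched here on hypothetical day names, is to convert the column to an ordered categorical so counts (and plots) follow Mon through Sun and days with no calls still appear with a count of 0:

```python
import pandas as pd

# Hypothetical mapped day names, as produced by .map(dmap)
days = pd.Series(["Fri", "Mon", "Sun", "Mon", "Wed", "Fri", "Mon"])

# Fix the weekday order explicitly
order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
days_cat = pd.Series(pd.Categorical(days, categories=order, ordered=True))

# sort=False keeps the categorical order instead of sorting by count
counts = days_cat.value_counts(sort=False)
print(counts)
```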