# 911 Calls Exploratory Analysis

#### The dataset is a record of emergency 911 calls over an interval of time. Each call is recorded as one instance, together with a set of features describing it. The features are broken down as follows:

##### The first two features (lat and lng) represent the location as identified by the operator

1. lat: String variable, Latitude

2. lng: String variable, Longitude

3. desc: String variable, Description of the Emergency Call, reason and nature of emergency

4. zip: String variable, Zipcode of the reporter as provided by the caller

5. title: String variable, Title

6. timeStamp: String variable, YYYY-MM-DD HH:MM:SS

7. twp: String variable, Township

8. addr: String variable, Address

9. e: String variable, Dummy variable (always 1)

## Data and Set Up

In [None]:
# Import libraries
import pandas as pd

# Import visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
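# Read data (assumption: the CSV file name is not given in the original; adjust the path)
df = pd.read_csv('911.csv')

# Check dataframe info
df.info()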

In [None]:
# Check first 5 entries
df.head()


## Basic Questions

In [None]:
# Check columns
df.columns

In [None]:
# Check unique zip codes
df['zip'].unique()

**What are the top 5 zipcodes for 911 calls?**
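
One possible way to answer this is `value_counts()` followed by `head(5)`. The sketch below runs on a small made-up frame named `toy` (hypothetical values, not taken from the real data), so it can be tried without the CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for the real calls table
toy = pd.DataFrame(
    {"zip": [19401.0, 19401.0, 19464.0, None, 19403.0, 19401.0, 19464.0]}
)

# value_counts() sorts by frequency (descending) and skips NaN by default
top_zips = toy["zip"].value_counts().head(5)
print(top_zips)
```

On the real data the same call would be applied to `df['zip']`.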

## **What are the top 5 townships (twp) for 911 calls?**

## **How many unique title codes are there?**
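
The township and title questions above could be sketched as follows. The frame `toy` and its values are made up for illustration; only the column names `twp` and `title` come from the dataset description.

```python
import pandas as pd

# Hypothetical toy data; township and title values are invented
toy = pd.DataFrame({
    "twp": ["A", "B", "A", "C", "B", "A"],
    "title": ["EMS: CARDIAC", "Fire: ALARM", "EMS: CARDIAC",
              "Traffic: ACCIDENT", "EMS: FALL", "Fire: ALARM"],
})

# Top 5 townships by number of calls
top_twp = toy["twp"].value_counts().head(5)

# Number of distinct title codes
n_titles = toy["title"].nunique()

print(top_twp)
print(n_titles)
```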

## Creating New Features

**In the title column there are 'Reasons/Departments' specified before the title code. These are EMS, Fire, and Traffic.  
Use .apply() with a custom lambda expression to create a new column called 'Reason' that contains this string value.**

In [None]:
# Select example
x = df['title'][0]
x

In [None]:
x.split(':')[0]

In [None]:
# Create reason column
df['Reason']=df['title'].apply(lambda title : title.split(':')[0])
df['Reason'].head()
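
As an alternative to `.apply()` with a lambda, the pandas string accessor can do the same split in a vectorized way. This is a minimal sketch on made-up titles (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical titles in the "Reason: code" shape described above
toy = pd.DataFrame(
    {"title": ["EMS: CARDIAC", "Fire: ALARM", "Traffic: ACCIDENT -"]}
)

# Split on ':' and keep the first piece of each title
toy["Reason"] = toy["title"].str.split(":").str[0]
print(toy["Reason"].tolist())
```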

## **What is the most common reason for a 911 call based off this new column?**

In [None]:
df.Reason

In [None]:
df['Reason'].value_counts()

In [None]:
df['Reason'].value_counts().plot(kind='pie',autopct='%.2f%%')

From the chart above, EMS is the most common category, at 49% of calls. The breakdown across the three categories is:

- EMS: 49%
- Traffic: 36%
- Fire: 15%
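
The shares can also be computed directly instead of being read off a pie chart. A minimal sketch, assuming a made-up `Reason` series named `toy` (its values and resulting percentages are illustrative only, not the real figures):

```python
import pandas as pd

# Hypothetical reasons; the real column would be df['Reason']
toy = pd.Series(["EMS", "EMS", "Traffic", "Fire"], name="Reason")

# normalize=True returns fractions; scale to percentages
shares = toy.value_counts(normalize=True).mul(100).round(2)
print(shares)
```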

## **Use seaborn to create a countplot of 911 calls by Reason**

In [None]:
sns.countplot(x = 'Reason',data = df, palette = 'rainbow')

## **What is the data type of the objects in the timeStamp column?**

In [None]:
df.timeStamp #combination of Date and time

In [None]:
df.info()

In [None]:
type(df['timeStamp'].iloc[0])

## **Convert timeStamp from strings to DateTime objects**

In [None]:
df['timeStamp']=pd.to_datetime(df['timeStamp'])
type(df['timeStamp'].iloc[0])

In [None]:
df.timeStamp

**Now that the timeStamp column contains DateTime objects, use .apply() to create 3 new columns called Hour, Month, and Day of Week.  
Create these columns based off of the timeStamp column.**

In [None]:
time = df['timeStamp'].iloc[0]
time

In [None]:
df['timeStamp'].apply(lambda time : time.hour)

In [None]:
# Create hour column
df['Hour'] = df['timeStamp'].apply(lambda time : time.hour)
df['Hour'].value_counts().head()

In [None]:
df[:2]

In [None]:
# Create month column
df['Month'] = df['timeStamp'].apply(lambda time : time.month)
df['Month'].value_counts()

In [None]:
# Create day of week
df['Day of Week'] = df['timeStamp'].apply(lambda time : time.dayofweek)
df['Day of Week'].value_counts()

In [None]:
# Confirm columns were added to dataframe
df.head()

## **Notice how the Day of Week is an integer 0-6. Use .map() with a dictionary to map each integer to the string name of the day of the week**

In [None]:
# Create dictionary
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
dmap

In [None]:
# Map string names
df['Day of Week'] = df['Day of Week'].map(dmap)

In [None]:
df['Day of Week'][:4]

In [None]:
df['Day of Week'].value_counts()
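
The Hour, Month, and Day of Week columns built above with `.apply()` and `.map()` can also be derived with the `.dt` accessor, which avoids a Python-level lambda per row. A sketch on made-up timestamps (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical timestamps in the YYYY-MM-DD HH:MM:SS format
toy = pd.DataFrame({"timeStamp": [
    "2015-12-10 17:40:00",
    "2015-12-11 08:05:00",
    "2016-01-03 23:59:59",
]})
toy["timeStamp"] = pd.to_datetime(toy["timeStamp"])

toy["Hour"] = toy["timeStamp"].dt.hour
toy["Month"] = toy["timeStamp"].dt.month

# day_name() returns full English names; keep the first three letters
toy["Day of Week"] = toy["timeStamp"].dt.day_name().str[:3]
print(toy)
```

The three-letter slice is chosen to match the abbreviations in `dmap`.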

**Use seaborn to create a countplot of the Day of Week column with the hue based off of the Reason column**

In [None]:
# Show edge lines (set before plotting so the bars pick up the setting)
plt.rcParams["patch.force_edgecolor"] = True

# Create count plot
sns.countplot(x = 'Day of Week', data = df, hue = 'Reason', palette = 'viridis')

# Relocate legend
plt.legend(bbox_to_anchor = (1.05,1), loc = 2, borderaxespad = 0.)

**Use seaborn to create a countplot of the Month column with the hue based off the Reason column**

In [None]:
#df.timeStamp.dt.hour
df.timeStamp.dt.year

In [None]:
# Create count plot
sns.countplot(x = 'Month', data = df, hue = 'Reason', palette = 'viridis')

# Relocate legend
plt.legend(bbox_to_anchor = (1.05,1), loc = 2, borderaxespad = 0.)

**Do you notice something strange about this plot?**  
The plot is missing some months. The data may need to be plotted another way, possibly as a simple line plot that fills in the missing data.
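
Which months are absent can be checked directly rather than by eye. The sketch below uses a made-up `Month` column (the frame `toy` and its values are hypothetical) and reindexes the counts so the gaps show up explicitly as NaN:

```python
import pandas as pd

# Hypothetical month values with several months absent
toy = pd.DataFrame({"Month": [1, 1, 2, 3, 5, 8, 8, 12]})

# Months 1..12 that never occur in the column
missing = sorted(set(range(1, 13)) - set(toy["Month"].unique()))
print(missing)

# Counts per month, with absent months kept as NaN rows
counts = toy["Month"].value_counts().reindex(range(1, 13))
print(counts)
```

Leaving the gaps as NaN rather than 0 avoids implying that no calls happened in months that may simply not have been recorded.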

**Create a groupby object called byMonth that groups the DataFrame by month and uses the count() method for aggregation.**

In [None]:
df.groupby('Month').count()

In [None]:
# Create group by object
byMonth = df.groupby('Month').count()

# View first few rows
byMonth.head()

**Create a simple plot off of the dataframe indicating the count of calls per month**

In [None]:
byMonth['lat'].plot()

**Use seaborn's lmplot() to create a linear fit on the number of calls per Month.  
Keep in mind you may need to reset the index to a column.**

In [None]:
# Reset index to make month a column
byMonth.reset_index()

In [None]:
# Create linear model
sns.lmplot(x = 'Month', y = 'twp', data = byMonth.reset_index()) 
# there was no 'Month' column in byMonth, need to reset the index for this code to work
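
To see the numbers behind a straight-line fit like the one drawn above, a least-squares line can be computed with `np.polyfit`. This is swapped in purely for illustration and is not claimed to be how seaborn fits internally; the arrays below are made-up values.

```python
import numpy as np

# Hypothetical monthly call counts that lie exactly on a line
months = np.array([1, 2, 3, 4])
calls = np.array([10.0, 12.0, 14.0, 16.0])

# Degree-1 fit returns coefficients highest power first: slope, intercept
slope, intercept = np.polyfit(months, calls, deg=1)
print(slope, intercept)
```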

**Use apply along with the .date() method to create a new column called 'Date' that contains the date from the timeStamp column.**

In [None]:
t = df['timeStamp'].iloc[0]
t

In [None]:
t.date()

In [None]:
df['Date'] = df['timeStamp'].apply(lambda t : t.date())
df.head()

**Group by the Date column with the count() aggregate and create a plot of counts of 911 calls**

In [None]:
# Plot all instances per date
plt.figure(figsize=(12,8))
df.groupby('Date').count()['lat'].plot()
plt.xticks(rotation='vertical')
plt.tight_layout()
plt.savefig('graph_1.png',dpi=400)

In [None]:
df.Date.nunique()
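
Another way to get calls per day, sketched here under the assumption that the timestamp column is already a datetime, is to resample on the timestamp itself. Unlike grouping on a `Date` column, days with no calls appear with a count of 0. The frame `toy` is made up.

```python
import pandas as pd

# Hypothetical timestamps: two calls on Jan 1, none on Jan 2, one on Jan 3
toy = pd.DataFrame({"timeStamp": pd.to_datetime([
    "2016-01-01 09:00:00",
    "2016-01-01 18:30:00",
    "2016-01-03 02:15:00",
])})

# Daily bins; size() counts rows per bin, including empty bins
daily = toy.set_index("timeStamp").resample("D").size()
print(daily)
```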

**Recreate the plot above, but create 3 separate plots, one for each reason for the 911 call**

In [None]:
# Create Traffic plot
df[df['Reason']=='Traffic'].groupby('Date').count()['lat'].plot()
plt.title('Traffic')
plt.xticks(rotation='vertical')
plt.tight_layout()
plt.show()

In [None]:
# Create Fire plot
df[df['Reason']=='Fire'].groupby('Date').count()['lat'].plot()
plt.title('Fire')
plt.tight_layout()

In [None]:
# Create EMS plot
df[df['Reason']=='EMS'].groupby('Date').count()['lat'].plot()
plt.title('EMS')
plt.tight_layout()
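
The three per-reason series plotted above can also be built in one step by grouping on both `Date` and `Reason` and unstacking. A minimal sketch on made-up rows (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical calls with a date and a reason each
toy = pd.DataFrame({
    "Date": ["2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"],
    "Reason": ["EMS", "EMS", "Fire", "Traffic"],
})

# Rows become dates, columns become reasons, cells hold call counts
per_reason = toy.groupby(["Date", "Reason"]).size().unstack(fill_value=0)
print(per_reason)
```

Calling `per_reason.plot(subplots=True)` on such a table would draw one panel per reason.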

# Heat Maps
## By Day of Week

In [None]:
# Create multi-level index and unstack to re-structure dataframe as matrix
dayHour = df.groupby(['Day of Week','Hour']).count()['Reason'].unstack() # columns become Hours and the Index becomes Day of the Week
dayHour
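
Because the day names are strings, the rows of a table like `dayHour` come out in alphabetical order (Fri, Mon, Sat, ...). One way to put them in calendar order is to reindex with an explicit list. The sketch uses a made-up frame `toy` and a hypothetical `day_order` list:

```python
import pandas as pd

# Hypothetical calls with a weekday abbreviation and an hour
toy = pd.DataFrame({
    "Day of Week": ["Mon", "Mon", "Tue", "Sun", "Sun", "Sun"],
    "Hour": [8, 8, 17, 0, 0, 23],
})

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Count calls per (day, hour), pivot hours into columns, then order the rows
day_hour = (
    toy.groupby(["Day of Week", "Hour"]).size()
       .unstack(fill_value=0)
       .reindex(day_order, fill_value=0)
)
print(day_hour)
```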

In [None]:
# Create heatmap
plt.figure(figsize=(12,6))
sns.heatmap(dayHour,cmap = 'coolwarm',linewidths=2)

**Create a clustermap using this DataFrame**

In [None]:
# Clustermap
sns.clustermap(dayHour,cmap = 'coolwarm',linewidths =2)

In [None]:
print(sns.get_dataset_names())

In [None]:
d = sns.load_dataset('diamonds')

In [None]:
d.head(2)

In [None]:
d.cut.unique()

In [None]:
sns.countplot(x= 'cut',data=d)

In [None]:
sns.pairplot(d)

In [None]:
d.columns

In [None]:
sns.pairplot(d[['carat', 'cut', 'color','price']],height=8,diag_kind='kde')

# Discussion and findings
- Most calls are EMS calls, followed by Traffic.


- January and December are the months with the most emergency calls. The calls in these months are mainly EMS and Traffic emergencies.


- EMS calls spike mainly during working hours.


- Friday is the day of the week with the most EMS and Traffic calls.


- The late hours of the night have the fewest emergency calls. On weekends, however, there are more emergency calls after midnight.