This notebook is a work in progress, based on the really great Udemy course [Python for Data Science and Machine Learning Bootcamp](https://www.udemy.com/share/101WaUAEISd1lURnQ=/).

In [None]:
import numpy as np
import pandas as pd

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 

In [None]:
df = pd.read_csv('/kaggle/input/montcoalert/911.csv')

`info()` lists the column names and the number of non-null values in each column, which makes it a handy way to spot columns with missing values. Here 'zip' has only 558,757 non-null records, so a lot of values are missing from this column.

In [None]:
df.info()
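
As a complement to `info()`, missing values can also be counted directly. The following is a minimal sketch on a small made-up frame; the `toy` DataFrame and its values are illustrative assumptions, not rows from 911.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real 911 table
toy = pd.DataFrame({
    'zip': [19401.0, np.nan, 19464.0, np.nan],
    'twp': ['NORRISTOWN', 'LOWER MERION', None, 'ABINGTON'],
    'title': ['EMS: FALL VICTIM', 'Fire: FIRE ALARM',
              'Traffic: VEHICLE ACCIDENT', 'EMS: HEAD INJURY'],
})

# isnull() marks each missing cell as True; sum() then counts them per column
missing = toy.isnull().sum()
print(missing)
```

On the real data, `df.isnull().sum()` should give the same per-column view of the gaps that `info()` hints at.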

In [None]:
df.head(10)

I use `value_counts()` to count the calls per zip code, and `head(10)` to get the top ten from that list. These zip codes might be high-crime areas, or have a lot of elderly residents, either of which could lead to a higher number of 911 calls.

In [None]:
df['zip'].value_counts().head(10)
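
`value_counts()` sorts by frequency in descending order and skips missing values by default, which is why `head()` on its result gives the busiest zip codes. A small sketch with made-up values (the `zips` Series is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical zip codes, including one missing value
zips = pd.Series([19401.0, 19464.0, 19401.0, np.nan, 19401.0, 19464.0, 19403.0])

# Counts per distinct value, most frequent first; NaN is not counted
counts = zips.value_counts()

# Keep only the two most frequent entries
top_two = counts.head(2)
print(top_two)
```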

I use the same technique to find the top 10 townships ('twp' column) by number of 911 calls made.

In [None]:
df['twp'].value_counts().head(10)

`nunique()` will return the number of unique elements. This tells me there are 147 unique types of 911 call in the data set.

In [None]:
df['title'].nunique()
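
One detail worth knowing: `nunique()` leaves missing values out of the count unless `dropna=False` is passed. A minimal sketch on invented titles (the `titles` Series is an assumption, not real data):

```python
import numpy as np
import pandas as pd

# Hypothetical call titles with one duplicate and one missing value
titles = pd.Series(['EMS: FALL VICTIM', 'Fire: FIRE ALARM',
                    'EMS: FALL VICTIM', np.nan])

# Default behaviour: NaN is not counted as a distinct value
n_types = titles.nunique()

# With dropna=False, NaN counts as one more distinct value
n_with_nan = titles.nunique(dropna=False)
print(n_types, n_with_nan)
```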

Next we'll use a Python lambda expression to create a new column of simplified call categories. Right now the 'title' column holds 147 different types of EMS (emergency medical services, like an ambulance), Fire, and Traffic calls, and we want to reduce that to a new feature called 'reason'. The lambda expression splits the title at the colon and keeps the part of the string before it: EMS, Fire, or Traffic.

In [None]:
df['reason'] = df['title'].apply(lambda title: title.split(':')[0])
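
To see what the split does, here is a minimal sketch on three sample titles (assumed to resemble the real ones). It also shows pandas' vectorized `.str` accessor as one possible alternative to `apply`; that alternative is my suggestion, not something the notebook relies on.

```python
import pandas as pd

# Hypothetical titles in the "<reason>: <detail>" form
titles = pd.Series(['EMS: BACK PAINS/INJURY',
                    'Fire: GAS-ODOR/LEAK',
                    'Traffic: VEHICLE ACCIDENT -'])

# Same approach as the notebook: split each string at ':' and keep the first piece
via_apply = titles.apply(lambda title: title.split(':')[0])

# Vectorized equivalent using the .str accessor
via_str = titles.str.split(':').str[0]

print(list(via_apply))
```

The `.str` version propagates missing values as NaN, whereas the lambda would raise an error on a missing title, so it may be the safer choice if 'title' ever contains gaps.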

In [None]:
df['reason'].value_counts()

In [None]:
sns.countplot(x='reason',data=df,palette='viridis')

We check the data type of the values in the 'timeStamp' column. They are strings, so we need to convert them to a more usable datetime format.

In [None]:
type(df['timeStamp'].iloc[0])

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
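
A short sketch of what the conversion changes, using two invented timestamp strings (the exact string format is an assumption about how 911.csv stores them):

```python
import pandas as pd

# Hypothetical raw timestamps stored as text
raw = pd.Series(['2015-12-10 17:10:52', '2016-01-03 08:05:00'])
print(type(raw.iloc[0]))   # a plain Python str before conversion

# to_datetime parses the strings into pandas Timestamp values
parsed = pd.to_datetime(raw)
first = parsed.iloc[0]
print(type(first))

# Timestamp objects expose the date parts as attributes
print(first.hour, first.month)
```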

In [None]:
df['hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['month'] = df['timeStamp'].apply(lambda time: time.month)
# dayofweek gives 0 (Monday) through 6 (Sunday); time.day would give the day of the month
df['dayOfWeek'] = df['timeStamp'].apply(lambda time: time.dayofweek)

In [None]:
dayMap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

In [None]:
df['dayOfWeek'] = df['dayOfWeek'].map(dayMap)
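
The keys 0 to 6 in `dayMap` match the numbering pandas uses for `dayofweek`, where Monday is 0. A minimal sketch with three dates I picked for illustration (a Monday, a Saturday, and a Sunday):

```python
import pandas as pd

# Hypothetical dates: 2020-01-06 was a Monday, 2020-01-11 a Saturday, 2020-01-12 a Sunday
stamps = pd.Series(pd.to_datetime(['2020-01-06', '2020-01-11', '2020-01-12']))

# .dt.dayofweek returns integers from 0 (Monday) to 6 (Sunday)
day_numbers = stamps.dt.dayofweek

dayMap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# map() looks each integer up in the dictionary
day_names = day_numbers.map(dayMap)
print(list(day_names))

# Values with no matching key (for example a day of the month) become NaN
unmapped = pd.Series([7, 31]).map(dayMap)
print(unmapped.isna().all())
```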

In [None]:
sns.countplot(x='dayOfWeek',data=df, hue='reason', palette='viridis')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

In [None]:
sns.countplot(x='month', data=df, hue='reason', palette='plasma')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

In [None]:
byMonth = df.groupby('month').count()
byMonth.head()
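
Note that `groupby(...).count()` counts non-null values per column, so columns with missing data show smaller numbers than complete ones. The sketch below uses an invented frame (`toy` and its values are assumptions) to show the difference, with `size()` as a way to count rows regardless of gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three calls in month 1, two in month 2, some zips missing
toy = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'zip': [19401.0, np.nan, 19464.0, 19401.0, np.nan],
    'twp': ['A', 'B', 'C', 'A', 'B'],
})

# count() tallies non-null entries in each remaining column, per group
by_month = toy.groupby('month').count()
print(by_month)

# size() counts rows per group, whether or not any column has gaps
rows_per_month = toy.groupby('month').size()
print(rows_per_month)
```

This is why a mostly complete column such as 'twp' is a reasonable stand-in for the number of calls per month.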

In [None]:
byMonth['twp'].plot()