<span style="color:red;font-weight: bold;font-size:40px">Assignment 1 - Pandas, Matplotlib, and Seaborn</span>

Let's analyze 911 call data from Kaggle. The dataset includes the following fields:

* lat: String variable, Latitude
* lng: String variable, Longitude
* desc: String variable, Description of the Emergency Call
* zip: String variable, Zipcode
* title: String variable, Title
* timeStamp: String variable, YYYY-MM-DD HH:MM:SS
* twp: String variable, Township
* addr: String variable, Address

a) Import numpy, pandas, matplotlib, and seaborn.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # later cells call plt.figure(), plt.title(), plt.show()
import seaborn as sns

b) Read the CSV file into a DataFrame.

In [None]:
df = pd.read_csv('911.csv')

c) Check the info() and the head().

In [None]:
df.head()

In [None]:
df.info()

# Questions and exercises

1- What are the top 5 zipcodes for 911 calls? (5 pts)
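No answer cell is given for this question. A minimal sketch, using the same `value_counts()` approach as the township answer in question 2 and a tiny made-up `sample` frame in place of `911.csv` (the zipcodes are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the 911 data (made-up zipcodes); in the notebook, use df instead of sample
sample = pd.DataFrame({'zip': [19401, 19401, 19464, 19403, 19401,
                               19464, 19446, 19406, 19403, 19401]})

# value_counts() sorts by frequency (highest first), so head(5) keeps the five most common zipcodes
top_zipcodes = sample['zip'].value_counts().head(5)
print("Top 5 zipcodes for 911 calls: \n", top_zipcodes)
```

On the real data the equivalent is `df['zip'].value_counts().head(5)`. If the zip column contains missing values, `read_csv` parses it as floats, so the zipcodes may display as, e.g., `19401.0`.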

2- What are the top 5 townships (twp) for 911 calls? (5 pts)

In [None]:
top_townships = df['twp'].value_counts().head(5)

In [None]:
print("Top 5 townships for 911 calls: \n", top_townships)

3- In the title column, the reason (department), such as EMS, Fire, or Traffic, appears before the title code and is separated from it by a colon. Use the .apply() method with a custom lambda function to create a new column called Reason that extracts this value.

For example, if the value in the title column is EMS: BACK PAINS/INJURY, the corresponding value in the Reason column should be EMS. (5 pts)

In [None]:
df['Reason'] = df['title'].apply(lambda x: x.split(':')[0])
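The lambda splits each title on `':'` and keeps the part before the first colon. A small sketch on made-up titles in the same `REASON: CODE` format, comparing the required `.apply()` version with the vectorized `.str` accessor that pandas also provides:

```python
import pandas as pd

# Made-up titles in the same "REASON: CODE" format as the title column
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Fire: GAS-ODOR/LEAK',
                                 'Traffic: VEHICLE ACCIDENT -']})

# Approach required by the exercise: a lambda applied to each value
sample['Reason'] = sample['title'].apply(lambda x: x.split(':')[0])

# Vectorized equivalent using the .str accessor (no Python-level lambda)
reason_vectorized = sample['title'].str.split(':').str[0]

print(sample['Reason'].tolist())
```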

4- What is the most common reason for a 911 call based on the values in the new Reason column? (5 pts)

In [None]:
most_common_reason = df['Reason'].value_counts().idxmax()
print(f"\nMost Common Reason for 911 Calls: {most_common_reason}")
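`value_counts().idxmax()` always returns a single label, even when several reasons share the top count. A sketch on made-up values with a deliberate tie, contrasting it with `Series.mode()`, which returns every value that reaches the highest frequency:

```python
import pandas as pd

# Made-up reasons with a deliberate tie between EMS and Traffic
reasons = pd.Series(['EMS', 'EMS', 'Traffic', 'Fire', 'Traffic'])

# idxmax() picks one label among the tied top counts
top_one = reasons.value_counts().idxmax()

# mode() returns all values sharing the highest frequency, sorted
top_all = reasons.mode().tolist()
print(top_one, top_all)
```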

5- Create a count plot to visualize the number of 911 calls for each Reason. (5 pts)

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='Reason', data=df, palette='viridis')
plt.title('Number of 911 Calls by Reason')
plt.show()

6- Use pd.to_datetime() to convert the timeStamp column from strings to DateTime objects. (5 pts)

In [None]:
df['timeStamp'] = pd.to_datetime(df['timeStamp'])
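Since the field list documents timeStamp as `YYYY-MM-DD HH:MM:SS`, that layout can optionally be passed to `pd.to_datetime()` via its `format` argument, so any row that does not match raises an error instead of being guessed. A sketch on two made-up timestamps:

```python
import pandas as pd

# Made-up timestamps in the dataset's documented "YYYY-MM-DD HH:MM:SS" layout
sample = pd.DataFrame({'timeStamp': ['2015-12-10 17:40:00', '2016-01-03 08:05:30']})

# An explicit format documents the expected layout; non-matching strings raise a ValueError
sample['timeStamp'] = pd.to_datetime(sample['timeStamp'], format='%Y-%m-%d %H:%M:%S')

print(sample['timeStamp'].iloc[0].hour)
```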

7- Now that the timeStamp column has been converted to DateTime objects, you can extract specific attributes by accessing them directly. For example:

time = df['timeStamp'].iloc[0]
time.hour

You can use Jupyter's tab-completion feature to explore the various attributes available for a DateTime object.

Use the .apply() method to create three new columns: Hour, Month, and Day of Week. These columns should be derived from the timeStamp column. (10 pts)

In [None]:
df['Hour'] = df['timeStamp'].apply(lambda time: time.hour)
df['Month'] = df['timeStamp'].apply(lambda time: time.month)
df['Day of Week'] = df['timeStamp'].apply(lambda time: time.dayofweek)
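The `.apply()` calls above run a Python lambda once per row. Outside the constraints of this exercise, pandas exposes the same attributes through the vectorized `.dt` accessor. A sketch on two made-up timestamps:

```python
import pandas as pd

sample = pd.DataFrame({'timeStamp': pd.to_datetime(['2015-12-10 17:40:00',    # a Thursday
                                                    '2016-01-03 08:05:30'])})  # a Sunday

# Same three columns as the .apply() version, computed column-wide with .dt
sample['Hour'] = sample['timeStamp'].dt.hour
sample['Month'] = sample['timeStamp'].dt.month
sample['Day of Week'] = sample['timeStamp'].dt.dayofweek  # Monday=0 ... Sunday=6

print(sample)
```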

8- Notice that the Day of Week column contains integers ranging from 0 to 6, representing the days of the week. Use the .map() method with the following dictionary to map these integers to their corresponding day names:

day_map = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}

(5 pts)

In [None]:
day_map = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
df['Day of Week'] = df['Day of Week'].map(day_map)
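`.map()` with a dictionary returns NaN for any value that is not a key. One consequence is that running the mapping cell a second time turns the whole column into NaN, because the values are then day names rather than integers. The sketch below, on made-up timestamps, shows that behavior and the built-in `.dt.day_name()` alternative, which needs no dictionary:

```python
import pandas as pd

day_map = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday',
           4: 'Friday', 5: 'Saturday', 6: 'Sunday'}

sample = pd.DataFrame({'timeStamp': pd.to_datetime(['2015-12-10 17:40:00', '2016-01-03 08:05:30'])})
sample['Day of Week'] = sample['timeStamp'].dt.dayofweek.map(day_map)

# Keys missing from the dictionary (7 here) become NaN
unmapped = pd.Series([0, 7]).map(day_map)

# Built-in alternative; with the default locale it returns English day names
names = sample['timeStamp'].dt.day_name()

print(sample['Day of Week'].tolist(), unmapped.tolist(), names.tolist())
```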

9- Create a count plot of the Day of Week column, using the hue parameter to differentiate the data based on the Reason column. (5 pts)

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='Day of Week', data=df, hue='Reason', palette='viridis')
plt.title('Number of 911 Calls by Day of Week and Reason')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)  # place the legend outside the axes
plt.show()

10- Create a count plot of the Month column, using the hue parameter to differentiate the data based on the Reason column. (5 pts)

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(x='Month', data=df, hue='Reason', palette='viridis')
plt.title('Number of 911 Calls by Month and Reason')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)  # place the legend outside the axes
plt.show()


11- Create a DataFrame called byMonth by grouping the DataFrame by the Month column and aggregating with the .count() method. (5 pts)

In [None]:
byMonth = df.groupby('Month').count()
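`.count()` counts non-null values separately for every column, which is why the plotting cells that follow pick a single column such as lat. The toy frame below (made-up values, with deliberately missing zip entries) shows how a column with missing data reports smaller counts, and how `.size()` counts rows instead:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'Month': [1, 1, 2, 2, 2],
                       'lat': [40.1, 40.2, 40.3, 40.4, 40.5],
                       'zip': [19401, np.nan, 19464, np.nan, 19403]})

# count(): one non-null count per column and group
byMonth = sample.groupby('Month').count()

# size(): number of rows per group, regardless of missing values
calls_per_month = sample.groupby('Month').size()

print(byMonth)
print(calls_per_month)
```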

12- Create a simple line plot from the DataFrame to show the count of calls per month. (5 pts)

In [None]:
plt.figure(figsize=(10, 6))
byMonth['lat'].plot()  # .count() gives one count per column; 'lat' stands in for the number of calls
plt.title('Number of 911 Calls per Month')
plt.xlabel('Month')
plt.ylabel('Number of Calls')
plt.show()

13- Use Seaborn's lmplot() to create a linear fit showing the trend in the number of calls per month. You may need to reset the index of the byMonth DataFrame to convert it into a column for plotting. (5 pts)

In [None]:
# reset_index() turns Month back into a column; 'lat' holds the per-month call count
sns.lmplot(x='Month', y='lat', data=byMonth.reset_index())
plt.title('Trend in Number of 911 Calls per Month')
plt.show()
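By default `lmplot` fits a straight line by least squares (`order=1`) and shades a bootstrapped confidence interval around it. The slope of that line is the trend the question asks about. A sketch computing it directly with NumPy's `polyfit` on made-up monthly counts (not figures from the dataset):

```python
import numpy as np

# Hypothetical monthly call counts, for illustration only
months = np.array([1, 2, 3, 4, 5])
calls = np.array([100, 110, 120, 130, 140])

# Degree-1 least-squares fit: the same kind of line lmplot draws by default
slope, intercept = np.polyfit(months, calls, 1)
print(f"trend: {slope:.1f} calls per month")
```

On the notebook data, passing `byMonth.index` and `byMonth['lat']` to `np.polyfit` should give the slope of the plotted regression line.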

14- Next, we'll create heatmaps using Seaborn and our data. First, restructure the DataFrame so that:

* The columns represent the Hours,
* The index represents the Day of Week,
* The values represent the count of calls.
    
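You can achieve this by grouping the data by both columns with groupby and then reshaping the result with .unstack(); alternatively, the .pivot_table() method performs the grouping and reshaping in a single call. (15 pts)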

In [None]:
# Count calls per (Day of Week, Hour) pair, then unstack() moves the Hour level into columns
dayHour = df.groupby(['Day of Week', 'Hour']).count()['Reason'].unstack()
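pandas' `pivot_table` can produce the same day-by-hour table in one call. Both it and groupby sort the string index alphabetically (Friday, Monday, Sunday, ...), so the heatmap rows come out of calendar order; a `reindex` restores it. A sketch on a tiny made-up frame:

```python
import pandas as pd

sample = pd.DataFrame({
    'Day of Week': ['Monday', 'Monday', 'Sunday', 'Friday', 'Monday'],
    'Hour': [8, 8, 17, 8, 17],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic', 'EMS'],
})

day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Group and reshape in one step; aggfunc='count' counts non-null Reason values per cell
dayHour = sample.pivot_table(index='Day of Week', columns='Hour', values='Reason',
                             aggfunc='count', fill_value=0)

# Put the rows in calendar order; days with no calls get zeros
dayHour = dayHour.reindex(day_order, fill_value=0)
print(dayHour)
```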

15- Create a heatmap using the newly structured DataFrame. (5 pts)

In [None]:
plt.figure(figsize=(12, 6))
sns.heatmap(dayHour, cmap='viridis')
plt.title('Heatmap of 911 Calls by Day of Week and Hour')
plt.show()

16- Repeat steps #14 and #15, but this time structure the DataFrame so that:

* The Month is set as the columns,
* The Day of Week is set as the index,
* The values represent the count of calls.

Then, create a heatmap using this newly structured DataFrame. (10 pts)

In [None]:
dayMonth = df.groupby(['Day of Week', 'Month']).count()['Reason'].unstack()
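`pd.crosstab` is another way to build this day-by-month table: it counts co-occurrences of two columns directly. Note that it counts rows rather than non-null values of a particular column such as Reason. A sketch on made-up values:

```python
import pandas as pd

sample = pd.DataFrame({
    'Day of Week': ['Monday', 'Sunday', 'Monday', 'Friday'],
    'Month': [1, 1, 2, 1],
})

# Rows come from the first argument, columns from the second; cells are row counts
dayMonth = pd.crosstab(sample['Day of Week'], sample['Month'])
print(dayMonth)
```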

In [None]:
plt.figure(figsize=(12, 6))
sns.heatmap(dayMonth, cmap='viridis')
plt.title('Heatmap of 911 Calls by Day of Week and Month')
plt.show()