## (Core) Project 3 - Part 1
For this assignment, you will be starting a new portfolio project that you will continue to build on over the remaining weeks of this course.

* For Part 1, you will be downloading and preparing a crime dataset for time series analysis.
* You will then answer the provided stakeholder questions and support your answers with visualizations.

### Chicago Crime Data
* #### We have prepared a zip file with the Chicago crime data which you can download here.
* Original Source: Chicago Data Portal: Crimes 2001 to Present
    * Data Description:
        * All crimes reported in the city of Chicago, with their details.
    * Includes:
        * type of crime, exact date/time, lat/long, District/ward, was there an arrest, etc.
    * Note: The original .csv is very large and needs to be split into smaller files to fit inside a GitHub repository.
    * We have provided a .zip file (linked above) with the data in a repo-friendly format. For those who are curious, the code for converting the downloaded file into the .zip file of individual years is in this helper notebook.
* #### Supplemental Data: Holiday Data
    * Check the lesson on "Feature Engineering: Holidays" to see how to use the Python 'holidays' package to add holidays to your dataset.
* #### Notes/Considerations:
    * You may need to keep 2 forms of the data:
        * The original individual crime data with a datetime index. (Each row is 1 crime)
        * A resampled/converted crime counts version (Each row is 1 day)
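
The two forms described above could be kept side by side roughly as follows. This is a minimal sketch on a tiny made-up sample, not the real dataset; the names `crimes`, `daily_counts`, and `Crime Count` are illustrative assumptions.

```python
import pandas as pd

# Made-up sample standing in for the crime data: each row is 1 crime.
crimes = pd.DataFrame(
    {"Primary Type": ["BATTERY", "THEFT", "ASSAULT", "THEFT"]},
    index=pd.to_datetime(
        ["2022-01-01 01:00", "2022-01-01 13:00", "2022-01-03 09:30", "2022-01-03 23:59"]
    ),
)
crimes.index.name = "Date"

# Form 1: the individual crimes, sorted by their datetime index.
crimes = crimes.sort_index()

# Form 2: crime counts per day. Days with no rows get a count of 0.
daily_counts = crimes.resample("D").size().rename("Crime Count")
print(daily_counts)
```

Swapping `"D"` for another frequency string would give counts at a different resolution, such as weekly.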

### Task
Your task is to answer a series of questions about trends in crimes in Chicago for a reporter for the local newspaper.

#### Stakeholder Questions to Answer (Pick at least 3 topics):

Select 3 or more of the following topics to analyze:

* #### Topic 1) Comparing Police Districts
    * Which district had the most crimes in 2022?
    * Which had the least?

* #### Topic 2) Crimes Across the Years:
    * Is the total number of crimes increasing or decreasing across the years?
    * Are there any individual crimes that are doing the opposite (e.g., decreasing when overall crime is increasing or vice-versa)?

* #### Topic 3) Comparing AM vs. PM Rush Hour:
    * Are crimes more common during AM rush hour or PM rush hour?
        * You can consider any crime that occurred between 7 AM and 10 AM as AM rush hour.
        * You can consider any crime that occurred between 4 PM and 7 PM as PM rush hour.
    * Answer the question: What are the top 5 most common crimes during AM rush hour? What are the top 5 most common crimes during PM rush hour?
    * Answer the question: Are Motor Vehicle Thefts more common during AM rush hour or PM rush hour?
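
One possible way to split the data into the two rush-hour windows is to filter on the hour of the datetime index. The sketch below uses a made-up sample and assumes half-open windows (7:00 up to but not including 10:00, and 16:00 up to but not including 19:00); the assignment does not say how to treat the boundaries, so that choice is an assumption.

```python
import pandas as pd

# Made-up sample: each row is 1 crime, indexed by when it occurred.
sample = pd.DataFrame(
    {
        "Primary Type": [
            "THEFT", "MOTOR VEHICLE THEFT", "BATTERY",
            "THEFT", "MOTOR VEHICLE THEFT", "THEFT",
        ]
    },
    index=pd.to_datetime(
        [
            "2022-03-01 07:15", "2022-03-01 09:59", "2022-03-01 12:00",
            "2022-03-01 16:00", "2022-03-01 18:30", "2022-03-01 19:00",
        ]
    ),
)

hours = sample.index.hour

# Assumed windows: [7, 10) for AM rush hour and [16, 19) for PM rush hour.
am_rush = sample[(hours >= 7) & (hours < 10)]
pm_rush = sample[(hours >= 16) & (hours < 19)]

# Top 5 most common crime types in each window.
am_top5 = am_rush["Primary Type"].value_counts().head(5)
pm_top5 = pm_rush["Primary Type"].value_counts().head(5)

# Motor vehicle thefts in each window.
mvt_am = (am_rush["Primary Type"] == "MOTOR VEHICLE THEFT").sum()
mvt_pm = (pm_rush["Primary Type"] == "MOTOR VEHICLE THEFT").sum()
```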

* #### Topic 4) Comparing Months:
    * Answer the question: What months have the most crime? What months have the least?
    * Answer the question: Are there any individual crimes that do not follow this pattern? If so, which crimes?

* #### Topic 5) Comparing Holidays:
    * Answer the question: What are the top 3 holidays with the largest number of crimes?
    * Answer the question: For each of the top 3 holidays with the most crime, what are the top 5 most common crimes on that holiday?

* #### Topic 6) What cycles (seasonality) can you find in this data?
    * ##### Make sure to select the data of interest and that it is resampled to the frequency you want.
        * (See the "Suggested data to check for seasons" list at the bottom of this topic for suggestions.)
    * #### Use statsmodels.tsa.seasonal.seasonal_decompose() to decompose the time series.
        * Note: seasonal_decompose cannot read data resampled as minutes or smaller, and if you try seconds, you will crash your computer. Keep your resampling at hours or more.

    * #### Show and describe each cycle you can find.
        * (Hint: If your seasonal results are too dense to read, try zooming in to look at just one year or one month and try different levels of resampling).
        * Answer the question: How long is a cycle?
        * Answer the question: What is the magnitude of the cycle? (Compare min and max).

    * #### Suggested data to check for seasons:
        * Total Crime (Daily)
        * Total Crime (Weekly)
        * Total Crime (Monthly)
        * Select a Primary Type of interest to you (Daily)
        * Select a Primary Type of interest to you (Weekly)
        * Select a Primary Type of interest to you (Monthly)

### Final Submission
Submit the link to your GitHub Repository (make sure it is public!)

In [1]:
# imports
import glob
import pandas as pd

In [10]:
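# Folder holding the yearly crime CSVs extracted from the provided .zip.
# A raw string cannot end in a single backslash, hence the doubled one.
folder = r"C:\Users\bandi\Data\Chicago\\"
# Sort so the files load in a consistent order.
crime_files = sorted(glob.glob(folder+"*.csv"))
# Each file keeps its own 0-based row labels, so index values repeat.
df = pd.concat([pd.read_csv(f) for f in crime_files])
df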

,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,1326041,01/01/2001 01:00:00 AM,BATTERY,SIMPLE,RESIDENCE,False,False,1624,16.0,,41.957850,-87.749185
1,1319931,01/01/2001 01:00:00 PM,BATTERY,SIMPLE,RESIDENCE,False,True,825,8.0,,41.783892,-87.684841
2,1324743,01/01/2001 01:00:00 PM,GAMBLING,ILLEGAL ILL LOTTERY,STREET,True,False,313,3.0,,41.780412,-87.611970
3,1310717,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2424,24.0,,42.012391,-87.678032
4,1318099,01/01/2001 01:00:00 AM,BATTERY,SIMPLE,RESIDENCE PORCH/HALLWAY,False,True,214,2.0,,41.819538,-87.620020
...,...,...,...,...,...,...,...,...,...,...,...,...
238853,12938029,12/31/2022 12:50:00 PM,ASSAULT,SIMPLE,APARTMENT,False,False,1014,10.0,24.0,41.855911,-87.719966
238854,12937822,12/31/2022 12:50:00 PM,ASSAULT,AGGRAVATED - HANDGUN,APARTMENT,False,True,733,7.0,6.0,41.766546,-87.645669
238855,12937583,12/31/2022 12:52:00 AM,BATTERY,SIMPLE,BARBERSHOP,False,False,1135,11.0,28.0,41.868829,-87.686098
238856,12938420,12/31/2022 12:52:00 PM,ASSAULT,SIMPLE,COMMERCIAL / BUSINESS OFFICE,False,False,1432,14.0,32.0,41.930693,-87.685657


In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 7713109 entries, 0 to 238857
Data columns (total 12 columns):
 #   Column                Dtype  
---  ------                -----  
 0   ID                    int64  
 1   Date                  object 
 2   Primary Type          object 
 3   Description           object 
 4   Location Description  object 
 5   Arrest                bool   
 6   Domestic              bool   
 7   Beat                  int64  
 8   District              float64
 9   Ward                  float64
 10  Latitude              float64
 11  Longitude             float64
dtypes: bool(2), float64(4), int64(2), object(4)
memory usage: 662.0+ MB


In [15]:
chicago_df = df.copy()
chicago_df

,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,1326041,01/01/2001 01:00:00 AM,BATTERY,SIMPLE,RESIDENCE,False,False,1624,16.0,,41.957850,-87.749185
1,1319931,01/01/2001 01:00:00 PM,BATTERY,SIMPLE,RESIDENCE,False,True,825,8.0,,41.783892,-87.684841
2,1324743,01/01/2001 01:00:00 PM,GAMBLING,ILLEGAL ILL LOTTERY,STREET,True,False,313,3.0,,41.780412,-87.611970
3,1310717,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2424,24.0,,42.012391,-87.678032
4,1318099,01/01/2001 01:00:00 AM,BATTERY,SIMPLE,RESIDENCE PORCH/HALLWAY,False,True,214,2.0,,41.819538,-87.620020
...,...,...,...,...,...,...,...,...,...,...,...,...
238853,12938029,12/31/2022 12:50:00 PM,ASSAULT,SIMPLE,APARTMENT,False,False,1014,10.0,24.0,41.855911,-87.719966
238854,12937822,12/31/2022 12:50:00 PM,ASSAULT,AGGRAVATED - HANDGUN,APARTMENT,False,True,733,7.0,6.0,41.766546,-87.645669
238855,12937583,12/31/2022 12:52:00 AM,BATTERY,SIMPLE,BARBERSHOP,False,False,1135,11.0,28.0,41.868829,-87.686098
238856,12938420,12/31/2022 12:52:00 PM,ASSAULT,SIMPLE,COMMERCIAL / BUSINESS OFFICE,False,False,1432,14.0,32.0,41.930693,-87.685657


### Convert the date column to datetime dtype

In [22]:
# Preview the conversion without assigning the result.
pd.to_datetime(chicago_df['Date'])

0        2001-01-01 01:00:00
1        2001-01-01 13:00:00
2        2001-01-01 13:00:00
3        2001-01-01 01:00:00
4        2001-01-01 01:00:00
                 ...        
238853   2022-12-31 12:50:00
238854   2022-12-31 12:50:00
238855   2022-12-31 00:52:00
238856   2022-12-31 12:52:00
238857   2022-12-31 12:59:00
Name: Date, Length: 7713109, dtype: datetime64[ns]

In [24]:
# Pass the format explicitly: %I is the 12-hour clock and %p is the AM/PM marker.
chicago_df['Date'] = pd.to_datetime(chicago_df['Date'], format='%m/%d/%Y %I:%M:%S %p')
chicago_df['Date']

0        2001-01-01 01:00:00
1        2001-01-01 13:00:00
2        2001-01-01 13:00:00
3        2001-01-01 01:00:00
4        2001-01-01 01:00:00
                 ...        
238853   2022-12-31 12:50:00
238854   2022-12-31 12:50:00
238855   2022-12-31 00:52:00
238856   2022-12-31 12:52:00
238857   2022-12-31 12:59:00
Name: Date, Length: 7713109, dtype: datetime64[ns]
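
To see what the format string does, it can help to parse a few raw values on their own. The sketch below reuses three date strings from the rows displayed earlier; the variable names are illustrative.

```python
import pandas as pd

# %m/%d/%Y = month/day/4-digit year, %I = 12-hour clock, %p = AM or PM.
fmt = "%m/%d/%Y %I:%M:%S %p"

raw = pd.Series(
    ["12/31/2022 12:52:00 AM", "12/31/2022 12:52:00 PM", "01/01/2001 01:00:00 PM"]
)
parsed = pd.to_datetime(raw, format=fmt)

# 12:52 AM becomes hour 0, 12:52 PM stays hour 12, and 01:00 PM becomes hour 13.
print(parsed.dt.hour.tolist())
```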

In [27]:
chicago_df = chicago_df.set_index('Date')
chicago_df

Date,ID,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
2001-01-01 01:00:00,1326041,BATTERY,SIMPLE,RESIDENCE,False,False,1624,16.0,,41.957850,-87.749185
2001-01-01 13:00:00,1319931,BATTERY,SIMPLE,RESIDENCE,False,True,825,8.0,,41.783892,-87.684841
2001-01-01 13:00:00,1324743,GAMBLING,ILLEGAL ILL LOTTERY,STREET,True,False,313,3.0,,41.780412,-87.611970
2001-01-01 01:00:00,1310717,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2424,24.0,,42.012391,-87.678032
2001-01-01 01:00:00,1318099,BATTERY,SIMPLE,RESIDENCE PORCH/HALLWAY,False,True,214,2.0,,41.819538,-87.620020
...,...,...,...,...,...,...,...,...,...,...,...
2022-12-31 12:50:00,12938029,ASSAULT,SIMPLE,APARTMENT,False,False,1014,10.0,24.0,41.855911,-87.719966
2022-12-31 12:50:00,12937822,ASSAULT,AGGRAVATED - HANDGUN,APARTMENT,False,True,733,7.0,6.0,41.766546,-87.645669
2022-12-31 00:52:00,12937583,BATTERY,SIMPLE,BARBERSHOP,False,False,1135,11.0,28.0,41.868829,-87.686098
2022-12-31 12:52:00,12938420,ASSAULT,SIMPLE,COMMERCIAL / BUSINESS OFFICE,False,False,1432,14.0,32.0,41.930693,-87.685657


In [28]:
chicago_df.index

DatetimeIndex(['2001-01-01 01:00:00', '2001-01-01 13:00:00',
               '2001-01-01 13:00:00', '2001-01-01 01:00:00',
               '2001-01-01 01:00:00', '2001-01-01 01:00:00',
               '2001-01-01 01:00:00', '2001-01-01 01:00:00',
               '2001-01-01 01:00:00', '2001-01-01 01:00:00',
               ...
               '2022-12-31 12:41:00', '2022-12-31 00:42:00',
               '2022-12-31 00:44:00', '2022-12-31 00:45:00',
               '2022-12-31 12:45:00', '2022-12-31 12:50:00',
               '2022-12-31 12:50:00', '2022-12-31 00:52:00',
               '2022-12-31 12:52:00', '2022-12-31 12:59:00'],
              dtype='datetime64[ns]', name='Date', length=7713109, freq=None)
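
With a datetime index in place, a single year can be selected by passing a partial date string to `.loc`. The sketch below uses a made-up sample and sorts the index first, since the index shown above is not in chronological order; counting by `District` is one way this could feed into Topic 1, offered as an assumption rather than the required approach.

```python
import pandas as pd

# Made-up sample with an unsorted datetime index, like the data above.
sample = pd.DataFrame(
    {"District": [8.0, 16.0, 8.0, 11.0]},
    index=pd.to_datetime(
        ["2022-12-31 12:50", "2001-01-01 01:00", "2022-06-15 08:00", "2022-01-02 00:30"]
    ),
)
sample.index.name = "Date"

# Sort chronologically so date-based selection behaves predictably.
sample = sample.sort_index()

# A partial date string selects every row from that year.
crimes_2022 = sample.loc["2022"]

# Crimes per district in 2022, then the districts with the most and fewest.
district_counts = crimes_2022["District"].value_counts()
most_crimes = district_counts.idxmax()
least_crimes = district_counts.idxmin()
```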