# Overview

Data Source:
- Use the Crimes 2001 to Present dataset from the [Chicago Data Portal](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2), which includes the type of crime, date/time, latitude/longitude, district/ward, arrests, etc.

Helper Notebook:
- Use the helper notebook in this [repository](https://github.com/coding-dojo-data-science/preparing-chicago-crime-data) to split your manually downloaded CSV into several .csv.gz files.

Supplemental Data:
- To answer some of the possible questions, you may need to perform some feature engineering, such as adding holiday information from an API or from this [Holiday Data](https://docs.google.com/spreadsheets/d/1d8hoZzDAhbWx6EwNjrMTTOE5-23Pr1VxJeUxVj1JL9U/edit?usp=sharing) spreadsheet.
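
As one possible way to add holiday information, the sketch below tags each crime with a holiday name. It is only an illustration: it swaps in pandas' built-in `USFederalHolidayCalendar` rather than the linked Holiday Data spreadsheet, and the tiny `crimes` frame and the `Holiday` column name are made up for the example. Note that this calendar reports *observed* dates, so a holiday falling on a weekend is shifted to the nearest workday.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical sample rows; the real data has one row per reported crime.
crimes = pd.DataFrame({
    "Date": pd.to_datetime(
        [
            "01/01/2023 01:00:00 AM",  # Sunday; the holiday is observed on Monday
            "01/02/2023 03:30:00 PM",  # observed New Year's Day
            "07/04/2022 11:15:00 PM",  # Independence Day (a Monday)
            "03/15/2022 09:00:00 AM",  # ordinary day
        ],
        format="%m/%d/%Y %I:%M:%S %p",
    )
})

# Series indexed by observed holiday date, with the holiday name as the value.
calendar = USFederalHolidayCalendar()
holiday_names = calendar.holidays(
    start="2022-01-01", end="2023-12-31", return_name=True
)

# Look up each crime's calendar date; non-holidays become NaN.
holiday_lookup = {ts.date(): name for ts, name in holiday_names.items()}
crimes["Holiday"] = crimes["Date"].dt.date.map(holiday_lookup)

print(crimes)
```

Grouping by the new column (for example `crimes.groupby("Holiday").size()`) would then give crime counts per holiday. A different holiday source would only change how `holiday_lookup` is built.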

## Possible Questions to Consider
1) Comparing Police Districts:
  - Which district has the most crimes? Which has the least?
2) Crimes Across the Years:
  - Is the total number of crimes increasing or decreasing across the years?
  - Are there any individual crimes that are doing the opposite (e.g., decreasing when overall crime is increasing, or vice versa)?
3) Comparing AM vs. PM Rush Hour:
  - Are crimes more common during AM rush hour or PM rush hour?
    - You can consider any crime that occurred between 7 AM and 10 AM as AM rush hour.
    - You can consider any crime that occurred between 4 PM and 7 PM as PM rush hour.
  - What are the top 5 most common crimes during AM rush hour? What are the top 5 most common crimes during PM rush hour?
  - Are Motor Vehicle Thefts more common during AM rush hour or PM rush hour?
4) Comparing Months:
  - What months have the most crime? What months have the least?
  - Are there any individual crimes that do not follow this pattern? If so, which crimes?
5) Comparing Holidays:
  - Are there any holidays that show an increase in the number of crimes?
  - Are there any holidays that show a decrease in the number of crimes?
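
The rush-hour windows from question 3 could be encoded roughly as follows. This is a minimal sketch on made-up rows: the `Rush Hour` column name is hypothetical, and it assumes the windows are half-open (7:00 to 9:59 AM and 4:00 to 6:59 PM), which is one reading of "between 7 AM and 10 AM".

```python
import pandas as pd

# Hypothetical sample rows using the same Date format as the dataset.
crimes = pd.DataFrame({
    "Date": [
        "01/01/2023 07:30:00 AM",
        "01/01/2023 10:00:00 AM",
        "01/01/2023 04:15:00 PM",
        "01/01/2023 06:59:00 PM",
        "01/01/2023 01:00:00 PM",
    ],
    "Primary Type": [
        "THEFT", "BATTERY", "MOTOR VEHICLE THEFT", "THEFT", "ROBBERY",
    ],
})

# Parse the 12-hour timestamps so the hour can be extracted.
crimes["Date"] = pd.to_datetime(crimes["Date"], format="%m/%d/%Y %I:%M:%S %p")
hour = crimes["Date"].dt.hour

# Assumption: hours 7-9 count as AM rush hour, hours 16-18 as PM rush hour.
crimes["Rush Hour"] = "Neither"
crimes.loc[hour.between(7, 9), "Rush Hour"] = "AM"
crimes.loc[hour.between(16, 18), "Rush Hour"] = "PM"

# Overall counts per window, then crime types within each window.
print(crimes["Rush Hour"].value_counts())
print(crimes.groupby("Rush Hour")["Primary Type"].value_counts())
```

On the full data, taking `.head(5)` of the per-window counts would give the top 5 crimes for each rush hour.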

# Imports

In [30]:
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

# Functions

In [4]:
# import data into variables
def import_files():
    """Read the yearly .csv files from the Data folder and return them in a
    dictionary keyed by a valid Python name (e.g. Chicago_Crime_2001)."""

    file_name_template = "Chicago-Crime_{}"
    years = range(2001, 2024)

    # initialize dictionary to save files to
    data = {}

    # iterate through years
    for year in years:

        # recreate the file name
        file_name = file_name_template.format(year)

        # save file path based on file name
        file_path = f"Data/{file_name}.csv"

        # replace dash (not allowed in python variable
        # names) with underscore
        key = file_name.replace("-", "_")

        # import as df and store in dictionary under the cleaned key
        data[key] = pd.read_csv(file_path)

    # return dictionary
    return data
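
An alternative worth considering is to key the dictionary by year and pass it straight to `pd.concat`, which avoids creating one variable per year. The sketch below is not the notebook's method: `load_yearly_files` and its parameters are hypothetical names, and it runs against throwaway files in a temporary folder so it does not need the real `Data` folder.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_yearly_files(folder, years, template="Chicago-Crime_{}.csv"):
    """Return {year: DataFrame} for each yearly file found in `folder`."""
    folder = Path(folder)
    return {year: pd.read_csv(folder / template.format(year)) for year in years}


# Demo on tiny made-up files instead of the real yearly exports.
with tempfile.TemporaryDirectory() as tmp:
    for year in (2001, 2002):
        sample = pd.DataFrame(
            {"ID": [1, 2], "Primary Type": ["THEFT", "BATTERY"]}
        )
        sample.to_csv(Path(tmp) / f"Chicago-Crime_{year}.csv", index=False)

    frames = load_yearly_files(tmp, range(2001, 2003))

    # Passing a dict to concat uses its keys as the outer index level.
    combined = pd.concat(frames, names=["Year", "row"])

print(combined)
```

Keeping the year in the index means a single frame can still be sliced per year, for example `combined.loc[2001]`.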

# Data Loading

In [6]:
# call function and store in variable
imported_data = import_files()

# assign values to variables based on key names
for key, value in imported_data.items():
    globals()[key] = value
    
Chicago_Crime_2023

,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,12939189,01/01/2023 01:00:00 AM,OTHER OFFENSE,OTHER VEHICLE OFFENSE,STREET,False,True,423,4.0,7.0,41.736726,-87.556955
1,12944345,01/01/2023 01:00:00 AM,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,False,False,633,6.0,9.0,41.728305,-87.613136
2,12938688,01/01/2023 01:00:00 AM,MOTOR VEHICLE THEFT,THEFT / RECOVERY - AUTOMOBILE,STREET,False,False,1632,16.0,38.0,41.944491,-87.787524
3,12944392,01/01/2023 01:00:00 PM,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,False,False,915,9.0,11.0,41.833927,-87.641312
4,12943227,01/01/2023 01:00:00 AM,ROBBERY,STRONG ARM - NO WEAPON,CTA TRAIN,False,False,1132,11.0,24.0,41.873907,-87.725430
...,...,...,...,...,...,...,...,...,...,...,...,...
122001,13122057,06/28/2023 12:47:00 AM,ROBBERY,ARMED - HANDGUN,CAR WASH,False,False,1221,12.0,36.0,41.890695,-87.684644
122002,13122505,06/28/2023 12:50:00 PM,DECEPTIVE PRACTICE,ATTEMPT - FINANCIAL IDENTITY THEFT,APARTMENT,False,False,1931,19.0,32.0,41.934429,-87.675762
122003,13122064,06/28/2023 12:50:00 AM,ROBBERY,ARMED - HANDGUN,STREET,False,False,1135,11.0,28.0,41.867514,-87.686846
122004,13122624,06/28/2023 12:50:00 PM,ROBBERY,AGGRAVATED,SIDEWALK,False,False,513,5.0,9.0,41.694309,-87.620794


In [26]:
years = range(2001, 2024)
df_list = []
for year in years:
    df_list.append(f'Chicago_Crime_{year}')
    
df_list

['Chicago_Crime_2001',
 'Chicago_Crime_2002',
 'Chicago_Crime_2003',
 'Chicago_Crime_2004',
 'Chicago_Crime_2005',
 'Chicago_Crime_2006',
 'Chicago_Crime_2007',
 'Chicago_Crime_2008',
 'Chicago_Crime_2009',
 'Chicago_Crime_2010',
 'Chicago_Crime_2011',
 'Chicago_Crime_2012',
 'Chicago_Crime_2013',
 'Chicago_Crime_2014',
 'Chicago_Crime_2015',
 'Chicago_Crime_2016',
 'Chicago_Crime_2017',
 'Chicago_Crime_2018',
 'Chicago_Crime_2019',
 'Chicago_Crime_2020',
 'Chicago_Crime_2021',
 'Chicago_Crime_2022',
 'Chicago_Crime_2023']

In [38]:
df_list = [Chicago_Crime_2001, Chicago_Crime_2002, Chicago_Crime_2003,
           Chicago_Crime_2004, Chicago_Crime_2005, Chicago_Crime_2006,
           Chicago_Crime_2007, Chicago_Crime_2008, Chicago_Crime_2009,
           Chicago_Crime_2010, Chicago_Crime_2011, Chicago_Crime_2012,
           Chicago_Crime_2013, Chicago_Crime_2014, Chicago_Crime_2015,
           Chicago_Crime_2016, Chicago_Crime_2017, Chicago_Crime_2018,
           Chicago_Crime_2019, Chicago_Crime_2020, Chicago_Crime_2021,
           Chicago_Crime_2022, Chicago_Crime_2023]

# concatenate all yearly frames in a single call (avoids re-copying
# the growing frame on every loop iteration)
df = pd.concat(df_list)
    
df

,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,1323184,01/01/2001 01:00:00 PM,OTHER OFFENSE,HARASSMENT BY TELEPHONE,RESIDENCE,False,False,2213,22.0,,41.707671,-87.666996
1,1328315,01/01/2001 01:00:00 AM,DECEPTIVE PRACTICE,FRAUD OR CONFIDENCE GAME,RESIDENCE,False,False,725,7.0,,41.771269,-87.662929
2,1311933,01/01/2001 01:00:00 AM,SEX OFFENSE,CRIMINAL SEXUAL ABUSE,RESIDENCE,True,False,1434,14.0,,41.910797,-87.682214
3,1330412,01/01/2001 01:00:00 AM,THEFT,$500 AND UNDER,TAVERN/LIQUOR STORE,False,False,1813,18.0,,41.917383,-87.648623
4,1311735,01/01/2001 01:00:00 AM,BATTERY,AGGRAVATED: OTHER DANG WEAPON,STREET,False,False,1632,16.0,,41.938196,-87.800534
...,...,...,...,...,...,...,...,...,...,...,...,...
122001,13122057,06/28/2023 12:47:00 AM,ROBBERY,ARMED - HANDGUN,CAR WASH,False,False,1221,12.0,36.0,41.890695,-87.684644
122002,13122505,06/28/2023 12:50:00 PM,DECEPTIVE PRACTICE,ATTEMPT - FINANCIAL IDENTITY THEFT,APARTMENT,False,False,1931,19.0,32.0,41.934429,-87.675762
122003,13122064,06/28/2023 12:50:00 AM,ROBBERY,ARMED - HANDGUN,STREET,False,False,1135,11.0,28.0,41.867514,-87.686846
122004,13122624,06/28/2023 12:50:00 PM,ROBBERY,AGGRAVATED,SIDEWALK,False,False,513,5.0,9.0,41.694309,-87.620794


In [39]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7834343 entries, 0 to 122005
Data columns (total 12 columns):
 #   Column                Dtype  
---  ------                -----  
 0   ID                    int64  
 1   Date                  object 
 2   Primary Type          object 
 3   Description           object 
 4   Location Description  object 
 5   Arrest                bool   
 6   Domestic              bool   
 7   Beat                  int64  
 8   District              float64
 9   Ward                  float64
 10  Latitude              float64
 11  Longitude             float64
dtypes: bool(2), float64(4), int64(2), object(4)
memory usage: 672.4+ MB
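
The summary above shows `Date` stored as `object` (plain strings), so a likely next step is converting it to a datetime type before answering the time-based questions. The following is a minimal sketch on two made-up rows. The format string is inferred from the sample rows shown earlier, and converting `Primary Type` to a categorical is an optional idea for reducing memory, not something this notebook does.

```python
import pandas as pd

# Hypothetical sample rows in the same shape as the combined frame.
sample = pd.DataFrame({
    "Date": ["01/01/2001 01:00:00 PM", "06/28/2023 12:47:00 AM"],
    "Primary Type": ["OTHER OFFENSE", "ROBBERY"],
    "District": [22.0, 12.0],
})

# An explicit format avoids per-row format guessing on millions of rows.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Repeated labels are stored once when the column is categorical.
sample["Primary Type"] = sample["Primary Type"].astype("category")

# A sorted datetime index supports date slicing and resampling.
sample = sample.set_index("Date").sort_index()

sample.info()
```

With a datetime index, yearly or monthly totals can be computed with `resample` or by grouping on `sample.index.year` and `sample.index.month`.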
