# Chicago Crime Data Project: Part 1

**Steps:**

1. Use data from the Chicago Data Portal: Crimes 2001 to Present, which includes type of crime, date/time, lat/long, District/ward, arrests, etc.
    - https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2

2. Use this helper notebook to split the large Chicago crime .csv file into several smaller .csv.gz files for analysis.
    - https://github.com/coding-dojo-data-science/preparing-chicago-crime-data

3. Answer several questions about the data. Questions:
    - (1) Comparing Police Districts: 
        - (A) Which district has the most crimes? 
        - (B) Which has the least?
    - (2) Crimes Across the Years:
        - (A) Is the total number of crimes increasing or decreasing across the years?
        - (B) Are there any individual crimes that are doing the opposite (i.e., decreasing when overall crime is increasing, or vice versa)?
    - (3) Comparing AM vs. PM Rush Hour:
        - (A) Are crimes more common during AM rush hour or PM rush hour? Consider 7am-10am and 4pm-7pm as rush hours.
        - (B) What are the top 5 most common crimes during AM rush hour? What are the top 5 most common crimes during PM rush hour?
        - (C) Are Motor Vehicle Thefts more common during AM rush hour or PM rush hour?
    - (4) Comparing Months:
        - (A) What months have the most crime? What months have the least?
        - (B) Are there any individual crimes that do not follow this pattern? If so, which crimes?
    - (5) Comparing Holidays:
        - (A) Are there any holidays that show an increase in the number of crimes?
        - (B) Are there any holidays that show a decrease in the number of crimes?
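
As a rough illustration of how the rush hour windows in question (3) could be applied, the sketch below labels each record as AM rush hour, PM rush hour, or neither. The sample rows, the `label_rush_hour` helper, and the `Rush Hour` column are hypothetical, and treating the windows as half-open (7:00 to 9:59 am, 4:00 to 6:59 pm) is an assumption rather than something the questions specify.

```python
import pandas as pd

# Hypothetical sample rows that mimic the portal's "Date" string format
sample = pd.DataFrame({
    "Date": [
        "01/01/2001 07:30:00 AM",
        "01/01/2001 09:59:00 AM",
        "01/01/2001 04:15:00 PM",
        "01/01/2001 11:00:00 PM",
    ],
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"],
})


def label_rush_hour(df, date_col="Date"):
    """Return a Series labelling each row 'AM', 'PM', or 'neither'."""
    # parse the 12-hour timestamp strings and keep only the hour (0-23)
    hours = pd.to_datetime(
        df[date_col], format="%m/%d/%Y %I:%M:%S %p"
    ).dt.hour

    # start with 'neither' and overwrite rows that fall in a window
    labels = pd.Series("neither", index=df.index)
    labels[hours.between(7, 9)] = "AM"    # assumed window: 7:00-9:59 am
    labels[hours.between(16, 18)] = "PM"  # assumed window: 4:00-6:59 pm
    return labels


sample["Rush Hour"] = label_rush_hour(sample)

# number of records per label
rush_counts = sample["Rush Hour"].value_counts()
```

Grouping by the new column and `Primary Type` would then give per-crime counts for each window, which is one way to approach questions (3B) and (3C).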

# 1 Imports

In [1]:
# imports
import pandas as pd

# 2 Data Preparation

I downloaded the raw data from the Chicago Data Portal and processed it with the helper notebook in this repository. Here I will import the data that was saved in this repository and concatenate it back into one full dataframe. There is one saved .csv file for each year from 2001 to 2023 (inclusive).

In [10]:
# import data into variables
def import_files():
    """Read the yearly .csv files from the Data folder into a dictionary
    keyed by a valid Python variable name for each year."""

    # set file name template and years to use in file names
    file_name_template = "Chicago-Crime_{}"
    years = range(2001, 2024)

    # initialize dictionary to save files to
    data = {}

    # iterate through years
    for year in years:

        # recreate the file name and its path
        file_name = file_name_template.format(year)
        file_path = f"Data/{file_name}.csv"

        # replace dash (not allowed in python variable names)
        # with underscore to build the dictionary key
        key = file_name.replace("-", "_")

        # import as df and store in dictionary
        data[key] = pd.read_csv(file_path)

    # return dictionary
    return data

# call function and store in variable
imported_data = import_files()

# assign values to variables based on key names
for key, value in imported_data.items():
    globals()[key] = value
    
# check
Chicago_Crime_2001

Unnamed: 0,ID,Date,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Latitude,Longitude
0,1422085,01/01/2001 01:00:00 AM,OTHER OFFENSE,TELEPHONE THREAT,RESIDENCE,False,True,1023,10.0,,41.858050,-87.695513
1,1324743,01/01/2001 01:00:00 PM,GAMBLING,ILLEGAL ILL LOTTERY,STREET,True,False,313,3.0,,41.780412,-87.611970
2,1310288,01/01/2001 01:00:00 AM,CRIMINAL DAMAGE,TO PROPERTY,RESIDENCE,False,False,621,6.0,,41.756650,-87.641608
3,6808288,01/01/2001 01:00:00 PM,THEFT,FINANCIAL ID THEFT: OVER $300,APARTMENT,False,False,213,2.0,3.0,41.822551,-87.615632
4,1328315,01/01/2001 01:00:00 AM,DECEPTIVE PRACTICE,FRAUD OR CONFIDENCE GAME,RESIDENCE,False,False,725,7.0,,41.771269,-87.662929
...,...,...,...,...,...,...,...,...,...,...,...,...
485880,1922811,12/31/2001 12:50:00 AM,BATTERY,SIMPLE,SIDEWALK,True,False,1433,14.0,,41.904680,-87.667387
485881,1916915,12/31/2001 12:51:00 PM,OTHER OFFENSE,TELEPHONE THREAT,COMMERCIAL / BUSINESS OFFICE,False,True,131,1.0,,41.874313,-87.643741
485882,1916172,12/31/2001 12:51:13 AM,OTHER OFFENSE,TELEPHONE THREAT,APARTMENT,False,False,423,4.0,,41.740157,-87.558617
485883,1927120,12/31/2001 12:55:00 PM,PUBLIC PEACE VIOLATION,BOMB THREAT,GOVERNMENT BUILDING/PROPERTY,False,False,113,1.0,,41.883320,-87.629777


In [None]:
# concatenate into one large dataframe
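
The concatenation cell above is empty in the source. A minimal sketch of one way to do it, assuming the yearly dataframes are still held in a dictionary like `imported_data`, is shown below. The two small frames are stand-ins for the real yearly files, and the name `df_full` and the ID value `2000001` are made up for illustration.

```python
import pandas as pd

# Stand-in for the dictionary returned by import_files(); the real one
# holds one dataframe per year (the values here are illustrative only)
imported_data = {
    "Chicago_Crime_2001": pd.DataFrame(
        {"ID": [1422085, 1324743], "District": [10.0, 3.0]}
    ),
    "Chicago_Crime_2002": pd.DataFrame(
        {"ID": [2000001], "District": [7.0]}
    ),
}

# stack the yearly frames row-wise; ignore_index=True replaces the
# per-year indexes with a single fresh 0..n-1 index
df_full = pd.concat(list(imported_data.values()), ignore_index=True)
```

Concatenating from the dictionary avoids depending on the per-year global variables, so the same line keeps working if the range of years changes.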