In [None]:
# Title: Analysis of Crime and Economic Status in Toronto

In [None]:
# Motivation


In [None]:
# Review of Similar Research

In [None]:
# Research Question

# Data: Toronto Neighborhood Crime Analysis (2016-2021)

## Data Sources

This research uses three datasets to analyze the relationship between socioeconomic factors and crime rates across Toronto neighborhoods, with a particular focus on comparing patterns between 2016 and 2021.

In [None]:
# Core datasets used in this analysis
raw_data_sources = {
    "crime_data": "raw_data/neighbourhood-crime-rates - 4326.csv",
    "neighborhood_profiles_2016": "raw_data/neighbourhood-profiles-2016-140-model.csv",
    "neighborhood_profiles_2021": "raw_data/neighbourhood-profiles-2021-158-model.csv"
}

The primary datasets include:

**1. Toronto Neighborhood Crime Statistics**: Contains detailed crime counts and rates across various crime types (assault, auto theft, bike theft, break and enter, robbery, theft from motor vehicle, and theft over $5000)

**2. Toronto Neighborhood Profiles (2016)**: Demographic and socioeconomic indicators from the 2016 census

**3. Toronto Neighborhood Profiles (2021)**: Updated demographic and socioeconomic indicators from the 2021 census

## Data Preparation

The raw data required significant preprocessing to enable meaningful analysis. We developed dedicated cleaning scripts for each dataset to standardize neighborhood names, select relevant columns, and transform the data into a consistent format.

In [None]:
# Load and clean crime data for both years
import pandas as pd

def clean_crime_data(crime_data_path, output_path):
    # Read raw crime data
    crime_data = pd.read_csv(crime_data_path)

    # Keep only the identifier columns and the seven crime-count columns
    crime_cols = ['AREA_NAME', 'AREA_ID', 'ASSAULT', 'AUTOTHEFT', 'BIKETHEFT',
                  'BREAKENTER', 'ROBBERY', 'THEFTFROMMV', 'THEFTOVER']
    crime_data = crime_data[crime_cols].copy()

    # Clean neighborhood names for consistent joins
    # (clean_name is assumed to be defined elsewhere in the notebook)
    crime_data['AREA_NAME'] = crime_data['AREA_NAME'].apply(clean_name)

    # Sum the crime-count columns (everything after the two identifiers) and save
    crime_data['TOTAL_CRIME_COUNT'] = crime_data[crime_cols[2:]].sum(axis=1)
    crime_data.to_csv(output_path, index=False)

    return crime_data

The data cleaning process addressed several challenges:

**1. Inconsistent neighborhood naming**: Names required standardization (e.g., "St." vs "St", hyphenation differences)

**2. Different neighborhood counts**: 2016 data contains 140 neighborhoods while 2021 data includes 158

**3. Missing values**: Empty values were replaced with zeros where appropriate

**4. Column selection**: Only relevant demographic and crime variables were retained

After cleaning, we produced standardized datasets for both 2016 and 2021, enabling direct comparison.
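The name standardization step described above could look roughly like the following. This is a minimal sketch, not the notebook's actual `clean_name` (which is not shown here); the specific rules (lowercasing, treating hyphens, slashes, and periods as spaces, dropping other punctuation) are assumptions chosen to cover the "St." vs "St" and hyphenation cases mentioned in the list.

```python
import re

def clean_name(name: str) -> str:
    """Hypothetical normalizer; the real clean_name may apply different rules."""
    name = name.strip().lower()
    # Treat hyphens, slashes, and periods as separators so that
    # "St.James", "St. James", and "St James" all normalize identically.
    name = re.sub(r"[-/.]", " ", name)
    # Drop any remaining punctuation (apostrophes, parentheses, commas).
    name = re.sub(r"[^a-z0-9 ]", "", name)
    # Collapse runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", name).strip()

print(clean_name("Cabbagetown-South St.James Town"))
print(clean_name("Cabbagetown South St. James Town"))
```

Both calls print the same string, which is what makes the later joins on neighborhood name line up. Note that this sketch also strips non-ASCII letters, which may or may not be acceptable for the real data.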

## Dataset Features

The cleaned datasets contain the following key variables:

In [None]:
# Display key variables in our final datasets
merged_2021.columns.to_list()

### Crime Variables (2016 and 2021)

- `ASSAULT_2016/2021`: Assault incidents count

- `AUTOTHEFT_2016/2021`: Auto theft incidents count

- `BIKETHEFT_2016/2021`: Bicycle theft incidents count

- `BREAKENTER_2016/2021`: Break and enter incidents count

- `ROBBERY_2016/2021`: Robbery incidents count

- `THEFTFROMMV_2016/2021`: Theft from motor vehicle incidents count

- `THEFTOVER_2016/2021`: Theft over $5000 incidents count

- `TOTAL_CRIME_COUNT`: Sum of all crime counts

### Socioeconomic Variables (2016 and 2021)

- `total_population`: Total neighborhood population

- `low_income_percent`: Percentage of population considered low income

- `is_improvement_area`: Flag indicating if neighborhood is designated as an improvement area
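Because neighborhoods differ in size, raw counts are easier to compare once they are scaled by population. The sketch below shows one way to derive a per-100,000 rate from `TOTAL_CRIME_COUNT` and `total_population`. The `crime_rate` column name, the `add_crime_rate` helper, and the toy values are assumptions for illustration; they are not taken from the notebook's data.

```python
import pandas as pd

# Toy rows standing in for a merged neighborhood table; the values are made up.
toy = pd.DataFrame({
    "neighbourhood_name": ["a", "b", "c"],
    "TOTAL_CRIME_COUNT": [120, 45, 300],
    "total_population": [24000, 9000, 0],
})

def add_crime_rate(frame, count_col="TOTAL_CRIME_COUNT",
                   pop_col="total_population", per=100_000):
    """Return a copy of frame with a hypothetical 'crime_rate' column."""
    out = frame.copy()
    # Mask non-positive populations so the division yields NaN, not inf.
    population = out[pop_col].where(out[pop_col] > 0)
    out["crime_rate"] = out[count_col] / population * per
    return out

rates = add_crime_rate(toy)
print(rates[["neighbourhood_name", "crime_rate"]])
```

Rows with a zero population come out as missing rather than infinite, so they can be filtered or inspected explicitly before any modelling.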

## Exploratory Data Analysis

Initial exploration checked how many neighborhoods matched in each year and summarized the distribution of each crime type:

In [None]:
# Basic descriptive statistics for 2016 vs 2021
print(f"Number of matched neighborhoods (2016): {len(merged_2016)}")
print(f"Number of matched neighborhoods (2021): {len(merged_2021)}")
print(f"Number of neighborhoods present in both years: {len(common_neighborhoods)}")

# Statistical summaries (display() is used because a cell only renders its last expression)
display(merged_2016[crime_cols_2016].describe())
display(merged_2021[crime_cols_2021].describe())

Our preliminary analysis showed:

1. We successfully matched 124 neighborhoods in the 2016 dataset and 154 neighborhoods in the 2021 dataset

2. 120 neighborhoods were present in both datasets, enabling direct comparison

3. Crime patterns vary widely across neighborhoods, with some areas experiencing substantially higher rates than others
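The set of neighborhoods present in both years can be obtained with a plain set intersection on the cleaned names. The sketch below uses toy name sets; in the notebook the inputs would presumably be something like `set(merged_2016["neighbourhood_name"])` and its 2021 counterpart, which is an assumption about how `common_neighborhoods` was built.

```python
# Toy name sets standing in for the cleaned neighborhood names of each year.
names_2016 = {"agincourt north", "annex", "bay street corridor"}
names_2021 = {"annex", "bay cloverhill", "agincourt north"}

# Names found in both years: the basis for a like-for-like comparison.
common_neighborhoods = sorted(names_2016 & names_2021)

# Names that appear in only one year, worth reviewing by hand.
only_2016 = sorted(names_2016 - names_2021)
only_2021 = sorted(names_2021 - names_2016)

print(f"common: {len(common_neighborhoods)}, "
      f"2016 only: {len(only_2016)}, 2021 only: {len(only_2021)}")
```

Listing the one-sided names alongside the intersection helps separate genuine boundary changes from leftover spelling differences.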

## Data Integration

To analyze relationships between socioeconomic factors and crime, we merged the neighborhood profile and crime datasets:

In [None]:
# Merge datasets for analysis
merged_2016 = pd.merge(
    nbh_data_2016, 
    crime_data_2016,
    left_on='neighbourhood_name',
    right_on='AREA_NAME',
    how='inner'
)

merged_2021 = pd.merge(
    nbh_data_2021, 
    crime_data_2021,
    left_on='neighbourhood_name',
    right_on='AREA_NAME',
    how='inner'
)

The merged data support three primary analytical approaches:

**1. Box plot comparisons**: Visualizing crime distribution differences between 2016 and 2021

**2. Correlation matrices**: Examining how relationships between crime types changed over time

**3. Linear regression analysis**: Assessing how socioeconomic factors predict crime rates
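As a rough illustration of the second and third approaches, the sketch below computes a Pearson correlation matrix between two crime types and fits a one-variable least-squares line of assault counts on `low_income_percent`. The data are made-up toy values, the unsuffixed column names are simplifications, and `numpy.polyfit` is used here only as a compact stand-in; the notebook's actual regression may use a different library and more predictors.

```python
import numpy as np
import pandas as pd

# Toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "low_income_percent": [5.0, 10.0, 15.0, 20.0, 25.0],
    "ASSAULT": [12, 20, 33, 41, 49],
    "ROBBERY": [3, 4, 8, 9, 12],
})

# Pearson correlation between crime types (DataFrame.corr defaults to Pearson).
corr = toy[["ASSAULT", "ROBBERY"]].corr()

# Degree-1 polyfit returns coefficients highest power first: slope, intercept.
slope, intercept = np.polyfit(toy["low_income_percent"], toy["ASSAULT"], deg=1)

print(corr.round(3))
print(f"ASSAULT ~ {intercept:.2f} + {slope:.2f} * low_income_percent")
```

Running the same two steps separately on the 2016 and 2021 tables would give the year-over-year comparison described in the list.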

## Data Limitations

Several limitations should be acknowledged:

**1. Neighborhood boundary changes**: Some neighborhood boundaries may have changed between 2016 and 2021

**2. COVID-19 impact**: The 2021 data likely reflects pandemic-related changes in crime patterns

**3. Reporting variations**: Changes in crime reporting practices could influence apparent trends

**4. Missing neighborhoods**: Not all neighborhoods have complete data for both time periods

Despite these limitations, the dataset provides valuable insights into the changing landscape of crime in Toronto and its relationship with socioeconomic factors.
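The missing-neighborhoods limitation can be made concrete by auditing the join. The sketch below repeats the merge as an outer join with pandas' `indicator=True` option, which adds a `_merge` column marking each row as `both`, `left_only`, or `right_only`. The two small frames and their values are invented; only the column names mirror those used in the merge code above.

```python
import pandas as pd

# Toy frames; the names and numbers are invented for illustration.
profiles = pd.DataFrame({
    "neighbourhood_name": ["annex", "rouge", "weston"],
    "total_population": [30000, 46000, 18000],
})
crime = pd.DataFrame({
    "AREA_NAME": ["annex", "weston", "malvern"],
    "ASSAULT": [150, 90, 110],
})

# Outer join keeps unmatched rows from both sides; indicator=True labels them.
audit = pd.merge(
    profiles, crime,
    left_on="neighbourhood_name", right_on="AREA_NAME",
    how="outer", indicator=True,
)

# Rows that an inner join would have silently dropped.
unmatched = audit[audit["_merge"] != "both"]
match_counts = audit["_merge"].value_counts()

print(match_counts)
print(unmatched[["neighbourhood_name", "AREA_NAME", "_merge"]])
```

Reviewing the unmatched rows shows whether a neighborhood was lost to a naming mismatch, which further cleaning can fix, or to a real difference between the two neighborhood models.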

In [None]:
# Methodology

In [None]:
# Analysis and Interpretation of Results

In [1]:
# Discussion

In [None]:
# References