In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Set styling
plt.style.use("seaborn-v0_8-whitegrid")
sns.set_palette("viridis")

# Joint Analysis of Toronto Auto Theft and Census Data

This notebook explores the relationship between auto theft incidents and socioeconomic factors in Toronto neighborhoods. The analysis integrates two distinct datasets:

1. **Auto Theft Dataset**: Contains detailed records of auto theft incidents across Toronto's 158 neighborhoods, including temporal and location information.

2. **Census 2021 Dataset**: Contains demographic and socioeconomic data for Toronto, organized by Forward Sortation Areas (FSAs).

## Analytical Challenge

The key challenge in this analysis is that the two datasets use different geographic units:

- Auto theft data is organized by Toronto's 158 official neighborhoods
- Census data is organized by Forward Sortation Areas (FSAs)

To address this, we'll use areal-weighted interpolation through the `overlap_df` mapping, which quantifies the percentage overlap between each neighborhood and FSA. This allows us to distribute census data from FSAs to neighborhoods proportionally based on spatial overlap.

## Analysis Goals

1. Join the auto theft and census datasets using the spatial relationship mapping
2. Analyze patterns between socioeconomic factors and auto theft rates
3. Identify key insights about which factors may be associated with higher/lower theft rates
4. Visualize the spatial distribution of findings across Toronto

This approach enables us to uncover potential relationships between community characteristics and auto theft incidents.


In [None]:
# Load processed data
overlap_df = pd.read_parquet("../data/01_processed/toronto_hoods_fsa_overlap.parquet")
census_df = pd.read_parquet("../data/01_processed/census_2021_processed.parquet")
theft_df = pd.read_parquet("../data/01_processed/auto_theft_processed.parquet")

In [None]:
display(overlap_df.head())
display(census_df.head())
display(theft_df.head())

In [None]:
# Examine the structure of our key datasets
# First, let's look at the overlap_df which is crucial for joining the census and theft datasets
print("\nOverlap DataFrame Structure:")
print(f"Shape: {overlap_df.shape}")
print("\nColumns:")
for col in overlap_df.columns:
    print(f"- {col}")

print("\nFirst few rows of overlap_df:")
display(overlap_df.head())

In [None]:
# Now let's examine the census dataset
print("\nCensus DataFrame Structure:")
print(f"Shape: {census_df.shape}")
print("\nSelected Columns (first 10):")
for col in census_df.columns[:10]:
    print(f"- {col}")

print("\nExample record:")
display(census_df.iloc[0:1])

In [None]:
# Finally, let's examine the auto theft dataset
print("\nAuto Theft DataFrame Structure:")
print(f"Shape: {theft_df.shape}")
print("\nSelected Columns (first 10):")
for col in theft_df.columns[:10]:
    print(f"- {col}")

print("\nColumns related to location/neighborhood:")
for col in theft_df.columns:
    if "AREA" in col or "NEIGHBOURHOOD" in col or "DIVISION" in col:
        print(f"- {col}")

print("\nExample record:")
display(theft_df.iloc[0:1])

## Step 1: Aggregate Auto Theft Data by Neighborhood

First, we need to aggregate the auto theft data by neighborhood to understand the frequency of incidents across Toronto's geography. We'll create metrics such as:

- Total number of thefts by neighborhood
- Average number of thefts per year (so neighborhoods are compared on an annual basis)
- Seasonal patterns of theft in each neighborhood
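Averaging yearly thefts needs some care. Grouping by neighborhood and year only creates rows for combinations that actually occur, so a year in which a neighborhood had zero thefts is skipped rather than counted as 0. This inflates the average for low-theft neighborhoods. The sketch below shows the difference on a small hypothetical dataset. The column names `HOOD_158` and `OCC_YEAR` mirror the theft data, and the values are invented.

```python
import pandas as pd

# Hypothetical records: neighborhood "A" has thefts in 2021 and 2023 but none in 2022
toy = pd.DataFrame(
    {
        "HOOD_158": ["A", "A", "A", "B", "B", "B"],
        "OCC_YEAR": [2021, 2021, 2023, 2021, 2022, 2023],
    }
)

# Naive approach: groupby only yields (hood, year) pairs that occur,
# so A's empty year 2022 is silently left out of the mean
naive_avg = (
    toy.groupby(["HOOD_158", "OCC_YEAR"]).size().groupby(level="HOOD_158").mean()
)

# Zero-filled approach: crosstab builds the full hood x year grid with 0 for missing pairs
yearly_counts = pd.crosstab(toy["HOOD_158"], toy["OCC_YEAR"])
zero_filled_avg = yearly_counts.mean(axis=1)

print(naive_avg.to_dict())
print(zero_filled_avg.to_dict())
```

`pd.crosstab` only creates columns for years that appear somewhere in the data. A year with no thefts anywhere in the city would therefore still be missing, and reindexing the columns over the intended year range would cover that case.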


In [None]:
# Count auto thefts by neighborhood
theft_by_hood = theft_df.groupby("HOOD_158").size().reset_index(name="total_thefts")

# Count thefts per neighborhood per year. pd.crosstab builds the full
# neighborhood x year grid and fills missing combinations with 0, so years in
# which a neighborhood recorded no thefts are included in its average
theft_by_hood_year = pd.crosstab(theft_df["HOOD_158"], theft_df["OCC_YEAR"])

# Calculate average yearly thefts per neighborhood
avg_yearly_thefts = theft_by_hood_year.mean(axis=1).reset_index(
    name="avg_yearly_thefts"
)

# Merge total and average yearly thefts
theft_by_hood = theft_by_hood.merge(avg_yearly_thefts, on="HOOD_158")

# Let's also examine seasonal patterns
seasonal_thefts = (
    theft_df.groupby(["HOOD_158", "SEASON"]).size().reset_index(name="seasonal_thefts")
)

# Calculate the percentage of thefts by season for each neighborhood
seasonal_pct = seasonal_thefts.merge(
    theft_by_hood[["HOOD_158", "total_thefts"]], on="HOOD_158"
)
seasonal_pct["pct_of_hood_thefts"] = (
    seasonal_pct["seasonal_thefts"] / seasonal_pct["total_thefts"]
) * 100

# Display the results
print("\nTop 10 Neighborhoods by Total Auto Thefts:")
display(theft_by_hood.sort_values("total_thefts", ascending=False).head(10))

# Pivot seasonal data for easier analysis
seasonal_pivot = seasonal_pct.pivot_table(
    index="HOOD_158", columns="SEASON", values="pct_of_hood_thefts", fill_value=0
)

print("\nSeasonal Patterns in Top 5 Neighborhoods:")
display(
    seasonal_pivot.loc[
        theft_by_hood.sort_values("total_thefts", ascending=False).head(5)["HOOD_158"]
    ]
)

## Step 2: Prepare Census Data

Next, we prepare the census dataset for integration with our theft data. The census data contains numerous socioeconomic indicators. For this analysis, we'll focus on a subset of indicators that might be associated with auto theft rates:

- Income measures
- Housing characteristics (dwelling values, shelter costs, tenure)
- Education levels
- Employment statistics
- Commuting mode (car vs. public transit)
- Immigration status

We'll extract these indicators and reshape them to one row per FSA before distributing them to neighborhoods.
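Before pivoting, it is worth checking that each (FSA, characteristic) pair appears only once. Census profile tables can repeat a characteristic name under different parent totals. Education attainment, for instance, may be reported for more than one age group. In that case `pivot_table` would silently average the repeated rows. The sketch below uses a hypothetical long-format table shaped like `census_df` to show the check and its effect. The values are invented, and the `keep="first"` choice is an assumption about row order, not a confirmed property of the real file.

```python
import pandas as pd

# Hypothetical long-format census rows shaped like census_df. For M1A, the name
# "No certificate, diploma or degree" appears twice, as if reported under two
# different parent totals.
long_df = pd.DataFrame(
    {
        "ALT_GEO_CODE": ["M1A", "M1A", "M1A", "M1B", "M1B"],
        "CHARACTERISTIC_NAME": [
            "Average total income in 2020 ($)",
            "No certificate, diploma or degree",
            "No certificate, diploma or degree",
            "Average total income in 2020 ($)",
            "No certificate, diploma or degree",
        ],
        "C1_COUNT_TOTAL": [50000.0, 400.0, 200.0, 60000.0, 300.0],
    }
)

# Flag every (FSA, characteristic) pair that occurs more than once
key = ["ALT_GEO_CODE", "CHARACTERISTIC_NAME"]
dupes = long_df[long_df.duplicated(subset=key, keep=False)]
print(dupes)

# pivot_table averages duplicates by default: M1A ends up with (400 + 200) / 2
averaged = long_df.pivot_table(
    index="ALT_GEO_CODE", columns="CHARACTERISTIC_NAME", values="C1_COUNT_TOTAL"
)

# One option: keep the first occurrence, then use pivot, which raises on duplicates
# instead of silently aggregating (assumes the wanted row is listed first)
first_only = long_df.drop_duplicates(subset=key, keep="first").pivot(
    index="ALT_GEO_CODE", columns="CHARACTERISTIC_NAME", values="C1_COUNT_TOTAL"
)
print(averaged)
print(first_only)
```

Filtering on a parent or hierarchy column, when the data provides one, is a more reliable way to pick the intended row than relying on order.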


In [None]:
# Get a list of characteristics in the census data
print("Census dataset characteristics:")
unique_chars = census_df["CHARACTERISTIC_NAME"][
    census_df["CHARACTERISTIC_LEVEL"] == 1
].unique()
print(f"Total unique characteristics: {len(unique_chars)}")

# Let's see some examples of characteristics
print("\nSample of available census characteristics:")
display(pd.Series(unique_chars).sample(10))

In [None]:
# Count all unique characteristic names, across every hierarchy level
print("All census characteristics (every level):")
unique_chars = census_df["CHARACTERISTIC_NAME"].unique()
print(f"Total unique characteristics: {len(unique_chars)}")

In [None]:
census_df[census_df["ALT_GEO_CODE"] == "M9W"].head()

In [None]:
# Based on the characteristics, we'll select some relevant socioeconomic indicators
# Let's focus on income, housing, education, and demographic factors

# Define characteristics of interest
income_chars = [
    "Average total income of household in 2020 ($)",
    "Average total income of economic family in 2020 ($)",
    "Average total income in 2020 ($)",
]

employment_chars = [
    "Employment rate",
    "Unemployment rate",
    "Participation rate",
]

transportation_chars = [
    "Car, truck or van",
    "Public transit",
]

housing_chars = [
    "Average value of dwellings ($)",
    "Average monthly shelter costs for rented dwellings ($)",
    "Average monthly shelter costs for owned dwellings ($)",
    "Renter",
    "Owner",
]

immigration_chars = [
    "Total - Generation status for the population in private households - 25% sample data",
    "Immigrants",
    "Non-immigrants",
    "Non-permanent residents",
]

education_chars = [
    "No certificate, diploma or degree",
    "Postsecondary certificate, diploma or degree",
    "High (secondary) school diploma or equivalency certificate",
]

# Combine all characteristics of interest
chars_of_interest = (
    income_chars
    + housing_chars
    + education_chars
    + employment_chars
    + transportation_chars
    + immigration_chars
)

# Filter the census data to only include our characteristics of interest
selected_census = census_df[
    census_df["CHARACTERISTIC_NAME"].isin(chars_of_interest)
].copy()

# Check what we have
print("\nSelected census data characteristics:")
for char in chars_of_interest:
    if char in selected_census["CHARACTERISTIC_NAME"].values:
        print(f"- {char}: Found")
    else:
        print(f"- {char}: Not found")

In [None]:
column_name_mapping = {
    "Average total income of household in 2020 ($)": "household_income",
    "Average total income of economic family in 2020 ($)": "family_income",
    "Average total income in 2020 ($)": "individual_income",
    "Employment rate": "employment_rate",
    "Unemployment rate": "unemployment_rate",
    "Participation rate": "labor_participation",
    "Car, truck or van": "commute_by_car",
    "Public transit": "commute_by_transit",
    "Average value of dwellings ($)": "avg_dwelling_value",
    "Average monthly shelter costs for rented dwellings ($)": "avg_rent_cost",
    "Average monthly shelter costs for owned dwellings ($)": "avg_mortgage_cost",
    "Renter": "renter_count",
    "Owner": "owner_count",
    "Total - Generation status for the population in private households - 25% sample data": "total_population",  # noqa: E501
    "Immigrants": "immigrant_count",
    "Non-immigrants": "non_immigrant_count",
    "Non-permanent residents": "non_permanent_resident_count",
    "No certificate, diploma or degree": "no_degree",
    "Postsecondary certificate, diploma or degree": "postsecondary_education",
    "High (secondary) school diploma or equivalency certificate": "high_school_diploma",
}

In [None]:
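# Reshape the data for analysis: pivot to get FSAs as rows and characteristics as
# columns, using the C1_COUNT_TOTAL column, which contains the actual values.
# pivot_table averages any duplicate (FSA, characteristic) rows (default
# aggfunc="mean"), so report how many such rows exist before pivoting.
n_duplicate_rows = selected_census.duplicated(
    subset=["ALT_GEO_CODE", "CHARACTERISTIC_NAME"]
).sum()
print(f"Duplicate (FSA, characteristic) rows that will be averaged: {n_duplicate_rows}")
census_pivot = selected_census.pivot_table(
    index="ALT_GEO_CODE", columns="CHARACTERISTIC_NAME", values="C1_COUNT_TOTAL"
)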

# Rename columns using our defined mapping; any column not in the mapping falls back
# to an automatically cleaned, lower-case name
def clean_column_name(name):
    return (
        name.replace("(", "")
        .replace(")", "")
        .replace(",", "")
        .replace(" ", "_")
        .replace("-", "_")
        .replace("%", "pct")
        .replace("'", "")
        .replace("$", "dollars")
        .lower()
    )


census_pivot = census_pivot.rename(
    columns={
        col: column_name_mapping.get(col, clean_column_name(col))
        for col in census_pivot.columns
    }
)


# Display the pivoted data
print("\nPivoted census data by FSA:")
display(census_pivot)

# Reset index so the FSA code becomes a regular ALT_GEO_CODE column, which serves as
# the right-hand join key when merging with overlap_df below
census_pivot = census_pivot.reset_index()

# Now we have the census data ready to be joined with our auto theft data using FSAs

## Step 3: Implement Areal-Weighted Interpolation

Now we'll use the spatial relationship data in `overlap_df` to distribute the census data from FSAs to neighborhoods. This process, called areal-weighted interpolation, assumes that population characteristics are uniformly distributed within each FSA.

For each neighborhood-FSA pair, we'll:

1. Calculate the contribution of each FSA to each neighborhood based on area overlap
2. Apply those weights to distribute census variables from FSAs to neighborhoods
3. Aggregate the distributed values to get estimated census characteristics for each neighborhood

This allows us to transform FSA-level census data into neighborhood-level data that can be joined with our auto theft dataset.
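Areal-weighted interpolation treats counts and averages differently. A count (an extensive variable, such as population) should be split according to the share of the FSA's area that falls in each neighborhood, which preserves the FSA total. An average or rate (an intensive variable, such as average income) should instead be blended according to the share of the neighborhood's area that each FSA covers, so the weights for each neighborhood sum to 1. The sketch below computes both weights from a hypothetical table of intersection areas. It makes no assumption about how `overlap_percent` in `overlap_df` was defined, and all names and numbers are illustrative.

```python
import pandas as pd

# Hypothetical intersection pieces: one row per (FSA, neighborhood) overlap with its
# area in km^2. Column names and values are illustrative, not taken from overlap_df.
pieces = pd.DataFrame(
    {
        "CFSAUID": ["M1A", "M1A", "M1B"],
        "HOOD_158": ["001", "002", "002"],
        "piece_area": [3.0, 1.0, 2.0],
    }
)

# Hypothetical FSA-level census values: one count and one average
fsa = pd.DataFrame(
    {
        "CFSAUID": ["M1A", "M1B"],
        "population": [8000.0, 6000.0],  # extensive: should be split
        "avg_income": [50000.0, 80000.0],  # intensive: should be blended
    }
)

area_by_fsa = pieces.groupby("CFSAUID")["piece_area"].transform("sum")
area_by_hood = pieces.groupby("HOOD_158")["piece_area"].transform("sum")
# Weight for counts: share of the FSA's area inside the neighborhood (sums to 1 per FSA)
pieces["w_count"] = pieces["piece_area"] / area_by_fsa
# Weight for averages: share of the neighborhood covered by the FSA (sums to 1 per hood)
pieces["w_avg"] = pieces["piece_area"] / area_by_hood

df = pieces.merge(fsa, on="CFSAUID", how="left")
df["population"] = df["population"] * df["w_count"]
df["avg_income"] = df["avg_income"] * df["w_avg"]

# Summing the weighted pieces gives neighborhood-level estimates
hood = df.groupby("HOOD_158")[["population", "avg_income"]].sum()
print(hood)
```

The population total of 14,000 is preserved across neighborhoods, while neighborhood 002's average income is a 1/3 : 2/3 blend of its two FSAs. Area-weighting an average is itself an approximation. Weighting by the population in each intersection piece would be more faithful when that information is available.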


In [None]:
# First, let's rename the neighborhood code column in overlap_df to match our theft data
overlap_df = overlap_df.rename(columns={"AREA_LONG_CODE": "HOOD_158"})

# Convert hood codes to same format as in theft_df
overlap_df["HOOD_158"] = overlap_df["HOOD_158"].astype(str).str.zfill(3)

# Now merge the census data with overlap data
fsa_hood_census = pd.merge(
    overlap_df, census_pivot, how="left", left_on="CFSAUID", right_on="ALT_GEO_CODE"
)

# Check the merged data
print(f"Shape of merged FSA-hood-census data: {fsa_hood_census.shape}")
display(fsa_hood_census)

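# Compute weighted census values by multiplying each census value by overlap_percent.
# Caveat: this applies one weight to every column. For counts (e.g. total_population),
# the weight should be the share of the FSA's area that lies in the neighborhood; for
# averages and rates (e.g. household_income, employment_rate), it should be the share
# of the neighborhood's area covered by the FSA. Confirm which share overlap_percent
# represents, and whether it is on a 0-1 or 0-100 scale, before interpreting results.

# Take the census columns from census_pivot itself, so that any other columns carried
# over from overlap_df (names, areas, geometry) are never multiplied by the weights
census_cols = [col for col in census_pivot.columns if col != "ALT_GEO_CODE"]

# Apply the weighting by multiplying each census value by overlap_percent
for col in census_cols:
    fsa_hood_census[f"weighted_{col}"] = (
        fsa_hood_census[col] * fsa_hood_census["overlap_percent"]
    )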

# Now aggregate the weighted values by neighborhood
weighted_cols = [f"weighted_{col}" for col in census_cols]
hood_census = fsa_hood_census.groupby("HOOD_158")[weighted_cols].sum().reset_index()

# Rename columns back to the original census column names
hood_census = hood_census.rename(columns=dict(zip(weighted_cols, census_cols)))

# Now we have the census data aggregated at the neighborhood level
print("\nCensus data aggregated to neighborhood level:")
display(hood_census)

# Display the number of neighborhoods with census data
print(f"Number of neighborhoods with interpolated census data: {len(hood_census)}")
print(f"Number of neighborhoods in theft data: {len(theft_by_hood)}")

# Check for any neighborhoods in theft data that aren't in the census data
hoods_in_theft = set(theft_by_hood["HOOD_158"])
hoods_in_census = set(hood_census["HOOD_158"])
missing_hoods = hoods_in_theft - hoods_in_census
print(
    f"Number of neighborhoods in theft data but not in census data: {len(missing_hoods)}"
)
if len(missing_hoods) > 0:
    print("Missing neighborhood codes:")
    print(sorted(missing_hoods))

## Step 4: Join Auto Theft and Census Data

Now that we have both datasets at the neighborhood level, we can join them to analyze the relationship between socioeconomic factors and auto theft incidents. We'll merge the aggregated theft data with the interpolated census data based on neighborhood codes.

This combined dataset will allow us to examine correlations and potential relationships between community characteristics and theft rates.
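The inner join in the next cell drops any neighborhood that is missing from either table, and it does so without reporting it. Below is a minimal sketch of a more defensive join, using hypothetical stand-in tables. An outer merge with `indicator=True` labels each row as `both`, `left_only`, or `right_only`, and `validate="one_to_one"` makes pandas raise an error if either side has duplicate keys. The rate calculation also maps zero populations to NaN so they cannot produce infinite rates.

```python
import pandas as pd

# Hypothetical stand-ins for theft_by_hood and hood_census
thefts = pd.DataFrame(
    {"HOOD_158": ["001", "002", "003"], "avg_yearly_thefts": [12.0, 30.0, 5.0]}
)
census = pd.DataFrame(
    {"HOOD_158": ["001", "002", "004"], "total_population": [6000.0, 0.0, 9000.0]}
)

# Outer merge with an indicator column shows which keys an inner join would drop;
# validate="one_to_one" raises pandas.errors.MergeError on duplicate keys
check = pd.merge(
    thefts, census, on="HOOD_158", how="outer", indicator=True, validate="one_to_one"
)
print(check["_merge"].value_counts())

# Keep matched neighborhoods and compute average annual thefts per 1,000 residents,
# mapping zero populations to NaN instead of letting them produce inf
both = check[check["_merge"] == "both"].copy()
population = both["total_population"].where(both["total_population"] > 0)
both["theft_rate_per_1000"] = both["avg_yearly_thefts"] / population * 1000
print(both[["HOOD_158", "theft_rate_per_1000"]])
```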


In [None]:
# Join the theft data with the census data by neighborhood
combined_df = pd.merge(theft_by_hood, hood_census, on="HOOD_158", how="inner")

# Check the combined dataset
print(f"Combined dataset shape: {combined_df.shape}")
print(f"Number of neighborhoods in combined data: {len(combined_df)}")
print("\nFirst few rows of combined data:")
display(combined_df.head())

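# Add theft rate (average annual thefts per 1,000 residents) to normalize by population;
# neighborhoods with zero interpolated population get NaN rather than inf
population = combined_df["total_population"].where(combined_df["total_population"] > 0)
combined_df["theft_rate_per_1000"] = (combined_df["avg_yearly_thefts"] / population) * 1000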

# Sort by theft rate to see neighborhoods with highest rates
top_theft_rates = combined_df.sort_values("theft_rate_per_1000", ascending=False)[
    [
        "HOOD_158",
        "theft_rate_per_1000",
        "total_thefts",
        "avg_yearly_thefts",
        "total_population",
    ]
].head(10)

print("\nNeighborhoods with highest auto theft rates per 1000 people:")
display(top_theft_rates)

## Step 5: Correlation Analysis

Next, we use correlation analysis to measure how closely auto theft rates track various socioeconomic factors. This highlights which factors are most strongly associated with auto theft incidents in Toronto neighborhoods.

We'll examine correlations with:

- Income levels
- Housing values, shelter costs, and tenure
- Education levels
- Employment, commuting, and immigration characteristics

Correlation measures association, not causation, so this analysis identifies candidate socioeconomic factors behind auto theft patterns across Toronto rather than establishing drivers.
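Several census variables in this analysis are raw counts (for example `immigrant_count` or `no_degree`), while `theft_rate_per_1000` is already normalized by population. A count can correlate with the rate simply because both depend on neighborhood size. The hypothetical example below makes that explicit: every neighborhood is 40% immigrant, yet the immigrant count still correlates strongly with the theft rate. Two common safeguards are dividing person counts by `total_population` before correlating, and checking a rank-based (Spearman) correlation alongside Pearson. Household counts such as `renter_count` would likely need a household total as the denominator instead. The data here is invented for illustration.

```python
import pandas as pd

# Hypothetical neighborhoods where exactly 40% of residents are immigrants, so the
# immigrant count differs between neighborhoods only because population size differs
toy = pd.DataFrame(
    {
        "total_population": [1000.0, 2000.0, 4000.0, 8000.0],
        "immigrant_count": [400.0, 800.0, 1600.0, 3200.0],
        "theft_rate_per_1000": [4.0, 3.0, 2.0, 1.0],
    }
)

# Raw count vs. rate: strongly negative, purely as a side effect of population size
count_r = toy["immigrant_count"].corr(toy["theft_rate_per_1000"])

# The count is perfectly correlated with population, i.e. it mostly measures size
size_r = toy["immigrant_count"].corr(toy["total_population"])

# As a share of population the variable does not vary at all, so there is no
# compositional signal to correlate with
toy["immigrant_share"] = toy["immigrant_count"] / toy["total_population"]

# Spearman correlation = Pearson correlation of the ranks; computing it from ranks
# keeps the example to pandas only
spearman_r = toy["immigrant_count"].rank().corr(toy["theft_rate_per_1000"].rank())

print(f"Pearson r(count, rate) = {count_r:.2f}")
print(f"Pearson r(count, population) = {size_r:.2f}")
print(f"Distinct immigrant shares: {toy['immigrant_share'].nunique()}")
print(f"Spearman r(count, rate) = {spearman_r:.2f}")
```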


In [None]:
# Calculate correlations between theft metrics and census variables
# Select relevant columns for correlation analysis
theft_metrics = ["total_thefts", "theft_rate_per_1000"]

# Select relevant census variables based on column name mapping
income_vars = [
    "household_income",
    "family_income",
    "individual_income",
]

housing_vars = [
    "avg_dwelling_value",
    "avg_rent_cost",
    "avg_mortgage_cost",
    "renter_count",
    "owner_count",
]

education_vars = [
    "no_degree",
    "postsecondary_education",
    "high_school_diploma",
]

employment_vars = [
    "employment_rate",
    "unemployment_rate",
    "labor_participation",
]

transportation_vars = [
    "commute_by_car",
    "commute_by_transit",
]

demographic_vars = [
    "total_population",
    "immigrant_count",
    "non_immigrant_count",
    "non_permanent_resident_count",
]

# Combine all variables for correlation analysis
corr_vars = (
    income_vars
    + housing_vars
    + education_vars
    + employment_vars
    + transportation_vars
    + demographic_vars
)

# Select columns that exist in the dataframe
existing_corr_vars = [var for var in corr_vars if var in combined_df.columns]

# Calculate correlations between theft metrics and census variables
correlation_df = combined_df[theft_metrics + existing_corr_vars].corr()

# Focus on correlations with theft metrics
theft_correlations = correlation_df.loc[existing_corr_vars, theft_metrics]

# Sort by absolute correlation with theft_rate_per_1000
theft_correlations = theft_correlations.sort_values(
    by="theft_rate_per_1000", key=abs, ascending=False
)

print("\nCorrelations between census variables and theft metrics:")
display(theft_correlations)

# Visualize top correlations with theft rate
plt.figure(figsize=(10, 8))
sns.heatmap(
    theft_correlations.head(10), cmap="coolwarm", annot=True, fmt=".2f", center=0
)
plt.title("Top Socioeconomic Correlations with Auto Theft Metrics", fontsize=14)
plt.tight_layout()
plt.show()

# Create scatter plots for top correlating variables with theft rate
top_correlating_vars = theft_correlations.index[:5]  # Top 5 correlating variables

fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

for i, var in enumerate(top_correlating_vars):
    if i < len(axes):
        sns.scatterplot(x=var, y="theft_rate_per_1000", data=combined_df, ax=axes[i])
        axes[i].set_title(f"Theft Rate vs {var}", fontsize=12)
        axes[i].set_xlabel(var, fontsize=10)
        axes[i].set_ylabel("Theft Rate per 1000 People", fontsize=10)

        # Add regression line
        sns.regplot(
            x=var,
            y="theft_rate_per_1000",
            data=combined_df,
            scatter=False,
            ax=axes[i],
            color="red",
        )

# Handle any unused subplot
for i in range(len(top_correlating_vars), len(axes)):
    axes[i].axis("off")

plt.tight_layout()
plt.show()

## Step 6: Key Insights and Conclusions

Based on our analysis of auto theft patterns in Toronto neighborhoods and their relationship with socioeconomic factors, we can draw several preliminary insights:

1. **Spatial Distribution**: Auto theft is not evenly distributed across Toronto. A subset of neighborhoods has markedly higher theft rates per 1,000 residents. This analysis does not resolve whether that reflects socioeconomic factors, environmental conditions, or both.

2. **Socioeconomic Correlations**: Several socioeconomic variables show non-trivial correlations with auto theft rates (see the correlation table in Step 5). These are bivariate associations computed on interpolated data. They may reflect opportunity, neighborhood characteristics, or confounding by population size rather than direct effects.

3. **Seasonal Patterns**: The share of thefts falling in each season varies across neighborhoods. This may relate to environmental factors, population mobility, or seasonal changes in opportunity.

These insights show the value of combining crime data with census information through spatial interpolation. The integrated dataset gives a fuller picture of auto theft patterns in Toronto and could help inform targeted prevention strategies.
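The notebook does not test whether any of these correlations could plausibly arise by chance. One simple option is a permutation test. It shuffles the theft rates across neighborhoods many times and counts how often the shuffled correlation is at least as large as the observed one. The sketch below implements this with NumPy on synthetic data. The function name `permutation_pvalue` is hypothetical and not part of any library.

```python
import numpy as np


def permutation_pvalue(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation between x and y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real link to x while keeping both distributions intact
        shuffled_r = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(shuffled_r) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0
    return observed, (extreme + 1) / (n_permutations + 1)


# Synthetic example: 158 "neighborhoods", one related variable and one pure-noise variable
rng = np.random.default_rng(42)
x = rng.normal(size=158)
y_related = 0.8 * x + rng.normal(scale=0.5, size=158)
y_noise = rng.normal(size=158)

r_related, p_related = permutation_pvalue(x, y_related)
r_noise, p_noise = permutation_pvalue(x, y_noise)
print(f"related: r={r_related:.2f}, p={p_related:.4f}")
print(f"noise:   r={r_noise:.2f}, p={p_noise:.4f}")
```

Two caveats apply to the real data. First, neighboring areas are not independent (spatial autocorrelation), which tends to make these p-values too optimistic. Second, testing about twenty census variables at once calls for a multiple-comparison correction.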


In [None]:
# Additional insights from our analysis

# Insight 1: Identify theft hotspots, defined here as neighborhoods whose theft rate
# is more than one standard deviation above the mean
mean_theft_rate = combined_df["theft_rate_per_1000"].mean()
std_theft_rate = combined_df["theft_rate_per_1000"].std()
hotspot_threshold = mean_theft_rate + std_theft_rate

hotspots = combined_df[
    combined_df["theft_rate_per_1000"] > hotspot_threshold
].sort_values("theft_rate_per_1000", ascending=False)

print(f"Mean theft rate per 1000 people: {mean_theft_rate:.2f}")
print(f"Standard deviation: {std_theft_rate:.2f}")
print(f"Hotspot threshold (mean + 1 std): {hotspot_threshold:.2f}")
print(f"\nIdentified {len(hotspots)} neighborhood hotspots for auto theft:")
display(
    hotspots[
        [
            "HOOD_158",
            "theft_rate_per_1000",
            "total_thefts",
            "avg_yearly_thefts",
            "total_population",
            "household_income",  # Using mapped column names
            "avg_dwelling_value",
        ]
    ].head(10)
)

# Insight 2: Compare high vs. low theft rate neighborhoods in terms of key socioeconomic factors
# Define high and low theft rate neighborhoods (top 25% and bottom 25%);
# .copy() makes these independent DataFrames so new columns can be added safely below
high_theft = combined_df[
    combined_df["theft_rate_per_1000"]
    > combined_df["theft_rate_per_1000"].quantile(0.75)
].copy()
low_theft = combined_df[
    combined_df["theft_rate_per_1000"]
    < combined_df["theft_rate_per_1000"].quantile(0.25)
].copy()

# Select key socioeconomic variables to compare based on our mapping
key_vars = [
    "household_income",
    "individual_income",
    "avg_dwelling_value",
    "employment_rate",
    "unemployment_rate",
    "commute_by_car",
    "commute_by_transit",
    "immigrant_count",
    "postsecondary_education",
]

# Filter to variables that exist in the dataframe
existing_key_vars = [var for var in key_vars if var in combined_df.columns]

# Create comparison dataframe
comparison = pd.DataFrame(
    {
        "High Theft Areas": high_theft[existing_key_vars].mean(),
        "Low Theft Areas": low_theft[existing_key_vars].mean(),
    }
)
comparison["Percent Difference"] = (
    (comparison["High Theft Areas"] - comparison["Low Theft Areas"])
    / comparison["Low Theft Areas"]
    * 100
)

print("\nInsight 2: Comparison of high vs. low theft rate neighborhoods:")
display(comparison)

# Insight 3: Analyze transportation patterns in high theft neighborhoods
# Look at commute methods in high vs. low theft areas
if (
    "commute_by_car" in combined_df.columns
    and "commute_by_transit" in combined_df.columns
):
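    # Ratio of car commuters to transit commuters; zero transit counts become NaN
    # so they cannot produce infinite ratios
    high_transit = high_theft["commute_by_transit"]
    low_transit = low_theft["commute_by_transit"]
    high_theft["car_transit_ratio"] = high_theft["commute_by_car"] / high_transit.where(
        high_transit > 0
    )
    low_theft["car_transit_ratio"] = low_theft["commute_by_car"] / low_transit.where(
        low_transit > 0
    )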

    print("\nInsight 3: Transportation patterns in high vs. low theft areas:")
    print(
        f"Average car to transit ratio in high theft areas: {high_theft['car_transit_ratio'].mean():.2f}"
    )
    print(
        f"Average car to transit ratio in low theft areas: {low_theft['car_transit_ratio'].mean():.2f}"
    )

    # Top 5 high-theft neighborhoods by number of car commuters (a raw count, so it
    # also grows with neighborhood population)
    car_rich_theft_areas = high_theft.sort_values(
        "commute_by_car", ascending=False
    ).head(5)
    print("\nTop 5 high-theft neighborhoods with the most car commuters:")
    display(
        car_rich_theft_areas[
            ["HOOD_158", "theft_rate_per_1000", "commute_by_car", "commute_by_transit"]
        ]
    )

# Insight 4: Education and employment analysis
# Examine relationship between education levels and theft rates
education_employment_vars = [
    "postsecondary_education",
    "high_school_diploma",
    "no_degree",
    "employment_rate",
    "unemployment_rate",
    "labor_participation",
]

# Filter to variables that exist in the dataframe
existing_edu_emp_vars = [
    var for var in education_employment_vars if var in combined_df.columns
]

if existing_edu_emp_vars:
    # Calculate correlations with theft rate
    edu_emp_corr = (
        combined_df[["theft_rate_per_1000"] + existing_edu_emp_vars].corr().iloc[1:, 0]
    )

    print("\nInsight 4: Education and employment correlations with theft rate:")
    display(edu_emp_corr.sort_values(ascending=False))

    # Plot the factor with the strongest (absolute) correlation with theft rate
    top_factor = edu_emp_corr.abs().sort_values(ascending=False).index[0]
    plt.figure(figsize=(10, 6))
    sns.regplot(x=top_factor, y="theft_rate_per_1000", data=combined_df)
    plt.title(f"Relationship between {top_factor} and Auto Theft Rate")
    plt.xlabel(top_factor)
    plt.ylabel("Auto Theft Rate per 1000 People")
    plt.tight_layout()
    plt.show()

## Revised Key Findings and Implications

Our extended analysis of Toronto auto theft patterns points to several insights:

1. **Income and Property Value Relationships**: Household income and property values show notable correlations with auto theft rates. The data suggests that neighborhoods with different income profiles experience different theft patterns, which has implications for targeted prevention strategies. Because no significance tests were run, these associations should be treated as provisional.

2. **Transportation Mode Impact**: Commuting patterns (car vs. public transit) differ between high- and low-theft neighborhoods. Areas with more car commuters show different theft profiles. This is consistent with vehicle availability being a factor in theft opportunity, although commuting mode is only a proxy for the number of vehicles in a neighborhood.

3. **Education-Employment Connection**: Educational attainment and employment metrics show some association with theft incidence, suggesting that socioeconomic stability may be related to neighborhood security profiles.

4. **Immigration and Demographic Patterns**: The relationship between immigrant population counts and theft rates offers a view of how neighborhood demographics correlate with crime patterns. Because these variables are raw counts rather than concentrations, part of any association may reflect neighborhood population size.

These findings point to the multifaceted nature of auto theft in Toronto, where income, transportation choices, employment stability, and demographic composition interact in complex ways. This understanding can support more nuanced approaches to crime prevention that consider each neighborhood's profile rather than applying one-size-fits-all strategies.


### Future Considerations

While this analysis provides valuable insights into the relationship between auto theft and socioeconomic factors in Toronto neighborhoods, several avenues for future investigation remain:

1. **Improved Spatial Interpolation**: The areal-weighted interpolation method assumes uniform population distribution within FSAs. Future work could incorporate dasymetric mapping techniques using ancillary data like land use and building footprints to refine the distribution of census variables.

2. **Temporal Analysis**: Examining how the relationship between auto theft and socioeconomic factors changes over time could reveal evolving patterns and trends.

3. **Multivariate Modeling**: Developing predictive models that account for multiple socioeconomic factors simultaneously could provide more robust insights into the drivers of auto theft.

4. **Policy Recommendations**: Translating these findings into actionable recommendations for law enforcement, urban planning, and community development initiatives.
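As a starting point for the multivariate modeling idea in item 3, the sketch below fits an ordinary least squares model on standardized variables with `numpy.linalg.lstsq`. Standardizing puts the coefficients in standard-deviation units, so they are comparable across predictors. The synthetic data shows why this matters. `x1` has no direct effect on `y` but is strongly correlated with `x0`, so it has a high bivariate correlation with `y` even though its multivariate coefficient is close to zero. The function name `standardized_ols` is hypothetical. The sketch also omits the standard errors and spatial effects that a real model would need.

```python
import numpy as np


def standardized_ols(X, y):
    """Least-squares fit of y on the columns of X after z-scoring all variables.

    Returns (intercept, coefficients); coefficients are in standard-deviation units.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # Prepend a column of ones so the model includes an intercept
    design = np.column_stack([np.ones(len(yz)), Xz])
    beta, *_ = np.linalg.lstsq(design, yz, rcond=None)
    return beta[0], beta[1:]


# Synthetic data: y depends on x0 only; x1 is a noisy copy of x0 with no direct effect
rng = np.random.default_rng(0)
n = 158
x0 = rng.normal(size=n)
x1 = 0.9 * x0 + rng.normal(scale=0.3, size=n)
y = 2.0 * x0 + rng.normal(scale=0.5, size=n)

bivariate_r_x1 = np.corrcoef(x1, y)[0, 1]
intercept, coefs = standardized_ols(np.column_stack([x0, x1]), y)
print(f"bivariate r(x1, y) = {bivariate_r_x1:.2f}")
print(f"standardized coefficients: x0={coefs[0]:.2f}, x1={coefs[1]:.2f}")
```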

By addressing these considerations, future analyses could build upon this work to develop more nuanced insights and effective strategies for addressing auto theft in Toronto neighborhoods.
