In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toronto Auto Theft Data: Exploratory Data Analysis (EDA)

This notebook explores, cleans, and analyzes the Toronto auto theft dataset. We address data quality, perform feature engineering, and extract key insights about temporal and spatial patterns of auto theft in Toronto.

In [None]:
theft_data_path = "../data/00_raw/auto_theft.csv"
df_theft = pd.read_csv(theft_data_path)
print("--- Head ---")
display(df_theft.head())
print("\n--- Info ---")
df_theft.info()
print("\n--- Describe ---")
display(df_theft.describe(include="all"))

This first look already reveals a lot about the theft dataset:

- **Content & Scope**: The dataset contains 68,063 records of auto theft incidents in Toronto, described by 31 columns. Report dates span 2014 to 2024.
- **Data Granularity**: We have detailed temporal information (year, month, day, hour) for both the report date and the occurrence date. We also have specific location data, including police division, neighborhood name, and geographic coordinates.
- **Key Variables**:
  - **Temporal**: `REPORT_DATE`, `OCC_DATE`, and their component columns.
  - **Geospatial**: `DIVISION`, `NEIGHBOURHOOD_140`, `NEIGHBOURHOOD_158`, `LONG_WGS84`, `LAT_WGS84`.
  - **Categorical**: `OCC_DOW`, `OCC_MONTH`, `REPORT_DOW`, `REPORT_MONTH`, `LOCATION_TYPE` (e.g., `Parking Lot`, `Street`), `PREMISES_TYPE` (e.g., `Commercial`, `Residential`).
- **Potential Issues**:
  - **Missing Data**: There are 4 missing values in the occurrence date columns (`OCC_YEAR`, `OCC_MONTH`, etc.).
  - **Incorrect Data Types**:
    - `REPORT_DATE` and `OCC_DATE` are stored as object (text) types rather than datetimes, and `OCC_YEAR` and `OCC_DAY` are float64 instead of integers (a side effect of their missing values).
    - For memory efficiency, categorical text columns can be converted to the `category` type and the integer date components to the nullable `Int16` type.
  - **Redundancy**:
    - `OFFENCE`, `MCI_CATEGORY`, `UCR_CODE`, and `UCR_EXT` columns have only one unique value and can be removed.
    - The neighbourhood ID and name columns `HOOD_158` and `NEIGHBOURHOOD_158` follow the City of Toronto's new 158-neighbourhood structure, while `HOOD_140` and `NEIGHBOURHOOD_140` follow the old 140-neighbourhood structure. The 158 structure is more recent and more granular, so it should be prioritized.
  - **Data Errors**:
    - The minimum values for `LAT_WGS84` and `LONG_WGS84` are 0, which is not a valid coordinate for Toronto and indicates data entry errors.
    - Missing values in categorical variables such as `DIVISION`, `HOOD_158`, and `NEIGHBOURHOOD_158` are encoded as "NSA" (Not Specified), a placeholder that should be treated as missing data.
  - **Trailing Whitespace**: `REPORT_DOW` and `OCC_DOW` have trailing whitespace, which needs to be stripped before converting to categorical types.
  - **Duplicate Events**: `EVENT_UNIQUE_ID` is not entirely unique, suggesting some events might have multiple entries.
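Most of these issues can also be confirmed programmatically. The sketch below uses a hypothetical helper, `audit_raw_columns` (not part of the original notebook), on a few made-up rows; on the real data it would be called as `audit_raw_columns(df_theft)`. It assumes the coordinate columns are named `LAT_WGS84`/`LONG_WGS84` and that the placeholder is the literal string `"NSA"`.

```python
import pandas as pd


def audit_raw_columns(df: pd.DataFrame) -> dict:
    """Collect quick data-quality signals for a raw auto theft frame (hypothetical helper)."""
    # Text-like columns (object dtype holding strings, or a pandas string dtype)
    text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

    # Columns with a single distinct value carry no information
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) == 1]

    # Text columns where at least one value has leading or trailing whitespace
    padded_cols = []
    for c in text_cols:
        values = df[c].dropna().astype(str)
        if (values != values.str.strip()).any():
            padded_cols.append(c)

    # Rows whose coordinates are exactly 0, which cannot lie in Toronto
    zero_coords = int(((df["LAT_WGS84"] == 0) | (df["LONG_WGS84"] == 0)).sum())

    # Occurrences of the "NSA" placeholder per text column
    nsa_counts = {
        c: int((df[c] == "NSA").sum()) for c in text_cols if (df[c] == "NSA").any()
    }

    return {
        "constant_columns": constant_cols,
        "padded_text_columns": padded_cols,
        "zero_coordinate_rows": zero_coords,
        "nsa_counts": nsa_counts,
        "duplicate_event_ids": int(df["EVENT_UNIQUE_ID"].duplicated().sum()),
    }


# Made-up rows imitating the problems listed above
toy = pd.DataFrame(
    {
        "EVENT_UNIQUE_ID": ["GO-1", "GO-1", "GO-2"],
        "OFFENCE": ["Theft Of Motor Vehicle"] * 3,
        "OCC_DOW": ["Monday    ", "Monday    ", "Friday    "],
        "DIVISION": ["D23", "D23", "NSA"],
        "LAT_WGS84": [43.7, 43.7, 0.0],
        "LONG_WGS84": [-79.5, -79.5, 0.0],
    }
)
report = audit_raw_columns(toy)
print(report)
```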


In [None]:
# Data type and cleaning setup
date_columns = ["REPORT_DATE", "OCC_DATE"]
column_dtypes = {
    "EVENT_UNIQUE_ID": "object",
    "REPORT_YEAR": "Int16",
    "REPORT_MONTH": "category",
    "REPORT_DAY": "Int16",
    "REPORT_DOY": "Int16",
    "REPORT_HOUR": "Int16",
    "OCC_YEAR": "Int16",
    "OCC_MONTH": "category",
    "OCC_DAY": "Int16",
    "OCC_DOY": "Int16",
    "OCC_HOUR": "Int16",
    "DIVISION": "category",
    "LOCATION_TYPE": "category",
    "PREMISES_TYPE": "category",
    "HOOD_158": "category",
    "NEIGHBOURHOOD_158": "category",
    "LONG_WGS84": "float64",
    "LAT_WGS84": "float64",
}
columns_to_drop = [
    "OBJECTID",
    "OFFENCE",
    "MCI_CATEGORY",
    "HOOD_140",
    "NEIGHBOURHOOD_140",
    "x",
    "y",
    "UCR_CODE",
    "UCR_EXT",
]

na_values_dict = {
    "LAT_WGS84": [0, "0", "0.0"],
    "LONG_WGS84": [0, "0", "0.0"],
    "DIVISION": ["NSA"],
    "HOOD_158": ["NSA"],
    "NEIGHBOURHOOD_158": ["NSA"],
}
converter = {"REPORT_DOW": str.strip, "OCC_DOW": str.strip}

df_theft_clean = pd.read_csv(
    theft_data_path,
    parse_dates=date_columns,
    dtype=column_dtypes,
    usecols=lambda col: col not in columns_to_drop,
    na_values=na_values_dict,
    converters=converter,
)
df_theft_clean["REPORT_DOW"] = df_theft_clean["REPORT_DOW"].astype("category")
df_theft_clean["OCC_DOW"] = df_theft_clean["OCC_DOW"].astype("category")
display(df_theft_clean.head())

## Missing Data Visualization
Visualize missing data to understand patterns and plan imputation.
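With roughly 68,000 rows, a heatmap can hide small counts, so a numeric per-column summary is a useful complement. The sketch below uses a hypothetical helper, `missing_summary`, on toy data; it would be applied as `missing_summary(df_theft_clean)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values, largest first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"n_missing": counts, "pct_missing": counts / len(df) * 100})
    # Keep only the columns that actually have gaps
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


toy = pd.DataFrame(
    {
        "LAT_WGS84": [43.7, np.nan, np.nan, 43.6],
        "HOOD_158": ["001", None, "002", "002"],
        "OCC_HOUR": [1, 2, 3, 4],
    }
)
summary = missing_summary(toy)
print(summary)
```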

In [None]:
plt.figure(figsize=(12, 6))
sns.heatmap(df_theft_clean.isnull(), cbar=False, yticklabels=False)
plt.title("Missing Values Heatmap")
plt.show()

## Data Deduplication and Imputation

Let's run some basic checks to plan further cleaning. First, we check whether duplicated `EVENT_UNIQUE_ID` values are repeated entries of the same event or errors in how the IDs were created.
The timestamps in `REPORT_DATE` and `OCC_DATE` all appear to be 05:00:00, which is likely a placeholder, so we check whether any timestamp differs. Finally, we count the missing values in each column.
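The logic of the duplicate check can be illustrated on a toy frame. In the sketch below (the helper name `classify_id_duplicates` is hypothetical), exact copies disappear under `drop_duplicates()`, so any ID that still repeats afterwards must differ in at least one column.

```python
import pandas as pd


def classify_id_duplicates(df: pd.DataFrame, id_col: str = "EVENT_UNIQUE_ID"):
    """Return (number of exact duplicate rows, sorted IDs whose rows conflict)."""
    unique_rows = df.drop_duplicates()
    # IDs still repeated after removing exact copies disagree in some column
    repeated = unique_rows[id_col].duplicated(keep=False)
    conflicting = sorted(unique_rows.loc[repeated, id_col].unique())
    n_exact = len(df) - len(unique_rows)
    return n_exact, conflicting


toy = pd.DataFrame(
    {
        "EVENT_UNIQUE_ID": ["GO-1", "GO-1", "GO-2", "GO-2", "GO-3"],
        "OCC_HOUR": [3, 3, 10, 11, 20],  # GO-2 has two different hours
    }
)
n_exact, conflicting = classify_id_duplicates(toy)
print(n_exact, conflicting)
```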


In [None]:
# drop non-unique rows (all columns must match)
df_unique_rows = df_theft_clean.drop_duplicates(keep="first")
print(f"\n--- Number of unique rows: {len(df_unique_rows)} ---")

# check if the number of unique rows is different
# from the number of unique EVENT_UNIQUE_ID
if len(df_unique_rows) != df_unique_rows["EVENT_UNIQUE_ID"].nunique():
    print("Warning: EVENT_UNIQUE_ID duplicates differ in other columns.")
else:
    print("--- All duplicated EVENT_UNIQUE_ID have the same values in all columns ---")

# check if timestamps in REPORT_DATE and OCC_DATE are equal
for col in ["REPORT_DATE", "OCC_DATE"]:
    if df_theft_clean[col].dt.time.nunique() == 1:
        print(f"All {col} timestamps are the same.")

# Display the number of null values in each column
print("\n--- Number of null values in each column ---")
print(df_theft_clean.isnull().sum())

- There are 61,534 unique `EVENT_UNIQUE_ID` values, and rows that share an ID are identical in every column, so the duplicates can be dropped.
- It can be seen that the timestamps in `REPORT_DATE` and `OCC_DATE` are all 05:00:00. We will replace these with values inferred from the `REPORT_HOUR` and `OCC_HOUR` columns, which are more accurate.
- There are some missing values in geospatial columns, which we need to investigate further.
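As an aside, the row-wise `apply` used below to set the hour can also be expressed with vectorized operations. This sketch assumes the hour columns hold integers from 0 to 23 (possibly missing) and that the timestamps carry no sub-second part, as with the 05:00:00 placeholders here; `set_hour_from_column` is a hypothetical helper name.

```python
import pandas as pd


def set_hour_from_column(dates: pd.Series, hours: pd.Series) -> pd.Series:
    """Replace the placeholder time of day with the hour stored in a separate column."""
    # Missing hours become NaT offsets; those rows fall back to the original timestamp
    offset = pd.to_timedelta(hours.astype("float64"), unit="h")
    return (dates.dt.normalize() + offset).fillna(dates)


dates = pd.to_datetime(
    pd.Series(["2014-01-01 05:00:00", "2014-01-02 05:00:00", "2014-01-03 05:00:00"])
)
hours = pd.Series([13, None, 0], dtype="Int16")
fixed = set_hour_from_column(dates, hours)
print(fixed)
```

Usage would be, for example, `set_hour_from_column(df_theft_clean["REPORT_DATE"], df_theft_clean["REPORT_HOUR"])`.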


We analyze the missingness patterns in the key geospatial columns of `df_theft_clean`. Knowing which of the coordinates, neighbourhood, and division are missing together determines whether a row can be imputed or has to be excluded.
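The individual counts can be generalized into a table of every missingness combination: calling `DataFrame.value_counts()` on the boolean `isna()` frame counts the rows per pattern. The helper name `missingness_patterns` and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd


def missingness_patterns(df: pd.DataFrame, cols: list[str]) -> pd.Series:
    """Count rows for each combination of missing (True) / present (False) values."""
    return df[cols].isna().value_counts()


toy = pd.DataFrame(
    {
        "LAT_WGS84": [43.7, np.nan, np.nan, 43.6, np.nan, np.nan],
        "HOOD_158": ["001", None, None, None, "002", None],
        "DIVISION": ["D23", None, "D31", "D31", "D14", None],
    }
)
patterns = missingness_patterns(toy, ["LAT_WGS84", "HOOD_158", "DIVISION"])
print(patterns)
```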


In [None]:
na_geo = df_theft_clean["LAT_WGS84"].isna() & df_theft_clean["LONG_WGS84"].isna()
na_hood = df_theft_clean["HOOD_158"].isna()
na_div = df_theft_clean["DIVISION"].isna()

print("rows lacking LAT/LONG only      :", (na_geo & ~na_hood & ~na_div).sum())
print("rows lacking hood but have geo  :", (~na_geo & na_hood).sum())
print("rows lacking div  but have geo  :", (~na_geo & na_div).sum())
print("rows missing all three          :", (na_geo & na_hood & na_div).sum())
print("rows lacking hood but have div  :", (na_hood & ~na_div).sum())

Based on the results of the previous analysis, the imputation and data cleaning strategy is as follows:

- The timestamp in `REPORT_DATE` and `OCC_DATE` is replaced with the values inferred from the `REPORT_HOUR` and `OCC_HOUR` columns.
- Duplicate events are dropped, retaining only unique `EVENT_UNIQUE_ID` entries.
- Rows whose `OCC_DATE` components are missing (detected via `OCC_YEAR`, `OCC_DAY`, and `OCC_DOY`) are dropped (4 rows).
- Calendar month names (`OCC_MONTH`, `REPORT_MONTH`) are kept as categorical strings, since the season mapping and the date-component checks below rely on the names.
- Rows missing coordinates, neighbourhood, and division (363 rows, less than 1%) are dropped entirely because they carry no geospatial information.
- Rows missing the neighbourhood but with a known division (433 rows):
  - The missing `HOOD_158` and `NEIGHBOURHOOD_158` values for these rows are imputed. The strategy is to use the most frequent `HOOD_158` (the mode) associated with the `DIVISION` present in that row.
  - A lookup table (`hood_mode_by_div`) is created by grouping the data (where `HOOD_158` is not null) by `DIVISION` and finding the modal `HOOD_158` for each division.
  - This modal `HOOD_158` is then used to fill the missing `HOOD_158` values for rows where `DIVISION` is known.
  - Subsequently, the `NEIGHBOURHOOD_158` (the name of the neighborhood) is filled by looking up the imputed `HOOD_158` in a mapping created from non-null `HOOD_158` and `NEIGHBOURHOOD_158` pairs.
- Imputing Missing Latitude/Longitude: After the neighborhood information is filled (either originally present or imputed as described above), any remaining missing `LAT_WGS84` and `LONG_WGS84` values are imputed.
  - This is done by calculating the median latitude and longitude (centroid) for each `HOOD_158`.
  - Rows with missing coordinates are then filled with the centroid coordinates of their respective `HOOD_158`.
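The two-stage strategy can be sketched on a toy frame with plain string columns. This is a simplified illustration, not the notebook's exact code: the helper `impute_hood_and_coords` and the neighbourhood labels are hypothetical. With the categorical dtypes used in this notebook, passing `observed=True` to `groupby` avoids empty groups for unused categories, on which `mode().iat[0]` would fail.

```python
import numpy as np
import pandas as pd


def impute_hood_and_coords(df: pd.DataFrame) -> pd.DataFrame:
    """Stage 1: hood from the division's modal hood. Stage 2: coords from the hood median."""
    out = df.copy()
    known = out.dropna(subset=["HOOD_158"])

    # Stage 1: most frequent hood per division (first mode on ties) and hood -> name lookup
    hood_mode = known.groupby("DIVISION")["HOOD_158"].agg(lambda s: s.mode().iat[0])
    names = known.drop_duplicates("HOOD_158").set_index("HOOD_158")["NEIGHBOURHOOD_158"]
    fill = out["HOOD_158"].isna() & out["DIVISION"].notna()
    out.loc[fill, "HOOD_158"] = out.loc[fill, "DIVISION"].map(hood_mode)
    out.loc[fill, "NEIGHBOURHOOD_158"] = out.loc[fill, "HOOD_158"].map(names)

    # Stage 2: fill remaining coordinate gaps with the per-hood median (centroid)
    for col in ["LAT_WGS84", "LONG_WGS84"]:
        centroid = out.groupby("HOOD_158")[col].transform("median")
        out[col] = out[col].fillna(centroid)
    return out


toy = pd.DataFrame(
    {
        "DIVISION": ["D23", "D23", "D23", "D31"],
        "HOOD_158": ["001", "001", None, "002"],
        "NEIGHBOURHOOD_158": ["Neighbourhood A", "Neighbourhood A", None, "Neighbourhood B"],
        "LAT_WGS84": [43.70, 43.72, np.nan, 43.75],
        "LONG_WGS84": [-79.60, -79.58, np.nan, -79.50],
    }
)
result = impute_hood_and_coords(toy)
print(result)
```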


In [None]:
# fix the hour time in REPORT_DATE and OCC_DATE based on REPORT_HOUR and OCC_HOUR
df_theft_clean["REPORT_DATE"] = df_theft_clean.apply(
    lambda x: x["REPORT_DATE"].replace(hour=int(x["REPORT_HOUR"]), minute=0, second=0)
    if pd.notnull(x["REPORT_HOUR"])
    else x["REPORT_DATE"],
    axis=1,
)
df_theft_clean["OCC_DATE"] = df_theft_clean.apply(
    lambda x: x["OCC_DATE"].replace(hour=int(x["OCC_HOUR"]), minute=0, second=0)
    if pd.notnull(x["OCC_HOUR"])
    else x["OCC_DATE"],
    axis=1,
)

# drop duplicated rows again after fixing the time
df_theft_clean = df_theft_clean.drop_duplicates(keep="first").reset_index(drop=True)

# drop rows with OCC_DATE components that are all null
df_theft_clean = df_theft_clean.dropna(
    subset=["OCC_YEAR", "OCC_DAY", "OCC_DOY"],
    how="all",
)

# drop rows with no geospatial information
mask = (
    df_theft_clean["LAT_WGS84"].isna()
    & df_theft_clean["HOOD_158"].isna()
    & df_theft_clean["DIVISION"].isna()
)
df_theft_clean = df_theft_clean[~mask].reset_index(drop=True)

# impute hoods with the division mode
## 1. Build a lookup: DIVISION  ->  modal HOOD_158
hood_mode_by_div = (
    df_theft_clean[df_theft_clean["HOOD_158"].notna()]  # only rows with a known hood
    .groupby("DIVISION", observed=True)["HOOD_158"]  # skip unused division categories
    .agg(lambda s: s.mode().iat[0])  # pick first of modes if tie
)

## 2. Build a lookup to get the neighbourhood *name* as well
name_lookup = (
    df_theft_clean[["HOOD_158", "NEIGHBOURHOOD_158"]]
    .dropna()
    .drop_duplicates()
    .set_index("HOOD_158")["NEIGHBOURHOOD_158"]
)

## 3. Apply to rows whose hood is missing but division known
mask = df_theft_clean["HOOD_158"].isna() & df_theft_clean["DIVISION"].notna()

df_theft_clean.loc[mask, "HOOD_158"] = df_theft_clean.loc[mask, "DIVISION"].map(
    hood_mode_by_div
)
df_theft_clean.loc[mask, "NEIGHBOURHOOD_158"] = df_theft_clean.loc[
    mask, "HOOD_158"
].map(name_lookup)


# Impute missing values in LONG_WGS84 and LAT_WGS84
# with the median coordinates (centroid) of the same neighbourhood;
# transform() keeps the original row order, so no join or realignment is needed
for coord in ["LAT_WGS84", "LONG_WGS84"]:
    centroid = df_theft_clean.groupby("HOOD_158", observed=True)[coord].transform(
        "median"
    )
    df_theft_clean[coord] = df_theft_clean[coord].fillna(centroid)

df_theft_clean.head()

## Data Sanity and Consistency Checks

This section performs several checks to ensure the integrity and consistency of the cleaned dataset (`df_theft_clean`).

1.  **Geographic Coordinate Validation**:
    It verifies that all latitude (`LAT_WGS84`) and longitude (`LONG_WGS84`) coordinates fall within the approximate boundaries of Toronto (latitude between 43.5 and 44.0, longitude between -79.8 and -79.0).

1.  **Occurrence and Report Date Logic**:
    It checks if the occurrence date (`OCC_DATE`) is always earlier than or equal to the report date (`REPORT_DATE`).

1.  **Year Range**:
    It checks that `OCC_YEAR` and `REPORT_YEAR` fall between 2013 and 2024.

1.  **Date Component Verification (Occurrence Date)**:
    It compares the individual date components (`OCC_YEAR`, `OCC_MONTH`, `OCC_DAY`, `OCC_DOY`, `OCC_DOW`) against the corresponding parts extracted from the `OCC_DATE` timestamp.

1.  **Date Component Verification (Report Date)**:
    It compares the individual date components (`REPORT_YEAR`, `REPORT_MONTH`, `REPORT_DAY`, `REPORT_DOY`, `REPORT_DOW`) against the corresponding parts extracted from the `REPORT_DATE` timestamp.
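The component checks can also be written as a function that returns mismatch counts instead of printing them. This sketch assumes the `<PREFIX>_DATE`, `<PREFIX>_YEAR`, `<PREFIX>_MONTH` (and so on) naming used here and English month and day names; `component_mismatches` is a hypothetical name.

```python
import pandas as pd


def component_mismatches(df: pd.DataFrame, prefix: str) -> dict:
    """Count rows where each stored date component disagrees with the parsed timestamp."""
    ts = df[f"{prefix}_DATE"].dt
    derived = {
        "YEAR": ts.year,
        "MONTH": ts.month_name(),
        "DAY": ts.day,
        "DOY": ts.dayofyear,
        "DOW": ts.day_name(),
    }
    mismatches = {}
    for suffix, expected in derived.items():
        stored = df[f"{prefix}_{suffix}"]
        if suffix in ("MONTH", "DOW"):
            # Names: compare as plain strings (works for object and category columns)
            diff = expected.astype(str) != stored.astype(str)
        else:
            # Numbers: compare as floats; a missing stored value counts as a mismatch
            stored_num = pd.to_numeric(stored, errors="coerce").astype("float64")
            diff = expected.astype("float64") != stored_num
        mismatches[suffix] = int(diff.sum())
    return mismatches


toy = pd.DataFrame(
    {
        "OCC_DATE": pd.to_datetime(["2024-03-01", "2024-03-02"]),
        "OCC_YEAR": [2024, 2024],
        "OCC_MONTH": ["March", "March"],
        "OCC_DAY": [1, 3],  # second row disagrees with its timestamp
        "OCC_DOY": [61, 62],
        "OCC_DOW": ["Friday", "Saturday"],
    }
)
result = component_mismatches(toy, "OCC")
print(result)
```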


In [None]:
# Check if the coordinates are within the expected range
mask = (
    (df_theft_clean["LAT_WGS84"] >= 43.5)
    & (df_theft_clean["LAT_WGS84"] <= 44.0)
    & (df_theft_clean["LONG_WGS84"] >= -79.8)
    & (df_theft_clean["LONG_WGS84"] <= -79.0)
)
if not mask.all():
    print("Warning: Some coordinates are outside the expected range.")
    print(df_theft_clean.loc[~mask, ["LAT_WGS84", "LONG_WGS84"]].head())
else:
    print("All coordinates are within the expected range.")

# check if OCC_DATE <= REPORT_DATE
mask = df_theft_clean["OCC_DATE"] <= df_theft_clean["REPORT_DATE"]
if not mask.all():
    print("Warning: Some OCC_DATE are later than REPORT_DATE.")
    print(df_theft_clean.loc[~mask, ["OCC_DATE", "REPORT_DATE"]].head())
else:
    print("All OCC_DATE are earlier than or equal to REPORT_DATE.")

# check that OCC_YEAR and REPORT_YEAR fall within the valid range
valid_years = range(2013, 2025)  # 2013 to 2024 inclusive, matching the OCC_DATE cutoff below
mask = df_theft_clean["OCC_YEAR"].isin(valid_years) & df_theft_clean[
    "REPORT_YEAR"
].isin(valid_years)
if not mask.all():
    num_invalid = (~mask).sum()
    print(f"Warning: {num_invalid} rows have years outside the valid range (2013-2024).")
    print(
        df_theft_clean.loc[~mask, ["OCC_YEAR", "REPORT_YEAR", "REPORT_DATE"]].head(100)
    )
else:
    print("All dates are within the valid range (2013-2024).")

# check if Year/month/day columns match the date columns
date_component_map = {
    "YEAR": "year",
    "MONTH": "month_name",
    "DAY": "day",
    "DOY": "dayofyear",
    "DOW": "day_name",
}

for prefix in ["OCC", "REPORT"]:
    date_col_name = f"{prefix}_DATE"  # e.g., "OCC_DATE"
    for comp_suffix, dt_attr in date_component_map.items():
        comp_col_name = f"{prefix}_{comp_suffix}"  # e.g., "OCC_YEAR"

        if comp_suffix not in ["DOW", "MONTH"]:
            # Extract values from the main date column (e.g., OCC_DATE.dt.year)
            date_derived_values = getattr(
                df_theft_clean[date_col_name].dt, dt_attr
            ).astype("Int16")
            component_col_values = pd.to_numeric(
                df_theft_clean[comp_col_name], errors="coerce"
            ).astype("Int16")
        else:
            # DOW and MONTH are stored as names: compare them as plain strings
            # with dt.day_name() / dt.month_name()
            date_derived_values = getattr(df_theft_clean[date_col_name].dt, dt_attr)()
            component_col_values = df_theft_clean[comp_col_name].astype("object")

        if not date_derived_values.equals(component_col_values):
            print(f"Warning: {comp_col_name} does not match {date_col_name}.")
            # NA compared with a value counts as a mismatch; NA vs NA does not
            mismatched_mask = (date_derived_values != component_col_values).fillna(
                True
            ) & (date_derived_values.notna() | component_col_values.notna())
            if mismatched_mask.any():
                print(f"Mismatched entries for {comp_col_name}:")
                print(
                    df_theft_clean.loc[
                        mismatched_mask, [date_col_name, comp_col_name]
                    ].head()
                )
        else:
            print(f"{comp_col_name} matches {date_col_name}.")

There are 19 events whose `OCC_DATE` falls well before the reporting period this dataset covers; we treat them as anomalies. Although some thefts are reported long after they occur, such delays are uncommon for auto theft, so these rows are dropped.
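Before dropping these rows, it can help to compute the reporting lag explicitly. The sketch below adds two hypothetical columns, `REPORT_LAG_DAYS` and `OCC_BEFORE_CUTOFF`, and flags occurrences before the same 2013-01-01 cutoff used in the next cell. It assumes tz-naive timestamps; a tz-aware `OCC_DATE` would need a tz-aware cutoff.

```python
import pandas as pd


def flag_early_occurrences(df: pd.DataFrame, cutoff: str = "2013-01-01") -> pd.DataFrame:
    """Add the reporting lag in whole days and a flag for occurrences before the cutoff."""
    out = df.copy()
    out["REPORT_LAG_DAYS"] = (out["REPORT_DATE"] - out["OCC_DATE"]).dt.days
    out["OCC_BEFORE_CUTOFF"] = out["OCC_DATE"] < pd.Timestamp(cutoff)
    return out


toy = pd.DataFrame(
    {
        "OCC_DATE": pd.to_datetime(["2019-06-01", "1998-05-10", "2023-12-30"]),
        "REPORT_DATE": pd.to_datetime(["2019-06-02", "2016-02-01", "2024-01-02"]),
    }
)
flagged = flag_early_occurrences(toy)
kept = flagged[~flagged["OCC_BEFORE_CUTOFF"]]
print(flagged)
```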


In [None]:
# Drop rows with OCC_DATE before 2013
df_theft_clean = df_theft_clean[df_theft_clean["OCC_DATE"] >= "2013-01-01"].reset_index(
    drop=True
)

## Feature Engineering

This section focuses on creating new features from existing columns to enhance the dataset for analysis and modeling. The new features are:

1.  **`OCC_TIME_BIN`**: Categorizes the `OCC_HOUR` into discrete time bins:

    - **Night**: 00:00 - 05:59 and 22:00 - 23:59
    - **Morning**: 06:00 - 11:59
    - **Afternoon**: 12:00 - 17:59
    - **Evening**: 18:00 - 21:59

2.  **`SEASON`**: Derives the season from the `OCC_MONTH`:

    - **Winter**: December, January, February
    - **Spring**: March, April, May
    - **Summer**: June, July, August
    - **Autumn**: September, October, November

3.  **`IS_WEEKEND`**: A boolean feature indicating whether the `OCC_DOW` falls on a weekend (Saturday or Sunday).
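The three features can be reproduced on a few hand-picked rows to confirm the bin edges. This sketch mirrors the `pd.cut` call in the next cell (duplicate labels require `ordered=False`, available since pandas 1.1) and swaps the month dictionary for an arithmetic mapping: `(month % 12) // 3` indexes Winter, Spring, Summer, Autumn. It assumes `OCC_MONTH` holds English month names with no missing values; `add_time_features` is a hypothetical name.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Right-closed bins: (-1, 5] -> 0-5, (5, 11] -> 6-11, (11, 17] -> 12-17,
    # (17, 21] -> 18-21, (21, 23] -> 22-23
    out["OCC_TIME_BIN"] = pd.cut(
        out["OCC_HOUR"],
        bins=[-1, 5, 11, 17, 21, 23],
        labels=["Night", "Morning", "Afternoon", "Evening", "Night"],
        ordered=False,  # required because "Night" appears twice
    )
    # Month name -> month number -> season (12, 1, 2 map to index 0 = Winter)
    month_num = pd.to_datetime(out["OCC_MONTH"].astype(str), format="%B").dt.month
    seasons = ["Winter", "Spring", "Summer", "Autumn"]
    out["SEASON"] = pd.Categorical(
        [seasons[(m % 12) // 3] for m in month_num], categories=seasons
    )
    out["IS_WEEKEND"] = out["OCC_DOW"].isin(["Saturday", "Sunday"])
    return out


toy = pd.DataFrame(
    {
        "OCC_HOUR": [0, 5, 6, 12, 18, 22],
        "OCC_MONTH": ["December", "February", "March", "June", "September", "November"],
        "OCC_DOW": ["Sunday", "Monday", "Saturday", "Tuesday", "Friday", "Wednesday"],
    }
)
feat = add_time_features(toy)
print(feat)
```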


In [None]:
# OCC_TIME_BIN
hour_bins = [-1, 5, 11, 17, 21, 23]  # Bins: Night, Morning, Afternoon, Evening, Night
hour_labels = ["Night", "Morning", "Afternoon", "Evening", "Night"]
df_theft_clean["OCC_TIME_BIN"] = pd.cut(
    df_theft_clean["OCC_HOUR"], bins=hour_bins, labels=hour_labels, ordered=False
)

# SEASON
# Month to season mapping
season_map = {
    "January": "Winter",
    "February": "Winter",
    "March": "Spring",
    "April": "Spring",
    "May": "Spring",
    "June": "Summer",
    "July": "Summer",
    "August": "Summer",
    "September": "Autumn",
    "October": "Autumn",
    "November": "Autumn",
    "December": "Winter",
}
df_theft_clean["SEASON"] = df_theft_clean["OCC_MONTH"].map(season_map)
df_theft_clean["SEASON"] = pd.Categorical(
    df_theft_clean["SEASON"],
    categories=["Winter", "Spring", "Summer", "Autumn"],
    ordered=False,
)


# IS_WEEKEND
# OCC_DOW
df_theft_clean["IS_WEEKEND"] = df_theft_clean["OCC_DOW"].isin(["Saturday", "Sunday"])

df_theft_clean[
    ["OCC_HOUR", "OCC_TIME_BIN", "OCC_MONTH", "SEASON", "OCC_DOW", "IS_WEEKEND"]
].head()

In [None]:
print("\n--- df_theft_clean.info() ---")
df_theft_clean.info()
print("\n--- df_theft_clean.describe() ---")
display(df_theft_clean.describe(include="all"))

In [None]:
# Get the value counts for OCC_TIME_BIN
time_bin_counts = df_theft_clean["OCC_TIME_BIN"].value_counts()

# Create the bar plot using seaborn
plt.figure(figsize=(10, 6))
sns.barplot(
    x=time_bin_counts.index,
    y=time_bin_counts.values,
    palette="viridis",
    hue=time_bin_counts.index,
    legend=False,
)
plt.title("Occurrences by Time of Day")
plt.xlabel("Time of Day")
plt.ylabel("Number of Occurrences")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Get the value counts for OCC_DOW
dow_counts = df_theft_clean["OCC_DOW"].value_counts()

dow_counts = dow_counts.reindex(
    [
        "Monday",
        "Tuesday",
        "Wednesday",
        "Thursday",
        "Friday",
        "Saturday",
        "Sunday",
    ],
    fill_value=0,
)

# Create the bar plot using seaborn
plt.figure(figsize=(10, 6))
sns.barplot(
    x=dow_counts.index,
    y=dow_counts.values,
    palette="viridis",
    hue=dow_counts.index,
    legend=False,
)
plt.title("Occurrences by Day of Week")
plt.xlabel("Day of Week")
plt.ylabel("Number of Occurrences")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


In [None]:
# Get the value counts for SEASON
season_counts = df_theft_clean["SEASON"].value_counts()

# calculate the percentage of occurrences in each season
season_percentages = season_counts / season_counts.sum() * 100

display(season_percentages)

# Create the bar plot using seaborn
plt.figure(figsize=(10, 6))
sns.barplot(
    x=season_counts.index,
    y=season_counts.values,
    palette="viridis",
    hue=season_counts.index,
    legend=False,
)
plt.title("Occurrences by Season")
plt.xlabel("Season")
plt.ylabel("Number of Occurrences")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


In [None]:
# Get the value counts for DIVISION
division_counts = df_theft_clean["DIVISION"].value_counts()

# Create the bar plot using seaborn
plt.figure(figsize=(12, 6))
sns.barplot(
    x=division_counts.index,
    y=division_counts.values,
    palette="viridis",
    hue=division_counts.index,
    legend=False,
)
plt.title("Occurrences by Police Division")
plt.xlabel("Police Division")
plt.ylabel("Number of Occurrences")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


In [None]:
# Get the value counts for PREMISES_TYPE
premises_counts = df_theft_clean["PREMISES_TYPE"].value_counts()

# Create the bar plot using seaborn
plt.figure(figsize=(10, 6))
sns.barplot(
    x=premises_counts.index,
    y=premises_counts.values,
    palette="viridis",
    hue=premises_counts.index,
    legend=False,
)
plt.title("Occurrences by Premises Type")
plt.xlabel("Premises Type")
plt.ylabel("Number of Occurrences")
plt.xticks(rotation=45, ha="right") # Rotate labels for better readability
plt.tight_layout()
plt.show()


## Key Insights from Cleaned Auto Theft Data

The initial cleaning and feature engineering steps have yielded several important insights into the Toronto auto theft dataset:

**Data Quality & Integrity:**

- **Successful Cleaning**: The dataset now comprises 61,196 unique auto theft events, with almost all missing values addressed.
- **Optimized Data Types**: Appropriate data types (`datetime64[ns]`, `category`, `Int16`, `bool`) have been applied, reducing memory usage from over 16 MB at initial ingestion to approximately 4.6 MB.
- **Unique Event IDs**: All `EVENT_UNIQUE_ID` entries are now unique, ensuring each row represents a distinct theft incident.

**Temporal Patterns:**

- **Time of Day for Occurrence**:
  - Thefts are most frequently reported as occurring during the **Night** (23,021 incidents).
- **Seasonal Trends**:
  - Thefts are relatively evenly distributed throughout the year, with a slight peak in **Autumn** (16,498 incidents).
- **Weekly Patterns**:
  - Thefts are relatively evenly distributed throughout the week. However, **Thursday** has the most thefts (9,429 incidents), and weekend days have the fewest.
- **Reporting Lag**: The `REPORT_DATE` often differs from the `OCC_DATE`, indicating a delay between a theft and its report. The mean report date (around October 23rd) falls about five days after the mean occurrence date (around October 18th) over the dataset's timespan, which implies an average reporting lag of roughly five days.

**Geospatial & Location-Based Patterns:**

- **Top Hotspot (Neighbourhood)**:
  - **West Humber-Clairville (HOOD_158: 001)** is the neighbourhood with the highest number of auto thefts (4,781 incidents).
- **Top Hotspot (Police Division)**:
  - **Division D23** reports the highest number of auto thefts (8,466 incidents), aligning with the West Humber-Clairville hotspot.
- **Common Theft Locations**:
  - **Parking Lots (Apartment, Commercial, or Non-Commercial)** are the most common `LOCATION_TYPE` for auto thefts (21,308 incidents).
- **Common Premises Type**:
  - The most frequent `PREMISES_TYPE` is **Outside** (33,024 incidents), which includes vehicles stolen from streets and unenclosed areas.
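One caveat on the time-of-day result: the bins have unequal widths (Night covers 8 hours, Morning and Afternoon 6 each, Evening 4), so raw counts favour Night. Dividing each bin's count by its width gives thefts per clock hour. A minimal sketch with hypothetical helpers and toy hours; on the real data it would be called as `thefts_per_hour_by_bin(df_theft_clean["OCC_HOUR"])`.

```python
import pandas as pd


def hour_to_bin(hour: int) -> str:
    """Same boundaries as OCC_TIME_BIN: Night 22-5, Morning 6-11, Afternoon 12-17, Evening 18-21."""
    if hour <= 5 or hour >= 22:
        return "Night"
    if hour <= 11:
        return "Morning"
    if hour <= 17:
        return "Afternoon"
    return "Evening"


def thefts_per_hour_by_bin(hours: pd.Series) -> pd.Series:
    """Average number of thefts per clock hour in each time bin, highest first."""
    # Clock hours covered by each bin (Night 8, Morning 6, Afternoon 6, Evening 4)
    widths = pd.Series([hour_to_bin(h) for h in range(24)]).value_counts()
    counts = hours.dropna().astype(int).map(hour_to_bin).value_counts()
    # Bins with no thefts get a rate of 0 instead of NaN
    rates = counts.reindex(widths.index, fill_value=0) / widths
    return rates.sort_values(ascending=False)


# Toy data: Night has the most thefts in absolute terms, Evening the highest hourly rate
hours = pd.Series([1] * 12 + [20] * 8 + [8] * 6 + [15] * 3)
rates = thefts_per_hour_by_bin(hours)
print(rates)
```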
