# Exploring the Spatial Relationship Between FSAs and Toronto Neighbourhoods

This notebook explores the spatial relationship between Forward Sortation Areas (FSAs) used in the 2021 Canadian Census and the City of Toronto’s 158 official neighbourhoods. The goal is to develop a method for converting census data organized by FSA into neighbourhood-based data, so that it can be joined to Toronto’s auto theft dataset, which uses neighbourhoods as the geographic key.

## Understanding FSAs and Neighbourhoods

- **Forward Sortation Areas (FSAs):** The first three characters of Canadian postal codes (e.g., M5V, M4C), used by Canada Post to define regions for mail delivery. FSAs can span large areas and may cross municipal boundaries.
- **Toronto’s 158 Neighbourhoods:** Defined by the City of Toronto for urban planning and community development, these neighbourhoods are based on social, economic, and historical factors and are used for organizing city services and statistical analysis.

## Relationship Between FSAs and Neighbourhoods

FSAs and neighbourhoods are defined for different purposes and do not align perfectly:

- **One FSA, Multiple Neighbourhoods:** An FSA may cover parts of several neighbourhoods (e.g., FSA M5V includes the Entertainment District, King West, and parts of Queen West).
- **One Neighbourhood, Multiple FSAs:** A single neighbourhood might span multiple FSAs (e.g., The Annex falls within both M5R and M5S).

This overlap arises because FSAs are designed for postal delivery efficiency, while neighbourhoods reflect community identity and administrative needs.

## Methodology: Areal-Weighted Interpolation

To join census data (by FSA) with neighbourhood-based datasets, we first establish a spatial relationship between FSAs and neighbourhoods. A common approach is areal-weighted interpolation: each region where an FSA and a neighbourhood overlap receives a share of that FSA's census counts equal to the overlap's share of the FSA's total area.

**Assumption:**  
Population is distributed uniformly within each FSA. This means that, for overlapping areas, census data is allocated proportionally by area.

**Limitations:**

- Urban population density can vary significantly within an FSA due to zoning, commercial areas, and uninhabited spaces.
- Uniform distribution may lead to inaccuracies in neighbourhood-level estimates.

**Potential Improvements:**

- **Dasymetric Mapping:** Uses ancillary data (e.g., land use, building footprints) to refine population allocation.
- **Building Footprint Analysis:** Allocates population based on residential building locations and types.

While these advanced methods offer greater accuracy, they require more detailed data and analysis. If such resources are unavailable, areal-weighted interpolation with the uniform distribution assumption remains a practical, if imperfect, solution.
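
As a minimal sketch of the areal-weighting arithmetic described above, the example below uses axis-aligned rectangles in place of real polygons. The `Rect` class, the geometry, the neighbourhood labels, and the population figure are all hypothetical and chosen only to make the numbers easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a real polygon (toy geometry)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def intersection_area(self, other: "Rect") -> float:
        # Overlap width and height; negative values mean no overlap.
        width = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        height = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return max(width, 0.0) * max(height, 0.0)


# Hypothetical FSA: a 4 x 2 rectangle with an invented population count.
fsa = Rect(0, 0, 4, 2)
fsa_population = 8000

# Hypothetical neighbourhoods that partially overlap the FSA.
neighbourhoods = {
    "Hood A": Rect(0, 0, 1, 2),  # covers the western quarter of the FSA
    "Hood B": Rect(1, 0, 4, 3),  # covers the rest and extends beyond it
}

# Weight = overlap area / FSA area; estimate = weight * FSA population.
weights = {
    name: fsa.intersection_area(rect) / fsa.area
    for name, rect in neighbourhoods.items()
}
estimates = {name: w * fsa_population for name, w in weights.items()}

for name in neighbourhoods:
    print(f"{name}: weight={weights[name]:.2f}, estimate={estimates[name]:.0f}")
```

Here Hood A receives a quarter of the FSA's population and Hood B the remaining three quarters, purely because of how the areas overlap. The part of Hood B that lies outside the FSA contributes nothing.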

---

This notebook documents the workflow for spatially joining FSAs and neighbourhoods, calculating area overlaps, and proportionally distributing census data to enable integrated analysis of Toronto’s neighbourhood-based datasets.


In [None]:
import geopandas as gpd

Neighbourhoods data for Toronto is available from the City of Toronto's Open Data portal:  
https://open.toronto.ca/dataset/neighbourhoods/
It can be directly downloaded as a GeoJSON file: `neighbourhoods_158.geojson`

Forward Sortation Area (FSA) boundaries for Canada can be downloaded from Statistics Canada:  
https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index2021-eng.cfm?year=21

**Preprocessing steps:**

- The full FSA dataset from Statistics Canada was filtered in QGIS to include only FSAs within the City of Toronto. The result was exported as `toronto_fsa.geojson`.
- Both the neighbourhoods and FSA datasets were loaded as GeoDataFrames.
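- To ensure accurate area calculations, both layers were reprojected to a projected coordinate reference system with metre units (EPSG:3347, NAD83 / Statistics Canada Lambert). This projection is conformal rather than strictly equal-area, but its scale distortion is nearly constant across an area the size of Toronto, so the area ratios computed below are barely affected.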
- These prepared datasets are used for spatial analysis and area-based interpolation in the following steps.


In [None]:
# Load Toronto's neighbourhood and FSA boundaries (GeoJSON files)
hood_file = "../data/00_raw/neighbourhoods_158.geojson"
fsa_file = "../data/00_raw/toronto_fsa.geojson"

hoods = gpd.read_file(hood_file)
fsas = gpd.read_file(fsa_file)

display(hoods.head())
display(fsas.head())

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')

# Sanity-check the layers loaded in the previous cell and plot them side by side
try:
    print(f"Loaded neighbourhoods: {len(hoods)}")
    print(f"Loaded FSAs: {len(fsas)}")

    # Plot basic maps to confirm the data loaded correctly
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7))

    hoods.plot(ax=ax1, edgecolor='black', alpha=0.5)
    ax1.set_title("Toronto Neighbourhoods")
    ax1.set_axis_off()

    fsas.plot(ax=ax2, edgecolor='black', alpha=0.5)
    ax2.set_title("Toronto FSAs")
    ax2.set_axis_off()

    plt.tight_layout()
    plt.show()

except Exception as e:
    # The layers are already loaded at this point, so a failure here is a plotting problem
    print(f"Error plotting spatial data: {e}")
    print("Skipping visualization")

In [None]:
# Reproject to a projected CRS in metres (EPSG:3347, Statistics Canada Lambert) for area calculation
hoods = hoods.to_crs("EPSG:3347")
fsas = fsas.to_crs("EPSG:3347")

In [None]:
# Calculate each FSA's total area; the column is carried into the overlay output
fsas["fsa_area"] = fsas.geometry.area

# Spatial intersection: one row per FSA-neighbourhood overlap
intersect = gpd.overlay(fsas, hoods, how="intersection")
intersect["intersect_area"] = intersect.geometry.area

# Share of each FSA's area that falls in each neighbourhood.
# Despite the column name, this is a fraction between 0 and 1, not a percentage.
# "fsa_area" is already present from the overlay, so no merge back to `fsas` is needed.
intersect["overlap_percent"] = intersect["intersect_area"] / intersect["fsa_area"]

# Drop sliver overlaps that cover less than 0.1% of the FSA
intersect = intersect[intersect["overlap_percent"] >= 0.001]

display(intersect[["AREA_LONG_CODE", "CFSAUID", "overlap_percent"]])

In [None]:
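# Group by FSA: list the neighbourhoods each FSA overlaps.
# The summed shares should be close to 1 for FSAs that lie entirely within the city.
grouped = intersect.groupby("CFSAUID").agg({
    "AREA_LONG_CODE": list,
    "overlap_percent": "sum"
}).reset_index()
display(grouped)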
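
Because overlaps below 0.1% are dropped, the shares for an FSA may no longer sum to exactly 1. One possible way to handle this, sketched below on an invented table shaped like `intersect`, is to rescale the shares within each FSA. The `weight` column and all values are hypothetical. Rescaling is only reasonable when the missing share comes from dropped slivers: for an FSA that extends beyond the city boundary, it would pull population from outside Toronto into the neighbourhoods.

```python
import pandas as pd

# Hypothetical overlap table in the same shape as `intersect` after the
# 0.1% filter: one row per FSA-neighbourhood pair. All values are invented.
overlaps = pd.DataFrame({
    "CFSAUID": ["M5V", "M5V", "M5R", "M5R"],
    "AREA_LONG_CODE": ["Hood A", "Hood B", "Hood B", "Hood C"],
    "overlap_percent": [0.60, 0.30, 0.45, 0.55],
})

# Total share of each FSA covered by the retained overlaps,
# broadcast back to every row of that FSA.
fsa_total = overlaps.groupby("CFSAUID")["overlap_percent"].transform("sum")

# Rescale so that the weights of every FSA sum to 1.
overlaps["weight"] = overlaps["overlap_percent"] / fsa_total

# Check: each FSA's weights now add up to 1 (within floating-point error).
print(overlaps.groupby("CFSAUID")["weight"].sum())
```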

In [None]:
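# Group by neighbourhood: list the FSAs each neighbourhood overlaps.
# The sum here adds shares of *different* FSAs, so it can exceed 1 and
# does not measure how much of the neighbourhood itself is covered.
grouped = intersect.groupby("AREA_LONG_CODE").agg({
    "CFSAUID": list,
    "overlap_percent": "sum"
}).reset_index()
display(grouped)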
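
With the overlap shares in hand, the final step of areal-weighted interpolation is to scale each FSA's census counts by those shares and sum the pieces per neighbourhood. The sketch below illustrates this with pandas on small invented tables. The `population` and `population_est` columns, the neighbourhood labels, and every number are hypothetical, and real census column names will differ.

```python
import pandas as pd

# Hypothetical weights: share of each FSA's area in each neighbourhood.
weights = pd.DataFrame({
    "CFSAUID": ["M5V", "M5V", "M5R"],
    "AREA_LONG_CODE": ["Hood A", "Hood B", "Hood B"],
    "overlap_percent": [0.25, 0.75, 1.00],
})

# Hypothetical census counts by FSA (invented numbers).
census = pd.DataFrame({
    "CFSAUID": ["M5V", "M5R"],
    "population": [8000, 5000],
})

# Attach each FSA's count to its overlaps, then scale by the area share.
allocated = weights.merge(census, on="CFSAUID", how="left")
allocated["population_est"] = allocated["population"] * allocated["overlap_percent"]

# Sum the allocated pieces to get one estimate per neighbourhood.
hood_estimates = (
    allocated.groupby("AREA_LONG_CODE", as_index=False)["population_est"].sum()
)
print(hood_estimates)

# When every FSA's shares sum to 1, the city-wide total is preserved.
print(hood_estimates["population_est"].sum() == census["population"].sum())
```

This kind of allocation suits counts such as population or number of dwellings. Ratios and medians (for example, median income) cannot be split by area in the same way and would need a different treatment, such as a weighted average.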