# **Final Project Proposal:** <br> Examining the Relationship between Green Space Access, Income, and Health 

**4 November 2025** 

#### **Authors**
Patrick A. Mikkelsen // Student Number: 1010572514 // <patrick.mikkelsen@utoronto.ca> <br>
Mike McCracken // Student Number: [add your student number please] // <mike.mccracken@utoronto.ca> <br>
Hugo Cheng // Student Number: [add your student number please] // <hugo.cheng@mail.utoronto.ca>

#### **Course Information**
**Instructor:** Professor Ignacio Tiznado-Aitken <br>
**Course:** GGR375H1 F: Introduction to Programming in GIS <br>
**TA:** Evan Powers

---

## 1. Introduction and Research Question
[Describe the problem you're investigating and state your main research question]

## 2. Background and Motivation
[Explain why this topic is important and what gap in knowledge you're addressing]

## 3. Data Sources
[List the datasets you plan to use, including:
- Dataset names
- Sources/URLs
- Spatial resolution
- Temporal coverage
- Key variables]

### Required Libraries

The analysis uses Python libraries for geospatial operations ([GeoPandas](https://geopandas.org/en/stable/)), statistical computing ([SciPy](https://docs.scipy.org/doc/scipy/), [NumPy](https://numpy.org/doc/stable/)), data manipulation ([Pandas](https://pandas.pydata.org/docs/)), and visualization ([Matplotlib](https://matplotlib.org/), [Seaborn](https://seaborn.pydata.org/), [Folium](https://python-visualization.github.io/folium/latest/)).

In [2]:
import geopandas as gpd
import pandas as pd
import folium
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

In [5]:
census_boundaries_path = "Green Space & Income/data/lct_000b21a_e/lct_000b21a_e.shp"
census_data_path = "Green Space & Income/data/98-401-X2021007_eng_CSV/98-401-X2021007_English_CSV_data.csv"
census_sd_path = "Green Space & Income/data/lcsd000b21a_e/lcsd000b21a_e.shp"

### Census Tract Boundaries

[Census tract boundaries from Statistics Canada's 2021 Census](https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index2021-eng.cfm?Year=21) are loaded and transformed to the WGS84 coordinate reference system (EPSG:4326). Census tracts represent small, relatively stable geographic units designed to be homogeneous with respect to population characteristics, economic status, and living conditions.

In [6]:
census_boundaries = gpd.read_file(census_boundaries_path)
census_boundaries = census_boundaries.to_crs(epsg=4326)

### Census Demographic and Economic Data

The 2021 Census Profile dataset contains demographic and socioeconomic characteristics for all census geographic areas in Canada. Census tract-level records are extracted for subsequent analysis.

In [7]:
census_data = pd.read_csv(census_data_path, 
                          encoding="latin1", low_memory=False)
ct_data = census_data[census_data['GEO_LEVEL'] == 'Census tract'].copy()
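The full Census Profile CSV is large, so loading it in one call can be slow and memory-heavy. A possible alternative, sketched here on a tiny in-memory extract, is to read the file in chunks and keep only the census tract rows from each chunk. The helper name `load_tract_rows`, the `chunksize` value, and the toy rows and identifiers are assumptions made for illustration; only the column names come from the cell above.

```python
import io

import pandas as pd


def load_tract_rows(source, chunksize=100_000, **read_kwargs):
    """Read a long-format Census Profile CSV in chunks, keeping only tract rows.

    Hypothetical helper: `source` is a path or file-like object, and any extra
    keyword arguments (for example encoding="latin1") are passed to pd.read_csv.
    """
    kept = []
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_kwargs):
        kept.append(chunk[chunk["GEO_LEVEL"] == "Census tract"])
    return pd.concat(kept, ignore_index=True)


# Toy extract with made-up identifiers and values, in the same long layout:
# one row per (geography, characteristic) pair.
toy_csv = io.StringIO(
    'DGUID,GEO_LEVEL,CHARACTERISTIC_NAME,C1_COUNT_TOTAL\n'
    'CT-A,Census tract,"Population, 2021",4200\n'
    'CT-A,Census tract,Median total income of household in 2020 ($),85000\n'
    'CSD-1,Census subdivision,"Population, 2021",100000\n'
)

ct_rows = load_tract_rows(toy_csv, chunksize=2)
print(ct_rows[["DGUID", "CHARACTERISTIC_NAME", "C1_COUNT_TOTAL"]])
```

With the real file, the call would presumably look like `load_tract_rows(census_data_path, encoding="latin1")`.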


### Spatial Delimitation to City of Toronto

To focus the analysis on Toronto proper (Census Subdivision Code 3520005), census tract boundaries are clipped to the municipal boundary, excluding surrounding municipalities within the Greater Toronto Area.

In [None]:
csd_boundaries = gpd.read_file(census_sd_path)
toronto_csd = csd_boundaries[csd_boundaries['CSDUID'] == '3520005'].copy()

if toronto_csd.crs != census_boundaries.crs:
    toronto_csd = toronto_csd.to_crs(census_boundaries.crs)

toronto_boundaries = gpd.clip(census_boundaries, toronto_csd)
toronto_boundaries = toronto_boundaries.to_crs(epsg=4326)

# GeoDataFrame has no write_file method; to_file writes the shapefile.
toronto_boundaries.to_file("toronto_census_tracts.shp")

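Clipping cuts geometries at the mask edge, so a tract that only slightly overlaps the municipal boundary survives as a thin sliver. The sketch below uses synthetic squares (not the real census files) to contrast `gpd.clip` with one possible alternative: keeping whole tracts whose representative point falls inside the city polygon. The toy geometries, the `CTUID` values, and the choice of a representative-point test are assumptions for illustration, not the project's chosen method.

```python
import geopandas as gpd
from shapely.geometry import box

# Synthetic data: three unit-square "tracts" in a row and a "city" polygon
# that fully covers A and B but only a thin strip of C.
tracts = gpd.GeoDataFrame(
    {"CTUID": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)
city = gpd.GeoDataFrame(geometry=[box(0, 0, 2.05, 1)], crs="EPSG:4326")

# Option 1: clip. Tract C is kept, but only as the strip inside the city.
clipped = gpd.clip(tracts, city)

# Option 2: keep whole tracts whose representative point lies inside the city.
city_shape = city.geometry.iloc[0]
inside = tracts[tracts.representative_point().within(city_shape)]

print(sorted(clipped["CTUID"]))  # tracts present after clipping
print(sorted(inside["CTUID"]))   # tracts selected by representative point
```

Which behaviour is preferable depends on whether partial tracts along the boundary should count toward the study area.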

### Population and Income Variables

Population counts and median household income data are extracted from the census profile. These variables serve as the dependent and independent measures for analyzing green space equity.

In [None]:
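# toronto_ct_data was not defined in earlier cells. Assumption: the boundary file
# and the profile CSV share a DGUID column, so tracts can be matched on it.
toronto_ct_data = ct_data[ct_data['DGUID'].isin(toronto_boundaries['DGUID'])].copy()

population_data = toronto_ct_data[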
    (toronto_ct_data['CHARACTERISTIC_NAME'] == 'Population, 2021')
][['DGUID', 'C1_COUNT_TOTAL']].copy()
population_data.rename(columns={'C1_COUNT_TOTAL': 'POPULATION'}, inplace=True)
population_data['POPULATION'] = pd.to_numeric(population_data['POPULATION'], errors='coerce')

income_data = toronto_ct_data[
    toronto_ct_data['CHARACTERISTIC_NAME'].str.contains('Median total income of household', case=False, na=False)
][['DGUID', 'C1_COUNT_TOTAL']].copy()
income_data.rename(columns={'C1_COUNT_TOTAL': 'MEDIAN_INCOME'}, inplace=True)
income_data['MEDIAN_INCOME'] = pd.to_numeric(income_data['MEDIAN_INCOME'], errors='coerce')
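Once the two variables are extracted, they need to be combined into one row per tract. A minimal sketch of that step, using made-up values, merges the two tables on `DGUID` and counts the tracts with no income figure (for example, suppressed values that `pd.to_numeric(..., errors='coerce')` turned into `NaN`). The toy frames and the name `tract_attrs` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for population_data and income_data (values are made up).
population_toy = pd.DataFrame(
    {"DGUID": ["CT-A", "CT-B", "CT-C"], "POPULATION": [4200, 3100, 5000]}
)
income_toy = pd.DataFrame(
    {"DGUID": ["CT-A", "CT-B"], "MEDIAN_INCOME": [85000.0, 62000.0]}
)

# validate="one_to_one" raises an error if a DGUID appears more than once on
# either side, which would signal that a filter matched several characteristics.
tract_attrs = population_toy.merge(
    income_toy, on="DGUID", how="left", validate="one_to_one"
)

# Tracts without an income value remain in the table with NaN.
missing_income = int(tract_attrs["MEDIAN_INCOME"].isna().sum())
print(tract_attrs)
print("Tracts missing income:", missing_income)
```

The same `merge` call on the tract GeoDataFrame would attach these attributes to the geometries for mapping.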

### Green Space Data

Municipal green space data includes parks, ravines, golf courses, and other vegetated areas within Toronto's boundaries. Spatial data are standardized to the WGS84 coordinate system for compatibility with census boundaries.

In [None]:
green_spaces = gpd.read_file("Green Spaces - 4326/Green Spaces - 4326.shp")
green_spaces = green_spaces.to_crs(epsg=4326)  # no change if the file is already in WGS84
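WGS84 coordinates are in degrees, so areas computed directly from them are not meaningful. One way the green space layer could later be combined with the tracts is sketched below on synthetic squares: reproject both layers to a metric CRS, intersect them, and sum the green area per tract. The CRS choice (EPSG:26917, NAD83 / UTM zone 17N), the function name `green_share_by_tract`, and the output column names are assumptions for illustration, not a settled part of the methodology.

```python
import geopandas as gpd
from shapely.geometry import box

PROJECTED_CRS = "EPSG:26917"  # assumption: NAD83 / UTM zone 17N, metres


def green_share_by_tract(tracts, green, id_col="DGUID"):
    """Return green area and green share per tract (hypothetical helper).

    Overlapping green polygons would be counted twice; dissolve them first
    if the source layer contains overlaps.
    """
    tracts_p = tracts.to_crs(PROJECTED_CRS)
    green_p = green.to_crs(PROJECTED_CRS)

    # Cut the green polygons along tract edges, keeping the tract identifier.
    pieces = gpd.overlay(
        tracts_p[[id_col, "geometry"]], green_p[["geometry"]], how="intersection"
    )
    green_area = pieces.geometry.area.groupby(pieces[id_col]).sum()

    out = tracts_p[[id_col]].copy()
    out["TRACT_AREA_M2"] = tracts_p.geometry.area
    out["GREEN_AREA_M2"] = out[id_col].map(green_area).fillna(0.0)
    out["GREEN_SHARE"] = out["GREEN_AREA_M2"] / out["TRACT_AREA_M2"]
    return out


# Synthetic example: two 1 km x 1 km tracts, one half covered by green space.
tracts_toy = gpd.GeoDataFrame(
    {"DGUID": ["CT-A", "CT-B"]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 2000, 1000)],
    crs=PROJECTED_CRS,
)
green_toy = gpd.GeoDataFrame(geometry=[box(0, 0, 500, 1000)], crs=PROJECTED_CRS)

result = green_share_by_tract(tracts_toy, green_toy)
print(result)
```

With the real layers, the call would presumably be `green_share_by_tract(toronto_boundaries, green_spaces)`.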

## 4. Methodology
[Outline your analytical approach:
- Data preprocessing steps
- Spatial analysis techniques
- Statistical methods
- Python packages you'll use (e.g., geopandas, rasterio, scikit-learn)]

## 5. Expected Outputs
[Describe what you plan to produce:
- Maps/visualizations
- Statistical results
- Python scripts/modules]

## 6. Timeline
[Break down the project into tasks with tentative deadlines]

## 7. Challenges and Limitations
[Identify potential obstacles and how you plan to address them]

## 8. References
[List preliminary sources and similar studies]