# Methods

## Empirical Strategy

This study estimates the relationship between historical redlining and present-day access to STEM coursework in U.S. K–12 schools. Specifically, I test whether schools located in historically lower-rated Home Owners’ Loan Corporation (HOLC) neighborhoods (Grades “C” or “D”) are less likely to offer advanced STEM courses such as calculus, physics, chemistry, or computer science.

Following the boundary discontinuity design introduced by Aaronson et al. (2021, 2022), the analysis compares schools located immediately on either side of HOLC map boundaries. This approach isolates the local effects of redlining by exploiting sharp changes in historical lending grades across otherwise similar adjacent neighborhoods. By restricting attention to schools within narrow buffers around the boundaries (e.g., within 0.25–0.5 miles), the design holds constant broader city-level and regional factors.

The baseline estimating equation is:

**Y<sub>igs</sub> = α + β·REDLINE<sub>g</sub> + X<sub>igs</sub>γ + δ<sub>s</sub> + η<sub>b</sub> + ε<sub>igs</sub>**

where:
- **Y<sub>igs</sub>** represents a measure of STEM access for school *i* in geographic unit *g* within city (or CBSA) *s*.
- **REDLINE<sub>g</sub>** equals 1 if the school lies within a historically lower-rated HOLC zone (Grade “C” or “D”) and 0 if it lies within a Grade “A” or “B” zone.
- **X<sub>igs</sub>** is a vector of contemporary socioeconomic and demographic controls.
- **δ<sub>s</sub>** captures city or CBSA fixed effects.
- **η<sub>b</sub>** captures fixed effects for each HOLC boundary *b*, so that comparisons are made between schools along the same boundary.
- The coefficient **β** measures the association between historical redlining and present-day STEM access.
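
As a minimal sketch of how the boundary fixed effects work, the simulation below estimates β with a within-boundary (demeaning) estimator. It omits the controls X and uses simulated data only; the function names `simulate_schools` and `within_estimator` and every parameter value are illustrative assumptions, not part of the study's data or code. The sketch also assumes each boundary lies within a single city, in which case boundary fixed effects absorb city-level differences as well.

```python
import random
from collections import defaultdict

def simulate_schools(n_boundaries=200, true_beta=-0.10, seed=0):
    """Simulate schools on both sides of hypothetical HOLC boundaries."""
    rng = random.Random(seed)
    rows = []
    for b in range(n_boundaries):
        eta_b = rng.gauss(0.0, 0.2)       # boundary-specific shock (eta_b)
        for redline in (0, 0, 1, 1):      # two schools on each side
            y = 0.5 + true_beta * redline + eta_b + rng.gauss(0.0, 0.05)
            rows.append({"boundary": b, "redline": redline, "y": y})
    return rows

def within_estimator(rows):
    """Estimate beta after demeaning y and redline within each boundary."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["boundary"]].append(r)
    num = den = 0.0
    for members in groups.values():
        y_bar = sum(m["y"] for m in members) / len(members)
        d_bar = sum(m["redline"] for m in members) / len(members)
        for m in members:
            num += (m["redline"] - d_bar) * (m["y"] - y_bar)
            den += (m["redline"] - d_bar) ** 2
    return num / den

rows = simulate_schools()
beta_hat = within_estimator(rows)
print(f"estimated beta: {beta_hat:.3f}")
```

Demeaning within each boundary removes η<sub>b</sub>, so the estimate is driven only by differences between schools that share a boundary. A full specification with controls would typically use a regression library instead of this hand-rolled estimator.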

## Outcomes

The primary outcomes measure whether a school offers advanced STEM coursework.

**Binary outcomes**
- 1 if the school offers AP Calculus AB/BC  
- 1 if the school offers AP Physics, AP Chemistry, or AP Computer Science  
- 1 if any advanced STEM course (calculus, physics, chemistry, or computer science) is offered  

**Continuous outcomes (optional)**
- Share of students with access to advanced STEM courses  
- Share of students enrolled in these courses  

These outcomes capture the degree of opportunity for students to pursue rigorous STEM preparation in high school.
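
The outcome definitions can be sketched as a small function that turns course-offering flags into indicators. The field names below (`ap_calculus`, `advanced_stem_enrollment`, and so on) are hypothetical placeholders, not actual CRDC variable names, and the "any advanced STEM" indicator follows the course list in the Empirical Strategy section (calculus, physics, chemistry, computer science).

```python
def stem_outcomes(school):
    """Build STEM-access outcomes from hypothetical course-offering flags."""
    offers_ap_calculus = int(school["ap_calculus"])
    offers_ap_science_cs = int(
        school["ap_physics"]
        or school["ap_chemistry"]
        or school["ap_computer_science"]
    )
    offers_any_advanced = int(
        school["calculus"]
        or school["physics"]
        or school["chemistry"]
        or school["computer_science"]
    )
    # Share of enrolled students taking advanced STEM; None if enrollment is 0.
    enrollment = school["enrollment"]
    share_enrolled = (
        school["advanced_stem_enrollment"] / enrollment if enrollment else None
    )
    return {
        "offers_ap_calculus": offers_ap_calculus,
        "offers_ap_science_cs": offers_ap_science_cs,
        "offers_any_advanced": offers_any_advanced,
        "share_enrolled": share_enrolled,
    }

# Made-up example record, for illustration only.
example_school = {
    "enrollment": 800,
    "ap_calculus": True,
    "ap_physics": False,
    "ap_chemistry": False,
    "ap_computer_science": False,
    "calculus": True,
    "physics": True,
    "chemistry": False,
    "computer_science": False,
    "advanced_stem_enrollment": 120,
}
outcomes = stem_outcomes(example_school)
print(outcomes)
```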

## Data Sources

1. **Historical Redlining Data — Mapping Inequality Project (University of Richmond)**  
   - Digitized HOLC redlining boundaries and grades (A–D).  
   - Used to classify each school’s historical redline grade and identify boundary pairs for the local comparison design.  

2. **School-Level STEM Access — Civil Rights Data Collection (CRDC)**  
   - National dataset on school course offerings, including calculus, physics, computer science, and AP STEM courses.  
   - Provides the dependent variables for STEM access.  

3. **School Finance and Characteristics — NCES Common Core of Data (CCD)**  
   - School- and district-level data on enrollment, per-pupil funding, and demographics.  
   - Used as controls and mediators linking property values to educational resources.  

4. **Socioeconomic Context — American Community Survey (ACS) 5-Year Estimates**  
   - Tract-level covariates: median household income, poverty rate, % Black, % Hispanic, % foreign-born, % adults with BA or higher, population density, median home value, and unemployment rate.  
   - Used to control for present-day neighborhood characteristics.  

5. **Optional Extensions**  
   - **IPEDS:** Degree completions by STEM field and race.  
   - **HMDA or Zillow data:** Modern property values and mortgage lending trends for robustness checks.

## Identification and Robustness

To address potential pre-existing differences across HOLC boundaries:
- Restrict analysis to narrow boundary buffers (0.25, 0.5, 1 mile).  
- Include boundary fixed effects for local comparison.  
- Use propensity-weighted samples for covariate balance.  
- Conduct placebo tests using unrelated outcomes (e.g., humanities AP courses).  
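
One way to summarize covariate balance across a boundary is the standardized mean difference (SMD). The sketch below is an illustration under assumptions: the function name and the tract poverty rates are made up, and the pooled standard deviation is one of several conventions. A commonly used rule of thumb treats an absolute SMD below about 0.1 as adequate balance.

```python
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical poverty rates for tracts on each side of a boundary.
poverty_cd_side = [0.30, 0.35, 0.40]   # Grade C/D side
poverty_ab_side = [0.28, 0.33, 0.38]   # Grade A/B side

smd = standardized_mean_difference(poverty_cd_side, poverty_ab_side)
print(f"SMD: {smd:.2f}")
```

The same function can be applied before and after reweighting, or at each buffer width, to check whether narrower buffers improve balance.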

## Interpretation

Under this design, and provided that schools on either side of a boundary would have had similar STEM offerings absent the difference in HOLC grade, **β** can be interpreted as the *local effect of historical redlining on present-day STEM access* for schools near redlining boundaries. This extends the Aaronson et al. (2021, 2022) framework from economic outcomes to educational opportunity.


# Data Construction

To implement the boundary discontinuity design, I combine historical redlining boundaries from the 1930s with modern school-level and neighborhood-level data. The construction proceeds in five steps: geocoding, assignment of HOLC grades, boundary-pair identification, merging of educational and neighborhood data, and construction of the analytical sample and weights.

## Step 1: Geocoding and Spatial Reference

School-level coordinates are obtained from the **NCES Common Core of Data (CCD)**.  
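All geographic layers (school locations, HOLC boundaries, and census tracts) are brought into a common coordinate reference system (NAD83, EPSG:4269) so that they align. Because EPSG:4269 is a geographic system measured in degrees, distance-based operations such as buffering should be carried out after reprojecting to a projected system with linear units (for example, NAD83 / Conus Albers, EPSG:5070).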

## Step 2: Mapping Schools to HOLC Grades

Historical redlining maps are sourced from the **Mapping Inequality Project** (Nelson et al., University of Richmond).  
These maps contain digitized polygons for the original HOLC neighborhood grades (A: “Best”, B: “Still Desirable”, C: “Declining”, D: “Hazardous”).  

Each school is spatially joined to the HOLC polygon it falls in.  
The resulting dataset assigns each school a categorical HOLC grade and an indicator variable:

**REDLINE<sub>g</sub> = 1** if the school lies in a “C” or “D” area  
**REDLINE<sub>g</sub> = 0** if the school lies in an “A” or “B” area
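
The spatial join can be illustrated with a plain ray-casting point-in-polygon test. This is a simplified stand-in for a GIS library's spatial join, not the procedure used in the study: the polygons are toy squares, the function names are hypothetical, and returning `None` for schools outside every HOLC polygon is an assumption about how unmapped schools would be handled.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_redline(school_xy, holc_polygons):
    """Return (grade, redline) for the first polygon containing the school."""
    for grade, polygon in holc_polygons:
        if point_in_polygon(school_xy[0], school_xy[1], polygon):
            return grade, int(grade in ("C", "D"))
    return None, None  # school is outside every mapped HOLC area

# Two adjacent toy neighborhoods sharing the edge x = 1.
holc_polygons = [
    ("B", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    ("D", [(1, 0), (2, 0), (2, 1), (1, 1)]),
]

print(assign_redline((0.5, 0.5), holc_polygons))  # ('B', 0)
print(assign_redline((1.5, 0.5), holc_polygons))  # ('D', 1)
print(assign_redline((3.0, 3.0), holc_polygons))  # (None, None)
```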

## Step 3: Constructing Boundary Comparison Pairs

Following **Aaronson et al. (2021, 2022)**, I focus on schools within a fixed distance of HOLC boundaries.  
Using GIS operations, I create buffer zones (e.g., 0.25–0.5 miles) on each side of every boundary segment.  
Schools are then assigned to boundary pairs based on proximity to the same local boundary line.  

Each pair contains schools from adjacent neighborhoods that received different HOLC grades but lie close enough together to share a similar local environment.  
Boundary-level fixed effects (**η<sub>b</sub>**) absorb unobserved local heterogeneity.
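
The buffer-and-assign step can be sketched with a point-to-segment distance in miles. This is a rough illustration, not the GIS workflow itself: it treats each boundary as a single straight segment, converts degrees to miles with a local flat-earth approximation (about 69 miles per degree of latitude), and uses made-up coordinates and hypothetical function names.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate conversion, adequate for short distances

def to_local_miles(lon, lat, ref_lat):
    """Convert lon/lat degrees to approximate x/y miles near ref_lat."""
    x = lon * MILES_PER_DEGREE_LAT * math.cos(math.radians(ref_lat))
    y = lat * MILES_PER_DEGREE_LAT
    return x, y

def distance_to_segment_miles(point, seg_start, seg_end):
    """Shortest distance in miles from a (lon, lat) point to a segment."""
    ref_lat = point[1]
    px, py = to_local_miles(point[0], point[1], ref_lat)
    ax, ay = to_local_miles(seg_start[0], seg_start[1], ref_lat)
    bx, by = to_local_miles(seg_end[0], seg_end[1], ref_lat)
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0
    else:
        # Projection of the point onto the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def assign_boundary(point, boundaries, max_miles=0.5):
    """Return the id of the nearest boundary within max_miles, else None."""
    best_id, best_dist = None, max_miles
    for boundary_id, (start, end) in boundaries.items():
        d = distance_to_segment_miles(point, start, end)
        if d <= best_dist:
            best_id, best_dist = boundary_id, d
    return best_id

# One hypothetical north-south boundary segment.
boundaries = {"b1": ((-87.70, 41.80), (-87.70, 41.90))}
near_school = (-87.705, 41.85)  # roughly a quarter mile west of the boundary
far_school = (-87.72, 41.85)    # about a mile west of the boundary

near_dist = distance_to_segment_miles(near_school, *boundaries["b1"])
print(assign_boundary(near_school, boundaries))
print(assign_boundary(far_school, boundaries))
```

Schools that return `None` fall outside the buffer and would be dropped from the boundary sample; changing `max_miles` reproduces the alternative buffer widths.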

## Step 4: Merging Educational and Neighborhood Data

- **STEM Access (Dependent Variables):**  
  From the **Civil Rights Data Collection (CRDC)** — indicators for whether a school offers calculus, physics, computer science, chemistry, or AP STEM courses.  

- **School Characteristics:**  
  From the **NCES CCD** — total enrollment, grade span, per-pupil spending, % free/reduced lunch, % minority, % female.  

- **Neighborhood Controls:**  
  From **ACS 5-Year Estimates (2020–2024)** — median income, poverty rate, racial composition, education, home value, unemployment.  
  Schools are spatially joined to census tracts for accurate matching.
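
Merging the school-level files depends on a shared school identifier. The sketch below assumes a 12-digit NCES school ID stored under a hypothetical `school_id` field, and zero-pads it so that IDs read as integers do not lose leading zeros. The records and IDs are made up for illustration.

```python
def normalize_school_id(raw_id, width=12):
    """Return the ID as a zero-padded string so leading zeros survive."""
    return str(raw_id).strip().zfill(width)

def merge_school_data(crdc_rows, ccd_rows):
    """Left-join CCD characteristics onto CRDC rows by school ID."""
    ccd_by_id = {normalize_school_id(r["school_id"]): r for r in ccd_rows}
    merged = []
    for row in crdc_rows:
        key = normalize_school_id(row["school_id"])
        match = ccd_by_id.get(key, {})
        # Keep every CRDC row; flag whether a CCD match was found.
        merged.append(
            {**match, **row, "school_id": key, "matched": key in ccd_by_id}
        )
    return merged

crdc_rows = [
    {"school_id": "010000500870", "calculus": True},
    {"school_id": "170993000001", "calculus": False},
]
# The ID below was read as an integer, so its leading zero was dropped.
ccd_rows = [{"school_id": 10000500870, "enrollment": 650}]

merged = merge_school_data(crdc_rows, ccd_rows)
for row in merged:
    print(row)
```

The `matched` flag makes it easy to count how many schools fail to merge before they are dropped from the sample.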

## Step 5: Analytical Sample and Weights

The final sample includes public high schools within 0.5 miles of a HOLC boundary.  
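In robustness checks, schools are weighted by enrollment.  
Balance across boundaries is assessed using historical covariates (e.g., from the 1940 census); because the HOLC maps were drawn in the late 1930s, earlier census data such as 1930 are more clearly pre-treatment.  
Propensity-score reweighting is applied to improve comparability across the two sides of each boundary.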
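
One common form of propensity-score reweighting is inverse probability weighting, sketched below. The source does not specify which weighting scheme is used, so this is an assumption, as are the function names and the made-up propensity scores (which would normally come from a model of REDLINE on covariates).

```python
def ipw_weights(redline, propensity):
    """Inverse probability weights: 1/p if redlined, 1/(1-p) otherwise."""
    weights = []
    for d, p in zip(redline, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if d == 1 else 1.0 / (1.0 - p))
    return weights

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical schools: treatment status, propensity score, outcome.
redline = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
offers_calculus = [1, 0, 1, 1]

weights = ipw_weights(redline, propensity)
treated_mean = weighted_mean(offers_calculus[:2], weights[:2])
control_mean = weighted_mean(offers_calculus[2:], weights[2:])
print(weights)
print(treated_mean - control_mean)
```

Schools with propensity scores near 0 or 1 receive very large weights, so trimming or checking the weight distribution is usually worthwhile before estimation.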

---

This dataset structure allows estimation of the local effect of redlining on STEM course access, consistent with the boundary comparison design of Aaronson et al. (2021, 2022).
