# Datathon Fall 2024: SFUSD Student Program Optimization

Team Name: JJJ

Contributors: James Vuong, Jay Huang, Jonathan Mui

For this project, our objective was to assign student applicants to either Daniel Webster Elementary School or Mission Bay Elementary School based on their local region in San Francisco. Our solution leverages student application data, demographic characteristics, and census block group data to partition applicants into cohorts that maintain socioeconomic diversity and maximize educational capacity.

### Overview

1) Defining the Problem: Motivated by the high demand for seats in the Kindergarten General Education Program at Webster (Daniel) Elementary School, the city of San Francisco began construction on Mission Bay Elementary School to accommodate growing demand. Our main objective was to assign regions (called block groups in our dataset) in San Francisco to Webster and Mission Bay respectively. 

2) Obtain the Data: Using data provided by SFUSD, we defined variables to store all block groups, webster block groups, school applications, and program request information. Choosing applications as our main dataset, we merged data from our other datasets to the applications dataset. After cleaning data and imputing missing values, we prepared our data for modeling. 

3) Understanding the Data: We used summary statistics and visualizations to analyze both the observed data and any sample data we generated. 

4) Our Solution: Using the final DataFrame containing the optimized school assignments, we assigned each region in San Francisco (denoted as "Block groups") to either "Webster" or "Mission Bay" Elementary. This assignment was determined based on the majority school preference of applicants residing in each region. To clearly present our results, we created a map of San Francisco, highlighting each region in blue or gold to represent its assignment to Mission Bay Elementary or Daniel Webster Elementary, respectively. This visualization provides a clear and intuitive way to understand the new attendance boundaries.

In [1]:
import pandas as pd
import numpy as np
import random

import plotly.express as px
import matplotlib.pyplot as plt
import geopandas as gpd
import folium
import mapclassify

np.random.seed(42)

### Load Datasets

In [2]:
webster_blockgroups = pd.read_csv("/work/Blockgroups to ESAA Spreadsheet with student counts - Webster ESAA Census.csv")
applications = pd.read_csv("/work/SHAREABLE 23-24 Application Request Data - Requests.csv")
requests = pd.read_csv("/work/SHAREABLE 23-24 Application Request Data - Requests.csv")

In [3]:
applications.head()

Unnamed: 0,Anon idRequest,SchoolName,Grade,ProgramCode,BerkeleyID,Rank,RequestStatus,RequestStatusReason,Gender,Race,Language Spoken by Student at Home,Elementary Attendance Area Code,Elementary Attendance Area,Block Group
0,56712,Lilienthal (Claire) K-8,6,GE,112732100,1,8,Pre-placement Request.,F,Korean,English,456.0,Bryant ES,60750230000.0
1,33251,Buena Vista Horace Mann K-8,6,SN,990015001,1,8,Pre-placement Request.,M,Hispanic,Spanish,,,
2,33434,Buena Vista Horace Mann K-8,7,SN,990040001,1,8,Pre-placement Request.,F,Hispanic,Spanish,521.0,El Dorado ES,60750260000.0
3,40043,Hoover (Herbert) MS,7,SN,990087001,1,8,Pre-placement Request.,F,Hispanic,Uncoded languages (Other non-English languages),575.0,Glen Park ES,60750260000.0
4,33624,Buena Vista Horace Mann K-8,7,SN,990094001,1,8,Pre-placement Request.,M,Hispanic,Uncoded languages (Other non-English languages),,,


In [4]:
requests.head()

Unnamed: 0,Anon idRequest,SchoolName,Grade,ProgramCode,BerkeleyID,Rank,RequestStatus,RequestStatusReason,Gender,Race,Language Spoken by Student at Home,Elementary Attendance Area Code,Elementary Attendance Area,Block Group
0,56712,Lilienthal (Claire) K-8,6,GE,112732100,1,8,Pre-placement Request.,F,Korean,English,456.0,Bryant ES,60750230000.0
1,33251,Buena Vista Horace Mann K-8,6,SN,990015001,1,8,Pre-placement Request.,M,Hispanic,Spanish,,,
2,33434,Buena Vista Horace Mann K-8,7,SN,990040001,1,8,Pre-placement Request.,F,Hispanic,Spanish,521.0,El Dorado ES,60750260000.0
3,40043,Hoover (Herbert) MS,7,SN,990087001,1,8,Pre-placement Request.,F,Hispanic,Uncoded languages (Other non-English languages),575.0,Glen Park ES,60750260000.0
4,33624,Buena Vista Horace Mann K-8,7,SN,990094001,1,8,Pre-placement Request.,M,Hispanic,Uncoded languages (Other non-English languages),,,


In [5]:
webster_blockgroups.head()

Unnamed: 0,E_AA_NAME,Label (Grouping),Geoid1,Weighted HH Inc,Median HH Income,Count of current SFUSD Kindergarten students,Total (Household Count Census 2020.csv),Total: (Population Race P3 Census 2020.csv),Total: (Population Race P8 Census 2020.csv),Total:!!White alone,...,Total:!!American Indian and Alaska Native alone,Total:!!Native Hawaiian and Other Pacific Islander alone,% White,% Asian,% Two or More,% Black,% Other,% American Indian,% Pacific Islander,Unnamed: 23
0,,TOTAL or Average,,35623,"$143,575",,49955,90551,90551,36428,...,734,363,40.2%,35.7%,9.5%,6.9%,6.5%,0.8%,0.4%,100.0%
1,WEBSTER,Block Group 1; Census Tract 124.05; San Franci...,60750120000.0,$836,"$42,383",,703,1332,1332,512,...,19,3,38.4%,15.3%,10.1%,17.0%,17.6%,1.4%,0.2%,100.0%
2,WEBSTER,Block Group 1; Census Tract 176.02; San Franci...,60750180000.0,$0,,,953,1898,1898,545,...,26,3,28.7%,37.8%,7.0%,14.0%,11.0%,1.4%,0.2%,100.0%
3,WEBSTER,Block Group 1; Census Tract 176.03; San Franci...,60750180000.0,"$1,527","$73,403",,741,1151,1151,430,...,14,2,37.4%,37.5%,9.2%,9.2%,5.3%,1.2%,0.2%,100.0%
4,WEBSTER,Block Group 1; Census Tract 176.04; San Franci...,60750180000.0,"$2,436","$123,944",,700,921,921,350,...,4,2,38.0%,44.0%,8.1%,3.8%,5.4%,0.4%,0.2%,100.0%


### Data Cleaning and Exploratory Data Analysis

Looking at the data and the Webster Attendance Area, we found the only elementary school located within the Webster Attendance Area to be Daniel Webster Elementary School, denoted "Webster (Daniel) ES" in the data.

In [6]:
WEBSTER_SCHOOL = 'Webster (Daniel) ES'

We filter our applications data to only contain applications to Daniel Webster Elementary School, specifically to the Kindergarten General Education program. We also drop redundant columns and columns that will not be considered in our model, such as student gender and language spoken at home.  

In [7]:
# filter applications to only have GE programs with grade level K at WEBSTER_SCHOOL
filter_applications = applications[applications["ProgramCode"] == "GE"]
filter_applications = filter_applications[filter_applications["Grade"] == "K"]
filter_applications = filter_applications[filter_applications["SchoolName"] == WEBSTER_SCHOOL]

# drop student gender and language
filter_applications = filter_applications.drop(columns = ["ProgramCode", "Grade", "Gender", "Language Spoken by Student at Home"])

filter_applications.head()

Unnamed: 0,Anon idRequest,SchoolName,BerkeleyID,Rank,RequestStatus,RequestStatusReason,Race,Elementary Attendance Area Code,Elementary Attendance Area,Block Group
47223,1434,Webster (Daniel) ES,990313144,1,11,Designation has been made.,Asian Indian,746.0,Ortega (Jose) ES,60750310000.0
53717,55862,Webster (Daniel) ES,990417743,2,10,Request removed; a rank 1 placement was made,Hispanic,513.0,Taylor (Edward R) ES,60750260000.0
54194,5988,Webster (Daniel) ES,990419995,3,10,Request removed; a rank 1 placement was made,Black or African American,625.0,Carver (Dr George W) ES,60750230000.0
54577,55815,Webster (Daniel) ES,99042277,5,10,Request removed; a rank 1 placement was made,Hispanic,497.0,Webster (Daniel) ES,60750610000.0
54602,5970,Webster (Daniel) ES,990422864,1,8,Final placement made,Black or African American,625.0,Carver (Dr George W) ES,60750610000.0


Since we are also considering the income of applicants, we need to combine data from the 2020 US Census containing information about median household income for each block group.

Because some block groups within the Webster attendance area were missing median household income data, we imputed missing values with the mean of available household income data.

In [8]:
# merge DataFrames on "Block Group" and "Geoid1"
with_income = pd.merge(left = filter_applications, right = webster_blockgroups[['Geoid1', 'Median HH Income']], 
    left_on = 'Block Group', right_on='Geoid1')

# prepare strings to be converted to integers
with_income['Median HH Income'] = with_income['Median HH Income'].fillna("00").str[1:].str.replace(",", "").astype(int)

# calculate mean of the available median household income
mean_income = int(with_income[with_income["Median HH Income"] > 0]["Median HH Income"].mean())

# impute missing values (now represented with 0) with the mean calculated above
with_income["Median HH Income"] = with_income["Median HH Income"].apply(lambda x: mean_income if x == 0 else x)
with_income.head()

Unnamed: 0,Anon idRequest,SchoolName,BerkeleyID,Rank,RequestStatus,RequestStatusReason,Race,Elementary Attendance Area Code,Elementary Attendance Area,Block Group,Geoid1,Median HH Income
0,55815,Webster (Daniel) ES,99042277,5,10,Request removed; a rank 1 placement was made,Hispanic,497.0,Webster (Daniel) ES,60750610000.0,60750610000.0,184365
1,94599,Webster (Daniel) ES,99053140,2,8,Placement made for the highest (rank > 1) TA,White,497.0,Webster (Daniel) ES,60750610000.0,60750610000.0,184365
2,55774,Webster (Daniel) ES,99053159,7,10,Tentative acceptance removed; only rank 6 reta...,Hispanic,497.0,Webster (Daniel) ES,60750610000.0,60750610000.0,184365
3,5971,Webster (Daniel) ES,990535937,10,10,Request removed; a rank 1 placement was made,Black or African American,497.0,Webster (Daniel) ES,60750610000.0,60750610000.0,184365
4,60714,Webster (Daniel) ES,99057082,11,11,Designation has been made.,Not Specified,497.0,Webster (Daniel) ES,60750610000.0,60750610000.0,184365


There were also missing values for the `Race` column, which we decided to resolve by imputing "Not Specified" for missing values.

Finally, we drop unrelated or redundant columns to arrive at our cleaned filtered DataFrame.

In [9]:
# include income in filter_applications
filter_applications = with_income
# impute missing values for `Race`
filter_applications["Race"] = filter_applications["Race"].fillna("Not Specified")
# drop unrelated columns
filter_applications = filter_applications.drop(columns = ["Geoid1", "Rank", "RequestStatus", "RequestStatusReason", "Elementary Attendance Area Code", "Elementary Attendance Area", "SchoolName", "Anon idRequest"])
filter_applications.head()

Unnamed: 0,BerkeleyID,Race,Block Group,Median HH Income
0,99042277,Hispanic,60750610000.0,184365
1,99053140,White,60750610000.0,184365
2,99053159,Hispanic,60750610000.0,184365
3,990535937,Black or African American,60750610000.0,184365
4,99057082,Not Specified,60750610000.0,184365


### Defining Summary Statistics for our Model

The summary statistics function computes the mean household income and race distribution of an input DataFrame

In [10]:
def summary_statistics(df):
    """
    input: DataFrame

    returns: the mean income of the DataFrame, followed by two arrays, the first containing string labels of races
    and the second containing the corresponding proportion of each race in the DataFrame
    """
    races = ['White', 'Black or African American', 'Hispanic', 'Chinese',
        'Filipino', 'Asian Indian', 'Korean', 'Vietnamese', 'Other Asian', 'Two or More', 'Not Specified']
    mean_income = np.mean(df["Median HH Income"])
    labels = np.array([])
    values = np.array([])
    for race in races:
        prop = np.mean(df["Race"] == race)
        labels = np.append(labels, race)
        values = np.append(values, prop)

    return mean_income, labels, values

In [11]:
# Our observed (population) mean income level and race distribution
observed_summary = summary_statistics(filter_applications)
observed_summary

(184365.47692307693,
 array(['White', 'Black or African American', 'Hispanic', 'Chinese',
        'Filipino', 'Asian Indian', 'Korean', 'Vietnamese', 'Other Asian',
        'Two or More', 'Not Specified'], dtype='<U32'),
 array([0.24615385, 0.04615385, 0.18461538, 0.04615385, 0.01538462,
        0.10769231, 0.01538462, 0.01538462, 0.01538462, 0.26153846,
        0.04615385]))

A bar chart is a helpful visualization to see how the proportions of races are distributed throughout our applicant pool. 

In [12]:
def display_statistics(group_name, summary_statistics):
    """
    input: string group name, tuple of summary statistics
    Displays a bar chart of the race distribution AND the mean income of the group
    returns: None
    """
    mean_income, labels, values = summary_statistics
    fig = px.bar(x=labels, y=values, title = "Race Distribution of " + group_name + " Applicants" + f" (Mean Income: {mean_income})", labels = dict(x = "Race", y = "Proportion"))
    fig.show()

Let's visualize the summary statistics of all applicants to Daniel Webster Elementary in our population of interest.

In [13]:
display_statistics("All", observed_summary)

### Defining our Model

We chose to create a model that uses a variation of permutation testing where the labels are school assignments to either "Webster" to "Mission Bay"

To start off, we first assign labels to the `filter_applications` DataFrame that will be shuffled during optimization. Daniel Webster Elementary and Mission Bay have enough capacity 22 students and 66 students for their Kindergarten General Education programs respectively.  We will assign "Webster" to the first 22 applicants and "Mission Bay" to the remaining 43 applicants.

In [14]:
school_assignment = np.array(["Webster" for i in range(22)] + ["Mission Bay" for i in range(43)])
# add the school assignment labels to the filter_applications DataFrame
filter_applications["School Assignment"] = school_assignment

Our model will produce a school assignment to each applicant, and we will then later use majority rule based on block groups to determine which individual block groups are assigned to which school.

### Defining our Loss Function

Our loss function is designed to evaluate the disparity between the simulated assignment and the actual population data, taking into account two key aspects:

Racial Distribution:
To measure how closely the simulated racial proportions align with the observed population proportions, we calculate the Total Variation Distance (TVD). This ensures that large deviations in racial distribution are heavily penalized.

Mean Income:
To account for economic balance, we calculate the relative difference between the mean income of the simulated group and the observed group. This ensures economic discrepancies are minimized.

Because we are considering our loss as a distance, we take the absolute values of both components in our final loss value

In [15]:
def loss_function(df):
    """
    input: DataFrame

    returns: loss computed by TVD of race proportions of observed and sample, 
    plus the mean income and the mean income absolute difference of mean income of observed and sample
    """
    sample_summary = summary_statistics(df)
    loss = np.sum(np.abs(sample_summary[2] - observed_summary[2])) / 2 
    + np.abs(sample_summary[0] - observed_summary[0]) / observed_summary[0]

    return loss

### Fitting the Model

Our model optimization process:

For each iteration,

1) Shuffle the positions of the "Webster" and "Mission Bay" School Assignments 

2) Calculate and save the loss for the "Webster" and "Mission Bay" groups

3) Once the loss of both groups drops below a specified threshold, or we exceed a maximum number of iterations, we terminate the loop and save the best assignment

We chose the threshold to be 0.067 after comparing the results of repeated empirical observations

In [16]:
# Parameters
max_iterations = 10000
threshold = 0.067

best_webster_loss, best_mission_bay_loss = float('inf'), float('inf')

# Iterative optimization
for i in range(max_iterations):
    # Generate new random assignment
    shuffled_labels = np.random.permutation(filter_applications['School Assignment'])
    new_sample = filter_applications.copy()
    new_sample["School Assignment"] = shuffled_labels
    
    # Compute losses
    webster_loss = loss_function(new_sample[new_sample["School Assignment"] == "Webster"])
    mission_bay_loss = loss_function(new_sample[new_sample["School Assignment"] == "Mission Bay"])
    
    # Update best assignment if it improves
    if webster_loss < best_webster_loss and mission_bay_loss < best_mission_bay_loss:
        best_assignment = new_sample.copy()
        best_webster_loss = webster_loss
        best_mission_bay_loss = mission_bay_loss
    
    # Break if threshold conditions are met
    if webster_loss < threshold and mission_bay_loss < threshold:
        print(f"Solution found at iteration {i+1}")
        break
else: 
    print("Max iterations reached without finding a solution meeting thresholds.")

# Final output
print("Webster Loss:", best_webster_loss)
print("Mission Bay Loss:", best_mission_bay_loss)

Solution found at iteration 331
Webster Loss: 0.0664335664335664
Mission Bay Loss: 0.03398926654740608


### Analyzing the Results

We can now visualize the results of the school assignments produced from our model.

In [17]:
# Population summary
display_statistics("All", observed_summary)

In [18]:
# Webster group summary
display_statistics("Webster", summary_statistics(best_assignment[best_assignment["School Assignment"] == "Webster"]))

In [19]:
# Mission Bay group summary
display_statistics("Mission Bay", summary_statistics(best_assignment[best_assignment["School Assignment"] == "Mission Bay"]))

We can see visually that the racial distributions both the Webster and Mission Bay assignment groups closely resemble the true population race distribution.

Similarly, the mean income from both the Webster and Mission Bay assignment groups closely resembles the true population mean income.

### Final Solution

The final assignments DataFrame chosen by our model is stores in `best_assignment` which assigns each applicant to a school.

In [20]:
best_assignment

Unnamed: 0,BerkeleyID,Race,Block Group,Median HH Income,School Assignment
0,99042277,Hispanic,6.075061e+10,184365,Mission Bay
1,99053140,White,6.075061e+10,184365,Mission Bay
2,99053159,Hispanic,6.075061e+10,184365,Webster
3,990535937,Black or African American,6.075061e+10,184365,Mission Bay
4,99057082,Not Specified,6.075061e+10,184365,Mission Bay
...,...,...,...,...,...
60,990639078,Two or More,6.075061e+10,99030,Mission Bay
61,990640451,Two or More,6.075018e+10,133780,Webster
62,990641349,Asian Indian,6.075062e+10,175911,Webster
63,990649102,Chinese,6.075061e+10,211307,Mission Bay


After grouping the final assignments DataFrame by "Block Group", we assign each block group to the majority school assignment based on our model.  

In [21]:
final = best_assignment[["Block Group", "School Assignment"]].groupby("Block Group").agg(lambda x: x.mode()[0]).reset_index()
final

Unnamed: 0,Block Group,School Assignment
0,60750180000.0,Mission Bay
1,60750180000.0,Mission Bay
2,60750180000.0,Webster
3,60750180000.0,Mission Bay
4,60750180000.0,Webster
5,60750180000.0,Webster
6,60750180000.0,Mission Bay
7,60750230000.0,Webster
8,60750230000.0,Mission Bay
9,60750230000.0,Mission Bay


In [22]:
sf_map_properties = gpd.read_file('geo.shp')
sf_map_properties.head()
map_final = final.copy()

#sf_map_properties['geoid20'].astype(str).str[:-3].str.len().value_counts()
#map_final["Block Group"].astype(str).str.len().value_counts()


# noticing that the geoid column had additional digits attached corresponding to irrelevant data
sf_map_properties['geoid20'] = sf_map_properties['geoid20'].astype(str).str[:-3]
# noticing that converting to string adds 'decimal' to the end
map_final['Block Group'] = map_final['Block Group'].astype(str).str[:-2]

# merge (resulting) DataFrame and SF Census map
merge_df = pd.merge(sf_map_properties, map_final, left_on='geoid20', right_on='Block Group')
sf_map_properties['geoid20'], map_final['Block Group']

# removing irrelevant columns
merge_df = merge_df.drop(columns=['statefp20', 'countyfp20', 'tractce20', 'geoid20', 'name20', 'mtfcc20', 'funcstat20', 'aland20', 'awater20', 'blockce20', 'housing20', 'pop20', 'date_data_', 'time_data_', 'date_dat_2', 'time_dat_2'])
merge_df = merge_df.rename(columns={'intptlat20':'Latitude', 'intptlon20':'Longitude'})

# assigning official names
merge_df['School Assignment'] = merge_df['School Assignment'].replace({
    'Webster': 'Webster (Daniel) Elementary',
    'Mission Bay': 'Mission Bay Elementary'
})

In [23]:
map = merge_df.explore(
    column='School Assignment',  # Column used to determine colors
    cmap = ['blue', 'gold'],  # Custom colormap
    legend=True # Display the legend
)
map

### Assessing the Reliability of Our Model

Our model operates on randomness in its optimization process, meaning that the final "best assignment" may vary depending on the specific run. We use a set random seed to ensure the reproducibility of our work. 

To evaluate the consistency and robustness of our model, we simulate multiple optimization runs. 

For each run:

- Performance metrics—Total Variation Distance (TVD) and Proportional Mean Income Differences—are recorded for both Webster and Mission Bay groups.

- These metrics quantify the racial and economic equity of the school assignments.

After collecting data from these simulations, we compute confidence intervals for each metric. By analyzing these intervals, we gain insights into:

- The variability and stability of the model’s performance across different runs.

- The reliability of its ability to maintain fairness and balance in school assignments.

In [24]:
# Simulates one run of our model optimization, and returns the best assignment for that run
def one_run():
    # Parameters    
    max_iterations = 1000
    threshold = 0.067

    best_webster_loss, best_mission_bay_loss = float('inf'), float('inf')

    # Iterative optimization
    for i in range(max_iterations):
        # Generate new random assignment
        shuffled_labels = np.random.permutation(filter_applications['School Assignment'])
        new_sample = filter_applications.copy()
        new_sample["School Assignment"] = shuffled_labels
        
        # Compute losses
        webster_loss = loss_function(new_sample[new_sample["School Assignment"] == "Webster"])
        mission_bay_loss = loss_function(new_sample[new_sample["School Assignment"] == "Mission Bay"])
        
        # Update best assignment if it improves
        if webster_loss < best_webster_loss and mission_bay_loss < best_mission_bay_loss:
            best_assignment = new_sample.copy()
            best_webster_loss = webster_loss
            best_mission_bay_loss = mission_bay_loss
        
        # Break if threshold conditions are met
        if webster_loss < threshold and mission_bay_loss < threshold:
            print(f"Solution found at iteration {i+1}")
            break
    else: 
        print("Max iterations reached without finding a solution meeting thresholds.")

    return best_assignment

In [25]:
# Calculates Income Difference metric for one group of our simulated assignment
def income_difference(df, group):
    sample_summary = summary_statistics(df[df["School Assignment"] == group])
    return np.abs(sample_summary[0] - observed_summary[0]) / observed_summary[0]

In [26]:
# Calculates Total Variation Difference of from observed population distribution to 
# one group of our simulated assignment
def TVD(df, group):
    sample_summary = summary_statistics(df[df["School Assignment"] == group])
    return np.sum(np.abs(sample_summary[2] - observed_summary[2])) / 2

In [27]:
# Runs multiple simulations, storing the computed metrics
def run_simulations(n):
    TVDs_Webster = []
    TVDs_Mission_Bay = []
    income_differences_Webster = []
    income_differences_Mission_Bay = []
    for i in range(n):
        simulated_assignment = one_run()
        TVDs_Webster.append(TVD(simulated_assignment, "Webster"))
        TVDs_Mission_Bay.append(TVD(simulated_assignment, "Mission Bay"))
        income_differences_Webster.append(income_difference(simulated_assignment, "Webster"))
        income_differences_Mission_Bay.append(income_difference(simulated_assignment, "Mission Bay"))
    return [TVDs_Webster, TVDs_Mission_Bay, income_differences_Webster, income_differences_Mission_Bay]


In [28]:
# Computes and returns the Confidence Interval of data for confidence level
def compute_ci(data, confidence=0.95):
    lower_percentile = (100 - confidence * 100) / 2
    upper_percentile = 100 - lower_percentile
    
    lower_bound = np.percentile(data, lower_percentile)
    upper_bound = np.percentile(data, upper_percentile)
    
    return [lower_bound, upper_bound]

In [29]:
# Run our simulated optimizations 
simulations = run_simulations(30)

Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Solution found at iteration 497
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting thresholds.
Max iterations reached without finding a solution meeting 

In [30]:
# Compute the respective confidence intervals for each metric, for each group
TVDs_Webster_ci = compute_ci(simulations[0], confidence=0.95)
TVDs_Mission_Bay_ci = compute_ci(simulations[1], confidence=0.95)
income_diff_Webster_ci = compute_ci(simulations[2], confidence=0.95)
income_diff_Mission_Bay_ci = compute_ci(simulations[3], confidence=0.95)

In [31]:
# Results
print("Webster Group TVD 95% Confidence Interval:", TVDs_Webster_ci)
print("Mission Bay Group TVD 95% Confidence Interval:", TVDs_Mission_Bay_ci)
print("Webster Group Proportional Mean Income Difference 95% Confidence Interval:", income_diff_Webster_ci)
print("Mission Bay Group Proportional Mean Income Difference 95% Confidence Interval:", income_diff_Mission_Bay_ci)

Webster Group TVD 95% Confidence Interval: [0.0664335664335664, 0.0738286713286713]
Mission Bay Group TVD 95% Confidence Interval: [0.03398926654740608, 0.03777280858676206]
Webster Group Proportional Mean Income Difference 95% Confidence Interval: [0.00208160435145402, 0.05156876714840254]
Mission Bay Group Proportional Mean Income Difference 95% Confidence Interval: [0.0010650068774880929, 0.026384020401508345]


Interpretation:

- The confidence intervals for TVD and proportional mean income differences are narrow, with widths under 
0.02 for all metrics. These small ranges highlight the model's consistency across simulations and its robustness in maintaining racial and economic equity in student assignments. Additionally, the tight bounds indicate that any variations introduced by the random optimization process have a negligible impact on the overall fairness of the assignments. This reinforces confidence in the reliability and stability of the model's results

While specific assignments may vary due to randomness, the equity and socioeconomic balance of our results are consistent across runs.

### Closing Thoughts: 

- Although there are isolated regions for each assignment group, we believe that our region assignments heavily prioritize educational equity. We chose to prioritize diverse regions over contiguous zones because having contiguous zones tends to limit diversity within regions.

- We chose not to use a binary classifier because of the nature of the classes. Regions assigned to Mission Bay Elementary compared to Daniel Webster Elementary should have no difference. Considering the context of the problem, keeping regions classless promotes social equity and greater diversity within every region. 

- Working together as a team throughout all phases of the project enhanced our mutual understanding of each task. We were all able to share our ideas, whether from research or university courses, and leverage individual strengths to ensure each phase of the project was implemented correctly. 

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=f5e651ad-5eee-402d-a771-d60f326b92f5' target="_blank">
 </img>
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>