## Load Enrollment Capacity and Utilization Data

This dataset contains enrollment, capacity, and utilization figures for both buildings and organizations (i.e., schools). We use only the **building** capacity/utilization fields and ignore the organization-specific ones, following the methodology of the IBO ["Barriers to Learning" report](https://www.ibo.nyc.gov/content/publications/2025-march-barriers-to-learning-age-accessibility-space-usage-and-air-conditioning-in-nyc-school-buildings).

While there are 1,540 unique buildings in this capacity/utilization data, only 1,217 match a building code in our schools data. This is fewer than IBO claims to have matched when they joined their school buildings data to their "Blue Book" (i.e., capacity/utilization) data. However, when joining to building codes in *our* schools data, this dataset matches 15 more buildings than joining directly to the IBO "Barriers" report data itself (which yields 1,202 buildings matched). For this reason, we use this data in the final `master_schools` layer rather than the IBO report data.

In [1]:
import pandas as pd

In [None]:
capacity_utilization_df = pd.read_csv('../data/raw_data/SCA/Capacity and Utilization/Enrollment_Capacity_And_Utilization_Reports_20250915.csv')
print('total records:', len(capacity_utilization_df))
print('unique buildings:', capacity_utilization_df['Bldg ID'].nunique())
print('unique organizations:', capacity_utilization_df['Org ID'].nunique())

# Fix data types
# Convert 'Data As Of' to datetime
capacity_utilization_df['Data As Of'] = pd.to_datetime(capacity_utilization_df['Data As Of'], format='%m/%d/%Y')
# Build separate building-level and organization-level DataFrames, each keeping only the
# most recent record per ID, so we can compare which key joins better to the schools data
bldg_cap_util_df = capacity_utilization_df.sort_values('Data As Of').drop_duplicates(subset=['Bldg ID'], keep='last')
bldg_cols = ['Org ID', 'Bldg ID', 'Bldg Name', 'Bldg Enroll', 'Target Bldg Cap', 'Target Bldg Util', 'Data As Of']
bldg_cap_util_df = bldg_cap_util_df[bldg_cols]
org_cap_util_df = capacity_utilization_df.sort_values('Data As Of').drop_duplicates(subset=['Org ID'], keep='last')
org_cols = ['Bldg ID', 'Org ID', 'Organization Name', 'Org Enroll', 'Org Target Cap', 'Org Target Util', 'Data As Of']
org_cap_util_df = org_cap_util_df[org_cols]

# Drop duplicates on both org ID and bldg ID for the full cap/util dataset
capacity_utilization_df = capacity_utilization_df.sort_values('Data As Of').drop_duplicates(subset=['Org ID', 'Bldg ID'], keep='last')

total records: 17622
unique buildings: 1540
unique organizations: 1964
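
The deduplication above relies on sorting by `Data As Of` and then keeping the last row per key, so each building or organization retains only its most recent report. A minimal sketch of that pattern on made-up records (the IDs, capacities, and dates below are illustrative assumptions, not values from the SCA file):

```python
import pandas as pd

# Made-up records: building K001 appears in two report years.
toy = pd.DataFrame({
    'Bldg ID': ['K001', 'K001', 'K002'],
    'Target Bldg Cap': [500, 520, 300],
    'Data As Of': ['06/30/2023', '06/30/2024', '06/30/2024'],
})
toy['Data As Of'] = pd.to_datetime(toy['Data As Of'], format='%m/%d/%Y')

# Sort oldest-to-newest, then keep the last (newest) row for each building.
latest = (
    toy.sort_values('Data As Of')
       .drop_duplicates(subset=['Bldg ID'], keep='last')
       .set_index('Bldg ID')
)
print(latest['Target Bldg Cap'].to_dict())
```

If two rows for the same key share the same date, which one survives depends on the sort order, so a secondary sort column may be worth adding in that case.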


## Join Schools Data

Load Schools Data

In [3]:
import geopandas as gpd
schools = gpd.read_file('../data/processed_data/school_points_with_lcgms.shp')

### Explore different approaches to joining data

Show join results when joining on *both* `Org ID` and `Bldg ID`

In [4]:
schools_merge_both_outer = schools[['Loc_Code', 'Loc_Name', 'Bldg_Code']].merge(
    capacity_utilization_df[['Bldg ID', 'Org ID', 'Organization Name', 'Data As Of']],
    left_on=['Loc_Code', 'Bldg_Code'],
    right_on=['Org ID', 'Bldg ID'],
    how='outer',
    indicator=True
)
schools_merge_both_outer['_merge'].value_counts()

_merge
right_only    2048
both          1683
left_only      284
Name: count, dtype: int64

In [5]:
print("Unique buildings matched from join on both `Org ID` and `Bldg ID`:", schools_merge_both_outer[schools_merge_both_outer['_merge'] == 'both']['Bldg ID'].nunique())

Unique buildings matched from join on both `Org ID` and `Bldg ID`: 1199
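
The `_merge` column added by `indicator=True` labels each row of the outer join as `both`, `left_only`, or `right_only`. Because several schools can share one building, the number of `both` rows can exceed the number of unique buildings matched. A small sketch on made-up codes (all values below are hypothetical):

```python
import pandas as pd

# Made-up school and capacity tables; codes are illustrative only.
schools_toy = pd.DataFrame({
    'Loc_Code': ['K100', 'K200', 'K300'],
    'Bldg_Code': ['K001', 'K001', 'K009'],
})
capacity_toy = pd.DataFrame({
    'Org ID': ['K100', 'K200', 'K400'],
    'Bldg ID': ['K001', 'K001', 'K002'],
})

merged = schools_toy.merge(
    capacity_toy,
    left_on=['Loc_Code', 'Bldg_Code'],
    right_on=['Org ID', 'Bldg ID'],
    how='outer',
    indicator=True,
)

# 'both' = matched, 'left_only' = school with no capacity record,
# 'right_only' = capacity record with no school.
counts = merged['_merge'].value_counts()
matched_buildings = merged.loc[merged['_merge'] == 'both', 'Bldg ID'].nunique()
print(counts.to_dict())
print('unique buildings matched:', matched_buildings)
```

Here two schools match, but both sit in the same building, so only one unique building is counted.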


Show join results when joining on `Org ID` only

In [6]:
schools_org_capacity_merged_outer = schools[['Loc_Code', 'Loc_Name']].merge(
    org_cap_util_df[['Bldg ID', 'Org ID', 'Organization Name', 'Data As Of']], 
    left_on='Loc_Code', 
    right_on='Org ID', 
    how='outer', 
    indicator=True
)
schools_org_capacity_merged_outer['_merge'].value_counts()

_merge
both          1739
left_only      228
right_only     225
Name: count, dtype: int64

In [7]:
print("Unique buildings matched from join on `Org ID`:", schools_org_capacity_merged_outer[schools_org_capacity_merged_outer['_merge'] == 'both']['Bldg ID'].nunique())

Unique buildings matched from join on `Org ID`: 1205


Show join results when joining on `Bldg ID` only

In [8]:
schools_bldg_capacity_merged_outer = schools[['Bldg_Code']].merge(
    bldg_cap_util_df[['Bldg ID', 'Bldg Name', 'Org ID', 'Data As Of']], 
    left_on='Bldg_Code', 
    right_on='Bldg ID', 
    how='outer', 
    indicator=True
)
schools_bldg_capacity_merged_outer['_merge'].value_counts()

_merge
both          1736
right_only     323
left_only      231
Name: count, dtype: int64

In [9]:
print("Unique buildings matched from join on `Bldg ID`:", schools_bldg_capacity_merged_outer[schools_bldg_capacity_merged_outer['_merge'] == 'both']['Bldg ID'].nunique())

Unique buildings matched from join on `Bldg ID`: 1217
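
As a sanity check on a single-key join, the unique-building count taken from the `both` rows should equal the size of the intersection of the two key columns. A sketch of that cross-check, assuming made-up building codes rather than the real schools and SCA data:

```python
import pandas as pd

# Made-up building codes; K001 hosts two schools.
schools_toy = pd.DataFrame({'Bldg_Code': ['K001', 'K001', 'K009']})
bldg_toy = pd.DataFrame({'Bldg ID': ['K001', 'K002']})

merged = schools_toy.merge(
    bldg_toy,
    left_on='Bldg_Code',
    right_on='Bldg ID',
    how='outer',
    indicator=True,
)
from_merge = merged.loc[merged['_merge'] == 'both', 'Bldg ID'].nunique()

# Same count without a merge: intersect the two sets of non-missing codes.
from_sets = len(
    set(schools_toy['Bldg_Code'].dropna()) & set(bldg_toy['Bldg ID'].dropna())
)
print(from_merge, from_sets)
```

If the two numbers disagree on the real data, the usual suspects are whitespace or casing differences in the code columns.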


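## Export capacity/utilization data to be merged to schools on `Bldg ID`

Joining on `Bldg ID` alone matches the most unique buildings (1,217, versus 1,205 on `Org ID` and 1,199 on both keys), so we export the building-level table with `Org ID` dropped.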

In [10]:
bldg_cap_util_df.drop(columns=['Org ID']).to_csv('../data/processed_data/bldg_capacity_utilization.csv', index=False)
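
One way the exported table might later be attached to the schools layer is a left join on the building code, with `validate='many_to_one'` guarding against duplicate `Bldg ID` rows. This is only a sketch under assumptions: the toy frames below stand in for the real schools data and the exported CSV, and the utilization values are invented.

```python
import pandas as pd

# Stand-ins for the schools layer and the exported building-level table.
schools_toy = pd.DataFrame({
    'Loc_Code': ['K100', 'K200', 'K300'],
    'Bldg_Code': ['K001', 'K001', 'K009'],
})
bldg_toy = pd.DataFrame({
    'Bldg ID': ['K001', 'K002'],
    'Target Bldg Util': [95.0, 110.0],
})

# Left join keeps every school; validate raises MergeError if 'Bldg ID'
# is not unique on the right-hand side.
joined = schools_toy.merge(
    bldg_toy,
    left_on='Bldg_Code',
    right_on='Bldg ID',
    how='left',
    validate='many_to_one',
)
print(joined[['Loc_Code', 'Bldg_Code', 'Target Bldg Util']])
```

Schools whose building has no capacity record keep their row, with missing values in the capacity columns.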