# NYC DOE Accessibility Dataset
The school accessibility data is described on this website:
https://www.schools.nyc.gov/school-life/space-and-facilities/building-accessibility

The information comes from the sheets "Current Accessible School List" and "RAW Data".
As of Oct 18th, the two sheets list 1,727 and 2,320 schools, respectively. The "Current Accessible School List" sheet appears to be updated every week and to hold more accurate and timely data than the "RAW Data" sheet. The "RAW Data" sheet appears to contain the full list of schools that have ever been assessed, regardless of whether they have since closed.

In the final dataset, "accessibility_match_status" takes one of three values: "Complete (Current)", "Complete (Raw Backup)", and "No Match". "Complete (Current)" means the values come from the "Current Accessible School List" sheet (the more timely, weekly updated sheet). "Complete (Raw Backup)" means the values come from the "RAW Data" sheet (the less timely sheet) because no corresponding record was found in the "Current Accessible School List" sheet. "No Match" means no record was found in either sheet. In all cases, matching is done against the location dataset, joined on both 'Location Code' and 'Building Code'.
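
The three status values can be illustrated on toy data. The sketch below is a minimal illustration, not the notebook's actual pipeline: the frames `locations`, `current`, and `raw` and all codes and ratings in them are made up for the example, and only the column names and status labels are taken from this notebook.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for the location dataset and the two sheets.
locations = pd.DataFrame({
    "Location Code": ["K001", "K002", "X994"],
    "Building Code": ["K001", "K002", "X994"],
})
current = pd.DataFrame({
    "Location Code": ["K002"],
    "Building Code": ["K002"],
    "BAP Rating": [10.0],
})
raw = pd.DataFrame({
    "Location Code": ["K001", "K002"],
    "Building Code": ["K001", "K002"],
    "BAP Rating": [3.0, 9.0],
})

keys = ["Location Code", "Building Code"]

# Left-join the current sheet first, then the raw sheet; the overlapping
# "BAP Rating" columns receive the _curr / _raw suffixes in the second merge.
merged = (
    locations
    .merge(current, on=keys, how="left")
    .merge(raw, on=keys, how="left", suffixes=("_curr", "_raw"))
)

from_current = merged["BAP Rating_curr"].notna()
from_raw = ~from_current & merged["BAP Rating_raw"].notna()

# Prefer the current value, fall back to the raw value.
merged["BAP Rating"] = merged["BAP Rating_curr"].fillna(merged["BAP Rating_raw"])
merged["accessibility_match_status"] = "No Match"
merged.loc[from_current, "accessibility_match_status"] = "Complete (Current)"
merged.loc[from_raw, "accessibility_match_status"] = "Complete (Raw Backup)"

print(merged[keys + ["BAP Rating", "accessibility_match_status"]])
```

In this toy case the school found in both sheets keeps its current-sheet rating, the school found only in the raw sheet falls back to the raw rating, and the school found in neither stays "No Match".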




In [10]:
!pip3 install pandas geopandas openpyxl folium



In [11]:
import pandas as pd
import geopandas as gpd
import numpy as np
from pathlib import Path
import warnings
from openpyxl import load_workbook
warnings.filterwarnings('ignore')

In [12]:
data_dir = Path("../input_data")
output_dir = Path("../processed_data")
output_dir.mkdir(exist_ok=True, parents=True)

gdf = gpd.read_file(output_dir / "school_points_with_lcgms.geojson")
wb = load_workbook(data_dir / "Current_Building_Accessibility_Profile_List.xlsm", read_only=True)

In [13]:
ws_curr_data = wb["Current Accessible School List"] #current list
ws_raw_data = wb["RAW Data"] #backup list

curr_data = ws_curr_data.values
raw_data = ws_raw_data.values

next(curr_data)
next(curr_data)
curr_cols = next(curr_data) #index 0, 1, and 2 rows are all header data
curr_df = pd.DataFrame(curr_data, columns=curr_cols)

raw_cols = next(raw_data) #index 0 and 1 rows are both header data
raw_df = pd.DataFrame(raw_data, columns=raw_cols)

wb.close()

In [14]:
import re

# extracting BAP rating from a given HYPERLINK formula
def extract_rating(hyperlink_str):
    if pd.isna(hyperlink_str):
        return None
    
    # Extract the display text from HYPERLINK("url", "X out of 10")
    # Pattern: find "X out of 10" where X is a number
    match = re.search(r'"(\d+(?:\.\d+)?)\s+out\s+of\s+10"', str(hyperlink_str))
    
    if match:
        return float(match.group(1))
    return None

# Create new BAP Rating column from the HYPERLINK column
curr_df['BAP Rating'] = curr_df.iloc[:, 11].apply(extract_rating)

curr_df = curr_df.drop(curr_df.columns[[0, 11]], axis=1) #dropping columns 0 (col titled None and filled with None values) and 11 (hyperlink dupe of url col)
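
To see how the rating pattern behaves, here is a small self-contained check on sample strings. It is a sketch under assumptions: `parse_rating` is a hypothetical stand-in that reuses the same regular expression as `extract_rating`, and the sample formulas and `example.com` URLs are invented, so the real cells in the workbook may be formatted differently.

```python
import re

# Same pattern as extract_rating: a quoted "<number> out of 10" display text.
RATING_PATTERN = re.compile(r'"(\d+(?:\.\d+)?)\s+out\s+of\s+10"')

def parse_rating(cell):
    """Hypothetical helper: return the numeric rating, or None if absent."""
    if cell is None:
        return None
    match = RATING_PATTERN.search(str(cell))
    return float(match.group(1)) if match else None

# Invented sample cells; the real HYPERLINK formulas may differ.
samples = [
    '=HYPERLINK("https://example.com/report", "7 out of 10")',
    '=HYPERLINK("https://example.com/report", "8.5 out of 10")',
    '=HYPERLINK("https://example.com/report", "Not rated")',
    None,
]
results = [parse_rating(s) for s in samples]
print(results)
```

Cells whose display text does not follow the "X out of 10" shape yield `None`, which is why some rows can end up with a description but no numeric rating.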

In [15]:
#cleaning merge keys in all dfs
gdf['Location Code'] = gdf['Location Code'].astype(str).str.strip()
gdf['Building Code'] = gdf['Building Code'].astype(str).str.strip()
curr_df['Location Code'] = curr_df['Location Code'].astype(str).str.strip()
curr_df['Building Code'] = curr_df['Building Code'].astype(str).str.strip()
raw_df['Location Code'] = raw_df['Location Code'].astype(str).str.strip()
raw_df['Building Code'] = raw_df['Building Code'].astype(str).str.strip()

# selecting columns for merge in curr_df
curr_subset = curr_df[['Location Code', 'Building Code', 'BAP Rating', 
                        'Accessibility Description', 'BAP Full URL']].copy()
curr_subset.columns = ['Location Code', 'Building Code', 'BAP Rating_curr', 
                       'Accessibility Description_curr', 'BAP Full URL_curr']

# selecting columns for merge in raw_df
raw_subset = raw_df[['Location Code', 'Building Code', 'BAP Rating', 
                      'Accessibility Description']].copy()
raw_subset.columns = ['Location Code', 'Building Code', 'BAP Rating_raw', 
                      'Accessibility Description_raw']

# merging gdf with curr_df first
final_df = gdf.merge(curr_subset, on=['Location Code', 'Building Code'], how='left')

# then merging with raw_df to get fallback values
final_df = final_df.merge(raw_subset, on=['Location Code', 'Building Code'], how='left')

# creating final columns with fallback logic
# BAP Rating: use curr, fallback to raw
final_df['BAP Rating'] = final_df['BAP Rating_curr'].fillna(final_df['BAP Rating_raw'])

# accessibility description: use curr, fallback to raw
final_df['Accessibility Description'] = final_df['Accessibility Description_curr'].fillna(
    final_df['Accessibility Description_raw']
)

# BAP Full URL: use curr, set to "TBD" if came from raw
final_df['BAP Full URL'] = final_df['BAP Full URL_curr']
# If we have accessibility data but no URL (came from raw), set to TBD
came_from_raw = (final_df['BAP Rating_curr'].isna()) & (final_df['BAP Rating_raw'].notna())
final_df.loc[came_from_raw, 'BAP Full URL'] = 'TBD'

# drop temp columns
final_df = final_df.drop(['BAP Rating_curr', 'BAP Rating_raw', 
                          'Accessibility Description_curr', 'Accessibility Description_raw',
                          'BAP Full URL_curr'], axis=1)

# add match status
final_df['accessibility_match_status'] = 'No Match'
has_curr_match = final_df['BAP Rating'].notna() & (~came_from_raw)
has_raw_match = came_from_raw
final_df.loc[has_curr_match, 'accessibility_match_status'] = 'Complete (Current)'
final_df.loc[has_raw_match, 'accessibility_match_status'] = 'Complete (Raw Backup)'
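
A left merge adds rows whenever the right-hand table repeats a key pair, so it may be worth checking key uniqueness before trusting the merged row count. The following is a minimal sketch on invented data (the `schools` and `lookup` frames are hypothetical), using the `validate` argument of `DataFrame.merge`; it is not part of the original pipeline.

```python
import pandas as pd
from pandas.errors import MergeError

keys = ["Location Code", "Building Code"]

# Hypothetical frames: the lookup table repeats the (K001, K001) key pair.
schools = pd.DataFrame({
    "Location Code": ["K001", "K002"],
    "Building Code": ["K001", "K002"],
})
lookup = pd.DataFrame({
    "Location Code": ["K001", "K001"],
    "Building Code": ["K001", "K001"],
    "BAP Rating": [2.0, 4.0],
})

# Count lookup rows whose key pair appears more than once.
dup_rows = int(lookup.duplicated(subset=keys, keep=False).sum())

# validate="many_to_one" raises MergeError if the right-hand keys are not unique.
try:
    schools.merge(lookup, on=keys, how="left", validate="many_to_one")
    merge_ok = True
except MergeError:
    merge_ok = False

# Without validation the merge silently produces extra rows.
unchecked = schools.merge(lookup, on=keys, how="left")
print(dup_rows, merge_ok, len(schools), len(unchecked))
```

If the same check flags duplicates in the real sheets, one option is to deduplicate the subset on the key columns before merging; which record to keep is a judgment call that depends on the data.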

In [16]:
final_df

Unnamed: 0,ATS,Location Code,Location Name,Latitude,Longitude,ATS System Code,BEDS Number,Managed By Name,Location Type Description,Location Category Description,...,HighSchool Network Name,HighSchool Network Superintendent,HighSchool Network Superintendent Email,BCO Location Code,in_LCGMS,geometry,BAP Rating,Accessibility Description,BAP Full URL,accessibility_match_status
0,15K001,K001,P.S. 001 The Bergen,40.648959,-74.011420,15K001,3.315000e+11,DOE,General Academic,Elementary,...,,,,KFSN,True,POINT (-8238913.587 4960700.272),,No Accessibility,TBD,Complete (Raw Backup)
1,17K002,K002,Parkside Preparatory Academy,40.656423,-73.951575,17K002,3.317000e+11,DOE,General Academic,Junior High-Intermediate-Middle,...,,,,KFSS,True,POINT (-8232251.672 4961795.46),10.0,Fully Accessible,https://nycdoe.sharepoint.com/:w:/s/BAP/ERQlDw...,Complete (Current)
2,13K003,K003,P.S. 003 The Bedford Village,40.682311,-73.955219,13K003,3.313000e+11,DOE,General Academic,Elementary,...,,,,KFSN,True,POINT (-8232657.321 4965594.938),,No Accessibility,TBD,Complete (Raw Backup)
3,75K004,K004,P.S. K004,40.658500,-73.879276,75K004,3.075000e+11,DOE,Special Education,Elementary,...,,,,D075,True,POINT (-8224203.385 4962100.238),,No Accessibility,TBD,Complete (Raw Backup)
4,16K005,K005,P.S. 005 Dr. Ronald McNair,40.685241,-73.921970,16K005,3.316000e+11,DOE,General Academic,Elementary,...,,,,KFSN,True,POINT (-8228956.059 4966025.055),2.0,Partially Accessible,https://nycdoe.sharepoint.com/:w:/s/BAP/EX_vvq...,Complete (Current)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1963,88X994,X994,ALC - Crotona Academy,40.829306,-73.892243,,,,,,...,,,,,False,POINT (-8225646.864 4987196.846),,,,No Match
1964,88X995,X995,ALC - Rose Hill Academy,40.857248,-73.903165,,,,,,...,,,,,False,POINT (-8226862.696 4991308.524),,,,No Match
1965,88X996,X996,ALC - Stevenson Campus,40.821218,-73.855930,,,,,,...,,,,,False,POINT (-8221604.52 4986007.017),,,,No Match
1966,75X502,X999,Home Instruction - Bronx,40.831829,-73.827642,,,,,,...,,,,,False,POINT (-8218455.514 4987568.035),,,,No Match


In [17]:
import folium

# Prepare data
map_data = final_df[final_df['Latitude'].notna() & final_df['Longitude'].notna()].copy()
print(f"Schools to map: {len(map_data)}")
print("\nAccessibility Description distribution:")
print(map_data['Accessibility Description'].value_counts())

# Create base map
nyc_map = folium.Map(
    location=[40.7128, -74.0060],
    zoom_start=11,
    tiles='OpenStreetMap'
)

# Simple color mapping based on accessibility description
def get_color_for_accessibility(accessibility_desc):
    if pd.isna(accessibility_desc) or accessibility_desc == '':
        return 'gray'  # No data
    
    desc_str = str(accessibility_desc).strip()
    
    if 'Fully Accessible' in desc_str:
        return '#2ecc71'  # Green
    elif 'Partially Accessible' in desc_str:
        return '#f39c12'  # Orange
    elif 'No Accessibility' in desc_str:
        return '#e74c3c'  # Red
    elif 'No Information Available' in desc_str:
        return '#95a5a6'  # Light gray
    else:
        return 'gray'  # Default no data
    
# Add markers
for idx, school in map_data.iterrows():
    accessibility_desc = school.get('Accessibility Description')
    marker_color = get_color_for_accessibility(accessibility_desc)
    
    # Create popup
    popup_html = f"""
    <b>{school['Location Name']}</b><br>
    Location Code: {school['Location Code']}<br>
    Building Code: {school['Building Code']}<br>
    """
    
    if pd.notna(accessibility_desc):
        popup_html += f"<b>Accessibility: {accessibility_desc}</b><br>"
        if pd.notna(school.get('BAP Rating')):
            popup_html += f"BAP Rating: {school.get('BAP Rating')}/10<br>"
        popup_html += f"Address: {school.get('Primary Address', 'N/A')}<br>"
        if pd.notna(school.get('BAP Full URL')) and school.get('BAP Full URL') != 'TBD':
            popup_html += f'<a href="{school.get("BAP Full URL")}" target="_blank">View BAP Report</a><br>'
    else:
        popup_html += "<b>No Accessibility Data</b><br>"
    
    folium.CircleMarker(
        location=[school['Latitude'], school['Longitude']],
        radius=6,
        popup=folium.Popup(popup_html, max_width=300),
        color=marker_color,
        fill=True,
        fillColor=marker_color,
        fillOpacity=0.8,
        weight=2
    ).add_to(nyc_map)

# Add legend
legend_html = '''
<div style="position: fixed; 
            bottom: 50px; right: 50px; width: 220px; height: 160px; 
            background-color: white; border:2px solid grey; z-index:9999; 
            font-size:14px; padding: 10px">
<p style="margin-bottom: 8px;"><b>Accessibility Status</b></p>
<p style="margin: 5px;"><span style="color: #2ecc71;">●</span> Fully Accessible</p>
<p style="margin: 5px;"><span style="color: #f39c12;">●</span> Partially Accessible</p>
<p style="margin: 5px;"><span style="color: #e74c3c;">●</span> No Accessibility</p>
<p style="margin: 5px;"><span style="color: #95a5a6;">●</span> No Information Available</p>
<p style="margin: 5px;"><span style="color: gray;">●</span> No Data</p>
</div>
'''
nyc_map.get_root().html.add_child(folium.Element(legend_html))

# Display the map in the notebook
nyc_map

Schools to map: 1950

Accessibility Description distribution:
Accessibility Description
Partially Accessible        718
Fully Accessible            608
No Accessibility            400
                              1
No Information Available      1
Name: count, dtype: int64
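
The distribution above includes one row with a blank label. Assuming that row holds an empty or whitespace-only string (the notebook does not confirm the cause), a sketch like the following would fold such values into missing data before counting; the `desc` series is invented for illustration.

```python
import pandas as pd

# Invented sample: one whitespace-only entry, one missing entry, one padded label.
desc = pd.Series([
    "Fully Accessible",
    " ",
    "No Accessibility",
    None,
    "Partially Accessible ",
])

# Trim surrounding whitespace, then turn empty strings into missing values.
cleaned = desc.str.strip()
cleaned = cleaned.mask(cleaned == "")

# value_counts() skips missing values by default.
counts = cleaned.value_counts()
print(counts)
print("Missing:", int(cleaned.isna().sum()))
```

With this cleanup, blank descriptions are treated the same as absent ones instead of forming their own category.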


In [18]:
output_file = output_dir / "schools_with_accessibility_status.csv"
final_df.to_csv(output_file, index=False)