# Test Script: Neighborhood Geospatial Imputation

**Objective:** This script checks that the official Boston Neighborhoods shapefile, which is used to impute missing `neighborhood` values, assigns known locations to the correct neighborhood.

**Methodology:**
The test uses ten well-known Boston landmarks, each labeled with its neighborhood name exactly as it appears in the shapefile's `name` column. It performs one key check:
1.  A **spatial join** that tests whether each landmark's coordinates fall within the expected neighborhood polygon.
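
To make the "within" check concrete, here is a minimal pure-Python sketch of a point-in-polygon test using ray casting. It is an illustration only: GeoPandas delegates this work to its geometry engine rather than running code like this. The two rectangular "blocks" and the helper names `point_in_polygon` and `assign_neighborhood` are hypothetical and are not taken from the shapefile.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means "inside"
    return inside


def assign_neighborhood(lon, lat, neighborhoods):
    """Return the name of the first polygon containing the point, or None."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy polygons (NOT real Boston boundaries), vertices as (x, y).
toy_neighborhoods = {
    "West Block": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "East Block": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

print(assign_neighborhood(1.0, 1.0, toy_neighborhoods))  # West Block
print(assign_neighborhood(3.0, 0.5, toy_neighborhoods))  # East Block
print(assign_neighborhood(5.0, 1.0, toy_neighborhoods))  # None
```

A point outside every polygon returns `None`, which mirrors how a left spatial join leaves the neighborhood empty for unmatched points. Points lying exactly on a shared boundary are not handled specially in this sketch.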

**Expected Outcome:** A successful test confirms that our shapefile is assigning the correct neighborhood to known locations, resulting in a 100% match.

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from pathlib import Path

# --- 1. Load the local Boston Neighborhoods shapefile ---
shapefile_path = Path("../data/processed/boston_neighborhood_boundaries/Boston_Neighborhood_Boundaries.shp")
# Announce the path before reading so the message is accurate even if the load fails.
print(f"Loading shapefile from: {shapefile_path}")
neighborhoods_gdf = gpd.read_file(shapefile_path)

# --- 2. Define 10 test points with their expected neighborhood names (as spelled in the shapefile) ---
test_points = {
    'Location': [
        'Faneuil Hall', 'Paul Revere House', 'Bunker Hill Monument', 'Museum of Fine Arts',
        'Arnold Arboretum', 'Boston Public Library', 'New England Aquarium',
        'USS Constitution', 'South Station', 'Fenway Park'
    ],
    'Latitude': [
        42.3601, 42.3639, 42.3763, 42.3394, 42.3010, 42.3496, 42.3592,
        42.3727, 42.3521, 42.3467
    ],
    'Longitude': [
        -71.0545, -71.0537, -71.0608, -71.0940, -71.1250, -71.0777, -71.0505,
        -71.0560, -71.0552, -71.0972
    ],
    'Expected Neighborhood': [
        'Downtown', 'North End', 'Charlestown', 'Fenway', 'Jamaica Plain',
        'Back Bay', 'Downtown', 'Charlestown', 'Downtown', 'Fenway'
    ]
}
test_df = pd.DataFrame(test_points)

# --- 3. Convert the test data into a GeoDataFrame ---
geometry = [Point(xy) for xy in zip(test_df['Longitude'], test_df['Latitude'])]
test_gdf = gpd.GeoDataFrame(test_df, geometry=geometry, crs="EPSG:4326")
test_gdf = test_gdf.to_crs(neighborhoods_gdf.crs)

# --- 4. Perform the spatial join using the original 'name' column ---
results_gdf = gpd.sjoin(test_gdf, neighborhoods_gdf[['name', 'geometry']], how="left", predicate="within")

# --- 5. Compare the results and print the report ---
results_gdf.rename(columns={'name': 'Found Neighborhood'}, inplace=True)
results_gdf['Match'] = results_gdf['Expected Neighborhood'] == results_gdf['Found Neighborhood']

print("\n--- TEST REPORT --- ✅")
results_gdf[['Location', 'Expected Neighborhood', 'Found Neighborhood', 'Match']]

Loading shapefile from: ../data/processed/boston_neighborhood_boundaries/Boston_Neighborhood_Boundaries.shp

--- TEST REPORT --- ✅


| | Location | Expected Neighborhood | Found Neighborhood | Match |
|---|---|---|---|---|
| 0 | Faneuil Hall | Downtown | Downtown | True |
| 1 | Paul Revere House | North End | North End | True |
| 2 | Bunker Hill Monument | Charlestown | Charlestown | True |
| 3 | Museum of Fine Arts | Fenway | Fenway | True |
| 4 | Arnold Arboretum | Jamaica Plain | Jamaica Plain | True |
| 5 | Boston Public Library | Back Bay | Back Bay | True |
| 6 | New England Aquarium | Downtown | Downtown | True |
| 7 | USS Constitution | Charlestown | Charlestown | True |
| 8 | South Station | Downtown | Downtown | True |
| 9 | Fenway Park | Fenway | Fenway | True |
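
The report header prints its check mark unconditionally, so the 100% match has to be confirmed by reading the table. One way to turn that into an explicit check is sketched below. The helper `summarize_matches` is hypothetical, and the sample rows are copied from the first three lines of the report; in the notebook, the rows could instead come from something like `results_gdf[['Location', 'Expected Neighborhood', 'Found Neighborhood']].to_dict('records')`.

```python
def summarize_matches(rows):
    """Return (match_rate, mismatches) for rows holding expected and found names."""
    mismatches = [
        r for r in rows
        if r["Expected Neighborhood"] != r["Found Neighborhood"]
    ]
    matched = len(rows) - len(mismatches)
    rate = matched / len(rows) if rows else 0.0
    return rate, mismatches


# Sample rows copied from the report above (first three landmarks only).
sample_rows = [
    {"Location": "Faneuil Hall", "Expected Neighborhood": "Downtown", "Found Neighborhood": "Downtown"},
    {"Location": "Paul Revere House", "Expected Neighborhood": "North End", "Found Neighborhood": "North End"},
    {"Location": "Bunker Hill Monument", "Expected Neighborhood": "Charlestown", "Found Neighborhood": "Charlestown"},
]

rate, mismatches = summarize_matches(sample_rows)
print(f"Match rate: {rate:.0%}")

# Fail loudly instead of relying on someone reading the table.
assert not mismatches, f"Mismatched landmarks: {[r['Location'] for r in mismatches]}"
```

A landmark that falls outside every polygon has no found neighborhood after a left join, so it compares unequal to its expected name and is counted as a mismatch.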
