# Writing

After inspecting the "Geographic Cluster Name" column in the MCMF dataset, I found 121,359 missing values among in-person programs. Since both Analysis 1 and Analysis 2 rely heavily on knowing which neighborhood each program belongs to, I decided to use the geographic information in the Community Boundaries dataset, together with the latitude and longitude information in the MCMF dataset, to map programs to their respective neighborhoods.
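As an illustration of how such a count can be obtained, here is a minimal sketch on a toy table. The rows are invented, and the `face_to_face` label for in-person programs is an assumption; only the column names and the `online` value come from the code in this notebook.

```python
import pandas as pd

# Toy stand-in for the MCMF table; all values are invented for illustration.
programs = pd.DataFrame({
    "Meeting Type": ["face_to_face", "online", "face_to_face", "face_to_face"],
    "Geographic Cluster Name": ["LOOP", None, None, "Far South Equity Zone"],
})

# In-person means "not online" here (an assumption about the labels).
in_person = programs["Meeting Type"] != "online"
missing_name = programs["Geographic Cluster Name"].isna()

# Count in-person programs that have no geographic cluster name.
n_missing_in_person = int((in_person & missing_name).sum())
print(n_missing_in_person)
```

The boolean masks combine element-wise, so the same expression would apply to the full dataset without a row-by-row loop.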

I first compared the neighborhood names in the MCMF dataset with the neighborhood names in the Community Boundaries dataset to see whether they differ. I found that, in addition to neighborhood names, some programs in the MCMF dataset use unstandardized names such as "Far South Equity Zone" and "Back of the Yards", which also need to be mapped. I extracted the programs that have both longitude and latitude information and whose geographic cluster name is either missing or unstandardized, and converted each longitude-latitude pair into a shapely `Point`. I also converted the multipolygons in the Community Boundaries dataset into shapely geometries. Next, for each longitude-latitude pair, I checked whether it falls inside any of the multipolygons that represent a neighborhood.
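The point-in-polygon check described above can be sketched as follows. This is a simplified illustration, not the notebook's exact code: the two square "areas" and the helper name `find_area` are hypothetical, and the helper returns the area name directly instead of the geometry.

```python
from shapely import wkt
from shapely.geometry import Point

# Hypothetical square "neighborhoods" given as WKT, standing in for the
# multipolygons stored in the Community Boundaries dataset.
areas = {
    "AREA A": wkt.loads("MULTIPOLYGON (((0 0, 0 1, 1 1, 1 0, 0 0)))"),
    "AREA B": wkt.loads("MULTIPOLYGON (((1 0, 1 1, 2 1, 2 0, 1 0)))"),
}

def find_area(lon, lat, areas):
    """Return the name of the first area containing the point, else None."""
    point = Point(lon, lat)  # shapely expects (x, y), i.e. (longitude, latitude)
    for name, geom in areas.items():
        if geom.contains(point):
            return name
    return None

print(find_area(0.5, 0.5, areas))  # inside the first square
print(find_area(1.5, 0.5, areas))  # inside the second square
print(find_area(5.0, 5.0, areas))  # outside both squares
```

Note that `contains` is false for a point lying exactly on a boundary, so a program located on a shared edge would stay unmatched. shapely's `covers` predicate is one possible alternative if boundary points should count.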

After mapping, I reviewed the neighborhoods assigned to programs with unstandardized names. This step was necessary because some programs with unstandardized names lack latitude-longitude data, and I wanted to map them to the same neighborhoods as other programs with the same unstandardized name. However, the review showed that the same unstandardized name, such as an equity zone, was often mapped to different neighborhoods. To avoid inconsistencies, where some equity zones are converted into neighborhood names while others remain unchanged, I decided to create a new column, "Neighborhood," dedicated exclusively to neighborhood names. Programs in equity zones that could not be mapped to a specific neighborhood are marked as "NA" in this column.
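The review and the resulting column can be illustrated on a toy table. The rows below are invented, and the vectorized `fillna`/`mask` calls are a compact alternative to the row-wise `apply` used in the notebook, not the notebook's own code.

```python
import pandas as pd

# Invented rows standing in for the output of the mapping step.
mapped = pd.DataFrame({
    "Program ID": [1, 2, 3, 4],
    "Geographic Cluster Name": [
        "Far South Equity Zone", "Far South Equity Zone",
        "LOOP", "Far South Equity Zone",
    ],
    "COMMUNITY": ["ROSELAND", "PULLMAN", None, None],
})
unstandardized = {"Far South Equity Zone"}

# Review: how many distinct neighborhoods did each unstandardized name map to?
is_unstd = mapped["Geographic Cluster Name"].isin(unstandardized)
spread = mapped[is_unstd].groupby("Geographic Cluster Name")["COMMUNITY"].nunique()
print(spread)

# Neighborhood: prefer the mapped community, fall back to the original name,
# then blank out any unstandardized name that is left over.
neighborhood = mapped["COMMUNITY"].fillna(mapped["Geographic Cluster Name"])
mapped["Neighborhood"] = neighborhood.mask(neighborhood.isin(unstandardized))
print(mapped[["Program ID", "Neighborhood"]])
```

A count above 1 in `spread` means that name cannot be translated into a single neighborhood, which is why the unmapped program (ID 4 here) ends up missing instead of inheriting a neighborhood from its zone.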

# Code

In [3]:
import pandas as pd
from shapely.geometry import Point
from shapely import wkt

# Reading Data
project_data = pd.read_csv('My_CHI._My_Future._Programs_20241113.csv')
chi_nei=pd.read_csv('CommAreas_20241114.csv')

# Label online programs that have no geographic cluster name as 'online' so they are not treated as missing
project_data['Geographic Cluster Name'] = project_data.apply(
    lambda row: 'online' if row['Meeting Type'] == 'online' and pd.isnull(row['Geographic Cluster Name']) else row['Geographic Cluster Name'],
    axis=1
)

# Comparing Geographic Cluster Names and Neighborhood Names in Community Boundaries
project_data_unique = project_data['Geographic Cluster Name'].unique()
chi_nei_unique = chi_nei['COMMUNITY'].unique()
matches = set(project_data_unique).intersection(chi_nei_unique)
unmatched_project_data = set(project_data_unique) - matches
unmatched_chi_nei = set(chi_nei_unique) - matches
print(f"Matches: {matches}")
print(f"Unmatched in project_data: {unmatched_project_data}")
print(f"Unmatched in chi_nei: {unmatched_chi_nei}")

# Extracting Programs with No Geographic Cluster Name or Unstandardized Name
unmatched_geocluster_list = list(unmatched_project_data)
# .copy() makes this an independent frame so columns can be added to it safely
project_withlatlong = project_data.loc[
    ((project_data['Geographic Cluster Name'].isnull()) | (project_data['Geographic Cluster Name'].isin(unmatched_geocluster_list))) &
    (project_data['Latitude'].notnull()) &
    (project_data['Longitude'].notnull()),
    ['Program ID','Latitude','Longitude','Geographic Cluster Name']
].copy()

# Turning data into shapely format & Mapping 
project_withlatlong['point_geom']=project_withlatlong.apply(
    lambda row: Point(row['Longitude'],row['Latitude']),axis=1
)
chi_nei['shapely_geom']=chi_nei['the_geom'].apply(wkt.loads)
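def match_multipolygon(point, multipolygons):
    # Return the first community-area multipolygon that contains the point, or None
    for multipolygon in multipolygons:
        if multipolygon.contains(point):
            return multipolygon
    return None
project_withlatlong['shapely_geom']=project_withlatlong['point_geom'].apply(
    lambda point: match_multipolygon(point,chi_nei['shapely_geom'])
)
# Join on the matched geometry to attach each community area's attributes (including COMMUNITY)
matched_program_neiname = pd.merge(project_withlatlong,chi_nei,on='shapely_geom',how='left')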

# Checking Unstandardized Names & the Neighborhoods They Mapped to
print(matched_program_neiname.groupby('Geographic Cluster Name')['COMMUNITY'].unique())

# Result
attempt1_result = matched_program_neiname.loc[matched_program_neiname['COMMUNITY'].notnull(),['Program ID','COMMUNITY']]
project_data = pd.merge(project_data,attempt1_result,on='Program ID',how='left')
project_data['Neighborhood'] = project_data.apply(
    lambda row: row['COMMUNITY'] if pd.notnull(row['COMMUNITY']) else row['Geographic Cluster Name'],
    axis=1
)
project_data.loc[project_data['Neighborhood'].isin(unmatched_geocluster_list),'Neighborhood']= None
project_data.loc[project_data['Geographic Cluster Name']=='online',['Neighborhood']]= 'online'
project_data=project_data.drop(columns=['COMMUNITY'])

Matches: {'NORWOOD PARK', 'EAST SIDE', 'CLEARING', 'WASHINGTON PARK', 'MONTCLARE', 'MORGAN PARK', 'NORTH CENTER', 'MOUNT GREENWOOD', 'NORTH PARK', 'HYDE PARK', 'DUNNING', 'MCKINLEY PARK', 'PULLMAN', 'HEGEWISCH', 'WEST ELSDON', 'OAKLAND', 'WEST ENGLEWOOD', 'SOUTH CHICAGO', 'LAKE VIEW', 'BELMONT CRAGIN', 'DOUGLAS', 'BEVERLY', 'ARMOUR SQUARE', 'JEFFERSON PARK', 'KENWOOD', 'GARFIELD RIDGE', 'EDISON PARK', 'AUSTIN', 'ROSELAND', 'SOUTH SHORE', 'ENGLEWOOD', 'LINCOLN SQUARE', 'IRVING PARK', 'ASHBURN', 'LOOP', 'PORTAGE PARK', 'ARCHER HEIGHTS', 'RIVERDALE', 'BRIDGEPORT', 'OHARE', 'FULLER PARK', 'CALUMET HEIGHTS', 'EAST GARFIELD PARK', 'LOGAN SQUARE', 'WOODLAWN', 'LINCOLN PARK', 'LOWER WEST SIDE', 'BRIGHTON PARK', 'NEW CITY', 'GREATER GRAND CROSSING', 'CHICAGO LAWN', 'NEAR SOUTH SIDE', 'UPTOWN', 'WEST LAWN', 'HERMOSA', 'SOUTH LAWNDALE', 'EDGEWATER', 'HUMBOLDT PARK', 'BURNSIDE', 'GAGE PARK', 'WEST RIDGE', 'WASHINGTON HEIGHTS', 'AVALON PARK', 'WEST GARFIELD PARK', 'FOREST GLEN', 'WEST PULLMAN', 'AL