# Introduction 

Many people in Southwest Baltimore do not have easy access to healthcare. Some neighborhoods are far from clinics or hospitals, and others face challenges such as poverty or lack of transportation. These problems can make people sicker over time, which is why it is important to understand where they occur so that we can help.
The Health and Wellness Group Project focuses on leveraging data science to address critical healthcare accessibility issues in Southwest Baltimore. In collaboration with Dr. Megan Doede, the Program Lead for Health and Wellness at Paul’s Place, our team has embarked on a mission to explore the intersection of spatial disparities, environmental stressors, and community health outcomes.  
By applying advanced analytical techniques and geospatial tools using demographic and environmental data, we aim to uncover key patterns that identify healthcare deserts and highlight areas at risk for future health concerns. Our goal is to use these insights to support data-driven, equitable interventions that can improve healthcare delivery and resource distribution for underserved communities. This will help leaders and community groups make better decisions about where to bring services and support.
In this part of the project, we create a geospatial locator for homeless shelters and social welfare centers within commutable distance of Paul’s Place in Southwest Baltimore. The maps visualize the homeless shelters within walking distance of Paul’s Place for community outreach purposes (link here) and help identify health service deserts within Southwest Baltimore.

# Methodology

## Data Collection

### Source of Data:

We used three main sources for our data:

#### American Community Survey (ACS) from the U.S. Census Bureau.
https://www.census.gov/programs-surveys/acs/news/data-releases/2023.html This source provided information about income, education, health insurance, race, age, and more, which helped us understand the community’s economic and social conditions.

#### Homeless Shelter Locations from Open Baltimore.
https://data.baltimorecity.gov/datasets/baltimore::homeless-shelters-2/explore This dataset shows where homeless shelters are located in Baltimore. We accessed it through its API URL so that the data stays up to date.

#### Health Department Dataset.
https://data.baltimorecity.gov/datasets/e37ce649df4344dab174b34593b1c4b6_0/explore?location=39.307459%2C-76.628697%2C11.39&showTable=true This dataset provided information about health problems in the community, including data on store density (alcohol, tobacco, grocery), mortality, substance abuse, and more.

The homeless shelter geo-locator map was built with Streamlit and hosted through GitHub. GitHub hosts the .py script and its dependencies, while Streamlit assembles and renders the map along with other features (link here).
The health and wellness service deserts map was built in a Jupyter notebook using HTML and hosted on GitHub (link here).

We used a systematic approach to process and transform the socioeconomic data in preparation for later analysis. The overall goal was to derive a structured dataset suitable for in-depth visualization, potentially including spatial distribution and inference. The methodology comprised the following stages, shown in the code blocks:



In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 



In [5]:
import pandas as pd
import numpy as np
data_SW_social = pd.read_csv("/Users/bayowaonabajo/Downloads/ACS  SW Balt Social/ACS 5 year SW Balt Social.csv") #dataset
#data_SW_social.head(20)
for idx, value in enumerate(data_SW_social['Label (Grouping)'], 1):
    print(f"{idx}. {value}")

1. HOUSEHOLDS BY TYPE
2.     Total households
3.         Married-couple household
4.             With children of the householder under 18 years
5.         Cohabiting couple household
6.             With children of the householder under 18 years
7.         Male householder, no spouse/partner present
8.             With children of the householder under 18 years
9.             Householder living alone
10.                 65 years and over
11.         Female householder, no spouse/partner present
12.             With children of the householder under 18 years
13.             Householder living alone
14.                 65 years and over
15.         Households with one or more people under 18 years
16.         Households with one or more people 65 years and over
17.         Average household size
18.         Average family size
19. RELATIONSHIP
20.     Population in households
21.         Householder
22.         Spouse
23.         Unmarried partner
24.         Child
25.         Other rel

### Data Exploration

We looked at the data to see what it tells us about the people living in Southwest Baltimore. Checks for missing values, outliers, and important patterns in the data were carried out alongside the predictive model, as described in the health risk predictive model documentation (link here). This helped us understand the quality of the data and what we needed to extract or fix before use.
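As a rough illustration of the kind of quality check described here, the sketch below counts missing cells, ACS placeholder cells such as `(X)`, and usable cells per column. It is written in plain Python so it runs without the notebook's files. The helper name `audit_cells` and the three sample rows (modeled on rows 65 to 67 of the social table) are assumptions for illustration, not part of the project code.

```python
from collections import Counter

# Sample rows modeled on the ACS social table (ZCTA 21207 columns only)
sample_rows = [
    {"Label (Grouping)": "EDUCATIONAL ATTAINMENT",
     "Estimate": None, "Percent Margin of Error": None},
    {"Label (Grouping)": "Population 25 years and over",
     "Estimate": "34532.0", "Percent Margin of Error": "(X)"},
    {"Label (Grouping)": "Less than 9th grade",
     "Estimate": "1007.0", "Percent Margin of Error": "±0.8"},
]

# Strings the ACS export uses where no value applies
PLACEHOLDERS = {"(X)", "..."}


def audit_cells(rows, value_columns):
    """Count missing, placeholder, and usable cells for each column."""
    report = {col: Counter() for col in value_columns}
    for row in rows:
        for col in value_columns:
            cell = row.get(col)
            if cell is None or str(cell).strip() == "":
                report[col]["missing"] += 1
            elif str(cell).strip() in PLACEHOLDERS:
                report[col]["placeholder"] += 1
            else:
                report[col]["usable"] += 1
    return report


report = audit_cells(sample_rows, ["Estimate", "Percent Margin of Error"])
for col, counts in report.items():
    print(col, dict(counts))
```

Section header rows such as EDUCATIONAL ATTAINMENT show up as missing in every column, which is one way to spot rows that should be excluded before aggregation.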

In [7]:
 data_SW_social.iloc[64:76]

Unnamed: 0,Label (Grouping),ZCTA5 21207!!Estimate,ZCTA5 21207!!Margin of Error,ZCTA5 21207!!Percent,ZCTA5 21207!!Percent Margin of Error,ZCTA5 21216!!Estimate,ZCTA5 21216!!Margin of Error,ZCTA5 21216!!Percent,ZCTA5 21216!!Percent Margin of Error,ZCTA5 21223!!Estimate,...,ZCTA5 21223!!Percent,ZCTA5 21223!!Percent Margin of Error,ZCTA5 21229!!Estimate,ZCTA5 21229!!Margin of Error,ZCTA5 21229!!Percent,ZCTA5 21229!!Percent Margin of Error,ZCTA5 21230!!Estimate,ZCTA5 21230!!Margin of Error,ZCTA5 21230!!Percent,ZCTA5 21230!!Percent Margin of Error
64,College or graduate school,4099.0,±663,34.4%,±4.4,1867.0,±321,31.2%,±6.2,992.0,...,22.8%,±8.8,2150.0,±512,21.2%,±4.2,2225.0,±322,32.3%,±5.2
65,EDUCATIONAL ATTAINMENT,,,,,,,,,,...,,,,,,,,,,
66,Population 25 years and over,34532.0,"±1,464",34532,(X),19063.0,"±1,095",19063,(X),13870.0,...,13870,(X),30810.0,"±1,471",30810,(X),24865.0,"±1,185",24865,(X)
67,Less than 9th grade,1007.0,±289,2.9%,±0.8,758.0,±238,4.0%,±1.2,1001.0,...,7.2%,±2.0,995.0,±274,3.2%,±0.9,1130.0,±314,4.5%,±1.2
68,"9th to 12th grade, no diploma",2619.0,±573,7.6%,±1.5,1735.0,±373,9.1%,±1.9,2435.0,...,17.6%,±3.3,2467.0,±359,8.0%,±1.2,1809.0,±437,7.3%,±1.7
69,High school graduate (includes equival...,9940.0,±983,28.8%,±2.3,7732.0,±835,40.6%,±3.6,5411.0,...,39.0%,±3.0,11185.0,"±1,271",36.3%,±3.0,3397.0,±439,13.7%,±1.7
70,"Some college, no degree",7500.0,±713,21.7%,±1.8,4887.0,±597,25.6%,±3.0,2894.0,...,20.9%,±3.4,7104.0,±632,23.1%,±2.0,3070.0,±463,12.3%,±1.8
71,Associate's degree,2825.0,±524,8.2%,±1.5,846.0,±228,4.4%,±1.2,412.0,...,3.0%,±1.2,2213.0,±394,7.2%,±1.3,827.0,±213,3.3%,±0.9
72,Bachelor's degree,6022.0,±753,17.4%,±2.3,1770.0,±397,9.3%,±2.0,872.0,...,6.3%,±1.5,4328.0,±582,14.0%,±1.9,7843.0,±602,31.5%,±2.0
73,Graduate or professional degree,4619.0,±548,13.4%,±1.6,1335.0,±446,7.0%,±2.3,845.0,...,6.1%,±1.4,2518.0,±365,8.2%,±1.1,6789.0,±618,27.3%,±2.0


### Data Cleaning and feature selection

The data cleaning process is the same as described in the predictive modelling documentation (link here). For the geo-mapping, we extracted the population estimate variables for each ZIP Code Tabulation Area (ZCTA) and created a table with longitude and latitude features for each ZCTA. To make later data handling clearer, the 'Label (Grouping)' column was renamed to 'Metric' in code.
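ACS cells arrive as strings in several shapes: plain numbers, numbers with thousands separators, percentages, margins prefixed with `±`, and placeholders. The sketch below shows one possible way to turn each shape into a float. The helper `parse_acs_cell` is hypothetical and is not the notebook's `clean_value`; unlike that function, it keeps the number that follows `±` rather than discarding it.

```python
import math


def parse_acs_cell(cell):
    """Convert an ACS cell such as '1,464', '34.4%', or '±4.4' to a float.

    Missing cells and the '(X)' / '...' placeholders are returned as NaN.
    """
    if cell is None:
        return math.nan
    text = str(cell).strip()
    if text in ("", "(X)", "..."):
        return math.nan
    # Drop the margin sign, thousands separators, and percent sign
    for token in ("±", ",", "%"):
        text = text.replace(token, "")
    try:
        return float(text)
    except ValueError:
        return math.nan


for raw in ["34532.0", "±1,464", "34.4%", "(X)"]:
    print(repr(raw), "->", parse_acs_cell(raw))
```

Whether margins should be kept as numbers or dropped depends on how they are used later; summing raw margins across categories is not statistically meaningful.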

In [9]:
import pandas as pd
import numpy as np

# Subset data
subset_df = data_SW_social.iloc[64:76].copy()

# Clean column names
clean_columns = {col: col.replace("ZCTA5 ", "").replace("!!", "_") for col in subset_df.columns}
subset_df.rename(columns=clean_columns, inplace=True)

# Melt to long format
melted_df = subset_df.melt(
    id_vars=["Label (Grouping)"], 
    var_name="ZCTA5_Metric", 
    value_name="Value"
)

# Split ZCTA5 and metric
split_vals = melted_df['ZCTA5_Metric'].str.split("_", n=1, expand=True)
melted_df['ZCTA5'] = split_vals[0]
melted_df['Metric'] = split_vals[1].str.replace("Margin", " Margin").str.strip()

# Clean label names
melted_df['Label (Grouping)'] = melted_df['Label (Grouping)'].str.replace('\xa0', ' ').str.strip()

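# Enhanced cleaning
def clean_value(value):
    """Convert an ACS cell to float; placeholders and unparseable cells become NaN."""
    try:
        if pd.isna(value) or str(value).strip() in ('(X)', '...'):
            return np.nan
        # Keep only the text before '±'. Margin cells such as '±663' start with '±',
        # so they reduce to an empty string and are returned as NaN
        value_part = str(value).split('±')[0]
        cleaned = value_part.replace(',', '').replace('%', '').strip()
        return float(cleaned) if cleaned else np.nan
    except (ValueError, TypeError):
        return np.nan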

melted_df['Value'] = melted_df['Value'].apply(clean_value)
melted_df = melted_df.dropna(subset=['Value'])

# Updated education mapping: only the mutually exclusive attainment levels are mapped.
# The cumulative rows 'High school graduate or higher' and "Bachelor's degree or higher"
# count the same people again, so including them pushes percentages above 100.
education_mapping = {
    'Less than 9th grade': 'Education below high school',
    '9th to 12th grade, no diploma': 'Education below high school',
    'High school graduate (includes equivalency)': 'High school to grad/professional',
    'Some college, no degree': 'High school to grad/professional',
    "Associate's degree": 'High school to grad/professional',
    "Bachelor's degree": 'High school to grad/professional',
    'Graduate or professional degree': 'High school to grad/professional'
}

# Filter and map (copy so the new column is not written to a view)
melted_df = melted_df[melted_df['Label (Grouping)'].isin(education_mapping.keys())].copy()
melted_df['Education Category'] = melted_df['Label (Grouping)'].map(education_mapping)

# Pivot table
final_table = melted_df.pivot_table(
    index=['ZCTA5', 'Education Category'],
    columns='Metric',
    values='Value',
    aggfunc='sum',
    fill_value=0
).reset_index()

# Ensure columns needed
required_columns = ['Estimate', 'Margin of Error', 'Percent', 'Percent Margin of Error']
for col in required_columns:
    if col not in final_table.columns:
        final_table[col] = 0

# Final format (explicit copy avoids pandas' SettingWithCopyWarning)
final_table_soc = final_table[['ZCTA5', 'Education Category'] + required_columns].copy()
final_table_soc[['Estimate', 'Margin of Error']] = final_table[['Estimate', 'Margin of Error']].astype(int)
final_table_soc[['Percent', 'Percent Margin of Error']] = final_table[['Percent', 'Percent Margin of Error']].round(1)

final_table_soc.head()

Metric,ZCTA5,Education Category,Estimate,Margin of Error,Percent,Percent Margin of Error
0,21207,Education below high school,3626,0,10.5,0
1,21207,High school to grad/professional,30906,0,89.5,0
2,21216,Education below high school,2493,0,13.1,0
3,21216,High school to grad/professional,16570,0,86.9,0
4,21223,Education below high school,3436,0,24.8,0
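
The Margin of Error column above is 0 for every row. If a combined margin is wanted for a grouped category, one common approximation described in Census Bureau guidance for ACS data users is the square root of the sum of the squared margins. The sketch below applies it to the two ZCTA 21207 margins shown earlier for 'Less than 9th grade' (±289) and '9th to 12th grade, no diploma' (±573). The function name `combined_margin` is hypothetical.

```python
import math


def combined_margin(margins):
    """Approximate the margin of error of a sum of estimates.

    Uses the root-sum-of-squares approximation; it assumes the
    individual estimates are roughly independent.
    """
    return math.sqrt(sum(m ** 2 for m in margins))


# ZCTA 21207 margins for the two 'below high school' attainment levels
below_high_school_moe = combined_margin([289, 573])
print(round(below_high_school_moe))
```

This is an approximation only, and it becomes less reliable as more estimates are combined.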


In [10]:
import geopandas as gpd

#data path 
shapefile_path = "/Users/bayowaonabajo/Downloads/tl_2023_us_zcta520/tl_2023_us_zcta520.shp"

# Load the shapefile
zcta = gpd.read_file(shapefile_path)

# Check 
print(zcta.head())

  ZCTA5CE20 GEOID20       GEOIDFQ20 CLASSFP20 MTFCC20 FUNCSTAT20  ALAND20  \
0     47236   47236  860Z200US47236        B5   G6350          S  1029063   
1     47870   47870  860Z200US47870        B5   G6350          S     8830   
2     47851   47851  860Z200US47851        B5   G6350          S    53326   
3     47337   47337  860Z200US47337        B5   G6350          S   303089   
4     47435   47435  860Z200US47435        B5   G6350          S    13302   

   AWATER20   INTPTLAT20    INTPTLON20  \
0         0  +39.1517426  -085.7252769   
1         0  +39.3701518  -087.4735141   
2         0  +39.5735839  -087.2459559   
3         0  +39.8027537  -085.4372850   
4         0  +39.2657557  -086.2951577   

                                            geometry  
0  POLYGON ((-85.7341 39.15597, -85.72794 39.1561...  
1  POLYGON ((-87.47414 39.37016, -87.47409 39.370...  
2  POLYGON ((-87.24769 39.5745, -87.24711 39.5744...  
3  POLYGON ((-85.44357 39.80328, -85.44346 39.803...  
4  POLYGO

In [46]:
import geopandas as gpd
import pandas as pd

# 1. Load the shapefile 
zcta = gpd.read_file("/Users/bayowaonabajo/Downloads/tl_2023_us_zcta520/tl_2023_us_zcta520.shp")

# 2. Calculate centroids in a projected CRS (EPSG:5070, CONUS Albers), then express
#    them in latitude/longitude. Centroids taken directly on geographic coordinates
#    trigger a geopandas warning because the result can be inaccurate.
centroids = zcta.to_crs(epsg=5070).geometry.centroid.to_crs(epsg=4326)

# 3. Convert to latitude/longitude (WGS84) and store the centroid coordinates
zcta = zcta.to_crs(epsg=4326)
zcta["longitude"] = centroids.x
zcta["latitude"] = centroids.y

# 4. Rename ZCTA5 column 
zcta = zcta.rename(columns={"ZCTA5CE20": "ZCTA5"})

# 5. Merge 
final_table_social = pd.merge(
    final_table_soc,
    zcta[["ZCTA5", "longitude", "latitude"]],
    on="ZCTA5",
    how="left"
)

# display
final_table_social.head()


In [12]:
import pandas as pd
import numpy as np

data_SW_economic = pd.read_csv("/Users/bayowaonabajo/Downloads/ACS SW Balt Economic Data/ACS 5 year SW Balt Economic.csv")
# Subset the economic data
subset_df2 = data_SW_economic.iloc[1:18].copy()

# Clean column names
clean_columns = {col: col.replace("ZCTA5 ", "").replace("!!", "_") for col in subset_df2.columns}
subset_df2.rename(columns=clean_columns, inplace=True)

# Melt to long format
melted_df2 = subset_df2.melt(
    id_vars=["Label (Grouping)"], 
    var_name="ZCTA5_Metric", 
    value_name="Value"
)

# Split ZCTA5 and metric
split_vals = melted_df2['ZCTA5_Metric'].str.split("_", n=1, expand=True)
melted_df2['ZCTA5'] = split_vals[0]
melted_df2['Metric'] = split_vals[1].str.replace("Margin", " Margin").str.strip()

# Clean label names
melted_df2['Label (Grouping)'] = melted_df2['Label (Grouping)'].str.replace('\xa0', ' ').str.strip()

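# Modified cleaning
def clean_value(value):
    """Convert an ACS cell to float; placeholders and unparseable cells become NaN."""
    try:
        if pd.isna(value) or str(value).strip() in ('(X)', '...'):
            return np.nan  # Preserve NaN
        # Keep only the text before '±'. Margin cells such as '±1,856' start with '±',
        # so they reduce to an empty string and are returned as NaN
        value_part = str(value).split('±')[0]
        cleaned = value_part.replace(',', '').replace('%', '').strip()
        return float(cleaned) if cleaned else np.nan
    except (ValueError, TypeError):
        return np.nan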

melted_df2['Value'] = melted_df2['Value'].apply(clean_value)

# Filter
population_df = melted_df2[
    (melted_df2['Label (Grouping)'] == 'Population 16 years and over') &
    (melted_df2['Metric'].isin(['Estimate', 'Margin of Error']))
]

# Pivot table with fill_value=0
final_table2 = population_df.pivot_table(
    index=['ZCTA5', 'Label (Grouping)'],
    columns='Metric',
    values='Value',
    aggfunc='sum',
    fill_value=0
).reset_index()

# Ensure required columns exist
required_columns = ['Estimate', 'Margin of Error']
for col in required_columns:
    if col not in final_table2.columns:
        final_table2[col] = 0

# Create explicit copy
final_table_eco2 = final_table2.loc[:, ['ZCTA5', 'Label (Grouping)'] + required_columns].copy()

# Format numeric values
final_table_eco2[['Estimate', 'Margin of Error']] = final_table_eco2[['Estimate', 'Margin of Error']].astype(int)

print("Population 16+ by ZIP Code:")
final_table_eco2.head()

Population 16+ by ZIP Code:


Metric,ZCTA5,Label (Grouping),Estimate,Margin of Error
0,21207,Population 16 years and over,39670,0
1,21216,Population 16 years and over,21640,0
2,21223,Population 16 years and over,16062,0
3,21229,Population 16 years and over,35575,0
4,21230,Population 16 years and over,27466,0


### Data Engineering

We merged the datasets on ZIP Code Tabulation Area (ZCTA) to produce a single table with longitude and latitude that could be used for spatial mapping.
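
A left join on ZCTA only works when both sides store the code in the same form, so it can help to normalize the key and report any codes that fail to match. The sketch below illustrates the idea in plain Python. The helper names are hypothetical and the centroid coordinates are made-up placeholder values, not the real centroids; the two population estimates are the ones shown for ZCTAs 21207 and 21216.

```python
def normalize_zcta(code):
    """Return a ZCTA code as a zero-padded five-character string."""
    return str(code).strip().split(".")[0].zfill(5)


def left_join_centroids(rows, centroid_lookup):
    """Attach longitude/latitude to each row and list codes with no match."""
    merged, unmatched = [], []
    for row in rows:
        key = normalize_zcta(row["ZCTA5"])
        coords = centroid_lookup.get(key)
        if coords is None:
            unmatched.append(key)
            coords = {"longitude": None, "latitude": None}
        merged.append({**row, "ZCTA5": key, **coords})
    return merged, unmatched


# Population 16+ estimates from the economic table; note the mixed key types
population = [
    {"ZCTA5": 21207, "Estimate": 39670},
    {"ZCTA5": "21216", "Estimate": 21640},
]

# Placeholder coordinates for illustration only
centroids = {"21207": {"longitude": -76.73, "latitude": 39.32}}

merged, unmatched = left_join_centroids(population, centroids)
print("Unmatched ZCTAs:", unmatched)
```

In pandas, the same check can be done by passing `indicator=True` to `pd.merge` and inspecting the `_merge` column.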

In [47]:
import geopandas as gpd
import pandas as pd

#  Load the shapefile 
zcta = gpd.read_file("/Users/bayowaonabajo/Downloads/tl_2023_us_zcta520/tl_2023_us_zcta520.shp")

# Calculate centroids in a projected CRS (EPSG:5070, CONUS Albers), then express
# them in latitude/longitude; centroids taken directly on geographic coordinates
# trigger a geopandas warning because the result can be inaccurate
centroids = zcta.to_crs(epsg=5070).geometry.centroid.to_crs(epsg=4326)

# Convert to latitude/longitude (WGS84) and store the centroid coordinates
zcta = zcta.to_crs(epsg=4326)
zcta["longitude"] = centroids.x
zcta["latitude"] = centroids.y

# Rename ZCTA5 column to match your dataset
zcta = zcta.rename(columns={"ZCTA5CE20": "ZCTA5"})

#  Merge with DataFrame
final_table_economic2 = pd.merge(
    final_table_eco2,
    zcta[["ZCTA5", "longitude", "latitude"]],
    on="ZCTA5",
    how="left"
)

# display 
final_table_economic2.head()

In [16]:
import pandas as pd
data_SW_demographic = pd.read_csv("/Users/bayowaonabajo/Downloads/ACS SW Balt Demographics/ACS 5 year SW Balt Demographic.csv") #add dataset
#data_SW_demographic.head(11)
for idx, value in enumerate(data_SW_demographic['Label (Grouping)'], 1):
    print(f"{idx}. {value}")

1. SEX AND AGE
2.     Total population
3.         Male
4.         Female
5.         Sex ratio (males per 100 females)
6.         Under 5 years
7.         5 to 9 years
8.         10 to 14 years
9.         15 to 19 years
10.         20 to 24 years
11.         25 to 34 years
12.         35 to 44 years
13.         45 to 54 years
14.         55 to 59 years
15.         60 to 64 years
16.         65 to 74 years
17.         75 to 84 years
18.         85 years and over
19.         Median age (years)
20.         Under 18 years
21.         16 years and over
22.         18 years and over
23.         21 years and over
24.         62 years and over
25.         65 years and over
26.         18 years and over
27.             Male
28.             Female
29.             Sex ratio (males per 100 females)
30.         65 years and over
31.             Male
32.             Female
33.             Sex ratio (males per 100 females)
34. RACE
35.     Total population
36.         One race
37.         Two or More 

In [17]:
import pandas as pd
data_SW_economic = pd.read_csv("/Users/bayowaonabajo/Downloads/ACS SW Balt Economic Data/ACS 5 year SW Balt Economic.csv") #add dataset
print(data_SW_economic.columns.tolist())


['Label (Grouping)', 'ZCTA5 21207!!Estimate', 'ZCTA5 21207!!Margin of Error', 'ZCTA5 21207!!Percent', 'ZCTA5 21207!!Percent Margin of Error', 'ZCTA5 21216!!Estimate', 'ZCTA5 21216!!Margin of Error', 'ZCTA5 21216!!Percent', 'ZCTA5 21216!!Percent Margin of Error', 'ZCTA5 21223!!Estimate', 'ZCTA5 21223!!Margin of Error', 'ZCTA5 21223!!Percent', 'ZCTA5 21223!!Percent Margin of Error', 'ZCTA5 21229!!Estimate', 'ZCTA5 21229!!Margin of Error', 'ZCTA5 21229!!Percent', 'ZCTA5 21229!!Percent Margin of Error', 'ZCTA5 21230!!Estimate', 'ZCTA5 21230!!Margin of Error', 'ZCTA5 21230!!Percent', 'ZCTA5 21230!!Percent Margin of Error']


In [18]:
for idx, value in enumerate(data_SW_economic['Label (Grouping)'], 1):
    print(f"{idx}. {value}")

1. EMPLOYMENT STATUS
2.     Population 16 years and over
3.         In labor force
4.             Civilian labor force
5.                 Employed
6.                 Unemployed
7.             Armed Forces
8.         Not in labor force
9.     Civilian labor force
10.         Unemployment Rate
11.     Females 16 years and over
12.         In labor force
13.             Civilian labor force
14.                 Employed
15.     Own children of the householder under 6 years
16.         All parents in family in labor force
17.     Own children of the householder 6 to 17 years
18.         All parents in family in labor force
19. COMMUTING TO WORK
20.     Workers 16 years and over
21.         Car, truck, or van -- drove alone
22.         Car, truck, or van -- carpooled
23.         Public transportation (excluding taxicab)
24.         Walked
25.         Other means
26.         Worked from home
27.         Mean travel time to work (minutes)
28. OCCUPATION
29.     Civilian employed population 16 

In [19]:
data_SW_economic.iloc[1:18]

Unnamed: 0,Label (Grouping),ZCTA5 21207!!Estimate,ZCTA5 21207!!Margin of Error,ZCTA5 21207!!Percent,ZCTA5 21207!!Percent Margin of Error,ZCTA5 21216!!Estimate,ZCTA5 21216!!Margin of Error,ZCTA5 21216!!Percent,ZCTA5 21216!!Percent Margin of Error,ZCTA5 21223!!Estimate,...,ZCTA5 21223!!Percent,ZCTA5 21223!!Percent Margin of Error,ZCTA5 21229!!Estimate,ZCTA5 21229!!Margin of Error,ZCTA5 21229!!Percent,ZCTA5 21229!!Percent Margin of Error,ZCTA5 21230!!Estimate,ZCTA5 21230!!Margin of Error,ZCTA5 21230!!Percent,ZCTA5 21230!!Percent Margin of Error
1,Population 16 years and over,39670,"±1,856",39670,(X),21640,"±1,360",21640,(X),16062,...,16062,(X),35575,"±2,040",35575,(X),27466,"±1,312",27466,(X)
2,In labor force,25204,"±1,432",63.5%,±2.2,11724,±835,54.2%,±3.2,8421,...,52.4%,±3.4,21759,"±1,266",61.2%,±2.2,20069,"±1,086",73.1%,±2.2
3,Civilian labor force,25095,"±1,425",63.3%,±2.1,11724,±835,54.2%,±3.2,8396,...,52.3%,±3.4,21685,"±1,274",61.0%,±2.3,19906,"±1,084",72.5%,±2.2
4,Employed,23464,"±1,230",59.1%,±2.2,10209,±795,47.2%,±3.1,7508,...,46.7%,±3.3,20292,"±1,191",57.0%,±2.2,19223,"±1,059",70.0%,±2.2
5,Unemployed,1631,±474,4.1%,±1.1,1515,±411,7.0%,±1.9,888,...,5.5%,±1.7,1393,±310,3.9%,±0.8,683,±194,2.5%,±0.7
6,Armed Forces,109,±90,0.3%,±0.2,0,±25,0.0%,±0.2,25,...,0.2%,±0.2,74,±82,0.2%,±0.2,163,±79,0.6%,±0.3
7,Not in labor force,14466,"±1,115",36.5%,±2.2,9916,"±1,055",45.8%,±3.2,7641,...,47.6%,±3.4,13816,"±1,279",38.8%,±2.2,7397,±725,26.9%,±2.2
8,Civilian labor force,25095,"±1,425",25095,(X),11724,±835,11724,(X),8396,...,8396,(X),21685,"±1,274",21685,(X),19906,"±1,084",19906,(X)
9,Unemployment Rate,(X),(X),6.5%,±1.7,(X),(X),12.9%,±3.3,(X),...,10.6%,±3.2,(X),(X),6.4%,±1.3,(X),(X),3.4%,±1.0
10,Females 16 years and over,21717,"±1,272",21717,(X),12555,"±1,165",12555,(X),8765,...,8765,(X),19919,"±1,447",19919,(X),13886,±783,13886,(X)


In [20]:
data_SW_economic.iloc[56:125]

Unnamed: 0,Label (Grouping),ZCTA5 21207!!Estimate,ZCTA5 21207!!Margin of Error,ZCTA5 21207!!Percent,ZCTA5 21207!!Percent Margin of Error,ZCTA5 21216!!Estimate,ZCTA5 21216!!Margin of Error,ZCTA5 21216!!Percent,ZCTA5 21216!!Percent Margin of Error,ZCTA5 21223!!Estimate,...,ZCTA5 21223!!Percent,ZCTA5 21223!!Percent Margin of Error,ZCTA5 21229!!Estimate,ZCTA5 21229!!Margin of Error,ZCTA5 21229!!Percent,ZCTA5 21229!!Percent Margin of Error,ZCTA5 21230!!Estimate,ZCTA5 21230!!Margin of Error,ZCTA5 21230!!Percent,ZCTA5 21230!!Percent Margin of Error
56,Total households,19608,±687,19608,(X),11667,±760,11667,(X),8300,...,8300,(X),18517,±772,18517,(X),16003,±759,16003,(X)
57,"Less than $10,000",1443,±306,7.4%,±1.6,1272,±372,10.9%,±3.0,816,...,9.8%,±2.5,1416,±331,7.6%,±1.8,900,±244,5.6%,±1.5
58,"$10,000 to $14,999",490,±191,2.5%,±1.0,1641,±385,14.1%,±3.2,1022,...,12.3%,±3.7,925,±321,5.0%,±1.7,596,±192,3.7%,±1.2
59,"$15,000 to $24,999",1349,±365,6.9%,±1.8,1206,±319,10.3%,±2.7,934,...,11.3%,±3.2,1651,±349,8.9%,±1.9,900,±216,5.6%,±1.3
60,"$25,000 to $34,999",1609,±382,8.2%,±1.9,848,±230,7.3%,±1.9,906,...,10.9%,±3.2,1270,±243,6.9%,±1.3,780,±218,4.9%,±1.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
120,Not in labor force:,6228,±789,6228,(X),5526,±902,5526,(X),4352,...,4352,(X),5846,±794,5846,(X),3940,±493,3940,(X)
121,With health insurance coverage,5341,±732,85.8%,±4.5,5112,±867,92.5%,±2.8,3768,...,86.6%,±5.8,5056,±732,86.5%,±5.2,3445,±475,87.4%,±5.1
122,With private health insurance,2202,±445,35.4%,±6.6,1733,±507,31.4%,±6.4,611,...,14.0%,±4.1,1709,±366,29.2%,±5.6,1860,±338,47.2%,±5.9
123,With public coverage,3925,±692,63.0%,±6.8,3706,±607,67.1%,±6.3,3341,...,76.8%,±6.3,3801,±674,65.0%,±6.4,1967,±365,49.9%,±6.5


In [21]:
import pandas as pd

# data
subset_df = data_SW_economic.iloc[56:125].copy()

# Cleaning column names to make them more readable
clean_columns = {
    col: col.replace("ZCTA5 ", "").replace("!!", "_") for col in subset_df.columns
}
subset_df.rename(columns=clean_columns, inplace=True)

# Reshape data to a long format for easier merging
melted_df = subset_df.melt(id_vars=["Label (Grouping)"], var_name="ZCTA5_Metric", value_name="Value")

# Splitting the `ZCTA5_Metric` column into ZCTA5 and Metric for clarity
melted_df[['ZCTA5', 'Metric']] = melted_df['ZCTA5_Metric'].str.split("_", n=1, expand=True)
melted_df.drop(columns=["ZCTA5_Metric"], inplace=True)

# Pivot the table so that it is easy for merging
final_table_econ = melted_df.pivot_table(index=["Label (Grouping)", "ZCTA5"], columns="Metric", values="Value", aggfunc="first").reset_index()

# Display
final_table_econ








Metric,Label (Grouping),ZCTA5,Estimate,Margin of Error,Percent,Percent Margin of Error
0,Civilian noninstitutionalized population,21207,49164,"±2,694",49164,(X)
1,Civilian noninstitutionalized population,21216,27231,"±2,284",27231,(X)
2,Civilian noninstitutionalized population,21223,19709,"±1,623",19709,(X)
3,Civilian noninstitutionalized population,21229,44133,"±2,465",44133,(X)
4,Civilian noninstitutionalized population,21230,33346,"±1,757",33346,(X)
...,...,...,...,...,...,...
260,With public coverage,21207,4035,±634,18.6%,±2.8
261,With public coverage,21216,2233,±380,23.9%,±3.7
262,With public coverage,21223,2295,±459,33.1%,±6.0
263,With public coverage,21229,4113,±852,21.8%,±3.8


A function named categorize was implemented to derive a new categorical variable. It evaluates the text in the 'Metric' column of each record and assigns a category label.
A pivot table then aggregates the 'Estimate' values by 'ZCTA5' and 'Category', and the results are formatted with commas as thousands separators to improve readability.
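
The categorize-then-aggregate step can be illustrated on a handful of rows. The sketch below is a simplified stand-in, not the notebook's function: `income_category` is a hypothetical helper that handles only the income brackets, and the records are the ZCTA 21207 estimates shown in the economic table.

```python
from collections import defaultdict

# Bracket labels (lowercased) that fall below $25,000
LOW_INCOME_LABELS = {
    "less than $10,000",
    "$10,000 to $14,999",
    "$15,000 to $24,999",
}


def income_category(label):
    """Map an income bracket label to a coarse category, or None if not a bracket."""
    text = label.lower().strip()
    if text in LOW_INCOME_LABELS:
        return "below 25,000"
    if text.startswith("$"):
        return "above 25,000"
    return None


# (label, ZCTA5, estimate) for ZCTA 21207
records = [
    ("Less than $10,000", "21207", 1443),
    ("$10,000 to $14,999", "21207", 490),
    ("$15,000 to $24,999", "21207", 1349),
    ("$25,000 to $34,999", "21207", 1609),
    ("Total households", "21207", 19608),
]

# Sum estimates per (ZCTA5, category), skipping uncategorized rows
totals = defaultdict(int)
for label, zcta_code, estimate in records:
    category = income_category(label)
    if category is not None:
        totals[(zcta_code, category)] += estimate

for (zcta_code, category), value in sorted(totals.items()):
    print(zcta_code, category, f"{value:,}")
```

Matching on exact bracket labels, rather than on substrings, keeps unrelated rows that happen to contain similar text out of the totals.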

In [49]:
import pandas as pd
import numpy as np

# data
df = pd.DataFrame(final_table_econ).rename(columns={'Label (Grouping)': 'Metric'})

# Clean and categorize data
def categorize(row):
    metric = row['Metric'].lower().strip()

    if not metric:
        return None

    if any(x in metric for x in ['less than $10,000', '$10,000 to $14,999', '$15,000 to $24,999']):
        return 'below 25,000'
    elif any(x in metric for x in ['$25,000 to $34,999', '$35,000 to $49,999', '$50,000 to $74,999',
                                   '$75,000 to $99,999', '$100,000 to $149,999', '$150,000 to $199,999', '$200,000 or more']):
        return 'above 25,000'
    
    if 'median income' in metric or 'median household income' in metric:
        return 'median_income'
    
    if 'without health insurance' in metric or 'no health insurance' in metric:
        return 'uninsured'
    
    
    
    if 'civilian' in metric:
        return 'total_population'
    
    return None

df['Category'] = df.apply(categorize, axis=1)
df = df.dropna(subset=['Category'])

# Function to convert a value to a number (the summing happens in the pivot table)
def clean_and_sum(value):
    if pd.isna(value):
        return 0
    if isinstance(value, str):
        # Remove thousands separators and surrounding whitespace
        value = value.replace(',', '').strip()
    try:
        return float(value)
    except ValueError:
        # Placeholders such as '(X)' are counted as 0
        return 0

df['Estimate'] = df['Estimate'].apply(clean_and_sum)

# Ensure 'ZCTA5' is treated as a string to prevent unintended numerical operations
df['ZCTA5'] = df['ZCTA5'].astype(str)

# Required categories 
required_categories = ['below 25,000', 'above 25,000', 'median_income',
                       'total_population', 'uninsured']

df['Category'] = pd.Categorical(df['Category'], categories=required_categories, ordered=True)

# Pivot table with proper numeric aggregation
pivot_df = df.pivot_table(
    index='ZCTA5',
    columns='Category',
    values='Estimate',
    aggfunc='sum',  # Ensure summation instead of concatenation
    fill_value=0
).reset_index()

# Rename columns for clarity 
column_mapping = {
    'below 25,000': 'Below25k_Estimate',
    'above 25,000': 'Above25k_Estimate',
    'median_income': 'MedianIncome_Estimate',
    'total_population': 'TotalPop_Estimate',
    'uninsured': 'Uninsured_Estimate'
}

pivot_df = pivot_df.rename(columns=column_mapping)

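# Format numbers for better reading
numeric_cols = pivot_df.columns.difference(['ZCTA5'])
# Series.map per column avoids DataFrame.applymap, which is deprecated in recent pandas releases
pivot_df[numeric_cols] = pivot_df[numeric_cols].apply(
    lambda col: col.map(lambda x: f"{int(x):,}" if pd.notnull(x) else x)
)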

# Display 
pivot_df

import warnings
warnings.filterwarnings('ignore')

# Exploratory Data Analysis and Visualization

Using HTML, we created a map that shows homeless shelters and social welfare centers within Southwest Baltimore and beyond. The aim is to highlight the health and wellness service deserts within Southwest Baltimore and to indicate where the Paul’s Place clinic could refocus its outreach services.
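
To make the idea of a 1-mile and 3-mile radius concrete, the sketch below computes straight-line (great-circle) distances from Paul's Place with the haversine formula and lists the sites inside a given radius. It uses the Paul's Place coordinates from the notebook; the two sites and the helper names are hypothetical and are not real shelter locations. Straight-line distance is only a proxy for walking distance, since it ignores the street network.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_within(center, candidates, radius_miles):
    """Return the names of candidate sites within radius_miles of center."""
    return sorted(
        name
        for name, (lat, lon) in candidates.items()
        if haversine_miles(center[0], center[1], lat, lon) <= radius_miles
    )


PAULS_PLACE = (39.2848, -76.6268)  # (lat, lon) used in the notebook

# Hypothetical sites for illustration only
sites = {
    "Site A": (39.2900, -76.6300),
    "Site B": (39.3500, -76.7000),
}

print("Within 1 mile:", sites_within(PAULS_PLACE, sites, 1.0))
print("Within 3 miles:", sites_within(PAULS_PLACE, sites, 3.0))
```

Areas where no site falls inside either radius are candidates for the service deserts the map is meant to expose.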

In [36]:
import os
import time
import webbrowser
import geopandas as gpd
import pandas as pd
import folium
from pyproj import Transformer
from folium.plugins import HeatMap
from shapely.geometry import Point
import osmnx as ox 

# Configure settings
ox.settings.use_cache = True
ox.settings.log_console = True

# Data
economic_data = pd.DataFrame(final_table_economic2)

def load_shelters(url, is_homeless=True):
    df = pd.read_csv(url, encoding='utf-8-sig', sep=',')
    
    if is_homeless:
        # Convert from EPSG:3857 (Web Mercator) to EPSG:4326 with correct axis order
        transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
        df['X'] = pd.to_numeric(df['X'], errors='coerce')
        df['Y'] = pd.to_numeric(df['Y'], errors='coerce')
        df = df.dropna(subset=['X', 'Y'])
        df['lon'], df['lat'] = transformer.transform(df['X'], df['Y'])  # Correct order
        geom = gpd.points_from_xy(df.lon, df.lat)
    else:
        # Coerce to numeric first so unparseable coordinates become NaN and are dropped too
        df['Longitude'] = pd.to_numeric(df['Longitude'], errors='coerce')
        df['Latitude'] = pd.to_numeric(df['Latitude'], errors='coerce')
        df = df.dropna(subset=['Longitude', 'Latitude'])
        geom = gpd.points_from_xy(df['Longitude'], df['Latitude'])
        
    return gpd.GeoDataFrame(df, geometry=geom, crs="EPSG:4326")

# Load datasets with validation
print("Loading datasets...")
gdf_economic = gpd.GeoDataFrame(
    economic_data,
    geometry=gpd.points_from_xy(economic_data.longitude, economic_data.latitude),
    crs="EPSG:4326"
)

gdf_homeless = load_shelters("https://raw.githubusercontent.com/paulsplacemd/paulsplacelocator/main/Homeless_Shelters.csv")
gdf_social = load_shelters("https://raw.githubusercontent.com/paulsplacemd/paulsplacelocator/main/baltimore_help_social_health_welfare_shelters_locations.csv", False)

print(f"Loaded {len(gdf_homeless)} homeless shelters")
print(f"Loaded {len(gdf_social)} social shelters")

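def create_pauls_place_buffers(coords):
    """Build 1- and 3-mile buffers around a (lat, lon) point, returned in EPSG:4326."""
    try:
        point = Point(coords[1], coords[0])  # shapely expects (lon, lat)
        gdf = gpd.GeoDataFrame(geometry=[point], crs="EPSG:4326")
        # EPSG:2248 (Maryland State Plane) is measured in US survey feet: 5280 ft = 1 mile
        projected = gdf.to_crs("EPSG:2248")
        # Buffer in feet, then reproject each buffer explicitly, because
        # GeoDataFrame.to_crs only transforms the active geometry column
        gdf["buffer_1mi"] = projected.geometry.buffer(5280).to_crs("EPSG:4326")
        gdf["buffer_3mi"] = projected.geometry.buffer(15840).to_crs("EPSG:4326")
        return gdf
    except Exception as e:
        print(f"Buffer error: {str(e)}")
        return gpd.GeoDataFrame()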

def add_shelter_markers(map_obj, gdf_homeless, gdf_social):
    # Create isolated layer groups
    homeless_layer = folium.FeatureGroup(name='👥 Homeless Shelters', show=True)
    social_layer = folium.FeatureGroup(name='🏥 Social Shelters', show=True)

    # Add social shelters first 
    print("\nAdding social shelters:")
    for idx, row in gdf_social.iterrows():
        try:
            folium.Marker(
                location=[row['Latitude'], row['Longitude']],
                popup=row.get('Location', 'Social Shelter'),
                icon=folium.Icon(color='green', icon='heart', prefix='fa')
            ).add_to(social_layer)
        except Exception as e:
            print(f"Social marker error: {str(e)}")

    # Add homeless shelters 
    print("\nAdding homeless shelters:")
    for idx, row in gdf_homeless.iterrows():
        try:
            folium.Marker(
                location=[row['lat'], row['lon']],  # Using coordinates
                popup=row.get('name', 'Homeless Shelter'),
                icon=folium.Icon(color='red', icon='bed', prefix='fa')
            ).add_to(homeless_layer)
        except Exception as e:
            print(f"Homeless marker error: {str(e)}")

    # layers in correct z-order
    social_layer.add_to(map_obj)
    homeless_layer.add_to(map_obj)
    return social_layer, homeless_layer

def create_accessibility_map():
    m = folium.Map(location=[39.29, -76.65], zoom_start=12)

    # Economic markers
    for _, row in gdf_economic.iterrows():
        try:
            folium.CircleMarker(
                location=[row['latitude'], row['longitude']],
                radius=25 + (row['Estimate'] / 7500),
                popup=f"ZIP {row['ZCTA5']}",
                color='#4b0082',
                fill_color='#9370db',
                fill_opacity=0.7,
                weight=1
            ).add_to(m)
        except Exception as e:
            print(f"Economic error: {str(e)}")

    # Shelter markers with isolated layers
    add_shelter_markers(m, gdf_homeless, gdf_social)

    # Paul's Place elements
    pauls_place_coords = [39.2848, -76.6268]
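    # Pass (lat, lon) unchanged: the buffer helper swaps to (lon, lat) itself,
    # so reversing here would place the point at the wrong location
    buffers = create_pauls_place_buffers(pauls_place_coords)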
    
    if not buffers.empty:
        # Buffers
        folium.GeoJson(
            buffers['buffer_1mi'],
            style_function=lambda x: {'fillColor': 'green', 'color': 'green', 'fillOpacity': 0.1}
        ).add_to(m)
        
        folium.GeoJson(
            buffers['buffer_3mi'],
            style_function=lambda x: {'fillColor': 'yellow', 'color': 'yellow', 'fillOpacity': 0.1}
        ).add_to(m)

        # Circles as area radius
        folium.Circle(
            pauls_place_coords,
            radius=1609.34,  # 1 mile in meters
            color='blue',
            fill=False,
            weight=2
        ).add_to(m)
        
        folium.Circle(
            pauls_place_coords,
            radius=4828.02,  # 3 miles in meters
            color='red',
            fill=False,
            weight=2
        ).add_to(m)

    # Heatmap with corrected coordinates
    try:
        heatmap_data = []
        for _, row in gdf_homeless.iterrows():
            heatmap_data.append([row['lat'], row['lon']])
        for _, row in gdf_social.iterrows():
            heatmap_data.append([row['Latitude'], row['Longitude']])
        
        HeatMap(heatmap_data, radius=40, blur=30).add_to(m)
    except Exception as e:
        print(f"Heatmap error: {str(e)}")

    # legend
    legend_html = '''
    <div style="position: fixed; 
                bottom: 50px; 
                left: 50px; 
                z-index: 1000;
                padding: 10px;
                background: white;
                border: 2px solid grey;
                border-radius: 5px;
                box-shadow: 0 0 10px rgba(0,0,0,0.2)">
        <b>Legend</b><br>
        <i class="fa fa-bed" style="color:white; background:red; padding:5px; border-radius:3px"></i> Homeless<br>
        <i class="fa fa-heart" style="color:white; background:green; padding:5px; border-radius:3px"></i> Social<br>
        <div style="background:blue; height:2px; margin:5px 0"></div> 1mi Radius<br>
        <div style="background:red; height:2px; margin:5px 0"></div> 3mi Radius
    </div>
    '''
    m.get_root().html.add_child(folium.Element(legend_html))

    # Layer control
    folium.LayerControl(position='topright', collapsed=False).add_to(m)

    return m

if __name__ == "__main__":
    output_path = os.path.abspath("SWbaltimore_accessibilitymap.html")
    print(f"Generating map: {output_path}")
    
    try:
        start_time = time.time()
        final_map = create_accessibility_map()
        final_map.save(output_path)
        print(f"Map created in {time.time()-start_time:.1f} seconds")
        webbrowser.open(output_path)
    except Exception as e:
        print(f"Critical error: {str(e)}")
    else:
        # Render inline only when the map was built; final_map is undefined otherwise
        display(final_map)

Loading datasets...
Loaded 38 homeless shelters
Loaded 43 social shelters
Generating map: /Users/bayowaonabajo/Downloads/SWbaltimore_accessibilitymap.html

Adding social shelters:

Adding homeless shelters:
Map created in 0.1 seconds
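
The map draws 1-mile and 3-mile rings around Paul's Place. As a rough cross-check of which sites fall inside each ring, the sketch below computes great-circle distances with the haversine formula and groups sites by band. It is only an illustration: `haversine_miles`, `classify_by_ring`, and the three placeholder sites are hypothetical names and made-up coordinates, not rows from the shelter datasets. Only the Paul's Place coordinates are taken from the map code.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def classify_by_ring(center, sites, inner=1.0, outer=3.0):
    """Group site names by distance band (in miles) from center."""
    rings = {"within_inner": [], "within_outer": [], "beyond": []}
    for name, lat, lon in sites:
        d = haversine_miles(center[0], center[1], lat, lon)
        if d <= inner:
            rings["within_inner"].append(name)
        elif d <= outer:
            rings["within_outer"].append(name)
        else:
            rings["beyond"].append(name)
    return rings

PAULS_PLACE = (39.2848, -76.6268)  # (lat, lon) used in the map code

# Placeholder coordinates for illustration only (NOT real shelter locations)
example_sites = [
    ("Placeholder A", 39.2900, -76.6300),
    ("Placeholder B", 39.3100, -76.6500),
    ("Placeholder C", 39.3500, -76.7000),
]

rings = classify_by_ring(PAULS_PLACE, example_sites)
print(rings)
```

In practice the `(name, lat, lon)` tuples would be built from the homeless and social shelter tables, using whichever column names those tables carry.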


# Conclusion

Our project set out to identify which parts of Southwest Baltimore struggle most with health and access to services. We found that a significant area around Paul's Place lacks health or social welfare facilities, which may be overburdening the two health facilities and three social welfare centers in Southwest Baltimore. We also found that half the deaths in that area could have been prevented with better access to healthcare. These neighborhoods have more liquor stores than other parts of the city, and fewer grocery stores or places to get healthy food.  
This work matters because it helps Paul's Place and similar organizations know where to focus their time and energy. With limited resources, it is important to know which neighborhoods need the most help. In addition, the predictive model we are creating can be used in the future to track how conditions change over time.

One challenge was that we had little data directly from Paul's Place, so we had to depend on public data from sources such as the US Census Bureau, Open Baltimore, and health department datasets. Another challenge was the limited resources available to Paul's Place, which made it harder to determine how new patient information should be gathered and stored.
Moving forward, it would help a lot if more data were collected internally. Tools like online forms or surveys could make a real difference. Being able to collate and store current user data would provide insights into the people who visit Paul's Place and the wider community.

# References

- American Community Survey (ACS) from the U.S. Census Bureau. https://www.census.gov/programs-surveys/acs/news/data-releases/2023.html

- Hospitals. (n.d.). https://data.baltimorecity.gov/datasets/e37ce649df4344dab174b34593b1c4b6_0/explore?location=39.307459%2C-76.628697%2C11.39&showTable=true

- Homeless shelters. (n.d.). https://data.baltimorecity.gov/datasets/710b935a4e864284ad5da9019fe5fca2_0/explore?location=39.306731%2C-76.588463%2C11.39&showTable=true

- Southwest Baltimore neighborhood in Baltimore, Maryland (MD), 21223, 21229, 21230, 21216, 21207 subdivision profile - real estate, apartments, condos, homes, community, population, jobs, income, streets. (n.d.). https://www.city-data.com/neighborhood/Southwest-Baltimore-Baltimore-MD.html