# CS418 - MoneyMapping In Chicago: A Dive into Grant Programs

By Sadika Almasri, Saja Bushara, Joshua Adereti, Freya Modi and Alisha Zaidi

This project examines how loans and grants in Chicago are distributed across neighborhoods, focusing on whether that distribution is equitable. Using datasets from SBIF, NOF, and other city funding programs alongside demographic data, we aim to answer our guiding questions: 

Are loans and grants in Chicago distributed equitably across neighborhoods, and how do socioeconomic and demographic factors relate to that distribution?

Because Chicago’s neighborhoods differ sharply by race, income, and investment history, our data helps reveal whether funding supports communities proportionately or if certain areas receive more benefits than others.

Since the proposal, our overall scope hasn’t changed, but we refined our dataset choices. We removed older datasets because newer 2023-2025 data includes the same indicators and is more relevant. We added demographic and ownership share datasets so we could directly compare funding patterns with neighborhood characteristics. 

For data preparation, all datasets were loaded from CSVs and cleaned to make them consistent. This included standardizing column names, converting currency strings into numeric values, and aligning community area identifiers across files. We created a **grant coverage metric (incentive amount ÷ total project cost) to measure what fraction of each project was funded by SBIF or NOF**. After that, we grouped the data by community area to calculate total funding, project counts, and average percentage coverage, and then merged these results with demographic indicators like income and racial composition.
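The grouping step described above can be sketched in a single call with pandas named aggregation. This is a minimal illustration on made-up rows, not the notebook's exact code; the output column names (`total_funding`, `project_count`, `mean_coverage`) are our own assumptions.

```python
import pandas as pd

# Toy rows standing in for a cleaned grant table (values are made up).
projects = pd.DataFrame({
    "COMMUNITY AREA": ["Austin", "Austin", "Uptown"],
    "INCENTIVE AMOUNT": [50_000.0, 25_000.0, 30_000.0],
    "GRANT_Percent": [50.0, 75.0, 60.0],
})

# One groupby call produces all three per-area summaries.
by_area = (
    projects.groupby("COMMUNITY AREA")
    .agg(
        total_funding=("INCENTIVE AMOUNT", "sum"),
        project_count=("INCENTIVE AMOUNT", "count"),  # counts non-null amounts
        mean_coverage=("GRANT_Percent", "mean"),
    )
    .reset_index()
)
print(by_area)
```

Keeping the three summaries in one table makes the later merge with demographic indicators a single join instead of three.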

During exploratory data analysis, we reviewed dataset sizes, checked missing values, and summarized which neighborhoods appear most often in each grant program. We identified early patterns, like SBIF covering more neighborhoods than NOF and certain areas receiving higher total dollars and higher grant coverage percentages. We also ran simple correlations comparing grant coverage with demographic variables such as non-white household share and income level. These initial checks help us understand where disparities may exist and guide the direction of the deeper analysis.

In [107]:
import pandas as pd
import numpy as np

# Helper that loads a CSV and falls back to latin-1 if the default decoding fails
def read_csv_any(p):
    try:
        return pd.read_csv(p, low_memory=False)
    except UnicodeDecodeError:
        return pd.read_csv(p, encoding="latin-1", low_memory=False)

paths = { 
    "cmi_loans": "Datasets/CMI_Microloans.csv",
    "cmi_demo": "Datasets/CMI_Microloans_By_Ethnicity_Gender.csv",
    "nof": "Datasets/NOF_Small_Projects_2025.csv",
    "sbif": "Datasets/SBIF_Applicants_Small_Business_Projects_2025.csv",
    "socio_old": "Datasets/Socioecon_Community_2008_2012.csv",
    "socio_new": "Datasets/Socioeconomic_Neighborhoods_2025.csv.csv",
    "socio_by_ca": "Datasets/chi_data_tract.csv",#ok this is just an update social_old, same df structure
}

dfs = {k: read_csv_any(p) for k,p in paths.items()}

# for name, df in dfs.items():
#     print(f"\n{name}: {df.shape[0]:,} rows and {df.shape[1]} cols")
#     print(df.columns.tolist()[:12])
#     print(df.head(3)) 


# Part 1 - Grant Coverage Metrics & Findings
To measure how much support each project actually receives, we created a new column called GRANT_RATIO, calculated in this way:

GRANT_RATIO = INCENTIVE AMOUNT ÷ TOTAL PROJECT COST

This tells us what percentage of a project was covered by the grant. Before calculating that, we cleaned the columns by removing symbols and converting everything into numbers. We also handled cases where the total project cost was zero so the ratio wouldn’t break. After that, we turned the ratio into GRANT_Percent to make it easier to read.
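As a quick sanity check on the formula, here is a small worked example on three made-up projects. The helper name `grant_percent` is hypothetical; it only mirrors the zero-cost handling described above.

```python
import numpy as np
import pandas as pd

def grant_percent(incentive: pd.Series, total_cost: pd.Series) -> pd.Series:
    """Return incentive / total cost as a percentage, NaN where the cost is 0."""
    safe_cost = total_cost.replace(0, np.nan)  # a $0 cost becomes missing, not a division error
    return (incentive / safe_cost * 100).round(1)

# Made-up rows: a partly covered project, a fully covered one, and a $0-cost record.
toy = pd.DataFrame({
    "INCENTIVE AMOUNT": [75_000.0, 20_000.0, 10_000.0],
    "TOTAL PROJECT COST": [100_000.0, 20_000.0, 0.0],
})
toy["GRANT_Percent"] = grant_percent(toy["INCENTIVE AMOUNT"], toy["TOTAL PROJECT COST"])
print(toy)
```

The third row ends up as NaN, so it drops out of any later `describe()` or `mean()` instead of skewing the result.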

In [108]:
# SBIF and NOF are the two grant programs analyzed here; this cell computes grant coverage per project.
# Observation: SBIF is used far more often. A possible reason (still to be checked against other columns) is that SBIF grants are long term while NOF grants are pre-project.


for prog in ["sbif", "nof"]:
    dfs[prog].columns = dfs[prog].columns.str.strip()

# Convert currency strings such as "$1,234.56" to numbers by stripping everything except digits, '.', and '-'
def to_numeric_clean(s: pd.Series) -> pd.Series:
    return pd.to_numeric(
        s.astype(str).str.replace(r"[^0-9.\-]", "", regex=True),
        errors="coerce"
    )

# Grant ratio = INCENTIVE AMOUNT ÷ TOTAL PROJECT COST: the share of each project's total cost covered by the grant
for prog in ["sbif", "nof"]:
    df = dfs[prog]
    if "INCENTIVE AMOUNT" in df.columns:
        df["INCENTIVE AMOUNT"] = to_numeric_clean(df["INCENTIVE AMOUNT"])
    if "TOTAL PROJECT COST" in df.columns:
        df["TOTAL PROJECT COST"] = to_numeric_clean(df["TOTAL PROJECT COST"])

    # Some projects list a total cost of $0; treat those as missing so we never divide by zero
    if {"INCENTIVE AMOUNT", "TOTAL PROJECT COST"}.issubset(df.columns):
        denom = df["TOTAL PROJECT COST"].replace(0, np.nan)
        df["GRANT_RATIO"] = df["INCENTIVE AMOUNT"] / denom
    else:
        df["GRANT_RATIO"] = np.nan

    df["GRANT_Percent"] = (df["GRANT_RATIO"] * 100).round(1)  # ratio as a percentage, for readability

print("SBIF GRANT_Percent summary (%):\n", dfs["sbif"]["GRANT_Percent"].dropna().describe())
print("NOF GRANT_Percent summary (%):\n", dfs["nof"]["GRANT_Percent"].dropna().describe())

print(dfs["sbif"]["GRANT_Percent"].dropna().map(lambda x: f"{x:.1f}%").head())


SBIF GRANT_Percent summary (%):
 count    2150.000000
mean       60.089581
std        18.367201
min         3.300000
25%        50.000000
50%        57.150000
75%        75.000000
max       100.000000
Name: GRANT_Percent, dtype: float64
NOF GRANT_Percent summary (%):
 count    119.000000
mean      54.594118
std       19.543950
min        1.700000
25%       47.850000
50%       52.500000
75%       69.400000
max      100.000000
Name: GRANT_Percent, dtype: float64
0    51.1%
1    49.3%
2    35.2%
3    75.0%
4    75.0%
Name: GRANT_Percent, dtype: object


# SBIF Grant - What the Numbers Mean:

count = 2,150 projects with valid values (others had missing cost or incentive data)

mean ≈ 0.601 -> on average, ~60% of each project’s cost is covered by SBIF

median ≈ 0.572 -> about half of projects have ≤ ~57% coverage!

IQR (25%–75%) = 0.50–0.75 -> the middle half of projects get 50–75% of costs covered

max = 1.0 -> some projects are fully covered (grant = total cost)

min ≈ 0.033 -> tiny coverage outliers (~3% subsidized)

What does this mean?

SBIF is clearly the more generous and widely used program. The average coverage (~60%) shows strong city participation in funding local business improvements. Many projects receive over half their total costs funded, suggesting SBIF is designed to make small upgrades financially realistic for business owners.


# NOF Grant - What the Numbers Mean:

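count = 119 projects with valid data

mean ≈ 0.546 -> average coverage of ~55%

median ≈ 0.525 -> about half of projects have ≤ ~53% coverage

IQR (25%–75%) = 0.48–0.69 -> the middle half of NOF projects get ~48–69% of costs covered

max = 1.0 -> some projects are fully covered

min ≈ 0.017 -> the overall range matches SBIF's, but the 75th percentile is lower (~69% vs. 75%), so fewer projects reach high coverage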

# Analysis

Looking at the summary stats above, SBIF clearly shows up more often in the data than NOF (2,150 projects vs. 119). SBIF also tends to have higher grant coverage overall (mean of about 60% vs. about 55%), while NOF’s percentages are lower and slightly more spread out (standard deviation of 19.5 vs. 18.4). We haven't made claims about why yet, but the numbers themselves show the two programs differ in how much of each project they actually cover.

These initial stats help set the baseline for understanding what normal coverage looks like in each program. When we start connecting grant coverage to neighborhood characteristics like income, demographics, or ownership patterns later, we’ll be able to tell whether a community is getting unusually high or low support compared to the typical SBIF or NOF project.
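One simple way to use that baseline is to compare each neighborhood's mean coverage with the middle 50% of project-level coverage for the same program. The sketch below is an assumption about how such a check could look, not a method the notebook applies; the function name, labels, and numbers are all illustrative.

```python
import pandas as pd

def label_against_baseline(area_means: pd.Series, project_pcts: pd.Series) -> pd.Series:
    """Label each area mean as below, inside, or above the program's middle 50%."""
    q25 = project_pcts.quantile(0.25)
    q75 = project_pcts.quantile(0.75)
    labels = pd.Series("typical", index=area_means.index, dtype=object)
    labels[area_means < q25] = "below typical"
    labels[area_means > q75] = "above typical"
    return labels

# Made-up project-level coverage (%) whose quartiles are 50 and 75.
project_pcts = pd.Series([40.0, 50.0, 50.0, 60.0, 75.0, 75.0, 90.0])
# Made-up per-area mean coverage (%).
area_means = pd.Series({"area_a": 45.0, "area_b": 60.0, "area_c": 82.0})

labels = label_against_baseline(area_means, project_pcts)
print(labels)
```

Using quartiles rather than the mean and standard deviation keeps the labels from being driven by a few extreme projects.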

In [109]:
#community areas BY project count
print("SBIF:top 20 community areas by count:\n")
print(dfs["sbif"]["COMMUNITY AREA"].value_counts().head(20))


print("NOF:top 20 community areas by count:\n")
print(dfs["nof"]["COMMUNITY AREA"].value_counts().head(20))

#this is just for missing stuff
# print("SBIF missing (top 10):\n")
# print(dfs["sbif"].isna().sum().sort_values(ascending=False).head(10))

# print("NOF missing(top 10):\n")
# print(dfs["nof"].isna().sum().sort_values(ascending=False).head(10))

# Compare which community areas appear in each program (SBIF covers the majority of Chicago neighborhoods)
sbif_areas = set(dfs["sbif"]["COMMUNITY AREA"].dropna().astype(str).str.strip().unique()) if "COMMUNITY AREA" in dfs["sbif"].columns else set()
nof_areas  = set(dfs["nof"]["COMMUNITY AREA"].dropna().astype(str).str.strip().unique())  if "COMMUNITY AREA" in dfs["nof"].columns  else set()

# Overlap counts between the two programs' community areas (kept in case they are useful later)
print("\nAreas in both SBIF & NOF:", len(sbif_areas & nof_areas))
print("Areas only in SBIF:", len(sbif_areas - nof_areas))
print("Areas only in NOF:",  len(nof_areas - sbif_areas))


# Drop rows missing community area or grant %, then group by community area (so each appears once) and compute the count, mean, and median grant %
sbif_by_ca = (
    dfs["sbif"].dropna(subset=["COMMUNITY AREA","GRANT_Percent"]).groupby("COMMUNITY AREA")["GRANT_Percent"]
      .agg(count="count", mean="mean", median="median")
      .sort_values("mean", ascending=False)
)

sbif_dollars = (
    dfs["sbif"].dropna(subset=["COMMUNITY AREA","INCENTIVE AMOUNT"]).groupby("COMMUNITY AREA")["INCENTIVE AMOUNT"].sum()
      .sort_values(ascending=False)
)

nof_dollars = (
    dfs["nof"].dropna(subset=["COMMUNITY AREA","INCENTIVE AMOUNT"]).groupby("COMMUNITY AREA")["INCENTIVE AMOUNT"].sum()
      .sort_values(ascending=False)
)
print("\nTotal SBIF dollars by neighborhood (top 20):")
print(sbif_dollars.head(20))

print("\nTotal NOF dollars by neighborhood (top 20):")
print(nof_dollars.head(20))

print("who gets the biggest % coverage by neighborhood")
print(sbif_by_ca.head(30))


SBIF:top 20 community areas by count:

COMMUNITY AREA
Near West Side     160
Portage Park       138
West Town          111
Lincoln Square      88
Logan Square        84
Austin              82
North Center        80
Humboldt Park       79
Edgewater           79
Albany Park         77
Uptown              77
Belmont Cragin      63
New City            58
North Park          56
West Ridge          51
North Lawndale      49
Mount Greenwood     49
Bridgeport          46
Hermosa             41
Jefferson Park      40
Name: count, dtype: int64
NOF:top 20 community areas by count:

COMMUNITY AREA
South Lawndale            15
Grand Boulevard            9
North Lawndale             8
Chicago Lawn               7
Austin                     6
South Shore                6
Avalon Park                5
Auburn Gresham             5
Calumet Heights            5
Greater Grand Crossing     5
Roseland                   5
South Chicago              5
Humboldt Park              4
New City                   4
[output truncated]

Looking at the project counts above, SBIF clearly has the widest reach. Neighborhoods like Near West Side, Portage Park, West Town, and Lincoln Square each have between 88 and 160 SBIF projects, and every area in the top 20 has at least 40, showing that the program is used across a wide range of communities. NOF is much smaller in volume: its highest counts are in South Lawndale (15), Grand Boulevard (9), and North Lawndale (8), and the other areas listed have between 4 and 7 projects. Right away, this tells us the two programs operate at very different scales and in different parts of the city.

The distribution of dollars reflects the same pattern. SBIF neighborhoods like Near West Side, Portage Park, and West Town receive several million dollars each, whereas NOF’s highest totals appear in South Lawndale, North Lawndale, and Auburn Gresham, but at a smaller overall scale. SBIF is essentially the “big money, big footprint” program, while NOF is smaller but more concentrated in historically under-invested areas. When we look at average grant coverage by neighborhood, areas like Gage Park, Washington Park, East Side, Woodlawn, and Beverly show very high percentage coverage, often in the 70–90 percent range. This becomes important later when we compare grant coverage to demographics because it hints that some communities may be receiving a higher share of support per project even if they have fewer projects overall.
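Because a mean over one or two projects is much noisier than a mean over a hundred, it can help to carry the project count alongside the average when ranking neighborhoods by coverage. The following is a minimal sketch on made-up data; the threshold `MIN_PROJECTS` is an arbitrary assumption, not a value taken from our analysis.

```python
import pandas as pd

# Made-up project-level coverage for three placeholder areas.
coverage = pd.DataFrame({
    "COMMUNITY AREA": ["A", "A", "A", "A", "A", "B", "C", "C"],
    "GRANT_Percent": [50.0, 55.0, 60.0, 65.0, 70.0, 90.0, 75.0, 85.0],
})

MIN_PROJECTS = 5  # arbitrary cutoff, chosen only for this illustration

by_area = coverage.groupby("COMMUNITY AREA")["GRANT_Percent"].agg(count="count", mean="mean")
by_area["small_sample"] = by_area["count"] < MIN_PROJECTS

# Rank by mean coverage, keeping the flag visible next to each average.
ranked = by_area.sort_values("mean", ascending=False)
# Areas whose average rests on enough projects to compare with more confidence.
reliable = ranked[~ranked["small_sample"]]

print(ranked)
print(reliable)
```

In this toy data the two highest averages both come from areas with one or two projects, which is exactly the situation where a high percentage should be read with caution.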

In [110]:
socio = pd.read_csv("Datasets/chi_data.csv")

# TODO: come back to the loan metrics later

# socio["loan_amount_per_hh"] = socio["total_loan_amount"] / socio["total_hh"]
# socio["loans_per_1k_hh"] = socio["total_loans"] / (socio["total_hh"] / 1000)

# print(socio[["loan_amount_per_hh","non_white_hh_share","ami_shr"]].corr())

socio["community_area"] = socio["community_area"].str.strip().str.lower()
dfs["sbif"]["COMMUNITY AREA"] = dfs["sbif"]["COMMUNITY AREA"].astype(str).str.strip().str.lower()
dfs["nof"]["COMMUNITY AREA"]  = dfs["nof"]["COMMUNITY AREA"].astype(str).str.strip().str.lower()


#just avg of grant % coverage by community area
sbif = dfs["sbif"].groupby("COMMUNITY AREA")["GRANT_Percent"].mean().reset_index()
nof  = dfs["nof"].groupby("COMMUNITY AREA")["GRANT_Percent"].mean().reset_index()

# Left-join the SBIF and NOF averages onto the socioeconomic table by community area; pandas suffixes the two GRANT_Percent columns as _x (SBIF) and _y (NOF)
merged = socio.merge(sbif, left_on="community_area", right_on="COMMUNITY AREA", how="left").merge(nof, left_on="community_area", right_on="COMMUNITY AREA", how="left")

# These are printed as separate tables because one wide table was hard to read; the columns are explained in the README
print("\ndemographic shares:")
print(merged[["community_area","white_hh_share","black_hh_share","latino_hh_share","non_white_hh_share"]].head(20).to_string(index=False))

print("\nownership shares:")
print(merged[["community_area","white_own_shr","black_own_shr","latino_own_shr","non_white_own_shr"]].head(20).to_string(index=False))

print("\nincome & grant coverage relationship:")
print(merged[["community_area","ami_shr","income_level","low_inc","GRANT_Percent_x","GRANT_Percent_y"]].head(20).to_string(index=False))



demographic shares:
    community_area  white_hh_share  black_hh_share  latino_hh_share  non_white_hh_share
       albany park        0.418062        0.047689         0.364065            0.581938
    archer heights        0.281593        0.008170         0.649732            0.718407
     armour square        0.171386        0.091986         0.044605            0.828614
           ashburn        0.143886        0.517315         0.309608            0.856114
    auburn gresham        0.009184        0.959228         0.013805            0.990816
            austin        0.061827        0.811605         0.111939            0.938173
       avalon park        0.018486        0.956033         0.000000            0.981514
          avondale        0.460985        0.029023         0.446657            0.539015
    belmont cragin        0.210503        0.033998         0.726230            0.789497
           beverly        0.585149        0.346440         0.044899            0.414851
[output truncated]

# Part 2: Linking Grant Coverage Metric with Demographics

This is where we bring in chi_data.csv, a dataset that combines neighborhood-level data. We split its columns into three groups, labeled Dataset 1 to 3 below, with the last one connecting income to the grant coverage metric.
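The merge that links grant coverage to chi_data.csv can be made easier to read by renaming the coverage columns before joining, so the result has descriptive names instead of the automatic `_x` and `_y` suffixes. This is a sketch under assumptions: the helper `tidy_program_average` and the names `sbif_pct` and `nof_pct` are hypothetical, and all values are illustrative.

```python
import pandas as pd

def tidy_program_average(avg: pd.DataFrame, pct_name: str) -> pd.DataFrame:
    """Rename columns and normalize the join key (trim whitespace, lowercase)."""
    out = avg.rename(columns={"COMMUNITY AREA": "community_area", "GRANT_Percent": pct_name})
    out["community_area"] = out["community_area"].str.strip().str.lower()
    return out

# Illustrative stand-ins for the socioeconomic table and the per-area averages.
socio = pd.DataFrame({"community_area": ["austin", "beverly", "uptown"],
                      "ami_shr": [0.4, 1.2, 0.9]})
sbif_avg = pd.DataFrame({"COMMUNITY AREA": ["Austin ", "BEVERLY", "Uptown"],
                         "GRANT_Percent": [58.0, 73.0, 61.0]})
nof_avg = pd.DataFrame({"COMMUNITY AREA": [" austin"], "GRANT_Percent": [62.0]})

# validate="one_to_one" raises an error if a community area appears twice on either side.
merged = (
    socio
    .merge(tidy_program_average(sbif_avg, "sbif_pct"), on="community_area",
           how="left", validate="one_to_one")
    .merge(tidy_program_average(nof_avg, "nof_pct"), on="community_area",
           how="left", validate="one_to_one")
)
print(merged)
```

Areas with no NOF projects keep a missing `nof_pct`, which matches how the left join in the notebook treats neighborhoods without activity in a program.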

# Dataset 1: Demographic Shares

These columns describe the racial and ethnic makeup of each neighborhood. 

This dataset shows a strong racial pattern across neighborhoods. Burnside stands out as almost entirely Black, with 95% Black households, and Austin is also heavily Black at 81%. White-majority neighborhoods sit at the opposite end, such as Dunning with 73% white households. Latino neighborhoods like Brighton Park (10% white households) and Belmont Cragin (21% white households) are likewise overwhelmingly non-white. Overall, these numbers reveal a clear racial geography in which Black and Latino communities cluster in distinct areas that, per the income columns in Dataset 3, tend to be lower-income, while white residents concentrate in higher-income areas.

This is not surprising, since it reflects the patterns of segregation and unequal economic opportunity in Chicago that our group discussed as part of our initial hypothesis.

# Dataset 2: Ownership Shares

These columns describe who owns homes in each area.

This dataset shows who actually controls property in these same neighborhoods, and the alignment varies by community. Black neighborhoods have the strongest resident-owner match: Burnside is 93% Black among owners (vs. 95% of households) and Austin has 72% Black homeowners (vs. 81% of households), broadly mirroring their household shares. Latino neighborhoods, however, reveal ownership gaps: Brighton Park, despite only 10% white households, has 16% white homeowners, and Archer Heights rises from 28% white households to 30% white owners. These mismatches suggest that in some neighborhoods (especially Latino ones) the people living there are not always the ones benefiting from property-based investments.
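The ownership gaps quoted above are just the owner share minus the household share for the same group. A minimal sketch using the rounded figures from this section; the column name `white_own_gap` is our own label.

```python
import pandas as pd

# Rounded shares quoted in the text above.
shares = pd.DataFrame({
    "community_area": ["brighton park", "archer heights"],
    "white_hh_share": [0.10, 0.28],
    "white_own_shr": [0.16, 0.30],
})

# Positive gap: the group is over-represented among owners relative to households.
shares["white_own_gap"] = (shares["white_own_shr"] - shares["white_hh_share"]).round(2)
print(shares)
```

Computing the same difference for every group and neighborhood would turn the hand-picked examples into a column that can be sorted and compared across the whole city.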

# Dataset 3: Income and Grant Coverage Metric

These columns connect neighborhood income levels with grant program outcomes.

This dataset shows that income tracks race closely: East Garfield Park has an AMI share (ami_shr) of 0.30, Burnside 0.35, and Austin 0.40, while white-majority Beverly sits well above the regional median at 1.20. In the examples we looked at, SBIF coverage is higher in some higher-income neighborhoods: Beverly receives 73%, Ashburn 73%, and Avondale 63%, compared to lower-income communities like Burnside (46%) and East Garfield Park (55%). NOF looks different: Austin receives 62%, Brighton Park 64%, and Belmont Cragin 61%, while wealthier areas show no NOF activity at all. Taken at face value, these examples suggest SBIF favors wealthier areas while NOF directs more support to under-resourced neighborhoods, but a handful of neighborhoods cannot establish a citywide pattern, so the next section tests it with correlations across all community areas.

These indicators help us assess whether lower-income or majority non-white neighborhoods receive higher or lower levels of financial support from city grant programs. The next section reports some interesting results on that question.

In [111]:
# Relationship between grant coverage and neighborhood characteristics

#SBIF correlations (GRANT_Percent_x)
sbif_corr = merged[[
    "white_hh_share","black_hh_share","latino_hh_share",
    "non_white_hh_share","ami_shr","GRANT_Percent_x"
]].corr()["GRANT_Percent_x"].drop("GRANT_Percent_x")

print("\nSBIF correlations with neighborhood factors:")
print(sbif_corr.to_string())

# NOF correlations 
nof_corr = merged[[
    "white_hh_share","black_hh_share","latino_hh_share",
    "non_white_hh_share","ami_shr","GRANT_Percent_y"
]].corr()["GRANT_Percent_y"].drop("GRANT_Percent_y")

print("\nNOF correlations with neighborhood factors:")
print(nof_corr.to_string())



SBIF correlations with neighborhood factors:
white_hh_share        0.001029
black_hh_share        0.044600
latino_hh_share      -0.044506
non_white_hh_share   -0.001029
ami_shr               0.041459

NOF correlations with neighborhood factors:
white_hh_share       -0.093381
black_hh_share       -0.164185
latino_hh_share       0.236529
non_white_hh_share    0.093381
ami_shr              -0.002954


# Grants and Demographic/Income Relationship Findings

# What the SBIF numbers show

The SBIF correlations show that grant coverage has no meaningful linear relationship with neighborhood race or income. The coefficients are essentially zero across the board: white household share (+0.001), Black household share (+0.045), Latino household share (–0.045), non-white household share (–0.001), and income (AMI share, +0.041) are all far too small to indicate a real association. In practice, this means SBIF covers roughly the same percentage of project costs whether a neighborhood is white, Black, Latino, low-income, or higher-income. In other words, average SBIF coverage does not rise or fall with a neighborhood’s demographic or income characteristics.

# What the NOF numbers show

The NOF correlations show slightly more direction, but the patterns are still weak. Coverage tends to be somewhat higher in Latino neighborhoods, reflected in the positive correlations with Latino household share (+0.237) and non-white household share (+0.093). Coverage is slightly lower in neighborhoods with larger white (–0.093) or Black (–0.164) household shares. Income has no real relationship with NOF coverage either, as the near-zero AMI correlation (–0.003) shows. Overall, NOF leans gently toward Latino areas, but with only 119 NOF projects behind these averages, the relationships are too weak to count as a strong or systematic policy pattern.
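One way to check whether a weak correlation like these could arise by chance is a permutation test: shuffle one variable many times and see how often the shuffled correlation is at least as large as the observed one. The notebook does not run this test; the sketch below is an assumed follow-up on synthetic data, and the function name `permutation_pvalue` is hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, n_permutations=2000, seed=0):
    """Return (Pearson r, two-sided permutation p-value) for x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)  # breaks any real link between x and y
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)

# A perfectly linear relationship: shuffles should essentially never match it.
x = np.arange(20, dtype=float)
r_strong, p_strong = permutation_pvalue(x, 2 * x + 1)

# Two independently generated synthetic variables: any correlation here is chance.
rng = np.random.default_rng(1)
noise_x = rng.normal(size=25)
noise_y = rng.normal(size=25)
r_noise, p_noise = permutation_pvalue(noise_x, noise_y)

print(f"linear data: r={r_strong:.3f}, p={p_strong:.4f}")
print(f"noise data:  r={r_noise:.3f}, p={p_noise:.4f}")
```

Applied to the merged table, `x` would be a demographic share and `y` the per-area coverage, after dropping areas with missing coverage.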


# Summary of findings and next steps

The data shows that Chicago’s major grant programs are very consistent in how they cover project costs across different neighborhoods. Even in areas with dramatically different racial or income compositions, the share of project costs covered by grants typically stays in a similar range, roughly 50% to 75%. The correlation results are consistent with this: SBIF coverage has almost no relationship with race or income (white_hh_share +0.001, black_hh_share +0.045, latino_hh_share –0.045, ami_shr +0.041), and NOF coverage shows only weak associations (such as latino_hh_share +0.237 and white_hh_share –0.093).

This indicates that the grant formulas themselves do not appear to favor any specific racial group or income level in how much funding each approved project receives.

However, the bigger issue isn’t the percentage covered; it’s who actually gets to participate. Wealthier or more active neighborhoods show up far more often in the data, as the project counts per neighborhood showed. They submit more applications and launch more qualifying projects. Many neighborhoods on the South and West Sides, which experience the greatest economic hardship, appear less frequently, not because grants are less generous there, but because fewer projects enter the pipeline in the first place. This points to barriers to participation (factors such as access, awareness, capacity, or resources) rather than unfair treatment within the grant itself. In short, Chicago’s grants look equitable in how much they fund, but not everyone has the same level of access to the process that unlocks those dollars.
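Raw project counts favor larger neighborhoods, so a participation comparison is fairer when counts are scaled by neighborhood size. The sketch below computes projects per 1,000 households on made-up data. It assumes a household count column named `total_hh`, as in the commented-out loan metrics earlier, and it is a possible next step rather than something we have already computed.

```python
import pandas as pd

# Made-up data: one row per funded project, plus household counts per area.
projects = pd.DataFrame({"community_area": ["a", "a", "a", "b"]})
households = pd.DataFrame({
    "community_area": ["a", "b", "c"],
    "total_hh": [3000, 2000, 1000],
})

# Count projects per area, then attach the counts to every area in the household table.
counts = (
    projects.groupby("community_area").size()
    .rename("project_count")
    .reset_index()
)
access = households.merge(counts, on="community_area", how="left")

# Areas with no projects at all should count as zero, not missing.
access["project_count"] = access["project_count"].fillna(0).astype(int)
access["projects_per_1k_hh"] = access["project_count"] / (access["total_hh"] / 1000)
print(access)
```

Starting from the household table with a left join matters here: neighborhoods with zero projects stay in the result, and those are the ones the access question is about.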

The next step for this portion is to create a visualization that reflects this finding.