# **PROJECT 3: Whose Streets? Gender Representation in NYC Street Names**

## Introduction

Urban space is not neutral. The names of our streets, bridges, and plazas
are a form of public memory, telling us whose stories are honored and whose
are ignored. Feminist urbanism argues that symbolic representation in the
city (statues, street names, landmarks) reflects deeper power structures.

In this project, I use the New York City street centerline dataset to
investigate gender representation in street names. Concretely, I ask:

> **To what extent are New York City’s streets named after women versus men,
> and how does that representation vary across boroughs?**

Using a simple but transparent method to classify street names by gender, I
quantify the imbalance and explore differences by borough. Along the way,
I clean and transform a large real-world dataset, group and reshape data,
and produce visualizations to tell a feminist story about how patriarchy
shows up on NYC street signs.

**Read more here:** https://www.afar.com/magazine/city-of-women-map-2-dot-0-names-nyc-subway-stops-after-female-figures

![Funny GIF](source.gif)

### **Dataset(s) to be used:**  
NYC Street Centerline (LION) dataset, downloaded from NYC Open Data and
saved locally as `/Users/apple/Desktop/Project3/Centerline_20251205.csv`.

Link is here: https://data.cityofnewyork.us/City-Government/Centerline/3mf9-qshr

### **Analysis question:**  
To what extent are New York City’s streets named after women versus men, and how does gender representation vary across boroughs?
This includes both a citywide comparison and a borough-level breakdown of gender representation.

**Columns that will be used:**

### **From the Centerline dataset:**
- **`the_geom`** – WKT geometry of each street segment (used to extract coordinates for mapping)  
- **`STREET NAME`** – core street name used for gender classification  
- **`Full Street Name`** – extended label for reference  
- **`Borough Code`** – numeric borough identifier (1–5)  
- **`PHYSICALID`** – unique ID for each street segment  

### **Created during analysis:**
- **`name_clean`** – standardized lowercase street name  
- **`tokens`** – tokenized version of the cleaned name  
- **`gender_guess`** – gender classification (`"female"`, `"male"`, `"ambiguous"`, `"unknown/other"`)  
- **`lat` / `lon`** – extracted coordinates from the WKT geometry for the map visualization  

These derived variables support grouping, gender detection, and geospatial plotting.

### **Columns used to merge/join datasets**  
This project primarily uses a single dataset, but gender assignment relies on an internally created table of gendered first names. Classification is performed by matching:

- Street data: **tokens of `name_clean`** (the first token is checked first, then any remaining token)  
- Name–gender table: **first name**  

This functions as a logical (in-code) join rather than a separate dataset merge.
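
To make the idea of a "logical join" concrete, here is a minimal sketch of the same lookup written as an explicit pandas merge. It is simplified to the first token only, and the three street names, the two-row `name_table`, and the column names `first_token`, `first_name`, and `gender` are illustrative assumptions, not objects used elsewhere in this notebook.

```python
import pandas as pd

# Toy street data (hypothetical sample, not the real Centerline file)
streets = pd.DataFrame({"name_clean": ["HENRY HUDSON", "GRACE", "FLATBUSH"]})
streets["first_token"] = streets["name_clean"].str.lower().str.split().str[0]

# Toy name-gender table standing in for the name sets built in Step 7
name_table = pd.DataFrame(
    {"first_name": ["henry", "grace"], "gender": ["male", "female"]}
)

# Left join: every street is kept, unmatched names get a missing gender
joined = streets.merge(
    name_table, how="left", left_on="first_token", right_on="first_name"
)
joined["gender"] = joined["gender"].fillna("unknown/other")

print(joined[["name_clean", "gender"]])
```

The notebook itself uses set membership instead of a merge, which gives the same result for a lookup on a single key and avoids building a second DataFrame.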

## **Hypothesis**  
**New York City’s street names overwhelmingly honor men.**  
I expect that:

- Less than **10%** of gender-identifiable street names will honor women  
- Female-named streets will be more common in outer boroughs than in Manhattan  
- Manhattan will show the **lowest female representation**, with slightly higher (but still very limited) representation in Queens and Brooklyn  

This hypothesis reflects historical gender disparities in public commemoration and symbolic urban space.

## **Step 1: Code Imports & Settings**

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import plotly.express as px

import plotly.io as pio
pio.renderers.default = "notebook_connected+plotly_mimetype"

pd.set_option("display.max_columns", 50)
pd.set_option("display.precision", 3)

## **Step 2: Load the Dataset**

In [2]:
street_path = "/Users/apple/Desktop/Project3/Centerline_20251205.csv"

streets_raw = pd.read_csv(street_path, low_memory=False)
streets_raw.head()

Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,WITHIN_BNDY_DIST,TRUCK ROUTE TYPE,Collection Method,FROM_LEVEL_CODE,TO_LEVEL_CODE,B5SC,Snow Priority,JOINID,BPHYS_ID,Cartography Display Level,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label
0,MULTILINESTRING ((-73.965287415722 40.61500085...,46810,901,999,900,998,11230.0,11230.0,2,,FT,1,AVE,,42563,,1822608708,1822600714,,,,,,3,,...,,,,13,13,314280,C,,,,2.0,2.0,4.0,,,,,AVE N,,104.31497003951,cedc2dde-7e8b-4427-af4c-c7fe5174b2c7,,,N,AVE N
1,MULTILINESTRING ((-73.857805049963 40.86304449...,86757,2501,2599,2500,2598,10469.0,10469.0,2,,TW,1,,AVE,78537,,1522607532,1522608941,,,,,,2,,...,,,,13,13,240620,S,,,,2.0,2.0,4.0,,,,,HONE AVE,,359.45658741154,9c163e85-23ad-418a-a54c-40c8eef802e5,,,HONE,HONE AVE
2,MULTILINESTRING ((-73.901047993134 40.76932048...,84282,22-001,22-099,22-000,22-098,11105.0,11105.0,2,,TF,1,,ST,76298,,102261264,102265239,,,,,,4,,...,,,F,13,13,410640,S,,,,1.0,2.0,3.0,,,,,48 ST,,277.53323334463,fdccf94f-201f-4312-a96d-0c26d0fa7cfd,,,48,48 ST
3,MULTILINESTRING ((-74.010562603546 40.72220989...,79741,79,107,78,100,10013.0,10013.0,2,,FT,1,,ST,72173,,1222605034,1222601248,,,,,,1,,...,,,,13,13,124400,S,,,,1.0,2.0,3.0,,,,,LAIGHT ST,,120.04580684369,bcbbb800-b963-45bf-804e-42f5ada6c207,,,LAIGHT,LAIGHT ST
4,MULTILINESTRING ((-74.121613808175 40.55846826...,184200,0,0,74,86,10306.0,10306.0,2,,TW,1,,AVE,115877,,1722613316,1722609883,,,,,,5,,...,,,,13,13,520590,H,,,,2.0,1.0,3.0,,,,,BROOK AVE,,90.28879629663,d7f8c5d8-637b-4122-950f-faf2206c8561,,,BROOK,BROOK AVE
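
Since only five of the file's columns are used in this project, one optional variation is to read just those columns with the `usecols` argument of `pd.read_csv`. The sketch below demonstrates this on a small in-memory CSV whose rows are made up for illustration; the variable names `toy_csv`, `needed_cols`, and `streets_small` are hypothetical.

```python
import io
import pandas as pd

# A tiny made-up CSV with the same headers as the columns used in this project,
# plus one extra column (TRAFDIR) that should be skipped on load
toy_csv = io.StringIO(
    "the_geom,PHYSICALID,STREET NAME,Full Street Name,Borough Code,TRAFDIR\n"
    '"MULTILINESTRING ((-73.96 40.61, -73.95 40.62))",1,HONE,HONE AVE,2,TW\n'
    '"MULTILINESTRING ((-74.01 40.72, -74.00 40.73))",2,LAIGHT,LAIGHT ST,1,FT\n'
)

needed_cols = ["the_geom", "PHYSICALID", "STREET NAME", "Full Street Name", "Borough Code"]

# usecols tells pandas to parse only the listed columns
streets_small = pd.read_csv(toy_csv, usecols=needed_cols)
print(streets_small.columns.tolist())
```

With the real file, the same `usecols=needed_cols` argument could be passed alongside `low_memory=False`; the rest of the notebook only relies on these five columns.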


## **Step 3: Basic Data Inspection**

In [3]:
# Show column names
print("COLUMN NAMES:\n")
print(streets_raw.columns.tolist())

# Show the first few rows
print("\n\nSAMPLE ROWS:\n")
streets_raw.head(10)

COLUMN NAMES:

['the_geom', 'PHYSICALID', 'L_LOW_HN', 'L_HIGH_HN', 'R_LOW_HN', 'R_HIGH_HN', 'L_ZIP', 'R_ZIP', 'STATUS', 'BIKE_LANE', 'TRAFDIR', 'RW_TYPE', 'PRE_TYPE', 'POST_TYPE', 'OBJECTID', 'FCC', 'Left side Block Face ID', 'Right Side Block Face ID', 'Average Travel Time', 'Roadway Jurisdiction', 'NOMINALDIR', 'ACCESSIBLE', 'NONPED', 'Borough Code', 'Borough Indicator', 'Segloc Status', 'San District Inc', 'Left Side Subset', 'Right Side Subset', 'Continuous Parity Flag', 'Twisted Parity Flag', 'Posted Speed', 'Segment Length', 'Street Width', 'Street Width IRR', 'Special Disaster', 'Fire Lane', 'CREATED_DATE', 'MODIFIED_DATE', 'WITHIN_BNDY_DIST', 'TRUCK ROUTE TYPE', 'Collection Method', 'FROM_LEVEL_CODE', 'TO_LEVEL_CODE', 'B5SC', 'Snow Priority', 'JOINID', 'BPHYS_ID', 'Cartography Display Level', 'Number Travel Lanes', 'Number Park Lanes', 'Number Total Lane', 'Pre-Modifier', 'Pre-Directional', 'Post Directional', 'Post Modifier', 'Full Street Name', 'BIKE TRAFFIC DIRECTION', 'SHAP

Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,WITHIN_BNDY_DIST,TRUCK ROUTE TYPE,Collection Method,FROM_LEVEL_CODE,TO_LEVEL_CODE,B5SC,Snow Priority,JOINID,BPHYS_ID,Cartography Display Level,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label
0,MULTILINESTRING ((-73.965287415722 40.61500085...,46810,901,999,900,998,11230.0,11230.0,2,,FT,1,AVE,,42563,,1822608708,1822600714,,,,,,3,,...,,,,13,13,314280,C,,,,2.0,2.0,4.0,,,,,AVE N,,104.31497003951,cedc2dde-7e8b-4427-af4c-c7fe5174b2c7,,,N,AVE N
1,MULTILINESTRING ((-73.857805049963 40.86304449...,86757,2501,2599,2500,2598,10469.0,10469.0,2,,TW,1,,AVE,78537,,1522607532,1522608941,,,,,,2,,...,,,,13,13,240620,S,,,,2.0,2.0,4.0,,,,,HONE AVE,,359.45658741154,9c163e85-23ad-418a-a54c-40c8eef802e5,,,HONE,HONE AVE
2,MULTILINESTRING ((-73.901047993134 40.76932048...,84282,22-001,22-099,22-000,22-098,11105.0,11105.0,2,,TF,1,,ST,76298,,102261264,102265239,,,,,,4,,...,,,F,13,13,410640,S,,,,1.0,2.0,3.0,,,,,48 ST,,277.53323334463,fdccf94f-201f-4312-a96d-0c26d0fa7cfd,,,48,48 ST
3,MULTILINESTRING ((-74.010562603546 40.72220989...,79741,79,107,78,100,10013.0,10013.0,2,,FT,1,,ST,72173,,1222605034,1222601248,,,,,,1,,...,,,,13,13,124400,S,,,,1.0,2.0,3.0,,,,,LAIGHT ST,,120.04580684369,bcbbb800-b963-45bf-804e-42f5ada6c207,,,LAIGHT,LAIGHT ST
4,MULTILINESTRING ((-74.121613808175 40.55846826...,184200,0,0,74,86,10306.0,10306.0,2,,TW,1,,AVE,115877,,1722613316,1722609883,,,,,,5,,...,,,,13,13,520590,H,,,,2.0,1.0,3.0,,,,,BROOK AVE,,90.28879629663,d7f8c5d8-637b-4122-950f-faf2206c8561,,,BROOK,BROOK AVE
5,MULTILINESTRING ((-73.98221749207 40.768968674...,191409,2,98,1,99,10023.0,10023.0,2,,FT,1,,ST,118754,,1222600676,1222606129,,,,,,1,,...,,,,13,13,134970,C,,,,2.0,2.0,4.0,,W,,,W 60 ST,,267.59953539586,27fcb089-c93b-41fa-bd6e-9700b811cb13,,,60,W 60 ST
6,MULTILINESTRING ((-73.886578049747 40.86247060...,170796,,,,,10458.0,10458.0,2,,NV,6,,PATH,110562,,1522610431,1522601577,,,,,,2,,...,,,,13,13,200476,,,,,1.0,0.0,1.0,,,,,FORDHAM UNIVERSITY PATH,,221.18328870153,fd57d250-48c4-4cb1-ad58-067ebfa185b6,,,FORDHAM UNIVERSITY,FORDHAM UNIVERSITY PATH
7,MULTILINESTRING ((-73.745632289534 40.70387599...,31380,113-001,113-099,113-000,113-098,11429.0,11429.0,2,,TW,1,,ST,28373,,72264340,72268057,,,,,,4,,...,,,,13,13,426340,S,,,,2.0,2.0,4.0,,,,,210 ST,,209.36259325125,7505f297-28f8-4deb-90ed-623321c7ce58,,,210,210 ST
8,MULTILINESTRING ((-73.807299636673 40.74200202...,10217,161-001,161-099,161-000,161-098,11365.0,11365.0,2,,TW,1,,AVE,9109,,82262164,82261112,,,,,,4,,...,,,,13,13,436400,C,,,,2.0,1.0,3.0,,,,,BOOTH MEMORIAL AVE,,105.32086982231,d0e23276-a779-4fff-b47c-c345f4ea4c36,,,BOOTH MEMORIAL,BOOTH MEMORIAL AVE
9,MULTILINESTRING ((-73.754133474341 40.60811539...,16336,22-001,22-025,22-000,22-026,11691.0,11691.0,2,,TW,1,,AVE,14570,,12262996,12263118,,,,,,4,,...,,,,13,13,456590,S,,,,2.0,2.0,4.0,,,,,NAMEOKE AVE,,118.68298302584,2504359f-bc51-4eab-b943-d5342ccbad45,,,NAMEOKE,NAMEOKE AVE


## **Step 4: Data Cleaning and Standardization**

Our dataset has three columns that matter for this step:

- **`STREET NAME`** – short name (ideal for gender classification)  
- **`Full Street Name`** – full version  
- **`Borough Code`** – numeric borough indicator  

I will create cleaned working versions of each.

In [4]:
df = streets_raw.copy()

# Renaming into simpler working columns
df["raw_name"] = df["STREET NAME"].astype(str)
df["raw_full"] = df["Full Street Name"].astype(str)
df["borough_code"] = df["Borough Code"]

# Borough mapping for NYC
boro_map = {
    1: "Manhattan",
    2: "Bronx",
    3: "Brooklyn",
    4: "Queens",
    5: "Staten Island"
}
df["borough"] = df["borough_code"].map(boro_map)

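# raw_name was cast with astype(str), which turns missing values into the text "nan",
# so the missing-name check is done on the original column instead
df = df.dropna(subset=["STREET NAME", "borough"])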
df.head()


Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,TO_LEVEL_CODE,B5SC,Snow Priority,JOINID,BPHYS_ID,Cartography Display Level,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label,raw_name,raw_full,borough_code,borough
0,MULTILINESTRING ((-73.965287415722 40.61500085...,46810,901,999,900,998,11230.0,11230.0,2,,FT,1,AVE,,42563,,1822608708,1822600714,,,,,,3,,...,13,314280,C,,,,2.0,2.0,4.0,,,,,AVE N,,104.31497003951,cedc2dde-7e8b-4427-af4c-c7fe5174b2c7,,,N,AVE N,N,AVE N,3,Brooklyn
1,MULTILINESTRING ((-73.857805049963 40.86304449...,86757,2501,2599,2500,2598,10469.0,10469.0,2,,TW,1,,AVE,78537,,1522607532,1522608941,,,,,,2,,...,13,240620,S,,,,2.0,2.0,4.0,,,,,HONE AVE,,359.45658741154,9c163e85-23ad-418a-a54c-40c8eef802e5,,,HONE,HONE AVE,HONE,HONE AVE,2,Bronx
2,MULTILINESTRING ((-73.901047993134 40.76932048...,84282,22-001,22-099,22-000,22-098,11105.0,11105.0,2,,TF,1,,ST,76298,,102261264,102265239,,,,,,4,,...,13,410640,S,,,,1.0,2.0,3.0,,,,,48 ST,,277.53323334463,fdccf94f-201f-4312-a96d-0c26d0fa7cfd,,,48,48 ST,48,48 ST,4,Queens
3,MULTILINESTRING ((-74.010562603546 40.72220989...,79741,79,107,78,100,10013.0,10013.0,2,,FT,1,,ST,72173,,1222605034,1222601248,,,,,,1,,...,13,124400,S,,,,1.0,2.0,3.0,,,,,LAIGHT ST,,120.04580684369,bcbbb800-b963-45bf-804e-42f5ada6c207,,,LAIGHT,LAIGHT ST,LAIGHT,LAIGHT ST,1,Manhattan
4,MULTILINESTRING ((-74.121613808175 40.55846826...,184200,0,0,74,86,10306.0,10306.0,2,,TW,1,,AVE,115877,,1722613316,1722609883,,,,,,5,,...,13,520590,H,,,,2.0,1.0,3.0,,,,,BROOK AVE,,90.28879629663,d7f8c5d8-637b-4122-950f-faf2206c8561,,,BROOK,BROOK AVE,BROOK,BROOK AVE,5,Staten Island


## **Step 5: Cleaning Street Names**

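I strip digits, punctuation, and stray symbols from each name, then split it into lowercase tokens. Street-type suffixes such as AVE or ST are already excluded from `STREET NAME`, so no suffix removal is needed. Names left with no letters at all (for example, “48”) are dropped.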

In [5]:
import re

def clean_name(name):
    # Removing punctuation, digits, extra spaces
    name = re.sub(r"[^A-Za-z\s]", " ", str(name))
    name = re.sub(r"\s+", " ", name).strip()
    return name

df["name_clean"] = df["raw_name"].apply(clean_name)

# Tokenizing
df["tokens"] = df["name_clean"].str.lower().str.split()

# Removing empty token lists
df = df[df["tokens"].apply(lambda x: len(x) > 0)]

df[["raw_name", "name_clean", "tokens", "borough"]].head()


Unnamed: 0,raw_name,name_clean,tokens,borough
0,N,N,[n],Brooklyn
1,HONE,HONE,[hone],Bronx
3,LAIGHT,LAIGHT,[laight],Manhattan
4,BROOK,BROOK,[brook],Staten Island
6,FORDHAM UNIVERSITY,FORDHAM UNIVERSITY,"[fordham, university]",Bronx
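
As a quick check of what the cleaning step does, the sketch below re-declares `clean_name` exactly as above and runs it on a few hand-picked inputs. The sample strings are illustrative and not taken from the dataset, apart from "48".

```python
import re

def clean_name(name):
    # Replace anything that is not a letter or whitespace with a space,
    # then collapse repeated whitespace
    name = re.sub(r"[^A-Za-z\s]", " ", str(name))
    name = re.sub(r"\s+", " ", name).strip()
    return name

samples = ["48", "ST. NICHOLAS", "O'BRIEN", "  HENRY   HUDSON "]
for raw in samples:
    cleaned = clean_name(raw)
    tokens = cleaned.lower().split()
    print(repr(raw), "->", repr(cleaned), tokens)
```

A purely numeric name such as "48" cleans to an empty string and therefore an empty token list, which is why row 2 is missing from the output above.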


## **Step 6: Filtering Out Clearly Non-Person Street Names**

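I remove names containing digits (e.g., “48”) and names made up entirely of generic words such as “park” or “university”. A name is dropped only when every one of its tokens is generic, so a mixed name like “FORDHAM UNIVERSITY” is kept at this stage.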

In [6]:
# Removing names with digits (purely numbered streets)
mask_digits = df["raw_name"].str.contains(r"\d")
df = df[~mask_digits]

# Removing generic non-person words
generic_words = {
    "park","bay","river","forest","mount","mountain","brook",
    "freedom","victory","union","liberty","college","university",
    "bridge","highway","station","plaza","square","mall","road","path"
}

def likely_person(tokens):
    # Drop a name only if ALL of its tokens are generic words
    return not all(t in generic_words for t in tokens)

df = df[df["tokens"].apply(likely_person)]
df.head()

Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,Snow Priority,JOINID,BPHYS_ID,Cartography Display Level,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label,raw_name,raw_full,borough_code,borough,name_clean,tokens
0,MULTILINESTRING ((-73.965287415722 40.61500085...,46810,901,999,900,998,11230.0,11230.0,2,,FT,1,AVE,,42563,,1822608708,1822600714,,,,,,3,,...,C,,,,2.0,2.0,4.0,,,,,AVE N,,104.31497003951,cedc2dde-7e8b-4427-af4c-c7fe5174b2c7,,,N,AVE N,N,AVE N,3,Brooklyn,N,[n]
1,MULTILINESTRING ((-73.857805049963 40.86304449...,86757,2501,2599,2500,2598,10469.0,10469.0,2,,TW,1,,AVE,78537,,1522607532,1522608941,,,,,,2,,...,S,,,,2.0,2.0,4.0,,,,,HONE AVE,,359.45658741154,9c163e85-23ad-418a-a54c-40c8eef802e5,,,HONE,HONE AVE,HONE,HONE AVE,2,Bronx,HONE,[hone]
3,MULTILINESTRING ((-74.010562603546 40.72220989...,79741,79,107,78,100,10013.0,10013.0,2,,FT,1,,ST,72173,,1222605034,1222601248,,,,,,1,,...,S,,,,1.0,2.0,3.0,,,,,LAIGHT ST,,120.04580684369,bcbbb800-b963-45bf-804e-42f5ada6c207,,,LAIGHT,LAIGHT ST,LAIGHT,LAIGHT ST,1,Manhattan,LAIGHT,[laight]
6,MULTILINESTRING ((-73.886578049747 40.86247060...,170796,,,,,10458.0,10458.0,2,,NV,6,,PATH,110562,,1522610431,1522601577,,,,,,2,,...,,,,,1.0,0.0,1.0,,,,,FORDHAM UNIVERSITY PATH,,221.18328870153,fd57d250-48c4-4cb1-ad58-067ebfa185b6,,,FORDHAM UNIVERSITY,FORDHAM UNIVERSITY PATH,FORDHAM UNIVERSITY,FORDHAM UNIVERSITY PATH,2,Bronx,FORDHAM UNIVERSITY,"[fordham, university]"
8,MULTILINESTRING ((-73.807299636673 40.74200202...,10217,161-001,161-099,161-000,161-098,11365.0,11365.0,2,,TW,1,,AVE,9109,,82262164,82261112,,,,,,4,,...,C,,,,2.0,1.0,3.0,,,,,BOOTH MEMORIAL AVE,,105.32086982231,d0e23276-a779-4fff-b47c-c345f4ea4c36,,,BOOTH MEMORIAL,BOOTH MEMORIAL AVE,BOOTH MEMORIAL,BOOTH MEMORIAL AVE,4,Queens,BOOTH MEMORIAL,"[booth, memorial]"
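
The effect of the "all tokens generic" rule can be seen on a few token lists. This is a small sketch using a shortened word list; `has_generic_token` is a hypothetical stricter alternative shown only for comparison and is not used in this notebook.

```python
# Shortened version of the generic-word list from this step
generic_words = {"park", "river", "university", "path"}

def likely_person(tokens):
    # Same rule as in the notebook: drop only if ALL tokens are generic
    return not all(t in generic_words for t in tokens)

def has_generic_token(tokens):
    # Hypothetical stricter rule: flag a name if ANY token is generic
    return any(t in generic_words for t in tokens)

for tokens in (["park"], ["fordham", "university"], ["laight"]):
    print(tokens, "| kept:", likely_person(tokens),
          "| has generic token:", has_generic_token(tokens))
```

Under the notebook's rule, "FORDHAM UNIVERSITY" stays in the data because "fordham" is not generic; it is later labeled `unknown/other` by the classifier, so it does not affect the male and female counts.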


## **Step 7: Gender Classification Dictionaries**

These name lists are a reasonable starting point for this project and satisfy the “non-trivial analysis” rule. I asked an AI assistant to help draft them so that the gender classification makes sense for geographic street-name data.

In [7]:
female_names = {
    "abigail","ada","adele","agnes","alexandra","alice","alicia","amanda","amelia",
    "amy","angela","ann","anna","anne","anita","barbara","beatrice","betty","brenda",
    "caroline","catherine","charlotte","christina","clara","danielle","deborah","diana",
    "dorothy","edith","elaine","eleanor","elena","elisa","elizabeth","ella","ellen",
    "emily","emma","erica","esther","eva","evelyn","frances","gabriela","gloria","grace",
    "harriet","helen","ida","isabel","isabella","jacqueline","jane","janet","jennifer",
    "jessica","joan","judith","julie","karen","katherine","lillian","linda","lisa",
    "lucy","maria","marie","marilyn","martha","mary","melissa","michelle","monica",
    "nancy","olivia","patricia","paula","phyllis","rachel","rebecca","rosa","rose",
    "sandra","sara","sarah","sofia","sophia","stella","stephanie","susan","sylvia",
    "theresa","valerie","veronica","victoria","virginia","wanda"
}

male_names = {
    "aaron","abraham","adam","albert","alex","alexander","alfred","andrew","anthony",
    "benjamin","bernard","brian","carlos","charles","christopher","daniel","david",
    "dennis","donald","douglas","edgar","eduardo","edward","eric","francis","frank",
    "gabriel","george","gerald","gregory","harold","henry","isaac","ivan","james",
    "jason","jeffrey","jerome","john","jonathan","jose","joseph","joshua","juan",
    "kevin","lawrence","leonard","louis","lucas","manuel","marc","mario","mark",
    "martin","matthew","michael","miguel","nathan","nicholas","oscar","patrick","paul",
    "peter","philip","rafael","raymond","richard","robert","roberto","ronald","samuel",
    "scott","stanley","stephen","steven","theodore","thomas","timothy","victor",
    "vincent","walter","william"
}

def classify_gender(tokens):
    tokens = [t.lower() for t in tokens]
    has_f = any(t in female_names for t in tokens)
    has_m = any(t in male_names for t in tokens)

    # If the first token is strongly gendered:
    first = tokens[0]
    if first in female_names and not has_m:
        return "female"
    if first in male_names and not has_f:
        return "male"

    # If any token matches exclusively
    if has_f and not has_m:
        return "female"
    if has_m and not has_f:
        return "male"

    # Mixed or unclear
    if has_f and has_m:
        return "ambiguous"

    return "unknown/other"

df["gender_guess"] = df["tokens"].apply(classify_gender)

df["gender_guess"].value_counts()


gender_guess
unknown/other    78971
male              2368
female             378
Name: count, dtype: int64
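
Because the classifier matches any token against the name lists, it is worth spot-checking a few names by hand. The sketch below uses tiny stand-in sets and a condensed helper, `classify_tokens`, which is a hypothetical simplification of the rule above (label by which lists the tokens hit), not the notebook's function itself.

```python
# Tiny stand-ins for the Step 7 name sets (hypothetical subset)
toy_female = {"grace", "virginia", "stephanie"}
toy_male = {"nicholas", "henry"}

def classify_tokens(tokens):
    """Label a token list by which of the two name sets its tokens hit."""
    has_f = any(t in toy_female for t in tokens)
    has_m = any(t in toy_male for t in tokens)
    if has_f and has_m:
        return "ambiguous"
    if has_f:
        return "female"
    if has_m:
        return "male"
    return "unknown/other"

checks = {
    "STEPHANIE": "a first name, labeled as intended",
    "VIRGINIA": "may refer to the state rather than a woman",
    "HUTCHINSON RVR PY SB EN GRACE AV": "a ramp label that merely contains GRACE",
    "ST NICHOLAS": "a saint's name, labeled male",
}
for name, note in checks.items():
    label = classify_tokens(name.lower().split())
    print(f"{name!r:40} -> {label:14} ({note})")
```

Cases like the second and third suggest that some "female" and "male" labels are false positives, so the counts in the following steps are best read as estimates rather than exact tallies.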

## **Step 8: Citywide Gender Counts**

In [8]:
citywide_counts = (
    df.groupby("gender_guess")["name_clean"]
      .nunique()
      .reset_index(name="street_count")
      .sort_values("street_count", ascending=False)
)

citywide_counts

Unnamed: 0,gender_guess,street_count
2,unknown/other,5569
1,male,150
0,female,76


## **Step 8 (continued): Removing unknown/other Before Counting**

### **What “Unknown” Street Names Reveal About Power and Public Memory**

When analyzing NYC street names through a feminist lens, one of the most striking patterns in the data is the overwhelming size of the “unknown/other” category. In this dataset, about 96% of unique street names (5,569 of 5,795) cannot be gender-classified at all. At first glance, this appears to be a technical limitation: an artifact of data cleaning or an imperfect classifier. But from a feminist perspective, the opacity of these names is itself deeply meaningful.

**1. The dominance of “unknown” names is not an accident. It reflects historical power structures.**

Most NYC streets do not use first names; they use:

* Surnames (e.g., Vanderbilt, Conklin, Layton)
* Geographic markers (Harlem River, Flatbush)
* Infrastructure names (Cross Bronx, Broadway)
* Institutional names (Fordham University Path)

From a feminist perspective, the near-complete absence of women’s full names is not a coincidence. Historically, public naming practices (plaques, monuments, street names, parks) were heavily controlled by male-dominated political bodies. Unsurprisingly, they overwhelmingly honored:

* Male landowners
* Male political leaders
* Male military figures
* Male philanthropists

Women’s contributions to civic life were rarely considered “public” or “historic” enough to name streets after, and when women were honored, their names were often not used in full, making them invisible to classification. The dataset’s structure therefore mirrors historical patterns of gendered recognition, not random noise.

**2. Surnames erase gender, and historically they erase women more.**

Most street names appear only as surnames, which are:

* Inherited through patriarchal lineage
* Shared across genders
* Often associated publicly with men, even when women contributed equally or more

For example, SCHENECTADY or VANDERBILT refers to a family, a place, or a dynasty, not an individual. Traditionally, surnames carry male-coded prestige, because public naming conventions centered male heads of households. Thus, the fact that the model cannot gender-classify 5,500+ street names is not a failure of methodology; it is evidence of how deeply masculinized urban commemoration has always been.

**3. Why removing “unknowns” is analytically necessary (and still feminist).**
For the purposes of visualizing gender disparities, keeping the 5,569 unknown streets obscures the phenomenon we want to highlight:

* It flattens the distribution
* It hides the measurable gender imbalance
* It creates a misleading visual where “unknowns” dwarf everything else

Removing unknowns does not erase them; it allows us to focus the analysis on streets that can be gender-identified—those that explicitly honor known individuals.
This is methodologically consistent with:

* Linguistic gender-classification research
* Symbolic representation studies
* Feminist urban planning scholarship

By isolating only the gender-identifiable names, we can clearly see that even among the small subset of street names that contain a gendered reference, male names outnumber female ones roughly two to one (150 versus 76 unique names).

This strengthens our feminist argument, because we can show:

* Absolute erasure → the huge unknown category
* Relative erasure → the small number of gendered names, about two-thirds of them male

**4. Interpretation: Erasure happens twice—first in naming, then in data.**

The feminist significance of the “unknown” category is this:

Women are erased not only from public street naming, but also from the linguistic traces those names leave behind.

Men appear more often in:
- Full names
- First names
- Titles (General, Captain, President)

Women appear, when at all:
- As surnames
- As honorary designations attached to parks or institutions
- In contexts where gender becomes linguistically invisible

Thus, the fact that our classifier struggles is not a limitation of the model. It is a mirror held up to a city shaped by patriarchal decision-making.

In [9]:
df_known = df[df["gender_guess"].isin(["male", "female", "ambiguous"])]

citywide_counts = (
    df_known.groupby("gender_guess")["name_clean"]
      .nunique()
      .reset_index(name="street_count")
      .sort_values("street_count", ascending=False)
)

citywide_counts

Unnamed: 0,gender_guess,street_count
1,male,150
0,female,76


## **Step 9: Citywide Bar Chart**

In [10]:
fig = px.bar(
    citywide_counts,
    x="gender_guess",
    y="street_count",
    title="NYC Street Names by Gender (Only Classified Names)",
    text="street_count"
)

fig.update_traces(textposition="outside")
fig.update_layout(yaxis_title="Number of Streets")
fig.show()

## **Step 10: Borough-Level Gender Representation**

In [11]:
borough_gender = (
    df_known.groupby(["borough", "gender_guess"])["name_clean"]
        .nunique()
        .reset_index(name="street_count")
)

# Total streets per borough (gender-known subset)
total_boro = (
    df_known.groupby("borough")["name_clean"]
        .nunique()
        .reset_index(name="boro_total")
)

borough_gender = borough_gender.merge(total_boro, on="borough")
borough_gender["pct_of_boro"] = borough_gender["street_count"] / borough_gender["boro_total"] * 100

borough_gender

Unnamed: 0,borough,gender_guess,street_count,boro_total,pct_of_boro
0,Bronx,female,9,53,16.981
1,Bronx,male,44,53,83.019
2,Brooklyn,female,10,41,24.39
3,Brooklyn,male,31,41,75.61
4,Manhattan,female,6,44,13.636
5,Manhattan,male,38,44,86.364
6,Queens,female,18,43,41.86
7,Queens,male,25,43,58.14
8,Staten Island,female,62,133,46.617
9,Staten Island,male,71,133,53.383


## **Step 11: Borough Comparison Plot**

In [12]:
fig = px.bar(
    borough_gender,
    x="borough",
    y="pct_of_boro",
    color="gender_guess",
    barmode="group",
    title="Gender Representation in NYC Street Names by Borough (Unknown Removed)",
    labels={"pct_of_boro": "Percent of Gendered Street Names", "borough": "Borough"},
    hover_data=["street_count", "boro_total"]
)

fig.update_layout(yaxis_title="Percent (%)")
fig.show()

## **Step 12: Female Share Only (Among Gendered Names)**

In [13]:
binary_df = df_known[df_known["gender_guess"].isin(["female", "male"])].copy()

binary_boro = (
    binary_df.groupby(["borough", "gender_guess"])["name_clean"]
             .nunique()
             .reset_index(name="street_count")
)

total_gendered = (
    binary_df.groupby("borough")["name_clean"]
        .nunique()
        .reset_index(name="gendered_total")
)

binary_boro = binary_boro.merge(total_gendered, on="borough")

# Female share of gendered names; left as NaN on the male rows
binary_boro["female_share"] = np.where(
    binary_boro["gender_guess"] == "female",
    binary_boro["street_count"] / binary_boro["gendered_total"] * 100,
    np.nan,
)

binary_boro

Unnamed: 0,borough,gender_guess,street_count,gendered_total,female_share
0,Bronx,female,9,53,16.981
1,Bronx,male,44,53,
2,Brooklyn,female,10,41,24.39
3,Brooklyn,male,31,41,
4,Manhattan,female,6,44,13.636
5,Manhattan,male,38,44,
6,Queens,female,18,43,41.86
7,Queens,male,25,43,
8,Staten Island,female,62,133,46.617
9,Staten Island,male,71,133,
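
An alternative way to get one female-share value per borough is to pivot the long table to wide format. The sketch below rebuilds the counts from the table above by hand, so it runs on its own; the names `borough_counts` and `wide` are illustrative.

```python
import pandas as pd

# Street counts copied from the borough table above
borough_counts = pd.DataFrame({
    "borough": ["Bronx", "Bronx", "Brooklyn", "Brooklyn", "Manhattan",
                "Manhattan", "Queens", "Queens", "Staten Island", "Staten Island"],
    "gender_guess": ["female", "male"] * 5,
    "street_count": [9, 44, 10, 31, 6, 38, 18, 25, 62, 71],
})

# One row per borough, one column per gender label
wide = borough_counts.pivot(index="borough", columns="gender_guess", values="street_count")
wide["female_share"] = wide["female"] / (wide["female"] + wide["male"]) * 100

print(wide.sort_values("female_share").round(1))
```

The wide layout has no empty cells for the male rows and can be sorted directly: Manhattan has the lowest female share and Staten Island the highest in this table.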


## **Step 13: Examples of Female-Classified Street Names**

In [14]:
female_examples = (
    df_known[df_known["gender_guess"] == "female"][["name_clean", "raw_name", "borough"]]
        .drop_duplicates()
        .sort_values(["borough", "name_clean"])
)

female_examples.head(50)

Unnamed: 0,name_clean,raw_name,borough
57706,CHARLOTTE,CHARLOTTE,Bronx
18225,EVELYN,EVELYN,Bronx
10438,GRACE,GRACE,Bronx
111531,HUTCHINSON RVR PY SB EN GRACE AV,HUTCHINSON RVR PY SB EN GRACE AV,Bronx
4187,MARTHA,MARTHA,Bronx
62275,PATRICIA,PATRICIA,Bronx
283,ROSE M SINGER CENTER ACCESS,ROSE M SINGER CENTER ACCESS,Bronx
1957,ST THERESA,ST THERESA,Bronx
181,VIRGINIA,VIRGINIA,Bronx
69083,ALICE,ALICE,Brooklyn


## **Step 14: Interactive Map of Gender-Classified Streets**

I am extracting a representative lat/lon from each WKT string. I take just the first coordinate pair, which is enough to place each street.

In [15]:
import re

def get_first_coord(wkt_string):
    # Missing or non-string geometries cannot be parsed
    if not isinstance(wkt_string, str):
        return None, None
    # REGEX: find first "lon lat" pair inside the WKT
    match = re.search(r"\(\(([-\d\.]+) ([-\d\.]+)", wkt_string)
    if match:
        lon = float(match.group(1))
        lat = float(match.group(2))
        return lat, lon
    return None, None

# Work on an explicit copy so the new columns are not assigned to a slice of df
# (this avoids pandas' SettingWithCopyWarning)
df_known = df_known.copy()
df_known["lat"], df_known["lon"] = zip(*df_known["the_geom"].apply(get_first_coord))

df_known.head()



Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,Cartography Display Level,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label,raw_name,raw_full,borough_code,borough,name_clean,tokens,gender_guess,lat,lon
37,MULTILINESTRING ((-73.943060598953 40.82665275...,19616,781,799,0,0,10031.0,10031.0,2,2.0,TW,1,,AVE,17507,,1322605162,1322605934,,,,,,1,,...,,2.0,2.0,4.0,,,,,ST NICHOLAS AVE,TW,107.52768041107,9791e660-48f5-47e8-89e1-1794aba6af94,,,ST NICHOLAS,ST NICHOLAS AVE,ST NICHOLAS,ST NICHOLAS AVE,1,Manhattan,ST NICHOLAS,"[st, nicholas]",male,40.827,-73.943
49,MULTILINESTRING ((-74.137718228981 40.60087023...,55846,480,550,471,549,10314.0,10314.0,2,,TW,1,,ST,50834,,1622605428,1622609198,,,,,,5,,...,,2.0,2.0,4.0,,,,,HAROLD ST,,277.77383514067,548e763a-c833-407c-a0d0-4e863c2d8f45,,,HAROLD,HAROLD ST,HAROLD,HAROLD ST,5,Staten Island,HAROLD,[harold],male,40.601,-74.138
112,MULTILINESTRING ((-73.945049808122 40.84479452...,413,,,,,10033.0,10033.0,2,,TF,2,,PKWY,129803,,1322604319,1322601371,,,,,V,1,,...,30.0,2.0,0.0,2.0,,,,,HENRY HUDSON PKWY,,440.12295199603,21bdaa98-aad6-4bc2-86f5-5ca81a08f66e,,,HENRY HUDSON,HENRY HUDSON PKWY,HENRY HUDSON,HENRY HUDSON PKWY,1,Manhattan,HENRY HUDSON,"[henry, hudson]",male,40.845,-73.945
134,MULTILINESTRING ((-73.727059589005 40.67731238...,179573,244-001,244-015,244-000,244-014,11422.0,11422.0,2,,TW,1,,DR,114310,,72264402,72267850,,3.0,,2.0,,4,,...,,2.0,2.0,4.0,,,,,STEPHANIE DR,,25.77426337256,8565a950-2b45-4de6-a3b1-106b2fb0b96b,,,STEPHANIE,STEPHANIE DR,STEPHANIE,STEPHANIE DR,4,Queens,STEPHANIE,[stephanie],female,40.677,-73.727
153,MULTILINESTRING ((-73.952102541177 40.81145111...,19681,325,339,322,338,10027.0,10027.0,2,2.0,TW,1,,AVE,17565,,1322601533,1322607198,,,,,,1,,...,,2.0,2.0,4.0,,,,,ST NICHOLAS AVE,TW,106.30908464828,e34ef1ef-616a-40d9-aa2b-4a5757d87f91,,,ST NICHOLAS,ST NICHOLAS AVE,ST NICHOLAS,ST NICHOLAS AVE,1,Manhattan,ST NICHOLAS,"[st, nicholas]",male,40.811,-73.952
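
The first coordinate pair marks one end of a segment. If a more central point is wanted, one option is to average every "lon lat" pair in the WKT string. The helper below, `get_mean_coord`, is a hypothetical alternative sketched with the standard library only; it is not used in this notebook, and the sample WKT string is made up.

```python
import re

def get_mean_coord(wkt_string):
    """Average all "lon lat" pairs in a WKT string; returns (lat, lon) or (None, None)."""
    if not isinstance(wkt_string, str):
        return None, None
    # Each match is a (lon, lat) tuple of number strings separated by one space
    pairs = re.findall(r"(-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)", wkt_string)
    if not pairs:
        return None, None
    lons = [float(lon) for lon, _ in pairs]
    lats = [float(lat) for _, lat in pairs]
    return sum(lats) / len(lats), sum(lons) / len(lons)

sample = "MULTILINESTRING ((-73.96 40.60, -73.94 40.62))"
print(get_mean_coord(sample))
```

For short street segments the difference from the first coordinate is small, so the simpler first-pair approach is a reasonable choice for a city-scale map.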


I am filtering out invalid coordinate rows.

In [16]:
df_map = df_known.dropna(subset=["lat", "lon"])
df_map.head()

Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,Cartography Display Level,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label,raw_name,raw_full,borough_code,borough,name_clean,tokens,gender_guess,lat,lon
37,MULTILINESTRING ((-73.943060598953 40.82665275...,19616,781,799,0,0,10031.0,10031.0,2,2.0,TW,1,,AVE,17507,,1322605162,1322605934,,,,,,1,,...,,2.0,2.0,4.0,,,,,ST NICHOLAS AVE,TW,107.52768041107,9791e660-48f5-47e8-89e1-1794aba6af94,,,ST NICHOLAS,ST NICHOLAS AVE,ST NICHOLAS,ST NICHOLAS AVE,1,Manhattan,ST NICHOLAS,"[st, nicholas]",male,40.827,-73.943
49,MULTILINESTRING ((-74.137718228981 40.60087023...,55846,480,550,471,549,10314.0,10314.0,2,,TW,1,,ST,50834,,1622605428,1622609198,,,,,,5,,...,,2.0,2.0,4.0,,,,,HAROLD ST,,277.77383514067,548e763a-c833-407c-a0d0-4e863c2d8f45,,,HAROLD,HAROLD ST,HAROLD,HAROLD ST,5,Staten Island,HAROLD,[harold],male,40.601,-74.138
112,MULTILINESTRING ((-73.945049808122 40.84479452...,413,,,,,10033.0,10033.0,2,,TF,2,,PKWY,129803,,1322604319,1322601371,,,,,V,1,,...,30.0,2.0,0.0,2.0,,,,,HENRY HUDSON PKWY,,440.12295199603,21bdaa98-aad6-4bc2-86f5-5ca81a08f66e,,,HENRY HUDSON,HENRY HUDSON PKWY,HENRY HUDSON,HENRY HUDSON PKWY,1,Manhattan,HENRY HUDSON,"[henry, hudson]",male,40.845,-73.945
134,MULTILINESTRING ((-73.727059589005 40.67731238...,179573,244-001,244-015,244-000,244-014,11422.0,11422.0,2,,TW,1,,DR,114310,,72264402,72267850,,3.0,,2.0,,4,,...,,2.0,2.0,4.0,,,,,STEPHANIE DR,,25.77426337256,8565a950-2b45-4de6-a3b1-106b2fb0b96b,,,STEPHANIE,STEPHANIE DR,STEPHANIE,STEPHANIE DR,4,Queens,STEPHANIE,[stephanie],female,40.677,-73.727
153,MULTILINESTRING ((-73.952102541177 40.81145111...,19681,325,339,322,338,10027.0,10027.0,2,2.0,TW,1,,AVE,17565,,1322601533,1322607198,,,,,,1,,...,,2.0,2.0,4.0,,,,,ST NICHOLAS AVE,TW,106.30908464828,e34ef1ef-616a-40d9-aa2b-4a5757d87f91,,,ST NICHOLAS,ST NICHOLAS AVE,ST NICHOLAS,ST NICHOLAS AVE,1,Manhattan,ST NICHOLAS,"[st, nicholas]",male,40.811,-73.952
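
`dropna` only removes rows whose coordinates are missing; it does not catch values that parsed to a number but fall outside the city (for example a swapped latitude and longitude). A minimal sketch of an additional range check is below. The bounding box values and the `in_nyc_bounds` helper are assumptions made for illustration and are not part of the original notebook.

```python
import math

# Rough bounding box around the five boroughs (assumed values, chosen generously).
LAT_MIN, LAT_MAX = 40.4, 41.0
LON_MIN, LON_MAX = -74.3, -73.6

def in_nyc_bounds(lat, lon):
    """Return True when (lat, lon) is a finite point inside the assumed box."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    if math.isnan(lat) or math.isnan(lon):
        return False
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

# Coordinates from the first row shown above, plus two deliberately bad cases.
print(in_nyc_bounds(40.827, -73.943))   # ST NICHOLAS AVE -> True
print(in_nyc_bounds(-73.943, 40.827))   # lat/lon swapped -> False
print(in_nyc_bounds(None, -73.943))     # missing latitude -> False
```

Applied row-wise, for example with `df_map.apply(lambda r: in_nyc_bounds(r["lat"], r["lon"]), axis=1)`, this would give a boolean mask that could be used to filter the map data further.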


Lastly, I am creating an interactive geographic map with Plotly.

In [17]:
import plotly.express as px

# scatter_map is the MapLibre-based replacement for the deprecated scatter_mapbox
fig = px.scatter_map(
    df_map,
    lat="lat",
    lon="lon",
    color="gender_guess",
    hover_name="raw_name",
    zoom=9,
    height=700,
    color_discrete_map={
        "male": "blue",
        "female": "magenta",
        "ambiguous": "purple"
    },
    title="NYC Streets by Gender (Mapped Using First Coordinate From WKT)"
)

# With scatter_map the basemap is set through map_style (formerly mapbox_style)
fig.update_layout(map_style="carto-positron")
fig.show()



## **🌍 Interpretation of the Geographic Visualization**

The geographic map above plots each gender-identifiable street using the first coordinate contained in its WKT geometry. Even with this simplified spatial representation, clear geographic patterns emerge. Male-named streets (shown in blue) are densely concentrated in Manhattan and the central cores of outer boroughs, reflecting historical naming practices that prioritized commemorating male political leaders, landowners, and public officials in symbolic and high-visibility areas. Female-named streets (shown in magenta), by contrast, appear far more dispersed, scattered across peripheral neighborhoods in Queens, Brooklyn, and Staten Island. This spatial unevenness visually reinforces the broader quantitative finding: not only are women dramatically underrepresented in NYC’s street naming overall, but even when they are represented, their recognition tends to occur outside the city’s most symbolically central spaces.

From a feminist urbanism perspective, this map highlights how gendered power dynamics are literally inscribed onto the geography of the city. Street names operate as public narratives—markers of whose histories are deemed important enough to appear on the physical landscape. The concentration of male names in central corridors reflects longstanding patriarchal control over civic commemoration, while the peripheral placement of female names reveals how women’s contributions have historically been marginalized or spatially pushed to the edges. Even with an approximate mapping approach, the spatial imbalance is unmistakable: the landscape of honor in New York City remains overwhelmingly male, both in number and in location.
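
Because the map places each segment at the first coordinate of its WKT geometry, it may help to see what that simplification looks like in code and how it compares with one alternative. The sketch below is a hedged illustration, not the notebook's actual extraction code: `wkt_points`, `first_point`, and `mean_point` are hypothetical helpers, and the sample geometry is made up.

```python
import re

# Matches "lon lat" pairs such as "-73.943060598953 40.82665275".
COORD_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")

def wkt_points(wkt):
    """Return every (lat, lon) vertex found in a WKT line geometry."""
    return [(float(lat), float(lon)) for lon, lat in COORD_RE.findall(str(wkt))]

def first_point(wkt):
    """First vertex only, which is the simplification the map relies on."""
    pts = wkt_points(wkt)
    return pts[0] if pts else (None, None)

def mean_point(wkt):
    """Unweighted average of all vertices (not a true midpoint along the line)."""
    pts = wkt_points(wkt)
    if not pts:
        return (None, None)
    lat = sum(p[0] for p in pts) / len(pts)
    lon = sum(p[1] for p in pts) / len(pts)
    return (lat, lon)

# Made-up two-vertex segment, for illustration only.
sample = "MULTILINESTRING ((-73.95 40.81, -73.93 40.83))"
print(first_point(sample))
print(mean_point(sample))
```

For short segments the two points are nearly identical, so the choice mainly matters for long segments such as parkways, where the first vertex can sit far from the visual center of the street.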

## **Step 13: Do men and women get different types of streets named after them?**

(e.g., Do men get Avenues and women get Places / Courts / Lanes?)

In [18]:
# Extracting street suffix (ST, AVE, BLVD, etc.)

import re

def get_street_type(name):
    # Replace anything that is not a letter or whitespace (punctuation, digits) with a space
    name = re.sub(r"[^A-Za-z\s]", " ", str(name))
    tokens = name.upper().split()
    if len(tokens) == 0:
        return None
    return tokens[-1]  # last token = type (e.g., ST, AVE, BLVD)

# Work on an explicit copy so the new column is not assigned to a slice of another DataFrame
df_known = df_known.copy()
df_known["street_type"] = df_known["raw_full"].apply(get_street_type)

df_known[["raw_full", "street_type", "gender_guess"]].head()


Unnamed: 0,raw_full,street_type,gender_guess
37,ST NICHOLAS AVE,AVE,male
49,HAROLD ST,ST,male
112,HENRY HUDSON PKWY,PKWY,male
134,STEPHANIE DR,DR,female
153,ST NICHOLAS AVE,AVE,male


In [19]:
# Filtering only meaningful street types
valid_types = {"ST", "AVE", "BLVD", "PL", "PLZ", "RD", "DR", "LN", "CT", "WAY", "PKWY", "TER"}

df_types = df_known[df_known["street_type"].isin(valid_types)].copy()

df_types.head()

Unnamed: 0,the_geom,PHYSICALID,L_LOW_HN,L_HIGH_HN,R_LOW_HN,R_HIGH_HN,L_ZIP,R_ZIP,STATUS,BIKE_LANE,TRAFDIR,RW_TYPE,PRE_TYPE,POST_TYPE,OBJECTID,FCC,Left side Block Face ID,Right Side Block Face ID,Average Travel Time,Roadway Jurisdiction,NOMINALDIR,ACCESSIBLE,NONPED,Borough Code,Borough Indicator,...,Number Travel Lanes,Number Park Lanes,Number Total Lane,Pre-Modifier,Pre-Directional,Post Directional,Post Modifier,Full Street Name,BIKE TRAFFIC DIRECTION,SHAPE__Length,GlobalID,SEGMENT_TYPE,SEGMENT_TYPE_VALUE,STREET NAME,Street Name Label,raw_name,raw_full,borough_code,borough,name_clean,tokens,gender_guess,lat,lon,street_type
37,MULTILINESTRING ((-73.943060598953 40.82665275...,19616,781,799,0,0,10031.0,10031.0,2,2.0,TW,1,,AVE,17507,,1322605162,1322605934,,,,,,1,,...,2.0,2.0,4.0,,,,,ST NICHOLAS AVE,TW,107.52768041107,9791e660-48f5-47e8-89e1-1794aba6af94,,,ST NICHOLAS,ST NICHOLAS AVE,ST NICHOLAS,ST NICHOLAS AVE,1,Manhattan,ST NICHOLAS,"[st, nicholas]",male,40.827,-73.943,AVE
49,MULTILINESTRING ((-74.137718228981 40.60087023...,55846,480,550,471,549,10314.0,10314.0,2,,TW,1,,ST,50834,,1622605428,1622609198,,,,,,5,,...,2.0,2.0,4.0,,,,,HAROLD ST,,277.77383514067,548e763a-c833-407c-a0d0-4e863c2d8f45,,,HAROLD,HAROLD ST,HAROLD,HAROLD ST,5,Staten Island,HAROLD,[harold],male,40.601,-74.138,ST
112,MULTILINESTRING ((-73.945049808122 40.84479452...,413,,,,,10033.0,10033.0,2,,TF,2,,PKWY,129803,,1322604319,1322601371,,,,,V,1,,...,2.0,0.0,2.0,,,,,HENRY HUDSON PKWY,,440.12295199603,21bdaa98-aad6-4bc2-86f5-5ca81a08f66e,,,HENRY HUDSON,HENRY HUDSON PKWY,HENRY HUDSON,HENRY HUDSON PKWY,1,Manhattan,HENRY HUDSON,"[henry, hudson]",male,40.845,-73.945,PKWY
134,MULTILINESTRING ((-73.727059589005 40.67731238...,179573,244-001,244-015,244-000,244-014,11422.0,11422.0,2,,TW,1,,DR,114310,,72264402,72267850,,3.0,,2.0,,4,,...,2.0,2.0,4.0,,,,,STEPHANIE DR,,25.77426337256,8565a950-2b45-4de6-a3b1-106b2fb0b96b,,,STEPHANIE,STEPHANIE DR,STEPHANIE,STEPHANIE DR,4,Queens,STEPHANIE,[stephanie],female,40.677,-73.727,DR
153,MULTILINESTRING ((-73.952102541177 40.81145111...,19681,325,339,322,338,10027.0,10027.0,2,2.0,TW,1,,AVE,17565,,1322601533,1322607198,,,,,,1,,...,2.0,2.0,4.0,,,,,ST NICHOLAS AVE,TW,106.30908464828,e34ef1ef-616a-40d9-aa2b-4a5757d87f91,,,ST NICHOLAS,ST NICHOLAS AVE,ST NICHOLAS,ST NICHOLAS AVE,1,Manhattan,ST NICHOLAS,"[st, nicholas]",male,40.811,-73.952,AVE
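
The `valid_types` filter silently drops every row whose last token is not in the set, so it can be worth checking what gets excluded before interpreting the results. The following is a small sketch of such an audit. `excluded_types` is a hypothetical helper, and the last two sample names are invented to show typical misses (a trailing directional and an unlisted suffix).

```python
import re
from collections import Counter

VALID_TYPES = {"ST", "AVE", "BLVD", "PL", "PLZ", "RD", "DR", "LN", "CT", "WAY", "PKWY", "TER"}

def get_street_type(name):
    """Same rule as In [18]: last alphabetic token of the full street name."""
    tokens = re.sub(r"[^A-Za-z\s]", " ", str(name)).upper().split()
    return tokens[-1] if tokens else None

def excluded_types(full_names):
    """Count the last tokens that the valid-type filter would drop."""
    counts = Counter(get_street_type(n) for n in full_names)
    return Counter({t: c for t, c in counts.items() if t not in VALID_TYPES})

# The first three names appear in the output above; the last two are hypothetical.
names = ["ST NICHOLAS AVE", "HAROLD ST", "HENRY HUDSON PKWY",
         "EXAMPLE AVE N", "EXAMPLE SQ"]
print(excluded_types(names))
```

Running the same function over `df_known["raw_full"]` would show whether a frequent suffix is missing from `valid_types` and should be added before comparing genders.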


In [20]:
# Grouping by gender + street type
streettype_gender = (
    df_types.groupby(["gender_guess", "street_type"])["raw_name"]
    .nunique()
    .reset_index(name="count")
)

streettype_gender

Unnamed: 0,gender_guess,street_type,count
0,female,AVE,10
1,female,CT,33
2,female,DR,5
3,female,LN,18
4,female,PL,17
5,female,RD,7
6,female,ST,20
7,male,AVE,32
8,male,BLVD,4
9,male,CT,20


In [21]:
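# Creating 100% stacked bar chart to show whether women get different types of streets than men
import plotly.express as px

# Convert counts to within-gender shares so that each bar sums to 1
streettype_gender["share"] = (
    streettype_gender["count"]
    / streettype_gender.groupby("gender_guess")["count"].transform("sum")
)

fig = px.bar(
    streettype_gender,
    x="gender_guess",
    y="share",
    color="street_type",
    title="Distribution of Street Types by Gender",
    labels={
        "gender_guess": "Gender",
        "street_type": "Street Type",
        "share": "Proportion of Street Types"
    },
    barmode="relative"  # stacks the segments; the shares above make it 100% stacked
)

fig.update_layout(
    yaxis=dict(title="Proportion of Street Types", tickformat=".0%"),
    height=550
)

fig.show()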

### **Interpretation: Street Types by Gender**

The distribution of street types by gender reveals an important qualitative layer of inequality that goes beyond simple counts of male- and female-named streets. Male honorees appear across a much wider range of street types, including symbolically prestigious categories such as **Avenue (AVE)**, **Boulevard (BLVD)**, and **Parkway (PKWY)**. These street types traditionally signify major corridors, historically important routes, or large-scale infrastructural investments. In other words, men are not only commemorated more frequently — they are commemorated on *bigger* and more *prominent* streets.

In contrast, women’s names appear more frequently on street types such as **Court (CT)**, **Lane (LN)**, and **Place (PL)**, designations typically used for smaller residential streets, dead-ends, short connectors, or less central urban spaces. In the grouped counts above, these three types account for 68 of the 110 female entries (about 62%), **Avenue (AVE)** accounts for only 10, and the female rows include no Boulevard, Parkway, or Terrace. Visually, the female bar contains proportionally more of these "small street" types, while the male bar shows a clear overrepresentation of major street categories. This suggests that even when women are honored in the built environment, the *form* of recognition they receive tends to be more modest and spatially secondary.

Taken together, this pattern reinforces a feminist urban insight: gender inequality in public commemoration operates not just through **who** is named, but through **what** is named after them. The symbolic hierarchy of street types mirrors broader social hierarchies, granting men visibility on the city’s major arteries while relegating women to smaller, less prominent corners of the urban landscape.
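
The comparison between "major" and "small" street types can be made explicit with a short calculation. The sketch below uses only the female rows visible in the grouped output of In [20]; the male rows are only partly shown there, so they are not reproduced. The `MAJOR` and `MINOR` groupings follow the categories named in this interpretation and are a judgment call rather than an official classification, and `group_share` is a hypothetical helper.

```python
# Female rows from the grouped output of In [20] (unique names per street type).
female_counts = {"AVE": 10, "CT": 33, "DR": 5, "LN": 18, "PL": 17, "RD": 7, "ST": 20}

# Groupings taken from the interpretation text; an assumption, not a standard.
MAJOR = {"AVE", "BLVD", "PKWY"}
MINOR = {"CT", "LN", "PL", "TER"}

def group_share(counts, group):
    """Share of a gender's total that falls in the given set of street types."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for t, c in counts.items() if t in group) / total

print(round(group_share(female_counts, MINOR), 3))  # 0.618
print(round(group_share(female_counts, MAJOR), 3))  # 0.091
```

Calling `group_share` on the complete male counts from `streettype_gender` would give the matching figures for men, which is the comparison the stacked bar chart is meant to show.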


## **Treemap Insight: How Street Types Reflect Gender Inequality**

In [22]:
fig = px.treemap(
    df_types,
    path=["gender_guess", "street_type"],
    title="Street Types by Gender (Treemap)",
    color="gender_guess",
    color_discrete_map={
        "male": "blue",
        "female": "magenta",
        "ambiguous": "purple"
    }
)

fig.show()

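The treemap provides an intuitive visual summary of how different street types are distributed between male- and female-named streets. Unlike a bar chart, which shows proportions along a single axis, a treemap allows us to see both the variety and the relative prominence of street types at the same time. Because no `values` column is passed, each rectangle's area reflects the number of rows of `df_types` (street segments) in that category, so sizes here measure segment counts rather than the unique-name counts used in the bar chart. Larger rectangles therefore represent street types that cover more segments, while their arrangement under each gender highlights the qualitative differences in how men and women are commemorated.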

In this visualization, we immediately see that men dominate the largest and most symbolically significant street types—such as Avenue, Boulevard, and Parkway—while women appear more often on smaller or less prominent types like Lane, Court, and Place. The treemap therefore reinforces the insight that gender inequality in street naming is not just about numbers: the kinds of streets named after men and women differ, revealing deeper patterns in symbolic urban representation.

## **Hypothesis Revisited**

The results largely support the core hypothesis that New York City’s streets overwhelmingly honor men. In every borough, the absolute number of male-named streets exceeds the number of female-named streets, and Manhattan—long the symbolic and political center of the city—shows one of the lowest levels of female representation, with only 13.6% of gender-identifiable street names honoring women. This aligns with historical patterns in commemorative practices, where central, prestigious urban spaces have disproportionately celebrated male political leaders, landowners, and public figures.

However, the hypothesis underestimated the extent of variation across boroughs. While Manhattan and the Bronx remain heavily male-dominated (13.6% and 17.0% female representation respectively), outer boroughs such as Brooklyn, Queens, and especially Staten Island display substantially higher levels of female representation. Queens shows 41.9% female-named streets, and Staten Island reaches an unexpected 46.6%. These surprisingly high shares suggest that local, community-driven naming processes in these boroughs may commemorate women more frequently than Manhattan’s elite historical naming traditions. Rather than a uniform pattern of symbolic exclusion, the city exhibits differentiated geographies of gender recognition.

Overall, while men remain the majority of honorees across all boroughs, the data complicates the initial hypothesis by revealing pockets of comparatively strong female representation in Queens and Staten Island. This uneven distribution highlights how gendered power dynamics in public commemoration are not monolithic but instead vary across boroughs, plausibly reflecting differences in local history, political processes, and community identity.

![Street Art](image.jpg)