# Project 2 
# Public Wi-Fi Access Across New York City

When I first moved to New York City, one of the hardest adjustments was staying connected on the go without burning through mobile data. The city advertises thousands of public Wi-Fi hotspots, from LinkNYC kiosks to networks in parks and public housing. But are these hotspots spread evenly across the five boroughs, or do some areas have much better access than others?

In this project, I use two NYC Open Data datasets to explore **how public Wi-Fi access varies by borough once we adjust for population size**. Instead of just counting hotspots, I calculate **hotspots per 10,000 residents** to make a fair comparison between larger and smaller boroughs.

---

## Research Question and Hypothesis

**Research question**

> After adjusting for population, which NYC borough has the best access to public Wi-Fi?  
> Does the mix of Wi-Fi types (for example, LinkNYC vs other providers) look similar across boroughs?

**Hypotheses**

Public Wi-Fi access in New York City is not evenly distributed across the five boroughs. Since Manhattan is the economic and tourist center of the city, I expect it to have more public Wi-Fi hotspots per 10,000 residents than the outer boroughs. 
I also expect the mix of hotspot types (for example, LinkNYC kiosks versus other providers) to differ by borough, with Manhattan relying more heavily on LinkNYC.


## Data Sources

I work with two datasets from NYC Open Data:

1. **NYC Wi-Fi Hotspot Locations**  
https://data.cityofnewyork.us/City-Government/NYC-Wi-Fi-Hotspot-Locations/yjub-udmw/about_data
   - Source: NYC Office of Technology and Innovation
   - This dataset lists individual public Wi-Fi hotspots across the city, including their **borough**, **location**, and a field indicating the **type or provider** (for example, LinkNYC vs other networks).  
   - I use it to count how many hotspots of each type exist in each borough.

2. **New York City Population by Borough, 1950–2040**  
https://data.cityofnewyork.us/City-Government/New-York-City-Population-by-Borough-1950-2040/xywu-7bv9/about_data
   - Source: NYC Department of City Planning
   - This dataset provides total population figures for each borough by year, including decennial census counts and projections.  
   - I use one recent year of population (for example, 2020) to convert raw hotspot counts into **hotspots per 10,000 residents**.

Both raw datasets are downloaded as CSV files and loaded into pandas in this notebook. All cleaning, merging, and visualization work happens in Python.

---

## Plan

To answer the research question, I will:

1. **Load and clean** the Wi-Fi hotspot dataset, keeping only rows with a valid borough and simplifying the Wi-Fi **type/provider** into a few categories.
2. **Aggregate** the Wi-Fi data to get the number of hotspots in each borough, broken down by type.
3. **Select** a single recent year of borough populations and join it to the Wi-Fi counts.
4. **Compute a derived measure** – the number of hotspots per 10,000 residents for each (borough, Wi-Fi type) combination.
5. Create **one main visualization**, a grouped bar chart showing **hotspots per 10,000 residents by borough and Wi-Fi type**, and discuss whether the results match my hypotheses.

In the next section, I start by loading the two CSV files into pandas and taking a quick look at their structure.


## Importing the datasets

I start by loading the two datasets I’ll be using in this project: the NYC Wi-Fi hotspot locations dataset and the NYC population-by-borough dataset. All analysis, cleaning, merging, and visualization will be done in Python using pandas.


In [29]:
import pandas as pd

# Load NYC Wi-Fi hotspots dataset
wifi = pd.read_csv("NYC_Wi-Fi_Hotspot.csv")

# Load NYC population-by-borough dataset
population = pd.read_csv("New_York_City_Population.csv")

# Display the first few rows of each to understand their structure
wifi.head(), population.head()


(   OBJECTID  Borough          Type           Provider                    Name  \
 0     10604        4  Limited Free           SPECTRUM       Baisley Pond Park   
 1     10555        4  Limited Free           SPECTRUM            Kissena Park   
 2     12370        3          Free   Transit Wireless            Grand St (L)   
 3      9893        3          Free  Downtown Brooklyn                     NaN   
 4     10169        1          Free   Transit Wireless  Lexington Av-63 St (F)   
 
                  Location   Latitude  Longitude                X  \
 0          Park Perimeter  40.674860 -73.784120  1,044,131.89696   
 1          Park Perimeter  40.747560 -73.818150  1,034,637.51076   
 2            Grand St (L)  40.711926 -73.940670  1,000,698.12752   
 3           125 Court St.  40.689985 -73.991995   986,469.966349   
 4  Lexington Av-63 St (F)  40.764630 -73.966115   993,636.552081   
 
                 Y  ... Neighborhood Tabulation Area (NTA) Council Distrcit  \
 0  185,219

## Preparing and cleaning the data

Before analyzing Wi-Fi access across boroughs, I first need to clean both datasets so they can be merged. 
For the Wi-Fi hotspot dataset, I keep only rows with a valid borough and simplify the Wi-Fi provider/type field so it’s easier to compare across categories. 

For the population dataset, I select one recent year of population estimates and format the borough names so they match the hotspot dataset. Once both datasets are cleaned, I aggregate the number of hotspots by borough and Wi-Fi type.
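
As a minimal sketch of what "simplifying the provider field" could look like, the snippet below collapses provider names into a few buckets on a toy table. The helper `simplify_provider`, the bucket labels, and the exact spelling `"LinkNYC - Citybridge"` are assumptions for illustration, not values confirmed from the real file.

```python
import pandas as pd

# Toy rows standing in for the real hotspot file. "SPECTRUM" and
# "Transit Wireless" appear in the preview above; the LinkNYC spelling
# is an assumption.
toy = pd.DataFrame({
    "Provider": ["LinkNYC - Citybridge", "SPECTRUM", "Transit Wireless", None],
})

def simplify_provider(name):
    """Hypothetical helper: collapse provider names into coarse buckets."""
    if pd.isna(name):
        return "Unknown"
    return "LinkNYC" if "linknyc" in name.lower() else "Other"

toy["provider_group"] = toy["Provider"].map(simplify_provider)
print(toy)
```

A substring match is used here so that small variations in how LinkNYC is written would still land in the same bucket.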


In [18]:
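# Assumed reconstruction: the cell that originally defined wifi_clean is not shown.
# Keep rows that have a borough name and copy the "Type" column into HOTSPOT_TYPE.
wifi_clean = wifi.dropna(subset=["Borough Name"]).copy()
wifi_clean["HOTSPOT_TYPE"] = wifi_clean["Type"].str.strip()

# Use Borough Name, not the numeric code
wifi_clean["borough_clean"] = wifi_clean["Borough Name"].str.strip().str.upper()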

# Summarize Wi-Fi hotspots by borough and type
wifi_grouped = (
    wifi_clean
    .groupby(["borough_clean", "HOTSPOT_TYPE"])
    .size()
    .reset_index(name="hotspot_count")
)

wifi_grouped.head()


|   | borough_clean | HOTSPOT_TYPE | hotspot_count |
|---|---|---|---|
| 0 | BRONX | Free | 196 |
| 1 | BRONX | Limited Free | 120 |
| 2 | BROOKLYN | Free | 540 |
| 3 | BROOKLYN | Limited Free | 160 |
| 4 | MANHATTAN | Free | 1573 |


In [19]:
# Keep only total population rows, drop NYC TOTAL
pop_total = population[
    (population["Age Group"] == "Total Population") &
    (population["Borough"] != "NYC Total")
].copy()

# Make a clean borough name column to match wifi_grouped
pop_total["borough_clean"] = pop_total["Borough"].str.strip().str.upper()

# Pick 2020 population and make it numeric
pop_total["population_2020"] = (
    pop_total["2020"]
    .astype(str)
    .str.replace(",", "", regex=False)
    .astype(int)
)

pop_total = pop_total[["borough_clean", "population_2020"]]
pop_total


|   | borough_clean | population_2020 |
|---|---|---|
| 1 | BRONX | 1446788 |
| 2 | BROOKLYN | 2648452 |
| 3 | MANHATTAN | 1638281 |
| 4 | QUEENS | 2330295 |
| 5 | STATEN ISLAND | 487155 |
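
The comma-stripping step above can fail with an error if a cell holds something that is not a number. As a hedged alternative, here is a small sketch using `pd.to_numeric` with `errors="coerce"` on a toy Series; the `"n/a"` entry is made up to show what happens to an unparseable value.

```python
import pandas as pd

# Toy values: two comma-formatted populations and one made-up bad entry
raw = pd.Series(["1,446,788", " 487,155 ", "n/a"])

cleaned = pd.to_numeric(
    raw.str.replace(",", "", regex=False).str.strip(),
    errors="coerce",  # unparseable strings become NaN instead of raising
)

print(cleaned)
print("Unparseable rows:", int(cleaned.isna().sum()))
```

The trade-off is that the result is a float column whenever a `NaN` is present, so it is worth checking for missing values before converting to integers.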


In [20]:
merged = wifi_grouped.merge(pop_total, on="borough_clean", how="left")

# Check merge worked
merged.head()


|   | borough_clean | HOTSPOT_TYPE | hotspot_count | population_2020 |
|---|---|---|---|---|
| 0 | BRONX | Free | 196 | 1446788 |
| 1 | BRONX | Limited Free | 120 | 1446788 |
| 2 | BROOKLYN | Free | 540 | 2648452 |
| 3 | BROOKLYN | Limited Free | 160 | 2648452 |
| 4 | MANHATTAN | Free | 1573 | 1638281 |
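
Looking at `head()` only confirms the first few rows. A slightly stricter check, sketched below on toy data, uses the `validate` and `indicator` options of `DataFrame.merge` to confirm that each borough appears once in the population table and to list any borough names that failed to match. The `"RICHMOND"` row and its count are made up to force a mismatch.

```python
import pandas as pd

# Toy hotspot counts; "RICHMOND" is a made-up label with no population match
counts = pd.DataFrame({
    "borough_clean": ["BRONX", "BRONX", "BROOKLYN", "RICHMOND"],
    "hotspot_count": [196, 120, 540, 5],
})
pops = pd.DataFrame({
    "borough_clean": ["BRONX", "BROOKLYN"],
    "population_2020": [1446788, 2648452],
})

checked = counts.merge(
    pops,
    on="borough_clean",
    how="left",
    validate="many_to_one",  # raises if a borough is duplicated in pops
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

unmatched = (
    checked.loc[checked["_merge"] == "left_only", "borough_clean"]
    .unique()
    .tolist()
)
print("Boroughs without a population match:", unmatched)
```

An empty list here would mean every hotspot row picked up a population value.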


In [23]:
merged["hotspots_per_10k"] = (
    merged["hotspot_count"] / merged["population_2020"] * 10000
)

merged[["borough_clean", "HOTSPOT_TYPE", "hotspots_per_10k"]].head()


|   | borough_clean | HOTSPOT_TYPE | hotspots_per_10k |
|---|---|---|---|
| 0 | BRONX | Free | 1.354725 |
| 1 | BRONX | Limited Free | 0.829424 |
| 2 | BROOKLYN | Free | 2.038927 |
| 3 | BROOKLYN | Limited Free | 0.604126 |
| 4 | MANHATTAN | Free | 9.601527 |
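
To make the derived measure concrete, here is a quick hand check of two rows using the counts and 2020 populations shown earlier. The function name `per_10k` is just a label I chose for this sketch.

```python
def per_10k(hotspots, population):
    """Hotspots per 10,000 residents."""
    return hotspots / population * 10_000

# Counts and populations copied from the tables above
bronx_free = per_10k(196, 1_446_788)
manhattan_free = per_10k(1573, 1_638_281)

print(round(bronx_free, 2), round(manhattan_free, 2))
print("Manhattan-to-Bronx ratio:", round(manhattan_free / bronx_free, 1))
```

Both values agree with the `hotspots_per_10k` column, and the ratio shows Manhattan with roughly seven times as many free hotspots per resident as the Bronx.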


In [26]:
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook_connected"

# The data already has one row per (borough, type), so a grouped bar chart fits directly
fig = px.bar(
    merged,
    x="borough_clean",
    y="hotspots_per_10k",
    color="HOTSPOT_TYPE",
    barmode="group",
    labels={
        "borough_clean": "Borough",
        "hotspots_per_10k": "Wi-Fi hotspots per 10,000 residents",
        "HOTSPOT_TYPE": "Wi-Fi type",
    },
    title="Public Wi-Fi Hotspots per 10,000 Residents by Borough and Type",
    height=400,
)

fig.show()


## Results and Conclusion

The grouped bar chart shows a clear pattern:

- **Manhattan** has by far the highest number of free public Wi-Fi hotspots per 10,000 residents: about 9.6, compared with roughly 2.0 in Brooklyn and 1.4 in the Bronx.
- **Brooklyn and Queens** have moderate levels of free Wi-Fi access per resident, higher than the Bronx and Staten Island but still below Manhattan.
- **The Bronx and Staten Island** have the lowest per-capita Wi-Fi access, especially for fully free hotspots. In these boroughs, residents have fewer public Wi-Fi options relative to the size of the population.
- Across all boroughs, “Free” hotspots dominate, while “Limited Free” and “Partner Site” hotspots make up a much smaller share. However, the relative height of these smaller bars varies by borough, suggesting some differences in the mix of Wi-Fi types.
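
One way to put a number on the "mix" of Wi-Fi types is to compute each type's share of its borough's hotspots. The sketch below does this for the Bronx and Brooklyn rows visible in the grouped output above; the column name `share_of_borough` is my own choice, and the other boroughs are left out because their full counts are not shown.

```python
import pandas as pd

# Rows copied from the grouped output shown earlier (Bronx and Brooklyn only)
counts = pd.DataFrame({
    "borough_clean": ["BRONX", "BRONX", "BROOKLYN", "BROOKLYN"],
    "HOTSPOT_TYPE": ["Free", "Limited Free", "Free", "Limited Free"],
    "hotspot_count": [196, 120, 540, 160],
})

# Total hotspots per borough, repeated on every row of that borough
borough_total = counts.groupby("borough_clean")["hotspot_count"].transform("sum")
counts["share_of_borough"] = counts["hotspot_count"] / borough_total

print(counts.round({"share_of_borough": 3}))
```

With these rows, "Free" hotspots make up about 62% of the Bronx total and about 77% of the Brooklyn total, which is one example of the mix differing between boroughs.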

Overall, the data supports my initial idea that public Wi-Fi access is not evenly distributed across New York City. Manhattan stands out as having much better per-capita access, while the Bronx and Staten Island lag behind. This pattern raises questions about digital equity: people in some boroughs have many more opportunities to connect to free public Wi-Fi than others.

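These results are descriptive rather than causal. The analysis is limited to one population year and to public Wi-Fi hotspots listed in the NYC dataset, and it does not include private home or mobile internet access. Because the chart groups hotspots by access type (Free, Limited Free, Partner Site) rather than by provider, it also does not directly test my second hypothesis about Manhattan relying more heavily on LinkNYC. It provides a simple snapshot of how public Wi-Fi infrastructure is distributed across the five boroughs.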
