This is where we do our final evaluation by combining all of our metrics!

Idea:
A pool is worth 10 points.
A shelter is worth 5 points.
A piece of playground equipment is worth 1 point.
The neighborhood with the highest total points wins.
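
To make the point system concrete, here is a minimal sketch of the arithmetic for a single neighborhood. The function name `neighborhood_score` and the example counts are made up for illustration and are not taken from our datasets.

```python
# Point values taken from the scoring idea above.
POINTS = {"pool": 10, "shelter": 5, "playground_equipment": 1}


def neighborhood_score(pools, shelters, playground_equipment):
    """Return the total points for one neighborhood (hypothetical helper)."""
    return (pools * POINTS["pool"]
            + shelters * POINTS["shelter"]
            + playground_equipment * POINTS["playground_equipment"])


# Hypothetical counts: 1 pool, 2 shelters, 15 pieces of playground equipment.
example_score = neighborhood_score(pools=1, shelters=2, playground_equipment=15)
print(example_score)  # 1*10 + 2*5 + 15*1 = 35
```

With these weights, one pool counts as much as ten pieces of playground equipment, so neighborhoods with pools get a large head start.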

Introduction:

We decided to find the best neighborhood to live in during the summer in Pittsburgh!
Obviously, this is very subjective.
So the metrics we used to measure this were:

Which neighborhoods have the most public pools?
Where is the most playground equipment located?
Where are the most shelters located?

![Alt Text](shelter_image.jpg)

In case you are wondering (because I had to remind myself), the image above shows a shelter in a park in Allegheny County!

In [3]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import geopandas

Above, we run our imports to analyze the data.

Below we begin data analysis by combining our metrics.
First we read in each dataset.

One issue we had is that the original WPRDC shelters dataset gave locations as parks and street names, but not neighborhoods. So the version we use here is the shelters dataset with a neighborhood column that we added.
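
A column like this can be filled in from a lookup table. The sketch below is only an illustration of that idea, not a record of the exact steps used for our file: the column names (`shelter_name`, `park`), the park names, and the neighborhood names are all made up.

```python
import pandas as pd

# Hypothetical column and park names, for illustration only.
shelters_toy = pd.DataFrame({
    "shelter_name": ["Shelter A", "Shelter B", "Shelter C"],
    "park": ["Park One", "Park Two", "Park One"],
})

# Hand-built lookup from park to neighborhood (made-up values).
park_to_neighborhood = {
    "Park One": "Neighborhood X",
    "Park Two": "Neighborhood Y",
}

# Series.map looks up each park in the dictionary.
shelters_toy["neighborhood"] = shelters_toy["park"].map(park_to_neighborhood)

# Parks missing from the lookup come back as NaN and can be reviewed by hand.
unmatched = shelters_toy[shelters_toy["neighborhood"].isna()]
print(shelters_toy)
print(f"Rows still missing a neighborhood: {len(unmatched)}")
```

Checking `unmatched` before saving the file is a cheap way to catch parks that were left out of the lookup.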

In [4]:
#NOTE: each of these datasets has a column called "neighborhood"
playground_equip = pd.read_csv("Playground_Equiptment.csv")  
pools = pd.read_csv("Pools.csv")
shel = pd.read_csv("shelters_with_neighborhood_locations.csv")

Below, we count the entries in each dataset (playground equipment, pools, and shelters) by neighborhood.

We used these pandas methods:
`.groupby()` groups the rows by neighborhood,
`.size()` counts how many rows are in each group,
`.rename()` sets the name of the resulting Series.
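
Here is the same chain of methods on a tiny made-up table, so the result is easy to check by eye. The `name` column and its values are invented for this example; only the `neighborhood` column mirrors our real data.

```python
import pandas as pd

# Made-up rows: one row per piece of equipment.
toy = pd.DataFrame({
    "name": ["swing", "slide", "swing", "climber"],
    "neighborhood": ["Beechview", "Allentown", "Beechview", "Beechview"],
})

toy_counts = (toy
              .groupby("neighborhood")  # bucket the rows by neighborhood
              .size()                   # number of rows in each bucket
              .rename("play_count"))    # label the resulting Series

print(toy_counts)
```

Beechview has three rows and Allentown has one, so the Series holds 1 for Allentown and 3 for Beechview.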

In [5]:
# 2a. playground equipment
# Group the playground equipment entries by neighborhood and count the rows in each group,
# i.e. how many pieces of playground equipment each neighborhood has.
pge_counts = (playground_equip
              .groupby("neighborhood")
              .size()
              .rename("play_count"))

print(f"Playground equipment count:\n{pge_counts} \n")


# 2b. pools
pool_counts = (pools
               .groupby("neighborhood")
               .size()
               .rename("pool_count"))

print(f"pool count:\n{pool_counts} \n")

# 2c. shelters 
shel_counts = (shel
               .groupby("neighborhood")
               .size()
               .rename("shelter_count"))
print(f"shelter count:\n{shel_counts} \n")

Playground equipment count:
neighborhood
Allegheny Center       15
Allentown               4
Banksville              6
Bedford Dwellings       3
Beechview              21
                       ..
Upper Lawrenceville     5
West End                3
West Oakland            1
Westwood                2
Windgap                 5
Name: play_count, Length: 68, dtype: int64 

pool count:
neighborhood
Allegheny Center            1
Banksville                  1
Bedford Dwellings           1
Beechview                   1
Beltzhoover                 1
Bloomfield                  1
Brighton Heights            1
Brookline                   1
Carrick                     1
East Hills                  1
Greenfield                  1
Hazelwood                   1
Highland Park               2
Homewood South              1
Lincoln Place               1
Lincoln-Lemington-Belmar    1
Mount Washington            1
Perry North                 1
Polish Hill                 1
Shadyside                   1
She

Now that we have our counts, how do we combine them?

The code below creates a list of **all** the neighborhoods that appear in any of our three datasets. 
It uses the `.index.union()` method to combine the index values from each count Series. 
Then, it initializes an empty DataFrame with this full set of neighborhoods as the index, preparing for future merging or data alignment.
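
As a small illustration of what the union does, the sketch below uses two made-up count Series (`a` and `b` are placeholder names) that share only one neighborhood.

```python
import pandas as pd

# Two toy count Series with partly different neighborhoods.
a = pd.Series([15, 4], index=["Allegheny Center", "Allentown"], name="play_count")
b = pd.Series([1, 1], index=["Allegheny Center", "Banksville"], name="pool_count")

# The union keeps every label that appears in either index, without duplicates.
combined_index = a.index.union(b.index)

# A DataFrame with an index but no columns yet.
empty = pd.DataFrame(index=combined_index)

print(list(combined_index))
print(empty.shape)
```

Allegheny Center appears in both Series but only once in the union, and the DataFrame has three rows and zero columns until counts are added.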


In [6]:

all_neighborhoods = pge_counts.index.union(pool_counts.index).union(shel_counts.index)
df = pd.DataFrame(index=all_neighborhoods)

print(df)

Empty DataFrame
Columns: []
Index: [Allegheny Center, Allentown, Banksville, Bedford Dwellings, Beechview, Beltzhoover, Bloomfield, Bluff, Bon Air, Brighton Heights, Brookline, Carrick, Central Lawrenceville, Central Northside, Central Oakland, Crafton Heights, Crawford-Roberts, Duquesne Heights, East Allegheny, East Carnegie, East Hills, East Liberty, Elliott, Esplen, Fairywood, Fineview, Fox Chapel, Garfield, Greenfield, Hays, Hazelwood, Highland Park, Homewood North, Homewood South, Homewood West, Larimer, Lincoln Place, Lincoln-Lemington-Belmar, Lincoln–Lemington–Belmar, Lower Lawrenceville, Manchester, Marshall-Shadeland, Middle Hill, Morningside, Mount Washington, New Homestead, Oakland, Oakwood, Perry North, Perry South, Point Breeze, Point Breeze North, Polish Hill, Regent Square, Shadyside, Sheraden, South Oakland, South Side Flats, South Side Slopes, Spring Garden, Spring Hill-City View, Squirrel Hill, Squirrel Hill North, Squirrel Hill South, Stanton Heights, Strip District,

As we can see, we now have a list of every neighborhood that appears in at least one of the three datasets, not just the ones they have in common.
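
One thing the printed index reveals is that "Lincoln-Lemington-Belmar" and "Lincoln–Lemington–Belmar" are listed separately, apparently because one spelling uses hyphens and the other uses en dashes. Left alone, that splits one neighborhood's points across two rows. A minimal sketch of cleaning the names before counting is below; `normalize_neighborhood` is a hypothetical helper, and the toy rows are made up.

```python
import pandas as pd


def normalize_neighborhood(names: pd.Series) -> pd.Series:
    """Replace en dashes with hyphens and trim surrounding whitespace."""
    return names.str.replace("\u2013", "-", regex=False).str.strip()


toy = pd.DataFrame({"neighborhood": [
    "Lincoln-Lemington-Belmar",
    "Lincoln\u2013Lemington\u2013Belmar",  # same place, typed with en dashes
    " Allentown",                           # stray leading space
]})

toy["neighborhood"] = normalize_neighborhood(toy["neighborhood"])
normalized_counts = toy.groupby("neighborhood").size()
print(normalized_counts)
```

Applying a step like this to each dataset right after `pd.read_csv` would make the two spellings land in the same group.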

In the code below, we add our count values into the dataframe.

For example, the first line of code creates a new column in our DataFrame called "play_count".
It takes pge_counts (the count of playground equipment per neighborhood) and reindexes it to match the full list of neighborhoods (all_neighborhoods), filling in 0 for any neighborhood that has no playground equipment.
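
To show what reindexing with a fill value does, here is a toy version with made-up counts. The names `pool_toy`, `full_index`, and `aligned` exist only in this example.

```python
import pandas as pd

# Toy pool counts that cover only two of three neighborhoods.
pool_toy = pd.Series([1, 2], index=["Banksville", "Highland Park"], name="pool_count")
full_index = pd.Index(["Allentown", "Banksville", "Highland Park"])

# Labels missing from pool_toy get the fill value instead of NaN.
aligned = pool_toy.reindex(full_index, fill_value=0)
print(aligned)
```

Allentown is not in `pool_toy`, so it gets 0. Using `fill_value=0` also keeps the column as integers, whereas a plain `reindex` would insert NaN and turn the counts into floats.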


In [8]:
# 4. Add the counts into the DataFrame
df["play_count"] = pge_counts.reindex(all_neighborhoods, fill_value=0)
df["pool_count"] = pool_counts.reindex(all_neighborhoods, fill_value=0)
df["shelter_count"] = shel_counts.reindex(all_neighborhoods, fill_value=0)

print(df["play_count"])

neighborhood
Allegheny Center       15
Allentown               4
Banksville              6
Bedford Dwellings       3
Beechview              21
                       ..
Upper Lawrenceville     5
West End                3
West Oakland            1
Westwood                2
Windgap                 5
Name: play_count, Length: 75, dtype: int64


In [None]:
# 5. Compute the weighted score
WEIGHTS = {
    "pool_count": 10,
    "shelter_count": 5,
    "play_count": 1
}
df["score"] = (
    df["pool_count"] * WEIGHTS["pool_count"] +
    df["shelter_count"] * WEIGHTS["shelter_count"] +
    df["play_count"] * WEIGHTS["play_count"]
)



In [None]:
# 6. Sort and display top neighborhoods
top = df.sort_values("score", ascending=False)
print(top.head(10))

# 7. Optional: Plot the top 10 neighborhoods
fig, ax = plt.subplots(figsize=(10, 5))
top["score"].head(10).plot(kind="bar", ax=ax)
ax.set_ylabel("Total Points")
ax.set_title("Top 10 Neighborhoods by Amenities Score")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
