# Best Neighborhood


## Metric: Population Density 

Everyone knows our friendly neighborhood Spider-Man, but he can't do his job if there is no one to help, or if there are too many people to help at once. That is why my metric is the population density of each neighborhood: it shows which neighborhood Spider-Man would be most comfortable in.

In [None]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [None]:
#Dataset for Area and Size of the Neighborhoods

url2 = "https://data.wprdc.org/datastore/dump/668d7238-cfd2-492e-b397-51a6e74182ff"
area = pd.read_csv(url2)

In [None]:
#Dataset for Population 

url3 = "https://data.wprdc.org/dataset/5b18c198-474c-4723-b735-cc5220ad43cc/resource/82f29015-6905-4b1c-8300-afe9bb2231b3/download/total-population.csv"
population = pd.read_csv(url3)

In [None]:
#Find out what data is in the datasets

for population_data in population:
    print(population_data)

In [None]:
for neighborhood in area:
    print(neighborhood)

In [None]:
#Create two new datasets by taking only the Population and Area of the Neighborhoods

populationNew = population[['Neighborhood', 'Estimate; Total']].rename(columns = {'Estimate; Total':'Population'})
print(populationNew)

In [None]:
pittNeighborhoods = pd.DataFrame(area, columns = ['hood', 'area'])

#Dividing the Area by 10000000 to rescale it into larger units, so the densities are easier to read
pittNeighborhoods['area'] = (pittNeighborhoods['area']/10000000).round(5)
print(pittNeighborhoods)


In [None]:
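#Renaming the columns so the two datasets share a 'Neighborhood' column to merge on

pittNeighborhoods.rename(columns = {'hood':'Neighborhood', 'area':'Area'}, inplace = True)
#Inner merge: neighborhoods whose names do not appear in both datasets are dropped
finalPop = pd.merge(pittNeighborhoods, populationNew, on = 'Neighborhood')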
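
Because an inner merge quietly drops any neighborhood whose name is spelled differently in the two datasets, it can be worth checking for mismatches first. The sketch below uses made-up neighborhood names and values (not the WPRDC data) and pandas' `indicator` option to list rows that would not survive the merge.

```python
import pandas as pd

# Toy data: hypothetical names and values, only to illustrate the check
areas = pd.DataFrame({
    "Neighborhood": ["Alpha Hill", "Beta Park", "Gamma Hts"],
    "Area": [4.0, 8.0, 0.9],
})
pops = pd.DataFrame({
    "Neighborhood": ["Alpha Hill", "Beta Park", "Gamma Heights"],
    "Population": [8000, 12000, 500],
})

# An outer merge with indicator=True keeps every row and records its source
check = pd.merge(areas, pops, on="Neighborhood", how="outer", indicator=True)

# Anything not marked "both" would be dropped by the default inner merge
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Neighborhood", "_merge"]])
```

If `unmatched` is empty for the real datasets, the inner merge loses nothing.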


In [None]:
#Find the Population Density by dividing the population by the area

finalPop['Density'] = (finalPop['Population']/finalPop['Area']).round(5)
print(finalPop)
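
As a small worked example of the density calculation, here is a sketch on made-up numbers (the names and values are hypothetical). It also shows one possible way to handle an area of zero, which could in principle appear after rounding a very small area: pandas returns `inf` for that division, so the sketch replaces it with `NaN`.

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical values, not taken from the real datasets
toy = pd.DataFrame({
    "Neighborhood": ["Alpha Hill", "Beta Park", "Gamma Heights"],
    "Population": [8000, 12000, 500],
    "Area": [4.0, 8.0, 0.0],  # the zero stands in for a tiny area rounded down
})

# Density = Population / Area; a zero area gives inf, which we mask as NaN
toy["Density"] = (
    (toy["Population"] / toy["Area"])
    .replace([np.inf, -np.inf], np.nan)
    .round(5)
)
print(toy)
```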

In [None]:
#Graphing the dataset by Density (a tall figure so that every neighborhood label fits)

finalPop.plot.barh(x = 'Neighborhood', y = 'Density', figsize = (12,24))
plt.title("Population Density of Pittsburgh Neighborhoods") 
plt.xlabel("Population Density")
plt.ylabel("Pittsburgh Neighborhoods")

plt.yticks(rotation=30, horizontalalignment="center")

Pittsburgh has about 90 neighborhoods, so we need a way to see the population densities at a glance without reading through every bar. Let's use a heatmap-style map, with each neighborhood shaded by its density, to visualize this better.

In [None]:


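#Load the neighborhood boundaries and color each neighborhood by its density
#Note: this assumes the rows of population.geojson line up with the rows of finalPop
neighborhoodMap = gpd.read_file("population.geojson")
geo_df = finalPop["Density"]
neighborhoodMap.plot(column = geo_df, figsize = (12,12), legend = True, cmap = "winter").set_axis_off()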

Some places clearly stand out with the highest population density. However, we must be picky about the best neighborhood. We can't put Spider-Man somewhere with more people than he can handle at once, but we also can't put him somewhere with no one to help. To solve this, let's narrow down the neighborhoods by how close they are to the median of all the population densities.

In [None]:
#Creating a box plot to see the median and spread of the population densities among the neighborhoods

finalPop.plot.box(y = 'Density')

In [None]:
#Finding out the median and upper and lower quartiles of the box plot

median = finalPop['Density'].median()
q1 = finalPop['Density'].quantile(q = 0.25)
q3 = finalPop['Density'].quantile(q = 0.75)

print(median)
print(q1)
print(q3)

We now know the median and the lower and upper quartiles of the population densities. Let's use the lower and upper quartiles as the lower and upper boundaries to narrow down the neighborhoods.
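
As a quick illustration of filtering between the quartiles, here is a minimal sketch on made-up densities. It uses `Series.between`, which is one possible alternative to chaining two comparisons; the numbers are hypothetical.

```python
import pandas as pd

# Toy densities: hypothetical values, only to illustrate the filter
density = pd.Series([500, 1200, 1900, 2000, 2100, 3500, 9000], name="Density")

q1 = density.quantile(0.25)  # lower quartile
q3 = density.quantile(0.75)  # upper quartile

# between() is inclusive on both ends by default
middle = density[density.between(q1, q3)]
print(q1, q3)
print(middle.tolist())
```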

In [None]:
#Filtering the dataset to keep only the neighborhoods between the lower and upper quartiles

realFinal = finalPop[finalPop['Density'] <=q3]
srsFinal = realFinal[realFinal['Density']>=q1]

In [None]:
print(srsFinal)

Wow, there are still so many, which makes sense: by definition, about half of the data lies between the lower and upper quartiles. Let's narrow the boundaries to within about 100 of the median.

In [None]:
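#Keeping only the neighborhoods within about 100 of the median
#(1900 and 2100 are hardcoded, assuming the median printed above is close to 2000)
thinkFinal = realFinal[realFinal['Density']>= 1900]
notFinal = thinkFinal[thinkFinal['Density']<=2100]
print(notFinal)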

This narrowed down the results dramatically to 7 results. Let's plot these neighborhoods and see which one is the closest to the median. 

In [None]:


notFinal.plot.barh(x = 'Neighborhood', y = 'Density')
plt.title("Population Density of Pittsburgh Neighborhoods") 
plt.xlabel("Population Density")
plt.ylabel("Pittsburgh Neighborhoods")

plt.axvline(x = median, color = 'k')

The closest to the median is Beechview. We could end it here; however, it feels a little too easy. There were a lot of outliers in the box plot, and the median barely reflects them. Let's use the mean of the population densities instead, since it takes every neighborhood into account, outliers included.
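
To see how outliers affect the two statistics differently, here is a small sketch on made-up densities (the values are hypothetical). Adding one very dense neighborhood moves the mean a lot but the median only slightly.

```python
import pandas as pd

# Toy densities: hypothetical values, only to compare mean and median
typical = pd.Series([1800, 1900, 2000, 2100, 2200])

# The same data with one very dense outlier added
with_outlier = pd.concat([typical, pd.Series([12000])], ignore_index=True)

print("without outlier:", typical.mean(), typical.median())
print("with outlier:   ", round(with_outlier.mean(), 2), with_outlier.median())
```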

In [None]:
#Finding the average of all of the population densities

average = finalPop['Density'].mean()
print(average)

Just like before, we can keep the neighborhoods within 100 of the average, then plot them to see which neighborhood is the closest to the average.
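
As an alternative to reading the closest neighborhood off a plot, one could rank neighborhoods by their distance from the average. This is only a sketch on made-up names and values, not the method used in this notebook.

```python
import pandas as pd

# Toy data: hypothetical names and densities
toy = pd.DataFrame({
    "Neighborhood": ["Alpha Hill", "Beta Park", "Gamma Heights", "Delta Row", "Epsilon Flats"],
    "Density": [950.0, 1880.0, 2040.0, 2600.0, 7200.0],
})

average = toy["Density"].mean()

# Absolute distance of each neighborhood's density from the average
distance = (toy["Density"] - average).abs()

# idxmin() gives the row label with the smallest distance
closest = toy.loc[distance.idxmin(), "Neighborhood"]
print(average, closest)
```

Unlike a fixed window of 100 on either side, this always returns a neighborhood, even when none falls inside the window.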

In [None]:
#Filtering neighborhoods closest to the average

averageFinal = finalPop[finalPop['Density'] <= (average+100)]
averFinal = averageFinal[averageFinal['Density']>=(average-100)]



In [None]:
averFinal.plot.barh(x = 'Neighborhood', y = 'Density')
plt.title("Population Density of Pittsburgh Neighborhoods") 
plt.xlabel("Population Density")
plt.ylabel("Pittsburgh Neighborhoods")
plt.axvline(x = average, color = 'k')

We can see that Brookline is the closest to the average, making it the best neighborhood for Spider-Man, since the average accounts for the densities of all the neighborhoods.