## **Best Neighborhood in Pittsburgh**

## Group Name: Pes (Paul and Wes)

#### **Introduction**

To find the best neighborhood in Pittsburgh, we decided to look for the best neighborhood to raise a child in. We considered what parents want in a neighborhood for their child and chose three metrics: the number of parks, the number of playgrounds, and the number of kids enrolled in school. Because these metrics are not directly related to one another, looking at all three gives a broader picture of what the best neighborhood in Pittsburgh is.

Data Metrics:

- Parks.csv (Parks Data)
- Playgrounds.csv (Playgrounds Data)
- neighborhood_enrollment.csv (Kids Enrolled in School Data)
- population-density.csv (Population by Neighborhood Data, from 2010)

In [None]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
df = pd.read_csv("Parks.csv", index_col="id")

In [None]:
# Read Playgrounds.csv into a DataFrame indexed by id
playgrounds = pd.read_csv("Playgrounds.csv", index_col="id")
# Count playgrounds per neighborhood with value_counts(), then plot; the returned Axes is kept as ax for axis labeling
Playgrounds = playgrounds['neighborhood'].value_counts().sort_values()
ax = Playgrounds.plot(kind='barh', rot=0, figsize=(15,35))
ax.set_xlabel("Number of Playgrounds")
ax.set_ylabel("Name of Neighborhood")
plt.title("Playgrounds in Neighborhoods in Pittsburgh")

In [None]:
y = df['type'].value_counts().sort_values()
y.plot(kind='bar', rot=0, figsize=(15,20))

In [None]:
Parks = df['neighborhood'].value_counts().sort_values(ascending = False)

In [None]:
Parks.plot(kind='barh', rot=0, figsize=(15,35))

In [None]:
Parks.head(3)

In [None]:
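# Join park and playground counts on neighborhood name (inner join: only neighborhoods with both).
# The series are renamed explicitly so the column names do not depend on the pandas version.
df = pd.merge(Parks.rename('parks'), Playgrounds.rename('playgrounds'), right_index = True, left_index = True )
df['neighborhood'] = df['parks'] + df['playgrounds']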
total_parks = df['neighborhood']
print(total_parks)

In [None]:
total_parks.sort_values().tail(5)

## **Introduction**

To judge the best neighborhood for a kid to grow up in, one of the metrics we looked at was school enrollment. We looked at enrollment by neighborhood, reasoning that a neighborhood with more children enrolled in school would be "better."
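
The per-neighborhood totals described above can be sketched as follows. This is a minimal illustration, assuming the column names `neighborhood` and `total_students_enrolled` that the cells below use; the rows and the "Hood" names are made up for the example and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; the real data comes from neighborhood_enrollment.csv
sample = pd.DataFrame({
    "neighborhood": ["Hood A", "Hood B", "Hood A", "Hood C"],
    "total_students_enrolled": [120, 80, 30, 50],
})

# Sum enrollment over every row that belongs to the same neighborhood
enrollment_by_hood = sample.groupby("neighborhood")["total_students_enrolled"].sum()

# Rank neighborhoods from most to fewest enrolled students
ranked = enrollment_by_hood.sort_values(ascending=False)
print(ranked)
```

A `groupby` sum gives the same totals as accumulating into a dictionary row by row, and the result is already a Series that can be sorted or plotted directly.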

In [None]:
#import pandas as pd
#import numpy as np
#%matplotlib inline
#import matplotlib.pyplot as plt
neighborhoodEnroll = pd.read_csv("neighborhood_enrollment.csv")


neighborhoodEnroll = neighborhoodEnroll.iloc[:, [0,8]]
neighborhoodEnroll = neighborhoodEnroll.dropna()

In [None]:
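kidsSchools = {} # Maps neighborhood name -> total students enrolled
df = pd.DataFrame(neighborhoodEnroll)
for index, row in df.iterrows():
    try:
        # Add to the running total if the neighborhood has been seen before
        kidsSchools[row['neighborhood']] += int(row['total_students_enrolled'])
    except KeyError:
        # First row for this neighborhood: start its total
        kidsSchools[row['neighborhood']] = int(row['total_students_enrolled'])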

data = pd.DataFrame(kidsSchools, index=[0])
ax = data.plot.barh(rot=0, figsize=(15,35))
ax.set_xlabel('Enrollment')
ax.set_ylabel('Neighborhoods')
ax.invert_yaxis()
plt.title('Number of Kids Enrolled in School by Neighborhood')

In [None]:
topTen = data.max().nlargest(10)
topTen.plot.barh(rot=0, figsize=(5,10))
plt.title('Top Ten Neighborhoods by Kids Enrolled in School')
plt.xlabel('Enrollment')
plt.ylabel('Neighborhood')

In [None]:
bottomTen = data.min().nsmallest(10)
bottomTen.plot.barh(rot=0, figsize=(5,10))
plt.title('Bottom Ten Neighborhoods by Kids Enrolled in School')
plt.xlabel('Enrollment')
plt.ylabel('Neighborhood')

In [None]:
import pandas as pd
import geopandas
%matplotlib inline

neighborhoods = geopandas.read_file('Neighborhoods_.shp')

In [None]:
school_enroll = neighborhoodEnroll.groupby('neighborhood').sum()['total_students_enrolled']
school_enroll.sort_values(ascending=False)

schools_map = neighborhoods.merge(school_enroll, how='left', left_on='hood', right_on='neighborhood')
schools_map[['hood','total_students_enrolled']].head()

In [None]:
schools_map.plot(column='total_students_enrolled', # set the data to be used for coloring
               cmap='OrRd',              # choose a color palette
               edgecolor="black",        # outline the neighborhoods in black
               legend=True,              # show the legend
               legend_kwds={'label': "Total Students Enrolled"}, # label the legend
               figsize=(15, 10),         # set the size
               missing_kwds={"color": "lightgrey"} # set neighborhoods with no data to gray
               )

In [None]:
populations = pd.read_csv("population-density.csv")
populations = populations.iloc[:, [0,9]]
df = pd.DataFrame(populations)

pop_data = df.set_index('Neighborhood').T.to_dict('records')[0]
for key in pop_data:
    pop_data[key] = pop_data[key].replace(',', '')


In [None]:
school_percents = {}
for key in pop_data:
    # Only neighborhoods that appear in both the population and enrollment data
    if key in kidsSchools:
        school_percents[key] = (kidsSchools[key] / int(pop_data[key])) * 100


In [None]:
s = pd.DataFrame(school_percents, index=[0])
topTenPers = s.max().nlargest(10)
topTenPers.plot.barh(rot=0, figsize=(5,10))
plt.title('Top Ten Neighborhoods by Percentage of Kids Enrolled in School')
plt.xlabel('Percentage')
plt.ylabel('Neighborhood')

In [None]:
school_series = pd.DataFrame(school_percents.items(), columns=['Neighborhood', 'Percent'])
school_percentages = school_series.groupby('Neighborhood').sum()['Percent']
s = school_percentages.sort_values(ascending=False)

percentages_map = neighborhoods.merge(school_percentages, how='left', left_on='hood', right_on='Neighborhood')
percentages_map[['hood','Percent']].head()

In [None]:
percentages_map.plot(column='Percent', # set the data to be used for coloring
               cmap='OrRd',              # choose a color palette
               edgecolor="black",        # outline the neighborhoods in black
               legend=True,              # show the legend
               legend_kwds={'label': "Percent of People in Neighborhood Enrolled in School"}, # label the legend
               figsize=(15, 10),         # set the size
               missing_kwds={"color": "lightgrey"} # set neighborhoods with no data to gray
               )

## **Combination**

To combine our data, we computed the ratio of students enrolled in school to the total number of parks and playgrounds in each neighborhood. We decided the lowest ratio would be best because it means there are more parks and playgrounds per kid.
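
As a rough sketch of this ratio, the example below divides enrollment by the combined park and playground count for neighborhoods that appear in both series. The counts and the "Hood" names are hypothetical values chosen for illustration, and the 200-student cutoff mirrors the filter in the notebook's ratio cell.

```python
import pandas as pd

# Hypothetical counts for illustration only (not taken from the datasets)
enrolled = pd.Series({"Hood A": 600, "Hood B": 300, "Hood C": 150})
parks_and_playgrounds = pd.Series({"Hood A": 6, "Hood B": 2, "Hood D": 4})

# Align the two series on neighborhood name, keeping only neighborhoods present in both
combined = pd.concat(
    [enrolled.rename("enrolled"), parks_and_playgrounds.rename("parks")],
    axis=1,
    join="inner",
)

# Skip very small neighborhoods, using the same cutoff as the notebook (> 200 students)
combined = combined[combined["enrolled"] > 200]

# Students per park or playground; lower means more play space per kid
combined["ratio"] = combined["enrolled"] / combined["parks"]
best = combined["ratio"].sort_values()
print(best)
```

Here "Hood A" ranks first: it has more students than "Hood B", but also proportionally more parks and playgrounds, so its ratio is lower.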

In [None]:
enroll_data = pd.Series(kidsSchools)
total_Parks = pd.Series(total_parks)

In [None]:
ratio = {}
for index, value in enroll_data.items():  # Series.iteritems() was removed in pandas 2.0
    # Only consider neighborhoods that have park data and more than 200 enrolled students
    if index in total_Parks and value > 200:
        ratio[index] = float(value / total_Parks[index])


In [None]:
series = pd.Series(ratio)
series.sort_values().head()

In [None]:
df = series.to_frame()
new_column_names = ['Ratio']
df.columns = new_column_names
lowest = df.sort_values(by='Ratio').head(10)
plt.figure(figsize=(35,20))  # create the figure first so the size applies to this chart
plt.scatter(lowest.index, lowest['Ratio'])
plt.xticks(rotation = 90)

## **Conclusion**

### The Best Neighborhood in Pittsburgh is: Highland Park

Paul: As someone who is not from the Pittsburgh area and has not yet been around all of Pittsburgh, I do not know anything about Highland Park. However, after looking at pictures online, it does look like a very nice neighborhood, so I can understand why our data led us to it. Based on these pictures, I do agree that Highland Park is the best neighborhood in Pittsburgh.