## **School Enrollment**

This metric looks for the neighborhood with the highest school enrollment. I define *bestness* by the idea that a bigger social network is better. Put simply, the more people there are in school, the more friends you can make in school.

In [None]:
import pandas as pd
import numpy as np
import geopandas
%matplotlib inline
import matplotlib.pyplot as plt

enroll = pd.read_csv("school-enrollment.csv")

This imports third-party libraries for data work in Python and uses pandas to read the 2015 Pittsburgh public and private school enrollment data, published by the Western Pennsylvania Regional Data Center (WPRDC), from a local CSV file.

In [None]:
# Drop the ID column and every margin-of-error column; only the estimates are used below.
moe_cols = [col for col in enroll.columns if col.startswith('Margin of Error')]
enroll = enroll.drop(columns=['Id'] + moe_cols)


Removing the ID and margin-of-error columns leaves only the estimate columns, which makes the data *a lot* easier to work with in Python.

In [None]:
# Map each neighborhood name to its total school enrollment estimate.
enroll_dict = dict(zip(enroll['Neighborhood'], enroll['Estimate; Enrolled in school:']))
enroll_dict

To simplify working with the data, I store each neighborhood's total school enrollment in a dictionary, with the neighborhood name as the key.
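As a rough sketch of an alternative, the same name-to-value mapping can be built by indexing the table by neighborhood and selecting the enrollment column. The neighborhood names and numbers below are invented for illustration and are not values from the WPRDC file.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned enrollment table.
toy = pd.DataFrame({
    "Neighborhood": ["Alpha", "Beta", "Gamma"],
    "Estimate; Enrolled in school:": [40.0, 90.0, 15.0],
})

# Index by neighborhood, then pull out the enrollment column as a Series.
enrolled = toy.set_index("Neighborhood")["Estimate; Enrolled in school:"]

# A Series converts directly to a name -> value dictionary.
toy_dict = enrolled.to_dict()
print(toy_dict)
```

Keeping the values in a Series skips the later dictionary-to-Series conversion, though the dictionary is handy when only lookups are needed.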

In [None]:
data_graph = pd.Series(enroll_dict).sort_values(ascending=False)

data_graph.plot.barh(color = ["palevioletred"], #https://matplotlib.org/stable/gallery/color/named_colors.html
                     figsize=(30,25))

plt.title("Neighborhood School Enrollment")
plt.xlabel('Total Enrollment')

Converting the dictionary of enrollment data to a Series lets us use the plotting methods that pandas provides, here **plot.barh()**. The result is a horizontal bar graph of the values sorted in descending order, which makes the neighborhood with the highest school enrollment easy to spot.
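The ranking can also be read off numerically rather than from the chart. This is a minimal sketch on made-up totals (the names and values are assumptions, not the real data), using the pandas Series methods `sort_values`, `idxmax`, and `nlargest`.

```python
import pandas as pd

# Hypothetical enrollment totals; not real WPRDC values.
totals = pd.Series({"Alpha": 40.0, "Beta": 90.0, "Gamma": 15.0})

# Same descending sort used for the bar graph.
ranked = totals.sort_values(ascending=False)

top_name = ranked.idxmax()    # label of the largest value
top_two = ranked.nlargest(2)  # the two largest values, largest first

print(top_name)
print(top_two)
```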


In [None]:
avg_dict = {"Average Enrollment" : data_graph.mean(),
           "North Oakland" : data_graph.max()}

avg_compare = pd.Series(avg_dict)
avg_compare.plot.bar(color = ["lightslategrey","palevioletred"],
                     rot=0,
                    figsize=(4.5,5))

This simple vertical bar graph emphasizes how far North Oakland's school enrollment exceeds the average across neighborhoods.
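To put a number on that gap, the maximum can be divided by the mean. The sketch below uses invented totals chosen so the arithmetic is easy to follow; it does not reproduce the actual North Oakland figures.

```python
import pandas as pd

# Hypothetical enrollment totals; values chosen so the mean is a round number.
totals = pd.Series({"Alpha": 40.0, "Beta": 90.0, "Gamma": 20.0})

mean_total = totals.mean()         # (40 + 90 + 20) / 3
ratio = totals.max() / mean_total  # how many times larger the top value is

print(f"top neighborhood is {ratio:.1f}x the average")
```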

<img src="https://i.imgur.com/m5e4ki2.jpg" width="800">
<i> Credit: Aimee Obidzinski </i>


**North Oakland is the clear winner in terms of total school enrollment.** But what does the actual makeup of that total look like?

### Breaking down North Oakland:


In [None]:
enroll = pd.read_csv("school-enrollment.csv",
                    index_col = 'Neighborhood')

# Drop the ID column, every margin-of-error column, and the three summary totals,
# leaving one estimate column per school level.
moe_cols = [col for col in enroll.columns if col.startswith('Margin of Error')]
enroll = enroll.drop(columns=['Id'] + moe_cols + ['Estimate; Enrolled in school:',
                                                  'Estimate; Total:',
                                                  'Estimate; Not enrolled in school'])

# Select North Oakland as a one-row DataFrame; copy it so that deleting columns
# later does not act on a view of the full table.
NorthOak = enroll.loc[['North Oakland']].copy()
NorthOak.plot.bar(rot=0, 
                  figsize = (15,10))

Here, I reread the CSV file, this time indexed by neighborhood, and again delete the columns that make the data hard to work with, along with the three summary totals. A new variable then holds only the North Oakland row and nothing else. In the resulting chart, undergraduate and graduate students overwhelmingly dominate every other school level.
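One way to quantify that dominance is to compute the college share of a neighborhood's enrollment. This is only a sketch: the row label and all counts below are made up, and only the column names are taken from the dataset.

```python
import pandas as pd

undergrad = "Estimate; Enrolled in school: - Enrolled in college, undergraduate years"
grad = "Estimate; Enrolled in school: - Graduate or professional school"

# Hypothetical one-row table shaped like the North Oakland selection.
row = pd.DataFrame(
    {
        "Estimate; Enrolled in school: - Enrolled in grade 3": [30.0],
        "Estimate; Enrolled in school: - Enrolled in grade 9": [20.0],
        undergrad: [800.0],
        grad: [150.0],
    },
    index=["Alpha"],
)

# squeeze() turns the one-row DataFrame into a Series indexed by column name.
counts = row.squeeze()

# Fraction of all enrolled students who are undergraduates or graduate students.
college_share = counts[[undergrad, grad]].sum() / counts.sum()
print(f"{college_share:.0%} of enrolled students are in college")
```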

<img src="https://i.imgur.com/6TllGy2.png" width="700">

<i> Credit: Google Maps </i>

The dominance of college students is most likely explained by North Oakland's position as the centerpiece of Pittsburgh. Not only is Pitt within North Oakland, but the neighborhood is also surrounded by multiple colleges with multiple campus living spaces.

<img src="https://i.imgur.com/8iPlxk6.png" width="700">

<i> Credit: Google Maps </i>

In [None]:
del NorthOak['Estimate; Enrolled in school: - Enrolled in college, undergraduate years']
del NorthOak['Estimate; Enrolled in school: - Graduate or professional school']

To compare the remaining data properly, I remove the undergraduate and graduate columns. This lets me zoom in on the rest of the data that was overshadowed.

In [None]:
NorthOak.squeeze().plot.pie(figsize = (10,10), 
                            colormap = 'rainbow', #https://matplotlib.org/2.0.2/examples/color/colormaps_reference.html
                            ylabel = '')

This pie chart shows that if you live in North Oakland and are not attending college, the best grade level within North Oakland is third grade, since it makes up the largest slice.
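The largest slice can also be found without reading the chart. The following sketch drops the two college columns from a made-up set of counts and computes each remaining level's share; the numbers are assumptions for illustration only.

```python
import pandas as pd

undergrad = "Estimate; Enrolled in school: - Enrolled in college, undergraduate years"
grad = "Estimate; Enrolled in school: - Graduate or professional school"

# Hypothetical counts per school level; not the real North Oakland values.
counts = pd.Series({
    "Estimate; Enrolled in school: - Enrolled in kindergarten": 0.0,
    "Estimate; Enrolled in school: - Enrolled in grade 3": 30.0,
    "Estimate; Enrolled in school: - Enrolled in grade 9": 20.0,
    undergrad: 800.0,
    grad: 150.0,
})

# Remove the college levels, then express each remaining level as a fraction.
non_college = counts.drop(labels=[undergrad, grad])
shares = non_college / non_college.sum()

largest_level = shares.idxmax()  # column name of the biggest slice
print(largest_level, round(shares.max(), 2))
```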

In [None]:
NorthOak # show the raw values; columns with a 0.0 value get no visible slice in the pie chart

Surprisingly, North Oakland has 0 people enrolled in school at multiple grade levels (as of this 2015 data).
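A short sketch of how the empty grade levels could be listed explicitly, using boolean indexing on a Series. The level names and counts here are hypothetical placeholders rather than the actual table contents.

```python
import pandas as pd

# Hypothetical counts per grade level; invented for illustration.
counts = pd.Series({
    "kindergarten": 0.0,
    "grade 1": 12.0,
    "grade 2": 0.0,
    "grade 3": 30.0,
})

# Levels with no enrolled students, in their original order.
zero_levels = counts[counts == 0].index.tolist()

# Levels that would produce a visible slice in a pie chart.
nonzero = counts[counts > 0]

print(zero_levels)
```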

## Conclusion

North Oakland is the best neighborhood by the total school enrollment metric, and if you were to live in North Oakland, the best way to make friends is as an undergraduate in college! **😁**

...or as a third grader.