![](http://static1.squarespace.com/static/5576f1c0e4b08f31a497b582/t/5576fcf6e4b0a6d0afa92d0f/1527185421758/)

**ABOUT PASSNYC**

PASSNYC is a not-for-profit organization that facilitates a collective impact effort dedicated to broadening educational opportunities for New York City's talented and underserved students. New York City is home to some of the most impressive educational institutions in the world, yet in recent years, the City’s specialized high schools - institutions with historically transformative impact on student outcomes - have seen a shift toward more homogeneous student body demographics.

PASSNYC uses public data to identify students within New York City’s under-performing school districts and, through consulting and collaboration with partners, aims to increase the diversity of students taking the Specialized High School Admissions Test (SHSAT). By focusing efforts in under-performing areas that are historically underrepresented in SHSAT registration, we will help pave the path to specialized high schools for a more diverse group of students.

![](https://img.huffingtonpost.com/asset/5a9dc61c1f000052001693d6.jpeg?ops=scalefit_950_800_noupscale)

**PROBLEM STATEMENT**

PASSNYC and its partners provide outreach services that improve the chances of students taking the SHSAT and receiving placements in these specialized high schools. The current process of identifying schools is effective, but PASSNYC could have an even greater impact with a more informed, granular approach to quantifying the potential for outreach at a given school. Proxies that have been good indicators of these types of schools include data on English Language Learners, Students with Disabilities, Students on Free/Reduced Lunch, and Students with Temporary Housing.

Part of this challenge is to assess the needs of students by using publicly available data to quantify the challenges they face in taking the SHSAT. The best solutions will enable PASSNYC to identify the schools where minority and underserved students stand to gain the most from services like after school programs, test preparation, mentoring, or resources for parents. 

**What exactly is the SHSAT examination?**

![](https://kentprep.com/wp-content/uploads/2018/04/SHSAT_2018_header.jpg)

**The Specialized High Schools Admissions Test (SHSAT) is an examination administered to eighth and ninth grade students residing in New York City and used to determine admission to all but one of the city's nine Specialized High Schools.**

**Some key points on current student admissions to the city's nine Specialized High Schools**
*  Only 10 percent of New York City’s public school students who are black or Latino received offers to attend a specialized high school last year, even though 67 percent of New York public school students are black or Latino.
*  Asians make up 62 percent of students at specialized high schools and white students make up 24 percent, though only 16 percent of public school students are Asian and 15 percent are white.

Source: [The New York Times](https://www.nytimes.com/2018/06/21/nyregion/what-is-the-shsat-exam-and-why-does-it-matter.html)
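
To make the gap in the second bullet concrete, here is a minimal sketch that divides each group's share of specialized high school students by its share of all public school students. The `representation_ratio` helper is a hypothetical name introduced only for illustration; the percentages are the ones quoted above.

```python
# Shares quoted in the second bullet above (percent of students).
specialized_share = {"Asian": 62, "White": 24}
citywide_share = {"Asian": 16, "White": 15}


def representation_ratio(group_share, population_share):
    """Share inside specialized high schools divided by share citywide.

    Above 1 means over-represented, below 1 means under-represented.
    """
    return group_share / population_share


ratios = {
    group: representation_ratio(specialized_share[group], citywide_share[group])
    for group in specialized_share
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}x their citywide share")
```

A ratio of 1 would mean a group is represented in specialized high schools in proportion to its share of the public school population.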

**For the purpose of analysing the data, we will look at the following main factors:**
* The economic condition of the schools
* The racial distribution in the schools
* The academic performance of the students
* The location of the schools

**Let's first do some EDA on the datasets provided to us. Then, hopefully, we can use external data sources to help PASSNYC take a data-driven approach to increasing student diversity in NYC's Specialized High Schools.**

**First, import the necessary libraries**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import folium

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff
from plotly import tools
# plotly.plotly moved to the separate chart_studio package in plotly 4; it is not needed for the offline plots below

**Let's take a look at the data now**

In [None]:
SE = pd.read_csv("../input/data-science-for-good/2016 School Explorer.csv")
reg_test = pd.read_csv("../input/data-science-for-good/D5 SHSAT Registrations and Testers.csv")
fin_emp_cent = pd.read_csv("../input/nyc-financial-empowerment-centers/financial-empowerment-centers.csv")
pd.set_option('display.max_columns', None) 

In [None]:
SE.head(3)

In [None]:
reg_test.head()

**Displaying the first five schools on the map of NYC.**

In [None]:
CORDS = (40.767937, -73.982155)
map_1 = folium.Map(location=CORDS, zoom_start=13)
folium.Marker([40.721834, -73.978766], popup='Roberto Clemente').add_to(map_1)
folium.Marker([40.729892, -73.984231], popup='Asher Levy').add_to(map_1)
folium.Marker([40.721274, -73.986315], popup='Anna Silver').add_to(map_1)
folium.Marker([40.726147, -73.975043], popup='Franklin D. Roosevelt').add_to(map_1)
folium.Marker([40.724404, -73.986360], popup='The Star Academy').add_to(map_1)
display(map_1)

In [None]:
# Strip "$", "," and spaces in one pass, then convert to float (missing values stay NaN).
# The character class keeps "$" literal; a bare '$' pattern can be read as a regex end-of-string anchor.
SE['School Income Estimate'] = (
    SE['School Income Estimate']
    .str.replace(r'[$, ]', '', regex=True)
    .astype(float)
)
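
The cleaning step above can be sanity-checked on a few sample strings before it is applied to the whole column. The sketch below is one way to do that; the sample values are hypothetical, and `pd.to_numeric(..., errors="coerce")` is used here as a more forgiving alternative to `astype(float)`, not as the method this notebook relies on.

```python
import pandas as pd

# Hypothetical sample mimicking the raw "School Income Estimate" strings.
raw = pd.Series(["$31,141.72 ", "$56,462.88 ", None])

# Remove "$", "," and spaces in one pass, then convert to numbers.
# errors="coerce" turns anything still unparseable into NaN instead of raising.
clean = pd.to_numeric(raw.str.replace(r"[$, ]", "", regex=True), errors="coerce")

print(clean)
```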

**We will plot the schools of NYC colored by their Economic Need Index, with marker size scaled by School Income Estimate, to get a better overview of their financial situation**

In [None]:
data = [
    {
        'x': SE["Longitude"],
        'y': SE["Latitude"],
        'text': SE["School Name"],
        'mode': 'markers',
        'marker': {
            'color': SE["Economic Need Index"],
            'size': SE["School Income Estimate"]/4500,
            'showscale': True,
            'colorscale':'Portland'
        }
    }
]

layout= go.Layout(
    title= 'NYC Schools by Economic Need Index',
    xaxis= dict(
        title= 'Longitude'
    ),
    yaxis=dict(
        title='Latitude'
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='NYC_ECNEED_INDEX')

**Bright red marks the schools with the highest Economic Need Index and dark blue marks the ones with the lowest. Now that we have seen where the schools with the highest economic need lie, we will plot the Financial Empowerment Centres in NYC to understand where they are mostly located.**

***ROLE OF FINANCIAL EMPOWERMENT CENTRES IN CHILDREN'S EDUCATION:***

Millions of U.S. households struggle with large amounts of debt and a lack of information and resources to help them build their financial stability. This is a critical problem for cities – research shows that economically strong families are better able to weather economic shocks, contribute to and grow the local economy, and help their children succeed. By delivering one-on-one financial counseling as a free city service, the Financial Empowerment Center model offers cities a tangible strategy to help those most in need of critical one-on-one assistance and build community financial stability.

![](https://www.theadvocacynet.com/wp-content/uploads/2016/03/Extra_031.png)

In [None]:
fin_emp_cent.head()

In [None]:
data = [
    {
        'x': fin_emp_cent["Longitude"],
        'y': fin_emp_cent["Latitude"],
        'text': fin_emp_cent["Provider"],
        'mode': 'markers',
        'marker': {
            'showscale': False,
            'colorscale':'Jet',
            'size': 20
        }
    }
]

layout= go.Layout(
    title= 'Location of Different Financial Empowerment Centres in NYC',
    xaxis= dict(
        title= 'Longitude'
    ),
    yaxis=dict(
        title='Latitude'
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='FC_LOCATIONS')
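
Comparing the two scatter plots by eye only goes so far. One possible way to quantify how well the centres cover high-need schools is the great-circle distance from each school to its nearest centre. The sketch below uses the haversine formula; the helper names and the centre coordinates are hypothetical, while the two school coordinates come from the map cell earlier in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_centre_km(school, centres):
    """Distance from one school to the closest centre in the list."""
    return min(haversine_km(*school, *centre) for centre in centres)


# School coordinates from the folium map cell; centre coordinates are hypothetical.
schools = {
    "Roberto Clemente": (40.721834, -73.978766),
    "Asher Levy": (40.729892, -73.984231),
}
centres = [(40.7150, -73.9900), (40.8100, -73.9500)]  # hypothetical

for name, coords in schools.items():
    print(f"{name}: {nearest_centre_km(coords, centres):.2f} km to nearest centre")
```

With the real data, the same helper could be fed the `Latitude` and `Longitude` columns of `SE` and `fin_emp_cent`, assuming both are numeric.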

**Let's plot the locations of schools, color-coded by their respective Income Estimates**

In [None]:
data = [
    {
        'x': SE["Longitude"],
        'y': SE["Latitude"],
        'text': SE["School Name"],
        'mode': 'markers',
        'marker': {
            'color': SE["School Income Estimate"],
            'showscale': True,
            'colorscale':'Jet',
            'size': 10
        }
    }
]

layout= go.Layout(
    title= 'Location of NYC Schools based on their Income Estimates',
    xaxis= dict(
        title= 'Longitude'
    ),
    yaxis=dict(
        title='Latitude'
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='NYC_INCOME')

**Schools with low income estimates (in dark blue) fill most of the central region of our plot. Plotting the distribution of Income Estimates by City and by District may give us a clearer picture.**

In [None]:
# Two stacked box plots: income estimate by district, then by city
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 14))

sns.boxplot(x='District', y='School Income Estimate', data=SE, ax=ax1)
ax1.set_title('District vs Income Estimate')

sns.boxplot(x='City', y='School Income Estimate', data=SE, ax=ax2)
ax2.tick_params(axis='x', labelrotation=90)  # city names are long, so rotate them
ax2.set_title('City vs Income Estimate')

plt.tight_layout()
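
The box plots can be complemented with a small numeric summary. The sketch below computes the median income estimate and the number of schools with an estimate per district; the rows are hypothetical stand-ins for `SE`, which has the same two column names.

```python
import pandas as pd

# Hypothetical rows standing in for the School Explorer data.
sample = pd.DataFrame({
    "District": [1, 1, 2, 2, 2],
    "School Income Estimate": [30000.0, 50000.0, 80000.0, None, 100000.0],
})

# Median and number of non-missing estimates per district, lowest median first.
summary = (
    sample.groupby("District")["School Income Estimate"]
    .agg(["median", "count"])
    .sort_values("median")
)

print(summary)
```

Sorting by the median puts the districts with the lowest typical income estimate at the top, which is the ordering that matters for outreach.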

**Next, a bubble chart: color shows the Economic Need Index and bubble size is taken from the `Grade 3 ELA 4s - Economically Disadvantaged` column. This should give us an overview of which schools have the most economically challenged students by this measure.**

In [None]:
data = [
    {
        'x': SE["Longitude"],
        'y': SE["Latitude"],
        'text': SE["School Name"],
        'mode': 'markers',
        'marker': {
            'color': SE["Economic Need Index"],
            'size': SE["Grade 3 ELA 4s - Economically Disadvantaged"],
            'showscale': True,
            'colorscale':'Portland'
        }
    }
]

layout= go.Layout(
    title= 'Distribution of economically challenged students across NYC schools',
    xaxis= dict(
        title= 'Longitude'
    ),
    yaxis=dict(
        title='Latitude'
    )
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='NYC_ECNEED_INDEX')

**As can be seen, schools with a higher Economic Need Index (shown in red) tend to have more economically challenged students (shown by the size of the bubble)**
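
A visual impression like this can be backed up with a rank correlation between the two columns. The sketch below computes a Spearman correlation by ranking both columns and taking the Pearson correlation of the ranks; the five rows are hypothetical and only illustrate the calculation, so the result says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical values; the column names match the ones used in the plot above.
sample = pd.DataFrame({
    "Economic Need Index": [0.2, 0.4, 0.6, 0.8, 0.9],
    "Grade 3 ELA 4s - Economically Disadvantaged": [1, 3, 2, 6, 8],
})

# Spearman correlation = Pearson correlation of the ranks.
# +1 means the two columns rise together perfectly, 0 means no monotonic relation.
need_rank = sample["Economic Need Index"].rank()
size_rank = sample["Grade 3 ELA 4s - Economically Disadvantaged"].rank()
rho = need_rank.corr(size_rank)

print(f"Spearman correlation: {rho:.2f}")
```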

More financial analysis to come; this is just a start.