<h1>Introduction</h1>
**The problem**<br>
New York City's Specialized High Schools (SHS) have seen a shift toward more homogeneous student body demographics in recent years. In 2018, 4.1% of Black students and 6.3% of Hispanic students who took the Specialized High Schools Admissions Test (SHSAT) received an offer. 29.7% of offers were sent out to Asian students and 26.2% to White students who took the SHSAT. ([Christina Veiga, 2018](https://www.chalkbeat.org/posts/ny/2018/03/07/few-black-and-hispanic-students-receive-admissions-offers-to-new-york-citys-top-high-schools-again/))

**PASSNYC's Goal**<br>
To increase diversity in the specialized high schools by increasing the socioeconomic diversity of the students who take the SHSAT. To focus its efforts, PASSNYC wants to quantify the potential for outreach at a given school.

**How will we reach this goal?**<br>
Identify factors that may contribute to the low SHSAT registration numbers among Hispanic and Black students in New York City's school districts.

**What does the Economic Need Index mean?**<br>
According to the [Equity and Excellence for All: Diversity in New York City Public Schools](https://cdn-blob-prd.azureedge.net/prd-pws/docs/default-source/default-document-library/diversity-in-new-york-city-public-schools-english.pdf?sfvrsn=72d7c704_67) brochure from the New York City Department of Education, <br>
>"A school’s Economic Need is defined by its Economic Need Index (ENI), which determines the likelihood that students at the school are in poverty."

The higher a school's ENI (up to a maximum of 1.0, or 100%), the higher the likelihood that its students are in poverty.<br>

>"The ENI is calculated as follows:<br>
* If the student is HRA-eligible or living in temporary housing, the student’s Economic Need Value is 1.0.<br>
* For high school students, if the student has a home language other than English and entered the NYC DOE for the first time within the last four years, the student’s ENI value is 1.0.<br>
* Otherwise, the student’s Economic Need Value is based on the percentage of families (with school-age children) in the student’s census tract whose income is below the poverty level, as estimated by the American Community Survey 5-Year Estimate."<br>

> "The school’s Economic Need Index is the average of its students’ Economic Need Values."

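The rules quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted definition, not the DOE's actual implementation: the function names `student_need_value` and `school_eni`, the argument names, and the three sample students are all hypothetical.

```python
from statistics import mean

def student_need_value(hra_eligible, temporary_housing, high_school,
                       non_english_home_language, entered_within_four_years,
                       tract_poverty_rate):
    """Economic Need Value for one student, following the quoted rules."""
    # Rule 1: HRA-eligible or living in temporary housing
    if hra_eligible or temporary_housing:
        return 1.0
    # Rule 2: high school student, non-English home language, recent entry
    if high_school and non_english_home_language and entered_within_four_years:
        return 1.0
    # Rule 3: otherwise, the poverty rate of the student's census tract
    return tract_poverty_rate

def school_eni(students):
    """A school's ENI is the average of its students' Economic Need Values."""
    return mean([student_need_value(**s) for s in students])

# Three made-up students, one for each rule (illustrative values only)
students = [
    dict(hra_eligible=True, temporary_housing=False, high_school=False,
         non_english_home_language=False, entered_within_four_years=False,
         tract_poverty_rate=0.10),
    dict(hra_eligible=False, temporary_housing=False, high_school=True,
         non_english_home_language=True, entered_within_four_years=True,
         tract_poverty_rate=0.10),
    dict(hra_eligible=False, temporary_housing=False, high_school=False,
         non_english_home_language=False, entered_within_four_years=False,
         tract_poverty_rate=0.25),
]

eni = school_eni(students)  # (1.0 + 1.0 + 0.25) / 3
print(round(eni, 2))
```

Because two of the three hypothetical students receive a value of 1.0, the school's ENI ends up well above the poverty rate of any single census tract in the example.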
<br>**How is the estimated school income determined?**<br>
Since 2007, NYC has allocated money to its public schools through the Fair Student Funding formula. According to the [Fair Student Funding & School Budget Resource Guide FY 2018](https://www.scribd.com/document/383737013/Fair-Student-Funding-School-Budget-Resource-Guide-FY-2018), <br>
> "the Fair Student Funding (FSF) formula allocates dollars to schools through five basic categories:<br>
> 1. Foundation—a fixed sum of &#36;225,000 for all schools<br>
> 2. Grade weights, based on student grade levels<br>
> 3. Needs weights, based on student needs<br>
> 4. Enhanced weights for students in “portfolio” high schools<br>
> 5. Collective Bargaining related increases for staff funded with FSF"



<h2>The 5 Boroughs of New York City</h2>
Although I have visited New York City once (and plan to return), I am unfamiliar with the locations and borders of its boroughs, so a map is included here for reference.<br>
![Boroughs of NYC](https://upload.wikimedia.org/wikipedia/commons/3/34/5_Boroughs_Labels_New_York_City_Map.svg)
![Legend](https://i.imgur.com/kscGbYA.png)
Source: https://commons.wikimedia.org/wiki/File:5_Boroughs_Labels_New_York_City_Map.svg

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# from mpl_toolkits.basemap import Basemap
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
import cufflinks as cf
cf.go_offline()
%matplotlib inline

# import data
se = pd.read_csv('../input/data-science-for-good/2016 School Explorer.csv')
shsat = pd.read_csv('../input/data-science-for-good/D5 SHSAT Registrations and Testers.csv')
evict = pd.read_csv('../input/2017-2018-evictions/Evictions.csv')
finance = pd.read_csv('../input/nyc-financial-empowerment-centers/financial-empowerment-centers.csv')
emerg = pd.read_csv('../input/ny-emergency-response-incidents/emergency-response-incidents.csv')

# clean up school income estimate: strip '$', ',' and spaces, then convert to float
# (regex=False treats '$' as a literal character rather than the end-of-string anchor)
income = se['School Income Estimate'].astype(str)
for ch in ['$', ',', ' ']:
    income = income.str.replace(ch, '', regex=False)
se['School Income Estimate'] = income.astype(float)

# dropping null values
nullIncome = se[se['School Income Estimate'].isnull()]
df1 = se.drop(nullIncome.index)
nullNeed = df1[df1['Economic Need Index'].isnull()]
df1 = df1.drop(nullNeed.index)

# create borough series: map neighborhood-level 'City' values to their borough
manhattanCities = ['NEW YORK', 'ROOSEVELT ISLAND']
queensCities = [
    'ELMHURST', 'WOODSIDE', 'CORONA', 'MIDDLE VILLAGE', 'MASPETH', 'RIDGEWOOD',
    'GLENDALE', 'LONG ISLAND CITY', 'FLUSHING', 'COLLEGE POINT', 'WHITESTONE',
    'BAYSIDE', 'QUEENS VILLAGE', 'LITTLE NECK', 'DOUGLASTON', 'FLORAL PARK',
    'BELLEROSE', 'JAMAICA', 'ARVERNE', 'FAR ROCKAWAY', 'SOUTH OZONE PARK',
    'BROAD CHANNEL', 'RICHMOND HILL', 'WOODHAVEN', 'SOUTH RICHMOND HILL',
    'OZONE PARK', 'ROCKAWAY PARK', 'HOWARD BEACH', 'ROCKAWAY BEACH',
    'KEW GARDENS', 'FOREST HILLS', 'REGO PARK', 'SPRINGFIELD GARDENS', 'HOLLIS',
    'SAINT ALBANS', 'ROSEDALE', 'CAMBRIA HEIGHTS', 'JACKSON HEIGHTS', 'ASTORIA',
    'EAST ELMHURST',
]
df1['Borough'] = df1['City']
df1.loc[df1['City'].isin(manhattanCities), 'Borough'] = 'MANHATTAN'
df1.loc[df1['City'].isin(queensCities), 'Borough'] = 'QUEENS'

# convert percent strings such as '43%' to whole numbers
# (the helper keeps its original name, but it returns an int)
def toFloat(x):
    fin = int(x.replace('%', ''))
    return fin
df1['Percent Asian'] = df1['Percent Asian'].astype(str).apply(toFloat)
df1['Percent Black'] = df1['Percent Black'].astype(str).apply(toFloat)
df1['Percent Hispanic'] = df1['Percent Hispanic'].astype(str).apply(toFloat)
df1['Percent White'] = df1['Percent White'].astype(str).apply(toFloat)
df1['Percent ELL'] = df1['Percent ELL'].astype(str).apply(toFloat)
df1['Percent Black / Hispanic'] = df1['Percent Black / Hispanic'].astype(str).apply(toFloat)

# df1[['Percent Asian', 'Percent Black', 'Percent Hispanic', 'Percent White']].head()
# (nullIncome[nullIncome['School Income Estimate'].isnull()].shape[0] + nullNeed[nullNeed['Economic Need Index'].isnull()].shape[0])/se.shape[0]

# convert float/int to string for labeling
df1['incomeString'] = df1['School Income Estimate'].apply(lambda x: str(x))
df1['econString'] = df1['Economic Need Index'].apply(lambda x: str(round(x,2)))
df1['blackString'] = df1['Percent Black'].apply(lambda x: str(x))
df1['hispanicString'] = df1['Percent Hispanic'].apply(lambda x: str(x))
df1['asianString'] = df1['Percent Asian'].apply(lambda x: str(x))
df1['whiteString'] = df1['Percent White'].apply(lambda x: str(x))
df1['ellString'] = df1['Percent ELL'].apply(lambda x: str(x))
df1['blackhispanicString'] = df1['Percent Black / Hispanic'].apply(lambda x: str(x))
df1['Percent White / Asian'] = df1['Percent Asian'] + df1['Percent White']

# mean DF
cityNumbers = pd.DataFrame()
cityNumbers['Count'] = df1.groupby('Borough')['Percent Black'].count().sort_values(ascending=False)
cityNumbers['ENI Mean'] = df1.groupby('Borough')['Economic Need Index'].mean()
cityNumbers['percentBlackMean'] = df1.groupby('Borough')['Percent Black'].mean()
cityNumbers['percentHispanicMean'] = df1.groupby('Borough')['Percent Hispanic'].mean()

# cleaning up eviction df
evict['EXECUTED_DATE'] = pd.to_datetime(evict['EXECUTED_DATE'])
evict.drop(evict[evict['EXECUTED_DATE'] == evict['EXECUTED_DATE'].max()].index, inplace=True)
evictRes = evict[evict['RESIDENTIAL_COMMERCIAL_IND'] == 'Residential']
evicted = evictRes.groupby('BOROUGH')['COURT_INDEX_NUMBER'].count().values


# financial empowerment centers
finProv = finance.groupby('Borough')['Provider'].count().values

# def cleanUpERI():
# nyc emergency response incidents
emerg['Creation Date'] = pd.to_datetime(emerg['Creation Date'])
emerg['Closed Date'] = pd.to_datetime(emerg['Closed Date'])
emerg['Incident Category'] = emerg['Incident Type'].apply(lambda x: x.split('-')[0])
# clean up borough names: map each spelling variant found in the data to one label
boroughVariants = {
    'bronx': ['BRonx', 'BrONX', 'Bronx (NYCHA)', 'Bronx'],
    'brooklyn': ['Brooklyn (NYCHA-Brevoort)', 'Brooklyn'],
    'manhattan': ['Essex', 'Mamhattan', 'Manahttan', 'Manhatan', 'Manhattah', 'MANHATTAN',
                  'Manhattan (Pier 92)', 'Manhattan (Waldorf Astoria)', 'Manhatten',
                  'Manhhattan', 'Manhaatan', 'Manhattan', 'Manhttan', 'Mnahattan', 'Nassau',
                  'New York', 'new york', 'New York/Manhattan', 'New Yotk', 'NewYork', 'nyc'],
    'queens': ['Astoria', 'quenns', 'astoria', 'Far Rockaway', 'QUEENS', 'Jamaice', 'Queens',
               'Flushing', 'Hollis', 'Jamaica', 'Long Island City'],
    'staten island': ['staten Island', 'Richmond/Staten Island', 'SI', 'Staten ISland',
                      'Staten Island', 'Staten Island (Midland Beach Area)', 'Staten island'],
}
boroughFixes = {variant: borough
                for borough, variants in boroughVariants.items()
                for variant in variants}
emerg['Borough'] = emerg['Borough'].replace(boroughFixes)
# drop incidents recorded outside the five boroughs or as citywide
emerg.drop(emerg[emerg['Borough'].isin(['Bergen', 'Hoboken', 'Citywide'])].index, inplace=True)
# unify the two spellings of the law enforcement category
emerg['Incident Category'] = emerg['Incident Category'].replace('LawEnforcement', 'Law Enforcement')



In [None]:
# import os
# os.listdir('../input/ny-emergency-response-incidents/emergency-response-incidents.csv')

In [None]:
# Looking at the data, School Income Estimate and Economic Need Index contain null values in 32.63% of the records, which leaves us with 857 usable records
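
The share of unusable rows mentioned above can be checked with a couple of pandas calls. The sketch below runs on a small made-up frame (the four rows are illustrative and are not taken from the School Explorer file); only the two column names come from this notebook.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only, NOT rows from the School Explorer dataset
toy = pd.DataFrame({
    'School Income Estimate': [31000.0, np.nan, 52000.0, np.nan],
    'Economic Need Index': [0.81, 0.77, np.nan, np.nan],
})

cols = ['School Income Estimate', 'Economic Need Index']

# True for every row that is missing at least one of the two columns
missing_either = toy[cols].isnull().any(axis=1)

# The mean of a boolean Series is the fraction of True values
share_missing = missing_either.mean()

# Rows that remain usable once those records are dropped
usable = toy.dropna(subset=cols)

print(f'{share_missing:.2%} of rows are missing a value; {len(usable)} usable rows remain')
```

On the notebook's data, the same calls applied to `se` should give the percentage and record count quoted above, assuming the columns are cleaned as in the first code cell.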

<h1>Relationship Between ENI and School Income</h1>
To get a feel for the climate of NYC schools, school income and the ENI are 2 good indicators to start with.
* Each school is plotted based on its Longitude and Latitude.<br>
* Schools with **higher average economic need**  will be **<span style="color:red">more red</span>**.<br>
* Schools with **lower estimated income** will be *smaller*

In [None]:
data = [
    {
        'x': df1['Longitude'],
        'y': df1['Latitude'],
        'mode': 'markers',
        'text': df1['School Name'] + ', ' + df1['Borough'],
        'marker': {
            'size': df1['School Income Estimate']/4500,
            'color': df1['Economic Need Index'],
            'showscale': True,
            'colorscale': 'Portland',
            'colorbar': {
                'title': 'ENI',
                'ticks': 'outside'
            }
        }
    }
]

layout= go.Layout(
    title= 'New York School Income with Economic Need',
    xaxis= {
        'title' : 'Longitude'
    },
    yaxis={
        'title': 'Latitude'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
# Lowest school income (display() is needed because a cell only echoes its last expression)
display(df1.loc[df1['School Income Estimate'].idxmin()])
# highest economic need
display(df1.loc[df1['Economic Need Index'].idxmax()])

From the above graph, **Brooklyn and the Bronx** contain the primary clusters of schools with the highest economic need and lowest school income, which suggests that these areas are very likely the most in need of help. <br>
* The school with the **lowest school income** is P.S. 150 Christopher in Brooklyn with an *income of &#36;16,901* and an **<span style="color:red">economic need index of 0.948</span>** <br>
* The school with the **highest economic need index** is P.S. 065 Mother Hale Academy in the Bronx with an *income of &#36;22,507* and an **<span style="color:red">economic need index of 0.957</span>**

In [None]:
data = [
    {
        'x': df1['Percent White / Asian'],
        'y': df1['Economic Need Index'],
        'mode': 'markers',
        'text': df1['blackhispanicString'] + '% Black/Hispanic, ' + df1['ellString'] + '% ELL',
        'marker': {
            'color': df1['School Income Estimate'],
            'showscale': True,
            'colorscale': 'Portland',
            'colorbar': {
                'title': 'School Income',
                'ticks': 'outside'
            }
        }
    }
]

layout= go.Layout(
    title= 'Percentage of White/Asian Students Vs Economic Need Index',
    xaxis= {
        'title' : 'Percentage of White/Asian Students'
    },
    yaxis={
        'title': 'Economic Need Index'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
data = [
    {
        'x': df1['Percent Black / Hispanic'],
        'y': df1['Economic Need Index'],
        'mode': 'markers',
        'text': df1['blackhispanicString'] + '% Black/Hispanic, ' + df1['ellString'] + '% ELL',
        'marker': {
            'color': df1['School Income Estimate'],
            'showscale': True,
            'colorscale': 'Portland',
            'colorbar': {
                'title': 'School Income',
                'ticks': 'outside'
            }
        }
    }
]

layout= go.Layout(
    title= 'Percentage of Black/Hispanic Students Vs Economic Need Index',
    xaxis= {
        'title' : 'Percentage of Black/Hispanic Students'
    },
    yaxis={
        'title': 'Economic Need Index'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In the 2 graphs above, we see a clear relationship between a school's racial makeup and its ENI.

In the Percentage of White/Asian Students Vs Economic Need Index graph, we can see that as the percentage of White/Asian students in NYC schools increases, 2 things happen: the ENI decreases and school income increases.<br>

However, in the Percentage of Black/Hispanic Students Vs Economic Need Index graph, we see the opposite, with ENI increasing and school income decreasing as the percentage of Black/Hispanic students increases.

Next, we'll take a look at the demographics of Black and Hispanic students within Brooklyn and the Bronx.
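
The visual trend described above can be backed with a number by computing Pearson correlation coefficients. The sketch below is a minimal illustration on made-up values (the five rows are not taken from the School Explorer data); only the column names match this notebook.

```python
import pandas as pd

# Toy values for illustration only, NOT rows from the School Explorer dataset
toy = pd.DataFrame({
    'Percent Black / Hispanic': [95, 80, 60, 35, 10],
    'Economic Need Index': [0.90, 0.82, 0.65, 0.40, 0.15],
    'School Income Estimate': [25000.0, 31000.0, 45000.0, 70000.0, 120000.0],
})

# Pairwise Pearson correlation between all numeric columns
corr = toy.corr(method='pearson')

r_eni = corr.loc['Percent Black / Hispanic', 'Economic Need Index']
r_income = corr.loc['Percent Black / Hispanic', 'School Income Estimate']

print(f'Percent Black / Hispanic vs ENI:    r = {r_eni:.2f}')
print(f'Percent Black / Hispanic vs income: r = {r_income:.2f}')
```

A coefficient near +1 or -1 indicates a strong linear relationship, and the sign gives its direction. Running the same `corr` call on the corresponding columns of `df1` would show how strong the relationship is in the actual data.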

<h1>Black and Hispanic Student Breakdown in NYC</h1>
<h2>Relationship Between Ethnicity and ENI</h2>
In the next 2 graphs, we will look at the percentage of Black and Hispanic students at each school compared to the school's ENI. 
* The higher the ENI, the **<span style="color:red">more red</span>** the marker will be
* The higher the percentage of the racial background in the school, the **<span style="font-size:18px">bigger</span>** the marker will be.

In [None]:


data = [
    {
        'x': df1['Longitude'],
        'y': df1['Latitude'],
        'mode': 'markers',
        'text': df1['Borough'] + ', Economic Need: ' + df1['econString'] + ', Percent Black: ' + df1['blackString'],
        'marker': {
            'color': df1['Economic Need Index'],
            'size': df1['Percent Black']/3,
            'showscale': True,
            'colorscale': 'Portland',
            'colorbar': {
                'title': 'Economic Need',
                'ticks': 'outside'
            }
        }
    }
]

layout= go.Layout(
    title= '% Black and Economic Need',
    xaxis= {
        'title' : 'Longitude'
    },
    yaxis={
        'title': 'Latitude'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

In [None]:
data = [
    {
        'x': df1['Longitude'],
        'y': df1['Latitude'],
        'mode': 'markers',
        'text': df1['Borough'] + ', Economic Need: ' + df1['econString'] + ', Percent Hispanic: ' + df1['hispanicString'],
        'marker': {
            'color': df1['Economic Need Index'],
            'size': df1['Percent Hispanic']/3,
            'showscale': True,
            'colorscale': 'Portland',
            'colorbar': {
                'title': 'Economic Need',
                'ticks': 'outside'
            }
        }
    }
]

layout= go.Layout(
    title= '% Hispanic and Economic Need',
    xaxis= {
        'title' : 'Longitude'
    },
    yaxis={
        'title': 'Latitude'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

At a glance, we can see that
* schools in Brooklyn and Queens have the highest percentage of Black students 
* schools in the Bronx have the highest percentage of Hispanic students

As seen earlier in our first graph, these areas with a high percentage of Black and Hispanic students coincide with the schools that have high economic need.

In [None]:

# Look up each borough by label rather than by position, so the bars stay
# correctly labeled regardless of how cityNumbers happens to be sorted.
boroughKeys = ['BROOKLYN', 'QUEENS', 'BRONX', 'MANHATTAN', 'STATEN ISLAND']
boroughLabels = ['Brooklyn', 'Queens', 'Bronx', 'Manhattan', 'Staten Island']

trace1 = go.Bar(
    x= boroughLabels,
    y= (cityNumbers.loc[boroughKeys, 'ENI Mean'] * 100).tolist(),
    name = '% ENI Mean')

trace2 = go.Bar(
    x= boroughLabels,
    y= cityNumbers.loc[boroughKeys, 'percentBlackMean'].tolist(),
    name = '% Black Mean')

trace3 = go.Bar(
    x= boroughLabels,
    y= cityNumbers.loc[boroughKeys, 'percentHispanicMean'].tolist(),
    name = '% Hispanic Mean')



data = [trace1,trace2,trace3]

layout = go.Layout(
    barmode='group',
    title= 'Mean of ENI, Black Students, Hispanic Students by Borough',
    xaxis= {'title': 'Areas'},
    yaxis= {'title': 'Percent'}
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Digging a little deeper, I focused on the areas listed in the 'City' column of the School Explorer dataset that have 30 or more schools, which left me with the Bronx, Brooklyn, New York (essentially Manhattan), and Staten Island. For each of these areas I took the mean ENI (multiplied by 100 so that it shares a scale with the percentages), the mean percentage of Black students, and the mean percentage of Hispanic students. <br>

The area with the highest mean percentage of Hispanic students per school is the Bronx with 62%, which also has the highest mean ENI at 0.82.<br>
The area with the highest mean percentage of Black students per school is Brooklyn with 43%, which has the second highest mean ENI at 0.71.
<br><br>
These figures point to Brooklyn and the Bronx as the areas of focus for students who are underrepresented in SHSAT registration. However, these are very large areas with many schools, and it would be difficult to help everyone at once with the resources at hand, so it would be best to narrow things down first.

<h1>Exploring Possible Factors</h1>
In order to narrow things down, we'll look at factors that may affect Black and Hispanic students in Brooklyn and the Bronx to a greater degree than their peers who take the SHSAT.

<h2>Evictions</h2>
The evictions dataset comes from the [NYC OpenData website](https://data.cityofnewyork.us/City-Government/Evictions/6z8x-wfk4) and covers 1/3/2017 to 7/11/2018.

In [None]:

trace1 = go.Bar(
    x= ['Bronx','Brooklyn','Manhattan', 'Staten Island', 'Queens'],
    y= [evicted[0], evicted[1], evicted[2], evicted[4], evicted[3]],
    name = 'Residential Evictions')

data = [trace1]

layout = go.Layout(
    title= 'Number of Residential Evictions by Borough 1/2017- 7/2018',
    xaxis= {'title': 'Boroughs'},
    yaxis= {'title': 'Number of Evictions'}
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Here we can see that the number of evictions is highest in the Bronx (11,371 evictions) and Brooklyn (9,077 evictions). While the totals are useful, we can gain more insight by including the estimated population and calculating the number of people per residential eviction in each borough.

In [None]:
bor = ['Bronx','Brooklyn','Manhattan', 'Staten Island', 'Queens']
nycPop = ['1,471,160', '2,648,771', '1,664,727', '479,458', '2,358,582']

pd.DataFrame(index=bor, data={'Population':nycPop})

In [None]:
trace1 = go.Bar(
    x= ['Bronx','Brooklyn','Manhattan', 'Staten Island', 'Queens'],
    y= [1471160/evicted[0], 2648771/evicted[1], 1664727/evicted[2], 479458/evicted[4], 2358582/evicted[3]],
    text= ['Population: 1,471,160', 'Population: 2,648,771', 'Population: 1,664,727', 'Population: 479,458', 'Population: 2,358,582', ],
    name = 'People per Eviction')

data = [trace1]

layout = go.Layout(
    title= 'Number of People per Residential Eviction by Borough 1/2017- 7/2018',
    xaxis= {'title': 'Boroughs'},
    yaxis= {'title': 'Number of People per Eviction'}
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

From the [Current Estimates of New York City's Population for July 2017](https://www1.nyc.gov/site/planning/data-maps/nyc-population/current-future-populations.page), I was able to get the estimated population of each borough. With population included, the 'Number of People per Residential Eviction by Borough 1/2017- 7/2018' graph shows that there is **one eviction for every 129 people in the Bronx**. That is a far higher eviction rate than in the next highest borough, Brooklyn, which has one eviction for every 291 people.
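
As a worked check of the ratios above, the sketch below divides each borough's estimated population by its residential eviction count. It uses only the figures quoted in this notebook for the Bronx and Brooklyn; the variable names are chosen here for illustration.

```python
# Population estimates and residential eviction counts as quoted in this notebook
population = {'Bronx': 1471160, 'Brooklyn': 2648771}
evictions = {'Bronx': 11371, 'Brooklyn': 9077}

# People per eviction: a smaller number means evictions are more frequent
people_per_eviction = {
    borough: population[borough] / evictions[borough]
    for borough in population
}

for borough, ratio in people_per_eviction.items():
    print(f'{borough}: one eviction per {ratio:.1f} residents')
# Bronx: one eviction per 129.4 residents
# Brooklyn: one eviction per 291.8 residents
```

Dividing the two results shows that, relative to population, residential evictions in the Bronx were a little more than twice as frequent as in Brooklyn over this period.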

<h2>Financial Empowerment Centers</h2>
This dataset of [Financial Empowerment Centers (FEC)](https://www.kaggle.com/new-york-city/nyc-financial-empowerment-centers) comes from the Department of Consumer Affairs, Office of Financial Empowerment. FECs help residents address their financial challenges and needs and plan for their future by offering free one-on-one financial counseling as a public service.

In [None]:
finance.groupby('Borough')['Provider'].count()

In [None]:
trace1 = go.Bar(
    x= ['Bronx','Brooklyn','Manhattan', 'Staten Island', 'Queens'],
    y= [42.47/finProv[0], 69.5/finProv[1], 22.82/finProv[2], 58.69/finProv[4], 108.1/finProv[3]],
    textposition = 'auto',
    name = 'mi^2 per Center')

trace2 = go.Bar(
    x= ['Bronx','Brooklyn','Manhattan', 'Staten Island', 'Queens'],
    y= [147.1160/finProv[0], 264.8771/finProv[1], 166.4727/finProv[2], 47.9458/finProv[4], 235.8582/finProv[3]],
    textposition = 'auto',
    name = 'people(10k)/Center')

# look up the mean ENI by borough label so each bar matches its x label
trace3 = go.Bar(
    x= ['Bronx','Brooklyn','Manhattan', 'Staten Island', 'Queens'],
    y= (cityNumbers.loc[['BRONX', 'BROOKLYN', 'MANHATTAN', 'STATEN ISLAND', 'QUEENS'], 'ENI Mean'] * 100).tolist(),
    textposition = 'auto',
    name = 'Avg ENI')

data = [trace1,trace2,trace3]

layout = go.Layout(
    barmode= 'group',
    title= 'Financial Empowerment Centers',
    xaxis= {'title': 'Boroughs'}
    # ,yaxis= {'title': 'Number of People per Eviction'}
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

The graph above shows three things:
* Blue: The number of square miles for every 1 FEC
* Orange: The number of people (in tens of thousands) for every 1 FEC
* Green: The average ENI, multiplied by 100 (based on the School Explorer dataset. Queens is not explicitly listed in its 'City' column, so its value comes from the Queens neighborhoods grouped into a borough earlier)

The mean ENI within Brooklyn and the Bronx does not appear to be greatly affected by how many FECs are available. There are significantly fewer FECs in Staten Island than in Brooklyn and the Bronx, yet the ENI of Staten Island schools is still much lower.<br><br>
When focusing only on Brooklyn and the Bronx, the number of square miles per FEC is almost the same, and there are 73,671 more people per FEC in Brooklyn than in the Bronx. Although Brooklyn has more people per FEC, the mean ENI of Brooklyn schools is 0.11 lower than that of schools in the Bronx.<br><br>
While I do believe financial education and guidance are important, I do not find a relationship between the availability of free financial counseling and a school's ENI in New York City.

<h2>NYC Emergency Response Incidents</h2>
Using the [NY Emergency Response Incidents dataset](https://www.kaggle.com/new-york-city/ny-emergency-response-incidents), which contains incidents from 5/2011 to 6/2018, I wanted to determine whether Brooklyn or the Bronx had a high number of incidents, which may tell us whether emergency situations are disadvantaging students in these boroughs more than in other boroughs. 

In [None]:

# Count incidents for every (category, borough) pair that occurs in the data.
mlGroup = emerg.groupby(['Incident Category','Borough'])['Incident Type'].count()
inCatList = mlGroup.index.levels[0].tolist()
bor = mlGroup.index.levels[1].tolist()

# Build one bar trace per incident category. Reindexing on the full borough
# list fills missing (category, borough) pairs with 0, so x and y always have
# the same length and every count lands on the correct borough.
data = []
for cat in inCatList:
    counts = mlGroup.loc[cat].reindex(bor, fill_value=0)
    data.append(go.Bar(x=bor, y=counts.tolist(), name=cat))

layout = go.Layout(
    barmode='stack',
    title= 'Emergency Response Incident Types by Borough',
    xaxis= {'title': 'Borough'},
    yaxis= {'title': 'Number of Incidents'}
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

From the above graph, we can see that the Bronx has the second-lowest total number of incidents. This suggests that the number of emergency incidents does not appear to be a contributing factor to low SHSAT registration in Brooklyn and the Bronx.
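
The ranking read off the stacked bars can also be checked numerically. The snippet below is a minimal sketch, assuming the same `Borough` and `Incident Type` columns used above; `emerg_demo` is a hypothetical stand-in for `emerg`, and its rows are invented purely to show the mechanics.

```python
import pandas as pd

# Hypothetical stand-in for the `emerg` DataFrame; these rows are made up
# for illustration and are not real incident data.
emerg_demo = pd.DataFrame({
    'Borough': ['Bronx', 'Brooklyn', 'Brooklyn', 'Queens', 'Queens', 'Queens'],
    'Incident Category': ['Fire', 'Fire', 'Utility', 'Fire', 'Utility', 'Structural'],
    'Incident Type': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Total incidents per borough, smallest first.
totals = emerg_demo.groupby('Borough')['Incident Type'].count().sort_values()

# Rank 1 = fewest incidents; tied boroughs share the lowest rank.
ranks = totals.rank(method='min').astype(int)

print(pd.DataFrame({'incidents': totals, 'rank_from_lowest': ranks}))
```

Running the same two lines on `emerg` itself would give the per-borough totals behind the chart, which is easier to compare than bar heights when several boroughs are close.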

<h2>English-Language Learners (ELL)</h2>
Learning a second language while keeping up with studies taught in that language is a difficult feat. We'll look at the School Explorer dataset again to see whether schools in Brooklyn and the Bronx have a high percentage of ELL students.

In [None]:
data = [
    {
        'x': df1['Percent Black / Hispanic'],
        'y': df1['Percent ELL'],
        'mode': 'markers',
        'text': df1['blackhispanicString'] + '% Black/Hispanic, ' + df1['ellString'] + '% ELL',
        'marker': {
            'colorscale': 'Portland'
            }
        }
]

layout= go.Layout(
    title= 'Percentage of Black/Hispanic Students Vs Percentage of ELL students',
    xaxis= {
        'title' : 'Percentage of Black/Hispanic Students'
    },
    yaxis={
        'title': 'Percentage of ELL students'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)
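
A scatter plot gives a visual impression of how two percentages relate, and a correlation coefficient can put a number on it. The snippet below is a minimal sketch, assuming both columns are numeric on a 0 to 100 scale; `schools_demo` is a hypothetical stand-in for `df1` with invented values, so the printed numbers say nothing about the real schools.

```python
import pandas as pd

# Hypothetical stand-in for `df1`; values are invented for illustration only.
schools_demo = pd.DataFrame({
    'Percent Black / Hispanic': [10, 35, 60, 85, 95, 90],
    'Percent ELL': [5, 20, 8, 30, 4, 22],
})

x = schools_demo['Percent Black / Hispanic']
y = schools_demo['Percent ELL']

# Pearson's r measures linear association; values near 0 mean little linear trend.
r_pearson = x.corr(y)

# A rank-based (Spearman-style) coefficient: Pearson's r computed on the ranks.
# It is less sensitive to outliers and to non-linear but monotonic patterns.
r_rank = x.rank().corr(y.rank())

print('Pearson r: ' + str(round(r_pearson, 3)))
print('Rank-based r: ' + str(round(r_rank, 3)))
```

Replacing `schools_demo` with `df1` would yield the coefficients for the actual data. Rows with missing values in either column are ignored by `Series.corr`.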

In [None]:
# Schools with more than 80% Black/Hispanic students and at least 20% ELL students.
highBHandELL = (df1['Percent Black / Hispanic'] > 80) & (df1['Percent ELL'] >= 20)
print('Number of schools with over 80% Black/Hispanic\n\tstudents and at least 20% ELL students: ' + str(highBHandELL.sum()))
print('Total number of schools: ' + str(df1.shape[0]))
print('Percentage of schools with over 80% Black/Hispanic\n\tstudents and at least 20% ELL students: ' + str(round(highBHandELL.mean()*100, 2)) + '%')

When plotting the percentage of Black/Hispanic students against the percentage of ELL students, we can see that there is no apparent correlation. However, 13% of NYC schools have over 80% Black/Hispanic students and at least 20% English-language learners.<br><br>
It would be great to see where these schools lie geographically.

In [None]:
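# Keep only schools with more than 80% Black/Hispanic students and at least
# 20% ELL students, using the same thresholds as the counts above.
ellBH = df1[(df1['Percent Black / Hispanic'] > 80) & (df1['Percent ELL'] >= 20)].copy()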

data = [
    {
        'x': ellBH['Longitude'],
        'y': ellBH['Latitude'],
        'mode': 'markers',
        'text': ellBH['Borough'] + ', ' + ellBH['ellString'] + '% ELL' + ', ' + ellBH['blackhispanicString'] + '% Black/Hispanic',
        'marker': {
            'size': ellBH['Percent Black / Hispanic']/4,
            'color': ellBH['Percent ELL'],
            'showscale': True,
            'colorscale': 'Portland',
            'colorbar': {
                'title': '% ELL',
                'ticks': 'outside'
            }
        }
    }
]

layout= go.Layout(
    title= 'Locations of Schools with >80% Black/Hispanic Students and >=20% ELL Students',
    xaxis= {
        'title' : 'Longitude'
    },
    yaxis={
        'title': 'Latitude'
    }
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

The Bronx immediately grabs my attention: schools with over 80% Black/Hispanic students and at least 20% ELL students cluster together there, bleeding into parts of Upper Manhattan.
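
The cluster seen on the map can be backed up with a simple count of matching schools per borough. The snippet below is a minimal sketch, assuming the same column names and thresholds as above and uppercase borough names; `schools_demo` is a hypothetical stand-in for `df1` with invented rows.

```python
import pandas as pd

# Hypothetical stand-in for `df1`; rows are invented for illustration only.
schools_demo = pd.DataFrame({
    'Borough': ['BRONX', 'BRONX', 'MANHATTAN', 'BROOKLYN', 'QUEENS'],
    'Percent Black / Hispanic': [95, 88, 91, 97, 40],
    'Percent ELL': [35, 22, 25, 5, 30],
})

# Same filter as the map: over 80% Black/Hispanic and at least 20% ELL.
mask = (schools_demo['Percent Black / Hispanic'] > 80) & (schools_demo['Percent ELL'] >= 20)

# Number of matching schools per borough, largest first.
by_borough = schools_demo[mask].groupby('Borough').size().sort_values(ascending=False)

# Each borough's share of all matching schools, in percent.
share = (by_borough / by_borough.sum() * 100).round(1)

print(pd.DataFrame({'schools': by_borough, 'share_pct': share}))
```

Applied to `df1`, this would show how much of the high Black/Hispanic, high ELL group each borough accounts for, which the map alone only suggests.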

<h1>Wrapping Things Up</h1>
So far, these datasets have told us a couple of things:
* Brooklyn and the Bronx have low school incomes and high economic need
* Brooklyn has a large population of Black students
* The Bronx has a large population of Hispanic students
* Eviction rates from 1/2017 to 7/2018 were highest in the Bronx
* The frequency of Financial Empowerment Centers does not correlate with economic need
* The frequency and types of Emergency Response Incidents do not correlate with economic need
* There is no apparent correlation between % of ELL students and % of Black/Hispanic students in schools
* There is, however, a cluster of schools with over 80% Black/Hispanic students and at least 20% ELL students in the Bronx and Upper Manhattan

<h1>Conclusion</h1>
According to [Spectrum News NY1](http://www.ny1.com/nyc/all-boroughs/politics/2018/06/03/bill-de-blasio-announces-push-to-scrap-specialized-high-school-admissions-test.html), New York City Mayor Bill de Blasio "announced plans aimed to change the admissions policies at the city's specialized high schools" on 6/3/2018. While this may make PASSNYC's goal of increasing diversity in SHSAT registration moot, it will not resolve the underlying issues that led to the disparity in the first place. By targeting the following groups, SHS may very well see an increase in diversity after all.<br>As a starting point, families of students at the following Bronx schools could benefit from programs supporting ELL students as well as rent or housing assistance:

In [None]:
(
    df1[(df1['Percent Black / Hispanic'] > 80) & (df1['Percent ELL'] > 30) & (df1['Borough'] == 'BRONX')]
    [['Borough', 'School Name', 'Economic Need Index', 'Percent ELL', 'Percent Black', 'Percent Hispanic']]
    .sort_values('Economic Need Index', ascending=False)
)