# About PASSNYC

In [None]:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
img=mpimg.imread('../input/passnycmodified/image.jpg')
plt.figure(figsize=[14,8])
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

> **PASSNYC is a not-for-profit, volunteer organization dedicated to broadening educational opportunities for New York City's talented underserved students. PASSNYC aims to identify talented underserved students within New York City's underperforming school districts in order to increase the diversity of students taking the Specialized High School Admissions Test (SHSAT).**
> 
> **In this kernel, I explore the data provided by PASSNYC together with several secondary datasets, with the aim of helping the organization serve these students better.**

## Contents

> * [Know Thy Data](#first-bullet)
>  * [District Codes](#first-bullet)
> * [Map of New York](#first-bullet)
> * [Take a peek at the neighbourhood](#first-bullet)
>  * [Schools in the neighbourhood](#first-bullet)
> * [The Ethnic Factor](#first-bullet)
>  * [Distribution of races (Count vs Percentage)](#first-bullet)
>  * [Distribution of races in all neighbourhoods](#first-bullet)
> * [Developing a Financial Perspective](#first-bullet)
> * [Bird's eye view of race distribution](#first-bullet)
> * [Attendance Record](#djas)
> * [A glimpse of some other features](#first-bullet)
>  * [Rigorous Instruction](#first-bullet)
>  * [Collaborative Teachers](#first-bullet)
>  * [Supportive Environment](#first-bullet)
>  * [Family-Community Ties](#first-bullet)
>  * [School Leadership](#first-bullet)
>  * [Trust](#first-bullet)
>  * [Correlation Plot of Features](#ha)
> * [Are you smarter than a fifth grader](#first-bullet)
> * [Math Proficiency](#first-bullet)
>  * [Math Grade trend](#hhk)
> * [ELA Proficiency](#first-bullet)
>  * [ELA Grade Trend](#jh)
> * [SHSAT dataset](#ajs)
>  * [Number of students who register and/or take the SHSAT exam : Trend](#mj)
>  * [SHSAT school wise trend of students](#jdhf)
>  
> 

## [1. Know Thy Data ....](#first-bullet)

In [None]:
img=mpimg.imread('../input/otherimages/data.jpg')
plt.figure(figsize=[12,10])
#fig.patch.set_facecolor('lightgray')
#fig.patch.set_alpha(0.3)
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)  
schools = pd.read_csv('../input/data-science-for-good/2016 School Explorer.csv')
schools.head(3)




### [1.1 District Codes](#first-bullet)

In [None]:
schools['District'].unique()

## [2. Map of New York](#first-bullet)

In [None]:
import matplotlib.pyplot as plt
img=mpimg.imread('../input/mapspassnyc/map_NYC.jpg')
img1=mpimg.imread('../input/mapspassnyc/boroughs.jpg')
fig=plt.figure(figsize=[15,10])
fig.patch.set_facecolor('lightgray')
fig.patch.set_alpha(0.3)
plt.subplot(1,2,1)
imgplot = plt.imshow(img)
plt.axis('Off')
plt.subplot(1,2,2)
imgplot=plt.imshow(img1)
plt.axis('Off')
plt.show()

> * **The maps above show the geographical districts and regions: regions are designated by bold numbers 1-10, and districts are numbered 1-32.**
> * New York City encompasses five county-level administrative divisions called boroughs: **Manhattan, Brooklyn, Queens, The Bronx, and Staten Island.**
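The district numbers and the boroughs can be linked in code. A minimal sketch, assuming the standard Department of Education assignment of community school districts to boroughs (1-6 Manhattan, 7-12 Bronx, 13-23 and 32 Brooklyn, 24-30 Queens, 31 Staten Island); the helper `district_to_borough` is hypothetical and not part of the original kernel, so check the mapping against the maps above before relying on it.

```python
# Hypothetical helper: community school district number -> borough,
# assuming the standard NYC DOE district boundaries.
BOROUGH_RANGES = [
    (range(1, 7), 'Manhattan'),
    (range(7, 13), 'Bronx'),
    (range(13, 24), 'Brooklyn'),
    (range(24, 31), 'Queens'),
    (range(31, 32), 'Staten Island'),
    (range(32, 33), 'Brooklyn'),
]

def district_to_borough(district):
    """Return the borough for a district number 1-32, or None if it is out of range."""
    for districts, borough in BOROUGH_RANGES:
        if int(district) in districts:
            return borough
    return None

# Usage on the dataset (assumes 'District' holds the integers 1-32):
# schools['Borough'] = schools['District'].apply(district_to_borough)
```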

In [None]:
import plotly
# Offline mode renders the figures inside the notebook, so no Plotly account credentials are needed
from collections import Counter, OrderedDict
import plotly.offline as py
from plotly import tools
from plotly.graph_objs import *
py.init_notebook_mode(connected=True)

## [3. Take a Peek at the Neighbourhood ....](#first-bullet)

In [None]:
img=mpimg.imread('../input/otherimages/peek.jpg')
plt.figure(figsize=[15,10])
#fig.patch.set_facecolor('lightgray')
#fig.patch.set_alpha(0.3)
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

In [None]:
a=schools['Address (Full)'].replace({'NEW YORK':'NewYork','CAMBRIA HEIGHTS':'CambriaHeights','SPRINGFIELD GARDENS':'SpringfieldGardens','REGO PARK':'RegoPark','FOREST HILLS':'ForestHills','ROCKAWAY PARK':'ROCKAWAY','HOWARD BEACH':'HowardBeach','QUEENS VILLAGE':'QueensVillage','COLLEGE POINT':'CollegePoint','RICHMOND HILL':'RichmondHill','FLORAL PARK':'FloralPark','OZONE PARK':'OzonePark','LITTLE NECK':'LittleNeck','LONG ISLAND CITY':'LongIslandCity','MIDDLE VILLAGE':'MiddleVillage','ROOSEVELT ISLAND':'RooseveltIsland','STATEN ISLAND':'StatenIsland','JACKSON HEIGHTS':'JacksonHeights','GREENWICH VILLAGE':'GreenwichVillage','GREAT NECK':'GreatNeck','BROAD CHANNEL':'BroadChannel','BRIGHTON BEACH':'BrightonBeach','MANHATTAN BEACH':'ManhattanBeach','ROCKAWAY BEACH':'RockawayBeach','GRAMERCY PARK':'GramercyPark','PABLEO POINT':'PabloPoint','CARROLL GARDENS':'CarrollGardens','KEW GARDENS':'KewGardens'},regex=True)
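# Take the last word before the first comma of each address; the multi-word place names
# were joined into single tokens above, so this word is the neighbourhood/city name
schools['neighbourhood'] = a.astype(str).str.split(',', n=1).str[0].str.split().str[-1]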

### [3.1 Schools in different neighbourhoods](#first-bullet)

In [None]:
schools[['School Name','neighbourhood']].groupby(['neighbourhood'],as_index=False).agg(lambda x: x.value_counts().index[0]).style.set_properties(**{'background-color': 'black',
                           'color': 'lawngreen','border-color': 'white'})

## [4. The Ethnic Factor...](#first-bullet)

In [None]:
img=mpimg.imread('../input/otherimages/ethnic.jpeg')
plt.figure(figsize=[15,10])
#fig.patch.set_facecolor('lightgray')
#fig.patch.set_alpha(0.3)
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

### [4.1 Distribution of races (Count vs Percentage)](#first-bullet)

In [None]:
race_cols = ['Percent Black', 'Percent Hispanic', 'Percent Asian', 'Percent White']
# The raw columns are strings such as '45%'; convert them so the histograms bin numerically
for col in race_cols:
    schools[col] = pd.to_numeric(schools[col].astype(str).str.rstrip('%'), errors='coerce')

# One histogram per race, stacked on top of each other
data = [Histogram(x=schools[col], name=col) for col in race_cols]
layout = Layout(barmode='stack')
fig = Figure(data=data, layout=layout)

py.iplot(fig, filename='stacked histogram')

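> * **White and Asian students make up a small share at most schools (their counts are concentrated at low percentages), whereas Black and Hispanic students dominate the counts at high percentages, i.e. they form the majority at many schools.**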
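The visual comparison can be backed with a single number per group. A minimal sketch (the helper `majority_share` is hypothetical, not part of the original kernel) that reports, for each group, the fraction of schools in which that group makes up more than half of the students; it assumes the percentage columns have already been converted to numbers.

```python
import pandas as pd

def majority_share(df, race_cols, threshold=50.0):
    """Fraction of schools (rows) in which each group exceeds `threshold` percent."""
    # Boolean frame: True where the group's percentage is above the threshold.
    # The column mean of a boolean frame is the fraction of True values;
    # missing percentages compare as False, so they count as "not above".
    return (df[race_cols] > threshold).mean()

# Usage:
# majority_share(schools, ['Percent Black', 'Percent Hispanic', 'Percent Asian', 'Percent White'])
```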

### [4.2  Distribution of races in different neighbourhoods](#first-bullet)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
schools['Percent Black']=schools['Percent Black'].replace({'\%':''},regex=True).astype(float)
schools['Percent White']=schools['Percent White'].replace({'\%':''},regex=True).astype(float)
schools['Percent Asian']=schools['Percent Asian'].replace({'\%':''},regex=True).astype(float)
schools['Percent Hispanic']=schools['Percent Hispanic'].replace({'\%':''},regex=True).astype(float)
grouped= schools[['neighbourhood','Percent Black','Percent White','Percent Asian','Percent Hispanic']].groupby(['neighbourhood'],as_index=False)
z=grouped.agg(np.mean)



plt.figure(figsize=(16,24))
for i in range(len(z)-1):
    plt.subplot(10, 4, i+1)
    sns.set(rc={'figure.facecolor':'white'})
    a=[z.loc[i]['Percent Black'],z.loc[i]['Percent White'],z.loc[i]['Percent Asian'],z.loc[i]['Percent Hispanic']]
    ax = sns.barplot(['Bl','Wh','Asn','His'],a,alpha=.7, palette='cool')
    ax.grid(False)
#     plt.yticks(np.arange(0,100,10))
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.subplots_adjust(wspace =.8,hspace = 1)
    plt.title(z.loc[i]['neighbourhood'], color="black")

## [5. Developing a financial perspective ....](#first-bullet)

In [None]:
img=mpimg.imread('../input/otherimages/cost.jpg')
plt.figure(figsize=[15,10])
#fig.patch.set_facecolor('lightgray')
#fig.patch.set_alpha(0.3)
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

In [None]:
schools['School Income Estimate']=schools['School Income Estimate'].replace({'\$':'', ',':''},regex=True).astype(float)

trace1 = Histogram(x=schools['Economic Need Index'], marker=dict(color='#e993f9'))
trace2 = Histogram(x=schools['School Income Estimate'],  marker=dict(color='#fcc45f'))
fig = tools.make_subplots(rows=1, cols=2, print_grid=False, subplot_titles = ["<b>Economic Need Index</b>", "<b>School Income Estimate</b>"])
fig.append_trace(trace1, 1, 1);
fig.append_trace(trace2, 1, 2);

# Shared y-axis style: no grid, zero line or axis line; tick labels shown.
# tickmode='auto' replaces the deprecated 'autotick' property.
axis_style = dict(autorange=True, showgrid=False, zeroline=False, showline=False,
                  tickmode='auto', ticks='', showticklabels=True)
fig['layout'].update(height=400, showlegend=False, yaxis1=axis_style, yaxis2=axis_style);

py.iplot(fig, filename='3')

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2)
fig.patch.set_facecolor('lightgray')
fig.patch.set_alpha(0.3)

schools.plot(kind="scatter", x="Longitude", y="Latitude",
    s=schools['Economic Need Index']*10,title='Economic Need Index',ax=axes[0],figsize=(15,7),color='y')

def plot_percent(feature1, feature2):
    """Mean of feature2 within five buckets of feature1 (an index between 0 and 1)."""
    s = schools[feature1]
    # Buckets: <0.2, [0.2, 0.4], (0.4, 0.6), [0.6, 0.8], >0.8
    # (between() is inclusive on both ends by default in every pandas version)
    masks = [s < 0.2, s.between(0.2, 0.4), (s > 0.4) & (s < 0.6),
             s.between(0.6, 0.8), s > 0.8]
    return [schools[m][feature2].mean() for m in masks]

z_black=plot_percent('Economic Need Index', 'Percent Black')
z_white=plot_percent('Economic Need Index', 'Percent White')
z_hispanic=plot_percent('Economic Need Index', 'Percent Hispanic')
z_asian=plot_percent('Economic Need Index', 'Percent Asian')
z2=['0-0.2','0.2-0.4','0.4-0.6','0.6-0.8','0.8-1']

df=pd.DataFrame({'Percent Black':z_black,'Percent White':z_white, 'Percent Hispanic':z_hispanic, 'Percent Asian':z_asian},index=z2)
df.plot.bar(ax=axes[1],figsize=(15,7));

plt.show()

> * **Most schools have a relatively high Economic Need Index and a relatively low School Income Estimate.**
> 
> * **Economic Need Index = (% temporary housing) + 0.5 × (% HRA eligible) + 0.5 × (% free-lunch eligible). The higher the index, the higher the need.**
> 
> * Schools with a larger share of **Hispanic and Black** students have a higher Economic Need Index (greater need).
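The index formula above translates directly into a one-line function. A minimal sketch, assuming each input is a fraction between 0 and 1; the function name and the example school are made up for illustration.

```python
def economic_need_index(pct_temp_housing, pct_hra_eligible, pct_free_lunch):
    """ENI = %temp housing + 0.5 * %HRA eligible + 0.5 * %free-lunch eligible.

    Inputs are assumed to be fractions in [0, 1]; a higher value means higher need.
    """
    return pct_temp_housing + 0.5 * pct_hra_eligible + 0.5 * pct_free_lunch

# Made-up school: 10% in temporary housing, 40% HRA eligible, 80% free-lunch eligible
print(round(economic_need_index(0.10, 0.40, 0.80), 2))  # 0.7
```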

In [None]:
import numpy as np
import seaborn as sns
#schools['neighbourhood']
grouped= schools[['neighbourhood','Economic Need Index']].groupby(['neighbourhood'],as_index=False)
z=grouped.agg(np.mean).sort_values('Economic Need Index',ascending=False)

grouped1= schools[['neighbourhood','School Income Estimate']].groupby(['neighbourhood'],as_index=False)
z1=grouped1.agg(np.mean).sort_values('School Income Estimate',ascending=True)

grouped= schools[['District','Economic Need Index']].groupby(['District'],as_index=False)
x=grouped.agg(np.mean).sort_values('Economic Need Index',ascending=False)

grouped1= schools[['District','School Income Estimate']].groupby(['District'],as_index=False)
x1=grouped1.agg(np.mean).sort_values('School Income Estimate',ascending=False)

axis_font = {'fontname':'Arial', 'size':'14'}
plt.figure(figsize=(20,15))
plt.subplot(1,2,1)
#plt.vlines(z['Economic Need Index'], 0, z['neighbourhood'], linestyle="dashed")
ax=plt.plot(z['Economic Need Index'],z['neighbourhood'],marker='o',label='Economic Need Index',linewidth=2, markersize=10)
#ax.xaxis.label.set_size(15)
plt.hlines(z['neighbourhood'], 0, z['Economic Need Index'], linestyle="dashed")
plt.legend(loc='best')

plt.subplot(1,2,2)
#plt.vlines(z1['School Income Estimate'], 0, z1['neighbourhood'], linestyle="dashed")
plt.hlines(z1['neighbourhood'], 0, z1['School Income Estimate'], linestyle="dashed")
ax1=plt.plot(z1['School Income Estimate'],z1['neighbourhood'],color='r', marker='o',label='School Income Estimate',linewidth=2, markersize=10)
plt.legend(loc='best')
plt.show()

> * **As the plots show, neighbourhoods where Hispanic and Black students are the majority (see Section 4.2) tend to have a higher Economic Need Index, while neighbourhoods with a larger White or Asian share tend to have a lower one.**
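The relationship read off the two plots can also be summarised as a correlation across neighbourhoods. A minimal sketch, assuming the percentage columns are numeric; the helper `need_vs_share` is hypothetical, and a Pearson correlation across neighbourhood means is only one way to quantify the pattern.

```python
import pandas as pd

def need_vs_share(df, group_col, need_col, share_cols):
    """Pearson correlation, across groups, between mean need and mean combined share."""
    # Combined percentage of the chosen groups at each school
    combined = df[share_cols].sum(axis=1)
    # Average both quantities per group (e.g. per neighbourhood)
    per_group = (df.assign(_combined=combined)
                   .groupby(group_col)[[need_col, '_combined']]
                   .mean())
    return per_group[need_col].corr(per_group['_combined'])

# Usage:
# need_vs_share(schools, 'neighbourhood', 'Economic Need Index',
#               ['Percent Black', 'Percent Hispanic'])
```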

## [6. Bird's eye view of race distribution ....](#first-bullet)

In [None]:
img=mpimg.imread('../input/birdseye/eye.jpg')
plt.figure(figsize=[20,8])
#fig.patch.set_facecolor('lightgray')
#fig.patch.set_alpha(0.3)
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

In [None]:
ged=pd.read_csv('../input/ny-ged-plus-locations/ged-plus-locations.csv')
ged.head(3)

The GED (General Educational Development, also referred to as a General Education Diploma) is designed for people who did not finish high school. It is a battery of tests that, when passed, certifies that the test taker (American or Canadian) has met high-school-level academic skills.

> * The 2002-series GED Tests included five subject-area tests: Language Arts/Writing, Language Arts/Reading, Social Studies, Science, and Mathematics (the 2014 redesign combined the two Language Arts tests, leaving four).

> * In addition to English, the GED tests are available in Spanish, French, large print, audiocassette and Braille.

> * The GED credential itself is issued by the state, province or territory in which the test taker lives.

> * Many government institutions and universities regard the GED as equivalent to a high school diploma for program eligibility and as an admissions prerequisite.

In [None]:
z=ged[['Program Site name','Latitude','Longitude','Borough']].dropna()

import branca.colormap as cm
import folium
from folium import plugins

step = cm.StepColormap(
    ['blue','aqua','yellow','red'],
    vmin=0., vmax=100.,
    index=[0, 25, 50, 75],
    caption='step'
)
    
step

> * **The colormap encodes a group's percentage at each school: blue (0-25%), aqua (25-50%), yellow (50-75%) and red (75-100%).**
> * **The white polygons mark the locations of GED program sites.**
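The step colormap assigns a colour by finding the bin a percentage falls into. A plain-Python sketch of that binning rule, mirroring the edges and colours passed to `StepColormap` above; it is an illustration of the rule, not branca's implementation, and the behaviour exactly on an edge is assumed to match (worth checking by calling `step(25)` directly). The name `step_colour` is hypothetical.

```python
from bisect import bisect_right

# Edges and colours mirroring the StepColormap above (index=[0, 25, 50, 75], vmax=100)
STEP_EDGES = [0, 25, 50, 75]
STEP_COLOURS = ['blue', 'aqua', 'yellow', 'red']

def step_colour(pct):
    """Colour of the bin whose lower edge is the largest edge <= pct.

    Values below 0 fall back to the first colour.
    """
    i = max(bisect_right(STEP_EDGES, pct) - 1, 0)
    return STEP_COLOURS[i]

print(step_colour(10), step_colour(60), step_colour(90))  # blue yellow red
```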

### [6.1 Percent White](#first-bullet)

In [None]:
m = folium.Map([schools['Latitude'][0], schools['Longitude'][0]], zoom_start=9.5,tiles='cartodbdark_matter')

# Colour each school by its Percent White, using the step colormap defined above
for lat, lon, pct in zip(schools['Latitude'], schools['Longitude'], schools['Percent White']):
    folium.CircleMarker([lat, lon], color=step(pct), fill=True, radius=0.9).add_to(m)
# Mark each GED program site with a white polygon
for lat, lon in zip(z['Latitude'], z['Longitude']):
    folium.RegularPolygonMarker([lat, lon], color='white', radius=4).add_to(m)

m

### [6.2 Percent Black](#first-bullet)

In [None]:
m = folium.Map([schools['Latitude'][0], schools['Longitude'][0]], zoom_start=9.5,tiles='cartodbdark_matter')

# Colour each school by its Percent Black, using the step colormap defined above
for lat, lon, pct in zip(schools['Latitude'], schools['Longitude'], schools['Percent Black']):
    folium.CircleMarker([lat, lon], color=step(pct), fill=True, radius=0.9).add_to(m)
# Mark each GED program site with a white polygon
for lat, lon in zip(z['Latitude'], z['Longitude']):
    folium.RegularPolygonMarker([lat, lon], color='white', radius=4).add_to(m)

m

### [6.3 Percent Hispanic](#first-bullet)

In [None]:
m = folium.Map([schools['Latitude'][0], schools['Longitude'][0]], zoom_start=9.5,tiles='cartodbdark_matter')

# Colour each school by its Percent Hispanic, using the step colormap defined above
for lat, lon, pct in zip(schools['Latitude'], schools['Longitude'], schools['Percent Hispanic']):
    folium.CircleMarker([lat, lon], color=step(pct), fill=True, radius=0.9).add_to(m)
# Mark each GED program site with a white polygon
for lat, lon in zip(z['Latitude'], z['Longitude']):
    folium.RegularPolygonMarker([lat, lon], color='white', radius=4).add_to(m)

m

### [6.4 Percent Asian](#first-bullet)

In [None]:
m = folium.Map([schools['Latitude'][0], schools['Longitude'][0]], zoom_start=9.5,tiles='cartodbdark_matter')

# Colour each school by its Percent Asian, using the step colormap defined above
for lat, lon, pct in zip(schools['Latitude'], schools['Longitude'], schools['Percent Asian']):
    folium.CircleMarker([lat, lon], color=step(pct), fill=True, radius=0.9).add_to(m)
# Mark each GED program site with a white polygon
for lat, lon in zip(z['Latitude'], z['Longitude']):
    folium.RegularPolygonMarker([lat, lon], color='white', radius=4).add_to(m)

m

> * **Schools with a high Hispanic share are concentrated in the Bronx and Brooklyn.**
> * **Schools with a high Black share are concentrated in Brooklyn and Queens.**

## [7. Attendance Record](#first-bullet)

In [None]:
img=mpimg.imread('../input/otherimages/attendance.jpg')
# Create the figure first so the background colour applies to it (not to a stale figure)
fig = plt.figure(figsize=[14,8])
fig.patch.set_facecolor('white')
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

In [None]:
schools['Student Attendance Rate']=schools['Student Attendance Rate'].replace({'\%':''},regex=True).astype(float)
schools['Percent of Students Chronically Absent']=schools['Percent of Students Chronically Absent'].replace({'\%':''},regex=True).astype(float)

trace1=  Histogram(x=schools['Student Attendance Rate'],  marker=dict(color='tomato'))
trace2=  Histogram(x=schools['Percent of Students Chronically Absent'],  marker=dict(color='lightseagreen'))


fig = tools.make_subplots(rows=1, cols=2, print_grid=False, subplot_titles = ["<b>Student Attendance Rate</b>", "<b>Percent of Students Chronically Absent</b>"])
fig.append_trace(trace1, 1, 1);
fig.append_trace(trace2, 1, 2);

# Shared y-axis style; tickmode='auto' replaces the deprecated 'autotick' property
axis_style = dict(autorange=True, showgrid=False, zeroline=False, showline=False,
                  tickmode='auto', ticks='', showticklabels=True)
# Tick labels are hidden on the second panel
fig['layout'].update(height=400, showlegend=False, yaxis1=axis_style,
                     yaxis2=dict(axis_style, showticklabels=False));

py.iplot(fig, filename='multiple-subplots3')

In [None]:
import warnings
warnings.filterwarnings('ignore')

def plot_percent(feature1, feature2):
    """Mean of feature2 within five buckets of feature1 (a percentage from 0 to 100)."""
    s = schools[feature1]
    # Contiguous buckets <20, [20, 40], (40, 60], (60, 80], >80, so no value
    # (e.g. 40.5) falls between two buckets and gets dropped
    masks = [s < 20, s.between(20, 40), (s > 40) & (s <= 60),
             (s > 60) & (s <= 80), s > 80]
    return [schools[m][feature2].mean() for m in masks]

fig, axes = plt.subplots(nrows=1, ncols=2)
fig.patch.set_facecolor('lightgray')
fig.patch.set_alpha(0.3)
schools.plot(kind="scatter", x="Longitude", y="Latitude",
    s=schools['Percent of Students Chronically Absent'],title='Percent of Students Chronically Absent',ax=axes[0],color='y',figsize=(15,7),alpha=0.5)


z_black=plot_percent('Percent of Students Chronically Absent', 'Percent Black')
z_white=plot_percent('Percent of Students Chronically Absent', 'Percent White')
z_hispanic=plot_percent('Percent of Students Chronically Absent', 'Percent Hispanic')
z_asian=plot_percent('Percent of Students Chronically Absent', 'Percent Asian')
z2=['0-20','20-40','40-60','60-80','80-100']

df=pd.DataFrame({'Percent Black':z_black,'Percent White':z_white, 'Percent Hispanic':z_hispanic, 'Percent Asian':z_asian},index=z2)
df.plot.bar(ax=axes[1],figsize=(15,7));

plt.show()
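`plot_percent` builds its five buckets with hand-written boolean masks. The same bucketed averages can be computed with `pd.cut` and a groupby. A minimal sketch, assuming right-closed buckets such as (20, 40]; results may differ from `plot_percent` for values lying exactly on (or between) its bucket boundaries. The function name `mean_by_bucket` is hypothetical.

```python
import numpy as np
import pandas as pd

def mean_by_bucket(df, bin_col, value_col, edges=(0, 20, 40, 60, 80, 100)):
    """Mean of value_col within right-closed buckets of bin_col, e.g. (20, 40]."""
    # Open the outer edges so values below the first or above the last edge are kept
    bins = [-np.inf, *edges[1:-1], np.inf]
    buckets = pd.cut(df[bin_col], bins=bins, right=True)
    # observed=False keeps empty buckets in the result (their mean is NaN)
    return df.groupby(buckets, observed=False)[value_col].mean()

# Usage, equivalent in spirit to plot_percent:
# mean_by_bucket(schools, 'Percent of Students Chronically Absent', 'Percent Black')
```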

In [None]:
df=pd.read_csv('../input/schproma/schma19962016.csv')

# Attendance-percentage column for each grade (pre-K, K, grades 1-12) plus the total
att_cols = ['ATTPCTPRK', 'ATTPCTKID'] + ['ATTPCTG%02d' % g for g in range(1, 13)] + ['ATTPCTTOT']
titles = ['PRE KINDERGARTEN', 'KINDERGARTEN'] + ['GRADE %d' % g for g in range(1, 13)] + ['Total']

fig = tools.make_subplots(rows=5, cols=3, print_grid=False,
                          subplot_titles=['<b>%s</b>' % t for t in titles])
# Put the k-th histogram in its own panel: row k//3 + 1, column k%3 + 1
for k, col in enumerate(att_cols):
    fig.append_trace(Histogram(x=df[col]), k // 3 + 1, k % 3 + 1)





In [None]:
# Same y-axis style for all 15 panels; tickmode='auto' replaces the deprecated 'autotick' property
axis_style = dict(autorange=True, showgrid=False, zeroline=False, showline=False,
                  tickmode='auto', ticks='', showticklabels=True)
fig['layout'].update(height=1000, showlegend=False,
                     **{'yaxis%d' % k: axis_style for k in range(1, 16)});

py.iplot(fig, filename='multiple-subplots10') 

> * Average attendance (%) by grade:
> * **PRE KINDERGARTEN: 87.43; KINDERGARTEN: 90.18; GRADE 1: 91.90; GRADE 2: 92.78; GRADE 3: 93.26; GRADE 4: 93.52; GRADE 5: 93.43; GRADE 6: 89.15; GRADE 7: 91.04; GRADE 8: 89.97; GRADE 9: 81.64; GRADE 10: 83.43; GRADE 11: 85.95; GRADE 12: 84.49**

> * Attendance rises from pre-kindergarten to a peak in grades 4-5 (about 93.5%), dips in the middle-school grades, and falls sharply in grade 9 (81.64%) before partially recovering in grades 10-12.

> * Chronic absenteeism is highest at schools in the Bronx and Brooklyn, where the Black and Hispanic percentages are also higher.
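The trend can be read directly off the averages listed above. A small sketch that works only on those published numbers (nothing is recomputed from the raw data) to find the peak grade and the sharpest grade-to-grade drop:

```python
# Average attendance (%) by grade, copied from the list above
avg_attendance = {
    'PRE KINDERGARTEN': 87.43, 'KINDERGARTEN': 90.18,
    'GRADE 1': 91.90, 'GRADE 2': 92.78, 'GRADE 3': 93.26, 'GRADE 4': 93.52,
    'GRADE 5': 93.43, 'GRADE 6': 89.15, 'GRADE 7': 91.04, 'GRADE 8': 89.97,
    'GRADE 9': 81.64, 'GRADE 10': 83.43, 'GRADE 11': 85.95, 'GRADE 12': 84.49,
}

grades = list(avg_attendance)            # dicts keep insertion (grade) order
values = list(avg_attendance.values())

peak = max(grades, key=avg_attendance.get)
# Change from each grade to the next; the most negative entry is the sharpest drop
changes = {f'{a} -> {b}': round(vb - va, 2)
           for a, b, va, vb in zip(grades, grades[1:], values, values[1:])}
sharpest_drop = min(changes, key=changes.get)

print(peak)                                   # GRADE 4
print(sharpest_drop, changes[sharpest_drop])  # GRADE 8 -> GRADE 9 -8.33
```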

In [None]:
import pandas as pd
ny_attend=pd.read_csv('../input/ny-school-attendance-and-enrollment/2010-2011-school-attendance-and-enrollment-statistics-by-district.csv')

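# Keep the first 32 rows (the 32 community school districts) and shorten the labels;
# .copy() avoids modifying a view of ny_attend
X = ny_attend[['District','YTD % Attendance (Avg)']].head(32).copy()
X['District'] = X['District'].apply(lambda x: x.replace('DISTRICT', 'Dist'))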

trace1 = Scatter(x=X['District'], y=X['YTD % Attendance (Avg)'], mode='lines+markers')
data = [trace1]
layout=dict(title='District wise average attendance record')
fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter-basic_pie10')

* **District 26 has the highest attendance percentage.**
* **District 16 has the lowest.**
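The highest and lowest districts can also be looked up programmatically instead of by eye. A minimal sketch; the helper `attendance_extremes` is hypothetical, and it assumes the rate column holds plain numbers (anything non-numeric is coerced to NaN and ignored).

```python
import pandas as pd

def attendance_extremes(df, district_col='District', rate_col='YTD % Attendance (Avg)'):
    """Return (district with the highest rate, district with the lowest rate)."""
    # Coerce in case the rate column was read as text
    rates = pd.to_numeric(df[rate_col], errors='coerce')
    return df.loc[rates.idxmax(), district_col], df.loc[rates.idxmin(), district_col]

# Usage on the 32 districts plotted above:
# attendance_extremes(X)
```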

## [8. A glimpse at some other features...](#first-bullet)

In [None]:
schools['Rigorous Instruction %']=schools['Rigorous Instruction %'].replace({'%':''},regex=True).astype(float)

# Count each rating, skipping schools that have no rating
ri_counts = Counter(schools['Rigorous Instruction Rating'].dropna())
trace1=  Histogram(x=schools['Rigorous Instruction %'],  marker=dict(color='tomato'))
trace2=  Bar(x=list(ri_counts.keys()), y=list(ri_counts.values()), marker=dict(color='lightseagreen'), width=0.7)

schools['Collaborative Teachers %']=schools['Collaborative Teachers %'].replace({'%':''},regex=True).astype(float)

ct_counts = Counter(schools['Collaborative Teachers Rating'].dropna())
trace3=  Histogram(x=schools['Collaborative Teachers %'],  marker=dict(color='#fcc45f' ))
trace4=  Bar(x=list(ct_counts.keys()), y=list(ct_counts.values()), marker=dict(color="#e993f9"), width=0.7)

schools['Supportive Environment %']=schools['Supportive Environment %'].replace({'%':''},regex=True).astype(float)

se_counts = Counter(schools['Supportive Environment Rating'].dropna())
trace5=  Histogram(x=schools['Supportive Environment %'],  marker=dict(color='lightgreen' ))
trace6=  Bar(x=list(se_counts.keys()), y=list(se_counts.values()), marker=dict(color="yellow"), width=0.7)


fig = tools.make_subplots(rows=3, cols=2, print_grid=False, subplot_titles = ["<b>Rigorous Instruction %</b>", "<b>Rigorous Instruction Rating</b>",
                                                                              "<b>Collaborative Teachers %</b>", "<b>Collaborative Teachers Rating</b>",
                                                                             "<b>Supportive Environment %</b>", "<b>Supportive Environment Rating</b>"])
fig.append_trace(trace1, 1, 1);
fig.append_trace(trace2, 1, 2);
fig.append_trace(trace3, 2, 1);
fig.append_trace(trace4, 2, 2);
fig.append_trace(trace5, 3, 1);
fig.append_trace(trace6, 3, 2);

# Same y-axis style for all six panels; tickmode='auto' replaces the deprecated 'autotick' property
axis_style = dict(autorange=True, showgrid=False, zeroline=False, showline=False,
                  tickmode='auto', ticks='', showticklabels=True)
fig['layout'].update(height=900, showlegend=False,
                     **{'yaxis%d' % k: axis_style for k in range(1, 7)});

py.iplot(fig, filename='jupyter-basic_pie1')

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2)
fig.patch.set_facecolor('lightgray')
fig.patch.set_alpha(0.3)
schools.plot(kind="scatter", x="Longitude", y="Latitude",
    s=schools['Supportive Environment %'],title='Supportive Environment %',ax=axes[0],figsize=(15,7),color='lightseagreen',alpha=0.5)


z_black=plot_percent('Supportive Environment %', 'Percent Black')
z_white=plot_percent('Supportive Environment %', 'Percent White')
z_hispanic=plot_percent('Supportive Environment %', 'Percent Hispanic')
z_asian=plot_percent('Supportive Environment %', 'Percent Asian')
z2=['0-20','20-40','40-60','60-80','80-100']

df=pd.DataFrame({'Percent Black':z_black,'Percent White':z_white, 'Percent Hispanic':z_hispanic, 'Percent Asian':z_asian},index=z2)
df.plot.bar(ax=axes[1],figsize=(15,7));

plt.show()

In [None]:
schools['Strong Family-Community Ties %']=schools['Strong Family-Community Ties %'].replace({'%':''},regex=True).astype(float)

# Count each rating, skipping schools that have no rating
fc_counts = Counter(schools['Strong Family-Community Ties Rating'].dropna())
trace1=  Histogram(x=schools['Strong Family-Community Ties %'],  marker=dict(color='lightseagreen' ))
trace2=  Bar(x=list(fc_counts.keys()), y=list(fc_counts.values()), marker=dict(color="coral"), width=0.7)

fig = tools.make_subplots(rows=1, cols=2, print_grid=False, subplot_titles = ["<b>Strong Family-Community Ties %</b>", "<b>Strong Family-Community Ties Rating</b>"])
fig.append_trace(trace1, 1, 1);
fig.append_trace(trace2, 1, 2);

# Shared y-axis style; tickmode='auto' replaces the deprecated 'autotick' property
axis_style = dict(autorange=True, showgrid=False, zeroline=False, showline=False,
                  tickmode='auto', ticks='', showticklabels=True)
fig['layout'].update(height=300, showlegend=False, yaxis1=axis_style, yaxis2=axis_style);

py.iplot(fig, filename='jupyter-basic_pie5')

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2)
fig.patch.set_facecolor('lightgray')
fig.patch.set_alpha(0.3)

schools.plot(kind="scatter", x="Longitude", y="Latitude",
    s=schools['Strong Family-Community Ties %'],title='Strong Family-Community Ties %',ax=axes[0],figsize=(15,7),alpha=0.5)


z_black=plot_percent('Strong Family-Community Ties %', 'Percent Black')
z_white=plot_percent('Strong Family-Community Ties %', 'Percent White')
z_hispanic=plot_percent('Strong Family-Community Ties %', 'Percent Hispanic')
z_asian=plot_percent('Strong Family-Community Ties %', 'Percent Asian')
z2=['0-20','20-40','40-60','60-80','80-100']

df=pd.DataFrame({'Percent Black':z_black,'Percent White':z_white, 'Percent Hispanic':z_hispanic, 'Percent Asian':z_asian},index=z2)
df.plot.bar(ax=axes[1],figsize=(15,7));

plt.show()

In [None]:
schools['Effective School Leadership %']=schools['Effective School Leadership %'].replace({'%':''},regex=True).astype(float)

# Count each rating, skipping schools that have no rating
sl_counts = Counter(schools['Effective School Leadership Rating'].dropna())
trace1=  Histogram(x=schools['Effective School Leadership %'],  marker=dict(color='lightseagreen' ))
trace2=  Bar(x=list(sl_counts.keys()), y=list(sl_counts.values()), marker=dict(color="coral"), width=0.7)

schools['Trust %']=schools['Trust %'].replace({'%':''},regex=True).astype(float)

tr_counts = Counter(schools['Trust Rating'].dropna())
trace3=  Histogram(x=schools['Trust %'],  marker=dict(color='lightgreen' ))
trace4=  Bar(x=list(tr_counts.keys()), y=list(tr_counts.values()), marker=dict(color="yellow"), width=0.7)

fig = tools.make_subplots(rows=2, cols=2, print_grid=False, subplot_titles = ["<b>Effective School Leadership %</b>", "<b>Effective School Leadership Rating</b>",
                                                                             "<b>Trust %</b>", "<b>Trust Rating</b>"])
fig.append_trace(trace1, 1, 1);
fig.append_trace(trace2, 1, 2);
fig.append_trace(trace3, 2, 1);
fig.append_trace(trace4, 2, 2);

# Same minimal y-axis style (no grid, zero line or tick marks) for all four panels
axis_style=dict(autorange=True, showgrid=False, zeroline=False, showline=False, autotick=True, ticks='', showticklabels=True)
fig['layout'].update(height=650, showlegend=False, yaxis1=axis_style, yaxis2=axis_style, yaxis3=axis_style, yaxis4=axis_style);

py.iplot(fig, filename='jupyter-basic_pie4')

### [8.1 Correlation Plot of the features](#gf)

In [None]:
import numpy as np

sns.set(style="white")

d = schools[['Percent Asian','Percent Black','Percent Hispanic','Percent White','Rigorous Instruction %','Collaborative Teachers %',
      'Effective School Leadership %','Strong Family-Community Ties %','Trust %','Economic Need Index','School Income Estimate']]

# Compute the correlation matrix
corr = d.corr()

# Mask the upper triangle, diagonal included (builtin bool: np.bool was removed in NumPy 1.24)
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure with two side-by-side panels
f = plt.figure(figsize=(30,20))

ax0 = f.add_subplot(121)
# Sequential cubehelix palette
cmap=sns.cubehelix_palette()
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, center=0,annot=True,
            square=True, linewidths=.5, cbar_kws={"shrink": .7}, ax=ax0)
# Move the tick labels to the top only after the heatmap has created them
ax0.xaxis.tick_top()
ax0.set_xticklabels(ax0.get_xticklabels(), rotation=90)

ax1 = f.add_subplot(122)
# Use every school: correlations computed from a handful of rows are not meaningful
df1=schools[['Rigorous Instruction %','Collaborative Teachers %',
      'Effective School Leadership %','Strong Family-Community Ties %','Trust %','Economic Need Index','School Income Estimate','Student Attendance Rate','Average ELA Proficiency','Average Math Proficiency']]
corrmat = df1.corr()
# Mask the lower triangle, diagonal included, for the second panel
mask = np.tril(np.ones_like(corrmat, dtype=bool))

sns.heatmap(corrmat, square=True, linewidths=.5, annot=True,mask=mask, center=0,cbar_kws={"shrink": .7}, ax=ax1)
ax1.xaxis.tick_top()
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=90)

plt.show()
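
Both heatmaps hide one triangle of a symmetric correlation matrix with a boolean mask. A minimal, self-contained sketch of that masking step on a small made-up DataFrame (the columns `a`, `b`, `c` are placeholders, not dataset columns); it uses the builtin `bool`, since the `np.bool` alias no longer exists in recent NumPy:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the school features (made-up values)
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],   # rises with a
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],   # tends to fall as a rises
})

corr = toy.corr()  # Pearson correlation: symmetric, 1.0 on the diagonal

# True on and above the diagonal: these are the cells sns.heatmap(mask=...) hides
mask = np.triu(np.ones_like(corr, dtype=bool))

# Setting masked cells to NaN shows which pairs remain visible (lower triangle)
visible = corr.mask(mask)
print(visible)
```

Masking is purely cosmetic: each pair still appears once, and the diagonal of 1.0 values is dropped.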

> * **The "Economic Need Index" appears to be negatively correlated with the "School Income Estimate": the higher the need index, the lower the School Income Estimate.**
> 
> * **Economic Need Index = (% temp housing) + (% HRA eligible × 0.5) + (% free lunch eligible × 0.5). This confirms our earlier assumption: the higher the index, the higher the need.**
> 
> * **ELA and Math Proficiency are highly correlated with Rigorous Instruction % and School Income Estimate.**
> 
> * **Interestingly, the heatmap suggests that ELA and Math proficiency rise as attendance falls.**
> 
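
The Economic Need Index formula quoted above can be written as a small function. This sketch only restates the formula as given in the bullet; it assumes each input is a share between 0 and 1, and the function and argument names are illustrative, not dataset columns:

```python
def economic_need_index(temp_housing: float, hra_eligible: float, free_lunch_eligible: float) -> float:
    """Economic Need Index as stated above:
    (% temp housing) + 0.5 * (% HRA eligible) + 0.5 * (% free lunch eligible).

    All three inputs are assumed to be shares in [0, 1].
    """
    for share in (temp_housing, hra_eligible, free_lunch_eligible):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie between 0 and 1")
    return temp_housing + 0.5 * hra_eligible + 0.5 * free_lunch_eligible


# Temporary housing is weighted twice as heavily as each of the other two shares
print(round(economic_need_index(0.10, 0.40, 0.80), 2))  # 0.7
```

Every term enters with a positive weight, so raising any of the three shares raises the index.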

## [9. Are you smarter than a 5th Grader?](#first-bullet)

In [None]:
img=mpimg.imread('../input/birdseye/question.jpg')
plt.figure(figsize=[20,9])
imgplot = plt.imshow(img)
plt.axis('Off')
plt.show()

In [None]:
grouped=schools.groupby(['Grade Low'],as_index=False)
z=grouped[['Percent Black','Percent White','Percent Asian','Percent Hispanic']].agg(np.mean)

grouped=schools.groupby(['Grade High'],as_index=False)
z1=grouped[['Percent Black','Percent White','Percent Asian','Percent Hispanic']].agg(np.mean)

z1.rename(columns={'Grade High':'Grade'}, inplace=True)
z.rename(columns={'Grade Low':'Grade'}, inplace=True)
data=[]
data.insert(0,{'Grade':'01','Percent Black':39, 'Percent White': 24, 'Percent Asian':4,'Percent Hispanic':31})
z1=pd.concat([pd.DataFrame(data), z1], ignore_index=True)
z1=z1.drop(9,axis=0)
data=[]
data.insert(0,{'Grade':'0K','Percent Black':28.500000, 'Percent White': 11.000000, 'Percent Asian':56.000000,'Percent Hispanic':1.500000})
z1=pd.concat([pd.DataFrame(data), z1], ignore_index=True)
data=[]
data.insert(0,{'Grade':'PK','Percent Black':29.365714, 'Percent White': 15.748571, 'Percent Asian':13.249524,'Percent Hispanic':39.529524})
z1=pd.concat([pd.DataFrame(data), z1], ignore_index=True)

In [None]:
from collections import Counter

# Count how many schools offer each grade: 'Grades' holds comma-separated lists such as "PK,0K,01"
t = schools['Grades'].value_counts()
r = Counter()
for grade_list, n_schools in t.items():
    for each in grade_list.split(","):
        r[each.strip()] += n_schools

# Drop the stray 'K' label (presumably a variant of the '0K' kindergarten code)
r.pop('K', None)
x1 = list(r.keys())
y1 = list(r.values())
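
The counting step above weights each grade by the number of schools listing it. An alternative sketch of the same count using pandas `Series.str.split` and `Series.explode` (available since pandas 0.25), run on a made-up toy `Grades` column rather than the real data:

```python
import pandas as pd

# Toy stand-in for schools['Grades']: one comma-separated grade list per school
grades = pd.Series(["PK,0K,01", "0K,01,02", "06,07,08"])

# One row per (school, grade) pair, with stray whitespace removed
per_grade = grades.str.split(",").explode().str.strip()

# Number of schools offering each grade
counts = per_grade.value_counts()
print(counts.sort_index())
```

Each school contributes exactly one count per grade it lists, so the total equals the number of (school, grade) pairs.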

In [None]:
fig=plt.figure(figsize=(16,10))
fig.patch.set_facecolor('lightgray')
fig.patch.set_alpha(0.3)
plt.subplot(3,5,1)
plt.barh(x1, y1, color="#8cf2a9")
for i in range(len(z1)):
    plt.subplot(3, 5, i+2)
    a=[z1.loc[i]['Percent Black'],z1.loc[i]['Percent White'],z1.loc[i]['Percent Asian'],z1.loc[i]['Percent Hispanic']]
    plt.pie(a,
            labels=['Bl','Wh','Asn','His'],
            autopct="%1.0f%%", 
            colors=["#9bf46b","#ff8ce0", "#4cdfef",'gold'],
            wedgeprops={"linewidth":1,"edgecolor":"white"})
    plt.title(z1.loc[i]['Grade'])
    centre_circle = plt.Circle((0,0),0.75,color='black', fc='white',linewidth=1.25)
    fig = plt.gcf()
    fig.gca().add_artist(centre_circle)

* **Asian and White students make up a relatively lower percentage than Black and Hispanic students in all grades.**
* **The number of schools offering a grade drops after 5th grade and keeps falling as the grade increases.**

## [10. Math Proficiency](#first-bullet)


> -  **<font color=red>Ms. Stevenson: 3+3= ?</font>**
> -  **<font color=blue>Mary: Really? That's so easy...</font>**
> -  **<font color=red>Ms. Stevenson: 57 * 135 =?</font>**
> -  **<font color=blue> Mary: (*Wait for it.....*) 7695</font>**   
> @Gifted

In [None]:
color=['r','y','b','g']
label=['Percent Black','Percent White','Percent Asian','Percent Hispanic']

# Mean share of each group among schools at every average Math score
z=schools.groupby(['Average Math Proficiency'],as_index=False)[label].agg(np.mean)
# Largest group share at each score (computed over the four group columns only)
z['max']=z[label].max(axis=1)

plt.figure(figsize=(15,8))
for i in range(len(z)):
    for j in range(4):
        # Plot the score in the colour of the group holding the largest share
        if z.loc[i, label[j]]==z.loc[i, 'max']:
            plt.plot(z.loc[i, 'Average Math Proficiency'],z.loc[i, 'max'],color=color[j],marker='o',linewidth=2, markersize=8)

import matplotlib.patches as mpatches

red_patch = mpatches.Patch(color='r', label='Percent Black')
yellow_patch = mpatches.Patch(color='y', label='Percent White')
blue_patch = mpatches.Patch(color='b', label='Percent Asian')
green_patch = mpatches.Patch(color='g', label='Percent Hispanic')
plt.legend(handles=[red_patch,yellow_patch,blue_patch,green_patch])
plt.title('Average Math Proficiency',fontweight="bold",fontsize=20)
plt.xlabel('Average Math Proficiency Score')
plt.ylabel('Largest mean group share (%)')
plt.show()

* **It can be seen that Black and Hispanic students make up a higher share of schools with lower average Math proficiency, whereas Asian and White students dominate schools with an average Math proficiency of 3 and above.**
* **The Black percentage is somewhat erratic, peaking at some higher proficiency scores.**
* **The Black and Hispanic percentage keeps decreasing at higher Math scores.**
* **The White and Asian percentage increases at higher Math scores.**
* **Black and Hispanic students follow a similar pattern, as do Asian and White students.**
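
The plot keeps, for each average proficiency score, only the group with the largest mean share. A compact sketch of that "dominant group" lookup with `DataFrame.idxmax(axis=1)`, using made-up numbers rather than values from the dataset:

```python
import pandas as pd

cols = ["Percent Black", "Percent White", "Percent Asian", "Percent Hispanic"]

# Toy table: mean percentage of each group at three proficiency scores (made up)
z = pd.DataFrame(
    [[45.0, 10.0, 5.0, 40.0],
     [20.0, 25.0, 15.0, 40.0],
     [10.0, 30.0, 50.0, 10.0]],
    columns=cols,
    index=pd.Index([2.1, 2.8, 3.6], name="Average Math Proficiency"),
)

# Name and size of the largest group at each score
dominant = pd.DataFrame({"group": z[cols].idxmax(axis=1), "share": z[cols].max(axis=1)})
print(dominant)
```

`idxmax` returns the first matching column on ties, whereas the nested loop in the plotting cell draws a point for every tied group.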

### [10.1 Math Grade trend](#first-bullet)

In [None]:
def form_grades(x,a,b,y):
    """Average share (%) of tested students scoring a 4, per subgroup, for grade `x` and subject `y`.

    `a` and `b` are negative column offsets of the grade-8 block for subject `y`;
    each earlier grade sits 20 columns further to the left.
    """
    a=a-(20*(8-x))
    b=b-(20*(8-x))
    # The dataset spells this one column with a lowercase "tested"
    tested='All Students tested' if (x==3 and y=='Math') else 'All Students Tested'
    # Keep only schools that tested at least one student in this grade and subject
    z=schools.iloc[:,b:a][schools[f'Grade {x} {y} - {tested}']!=0]
    df={}
    for col in range(z.shape[1]):
        # "Grade 3 Math 4s - White" -> "White"; each count is divided by the school's
        # total tested (first column of the block), then averaged over schools
        df[str(z.columns[col]).split('- ')[1]]=[((z.iloc[:,col]/z.iloc[:,0]).mean())*100]
    return pd.DataFrame(df,index=[f'Grade {x}']).drop(tested,axis=1)

# Stack grades 3-8 into one table (DataFrame.append was removed in pandas 2.0)
math_df=pd.concat([form_grades(g,-1,-11,'Math') for g in range(3,9)])
math_df.style.set_properties(**{'background-color': 'black',
                           'color': 'gold','border-color': 'white'})
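
`form_grades` divides each subgroup's count of 4s by the school's total number of tested students (assuming, as the `drop` call suggests, that the first column of each block is the *All Students Tested* count) and then averages that ratio over schools. Each cell therefore reads as the average share of *all* tested students who belong to the subgroup and scored a 4, not the pass rate within the subgroup. A toy illustration with two made-up schools and shortened column names:

```python
import pandas as pd

# Two toy schools (made-up counts): students tested, and 4s by subgroup
toy = pd.DataFrame({
    "All Students Tested": [50, 200],
    "4s - Asian": [10, 20],
    "4s - Black": [5, 60],
})

# Same idea as form_grades: per-school ratio to ALL tested students, averaged over schools
share = {col: (toy[col] / toy["All Students Tested"]).mean() * 100 for col in toy.columns[1:]}
print(share)
```

Because every school gets equal weight, this differs from a pooled share: the toy Asian column averages to 15% but pools to 30 / 250 = 12%.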

> * "Economically disadvantaged" is a term used by government institutions, for example when allocating business grants or free school meals. For school meals it describes "a student who is a member of a household that meets the income eligibility guidelines for free or reduced-price meals (less than or equal to 185% of Federal Poverty Guidelines)".
> 
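
The eligibility rule quoted above is a single comparison against 185% of the Federal Poverty Guideline. A sketch, assuming the caller supplies the guideline amount for the household's size; the function name and the dollar figures are placeholders, not official values:

```python
def meets_income_guideline(household_income: int, poverty_guideline: int) -> bool:
    """True if income is at or below 185% of the Federal Poverty Guideline.

    Whole-dollar amounts are compared in integer arithmetic
    (100 * income <= 185 * guideline) to avoid floating-point rounding at the boundary.
    """
    return 100 * household_income <= 185 * poverty_guideline


guideline = 20_000  # placeholder guideline for some household size
print(meets_income_guideline(37_000, guideline))  # True: exactly 185%
print(meets_income_guideline(37_001, guideline))  # False
```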

> The following scale is used to evaluate final results:
> 
> * 65 - 100: Passing
> * 0 - 64: Failing

| Percentage (%) range | Level |
|---|---|
| 0 – 64 | 1 |
| 65 – 79 | 2 |
| 80 – 89 | 3 |
| 90 – 100 | 4 |
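
The level table can be encoded as a simple lookup. This sketch assumes whole-number percentages, as the table's integer ranges suggest, so a fractional score such as 64.5 lands in the lower band; the helper names are illustrative:

```python
def score_level(pct: float) -> int:
    """Map a percentage (0-100) to levels 1-4 using the ranges in the table above."""
    if not 0 <= pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if pct >= 90:
        return 4
    if pct >= 80:
        return 3
    if pct >= 65:
        return 2
    return 1


def is_passing(pct: float) -> bool:
    """65-100 is passing, which is the same as reaching level 2 or higher."""
    return score_level(pct) >= 2


print([score_level(p) for p in (64, 65, 79, 80, 90, 100)])  # [1, 2, 2, 3, 4, 4]
```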

In [None]:
plt.rcParams['font.size'] = 20
fig = plt.figure(figsize=(20,8))
ax1 = fig.add_subplot(111)
ax1.set_title('How many students got 4 in Maths: Percentage',fontsize=20,fontweight='bold')

# One line per group across grades 3-8 (column name -> legend label)
groups = {'American Indian or Alaska Native':'American Indian', 'Asian or Pacific Islander':'Asian',
          'Black or African American':'Black', 'Hispanic or Latino':'Hispanic', 'White':'White'}
for col, name in groups.items():
    ax1.plot(math_df.index, math_df[col], marker='o', label=name, linewidth=2, markersize=12)
ax1.set_ylabel('% of all tested students', fontsize=15)
ax1.legend(loc='upper right')
ax1.grid(True)
plt.show()

## [11. English Proficiency](#first-bullet)


> -  **<font color=blue>Sheldon: But whoa, whoa. Is placed right?</font>**
> 
> -  **<font color=red>Leonard: What do you mean?</font>**
> 
> -  **<font color=blue>Sheldon: Is placed the right tense for something that would’ve happened in the future of a past that was affected by something from the future?</font>**
> 
> -  **<font color=red>Leonard: (*thinks*) Had will have placed?</font>**
> 
> -  **<font color=blue>Sheldon: That’s my boy.</font>**
> 
> @BigBangTheory

In [None]:
color=['r','y','b','g']
label=['Percent Black','Percent White','Percent Asian','Percent Hispanic']

# Mean share of each group among schools at every average ELA score
z=schools.groupby(['Average ELA Proficiency'],as_index=False)[label].agg(np.mean)
# Largest group share at each score (computed over the four group columns only)
z['max']=z[label].max(axis=1)

plt.figure(figsize=(15,8))
for i in range(len(z)):
    for j in range(4):
        # Plot the score in the colour of the group holding the largest share
        if z.loc[i, label[j]]==z.loc[i, 'max']:
            plt.plot(z.loc[i, 'Average ELA Proficiency'],z.loc[i, 'max'],color=color[j],marker='o',linewidth=2, markersize=8)

import matplotlib.patches as mpatches

red_patch = mpatches.Patch(color='r', label='Percent Black')
yellow_patch = mpatches.Patch(color='y', label='Percent White')
blue_patch = mpatches.Patch(color='b', label='Percent Asian')
green_patch = mpatches.Patch(color='g', label='Percent Hispanic')
plt.legend(handles=[red_patch,yellow_patch,blue_patch,green_patch])
plt.title('Average ELA Proficiency',fontweight="bold",fontsize=20)
plt.xlabel('Average ELA Proficiency Score')
plt.ylabel('Largest mean group share (%)')
plt.show()

> * **It can be seen that Black and Hispanic students make up a higher share of schools with lower average ELA proficiency, whereas Asian and White students dominate schools with an average ELA proficiency of 3 and above.**
> * **The Black and Hispanic percentage keeps decreasing at higher ELA scores.**
> * **The White and Asian percentage increases at higher ELA scores.**

### [11.1 ELA Grade trend](#1)

In [None]:
# Stack grades 3-8 into one table (DataFrame.append was removed in pandas 2.0)
ela_df=pd.concat([form_grades(g,-11,-21,'ELA') for g in range(3,9)])
ela_df.style.set_properties(**{'background-color': 'black',
                           'color': 'aqua','border-color': 'white'})

In [None]:
plt.rcParams['font.size'] = 15
fig = plt.figure(figsize=(20,8))
ax1 = fig.add_subplot(111)
ax1.set_title('How many students got 4 in ELA: Percentage',fontsize=20,fontweight='bold')

# One line per group across grades 3-8 (column name -> legend label)
groups = {'American Indian or Alaska Native':'American Indian', 'Asian or Pacific Islander':'Asian',
          'Black or African American':'Black', 'Hispanic or Latino':'Hispanic', 'White':'White'}
for col, name in groups.items():
    ax1.plot(ela_df.index, ela_df[col], marker='o', label=name, linewidth=2, markersize=12)
ax1.set_ylabel('% of all tested students', fontsize=15)
ax1.legend(loc='upper right')
ax1.grid(True)
plt.show()

## [12. SHSAT data](#2)

In [None]:
shsat=pd.read_csv('../input/data-science-for-good/D5 SHSAT Registrations and Testers.csv')
shsat.tail()

### [12.1 Number of students who register for and/or take the SHSAT exam: Trend](#re)

In [None]:
reg='Number of students who registered for the SHSAT'
took='Number of students who took the SHSAT'

# Total registrations and test-takers per year ('Year of SHST' is the dataset's own column name)
z1=shsat.groupby('Year of SHST')[[reg,took]].sum()
# ... and per grade level
x1=shsat.groupby('Grade level')[[reg,took]].sum()

fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(20,6))

ax1=z1.plot(kind='bar', fontsize=15,ax=axes[0])
ax2=x1.plot(kind='bar', fontsize=15,ax=axes[1])
ax1.set_title('Year of SHSAT',fontweight='bold',fontsize=17)
ax2.set_title('Grade Level',fontweight='bold',fontsize=17)
plt.show()

### [12.2 SHSAT school wise trend of students](#kd)

In [None]:
cols=['Number of students who registered for the SHSAT','Number of students who took the SHSAT']
# Top 10 schools by total registrations over all years
z1=shsat.groupby('School name')[cols].sum().sort_values(cols[0],ascending=False).head(10)

ax1=z1.plot(kind='barh',color=['coral','gold'], fontsize=15,figsize=(20,8))
ax1.legend(prop={'size': 19})
plt.show()

* **Only the 8th & 9th graders are eligible to take the test.**


* **The number of students who registered for the test has fallen since 2014, but the number who actually took it has stayed almost constant. This may indicate that, since 2015, mainly the students who are serious about sitting the test have registered.**

### [12.3 Schools with the highest ratio of students who took the SHSAT to students who registered (District 5)](#lksjd)

In [None]:
reg='Number of students who registered for the SHSAT'
took='Number of students who took the SHSAT'
# Share of registered students who actually sat the test
shsat['ratio']=(shsat[took]/shsat[reg]).round(3)

z=shsat[['School name','ratio',reg,took]]

# Filter first, then sort, so the boolean mask lines up with the rows
z[z['ratio'].between(0.9,1)].sort_values('ratio',ascending=False).style.set_properties(**{'background-color': 'black',
                           'color': 'lawngreen','border-color': 'white'})
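
In the ratio above, a school with zero registrations produces `0/0` (NaN) or `n/0` (infinity), and `between(...)` silently drops those rows. A sketch that makes the zero-registration case explicit by masking the denominator first, using toy counts and shortened column names (not District 5 data):

```python
import pandas as pd

# Toy registration and test-taking counts (made up)
shsat_toy = pd.DataFrame({
    "registered": [20, 0, 15],
    "took": [10, 0, 15],
})

# Treat zero registrations as missing so the ratio is NaN by design, never infinite
denominator = shsat_toy["registered"].where(shsat_toy["registered"] > 0)
shsat_toy["ratio"] = (shsat_toy["took"] / denominator).round(3)
print(shsat_toy)
```

Rows with a NaN ratio can then be counted or reported separately instead of vanishing from both the "highest" and "lowest" tables.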


### [12.4 Schools with the lowest ratio of students who took the SHSAT to students who registered (District 5)](#lksjd)

In [None]:
# Filter first, then sort, so the boolean mask lines up with the rows
z[z['ratio'].between(0,0.2)].sort_values('ratio',ascending=True).style.set_properties(**{'background-color': 'black',
                           'color': 'gold','border-color': 'white'})

In [None]:
z=shsat[['DBN','School name','Number of students who registered for the SHSAT','Number of students who took the SHSAT','ratio']]
z=z.rename(columns={'DBN':'Location Code'})
# Attach school-level demographic and survey features to each SHSAT record
z=pd.merge(schools[['Location Code','Percent Black','Percent White','Percent Hispanic','Percent Asian','Economic Need Index','Rigorous Instruction %','Collaborative Teachers %','Effective School Leadership %','Strong Family-Community Ties %','Trust %']],z, on='Location Code')

# Keep only valid ratios (drops NaN/inf from schools with zero registrations)
z=z[z['ratio'].between(0,1)]

features=['Rigorous Instruction %','Collaborative Teachers %','Effective School Leadership %','Strong Family-Community Ties %']
colors=[None, sns.color_palette("pastel")[2], "orange", sns.color_palette("pastel")[0]]
plt.figure(figsize=(20,4))
for i, (feat, color) in enumerate(zip(features, colors), start=1):
    plt.subplot(1, 4, i)
    # Recent seaborn versions require x and y as keyword arguments
    sns.regplot(x=feat, y='ratio', data=z, color=color)

plt.show()

>  * **Democracy Prep Harlem Charter School and Harlem Village Academy Charter School, along with schools such as KIPP Infinity Charter School, have the lowest ratio every year.**

>  * **This behaviour may be related to the Economic Need Index, which is very high in this district (0.75-0.9). These schools also have a high percentage of Black and Hispanic students.**

> *  **Collaborative Teachers % and Effective School Leadership % also appear to be associated with the ratio, although only weakly.**

> # MORE TO COME.... stay tuned !!!
> ## Please upvote if you like it