<h1>PASSNYC - Data Science for Good</h1>
<p>PASSNYC is a not-for-profit organization that facilitates a collective impact that is dedicated to broadening educational opportunities for New York City's talented and underserved students. New York City is home to some of the most impressive educational institutions in the world, yet in recent years, the City’s specialized high schools - institutions with historically transformative impact on student outcomes - have seen a shift toward more homogeneous student body demographics.</p>

<p>PASSNYC uses public data to identify students within New York City’s under-performing school districts and, through consulting and collaboration with partners, aims to increase the diversity of students taking the Specialized High School Admissions Test (SHSAT). By focusing efforts in under-performing areas that are historically underrepresented in SHSAT registration, we will help pave the path to specialized high schools for a more diverse group of students.</p>

<h2>Table of Contents</h2>
1. Distribution of Schools according to Cities
    * Bar Chart
    * Pie Chart
2. School Income Estimate
3. Number of Community Schools
4. Highest Grade offered by the Schools 
5. Economic Need Index 
6. Framework for Great Schools
    * Rigorous Instruction
    * Collaborative Teachers
    * Supportive Environment
    * Effective School Leadership
    * Strong Family-Community Ties
    * Trust
7. Average Math and ELA Proficiency
8. Students Chronically Absent
9. Distribution of Schools in NY by Latitude and Longitude
10. Geospatial Analysis
    * Asian Percentage in NYC Schools
    * ELL Percentage in NYC Schools
    * Black Percentage in NYC Schools
    * Hispanic Percentage in NYC Schools
    * White Percentage in NYC Schools
11. Scores for ELA and Maths from Grade 3 to Grade 8
12. Number of students who registered for the SHSAT
13. Enrollment on 10/31 Bar Plot

Hi all, the aim of this notebook is a complete EDA of the PASSNYC dataset using plotly. Do let me know what you think about the visualizations in the comments.

Thanks!!!

In [23]:
# Kaggle Python 3 environment (kaggle/python docker image: https://github.com/kaggle/docker-python)
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# The input data files are in the "../input/" directory; list them
import os
print(os.listdir("../input"))

['2016 School Explorer.csv', 'D5 SHSAT Registrations and Testers.csv']


In [24]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from datetime import date
import seaborn as sns
import random 
import warnings
import operator
warnings.filterwarnings("ignore")

In [25]:
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import plotly.figure_factory as ff

In [26]:
df_school = pd.read_csv("../input/2016 School Explorer.csv")
df_reg = pd.read_csv("../input/D5 SHSAT Registrations and Testers.csv")

In [27]:
(df_school['Adjusted Grade'].notnull()).sum()

2

In [28]:
for i in df_school.columns:
    print(i)

Adjusted Grade
New?
Other Location Code in LCGMS
School Name
SED Code
Location Code
District
Latitude
Longitude
Address (Full)
City
Zip
Grades
Grade Low
Grade High
Community School?
Economic Need Index
School Income Estimate
Percent ELL
Percent Asian
Percent Black
Percent Hispanic
Percent Black / Hispanic
Percent White
Student Attendance Rate
Percent of Students Chronically Absent
Rigorous Instruction %
Rigorous Instruction Rating
Collaborative Teachers %
Collaborative Teachers Rating
Supportive Environment %
Supportive Environment Rating
Effective School Leadership %
Effective School Leadership Rating
Strong Family-Community Ties %
Strong Family-Community Ties Rating
Trust %
Trust Rating
Student Achievement Rating
Average ELA Proficiency
Average Math Proficiency
Grade 3 ELA - All Students Tested
Grade 3 ELA 4s - All Students
Grade 3 ELA 4s - American Indian or Alaska Native
Grade 3 ELA 4s - Black or African American
Grade 3 ELA 4s - Hispanic or Latino
Grade 3 ELA 4s - Asian or Pacific

<h2>School City</h2>
Let's have a look at how the schools in the dataset are distributed across cities.

In [29]:
from collections import Counter
city_names = []
city_count = []
city_dict = dict(Counter(df_school.City))
city_dict = sorted(city_dict.items(), key=operator.itemgetter(1))
for tup in city_dict:
    city_names.append(tup[0].lower())
    city_count.append(tup[1])

dataa = [go.Bar(
            y= city_names,
            x = city_count,
            width = 0.9,
            opacity=0.6, 
            orientation = 'h',
            marker=dict(
                color='rgb(158,202,225)',
                line=dict(
                    color='rgb(8,48,107)',
                    width=1.5,
                )
            )
        )]
layout = go.Layout(
    title='Distribution of Schools ',
    autosize = False,
    width=800,
    height=800,
    margin=go.Margin(
        l=250,
        r=50,
        b=100,
        t=100,
        pad=10
    ),
)

fig = go.Figure(data=dataa, layout = layout)
py.iplot(fig, filename='School-City-Bar')

fig2 = {
  "data": [
    {
      "values": city_count,
      "labels": city_names,
      "hoverinfo":"label+percent",
      "hole": .3,
      "type": "pie"
    }],
  "layout": {
        "title":"Percentage of Schools in each City",
        "paper_bgcolor":'rgb(243, 243, 243)',"plot_bgcolor":'rgb(243, 243, 243)'
        
    }
}
py.iplot(fig2, filename='School-City-Pie')

According to the charts above, most of the schools are in Brooklyn (32.3%), followed by the Bronx (23.3%), New York (18.2%), Staten Island (4.72%) and Jamaica (2.52%).
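The percentages in the pie chart are each city's share of the total school count. A minimal sketch of that arithmetic with pandas `value_counts`, using a short made-up list of cities instead of the real `df_school['City']` column (so the numbers do not reproduce the figures above):

```python
import pandas as pd

# Made-up stand-in for df_school['City']
cities = pd.Series(['BROOKLYN', 'BRONX', 'BROOKLYN', 'NEW YORK',
                    'BROOKLYN', 'BRONX', 'JAMAICA', 'STATEN ISLAND'])

counts = cities.value_counts()                       # schools per city, largest first
shares = cities.value_counts(normalize=True) * 100   # percentage of all schools

summary = pd.DataFrame({'schools': counts, 'percent': shares.round(1)})
print(summary)
```

`value_counts` does the counting and sorting that the `Counter` plus `sorted` loop does in the cell above, in one call.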

<h2>School Income Estimate</h2>
Let's have a look at the schools' estimated incomes and how they are distributed. The column contains strings (with dollar signs, thousands separators and spaces), so it has to be converted to floats before plotting.
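A sketch of the same string-to-float conversion as a reusable function. `parse_currency` is a hypothetical helper name, and the sample strings are made up in the format the cleaning cell below assumes (`$`, commas, trailing spaces):

```python
import numpy as np
import pandas as pd

def parse_currency(s: pd.Series) -> pd.Series:
    """Convert strings such as '$41,234.56 ' to floats; missing values stay NaN."""
    cleaned = (s.str.replace('$', '', regex=False)   # literal '$', not the regex end-of-string anchor
                .str.replace(',', '', regex=False)
                .str.strip())
    # errors='coerce' turns anything that is still not a number into NaN instead of raising
    return pd.to_numeric(cleaned, errors='coerce')

incomes = pd.Series(['$31,141.72 ', '$56,462.88 ', np.nan, '$41,100.00'])
print(parse_currency(incomes))
```

Passing `regex=False` makes every pattern literal, which matters for `$`: treated as a regular expression it matches the end of the string rather than the dollar sign.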

In [None]:
# Strip thousands separators, dollar signs and spaces, then convert to float.
# regex=False makes each pattern literal; as a regex, '$' would match the end of the string instead.
df_school['School Income Estimate'] = (df_school['School Income Estimate']
                                       .str.replace(',', '', regex=False)
                                       .str.replace('$', '', regex=False)
                                       .str.replace(' ', '', regex=False)
                                       .astype(float))

trace1 = go.Histogram(
    x = df_school['School Income Estimate'],
    name = 'School Income Estimate'
)
dat = [trace1]

layout = go.Layout(
    title='School Income Estimate',paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)'
)

fig = go.Figure(data=dat, layout = layout)
py.iplot(fig, filename='School-Income-Hist')

Most of the schools have an estimated income between 25k and 45k. A few schools have extremely high estimates, such as 140k and 180k; it would be worth checking whether these are community schools.
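One way to do that check is to filter the rows above a cut-off and look at their `Community School?` value. A minimal sketch on made-up rows; the 100,000 threshold is an assumption and should be picked from the histogram:

```python
import pandas as pd

# Made-up rows; in the notebook this would be df_school after the income column is converted to float
schools = pd.DataFrame({
    'School Name': ['A', 'B', 'C', 'D', 'E'],
    'School Income Estimate': [32000.0, 41000.0, 38000.0, 182000.0, None],
    'Community School?': ['No', 'Yes', 'No', 'No', 'Yes'],
})

threshold = 100000  # assumed cut-off for "extremely high"
# NaN > threshold evaluates to False, so schools without an estimate drop out
high = schools[schools['School Income Estimate'] > threshold]
print(high[['School Name', 'School Income Estimate', 'Community School?']])
```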

<h2>Community School or Not</h2>
How many schools in the dataset are community schools?

In [None]:
cc_names, cc_count = list(), list()
cc = dict(Counter(df_school['Community School?']))
cc = sorted(cc.items(), key=operator.itemgetter(1))
for tup in cc:
    cc_names.append(tup[0].upper())
    cc_count.append(tup[1])

dataa = [go.Bar(
            y= cc_names,
            x = cc_count,
            width = 0.9,
            opacity=0.6, 
            orientation = 'h',
            marker=dict(
                color='rgb(158,202,225)',
                line=dict(
                    color='rgb(8,48,107)',
                    width=1.5,
                )
            ),
        )]
layout = go.Layout(
    title='Community School or Not?',
    margin=go.Margin(
        l=250,
        r=50,
        b=100,
        t=100,
        pad=10
    ),paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)'
)

fig = go.Figure(data=dataa, layout = layout)
py.iplot(fig, filename='Community-School-Bar')


Of the 1272 schools in the dataset, only 76 are community schools.

<h2>High Grade</h2>
Let's see the highest grade offered by the Schools.
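The cell below relabels grade codes with `Series.map` and a dictionary. `map` returns NaN for any code that is not a key, and those schools would then be counted under `nan`. Whether the real `Grade High` column contains such codes is not checked here; this sketch uses made-up codes, including a hypothetical `'PK'`, to show the difference a `fillna` fallback makes:

```python
import pandas as pd

grade_labels = {'0K': 'Kindergarten', '05': '5th Grade', '08': '8th Grade', '12': '12th Grade'}

raw = pd.Series(['05', '08', '0K', '12', 'PK', '08'])  # 'PK' is a hypothetical code missing from the dict

mapped_only = raw.map(grade_labels)               # unmapped codes become NaN
mapped_safe = raw.map(grade_labels).fillna(raw)   # unmapped codes keep their original value

print(mapped_only.isna().sum())
print(mapped_safe.value_counts())
```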

In [None]:
df_school['Grade High'] = df_school['Grade High'].map({'09': '9th Grade', '10': '10th Grade', '07': '7th Grade', '02': '2nd Grade', '0K': 'Kindergarten', '04': '4th Grade', '03': '3rd Grade', '06': '6th Grade', '12': '12th Grade', '08': '8th Grade', '05': '5th Grade'})

cc = dict(Counter(df_school['Grade High']))


dataa = [go.Bar(
            y= list(cc.keys()),
            x = list(cc.values()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'h',
            marker=dict(
                color='rgb(158,202,225)',
                line=dict(
                    color='rgb(8,48,107)',
                    width=1.5,
                )
            ),
        )]
layout = go.Layout(
    title='Highest Grades in the Given Schools',
    margin=go.Margin(
        l=250,
        r=50,
        b=100,
        t=100,
        pad=10
    ),paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)'
)

fig = go.Figure(data=dataa, layout = layout)
py.iplot(fig, filename='Community-School-Bar')

<h2>Economic Need Index</h2>
The Economic Need Index reflects the socioeconomics of the school population. It is calculated using the following formula:

Economic Need Index = (Percent Temporary Housing) + (Percent HRA-eligible * 0.5) + (Percent Free Lunch Eligible * 0.5)
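The formula above written as a small function, applied to a hypothetical school. It implements the formula exactly as stated and is not checked against the values in the dataset; inputs are assumed to be fractions between 0 and 1:

```python
def economic_need_index(pct_temp_housing: float, pct_hra_eligible: float, pct_free_lunch: float) -> float:
    """Economic Need Index per the formula above; inputs are fractions between 0 and 1."""
    return pct_temp_housing + 0.5 * pct_hra_eligible + 0.5 * pct_free_lunch

# Hypothetical school: 10% in temporary housing, 40% HRA-eligible, 80% free-lunch eligible
eni = economic_need_index(0.10, 0.40, 0.80)
print(round(eni, 2))  # 0.7
```

That is 0.10 + 0.5 × 0.40 + 0.5 × 0.80 = 0.10 + 0.20 + 0.40 = 0.70.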

In [None]:
trace1 = go.Histogram(
    x = df_school['Economic Need Index'],
    name = 'Economic Need Index'
)
dat = [trace1]

layout = go.Layout(
    title='Economic Need Index Histogram',paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)'
)

fig = go.Figure(data=dat, layout = layout)
py.iplot(fig, filename='School-Income-Hist')

In [None]:
Margin_common = go.Margin(
        l=450,
        r=50,
        b=100,
        t=100,
        pad=10)
marker_common = dict(
                color='rgb(158,202,225)',
                line=dict(
                    color='rgb(8,48,107)',
                    width=1.5,
                )
            )

<h2>Framework for Great Schools</h2>
The Framework for Great Schools sets forth six elements—Rigorous Instruction,
Collaborative Teachers, Supportive Environment, Effective School Leadership,
Strong Family-Community Ties, and Trust—that drive student achievement and
school improvement.

The School Quality Reports share ratings and information on how schools are
performing on the six Framework elements.

1. **Rigorous Instruction**: This section looks at whether curriculum and instruction are designed to engage students, foster critical-thinking skills, and are aligned to the Common Core. This section draws upon data from the Quality Review and the NYC School Survey.
2. **Collaborative Teachers**: This section looks at whether teachers participate in opportunities to develop, grow, and contribute to the continuous improvement of the school community. This section draws upon data from the Quality Review and the NYC School Survey.
3. **Supportive Environment**: This section looks at whether the school establishes a culture where students feel safe, challenged to grow, and supported to meet high expectations. This section draws upon data from the Quality Review, the NYC School Survey, chronic absenteeism (or average change in student attendance, for some school types), and movement of students with disabilities to less restrictive environments.
4. **Effective School Leadership**: This section looks at whether school leadership inspires the school community with a clear instructional vision and effectively distributes leadership to realize this vision. This section draws upon data from the NYC School Survey and the Quality Review.
5. **Strong Family-Community Ties**: This section looks at whether the school forms effective partnerships with families to improve the school. This section draws upon data from the NYC School Survey and the Quality Review.
6. **Trust**: This section looks at whether relationships between administrators, educators, students, and families are based on trust and respect. This section draws upon data from the NYC School Survey.

Reference : http://schools.nyc.gov/NR/rdonlyres/BC3EADE6-7F28-4E9E-BAED-37A0F0A0F0DF/0/201617EducatorGuideEC10232017.pdf
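The rating bar charts below take their bar order from `Counter`, which follows the order values first appear rather than the rating scale. A sketch of ordering the ratings with a pandas ordered categorical. The four labels are an assumption about the rating values (check `df_school['Rigorous Instruction Rating'].unique()` first), and the sample series is made up:

```python
import pandas as pd

# Assumed label set, from lowest to highest
order = ['Not Meeting Target', 'Approaching Target', 'Meeting Target', 'Exceeding Target']

ratings = pd.Series(['Meeting Target', 'Exceeding Target', None, 'Meeting Target', 'Approaching Target'])
ordered = pd.Categorical(ratings, categories=order, ordered=True)

# value_counts(sort=False) keeps the category order, reports 0 for unseen levels and drops missing values
counts = pd.Series(ordered).value_counts(sort=False)
print(counts)
```

`counts.index` and `counts.values` can then be passed as `x` and `y` to `go.Bar` in place of the `Counter` keys and values.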

In [None]:
rir = dict(Counter(df_school['Rigorous Instruction Rating']))
rir_bar = go.Bar(
            y= list(rir.values()),
            x = list(rir.keys()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'v',
            name = 'Rigorous Instruction Rating',
            marker= marker_common
        )

rir2_hist = go.Histogram(
    x = df_school['Rigorous Instruction %'],
    name = 'Rigorous Instruction %'
)


ctr = dict(Counter(df_school['Collaborative Teachers Rating']))
ctr_bar = go.Bar(
            y= list(ctr.values()),
            x = list(ctr.keys()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'v',
            name = 'Collaborative Teachers Rating',
            marker= marker_common
        )

ctr2_hist = go.Histogram(
    x = df_school['Collaborative Teachers %'],
    name = 'Collaborative Teachers %'
)


ser = dict(Counter(df_school['Supportive Environment Rating']))
ser_bar = go.Bar(
            y= list(ser.values()),
            x = list(ser.keys()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'v',
            name = 'Supportive Environment Rating',
            marker= marker_common
        )

ser2_hist = go.Histogram(
    x = df_school['Supportive Environment %'],
    name = 'Supportive Environment %'
)


eslr = dict(Counter(df_school['Effective School Leadership Rating']))
eslr_bar = go.Bar(
            y= list(eslr.values()),
            x = list(eslr.keys()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'v',
            name = 'Effective School Leadership Rating',
            marker= marker_common
        )

eslr2_hist = go.Histogram(
    x = df_school['Effective School Leadership %'],
    name = 'Effective School Leadership %'
)


sfct = dict(Counter(df_school['Strong Family-Community Ties Rating']))
sfct_bar = go.Bar(
            y= list(sfct.values()),
            x = list(sfct.keys()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'v',
            name = 'Strong Family-Community Ties Rating',
            marker= marker_common
        )

sfct2_hist = go.Histogram(
    x = df_school['Strong Family-Community Ties %'],
    name = 'Strong Family-Community Ties %'
)


tr = dict(Counter(df_school['Trust Rating']))
tr_bar = go.Bar(
            y= list(tr.values()),
            x = list(tr.keys()),
            width = 0.9,
            opacity=0.6, 
            orientation = 'v',
            name = 'Trust Rating',
            marker= marker_common
        )

tr2_hist = go.Histogram(
    x = df_school['Trust %'],
    name = 'Trust %'
)


fig = tls.make_subplots(rows=6, cols=2, subplot_titles=('Rigorous Instruction Rating', 'Rigorous Instruction %',
                                                        'Collaborative Teachers Rating', 'Collaborative Teachers %',
                                                        'Supportive Environment Rating', 'Supportive Environment %',
                                                       'Effective School Leadership Rating', 'Effective School Leadership %',
                                                        'Strong Family-Community Ties Rating', 'Strong Family-Community Ties %',
                                                       'Trust Rating', 'Trust %'));
fig.append_trace(rir_bar, 1, 1);
fig.append_trace(rir2_hist, 1, 2);
fig.append_trace(ctr_bar, 2, 1);
fig.append_trace(ctr2_hist, 2, 2);
fig.append_trace(ser_bar, 3, 1);
fig.append_trace(ser2_hist, 3, 2);
fig.append_trace(eslr_bar, 4, 1);
fig.append_trace(eslr2_hist, 4, 2);
fig.append_trace(sfct_bar, 5, 1);
fig.append_trace(sfct2_hist, 5, 2);
fig.append_trace(tr_bar, 6, 1);
fig.append_trace(tr2_hist, 6, 2);

fig['layout'].update(height=2400,title='School Quality Report Charts', showlegend=False, paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='simple-subplot')

<h2>Student Achievement Rating</h2>

In [None]:
SAR = dict(Counter(df_school['Student Achievement Rating']))

fig2 = {
  "data": [
    {
      "values": list(SAR.values()),
      "labels": list(SAR.keys()),
      "hoverinfo":"label+percent",
      "hole": .3,
      "type": "pie"
    }],
  "layout": {
        "title":"Student Achievement Rating"
        ,"paper_bgcolor":'rgb(243, 243, 243)',"plot_bgcolor":'rgb(243, 243, 243)'
    }
}
py.iplot(fig2, filename='School-City-Pie')


<h2>Average Math and ELA Proficiency</h2>
Understanding Proficiency provides resources that guide educators in analyzing student work on performance tasks, in order to develop a deeper understanding of the Math and English Language Arts (ELA)/Literacy Common Core State Standards.

In [None]:
amp_hist = go.Histogram(
    x = df_school['Average Math Proficiency'],
    name = 'Average Math Proficiency'
)

aep_hist = go.Histogram(
    x = df_school['Average ELA Proficiency'],
    name = 'Average ELA Proficiency'
)
print("Average Math Proficiency is : " + str(np.mean(df_school['Average Math Proficiency'])))
print("Average ELA Proficiency is : " + str(np.mean(df_school['Average ELA Proficiency'])))
fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Average Math Proficiency', 'Average ELA Proficiency'));
fig.append_trace(amp_hist, 1, 1);
fig.append_trace(aep_hist, 1, 2);

fig['layout'].update(height=400,title='Average Proficiency Plot', showlegend=False,paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='Proficiency-subplot')

<h2>Students Chronically Absent</h2>

In [None]:
PCA_hist = go.Histogram(
    x = df_school['Percent of Students Chronically Absent'],
    name = 'Percent of Students Chronically Absent'
)

dat = [PCA_hist]

layout = go.Layout(
    title='Percent of Students Chronically Absent Histogram',paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)'
)

fig = go.Figure(data=dat, layout = layout)
py.iplot(fig, filename='Percent-of-Students-Chronically-Absent-Hist')

In [None]:
def p2f(x):
    """Convert a percentage string such as '45%' to a fraction (0.45).
    Non-strings (NaN, or values already converted on a previous run) are returned unchanged."""
    if isinstance(x, str):
        return float(x.strip('%')) / 100
    return x

for col in ['Percent Asian', 'Percent Black', 'Percent Hispanic', 'Percent White',
            'Percent Black / Hispanic', 'Percent ELL']:
    df_school[col] = df_school[col].apply(p2f)

In [None]:
# Mean of the numeric indicators per city; selecting the columns first avoids averaging text columns
cols = ['Economic Need Index', 'School Income Estimate', 'Percent Asian', 'Percent Black',
        'Percent Hispanic', 'Percent Black / Hispanic', 'Percent White']
d3 = df_school.groupby('City')[cols].mean()
d3

<h2>Latitude and Longitude</h2>
Let's have a look at the distribution of school locations using a joint plot.

In [None]:
# jointplot creates its own figure, so a separate plt.figure() call is not needed;
# height replaces the deprecated size argument in newer seaborn versions
g = sns.jointplot(x=df_school.Latitude.values, y=df_school.Longitude.values, height=10, color='red')
g.set_axis_labels('Latitude', 'Longitude', fontsize=12)
plt.show()

<h2>Geospatial Analysis of the Asian Percentage in NYC Schools</h2>

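Each circle is colored by the school's percentage of Asian students: red for less than 25%, yellow for 25 to 49%, purple for 50 to 74% and blue for 75% or more.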
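The color assignment is a binning of the percentage into four 25-point bins. The same idea can be sketched with `pd.cut`. Here `'purple'` stands in for the notebook's `'dusty purple'`, which is not a standard CSS color name and may not render as intended in the browser; the sample fractions are made up:

```python
import pandas as pd

colors = ['red', 'yellow', 'purple', 'blue']
bins = [0, 25, 50, 75, 100.0001]   # last edge nudged past 100 so a 100% school lands in the top bin

pct = pd.Series([0.03, 0.30, 0.55, 0.99, 1.0])   # fractions, as produced by p2f
labels = pd.cut(pct * 100, bins=bins, labels=colors, right=False)
print(list(labels))
```

With `right=False` each bin includes its lower edge, so exactly 25% is yellow, matching `int(i/25)`. A missing percentage would come out as NaN, which folium cannot use as a color, so it would need a fallback.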

In [None]:
import folium

# Four color bins of 25 percentage points each: red (<25%), yellow, purple, blue (>=75%)
colors = ['red', 'yellow', 'purple', 'blue']
d = (df_school['Percent Asian']*100).astype('int')
# min() keeps a 100% school in the last bin instead of indexing past the end of the list
cols = [colors[min(int(i/25), len(colors) - 1)] for i in d]

map_osm2 = folium.Map([df_school['Latitude'][0], df_school['Longitude'][0]], zoom_start=10.2,tiles='cartodbdark_matter')

for lat, long, col in zip(df_school['Latitude'], df_school['Longitude'], cols):
    #rown = list(rown)
    folium.CircleMarker([lat, long], color=col, fill=True, radius=2).add_to(map_osm2)

map_osm2

<h2>Geospatial Analysis of the ELL Percentage in NYC Schools</h2>

In [None]:
d1 = (df_school['Percent ELL']*100).astype('int')
cols = [colors[min(int(i/25), len(colors) - 1)] for i in d1]

map_osm2 = folium.Map([df_school['Latitude'][0], df_school['Longitude'][0]], zoom_start=10.2,tiles='cartodbdark_matter')

for lat, long, col in zip(df_school['Latitude'], df_school['Longitude'], cols):
    folium.CircleMarker([lat, long], color=col, fill=True, radius=2).add_to(map_osm2)

map_osm2

<h2>Geospatial Analysis of the Black Percentage in NYC Schools</h2>

In [None]:
d3 = (df_school['Percent Black']*100).astype('int')
cols = [colors[min(int(i/25), len(colors) - 1)] for i in d3]  # color by Percent Black (d3), not ELL (d1)

map_osm2 = folium.Map([df_school['Latitude'][0], df_school['Longitude'][0]], zoom_start=10.2,tiles='cartodbdark_matter')

for lat, long, col in zip(df_school['Latitude'], df_school['Longitude'], cols):
    folium.CircleMarker([lat, long], color=col, fill=True, radius=2).add_to(map_osm2)

map_osm2

<h2>Geospatial Analysis of the Hispanic Percentage in NYC Schools</h2>

In [None]:
d3 = (df_school['Percent Hispanic']*100).astype('int')
cols = [colors[min(int(i/25), len(colors) - 1)] for i in d3]  # color by Percent Hispanic (d3), not ELL (d1)

map_osm2 = folium.Map([df_school['Latitude'][0], df_school['Longitude'][0]], zoom_start=10.2,tiles='cartodbdark_matter')

for lat, long, col in zip(df_school['Latitude'], df_school['Longitude'], cols):
    folium.CircleMarker([lat, long], color=col, fill=True, radius=2).add_to(map_osm2)

map_osm2

<h2>Geospatial Analysis of the White Percentage in NYC Schools</h2>

In [None]:
d3 = (df_school['Percent White']*100).astype('int')
cols = [colors[min(int(i/25), len(colors) - 1)] for i in d3]  # color by Percent White (d3), not ELL (d1)

map_osm2 = folium.Map([df_school['Latitude'][0], df_school['Longitude'][0]], zoom_start=10.2,tiles='cartodbdark_matter')

for lat, long, col in zip(df_school['Latitude'], df_school['Longitude'], cols):
    folium.CircleMarker([lat, long], color=col, fill=True, radius=2).add_to(map_osm2)

map_osm2

In [None]:
df_school.head(10)

<h2>Scores for ELA and Maths from Grade 3 to Grade 8</h2>
**Numeric Grade Scores**: report cards give up A’s and B’s for 4s and 3s. The lowest mark, 1, indicates that a student is not meeting New York State’s academic standards, while the top mark of 4 celebrates “meeting standards with distinction.”

The bubble charts below cover Grades 3 to 8. For each student group they show the number of schools in which at least one student scored a 4 (the code counts schools with a nonzero count of 4s, not individual students).
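The difference between counting schools and counting students, sketched on one made-up column of 4s counts:

```python
import pandas as pd

# Made-up counts of Grade 3 ELA 4s for one student group in four schools
col = pd.Series([0, 12, 3, 0], name='Grade 3 ELA 4s - Black or African American')

schools_with_any_4s = int((col > 0).sum())   # what the cells below plot
students_with_4s = int(col.sum())            # total students scoring 4s

print(schools_with_any_4s, students_with_4s)  # 2 15
```

Replacing `len(series[series > 0])` with `series.sum()` in the loops below would switch the charts to student counts.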

In [None]:
# Create a trace
colors = ['hsl('+str(h)+',50%'+',50%)' for h in np.linspace(0, 360, 12)]
race = ['All Students','American Indian or Alaska Native','Black or African American',
          'Hispanic or Latino','Asian or Pacific Islander','White',
          'Multiracial','Limited English Proficient','Economically Disadvantaged']

g3_ela_count = list()
for i in race:
    g3_ela_count.append( len( df_school['Grade 3 ELA 4s - ' + i][df_school['Grade 3 ELA 4s - ' + i] > 0] ) )

total = np.sum(g3_ela_count)
trace0 = go.Scatter(
    x=race,
    y=g3_ela_count,
    mode='markers',
    marker=dict(
        size=[((x/total)*150) + 20 for x in g3_ela_count],
        color=colors[:len(race)],
    )
)

g3_math_count = list()
for i in race:
    g3_math_count.append( len( df_school['Grade 3 Math 4s - ' + i][df_school['Grade 3 Math 4s - ' + i] > 0] ) )

total2 = np.sum(g3_math_count)
trace1 = go.Scatter(
    x=race,
    y=g3_math_count,
    mode='markers',
    marker=dict(
        size=[((x/total2)*150) + 20 for x in g3_math_count],
        color=colors[:len(race)],
    )
)

fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Count of Students scoring 4s in Grade 3 ELA', 'Count of Students scoring 4s in Grade 3 Math'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 1, 2);

fig['layout'].update(height=400,title='Count of Students scoring 4s in Grade 3', showlegend=False,paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)' );
py.iplot(fig, filename='Proficiency-subplot')

In [None]:
#---------------------Grade 4 -----------------------------------


g4_ela_count = list()
for i in race:
    g4_ela_count.append( len( df_school['Grade 4 ELA 4s - ' + i][df_school['Grade 4 ELA 4s - ' + i] > 0] ) )


total = np.sum(g4_ela_count)
trace0 = go.Scatter(
    x=race,
    y=g4_ela_count,
    mode='markers',
    marker=dict(
        size=[((x/total)*150) + 20 for x in g4_ela_count],
        color=colors[:len(race)],
    )
)

g4_math_count = list()
for i in race:
    g4_math_count.append( len( df_school['Grade 4 Math 4s - ' + i][df_school['Grade 4 Math 4s - ' + i] > 0] ) )

total2 = np.sum(g4_math_count)
trace1 = go.Scatter(
    x=race,
    y=g4_math_count,
    mode='markers',
    marker=dict(
        size=[((x/total2)*150) + 20 for x in g4_math_count],
        color=colors[:len(race)],
    )
)

fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Count of Students scoring 4s in Grade 4 ELA', 'Count of Students scoring 4s in Grade 4 Math'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 1, 2);

fig['layout'].update(height=400,title='Count of Students scoring 4s in Grade 4', showlegend=False, paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='Proficiency-subplot')

In [None]:
#---------------------- Grade 5 --------------------------------------

g5_ela_count = list()
for i in race:
    g5_ela_count.append( len( df_school['Grade 5 ELA 4s - ' + i][df_school['Grade 5 ELA 4s - ' + i] > 0] ) )


total = np.sum(g5_ela_count)
trace0 = go.Scatter(
    x=race,
    y=g5_ela_count,
    mode='markers',
    marker=dict(
        size=[((x/total)*150) + 20 for x in g5_ela_count],
        color=colors[:len(race)],
    )
)

g5_math_count = list()
for i in race:
    g5_math_count.append( len( df_school['Grade 5 Math 4s - ' + i][df_school['Grade 5 Math 4s - ' + i] > 0] ) )

total2 = np.sum(g5_math_count)
trace1 = go.Scatter(
    x=race,
    y=g5_math_count,
    mode='markers',
    marker=dict(
        size=[((x/total2)*150) + 20 for x in g5_math_count],
        color=colors[:len(race)],
    )
)

fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Count of Students scoring 4s in Grade 5 ELA', 'Count of Students scoring 4s in Grade 5 Math'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 1, 2);

fig['layout'].update(height=400,title='Count of Students scoring 4s in Grade 5', showlegend=False, paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='Proficiency-subplot')

In [None]:
# ---------------- Grade 6 ---------------------------


g6_ela_count = list()
for i in race:
    g6_ela_count.append( len( df_school['Grade 6 ELA 4s - ' + i][df_school['Grade 6 ELA 4s - ' + i] > 0] ) )


total = np.sum(g6_ela_count)
trace0 = go.Scatter(
    x=race,
    y=g6_ela_count,
    mode='markers',
    marker=dict(
        size=[((x/total)*150) + 20 for x in g6_ela_count],
        color=colors[:len(race)],
    )
)

g6_math_count = list()
for i in race:
    g6_math_count.append( len( df_school['Grade 6 Math 4s - ' + i][df_school['Grade 6 Math 4s - ' + i] > 0] ) )

total2 = np.sum(g6_math_count)
trace1 = go.Scatter(
    x=race,
    y=g6_math_count,
    mode='markers',
    marker=dict(
        size=[((x/total2)*150) + 20 for x in g6_math_count],
        color=colors[:len(race)],
    )
)


fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Count of Students scoring 4s in Grade 6 ELA', 'Count of Students scoring 4s in Grade 6 Math'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 1, 2);

fig['layout'].update(height=400,title='Count of Students scoring 4s in Grade 6', showlegend=False, paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='Proficiency-subplot')

In [None]:
# -------------------------- Grade 7 ---------------------------

g7_ela_count = list()
for i in race:
    g7_ela_count.append( len( df_school['Grade 7 ELA 4s - ' + i][df_school['Grade 7 ELA 4s - ' + i] > 0] ) )

total = np.sum(g7_ela_count)
trace0 = go.Scatter(
    x=race,
    y=g7_ela_count,
    mode='markers',
    marker=dict(
        size=[((x/total)*150) + 20 for x in g7_ela_count],
        color=colors[:len(race)],
    )
)

g7_math_count = list()
for i in race:
    g7_math_count.append( len( df_school['Grade 7 Math 4s - ' + i][df_school['Grade 7 Math 4s - ' + i] > 0] ) )

total2 = np.sum(g7_math_count)
trace1 = go.Scatter(
    x=race,
    y=g7_math_count,
    mode='markers',
    marker=dict(
        size=[((x/total2)*150) + 20 for x in g7_math_count],
        color=colors[:len(race)],
    )
)


fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Count of Students scoring 4s in Grade 7 ELA', 'Count of Students scoring 4s in Grade 7 Math'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 1, 2);

fig['layout'].update(height=400,title='Count of Students scoring 4s in Grade 7', showlegend=False, paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='Proficiency-subplot')

In [None]:
#------------------------- Grade 8 -----------------------------

g8_ela_count = list()
for i in race:
    g8_ela_count.append( len( df_school['Grade 8 ELA 4s - ' + i][df_school['Grade 8 ELA 4s - ' + i] > 0] ) )

total = np.sum(g8_ela_count)
trace0 = go.Scatter(
    x=race,
    y=g8_ela_count,
    mode='markers',
    marker=dict(
        size=[((x/total)*150) + 20 for x in g8_ela_count],
        color=colors[:len(race)],
    )
)

g8_math_count = list()
for i in race:
    g8_math_count.append( len( df_school['Grade 8 Math 4s - ' + i][df_school['Grade 8 Math 4s - ' + i] > 0] ) )

total2 = np.sum(g8_math_count)
trace1 = go.Scatter(
    x=race,
    y=g8_math_count,
    mode='markers',
    marker=dict(
        size=[((x/total2)*150) + 20 for x in g8_math_count],
        color=colors[:len(race)],
        
    )
)

fig = tls.make_subplots(rows=1, cols=2, subplot_titles=('Count of Students scoring 4s in Grade 8 ELA', 'Count of Students scoring 4s in Grade 8 Math'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 1, 2);

fig['layout'].update(height=400,title='Count of Students scoring 4s in Grade 8', showlegend=False, paper_bgcolor='rgb(243, 243, 243)',plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='Proficiency-subplot')

<h2>D5 SHSAT Registrations and Testers</h2>
Let's have a look at the second CSV file, D5 SHSAT Registrations and Testers, and explore it a bit.
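The cell below builds one frame per year (`d4` to `d7`) with four separate filters. A sketch of the same per-school totals with a single `groupby` on year and school, using made-up rows in the same shape as `df_school`'s companion file; the `did_not_take` column name is a hypothetical shorthand:

```python
import pandas as pd

reg_col = 'Number of students who registered for the SHSAT'
took_col = 'Number of students who took the SHSAT'

# Made-up rows: one row per school, year and grade level
df = pd.DataFrame({
    'School name': ['S1', 'S1', 'S2', 'S1', 'S2'],
    'Year of SHST': [2015, 2015, 2015, 2016, 2016],
    reg_col: [10, 5, 8, 12, 4],
    took_col: [7, 3, 8, 9, 1],
})

# One groupby replaces the per-year filters; only the two count columns are summed
per_school = df.groupby(['Year of SHST', 'School name'])[[reg_col, took_col]].sum().reset_index()
per_school['did_not_take'] = per_school[reg_col] - per_school[took_col]
print(per_school)
```

A single year can then be selected with `per_school[per_school['Year of SHST'] == 2016]`.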

<h2>Number of students who registered for the SHSAT</h2>


In [None]:
d4 = df_reg[df_reg['Year of SHST'] == 2013]
d4 = pd.DataFrame(d4.groupby(['School name']).sum()).reset_index()
d5 = df_reg[df_reg['Year of SHST'] == 2014]
d5 = pd.DataFrame(d5.groupby(['School name']).sum()).reset_index()
d6 = df_reg[df_reg['Year of SHST'] == 2015]
d6 = pd.DataFrame(d6.groupby(['School name']).sum()).reset_index()
d7 = df_reg[df_reg['Year of SHST'] == 2016]
d7 = pd.DataFrame(d7.groupby(['School name']).sum()).reset_index()

d4['Number of students who did not take the SHSAT after registering'] = d4['Number of students who registered for the SHSAT'] - d4['Number of students who took the SHSAT']
d5['Number of students who did not take the SHSAT after registering'] = d5['Number of students who registered for the SHSAT'] - d5['Number of students who took the SHSAT']
d6['Number of students who did not take the SHSAT after registering'] = d6['Number of students who registered for the SHSAT'] - d6['Number of students who took the SHSAT']
d7['Number of students who did not take the SHSAT after registering'] = d7['Number of students who registered for the SHSAT'] - d7['Number of students who took the SHSAT']

In [None]:

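# Sum registrations and test-takers per school over all years so that
# each school appears once in the chart
df_school = df_reg.groupby('School name').sum(numeric_only=True).reset_index()

trace1 = go.Bar(
    y=df_school['School name'],
    x=df_school['Number of students who registered for the SHSAT'],
    name='Number of students who registered for the SHSAT',
    orientation='h'
)
trace2 = go.Bar(
    y=df_school['School name'],
    x=df_school['Number of students who took the SHSAT'],
    name='Number of students who took the SHSAT',
    orientation='h'
)

data = [trace1, trace2]
layout = go.Layout(
    # Test-takers are a subset of registrants, so draw the 'took' bar over
    # the 'registered' bar instead of stacking the two counts
    barmode='overlay',
    showlegend=True,
    margin=dict(l=350, r=50, b=100, t=100, pad=4),  # wide left margin for school names
    height=800,
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='marker-h-bar')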
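
Every test-taker must first have registered, so the number who took the SHSAT should never exceed the number who registered. In other words, the "took" bar is a part of the "registered" bar, not an addition to it. A small check along these lines can flag rows that break that assumption before plotting. The helper `find_inconsistent_rows` and the (school, registered, took) tuple layout are hypothetical, chosen for illustration.

```python
def find_inconsistent_rows(rows):
    """Return the schools whose 'took' count exceeds their 'registered' count.

    `rows` is a list of (school, registered, took) tuples; this layout is an
    assumption for illustration, not the df_reg schema.
    """
    return [school for school, registered, took in rows if took > registered]


# Made-up counts: school "C" is deliberately inconsistent
rows = [("A", 50, 30), ("B", 20, 18), ("C", 12, 15)]
print(find_inconsistent_rows(rows))  # ['C']
```

In the notebook itself, the equivalent filter is a boolean mask on df_reg, e.g. `df_reg[df_reg['Number of students who took the SHSAT'] > df_reg['Number of students who registered for the SHSAT']]`.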


<h2>Students who registered for vs. took the SHSAT</h2>
Let's compare, for each year, how many of the students who registered for the SHSAT actually took it, both as counts and as percentages.
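
The donut charts below split each year's registrants into those who took the test and those who did not. The two groups add up to the registered total, so each slice's percentage is its count divided by the number registered. Below is a minimal sketch of that arithmetic, using a hypothetical `take_rate` helper and made-up totals.

```python
def take_rate(registered, took):
    """Percentages of registrants who took / did not take the SHSAT."""
    if registered == 0:
        return 0.0, 0.0  # no registrants: report 0% rather than divide by zero
    took_pct = 100 * took / registered
    return took_pct, 100 - took_pct


# Made-up yearly totals, not figures from the dataset
took_pct, missed_pct = take_rate(registered=800, took=600)
print(f"took: {took_pct:.1f}%, did not take: {missed_pct:.1f}%")
# took: 75.0%, did not take: 25.0%
```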

In [None]:
label_common = ['Number of students who did not take the SHSAT after registering', 'Number of students who took the SHSAT']

values_13 = [np.sum(d4['Number of students who did not take the SHSAT after registering']), np.sum(d4['Number of students who took the SHSAT'])]
values_14 = [np.sum(d5['Number of students who did not take the SHSAT after registering']), np.sum(d5['Number of students who took the SHSAT'])]
values_15 = [np.sum(d6['Number of students who did not take the SHSAT after registering']), np.sum(d6['Number of students who took the SHSAT'])]
values_16 = [np.sum(d7['Number of students who did not take the SHSAT after registering']), np.sum(d7['Number of students who took the SHSAT'])]


labels1 = ['Number of students who registered for the SHSAT','Number of students who took the SHSAT','Number of students who did not take the SHSAT']
val_2013 = [[np.sum(d4['Number of students who registered for the SHSAT']), 
             np.sum(d4['Number of students who took the SHSAT']),
             np.sum(d4['Number of students who did not take the SHSAT after registering'])]]

val_2014 = [[np.sum(d5['Number of students who registered for the SHSAT']), 
             np.sum(d5['Number of students who took the SHSAT']),
             np.sum(d5['Number of students who did not take the SHSAT after registering'])]]

val_2015 = [[np.sum(d6['Number of students who registered for the SHSAT']), 
             np.sum(d6['Number of students who took the SHSAT']),
             np.sum(d6['Number of students who did not take the SHSAT after registering'])]]

val_2016 = [[np.sum(d7['Number of students who registered for the SHSAT']), 
             np.sum(d7['Number of students who took the SHSAT']),
             np.sum(d7['Number of students who did not take the SHSAT after registering'])]]
trace0 = go.Bar(
    y=labels1,
    x=val_2013[0],
    marker=dict(color=['blue', 'yellow','red']),
    orientation ='h'
)

trace1 = go.Bar(
    x=val_2014[0],
    y=labels1,
    marker=dict(color=['blue', 'yellow','red']),
    orientation ='h'
)
fig = tls.make_subplots(rows=2, cols=1, subplot_titles=('SHSAT registrations vs. test-takers for 2013', 'SHSAT registrations vs. test-takers for 2014'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 2, 1);

# Wide left margin so the long bar labels are not cut off
fig['layout'].update(title='SHSAT registrations vs. test-takers for 2013 and 2014', height=600, showlegend=False,
                     margin=dict(l=350, r=50, b=100, t=100, pad=4),
                     paper_bgcolor='rgb(243, 243, 243)', plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='SHSAT-registration-subplot')

fig = {
  "data": [
    {
      "values": values_13,
      "labels": label_common,
      "domain": {"x": [0, .48]},
      "name": "2013",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    },
    {
      "values": values_14,
      "labels": label_common,
      "text":["2014"],
      "textposition":"inside",
      "domain": {"x": [.52, 1]},
      "name": "2014",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    }],
  "layout": {
        "title":"Share of SHSAT registrants who took the test, 2013 and 2014",
        "showlegend": False,
        "annotations": [
            {
                "font": {
                    "size": 20
                },
                "showarrow": False,
                "text": "2013",
                "x": 0.20,
                "y": 0.5
            },
            {
                "font": {
                    "size": 20
                },
                "showarrow": False,
                "text": "2014",
                "x": 0.8,
                "y": 0.5
            }
        ],
        "paper_bgcolor": 'rgb(243, 243, 243)',"plot_bgcolor":'rgb(243, 243, 243)',
    }
}
py.iplot(fig, filename='donut')


In [None]:
trace0 = go.Bar(
    y=labels1,
    x=val_2015[0],
    marker=dict(color=['blue', 'yellow','red']),
    orientation ='h'
)

trace1 = go.Bar(
    x=val_2016[0],
    y=labels1,
    marker=dict(color=['blue', 'yellow','red']),
    orientation ='h'
)
fig = tls.make_subplots(rows=2, cols=1, subplot_titles=('SHSAT registrations vs. test-takers for 2015', 'SHSAT registrations vs. test-takers for 2016'));
fig.append_trace(trace0, 1, 1);
fig.append_trace(trace1, 2, 1);

# Wide left margin so the long bar labels are not cut off
fig['layout'].update(title='SHSAT registrations vs. test-takers for 2015 and 2016', height=600, showlegend=False,
                     margin=dict(l=350, r=50, b=100, t=100, pad=4),
                     paper_bgcolor='rgb(243, 243, 243)', plot_bgcolor='rgb(243, 243, 243)');
py.iplot(fig, filename='SHSAT-registration-subplot')

fig = {
  "data": [
    {
      "values": values_15,
      "labels": label_common,
      "domain": {"x": [0, .48]},
      "name": "2015",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    },
    {
      "values": values_16,
      "labels": label_common,
      "text":["2016"],
      "textposition":"inside",
      "domain": {"x": [.52, 1]},
      "name": "2016",
      "hoverinfo":"label+percent+name",
      "hole": .4,
      "type": "pie"
    }],
  "layout": {
        "title":"Share of SHSAT registrants who took the test, 2015 and 2016",
        "showlegend": False,
        "annotations": [
            {
                "font": {
                    "size": 20
                },
                "showarrow": False,
                "text": "2015",
                "x": 0.20,
                "y": 0.5
            },
            {
                "font": {
                    "size": 20
                },
                "showarrow": False,
                "text": "2016",
                "x": 0.8,
                "y": 0.5
            }
        ],
        "paper_bgcolor": 'rgb(243, 243, 243)',"plot_bgcolor":'rgb(243, 243, 243)',
    }
}
py.iplot(fig, filename='donut')
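
The 2013/2014 and 2015/2016 donut cells differ only in their values and year labels. One way to avoid the copy-paste is a helper that builds the same figure dictionary for any pair of years. `donut_pair` is a hypothetical name, and the returned dict is meant to be passed to `py.iplot` just like `fig` above. This is a sketch that mirrors the existing layout settings, not a change to how the notebook draws the charts.

```python
def donut_pair(values_a, values_b, labels, year_a, year_b, title):
    """Build a plotly-style figure dict with two side-by-side donut charts."""
    def donut(values, year, x_domain):
        return {
            "values": values,
            "labels": labels,
            "domain": {"x": x_domain},
            "name": str(year),
            "hoverinfo": "label+percent+name",
            "hole": 0.4,
            "type": "pie",
        }

    def center_label(year, x):
        # Year printed in the hole of each donut
        return {"font": {"size": 20}, "showarrow": False,
                "text": str(year), "x": x, "y": 0.5}

    return {
        "data": [donut(values_a, year_a, [0, 0.48]),
                 donut(values_b, year_b, [0.52, 1])],
        "layout": {
            "title": title,
            "showlegend": False,
            "annotations": [center_label(year_a, 0.20), center_label(year_b, 0.8)],
            "paper_bgcolor": "rgb(243, 243, 243)",
            "plot_bgcolor": "rgb(243, 243, 243)",
        },
    }


# Usage in the notebook (assumes values_15, values_16 and label_common from above):
# py.iplot(donut_pair(values_15, values_16, label_common, 2015, 2016, "SHSAT 2015 and 2016"),
#          filename='donut')
# Made-up values, just to inspect the structure:
fig_spec = donut_pair([200, 600], [150, 700], ["did not take", "took"], 2015, 2016, "demo")
print(fig_spec["layout"]["annotations"][1]["text"])  # 2016
```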



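<h2>Enrollment on 10/31 Bar Plot</h2>
Let's look at the total enrollment on 10/31 of each year, summed over the schools in the registration data.

In [None]: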
trace1 = go.Bar(
    x=[np.sum(d4['Enrollment on 10/31']), np.sum(d5['Enrollment on 10/31']), np.sum(d6['Enrollment on 10/31']), np.sum(d7['Enrollment on 10/31'])],
    y=['2013', '2014', '2015', '2016'],
    marker=dict(color=['blue', 'yellow','red', 'orange']),
    orientation ='h'
)
data = [trace1]

layout = go.Layout(
    title="Enrollment on 10/31 Bar Plot",
    height=400,
    xaxis=dict(title='Count'),
    yaxis=dict(title='Year'),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)


fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='marker-h-bar')
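
Total enrollment gives a denominator for the registration counts plotted earlier. Dividing a year's SHSAT registrations by that year's enrollment on 10/31 yields a rough registration rate that can be compared across years even when enrollment changes. Below is a sketch with made-up totals. `registration_rate` is a hypothetical helper, and whether enrollment is the right denominator depends on which grades the 10/31 count covers.

```python
def registration_rate(registered_by_year, enrollment_by_year):
    """Registrations as a percentage of enrollment, per year.

    Both arguments map year -> total count (hypothetical inputs); years missing
    from the enrollment mapping, or with zero enrollment, are skipped.
    """
    rates = {}
    for year, registered in sorted(registered_by_year.items()):
        enrolled = enrollment_by_year.get(year, 0)
        if enrolled:
            rates[year] = 100 * registered / enrolled
    return rates


# Made-up totals for illustration only
registered = {2013: 500, 2014: 540, 2015: 480}
enrolled = {2013: 2000, 2014: 2000, 2015: 1600}
for year, rate in registration_rate(registered, enrolled).items():
    print(year, f"{rate:.1f}%")
# 2013 25.0%
# 2014 27.0%
# 2015 30.0%
```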

If you like this notebook, please upvote.
Happy Kaggling!!