# EDA Class - Capstone Project

Pick a dataset from https://registry.opendata.aws

Your objective is to extract at least 3 insights. 

These insights should be more interesting and complex than simple descriptive statistics. Datasets from the AWS Open Data registry often have a main theme, so try to work with it. Use any or all of the analytical tools you learned over the past few weeks, whether transformative (e.g. TF-IDF, vectorization), descriptive (e.g. is there any interesting clustering in the data?), or predictive (e.g. "Will COVID end soon? Lol").

Your project will be evaluated on how creative your insights are, how difficult the methods you used are, and the quality of the questions you ask.
Submit all your work in a Jupyter notebook.

In [2]:
import pandas as pd

## The Dataset

The datasets are sourced from the Philippine Department of Education. 

They cover the details DepEd recorded about all elementary and secondary schools in the Philippines for the year 2015.

List of datasets:
- Teachers data - the number of regular, instructor, mobile, and SPED teachers in each school
- School location data - each school's geographic and 2015 enrollment information
- Rooms data - the number of rooms in each school (standard and nonstandard, each split into academic and unused)
- MOOE data - each school's MOOE (Maintenance and Other Operating Expenses) budget for 2015
- School masterlist - masterlist of schools in the DepEd database, containing all relevant information gathered by the department
- Enrollment master data S - each school's number of students (by gender) per secondary grade level (Grade 7 - 12, SPED)
- Enrollment master data E - each school's number of students (by gender) per elementary grade level (Kinder - Grade 6, SPED)


In [19]:
df_teachers = pd.read_csv('data/Teachers data.csv')
df_school_location = pd.read_csv('data/Schools Location Data.csv')
df_rooms_data = pd.read_csv('data/Rooms Data.csv')
df_MOOE_data = pd.read_csv('data/MOOE data.csv')
df_school_masterlist = pd.read_csv('data/Masterlist of Schools.csv')
df_enrollment_S = pd.read_csv('data/Enrollment Master Data_2015_S.csv')
df_enrollment_E = pd.read_csv('data/Enrollment Master Data_2015_E.csv')
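
The tables do not share a key name (`school.id` in some, `School ID` in others), and the enrollment previews show the ID stored as a float (`100001.0`). A minimal sketch of one way to normalize the key before joining is shown here. The helper name `standardize_school_id`, the target column name `school_id`, and the tiny inline frames are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd


def standardize_school_id(df, key_candidates=("school.id", "School ID")):
    """Return a copy whose school key is renamed to 'school_id' and cast to int.

    Hypothetical helper: assumes exactly one of `key_candidates` is present.
    """
    key = next(col for col in key_candidates if col in df.columns)
    return (
        df.rename(columns={key: "school_id"})
        .dropna(subset=["school_id"])  # rows without an ID cannot be joined
        .assign(school_id=lambda d: d["school_id"].astype("int64"))
    )


# Tiny stand-ins for df_teachers and df_enrollment_E (illustrative values only).
teachers = pd.DataFrame({"school.id": [100001, 100002], "teachers.regular": [2, 11]})
enrollment = pd.DataFrame(
    {"School ID": [100001.0, 100002.0, None], "Kinder Male": [9.0, 41.0, 3.0]}
)

teachers_std = standardize_school_id(teachers)
enrollment_std = standardize_school_id(enrollment)

# With a shared integer key, the join no longer needs left_on/right_on.
merged = teachers_std.merge(enrollment_std, on="school_id", how="inner")
print(merged)
```

Applied to the real frames, the same call would be made once per table before any merge.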

In [33]:
print(df_teachers.shape)
df_teachers.head(2)  # number of regular, instructor, mobile, and SPED teachers in each school

(45040, 5)


Unnamed: 0,school.id,teachers.instructor,teachers.mobile,teachers.regular,teachers.sped
0,100001,0,0,2,0
1,100002,0,6,11,0


In [34]:
print(df_school_location.shape)
df_school_location.head(2) # has a school's geographic and 2015 enrollment information

(46624, 12)


Unnamed: 0,School ID,School Name,Region,Province,Municipality,Division,District,Offering,Name of Principal,Enrolment,Latitude,Longitude
0,100001,Apaleng-Libtong ES,Region I,Ilocos Norte,Bacarra,Ilocos Norte,Bacarra I,ES,Jesusa G. Laeno,90,18.253666,120.60618
1,100002,Bacarra CES,Region I,Ilocos Norte,Bacarra,Ilocos Norte,Bacarra I,ES,Gene A. Reginaldo,456,18.250964,120.608958


In [36]:
print(df_rooms_data.shape)
df_rooms_data.head(2)  # number of rooms in each school (standard and nonstandard, each split into academic and unused)

(46412, 5)


Unnamed: 0,School ID,rooms.standard.academic,rooms.standard.unused,rooms.nonstandard.academic,rooms.nonstandard.unused
0,101746,15,0,0.0,0.0
1,102193,13,3,0.0,0.0


In [39]:
print(df_MOOE_data.shape)
df_MOOE_data.head(2) # has a school's MOOE budget for 2015

(44028, 5)


Unnamed: 0,school.id,school.name,school.enrollment,school.offering,school.mooe
0,305075,Abra HS,2481,Secondary,2182000.0
1,134966,Agtangao ES,376,Elementary,227000.0
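
Because the MOOE table carries both the budget and the enrollment, a per-student figure can be derived directly from it. The sketch below rebuilds the two preview rows inline so it runs on its own; the column name `mooe_per_student` is an assumption, and treating zero enrollment as missing is one possible choice, not something the source prescribes.

```python
import pandas as pd

# The two rows shown in the preview above, rebuilt inline.
mooe = pd.DataFrame(
    {
        "school.id": [305075, 134966],
        "school.name": ["Abra HS", "Agtangao ES"],
        "school.enrollment": [2481, 376],
        "school.offering": ["Secondary", "Elementary"],
        "school.mooe": [2182000.0, 227000.0],
    }
)

# Mask non-positive enrollment so the division yields NaN instead of inf.
valid_enrollment = mooe["school.enrollment"].where(mooe["school.enrollment"] > 0)
mooe["mooe_per_student"] = mooe["school.mooe"] / valid_enrollment

# Median per-student budget by offering (one school per group in this toy sample).
per_offering = mooe.groupby("school.offering")["mooe_per_student"].median()
print(mooe[["school.name", "mooe_per_student"]])
print(per_offering)
```

On the full `df_MOOE_data`, the same two lines would give a per-student budget that can be compared across regions or school types.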


In [40]:
print(df_school_masterlist.shape)
df_school_masterlist.head(2)  # masterlist of schools in the DepEd database, containing all relevant information gathered by the department

(46603, 23)


Unnamed: 0,school.id,school.name,school.region,school.region.name,school.province,school.cityhall,school.division,school.citymuni,school.district,school.legdistrict,...,school.mother.id,school.address,school.established,school.classification,school.classification2,school.curricularclass,school.organization,school.cityincome,school.cityclass,school.urban
0,101746,"A. Diaz, Sr. ES",Region I,Ilocos Region,PANGASINAN,PANGASINAN,"Pangasinan II, Binalonan",BAUTISTA,Bautista,5th District,...,101746.0,"Brgy. Dias Bautista, Pang",1/1/1930,Elementary,DepED Managed,Elementary,Monograde,P 25 M or more but less than P 35 M,,Partially Urban
1,102193,A. P. Santos ES (SPED Center),Region I,Ilocos Region,ILOCOS NORTE,ILOCOS NORTE,Laoag City,LAOAG CITY (Capital),Laoag City District II,1st District,...,102193.0,A.G. Tupaz,1/1/1944,Elementary,DepED Managed,Kinder & Elementary,Monograde,P 240 M or more but less than P 320 M,Component City,Partially Urban


In [41]:
print(df_enrollment_S.shape)
df_enrollment_S.head(2)  # number of students (by gender) per secondary grade level (Grade 7 - 12, SPED)

(7977, 15)


Unnamed: 0,School ID,Grade 7 Male,Grade 7 Female,Grade 8 Male,Grade 8 Female,Grade 9 Male,Grade 9 Female,Grade 10 Male,Grade 10 Female,Grade 11 Male,Grade 11 Female,Grade 12 Male,Grade 12 Female,SPED NG Male,SPED NG Female
0,300001.0,20,12,13,17,10,15,17,14,0.0,0.0,0.0,0.0,0.0,0.0
1,300002.0,240,288,229,258,225,231,261,207,0.0,0.0,0.0,0.0,0.0,0.0


In [42]:
print(df_enrollment_E.shape)
df_enrollment_E.head(2) # has a school's number of students (by Gender) per Elementary grade level (Kinder - Grade 6, SPED) 

(38649, 17)


Unnamed: 0,School ID,Kinder Male,Kinder Female,Grade 1 Male,Grade 1 Female,Grade 2 Male,Grade 2 Female,Grade 3 Male,Grade 3 Female,Grade 4 Male,Grade 4 Female,Grade 5 Male,Grade 5 Female,Grade 6 Male,Grade 6 Female,SPED NG Male,SPED NG Female
0,100001.0,9.0,7.0,7.0,2.0,7.0,7.0,9.0,5.0,7.0,5.0,3.0,2.0,14.0,6.0,0.0,0.0
1,100002.0,41.0,25.0,38.0,33.0,41.0,40.0,28.0,31.0,38.0,30.0,26.0,31.0,22.0,32.0,0.0,0.0
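
The enrollment tables are wide, with one column per grade and gender, so school-level totals have to be summed across columns. Here is a minimal sketch using the two preview rows above; the derived column names (`total_male`, `total_female`, `total_enrollment`, `female_share`) are assumptions chosen for illustration.

```python
import pandas as pd

# Rebuild the column layout of the elementary enrollment table.
grades = ["Kinder"] + [f"Grade {i}" for i in range(1, 7)] + ["SPED NG"]
columns = ["School ID"] + [f"{g} {sex}" for g in grades for sex in ("Male", "Female")]

# The two rows shown in the preview above.
rows = [
    [100001.0, 9, 7, 7, 2, 7, 7, 9, 5, 7, 5, 3, 2, 14, 6, 0, 0],
    [100002.0, 41, 25, 38, 33, 41, 40, 28, 31, 38, 30, 26, 31, 22, 32, 0, 0],
]
enroll_e = pd.DataFrame(rows, columns=columns)

# Select the gendered columns by suffix before adding any derived columns.
male_cols = [c for c in enroll_e.columns if c.endswith(" Male")]
female_cols = [c for c in enroll_e.columns if c.endswith(" Female")]

enroll_e["total_male"] = enroll_e[male_cols].sum(axis=1)
enroll_e["total_female"] = enroll_e[female_cols].sum(axis=1)
enroll_e["total_enrollment"] = enroll_e["total_male"] + enroll_e["total_female"]
enroll_e["female_share"] = enroll_e["total_female"] / enroll_e["total_enrollment"]

print(enroll_e[["School ID", "total_male", "total_female", "total_enrollment"]])
```

For these two schools the totals come to 90 and 456, which agree with the `Enrolment` values shown for the same IDs in the school location preview.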


## Motivation

- Spiel about PH educational system
- PH Educational Developmental Goals
- Academic performance in terms of MOOE: https://www.researchgate.net/publication/341103122_Utilization_of_Maintenance_and_Other_Operating_Expenses_MOOE_in_Relation_to_Students%27_Academic_Performance

## EDA

In [45]:
masterlist_locations = pd.merge(df_school_masterlist, df_school_location, how="left", left_on='school.id', right_on='School ID')

In [46]:
masterlist_locations.columns

Index(['school.id', 'school.name', 'school.region', 'school.region.name',
       'school.province', 'school.cityhall', 'school.division',
       'school.citymuni', 'school.district', 'school.legdistrict',
       'school.type', 'school.abbrev', 'school.previousname',
       'school.mother.id', 'school.address', 'school.established',
       'school.classification', 'school.classification2',
       'school.curricularclass', 'school.organization', 'school.cityincome',
       'school.cityclass', 'school.urban', 'School ID', 'School Name',
       'Region', 'Province', 'Municipality', 'Division', 'District',
       'Offering', 'Name of Principal', 'Enrolment', 'Latitude', 'Longitude'],
      dtype='object')

In [47]:
masterlist_locations.head(2)

Unnamed: 0,school.id,school.name,school.region,school.region.name,school.province,school.cityhall,school.division,school.citymuni,school.district,school.legdistrict,...,Region,Province,Municipality,Division,District,Offering,Name of Principal,Enrolment,Latitude,Longitude
0,101746,"A. Diaz, Sr. ES",Region I,Ilocos Region,PANGASINAN,PANGASINAN,"Pangasinan II, Binalonan",BAUTISTA,Bautista,5th District,...,Region I,Pangasinan,Bautista,"Pangasinan II, Binalonan",Bautista,ES,Teresita B. Cabrera,781.0,15.799122,120.498531
1,102193,A. P. Santos ES (SPED Center),Region I,Ilocos Region,ILOCOS NORTE,ILOCOS NORTE,Laoag City,LAOAG CITY (Capital),Laoag City District II,1st District,...,Region I,Ilocos Norte,Laoag City (Capital),Laoag City,Laoag City District II,ES,"Christine D. Alipio, Ed.D.",465.0,18.196389,120.586667
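
The shapes printed earlier differ between the masterlist (46603 rows) and the location table (46624 rows), so a left merge may leave some schools without location data. One way to check this, sketched below on toy frames, is the `indicator=True` option of `pd.merge`. The school ID `999999` and its name are made up to demonstrate an unmatched row; they do not come from the dataset.

```python
import pandas as pd

# Toy stand-ins for df_school_masterlist and df_school_location.
masterlist = pd.DataFrame(
    {
        "school.id": [101746, 102193, 999999],  # 999999 is a hypothetical ID
        "school.name": ["A. Diaz, Sr. ES", "A. P. Santos ES (SPED Center)", "Example ES"],
    }
)
locations = pd.DataFrame(
    {
        "School ID": [101746, 102193],
        "Enrolment": [781, 465],
        "Latitude": [15.799122, 18.196389],
        "Longitude": [120.498531, 120.586667],
    }
)

# Duplicate keys on the right side would silently multiply rows in the merge.
n_duplicate_keys = int(locations["School ID"].duplicated().sum())

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'.
checked = pd.merge(
    masterlist,
    locations,
    how="left",
    left_on="school.id",
    right_on="School ID",
    indicator=True,
)
match_counts = checked["_merge"].value_counts()
unmatched_ids = checked.loc[checked["_merge"] == "left_only", "school.id"].tolist()

print(n_duplicate_keys)
print(match_counts)
print(unmatched_ids)
```

Running the same check on the real frames would show how many masterlist schools have no coordinates before any mapping or distance-based analysis.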


In [None]:
masterlist_locations['']  # TODO: column name still to be filled in

### Guide Questions