# Schools Analytics

This analysis uses data from Rwanda Education Statistics, published each year by the Ministry of Education in Rwanda (MINEDUC).
The data comes from the 2016 and 2017 publications.
Data wrangling for this dataset was done as part of another project, which was not documented.

### Dataset description

The dataset has the following features:
* N: a running row number
* level: the type of school. Unique values are Nursery, Primary, Secondary School and TVET level 1&2
* district: the district that the school and pupil counts refer to
* province: the province the district belongs to
* schools_16: the number of schools of a given level in the district in 2016
* pupils_2016: the number of pupils of a given level in the district in 2016
* schools_17: the number of schools of a given level in the district in 2017
* pupils_17: the number of pupils of a given level in the district in 2017
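The loaded file names the 2016 pupil count `pupils_2016`, while the other count columns use a two-digit year suffix. A minimal sketch of making the names consistent is shown here; the `sample` frame holds two rows copied from the outputs in this notebook, and renaming is a suggestion rather than a step from the original wrangling.

```python
import pandas as pd

# Two rows copied from this notebook's sample output, for illustration only.
sample = pd.DataFrame({
    "N": ["25", "31"],
    "level": ["Nursery", "Primary"],
    "district": ["Gicumbi", "Huye"],
    "province": ["north", "south"],
    "schools_16": [154, 99],
    "pupils_2016": [7626, 69019],
    "schools_17": [146, 98],
    "pupils_17": [8472, 67906],
})

# Rename the one column that breaks the <measure>_<yy> naming pattern.
renamed = sample.rename(columns={"pupils_2016": "pupils_16"})
print(list(renamed.columns))
```

If the rename is applied, every later cell that reads `pupils_2016` has to be updated to `pupils_16` as well.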

###  Analysis Questions

1. Rank provinces by number of schools in 2016. Which provinces had the most schools?
2. Was the trend the same in 2017? Which provinces changed the most, in number of schools or in number of pupils?
3. Which district has the most nursery schools?
4. Which district has the most primary schools?
5. Which district has the most secondary schools?
6. Which district has the most TVET schools?
7. Which district has the most students? (selector for level and district)
8. What is the distribution of nursery schools on a map? (drop-down for levels)
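Questions 3 to 6 all ask for the district with the most schools at one level, so they can probably be answered together with one `groupby` and `idxmax`. The sketch below uses a handful of rows copied from the outputs in this notebook, so its result only illustrates the technique and is not the answer for the full dataset. The 2017 counts are used here as an assumption about which year the questions refer to.

```python
import pandas as pd

# A few rows copied from this notebook's outputs; NOT the full dataset.
rows = [
    ("Nursery", "Gicumbi", "north", 154, 146),
    ("Nursery", "Nyamasheke", "west", 121, 121),
    ("Secondary School", "Kirehe", "east", 52, 51),
    ("Secondary School", "Muhanga", "south", 61, 60),
    ("TVET level 1&2", "Ngoma", "east", 7, 7),
    ("TVET level 1&2", "Ngororero", "west", 6, 5),
]
sample = pd.DataFrame(
    rows, columns=["level", "district", "province", "schools_16", "schools_17"]
)

# For each level, idxmax returns the index label of the row with the most schools.
top_idx = sample.groupby("level")["schools_17"].idxmax()

# Look those rows up to get the district names alongside the counts.
top_districts = sample.loc[top_idx, ["level", "district", "schools_17"]].set_index("level")
print(top_districts)
```

Swapping `schools_17` for a pupil column gives the equivalent ranking for question 7.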

### Imports

In [24]:
import pandas as pd
import seaborn as sb
import dash
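# Dash >= 2.0 ships these modules inside the dash package;
# the standalone dash_html_components / dash_core_components packages are deprecated.
from dash import html, dcc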
from dash.dependencies import Input, Output
import plotly.graph_objs as go

### Importing Data 

In [25]:
df = pd.read_csv('schools1617.csv')
df.sample(10)

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
109,110,TVET level 1&2,Nyagatare,east,2,17,1,8
98,99,TVET level 1&2,Ngororero,west,6,76,5,51
77,78,Secondary School,Kirehe,east,52,15094,51,17134
24,25,Nursery,Gicumbi,north,154,7626,146,8472
30,31,Primary,Huye,south,99,69019,98,67906
112,113,TVET level 1&2,Gakenke,north,4,41,5,28
108,109,TVET level 1&2,Ngoma,east,7,70,7,52
11,12,Nursery,Nyamasheke,west,121,6941,121,9250
62,63,Secondary School,Muhanga,south,61,22227,60,24228
79,80,Secondary School,Nyagatare,east,54,22432,53,23109


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 120 entries, 0 to 119
Data columns (total 8 columns):
N              120 non-null object
level          119 non-null object
district       119 non-null object
province       119 non-null object
schools_16     120 non-null int64
pupils_2016    120 non-null int64
schools_17     120 non-null int64
pupils_17      120 non-null int64
dtypes: int64(4), object(4)
memory usage: 7.6+ KB


In [4]:
df.level.unique()

array(['Nursery', 'Primary', 'Secondary School', 'TVET level 1&2', nan],
      dtype=object)
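The `nan` in the unique levels, together with the 119 non-null counts for `level`, `district` and `province` in `df.info()`, suggests that one row has no level, district or province. What that row contains is not shown in this notebook. The sketch below assumes it is a summary row and drops it before any grouping; the `"Total"` label and the values in that row are hypothetical.

```python
import numpy as np
import pandas as pd

# Two rows from this notebook's sample output plus one hypothetical summary row.
sample = pd.DataFrame({
    "N": ["110", "99", "Total"],  # "Total" is an assumed label, not taken from the file
    "level": ["TVET level 1&2", "TVET level 1&2", np.nan],
    "district": ["Nyagatare", "Ngororero", np.nan],
    "province": ["east", "west", np.nan],
    "schools_16": [2, 6, 8],
    "pupils_2016": [17, 76, 93],
    "schools_17": [1, 5, 6],
    "pupils_17": [8, 51, 59],
})

# Keep only rows that identify a level, district and province.
clean = sample.dropna(subset=["level", "district", "province"]).reset_index(drop=True)
print(clean["level"].unique())
```

Dropping such a row matters for any later `sum()`, because a totals row would otherwise be counted twice.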

### Grouping Data by Provinces

#### South Province

In [32]:
south_df = df.query('province=="south"')
south_df.sample()

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
61,62,Secondary School,Kamonyi,south,55,18490,56,20484


#### North Province

In [34]:
north_df = df.query('province=="north"')
north_df.sample()

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
23,24,Nursery,Gakenke,north,137,8131,136,8579


#### East Province

In [35]:
east_df = df.query('province=="east"')
east_df.sample()

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
80,81,Secondary School,Rwamagana,east,55,17369,56,19164


#### West Province

In [36]:
west_df = df.query('province=="west"')
west_df.sample()

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
42,43,Primary,Rusizi,west,119,93689,119,95434
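The four `query` calls above can also be expressed as one `groupby`, which yields a frame per province and, with `sum()`, the school totals needed to rank provinces. This is a sketch on five rows copied from this notebook's outputs, so the totals below are for the sample only; `province_frames` and `schools_by_province` are illustrative names.

```python
import pandas as pd

# Five rows copied from this notebook's outputs; NOT the full dataset.
rows = [
    ("Secondary School", "Kamonyi", "south", 55, 56),
    ("Primary", "Huye", "south", 99, 98),
    ("Nursery", "Gakenke", "north", 137, 136),
    ("Secondary School", "Rwamagana", "east", 55, 56),
    ("Primary", "Rusizi", "west", 119, 119),
]
sample = pd.DataFrame(
    rows, columns=["level", "district", "province", "schools_16", "schools_17"]
)

# One DataFrame per province, keyed by province name.
province_frames = {name: group for name, group in sample.groupby("province")}

# Total schools per province for both years, ranked by the 2016 count.
schools_by_province = (
    sample.groupby("province")[["schools_16", "schools_17"]]
    .sum()
    .sort_values("schools_16", ascending=False)
)
print(schools_by_province)
```

`province_frames["south"]` then plays the role of `south_df`, and the same pattern works for any number of provinces without a new variable for each.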


### Numbers of Schools By Provinces

In [37]:
south_df.head()

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
0,1,Nursery,Gisagara,south,32,2358,41,3358
1,2,Nursery,Huye,south,101,6067,92,7333
2,3,Nursery,Kamonyi,south,75,5389,118,9487
3,4,Nursery,Muhanga,south,114,7009,139,9638
4,5,Nursery,Nyamagabe,south,78,5975,87,6607


In [33]:
south_nursery_df = south_df.query('level=="Nursery"')
south_nursery_df   

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
0,1,Nursery,Gisagara,south,32,2358,41,3358
1,2,Nursery,Huye,south,101,6067,92,7333
2,3,Nursery,Kamonyi,south,75,5389,118,9487
3,4,Nursery,Muhanga,south,114,7009,139,9638
4,5,Nursery,Nyamagabe,south,78,5975,87,6607
5,6,Nursery,Nyanza,south,85,7533,86,7169
6,7,Nursery,Nyaruguru,south,46,2951,58,4002
7,8,Nursery,Ruhango,south,74,4253,76,4976


In [None]:
north_nursery_df = north_df.query('level=="Nursery"')
north_nursery_df

In [16]:
south_pupils_17 = south_nursery_df['pupils_17'].sum()
south_pupils_16 = south_nursery_df['pupils_2016'].sum()
south_schools_16 = south_nursery_df['schools_16'].sum()
south_schools_17 = south_nursery_df['schools_17'].sum()
south_pupils_17, south_pupils_16, south_schools_16, south_schools_17

(52570, 41535, 605, 697)

In [17]:
north_nursery_df = df.query('level=="Nursery" & province=="north"')
north_nursery_df

Unnamed: 0,N,level,district,province,schools_16,pupils_2016,schools_17,pupils_17
22,23,Nursery,Burera,north,103,6410,111,7597
23,24,Nursery,Gakenke,north,137,8131,136,8579
24,25,Nursery,Gicumbi,north,154,7626,146,8472
25,26,Nursery,Musanze,north,122,7761,142,8633
26,27,Nursery,Rulindo,north,90,7376,97,7490


In [18]:
north_pupils_17 = north_nursery_df['pupils_17'].sum()
north_pupils_16 = north_nursery_df['pupils_2016'].sum()
north_schools_16 = north_nursery_df['schools_16'].sum()
north_schools_17 = north_nursery_df['schools_17'].sum()
north_pupils_17, north_pupils_16, north_schools_16, north_schools_17

(40771, 37304, 606, 632)
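Repeating the same four sums for each province makes it easy to reference the wrong frame. A possible alternative is to compute all nursery totals in one `groupby`. The sketch below rebuilds the south and north nursery rows shown above as an inline frame, so it covers those two provinces only; `nursery` and `totals` are illustrative names.

```python
import pandas as pd

# Nursery rows for the south and north provinces, copied from the tables above.
rows = [
    ("Gisagara", "south", 32, 2358, 41, 3358),
    ("Huye", "south", 101, 6067, 92, 7333),
    ("Kamonyi", "south", 75, 5389, 118, 9487),
    ("Muhanga", "south", 114, 7009, 139, 9638),
    ("Nyamagabe", "south", 78, 5975, 87, 6607),
    ("Nyanza", "south", 85, 7533, 86, 7169),
    ("Nyaruguru", "south", 46, 2951, 58, 4002),
    ("Ruhango", "south", 74, 4253, 76, 4976),
    ("Burera", "north", 103, 6410, 111, 7597),
    ("Gakenke", "north", 137, 8131, 136, 8579),
    ("Gicumbi", "north", 154, 7626, 146, 8472),
    ("Musanze", "north", 122, 7761, 142, 8633),
    ("Rulindo", "north", 90, 7376, 97, 7490),
]
columns = ["district", "province", "schools_16", "pupils_2016", "schools_17", "pupils_17"]
nursery = pd.DataFrame(rows, columns=columns)

# Sum every count column per province in a single step.
count_cols = ["schools_16", "pupils_2016", "schools_17", "pupils_17"]
totals = nursery.groupby("province")[count_cols].sum()

# Year-over-year change in schools and pupils.
totals["schools_change"] = totals["schools_17"] - totals["schools_16"]
totals["pupils_change"] = totals["pupils_17"] - totals["pupils_2016"]
print(totals)
```

On the full dataset the same call would be `df.query('level=="Nursery"').groupby('province')[count_cols].sum()`, which also covers the east and west provinces.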

