Case Study on Measures of Central Tendency and Dispersion

An institution wishes to assess its students' abilities in maths, reading and writing. It wants to carry out an exploratory study to answer the following questions.

1. Find out how many males and females participated in the test. 
2. What do you think about the students' parental level of education? 
3. Who scores the most on average for math, reading and writing based on ● Gender ● Test preparation course
4. What do you think about the scoring variation for math, reading and writing based on ● Gender ● Test preparation course 
5. The management needs your help to give bonus points to the top 25% of students based on their maths score. How would you help the management achieve this?

In [1]:
import numpy as np
import pandas as pd

# Reading the Data

In [7]:
dfScores = pd.read_csv('studentsperformance.csv')
dfScores.head()

,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group C,some high school,free/reduced,none,0,17,10
1,female,group B,high school,free/reduced,none,8,24,23
2,female,group B,some high school,free/reduced,none,18,32,28
3,female,group B,some college,standard,none,11,38,32
4,female,group C,some college,free/reduced,none,22,39,33


# 1. Find out how many males and females participated in the test. 

In [3]:
dfScores.groupby("gender").size()

gender
female    518
male      482
dtype: int64

In [5]:
# 518 female and 482 male students participated in the test (1,000 in total).
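If shares are more useful than raw counts, `value_counts(normalize=True)` returns each gender's proportion directly. The sketch below rebuilds a gender column from the counts reported above (518 female, 482 male) purely for illustration; on the real data you would call the same methods on `dfScores['gender']`.

```python
import pandas as pd

# Illustrative column rebuilt from the counts above; on the real data use dfScores['gender']
gender = pd.Series(["female"] * 518 + ["male"] * 482, name="gender")

counts = gender.value_counts()                  # absolute counts, largest first
shares = gender.value_counts(normalize=True)    # fraction of all test takers

print(pd.DataFrame({"count": counts, "share": shares.round(3)}))
```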

# 2. What do you think about the students' parental level of education? 

In [8]:
dfScores.groupby("parental level of education").size()

parental level of education
associate's degree    222
bachelor's degree     118
high school           196
master's degree        59
some college          226
some high school      179
dtype: int64

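# Of the 1,000 students, 226 have parents with some college, 222 an associate's degree, 196 high school, 179 some high school, 118 a bachelor's degree and 59 a master's degree.
# Only 177 students (17.7%) have parents with a bachelor's or master's degree, while 375 (37.5%) have parents with at most a high school education.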
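Because the education levels have a natural order, listing them from lowest to highest with percentages is easier to read than the alphabetical `groupby` output. The ranking used below is an assumption (it places "some college" below "associate's degree"), and the counts are copied from the output above; on the real data, `dfScores['parental level of education'].value_counts().reindex(order)` gives the same starting Series.

```python
import pandas as pd

# Counts copied from the groupby output above
counts = pd.Series({
    "associate's degree": 222,
    "bachelor's degree": 118,
    "high school": 196,
    "master's degree": 59,
    "some college": 226,
    "some high school": 179,
})

# Assumed ranking of education levels, lowest to highest
order = ["some high school", "high school", "some college",
         "associate's degree", "bachelor's degree", "master's degree"]

ordered = counts.reindex(order)
summary = pd.DataFrame({
    "count": ordered,
    "percent": (100 * ordered / ordered.sum()).round(1),
    "cumulative %": (100 * ordered.cumsum() / ordered.sum()).round(1),
})
print(summary)
```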

# 3. Who scores the most on average for math, reading and writing based on ● Gender ● Test preparation course

In [57]:
dfAgg = dfScores.groupby(["gender","test preparation course"]).agg({'math score':['mean']}).round(2)
dfAgg

gender,test preparation course,math score (mean)
female,completed,67.11
female,none,61.32
male,completed,72.99
male,none,66.47


# Female students who completed the test preparation course scored higher in math on average (67.11) than those who did not (61.32).
# The same holds for male students (72.99 vs 66.47).
# Male students score higher in math than female students in both groups; males who completed the course have the highest average math score (72.99).

In [58]:
dfAgg = dfScores.groupby(["gender","test preparation course"]).agg({'reading score':['mean']}).round(2)
dfAgg

gender,test preparation course,reading score (mean)
female,completed,77.38
female,none,69.96
male,completed,70.79
male,none,62.58


# Female students who completed the test preparation course scored higher in reading on average (77.38) than those who did not (69.96).
# The same holds for male students (70.79 vs 62.58).
# Female students score higher in reading than male students in both groups; females who completed the course have the highest average reading score (77.38).

In [59]:
dfAgg = dfScores.groupby(["gender","test preparation course"]).agg({'writing score':['mean']}).round(2)
dfAgg

gender,test preparation course,writing score (mean)
female,completed,78.79
female,none,68.98
male,completed,70.34
male,none,59.55


# Female students who completed the test preparation course scored higher in writing on average (78.79) than those who did not (68.98).
# The same holds for male students (70.34 vs 59.55).
# Female students score higher in writing than male students in both groups; females who completed the course have the highest average writing score (78.79).
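The three cells above can be collapsed into one call by selecting all score columns before `.mean()`, and `idxmax()` then names the top-scoring group for each subject. So that the sketch runs without the CSV, it rebuilds the mean table from the values printed above; on the real data the commented line should produce the same table.

```python
import pandas as pd

scores = ["math score", "reading score", "writing score"]

# On the real data:
# means = dfScores.groupby(["gender", "test preparation course"])[scores].mean().round(2)
index = pd.MultiIndex.from_tuples(
    [("female", "completed"), ("female", "none"),
     ("male", "completed"), ("male", "none")],
    names=["gender", "test preparation course"],
)
# Group means copied from the outputs above
means = pd.DataFrame(
    {"math score": [67.11, 61.32, 72.99, 66.47],
     "reading score": [77.38, 69.96, 70.79, 62.58],
     "writing score": [78.79, 68.98, 70.34, 59.55]},
    index=index,
)

# (gender, course) group with the highest average in each subject
best = means[scores].idxmax()
print(best)
```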

# 4. What do you think about the scoring variation for math, reading and writing based on ● Gender ● Test preparation course

In [None]:
# Summary of question 3: for both genders, students who completed the test preparation course score higher on average in math, reading and writing.
# Female students score higher in reading and writing, whereas male students score higher in math.
# Completing the test preparation course is associated with higher scores for both male and female students.
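To put a number on the course's association with higher scores, the "none" mean can be subtracted from the "completed" mean within each gender. The sketch uses the group means printed above; `xs` pulls out one course level at a time so the subtraction lines up by gender and subject. These are raw differences in averages, not controlled estimates.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("female", "completed"), ("female", "none"),
     ("male", "completed"), ("male", "none")],
    names=["gender", "test preparation course"],
)
# Group means copied from the outputs above
means = pd.DataFrame(
    {"math score": [67.11, 61.32, 72.99, 66.47],
     "reading score": [77.38, 69.96, 70.79, 62.58],
     "writing score": [78.79, 68.98, 70.34, 59.55]},
    index=index,
)

level = "test preparation course"
# Completed minus none, per gender and subject (rows: gender, columns: subject)
gain = (means.xs("completed", level=level) - means.xs("none", level=level)).round(2)
print(gain)
```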


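In [60]:
# Select the score columns explicitly; newer pandas versions raise an error when .var() meets text columns
dfScores.groupby(["gender","test preparation course"])[["math score","reading score","writing score"]].var()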

gender,test preparation course,math score,reading score,writing score
female,completed,208.173913,161.175546,149.836897
female,none,272.602767,214.24122,225.495172
male,completed,197.098133,185.297987,166.098233
male,none,205.5136,184.843553,181.394687
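Variance is measured in squared points, which makes it hard to interpret; its square root, the standard deviation, is back in score points. The sketch converts the variances above and adds the coefficient of variation (standard deviation divided by the group mean, as a percentage) so that groups with different averages can be compared. Both input tables are copied from the outputs above; on the real data, `.std()` in place of `.var()` gives the standard deviations directly. Read this way, students who did not take the course show the wider spread in five of the six gender and subject combinations (for male students the reading variance is essentially the same with or without the course). Female students without the course vary the most in math (variance 272.60, about 16.5 points), while female students who completed it are the most consistent writers (variance 149.84, about 12.2 points).

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("female", "completed"), ("female", "none"),
     ("male", "completed"), ("male", "none")],
    names=["gender", "test preparation course"],
)
# Variances copied from the .var() output above
var = pd.DataFrame(
    {"math score": [208.173913, 272.602767, 197.098133, 205.5136],
     "reading score": [161.175546, 214.24122, 185.297987, 184.843553],
     "writing score": [149.836897, 225.495172, 166.098233, 181.394687]},
    index=index,
)
# Group means copied from question 3
means = pd.DataFrame(
    {"math score": [67.11, 61.32, 72.99, 66.47],
     "reading score": [77.38, 69.96, 70.79, 62.58],
     "writing score": [78.79, 68.98, 70.34, 59.55]},
    index=index,
)

std = np.sqrt(var)                    # standard deviation, in score points
cv = (100 * std / means).round(1)     # spread relative to the group mean, in %
print(std.round(2))
print(cv)
```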


# 5. The management needs your help to give bonus points to the top 25% of students based on their maths score. How would you help the management achieve this?

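In [142]:
# Keep students at or above the 75th percentile of the math score.
# Computing the quantile on the column avoids errors from text columns in newer pandas versions.
dfScores[dfScores['math score'] >= dfScores['math score'].quantile(0.75)]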

,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
414,female,group A,high school,free/reduced,completed,77,88,85
415,female,group B,master's degree,free/reduced,completed,77,97,94
416,female,group B,bachelor's degree,free/reduced,none,77,85,87
417,female,group B,master's degree,standard,none,77,90,84
418,female,group B,high school,standard,completed,77,82,89
...,...,...,...,...,...,...,...,...
995,male,group E,some college,standard,completed,99,87,81
996,male,group A,some college,standard,completed,100,96,86
997,male,group D,some college,standard,completed,100,97,99
998,male,group E,associate's degree,free/reduced,completed,100,100,93


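In [None]:
# The 75th percentile (third quartile) of the math score is 77, so students scoring 77 or above form the top 25%.
# The filter selects 255 of the 1,000 students; this is slightly more than 250 because several students are tied at the cut-off score of 77.
# These 255 students are the ones to receive the bonus points.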
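The filter can be wrapped in a reusable step that flags eligible students and records their bonus. The function name `award_bonus` and the default of 5 points are hypothetical placeholders, since the case study does not say how many points management intends to give. Because the comparison is `>=` the 75th percentile, every student tied at the cut-off qualifies, which is why slightly more than 25% can be selected; the small demo below has three students tied at the cut-off to show this.

```python
import pandas as pd

def award_bonus(df: pd.DataFrame, col: str = "math score",
                q: float = 0.75, points: int = 5) -> pd.DataFrame:
    """Flag students at or above the q-th quantile of `col` and add bonus points.

    Hypothetical helper; `points` is a placeholder, not a value from the case study.
    """
    cutoff = df[col].quantile(q)                     # 75th percentile by default
    out = df.copy()
    out["bonus eligible"] = out[col] >= cutoff       # ties at the cut-off are included
    out["bonus points"] = out["bonus eligible"].astype(int) * points
    return out

# Hypothetical scores with a tie at the cut-off (the 75th percentile here is 77)
demo = pd.DataFrame({"math score": [40, 55, 60, 70, 77, 77, 77, 90]})
result = award_bonus(demo)
print(result)
```

Applied to `dfScores`, `award_bonus(dfScores)["bonus eligible"].sum()` counts the same students as the filter above, since it uses the same condition.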