## Analyzing Students' Performance in Exams

#### Importing libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt  # plotting interface (not used in the cells below)

#### Loading the .csv file

In [2]:
df=pd.read_csv('StudentsPerformance.csv')
df

|     | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|-----|---|---|---|---|---|---|---|---|
| 0   | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1   | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2   | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3   | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4   | male | group C | some college | standard | none | 76 | 78 | 75 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | female | group E | master's degree | standard | completed | 88 | 99 | 95 |
| 996 | male | group C | high school | free/reduced | none | 62 | 55 | 55 |
| 997 | female | group C | high school | free/reduced | completed | 59 | 71 | 65 |
| 998 | female | group D | some college | standard | completed | 68 | 78 | 77 |


#### Overview of data

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64 
 6   reading score                1000 non-null   int64 
 7   writing score                1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 43.0+ KB
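
According to `df.info()`, five text columns are stored with the `object` dtype. An optional step, which is not part of the original analysis, is to convert such columns to pandas' `category` dtype. This usually saves memory when a column has only a few distinct values. The minimal sketch below uses the first five rows shown earlier as a stand-in for the full file.

```python
import pandas as pd

# Stand-in for the real file: first five rows of three columns from the table above
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "math score": [72, 69, 90, 47, 76],
})

# Every non-numeric column becomes a categorical column; numeric columns are untouched
text_cols = sample.select_dtypes(exclude="number").columns
sample = sample.astype({col: "category" for col in text_cols})

print(sample.dtypes)
```

On the full data set the same two lines would convert all five text columns.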


#### Finding correlations between test scores

In [4]:
df.corr(numeric_only=True)

|               | math score | reading score | writing score |
|---|---|---|---|
| math score    | 1.0      | 0.81758  | 0.802642 |
| reading score | 0.81758  | 1.0      | 0.954598 |
| writing score | 0.802642 | 0.954598 | 1.0      |


All three scores are strongly correlated (r > 0.8 for every pair). The strongest correlation is between reading and writing (r ≈ 0.95).
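
By default, `DataFrame.corr` computes the Pearson correlation coefficient. This is the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand on the first five rows shown earlier (a tiny stand-in for the full data) and checks that the result matches pandas.

```python
import numpy as np
import pandas as pd

# Stand-in data: scores of the first five students from the table above
scores = pd.DataFrame({
    "math score":    [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

def pearson(x, y):
    """Pearson r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), with dx, dy centred on the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

manual = pearson(scores["reading score"], scores["writing score"])
builtin = scores.corr()  # method="pearson" is the default
print(round(manual, 4), round(builtin.loc["reading score", "writing score"], 4))
```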

#### Grouping students by race/ethnicity and averaging test scores by group

In [5]:
df.groupby('race/ethnicity').mean(numeric_only=True).sort_values(by='math score', ascending=False)

| race/ethnicity | math score | reading score | writing score |
|---|---|---|---|
| group E | 73.821429 | 73.028571 | 71.407143 |
| group D | 67.362595 | 70.030534 | 70.145038 |
| group C | 64.46395  | 69.103448 | 67.827586 |
| group B | 63.452632 | 67.352632 | 65.6      |
| group A | 61.629213 | 64.674157 | 62.674157 |


Students from group E have the highest average scores in all three subjects. In every subject, the averages decrease steadily from group E down to group A.
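
A group mean says nothing about how many students it is based on, and a mean from a small group is less reliable. One option is to report the group size next to the mean with `groupby(...).agg(["mean", "count"])`. The sketch below does this on a made-up four-row frame. It is illustrative only and does not reproduce the real group sizes.

```python
import pandas as pd

# Hypothetical miniature data set; groups and scores are made up
sample = pd.DataFrame({
    "race/ethnicity": ["group A", "group A", "group B", "group E"],
    "math score": [50, 60, 70, 90],
})

# Mean and number of students per group, best group first
summary = (
    sample.groupby("race/ethnicity")["math score"]
    .agg(["mean", "count"])
    .sort_values(by="mean", ascending=False)
)
print(summary)
```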

#### Grouping students by gender

In [6]:
df.groupby('gender').mean(numeric_only=True)

| gender | math score | reading score | writing score |
|---|---|---|---|
| female | 63.633205 | 72.608108 | 72.467181 |
| male   | 68.728216 | 65.473029 | 63.311203 |


On average, male students score higher in math (about 68.7 vs 63.6). Female students score clearly higher in reading (72.6 vs 65.5) and writing (72.5 vs 63.3).
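
The size of each gap can be read directly by subtracting the two rows of the table above. The sketch below copies those printed means into a small frame and subtracts one row from the other. A positive value favours female students and a negative value favours male students.

```python
import pandas as pd

# Group means copied from the gender table above
means = pd.DataFrame(
    {
        "math score": [63.633205, 68.728216],
        "reading score": [72.608108, 65.473029],
        "writing score": [72.467181, 63.311203],
    },
    index=pd.Index(["female", "male"], name="gender"),
)

# Row-wise difference: positive favours female students, negative favours male students
gap = (means.loc["female"] - means.loc["male"]).round(2)
print(gap)
```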

#### Grouping students by parental level of education

In [7]:
df.groupby('parental level of education').mean(numeric_only=True).sort_values(by='math score', ascending=False)

| parental level of education | math score | reading score | writing score |
|---|---|---|---|
| master's degree    | 69.745763 | 75.372881 | 75.677966 |
| bachelor's degree  | 69.389831 | 73.0      | 73.381356 |
| associate's degree | 67.882883 | 70.927928 | 69.896396 |
| some college       | 67.128319 | 69.460177 | 68.840708 |
| some high school   | 63.497207 | 66.938547 | 64.888268 |
| high school        | 62.137755 | 64.704082 | 62.44898  |


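Scores generally rise with the parents' level of education. Students whose parents hold a bachelor's or master's degree score highest in all three subjects. Students whose parents finished high school score lowest, slightly below those whose parents attended only some high school.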
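
Whether the scores really rise with education can be checked by putting the levels in an explicit order and testing each column with `Series.is_monotonic_increasing`. The ordering below is an assumption (it ranks "some high school" below "high school"). The means are copied from the table above.

```python
import pandas as pd

# Mean scores copied from the parental-education table above
means = pd.DataFrame(
    {
        "math score": [69.745763, 69.389831, 67.882883, 67.128319, 63.497207, 62.137755],
        "reading score": [75.372881, 73.0, 70.927928, 69.460177, 66.938547, 64.704082],
        "writing score": [75.677966, 73.381356, 69.896396, 68.840708, 64.888268, 62.44898],
    },
    index=["master's degree", "bachelor's degree", "associate's degree",
           "some college", "some high school", "high school"],
)

# Assumed ordering of education levels, lowest first
levels = ["some high school", "high school", "some college",
          "associate's degree", "bachelor's degree", "master's degree"]
ordered = means.reindex(levels)

# True only if the mean never drops as the education level rises
full = ordered.apply(lambda col: col.is_monotonic_increasing)
# Same check after dropping the first level ("some high school")
from_hs = ordered.iloc[1:].apply(lambda col: col.is_monotonic_increasing)
print(full, from_hs, sep="\n\n")
```

Under this ordering no subject is strictly monotonic, but all three are monotonic from "high school" upward.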

#### Grouping students by test preparation course

In [8]:
df.groupby('test preparation course').mean(numeric_only=True)

| test preparation course | math score | reading score | writing score |
|---|---|---|---|
| completed | 69.695531 | 73.893855 | 74.418994 |
| none      | 64.077882 | 66.534268 | 64.504673 |


Students who completed the test preparation course scored higher on average in all three subjects: about 5.6 points more in math, 7.4 in reading and 9.9 in writing.
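
The same gap can also be expressed relative to the students who did not take the course, which makes the subjects easier to compare. The sketch below copies the means from the table above and reports the gap both in points and as a percentage of the "none" group.

```python
import pandas as pd

# Mean scores copied from the test-preparation table above
means = pd.DataFrame(
    {
        "math score": [69.695531, 64.077882],
        "reading score": [73.893855, 66.534268],
        "writing score": [74.418994, 64.504673],
    },
    index=pd.Index(["completed", "none"], name="test preparation course"),
)

# Absolute gap in points, and the same gap as a percentage of the "none" mean
points = means.loc["completed"] - means.loc["none"]
percent = points / means.loc["none"] * 100
print(pd.DataFrame({"points": points.round(1), "percent": percent.round(1)}))
```

Measured this way, the advantage is largest in writing (roughly 15%) and smallest in math (roughly 9%).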

#### Showing the top 20 students in math

In [9]:
math_top_20=df.sort_values(by='math score', ascending=False).head(20)
math_top_20

|     | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|-----|---|---|---|---|---|---|---|---|
| 962 | female | group E | associate's degree | standard | none | 100 | 100 | 100 |
| 625 | male | group D | some college | standard | completed | 100 | 97 | 99 |
| 458 | female | group E | bachelor's degree | standard | none | 100 | 100 | 100 |
| 623 | male | group A | some college | standard | completed | 100 | 96 | 86 |
| 451 | female | group E | some college | standard | none | 100 | 92 | 97 |
| 149 | male | group E | associate's degree | free/reduced | completed | 100 | 100 | 93 |
| 916 | male | group E | bachelor's degree | standard | completed | 100 | 100 | 100 |
| 263 | female | group E | high school | standard | none | 99 | 93 | 90 |
| 306 | male | group E | some college | standard | completed | 99 | 87 | 81 |
| 114 | female | group E | bachelor's degree | standard | completed | 99 | 100 | 100 |


In [10]:
math_top_20.groupby('race/ethnicity').count()

| race/ethnicity | gender | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|
| group A | 2  | 2  | 2  | 2  | 2  | 2  | 2  |
| group B | 1  | 1  | 1  | 1  | 1  | 1  | 1  |
| group C | 4  | 4  | 4  | 4  | 4  | 4  | 4  |
| group D | 3  | 3  | 3  | 3  | 3  | 3  | 3  |
| group E | 10 | 10 | 10 | 10 | 10 | 10 | 10 |


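Half of the top 20 math students (10 of 20) come from group E. Notably, none of them has a parent with a master's degree, even though that group has the highest average math score. A high group average therefore does not guarantee a place among the very best scorers, although a list of 20 students is too small to support firm conclusions.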
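
Many students share the maximum score of 100, so a fixed `head(20)` after sorting may cut through a group of tied scores. When that happens, which tied rows make the list is arbitrary. `DataFrame.nlargest` with `keep="all"` keeps every row tied with the last place. `value_counts(normalize=True)` then gives each group's share in a single column, instead of the repeated columns produced by `count()`. The sketch below uses made-up toy data with a deliberate tie at the cut-off.

```python
import pandas as pd

# Hypothetical toy data (made-up values) with two students tied for third place
sample = pd.DataFrame({
    "race/ethnicity": ["group E", "group D", "group E", "group A", "group C"],
    "math score": [100, 100, 99, 99, 80],
})

# keep="all" returns more than n rows if the n-th value is tied
top = sample.nlargest(3, "math score", keep="all")
print(len(top))  # 4: both students with 99 are kept

# Share of each group among the top scorers
share = top["race/ethnicity"].value_counts(normalize=True)
print(share)
```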

#### Showing the top 20 students in reading

In [11]:
reading_top_20=df.sort_values(by='reading score', ascending=False).head(20)
reading_top_20

|     | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|-----|---|---|---|---|---|---|---|---|
| 546 | female | group A | some high school | standard | completed | 92 | 100 | 97 |
| 970 | female | group D | bachelor's degree | standard | none | 89 | 100 | 100 |
| 149 | male | group E | associate's degree | free/reduced | completed | 100 | 100 | 93 |
| 566 | female | group E | bachelor's degree | free/reduced | completed | 92 | 100 | 100 |
| 712 | female | group D | some college | standard | none | 98 | 100 | 99 |
| 179 | female | group D | some high school | standard | completed | 97 | 100 | 100 |
| 886 | female | group E | associate's degree | standard | completed | 93 | 100 | 95 |
| 381 | male | group C | associate's degree | standard | completed | 87 | 100 | 95 |
| 106 | female | group D | master's degree | standard | none | 87 | 100 | 100 |
| 903 | female | group D | bachelor's degree | free/reduced | completed | 93 | 100 | 100 |


In [12]:
reading_top_20.groupby('race/ethnicity').count()

| race/ethnicity | gender | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|
| group A | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| group C | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| group D | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| group E | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
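
Some students appear in both top-20 lists. For example, row 149 is listed among the top math and the top reading scorers. Because both lists keep the original row index, their overlap is simply the intersection of the two indexes, as in `math_top_20.index.intersection(reading_top_20.index)`. The sketch below shows this on made-up toy data. The helper `top_n` is hypothetical and only wraps the `sort_values(...).head(n)` pattern used above.

```python
import pandas as pd

# Hypothetical toy scores (made-up values); the index plays the role of a student ID
sample = pd.DataFrame(
    {
        "math score": [100, 95, 60, 88, 70],
        "reading score": [100, 70, 98, 90, 65],
    },
    index=[1, 2, 3, 4, 5],
)

def top_n(frame, column, n):
    """Hypothetical helper: the n rows with the highest value in `column`."""
    return frame.sort_values(by=column, ascending=False).head(n)

math_top = top_n(sample, "math score", 3)
reading_top = top_n(sample, "reading score", 3)

# Students present in both lists share the same index label
both = math_top.index.intersection(reading_top.index)
print(sorted(both))
```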
