# **Students Performance in Exams**

*We will work with a dataset of 1000 students' test scores (math, reading, writing). The students are female and male and belong to different race/ethnicity groups.*

![Kaggle_students.jpeg](attachment:6edfa15d-cbe7-4686-a82f-11c03c2ac7ea.jpeg)

**The aim of this study** is to analyze how students' exam performance varies with several factors: gender, race/ethnicity, parental level of education, lunch taken before the test, and test preparation course.


**This analysis will answer the following questions:**

**1-** Are there subjects where one gender performs better than the other?

**2-** Is there a relationship between race/ethnicity and test scores?

**3-** How effective is the test preparation course?

**4-** Does the level of education of parents impact their children's performance?

**5-** To what extent can the lunch taken before a test affect student performance?



In [106]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns

In [107]:
data=pd.read_csv('../input/students-performance-in-exams/StudentsPerformance.csv')
print("There are", len(data), "entries in this dataset\n")  # the DataFrame is named `data`, not `df`
data.head()

In [108]:
data.describe()

In [109]:
data.isnull().sum()

> The dataset has no missing values

Let's compute the average of the three test scores for each student.


In [110]:
data['Average Score']=(data['math score']+data['reading score']+data['writing score'])/3

**1- Are There Subjects Where One Gender Performs Better Than the Other?**

In [111]:
plt.figure(figsize=(20,7))
plt.subplot(1, 3, 1) 

sns.barplot(x=data['gender'],y=data['reading score'],palette='hls')
plt.title('READING')
plt.ylabel('Scores')
plt.xlabel('Gender')
plt.subplot(1, 3, 3)

sns.barplot(x=data['gender'],y=data['writing score'],palette='hls')
plt.title('WRITING')
plt.ylabel('Scores')
plt.xlabel('Gender')
plt.subplot(1, 3, 2)

sns.barplot(x=data['gender'],y=data['math score'],palette='hls')
plt.title('MATHEMATICS')
plt.ylabel('Scores')
plt.xlabel('Gender')
plt.show()


> On average, female students score higher in READING and WRITING, while male students score higher in MATH.
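
The bar heights above are group means, so the same comparison can be read as numbers with a `groupby`. The following is a minimal sketch, not part of the original notebook: the helper name `mean_scores_by` is hypothetical, and the `toy` rows are made-up values used only so the snippet runs on its own. With the real data, one would pass `data` instead of `toy`.

```python
import pandas as pd

def mean_scores_by(frame, factor):
    """Return the mean of each score column for every level of `factor`."""
    score_cols = ['math score', 'reading score', 'writing score']
    return frame.groupby(factor)[score_cols].mean().round(2)

# Toy rows for illustration only (NOT taken from the real dataset)
toy = pd.DataFrame({
    'gender': ['female', 'female', 'male', 'male'],
    'math score': [60, 70, 72, 68],
    'reading score': [75, 77, 64, 66],
    'writing score': [74, 78, 62, 64],
})

print(mean_scores_by(toy, 'gender'))
```

The same helper should work for any other categorical column, for example `mean_scores_by(data, 'lunch')`.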

**2- Is there a relationship between race/ethnicity and test scores?**

In [118]:
plt.figure(figsize=(20,7))
plt.subplot(1, 3, 1)  # left panel of a 1x3 grid

# use the DataFrame loaded above (`data`); `df` was never defined
sns.barplot(x=data['race/ethnicity'],y=data['reading score'],palette='magma')
plt.title('READING')
plt.ylabel('Scores')
plt.xlabel('Race/Ethnicity')
plt.subplot(1, 3, 3)

sns.barplot(x=data['race/ethnicity'],y=data['writing score'],palette='magma')
plt.title('WRITING')
plt.ylabel('Scores')
plt.xlabel('Race/Ethnicity')
plt.subplot(1, 3, 2)

sns.barplot(x=data['race/ethnicity'],y=data['math score'],palette='magma')
plt.title('MATHEMATICS')
plt.ylabel('Scores')
plt.xlabel('Race/Ethnicity')
plt.show()

In [113]:
data[(data['math score'] > 90) & (data['reading score'] > 90) & (data['writing score']>90)]\
.sort_values(by=['Average Score'],ascending=False)

> Top average score:
    
        Group E : 100.00
        Group D : 99.00
        Group C : 98.67
        Group B : 96.67
        Group A : 96.33
   
In all tests (mathematics/reading/writing), Group E has the highest scores, followed by Group D, Group C, Group B, then Group A.
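
If the figures listed above are read as each group's highest `Average Score`, they could be computed directly instead of being read off the filtered table. This is a sketch under that assumption: the helper name `top_average_by_group` is hypothetical, and the `toy` rows are invented values, not the real results.

```python
import pandas as pd

def top_average_by_group(frame, group_col='race/ethnicity', score_col='Average Score'):
    """Return the highest value of `score_col` within each group, best group first."""
    top = frame.groupby(group_col)[score_col].max()
    return top.sort_values(ascending=False).round(2)

# Toy rows for illustration only (NOT taken from the real dataset)
toy = pd.DataFrame({
    'race/ethnicity': ['group A', 'group A', 'group B', 'group B'],
    'Average Score': [80.0, 91.5, 88.0, 95.25],
})

top = top_average_by_group(toy)
print(top)
```

On the notebook's DataFrame the call would be `top_average_by_group(data)`, once the `Average Score` column exists.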

The top two scorers did not enroll in a test preparation course. So how effective is the test preparation course?
    
   

**3- How Effective Is the Test Preparation Course?**

In [114]:
plt.figure(figsize=(20,7))
plt.subplot(1, 3, 1)  # left panel of a 1x3 grid

sns.barplot(x=data['test preparation course'],y=data['reading score'],palette='magma')
plt.title('READING')
plt.ylabel('Scores')
plt.xlabel('Preparation Course')
plt.subplot(1, 3, 3)

sns.barplot(x=data['test preparation course'],y=data['writing score'],palette='magma')
plt.title('WRITING')
plt.ylabel('Scores')
plt.xlabel('Preparation Course')
plt.subplot(1, 3, 2)

sns.barplot(x=data['test preparation course'],y=data['math score'],palette='magma')
plt.title('MATHEMATICS')
plt.ylabel('Scores')
plt.xlabel('Preparation Course')
plt.show()

> Students who completed the test preparation course scored higher on average in all subjects, which suggests that taking the course helps students get better grades.
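
To say how large the difference is, the gap between the two group means can be computed per subject. This is a minimal sketch: it assumes the column takes the values `'completed'` and `'none'`, the helper name `prep_course_gap` is hypothetical, and the `toy` rows are invented values. A gap in means describes an association in the data and does not by itself show that the course caused it.

```python
import pandas as pd

def prep_course_gap(frame, score_col):
    """Mean score of students who completed the course minus mean of those who did not."""
    means = frame.groupby('test preparation course')[score_col].mean()
    # Assumes the two levels are labeled 'completed' and 'none'
    return means['completed'] - means['none']

# Toy rows for illustration only (NOT taken from the real dataset)
toy = pd.DataFrame({
    'test preparation course': ['completed', 'completed', 'none', 'none'],
    'math score': [70, 74, 65, 67],
})

gap = prep_course_gap(toy, 'math score')
print(f"Mean math gap: {gap:.1f} points")
```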

**4- Does the level of education of parents impact their children's performance?**

In [117]:
plt.figure(figsize=(14,7))
plt.title('Parents Education Impact on Students Performance')
sns.barplot(x='parental level of education',y='Average Score',data=data,palette='inferno')  # `data`, not the undefined `df`

> On average, the more educated the parents are, the higher their children's average score across the three subjects.
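
The trend is easier to check when the education levels appear from lowest to highest rather than in the order they happen to occur in the file. The sketch below is an assumption-laden illustration: the list `EDUCATION_ORDER` (both the labels and the ranking of "some college" against "associate's degree") is my assumption about the column's values, the helper name `average_by_education` is hypothetical, and the `toy` rows are invented.

```python
import pandas as pd

# Assumed labels and ranking, lowest to highest; adjust to the values in the real column
EDUCATION_ORDER = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree",
]

def average_by_education(frame):
    """Mean Average Score per parental education level, sorted by EDUCATION_ORDER."""
    means = frame.groupby('parental level of education')['Average Score'].mean()
    # reindex imposes the order; dropna removes levels absent from the data
    return means.reindex(EDUCATION_ORDER).dropna().round(2)

# Toy rows for illustration only (NOT taken from the real dataset)
toy = pd.DataFrame({
    'parental level of education': ["master's degree", 'high school',
                                    'some college', 'high school'],
    'Average Score': [75.0, 62.0, 68.0, 64.0],
})

ordered = average_by_education(toy)
print(ordered)
```

The same list can be passed to the `order=` parameter of `sns.barplot` so that the bars follow this ranking.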

**5- To what extent can the lunch taken before a test affect student performance?**

In [None]:
plt.figure(figsize=(20,7))
plt.subplot(1, 3, 1)

sns.barplot(x=data['lunch'],y=data['reading score'],palette='magma')
plt.title('READING')
plt.ylabel('Scores')
plt.xlabel('Lunch')
plt.subplot(1, 3, 3)

sns.barplot(x=data['lunch'],y=data['writing score'],palette='magma')
plt.title('WRITING')
plt.ylabel('Scores')
plt.xlabel('Lunch')
plt.subplot(1, 3, 2)

sns.barplot(x=data['lunch'],y=data['math score'],palette='magma')
plt.title('MATHEMATICS')
plt.ylabel('Scores')
plt.xlabel('Lunch')
plt.show()

> Students who had the standard lunch scored higher on average in all subjects, so lunch type appears to affect student performance.

**Conclusion**

It turned out that every factor in this dataset is related to student performance: gender, race/ethnicity, parental level of education, test preparation course, and lunch type all show differences in students' grades.