# Analyzing Student Performance

In this notebook we analyze how students' exam scores vary with factors such as gender, race, parents' degree, lunch type and test preparation course. The dataset can be found here: http://roycekimmons.com/tools/generated_data/exams

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.figure_factory as ff
import warnings
warnings.filterwarnings('ignore')

In [None]:
data=pd.read_csv('/kaggle/input/students-performance-in-exams/StudentsPerformance.csv')

In [None]:
data.columns = ['Gender', 'Race', 'Parent Degree', 'Lunch', 'Preparation Course', 'Math Score', 'Reading Score', 'Writing Score'] 
data.head()

### Checking for null values in the dataset

In [None]:
data.isnull().sum()

### Creating a column called Total Score, which is the average of the three scores

In [None]:
data['Total Score']=(data['Math Score'] + data['Reading Score'] + data['Writing Score'])/3
data.head()

## Gender vs Score Analysis

In [None]:
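# numeric_only=True skips the text columns, which recent pandas versions refuse to average
data.groupby('Gender').mean(numeric_only=True)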

In [None]:

# Overlay the male and female score distributions, one figure per subject
for subject in ['Math', 'Reading', 'Writing']:
    column = subject + ' Score'
    fig = ff.create_distplot([data[data['Gender']=='male'][column], data[data['Gender']=='female'][column]],
                             ['male', 'female'],
                             colors = ['#2BCDC1', '#F66095'],
                             bin_size = [2, 2]
                            )
    fig.layout.update({'title': 'Gender and ' + subject + ' Scores'})
    fig.show()

In [None]:
plt.figure(figsize=(15,5))

plt.subplot(1,3,1)
sns.boxplot(x = 'Gender', y = 'Math Score', data = data)

plt.subplot(1,3,2)
sns.boxplot(x = 'Gender', y = 'Reading Score', data = data)

plt.subplot(1,3,3)
sns.boxplot(x = 'Gender', y = 'Writing Score', data = data)

The visualizations above show that male students tend to score higher in Math, while female students tend to score higher in Reading and Writing.
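To put numbers on the gap seen in the plots, the group means can be subtracted per subject. This is a minimal sketch: the helper `mean_gap` is a hypothetical name that is not part of the original notebook, and it assumes the renamed columns used above.

```python
import pandas as pd

SCORE_COLS = ['Math Score', 'Reading Score', 'Writing Score']

def mean_gap(df, group_col, group_a, group_b, score_cols=SCORE_COLS):
    """Return mean(group_a) - mean(group_b) for each score column.

    Hypothetical helper for illustration; a positive value means
    group_a scores higher on average in that subject.
    """
    means = df.groupby(group_col)[list(score_cols)].mean()
    return means.loc[group_a] - means.loc[group_b]

# In this notebook, assuming `data` is loaded as above:
# mean_gap(data, 'Gender', 'male', 'female')
```

A positive Math value together with negative Reading and Writing values would be consistent with the pattern described above.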

## Preparation Course vs Scores

In [None]:
plt.figure(figsize=(15,5))

plt.subplot(1,3,1)
sns.boxplot(x = 'Preparation Course', y = 'Math Score', data = data)

plt.subplot(1,3,2)
sns.boxplot(x = 'Preparation Course', y = 'Reading Score', data = data)

plt.subplot(1,3,3)
sns.boxplot(x = 'Preparation Course', y = 'Writing Score', data = data)

## Lunch type vs Score Analysis

In [None]:
sns.barplot(x='Lunch',y='Total Score',data=data)

Students on the standard meal plan have a higher overall score than students on the free/reduced meal plan. If lunch type is taken as a proxy for family income, this suggests that students from lower-income families tend to score lower than others.
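The bar plot shows only the group means with error bars, so it can help to tabulate the group sizes and standard errors next to them. The following is a rough sketch; `summarize_by` is a hypothetical helper name, and it assumes the `Lunch` and `Total Score` columns created earlier.

```python
import pandas as pd

def summarize_by(df, group_col, value_col):
    """Return count, mean and standard error of value_col per group.

    Hypothetical helper for illustration only.
    """
    return df.groupby(group_col)[value_col].agg(['count', 'mean', 'sem'])

# In this notebook, assuming `data` has the 'Total Score' column:
# summarize_by(data, 'Lunch', 'Total Score')
```

A difference in means that is large relative to the standard errors is less likely to be an artifact of group size.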

## Degree of Parents vs Scores

In [None]:
plt.figure(figsize=(13,4))

plt.subplot(1,3,1)
sns.barplot(x = "Parent Degree" , y="Reading Score" , data=data)
plt.xticks(rotation = 90)
plt.title("Reading Scores")

plt.subplot(1,3,2)
sns.barplot(x = "Parent Degree" , y="Writing Score" , data=data)
plt.xticks(rotation=90)
plt.title("Writing Scores")

plt.subplot(1,3,3)
sns.barplot(x = "Parent Degree" , y="Math Score" , data=data)
plt.xticks(rotation=90)
plt.title("Math Scores")

plt.tight_layout()
plt.show()

The plots indicate that students whose parents hold a bachelor's or master's degree tend to perform better academically than students whose parents hold only a high school degree.
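The trend is easier to read when the degree levels appear in order of education rather than in the order they occur in the data. The sketch below assumes the six category labels listed in `DEGREE_ORDER`; they should be checked against `data['Parent Degree'].unique()`. The helper name `mean_scores_by_degree` is hypothetical.

```python
import pandas as pd

# Assumed category labels, ordered from least to most formal education.
# Verify them against data['Parent Degree'].unique() before relying on this.
DEGREE_ORDER = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree",
]

def mean_scores_by_degree(df, score_cols, order=DEGREE_ORDER):
    """Mean of each score column per parent degree, rows in education order.

    Hypothetical helper; degree levels absent from df are skipped.
    """
    means = df.groupby('Parent Degree')[list(score_cols)].mean()
    present = [level for level in order if level in means.index]
    return means.reindex(present)

# In this notebook, assuming `data` is loaded as above:
# mean_scores_by_degree(data, ['Math Score', 'Reading Score', 'Writing Score'])
```

The same list can be passed to seaborn as `sns.barplot(..., order=DEGREE_ORDER)` so that the bars follow the same ordering.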

## Race vs Total Scores

In [None]:
sns.barplot(x='Race',y='Total Score',data=data)