# Student Performance in Exams Data Visualization
In this kernel, we analyze student performance in exams from multiple angles.

This kernel covers 3 major topics, each with several subsections.

<h3><b>Table of Contents</b></h3>
<ul>
    <a href='#1'><li>Initial Data Analysis</li></a>
        <ul>
            <a href='#2'><li>Importing Essential Libraries</li></a>
            <a href='#3'><li>Importing CSV File and Overview</li></a>
        </ul>
    <a href='#4'><li>Analyzing Dataset and Feature Engineering</li></a>
        <ul>
            <a href='#5'><li>Feature Engineering</li></a>
            <a href='#6'><li>Sorting Values</li></a>
            <a href='#7'><li>Group By</li></a>
                <ul>
                    <a href='#8'><li>Multi-level Index Data Frame Indexing</li></a>
                </ul>
        </ul>
    <a href='#9'><li>Seaborn Library</li></a>
        <ul>
            <a href='#10'><li>Pair Plot</li></a>
            <a href='#11'><li>Count Plot</li></a>
            <a href='#12'><li>Bar Plot</li></a>
            <a href='#13'><li>Box Plot</li></a>
            <a href='#14'><li>Violin Plot</li></a>
            <a href='#15'><li>Boxen Plot</li></a>
            <a href='#16'><li>Swarm Plot</li></a>
            <a href='#17'><li>Strip Plot</li></a>
            <a href='#18'><li>Pie Chart</li></a>
            <a href='#19'><li>Dist Plot</li></a>
        </ul>
   <a href='#20'><li>Sources</li></a>
</ul>

<p id='1'><h2><b>Initial Data Analysis</b></h2></p>
<p id='2'><h3><b>Importing Essential Libraries</b></h3></p>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('poster')
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import os
print(os.listdir("../input"))

<p id='3'><h3><b>Importing CSV File and Overview</b></h3></p>

In [None]:
df = pd.read_csv('../input/students-performance-in-exams/StudentsPerformance.csv')
df.head()

**Taking 5 random samples from the dataset:**

In [None]:
df.sample(5)

**General information about the columns of the dataset:**

In [None]:
df.info()

**Summary statistics of the numeric columns:**

In [None]:
df.describe()

**Number of unique values per categorical feature:**

In [None]:
df.select_dtypes('object').nunique()

<p id='4'><h2><b>Analyzing Dataset and Feature Engineering</b></h2></p>
<p id='5'><h3><b>Feature Engineering</b></h3></p>

In [None]:
passmark = 40
df['math pass'] = np.where(df['math score'] < passmark , 'No' , 'Yes')
df['reading pass'] = np.where(df['reading score'] < passmark , 'No' , 'Yes')
df['writing pass'] = np.where(df['writing score'] < passmark , 'No' , 'Yes')
df.head()

In [None]:
df['final student status'] = df.apply(lambda x : 'Passed' if x['math pass'] == 'Yes' and x['reading pass'] == 'Yes' and x['writing pass'] == 'Yes' else 'Failed' , axis = 1)

df.head()
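
The row-wise `apply` above can also be written as a vectorized comparison. The following is a minimal sketch on a hypothetical three-row frame (the `toy` data is made up for illustration; only the column names and the pass mark of 40 come from this kernel):

```python
import numpy as np
import pandas as pd

# Hypothetical toy scores; the column names mirror the dataset used above.
toy = pd.DataFrame({
    'math score': [72, 35, 90],
    'reading score': [80, 50, 39],
    'writing score': [75, 45, 88],
})
passmark = 40
score_cols = ['math score', 'reading score', 'writing score']

# A student passes only if every subject score reaches the pass mark.
passed_all = (toy[score_cols] >= passmark).all(axis=1)
toy['final student status'] = np.where(passed_all, 'Passed', 'Failed')

print(toy['final student status'].tolist())
```

On larger frames this form usually avoids the per-row Python call that `apply` with `axis = 1` makes.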

In [None]:
df['total score'] = df['math score'] + df['reading score'] + df['writing score']
df.head()

In [None]:
df['final student status'].value_counts()

<p id='6'><h3><b>Sorting Values</b></h3></p>

**Top 10 students :**

In [None]:
df.sort_values(by=['total score'] , ascending=False)[:10]
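
As an alternative to sorting the whole frame and slicing, pandas offers `DataFrame.nlargest`. A small sketch on made-up data (the `student` column and its values are hypothetical):

```python
import pandas as pd

# Hypothetical students with their total scores.
toy = pd.DataFrame({
    'student': ['a', 'b', 'c', 'd'],
    'total score': [210, 285, 150, 240],
})

# Keep only the 2 rows with the highest total score, highest first.
top2 = toy.nlargest(2, 'total score')
print(top2)
```

With the real data, `df.nlargest(10 , 'total score')` should select the same ten highest totals, although rows with tied scores may appear in a different order.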

<p id='7'><h3><b>Group By</b></h3></p>

In [None]:
# Median score per (test preparation course, gender) group
group_by_course_gender_data = df[['test preparation course' , 'gender' , 'math score' , 'reading score' , 'writing score']]\
.groupby(['test preparation course' , 'gender']).agg('median')

group_by_course_gender_data

<p id='8'><h4><b>Multi-level Index Data Frame Indexing</b></h4></p>

In [None]:
# Select one cell with a (course, gender) tuple key and a column label
group_by_course_gender_data.loc[('none' , 'female') , 'writing score']
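
A grouped frame with two keys has a two-level row index. The sketch below, built on a hypothetical five-row frame, shows two common ways to index it: a tuple key with `.loc` for a single cell, and `.xs` for a cross-section of one level.

```python
import pandas as pd

# Hypothetical data; column names follow the dataset used in this kernel.
toy = pd.DataFrame({
    'test preparation course': ['none', 'none', 'completed', 'completed', 'none'],
    'gender': ['female', 'male', 'female', 'male', 'female'],
    'writing score': [70, 60, 85, 75, 80],
})
grouped = toy.groupby(['test preparation course', 'gender']).agg('median')

# One cell: the tuple addresses both index levels at once.
value = grouped.loc[('none', 'female'), 'writing score']

# Cross-section: every row where the 'gender' level equals 'female'.
females = grouped.xs('female', level='gender')

print(value)
print(females)
```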

In [None]:
df[['race/ethnicity' , 'gender' , 'total score']]\
.groupby(['race/ethnicity' , 'gender']).agg(['max' , 'min' , 'median'])
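
Passing a list of functions to `agg` produces two-level column labels such as `('total score', 'max')`. A minimal sketch, using hypothetical rows, of selecting one of those columns and of flattening the labels into plain strings:

```python
import pandas as pd

# Hypothetical data; column names follow the dataset used in this kernel.
toy = pd.DataFrame({
    'race/ethnicity': ['group A', 'group A', 'group B', 'group B'],
    'gender': ['female', 'male', 'female', 'female'],
    'total score': [200, 180, 250, 230],
})
stats = toy.groupby(['race/ethnicity', 'gender']).agg(['max', 'min', 'median'])

# Select a single statistic with a (column, function) tuple.
median_col = stats[('total score', 'median')]

# Flatten the two-level column labels, e.g. 'total score_max'.
flat = stats.copy()
flat.columns = ['_'.join(col) for col in stats.columns]

print(flat)
```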

<p id='9'><h2><b>Seaborn Library</b></h2></p>
<p id='10'><h3><b>Pair Plot</b></h3></p>

**Pairwise relationships between continuous features with a pair plot:**

In [None]:
sns.set()
sns.pairplot(df , diag_kind = 'kde')
plt.show()

In [None]:
sns.pairplot(df , hue = 'gender')
plt.show()

<p id='11'><h3><b>Count Plot</b></h3></p>

**How many students are male and how many are female:**

In [None]:
sns.set_style("whitegrid")

plt.rcParams['figure.figsize'] = (7 ,  4)
# Pass the column by keyword; recent seaborn releases no longer accept it positionally
sns.countplot(x = 'gender' , data = df , palette = 'Set1' , linewidth = 1.5 , edgecolor = 'black')
plt.xlabel('Gender' , fontsize = 15)
plt.ylabel('Count', fontsize = 15)
plt.show()

**Math scores distribution :**

In [None]:
plt.rcParams['figure.figsize'] = (20 , 8)
sns.set_style("ticks")

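# Pass the column by keyword; recent seaborn releases no longer accept it positionally
sns.countplot(x = 'math score' , data = df , palette = 'muted')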
plt.xlabel('Math Score' , fontsize = 15)
plt.ylabel('Count' , fontsize = 15)
plt.xticks(rotation = 90)
plt.show()

<p id='12'><h3><b>Bar Plot</b></h3></p>

**Total score of each race/ethnicity group, split by test preparation course:**

As this graph shows, group E has the best performance and group A has the worst.

In [None]:
plt.rcParams['figure.figsize'] = (16 , 8)
sns.barplot(x = df['race/ethnicity'] , y = df['total score'] , data = df , order = ['group A' , 'group B' , 'group C' , 'group D',  'group E']  ,
            hue = df['test preparation course'] , palette = 'Blues_r' , linewidth = 2 , edgecolor = 'blue' , alpha = .8 , capsize = .1)
plt.xlabel('Race/Ethnicity' , fontsize = 15)
plt.ylabel('Total Score' , fontsize = 15)

plt.legend(loc = 4)
plt.show()

In [None]:
plt.rcParams['figure.figsize'] = (18 , 6)

plt.subplot(1 , 3 , 1)
sns.barplot(x = df['lunch'] , y = df['math score'] , data = df  , hue = df['gender'] , palette = 'Blues_r' , linewidth = 2 , edgecolor = 'blue' , alpha = .8 , capsize = .1)
plt.title('Math' , fontsize = 20)
plt.legend(loc = 4)

plt.subplot(1 , 3 , 2)
sns.barplot(x = df['lunch'] , y = df['reading score'] , data = df  , hue = df['gender'] , palette = 'Reds_r' , linewidth = 2 , edgecolor = 'red' , alpha = .8 , capsize = .1)
plt.title('Reading', fontsize = 20)
plt.legend(loc = 4)

plt.subplot(1 , 3 , 3)
sns.barplot(x = df['lunch'] , y = df['writing score'] , data = df  , hue = df['gender'] , palette = 'Greens_r' , linewidth = 2 , edgecolor = 'green' , alpha = .8 , capsize = .1)
plt.title('Writing', fontsize = 20)
plt.legend(loc = 4)

plt.show()

<p id='13'><h3><b>Box Plot</b></h3></p>

In [None]:
plt.rcParams['figure.figsize'] = (7 , 3)
sns.boxplot(x = df['total score'] , data = df , palette = 'Accent')
plt.xlabel('Total Score' , fontsize  = 15)
plt.show()
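
A box plot summarizes a column by its quartiles, and with seaborn's default `whis = 1.5` the whiskers extend to the furthest points within 1.5 times the interquartile range of the box. The sketch below computes those quantities by hand on a hypothetical series of five scores (not taken from the dataset):

```python
import pandas as pd

# Hypothetical total scores, chosen so the quartiles are easy to read.
scores = pd.Series([40, 50, 60, 70, 80], name='total score')

# The box spans q1..q3 and the line inside it marks the median.
q1, median, q3 = scores.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(q1, median, q3, iqr, lower_fence, upper_fence)
print(outliers)
```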

**According to the graph below, males perform better than females in math. In general, students who completed the test preparation course score higher on the math exam.**

In [None]:
sns.set()
plt.rcParams['figure.figsize'] = (16 , 8)

sns.boxplot(x = df['gender'] , y = df['math score'] , data = df , hue = df['test preparation course'] , palette="Set3" , linewidth = 2.5)
plt.xlabel('Gender' , fontsize = 15)
plt.ylabel('Math Score' , fontsize = 15)
plt.show()

<p id='14'><h3><b>Violin Plot</b></h3></p>

In [None]:
plt.rcParams['figure.figsize'] = (16 , 8)

sns.violinplot(x = df['gender'] , y = df['reading score'] , data = df , hue = df['test preparation course'] , palette="Set2" , linewidth = 2.5)
plt.xlabel('Gender' , fontsize = 15)
plt.ylabel('Reading Score' , fontsize = 15)
plt.legend(loc = 4)
plt.show()

<p id='15'><h3><b>Boxen Plot</b></h3></p>

In [None]:
sns.set()
plt.rcParams['figure.figsize'] = (16 , 8)

sns.boxenplot(x = df['gender'] , y = df['writing score'] , data = df , hue = df['test preparation course'] , palette="Set1")
plt.xlabel('Gender' , fontsize = 15)
plt.ylabel('Writing Score' , fontsize = 15)
plt.legend(loc = 4)
plt.show()

<p id='16'><h3><b>Swarm Plot</b></h3></p>

In [None]:
sns.set()
plt.rcParams['figure.figsize'] = (16 , 8)

sns.swarmplot(x = df['lunch'] , y = df['total score'] , data = df)
plt.xlabel('Lunch' , fontsize = 15)
plt.ylabel('Total Score' , fontsize = 15)
plt.show()

<p id='17'><h3><b>Strip Plot</b></h3></p>

**As this graph shows, students whose parents have a higher level of education tend to get better total scores than students whose parents have a lower level of education:**

In [None]:
sns.set_style("ticks")
plt.rcParams['figure.figsize'] =(16 , 6)

sns.stripplot(x = df['parental level of education']  , y = df['total score'] , data = df , marker = '*')
plt.xlabel('Parental Level of Education' , fontsize = 15)
plt.ylabel('Total Score' , fontsize = 15)
plt.show()
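
The comparison across education levels is easier to read when the categories follow a meaningful order rather than their order of appearance. A minimal sketch, assuming the six level names listed below match the values in the dataset (the four rows of `toy` are hypothetical):

```python
import pandas as pd

# Assumed ordering of the education levels, lowest to highest.
education_order = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]

# Hypothetical rows; column names follow the dataset used in this kernel.
toy = pd.DataFrame({
    'parental level of education': ["master's degree", "high school",
                                    "some college", "high school"],
    'total score': [240, 180, 200, 190],
})

# An ordered categorical makes sorting and grouping respect education_order.
toy['parental level of education'] = pd.Categorical(
    toy['parental level of education'],
    categories=education_order,
    ordered=True,
)
medians = toy.groupby('parental level of education', observed=True)['total score'].median()
print(medians)
```

The same list can be passed as `order = education_order` to `sns.stripplot` so that the x-axis follows this ordering.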

<p id='18'><h3><b>Pie Chart</b></h3></p>

In [None]:
plt.rcParams['figure.figsize'] = (8  , 8)
df['final student status'].value_counts().plot.pie(colors = ['orange' , 'green'] , explode = (0.1 , 0))
plt.title('Final Student Status', fontweight = 30, fontsize = 20)
plt.xlabel('')
plt.ylabel('')
plt.show()
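
The slices of the pie correspond to the share of each status. As a small sketch on a hypothetical series of four students, those shares can be computed directly with `value_counts(normalize = True)`:

```python
import pandas as pd

# Hypothetical statuses for four students.
status = pd.Series(['Passed', 'Passed', 'Passed', 'Failed'],
                   name='final student status')

# normalize=True returns fractions instead of counts; scale to percent.
shares = status.value_counts(normalize=True) * 100
print(shares)
```

To print the percentages on the chart itself, `autopct = '%1.1f%%'` can be added to the `plot.pie` call.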

<p id='19'><h3><b>Dist Plot</b></h3></p>

**Math , reading and writing scores distributions :**

In [None]:
sns.set()
plt.rcParams['figure.figsize'] = (20 , 6)

# distplot is deprecated in recent seaborn releases;
# histplot with kde = True draws a comparable histogram plus density curve
plt.subplot(1 , 3 , 1)
sns.histplot(df['math score'] , kde = True , stat = 'density' , color = 'red')
plt.title('Math' , fontsize = 20)
plt.xlabel('Math Score' , fontsize = 15)

plt.subplot(1 , 3 , 2)
sns.histplot(df['reading score'] , kde = True , stat = 'density' , color = 'red')
plt.title('Reading', fontsize = 20)
plt.xlabel('Reading Score' , fontsize = 15)

plt.subplot(1 , 3 , 3)
sns.histplot(df['writing score'] , kde = True , stat = 'density' , color = 'red')
plt.title('Writing', fontsize = 20)
plt.xlabel('Writing Score' , fontsize = 15)

plt.show()

<p>Last Updated: <b>03/09/2020</b></p>

<p id='20'><h2><b>Sources</b></h2></p>
<p><h4>https://seaborn.pydata.org</h4></p>
<p><h4>https://www.codecademy.com/articles/seaborn-design-i</h4></p>
<p><h4>https://www.kaggle.com/roshansharma/student-performance-analysis</h4></p>
<p><h4>https://www.kaggle.com/kralmachine/seaborn-tutorial-for-beginners</h4></p>

