<img src='https://thumbs.gfycat.com/JitteryDelectableFieldmouse-size_restricted.gif'>

# HOW STUDENTS ARE PERFORMING IN EXAMS

# CONTENTS

<ul>
    <li>Importing Libraries </li>
    <li>Reading data</li>
    <li>Data Cleaning </li>
    <li>Visualisations </li>
    <li>Relations between scores </li>
    <li> Analysis of data </li>
    <li>Conclusions </li>
</ul>

# **This dataset relates students' exam performance to demographic and socioeconomic information.**

# Importing required libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import scipy as sp
import re
import time
import matplotlib.pyplot as plt
import seaborn as sns
import os
import matplotlib as mpl
mpl.rcParams.update(mpl.rcParamsDefault)

# Reading the data

In [None]:
df=pd.read_csv("../input/students-performance-in-exams/StudentsPerformance.csv")

``Checking the shape of the dataset and its first 5 rows``

In [None]:
df.shape

``Dataset has 1000 rows and 8 columns``

In [None]:
df.head()

In [None]:
df

``It shows that this data has 1000 rows and 8 columns``

In [None]:
df.info()

In [None]:
df.describe()

``It shows that reading score has the highest mean of the three scores``

In [None]:
df.isnull().sum()

``Here we can see that there are no null values: each of the 8 columns reports a count of 0``
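`df.isnull().sum()` returns one missing-value count per column, not a single total. The sketch below shows how to read that output on a small made-up frame (the values are illustrative assumptions, not rows from the dataset):

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with two deliberately missing entries.
toy = pd.DataFrame(
    {
        "mathscore": [72, 69, np.nan, 47],
        "readingscore": [72, 90, 95, 57],
        "lunch": ["standard", None, "standard", "free/reduced"],
    }
)

nulls_per_column = toy.isnull().sum()      # one count per column
total_nulls = int(nulls_per_column.sum())  # a single number for the whole frame

print(nulls_per_column)
print("total missing values:", total_nulls)
```

The length of the result equals the number of columns, so a frame with 8 columns always produces 8 entries, whether or not any value is missing.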

In [None]:
df.columns=[c.replace(' ','') for c in df.columns]
df.columns=[c.replace('/','') for c in df.columns]
df.columns
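The two list comprehensions above remove spaces and slashes in separate passes. A minimal sketch of the same cleanup as one helper follows. The raw header names are an assumption inferred from the cleaned names used later in this notebook, and `clean_column` is a hypothetical helper name:

```python
# Raw header names are an assumption, inferred from the cleaned names
# (e.g. 'raceethnicity', 'mathscore') used later in the notebook.
raw_columns = [
    "gender",
    "race/ethnicity",
    "parental level of education",
    "lunch",
    "test preparation course",
    "math score",
    "reading score",
    "writing score",
]


def clean_column(name):
    """Remove spaces and slashes so the name also works as an attribute (df.mathscore)."""
    return name.replace(" ", "").replace("/", "")


cleaned_columns = [clean_column(c) for c in raw_columns]
print(cleaned_columns)
```

On a real frame the helper could be applied as `df.columns = [clean_column(c) for c in df.columns]`.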

# Visualisations

In [None]:
fig,ax = plt.subplots(1,1,figsize=(8,3))
group=df['raceethnicity'].value_counts().sort_values(ascending=False)
ax.bar(group.index,group.values)
for an in group.index:
    ax.annotate(group[an],xy=(an,group[an]+10),va='center',ha='center')
plt.show()

In [None]:
plt.figure(figsize=(4,3))
sns.set_style("dark")
sns.countplot(x="gender",data=df)
plt.xlabel("Gender")
plt.ylabel("Count")
plt.show()
print(df.gender.value_counts())


``female: 518, male: 482``

**The data contains more female students (518) than male students (482)**

In [None]:
sns.set_style("dark")
sns.countplot(x="raceethnicity",data=df)
plt.xlabel("Race/Ethnicity")
plt.ylabel("Counts")
plt.show()



In [None]:
# Pass column names as keywords; recent seaborn versions no longer accept the data positionally
sns.countplot(x='raceethnicity',hue='gender',data=df)
plt.show()

``Data shows 5 groups; group C is the largest and group A is the smallest``
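The grouped count plot can be cross-checked with a table. Below is a minimal sketch using `pd.crosstab` on made-up rows; the toy values are assumptions and only the column names follow this notebook:

```python
import pandas as pd

# Toy rows (made-up), using the cleaned column names from this notebook.
toy = pd.DataFrame(
    {
        "raceethnicity": ["group A", "group C", "group C", "group B", "group C", "group A"],
        "gender": ["female", "male", "female", "female", "male", "male"],
    }
)

# Rows: groups, columns: gender, cells: number of students.
counts = pd.crosstab(toy["raceethnicity"], toy["gender"])

# Total students per group, largest first.
group_sizes = counts.sum(axis=1).sort_values(ascending=False)

print(counts)
print(group_sizes)
```

Reading the largest and smallest group off `group_sizes` is less error-prone than judging bar heights by eye.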

In [None]:
sns.set_style("dark")
sns.countplot(y="parentallevelofeducation",data=df)
plt.ylabel("Parental level of education")
plt.xlabel("Counts")
plt.show()
print(df.parentallevelofeducation.value_counts())

``some college: 226, associate's degree: 222, high school: 196, some high school: 179, bachelor's degree: 118, master's degree: 59``

In [None]:
sns.set_style("dark")
plt.figure(figsize=(4,2))
sns.countplot(y="lunch",data=df)
plt.ylabel("lunch")
plt.xlabel("Counts")
plt.show()

``More students have the standard lunch than the free/reduced lunch``

In [None]:
plt.figure(figsize=(4,2))
sns.set_style("darkgrid")
sns.countplot(y="testpreparationcourse",data=df)
plt.ylabel("Test preparation course")
plt.xlabel("Counts")
plt.show()
print(df.testpreparationcourse.value_counts())

``642 students did not take the test preparation course and 358 completed it``

# Let's find out the relation between test scores

In [None]:
sns.set_style("darkgrid")
plt.title("Maths score vs Reading score")
plt.xlabel('Maths score',size=20)
plt.ylabel('Reading score',size=20)
sns.scatterplot(x='mathscore',y='readingscore',data=df,hue='gender',hue_order=['male','female'])
plt.show()

In [None]:
sns.set_style("darkgrid")
plt.title("Reading score vs Writing score")
plt.ylabel('Writing score',size=20)
plt.xlabel('Reading score',size=20)
sns.scatterplot(y='writingscore',x='readingscore',data=df,hue='gender',hue_order=['male','female'])
plt.show()

In [None]:
sns.set_style("darkgrid")
plt.title("Maths score vs writing score")
plt.xlabel('Maths score',size=20)
plt.ylabel('Writing score',size=20)
sns.scatterplot(x='mathscore',y='writingscore',data=df,hue='gender',hue_order=['male','female'])
plt.show()

``Maths scores plotted against reading and writing scores are somewhat spread out but follow a clear upward trend, so a student who scores higher in maths generally scores higher in the other subjects too. Reading and writing scores follow a tighter, more linear relationship.``

In [None]:
total_marks=((df['mathscore']+df['readingscore']+df['writingscore'])/300)*100
df['total_marks']=total_marks
kde_data=df[['mathscore','readingscore', 'writingscore','total_marks']]
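A worked example of the `total_marks` formula on made-up scores (the values are illustrative assumptions). Because each exam is out of 100, the percentage of 300 equals the mean of the three scores:

```python
import pandas as pd

# Toy scores (made-up), each out of 100.
toy = pd.DataFrame(
    {
        "mathscore": [72, 90, 47],
        "readingscore": [72, 95, 57],
        "writingscore": [74, 93, 44],
    }
)

score_columns = ["mathscore", "readingscore", "writingscore"]

# Same formula as above: sum of the three scores as a percentage of 300.
toy["total_marks"] = toy[score_columns].sum(axis=1) / 300 * 100

# Because every exam has the same maximum (100), this equals the row mean.
row_mean = toy[score_columns].mean(axis=1)

print(toy)
```

For the first row, (72 + 72 + 74) / 300 * 100 is about 72.67, which is also the average of the three scores.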

# Now let's find out how these features impact marks


In [None]:
sns.set_style("darkgrid")
# `fill` replaces the deprecated `shade` argument in recent seaborn versions
sns.kdeplot(data=kde_data,fill=True)
plt.show()

``Writing score is less and total score is more``

In [None]:
sns.catplot(x='raceethnicity',y='total_marks',data=df,hue='testpreparationcourse')
plt.show()
print(df.testpreparationcourse.value_counts())

``Group C has prepared the most for the test and group D has prepared the least``
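Raw counts in a plot like the one above depend on how many students each group has, so a larger group can look better prepared simply because it is larger. The sketch below, on made-up rows, compares the count with the share of each group that completed the course. The labels `completed` and `none` are assumed category values:

```python
import pandas as pd

# Toy rows (made-up); 'completed' / 'none' are assumed category labels.
toy = pd.DataFrame(
    {
        "raceethnicity": ["group C"] * 6 + ["group D"] * 2,
        "testpreparationcourse": [
            "completed", "completed", "none", "none", "none", "none",  # group C: 2 of 6
            "completed", "none",                                       # group D: 1 of 2
        ],
    }
)

completed = toy["testpreparationcourse"].eq("completed")

# Raw count of students who completed the course, per group.
completed_count = completed.groupby(toy["raceethnicity"]).sum()

# Share of each group that completed the course (mean of a boolean column).
completed_share = completed.groupby(toy["raceethnicity"]).mean()

print(completed_count)
print(completed_share)
```

In this toy data group C has the higher count but the lower share, which is why the share is usually the fairer comparison between groups of different sizes.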

In [None]:
sns.catplot(y='parentallevelofeducation',x='total_marks',data=df,kind='bar')
plt.title('Parental level of education vs total marks')
plt.show()

In [None]:
sns.catplot(x='lunch',y='total_marks',data=df,hue='lunch',palette='cubehelix')
plt.title('Lunch vs Total marks')
plt.show()
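The strip plot shows individual students. To compare the two lunch types by their typical total marks, a grouped summary is a useful companion. A minimal sketch on made-up values (only the column names follow this notebook):

```python
import pandas as pd

# Toy rows (made-up values).
toy = pd.DataFrame(
    {
        "lunch": ["standard", "standard", "free/reduced", "free/reduced"],
        "total_marks": [80.0, 70.0, 60.0, 50.0],
    }
)

# Number of students, mean and median total marks per lunch type.
summary = toy.groupby("lunch")["total_marks"].agg(["count", "mean", "median"])
print(summary)
```

Run on the real frame, the same `groupby` call would give the per-lunch averages that the plot only suggests visually.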

In [None]:
numeric_data=df[['mathscore','readingscore','writingscore','total_marks']]
fig,ax=plt.subplots(2,2,figsize=(10,8))
# Iterating over a DataFrame yields its column names; one histogram per column
for i,idx in enumerate(numeric_data):
    sns.histplot(ax=ax[i%2,i//2],data=numeric_data[idx],kde=True)
plt.show()
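The index expression `ax[i%2,i//2]` decides which panel of the 2x2 grid each column lands in. The short sketch below prints that mapping without drawing anything:

```python
columns = ["mathscore", "readingscore", "writingscore", "total_marks"]

# (row, col) position used by ax[i % 2, i // 2] for each column index i.
positions = {name: (i % 2, i // 2) for i, name in enumerate(columns)}

for name, (row, col) in positions.items():
    print(f"{name:>13} -> row {row}, col {col}")
```

The grid is filled column by column, so reading score sits below maths score rather than beside it. Swapping the two expressions to `ax[i//2,i%2]` would fill it row by row instead.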

In [None]:
readingdata=df[['gender','raceethnicity','parentallevelofeducation','lunch','testpreparationcourse']]
sns.catplot(x='gender',y='total_marks',data=df)
plt.title('Gender vs Total marks')
plt.show()

In [None]:
plt.figure(figsize=(4,3))
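# Correlation is only defined for numeric columns; newer pandas raises on the text columns otherwise
sns.heatmap(df.select_dtypes('number').corr(),annot=True)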
plt.show()

``Students who score high in maths also tend to score high in reading and writing <br>
Reading and writing scores are highly correlated <br>
Since all three scores are positively correlated, improving in one subject tends to go along with higher scores in the others``
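Each cell of the heatmap is a Pearson correlation coefficient, which is the default method of `DataFrame.corr`. Below is a dependency-free sketch of that formula on made-up scores. `pearson` is a hypothetical helper written for illustration, not the pandas implementation:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy scores (made-up values, not taken from the dataset).
reading = [72, 90, 95, 57, 78]
writing = [74, 88, 93, 44, 75]

r = pearson(reading, writing)
print(round(r, 3))
```

A value close to 1 means the two scores rise together almost linearly. It describes association only and does not say that one score causes the other.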

# Conclusions made from the data

# ``Students with standard lunch have a higher average total score`` <br/>
# ``Test preparation depends on parental level of education`` <br/>
# ``Test preparation is lowest in group D and highest in group C``

# THIS KERNEL ENDS HERE 

**Guys, if you like this kernel, please show some respect**

<img src="https://contenthub-static.grammarly.com/blog/wp-content/uploads/2017/10/thank-you.jpg">