# Introduction
The main aim of this kernel is to analyse how the scores are affected by different variables, including gender, race/ethnicity, lunch type, and test preparation course.

Each column is examined in turn to see how it affects the scores. For easy understanding I have used graphs and plots.
After all, visualisation is one of the best ways to understand data.


![PIC](https://showmeinstitute.org/sites/default/files/pros-cons-of-standardized-tests-860x420.jpg)

# Update Log

### V8
* Improving aesthetics of the notebook
* Adding comments and more visualizations

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
plt.style.use('ggplot')

In [None]:
df=pd.read_csv('../input/StudentsPerformance.csv')

In [None]:
df['Total score']=df['math score']+df['reading score']+df['writing score']

In [None]:
df.head(8).T

In [None]:
df.info()

From the above output we can see that there are 1000 non-null entries in each column.
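
Besides `df.info()`, the row count and the number of missing values can be checked directly. The following is only a minimal sketch: `toy` is a made-up frame (not the real dataset) with one deliberately missing value, so that the check has something to report.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value; not taken from the dataset.
toy = pd.DataFrame({
    'gender': ['female', 'male', 'female'],
    'math score': [72, np.nan, 90],
})

# Number of rows and columns.
print(toy.shape)

# Missing values per column; all zeros would mean every column is complete.
missing = toy.isna().sum()
print(missing)
```

Running the same two calls on `df` should confirm what `df.info()` reports.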

In [None]:
# Converting score from int --> float
df['math score']=pd.to_numeric(df['math score'],downcast='float')

In [None]:
# Printing the average score for each section
print("Average math score is    : {}".format(np.mean(df['math score'])))
print("Average reading score is : {}".format(np.mean(df['reading score'])))
print("Average writing score is : {}".format(np.mean(df['writing score'])))
# 'Total score' is the sum of the three sections, so dividing by 3 gives the average per section
print("Average total score is   : {}".format(np.mean(df['Total score'])/3))

* Students score lower on average in math than in reading and writing
* The best average performance is in the reading section
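
The same comparison can be done in one call instead of separate `print` statements. This is a small sketch on a made-up frame `toy` that only borrows the column names; the numbers are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration.
toy = pd.DataFrame({
    'math score':    [60, 70, 80],
    'reading score': [65, 75, 85],
    'writing score': [62, 72, 82],
})

score_cols = ['math score', 'reading score', 'writing score']

# Column-wise means, sorted so the weakest section comes first.
section_means = toy[score_cols].mean().sort_values()
print(section_means)

# Average per-section score: mean of the row totals divided by the number of sections.
overall = toy[score_cols].sum(axis=1).mean() / len(score_cols)
print("Average per-section score:", overall)
```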

In [None]:
# # Used for setting the figures in the middle
# from IPython.core.display import HTML
# HTML("""
# <style>
# .output_png {
#     display: table-cell;
#     text-align: center;
#     vertical-align: middle;
# }
# </style>
# """)

### NOTE: 
The plots shown below are mainly meant to show you the different parameters that can be passed to plotting functions.

Most professional work follows uniform and simple aesthetics.


<img src="https://media1.tenor.com/images/2a077aec57e04dc42bdb8233261a5fb7/tenor.gif?itemid=12042935">

In [None]:
plt.rcParams['axes.facecolor'] = "#b3ffff"
plt.rcParams['figure.facecolor'] ="#b3ffff"
plt.figure(figsize=(18,8))
# One row, three columns: one violin plot per subject
plt.subplot(1, 3, 1)
plt.title('MATH SCORES')
sns.violinplot(y='math score',data=df,color='red',linewidth=3)
plt.subplot(1, 3, 2)
plt.title('READING SCORES')
sns.violinplot(y='reading score',data=df,color='green',linewidth=3)
plt.subplot(1, 3, 3)
plt.title('WRITING SCORES')
sns.violinplot(y='writing score',data=df,color='blue',linewidth=3)
plt.show()

From the above three plots it is clearly visible that most students score between 60 and 80 in math, whereas in reading and writing most of them score between 50 and 80.

Some fancy arguments that can be passed through `**kwargs`:

    * 'hatch' : {'/', '\\', '|', '-', '+', 'x', 'o', 'O', '.', '*'}
    * 'linestyle' : {'-', '--', '-.', ':', '', (offset, on-off-seq), ...}
    * 'alpha' : {0.0-1.0}

Below I will change each value and show how things change.
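
To see these arguments in isolation, here is a minimal sketch that uses plain matplotlib bars with made-up heights. It assumes that seaborn hands extra keyword arguments on to the underlying matplotlib bar patches, which is why the same names work in `sns.barplot`; `labels`, `heights` and `styles` are names invented for this example.

```python
import matplotlib.pyplot as plt

# Made-up bar heights, only to show the styling options.
labels = ['a', 'b', 'c']
heights = [3, 5, 4]

fig, axes = plt.subplots(1, 3, figsize=(9, 3))

# Each panel uses a different hatch, transparency and edge line style.
styles = [
    {'hatch': '/', 'alpha': 0.6},
    {'hatch': 'x', 'alpha': 0.8, 'linestyle': '--'},
    {'hatch': '.', 'alpha': 1.0, 'linestyle': ':'},
]
for ax, style in zip(axes, styles):
    ax.bar(labels, heights, edgecolor='black', linewidth=2, **style)
    ax.set_title("hatch = '{}'".format(style['hatch']))
fig.tight_layout()
```

Note that the hatch pattern is drawn in the edge colour, so setting `edgecolor` helps it show up.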

In [None]:
plt.rcParams['figure.facecolor'] = "#e6ecff"
plt.rcParams['axes.facecolor'] = "#e6ecff"
plt.figure(figsize=(14,8))
plt.subplot(1, 3, 1)
sns.barplot(x='test preparation course',y='math score',data=df,hue='gender',palette='seismic',**{'hatch':'*','alpha':0.6,'linewidth':2})
plt.title('MATH SCORES')
plt.subplot(1, 3, 2)
sns.barplot(x='test preparation course',y='reading score',data=df,hue='gender',palette='seismic',**{'hatch':'.','alpha':0.8,'linewidth':2})
plt.title('READING SCORES')
plt.subplot(1, 3, 3)
sns.barplot(x='test preparation course',y='writing score',data=df,hue='gender',palette='seismic',**{'hatch':'x','linewidth':2})
plt.title('WRITING SCORES')
plt.show()

* From the first plot we can see that boys score better in math irrespective of whether they completed the course or not.
* From the next two plots we can see that girls perform better in reading and writing.
* From all three graphs it's clear that students who completed the course score higher on average.
* `alpha` increases in steps of 0.2 across the three plots: 0.6, 0.8, and finally 1.0 (the third plot does not pass `alpha`, so the bars are fully opaque).
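
The bar heights in these plots are group means (the default estimator of `sns.barplot`), so the same numbers can be read off a `groupby`. A small sketch on invented rows; `toy`, `means` and `gain` are names made up for this example and the values are not from the dataset.

```python
import pandas as pd

# Hypothetical rows with the dataset's column names; values are made up.
toy = pd.DataFrame({
    'gender': ['female', 'female', 'male', 'male'],
    'test preparation course': ['none', 'completed', 'none', 'completed'],
    'math score': [60, 70, 66, 74],
    'reading score': [72, 80, 62, 70],
})

# Mean score for every (course, gender) combination.
means = (toy
         .groupby(['test preparation course', 'gender'])[['math score', 'reading score']]
         .mean())
print(means)

# Difference between 'completed' and 'none', per gender.
gain = means.loc['completed'] - means.loc['none']
print(gain)
```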

In [None]:
plt.rcParams['axes.facecolor'] = "#ffe5e5"
plt.rcParams['figure.facecolor'] = "#ffe5e5"
sns.pairplot(data=df,hue='gender',plot_kws={'alpha':0.3},palette='hot_r')

From the above plot it is clear that all the scores are positively and roughly linearly related to each other.

In [None]:
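# Correlate only the numeric columns; recent pandas versions raise an error on text columns
corr = df.select_dtypes(include='number').corr()
# Boolean mask that hides the upper triangle (and the diagonal)
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    f, ax = plt.subplots(figsize=(8, 8))
    ax = sns.heatmap(corr,mask=mask,square=True,linewidths=.8,cmap="autumn",annot=True)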

Points noted from the above heatmap:
* High correlations between `Total score` and the individual scores
* `writing score` and `reading score` are also highly correlated, which tells us that a student who reads well usually also writes well.
* `math score` is less strongly correlated with the other two, so a student who performs well in math does not necessarily perform well in the other subjects, or vice versa
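
To read the pairwise correlations as a ranked list rather than off a colour scale, the lower triangle of the correlation matrix can be flattened. This is only a sketch on invented scores; `toy` and `pairs` are names made up here, and the mask is built with `np.triu`, which is an equivalent alternative to filling `np.triu_indices_from`.

```python
import numpy as np
import pandas as pd

# Made-up scores; only the column names mirror the dataset.
toy = pd.DataFrame({
    'gender': ['female', 'male', 'female', 'male', 'female'],
    'math score': [50, 60, 70, 80, 90],
    'reading score': [55, 58, 72, 79, 95],
    'writing score': [54, 57, 74, 78, 96],
})

# Keep numeric columns only; text columns cannot be correlated.
corr = toy.select_dtypes(include='number').corr()

# Boolean mask that is True on and above the diagonal,
# so only the lower triangle (each pair once) remains.
mask = np.triu(np.ones_like(corr, dtype=bool))

# Flatten the remaining pairs and list the strongest first.
pairs = corr.where(~mask).stack().dropna().sort_values(ascending=False)
print(pairs)
```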

Names that can be passed to `palette`; each palette also has a reversed version with the suffix `_r`:

        Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, 
        BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, 
        Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, 
        Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, 
        PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, 
        Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, 
        PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples,
        Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, 
        RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, 
        Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, 
        Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, 
        YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, 
        afmhot_r, autumn, autumn_r, binary, binary_r, bone, 
        bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r,
        cool, cool_r, coolwarm, coolwarm_r, copper, copper_r,
        cubehelix, cubehelix_r, flag, flag_r, gist_earth, 
        gist_earth_r, gist_gray, gist_gray_r, gist_heat,
        gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow,
        gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, 
        gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r,
        gray, gray_r, hot, hot_r, hsv, hsv_r, icefire, icefire_r, 
        inferno, inferno_r, jet, jet_r, magma, magma_r, mako, 
        mako_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, 
        pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, 
        rainbow_r, rocket, rocket_r, seismic, seismic_r, spring,
        spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, 
        tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, 
        viridis, viridis_r, vlag, vlag_r, winter, winter_r
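
What the `_r` suffix does can be checked directly on a colormap. A minimal sketch using `viridis`, which ships with matplotlib (some names in the list, such as `rocket` or `mako`, are added by seaborn and, as far as I know, are only available after seaborn has been imported):

```python
import matplotlib.pyplot as plt

# Look up a colormap and its reversed twin by name.
cmap = plt.get_cmap('viridis')
cmap_r = plt.get_cmap('viridis_r')

# The low end of one is the high end of the other.
print(cmap(0.0))    # RGBA at the start of 'viridis'
print(cmap_r(1.0))  # RGBA at the end of 'viridis_r' (the same colour)
```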


In [None]:
print("Unique Lunch types :",df['lunch'].unique())

Now let's see how lunch affects the scores.

In [None]:
plt.rcParams['axes.facecolor'] = "#ccffda"
plt.rcParams['figure.facecolor'] = "#ccffda"

plt.figure(figsize=(14,8))
plt.subplot(1, 3, 1)
sns.barplot(x='test preparation course',y='math score',data=df,hue='lunch',palette='viridis',edgecolor='black',**{'hatch':'/','linewidth':2})
plt.title('MATH SCORES')
plt.subplot(1, 3, 2)
sns.barplot(x='test preparation course',y='reading score',data=df,hue='lunch',palette='viridis',edgecolor='black',**{'hatch':"|",'linewidth':2,'linestyle':':'})
plt.title('READING SCORES')
plt.subplot(1, 3, 3)
sns.barplot(x='test preparation course',y='writing score',data=df,hue='lunch',palette='viridis',edgecolor='black',**{'hatch':'-','linewidth':2,'linestyle':'--'})
plt.title('WRITING SCORES')
plt.show()

In all cases, students who have the standard lunch score higher on average.

### **Checking out the toppers.**

In [None]:
df[(df['math score'] > 90) & (df['reading score'] > 90) & (df['writing score']>90)]\
.sort_values(by=['Total score'],ascending=False)

The first two toppers did not take the test preparation course (their value is **none**), so they are either **geniuses** or there was some **malpractice**.

In [None]:
plt.rcParams['figure.facecolor'] = "#ffffe6"

plt.rcParams['axes.facecolor'] = "#ffcccc"
plt.figure(figsize=(14,8))
plt.subplot(1, 3, 1)
plt.title('MATH SCORES')
sns.barplot(x='race/ethnicity',y='math score',data=df,hue='gender',palette='Reds_r',edgecolor='#ff0000',**{'alpha':0.8,'linewidth':2})

plt.rcParams['axes.facecolor'] = "#ccffcc"
plt.subplot(1, 3, 2)
plt.title('READING SCORES')
sns.barplot(x='race/ethnicity',y='reading score',data=df,hue='gender',palette='Greens_r',edgecolor='#00ff00',**{'alpha':0.8,'linewidth':2})

plt.rcParams['axes.facecolor'] = "#e6e6ff"
plt.subplot(1, 3, 3)
plt.title('WRITING SCORES')
sns.barplot(x='race/ethnicity',y='writing score',data=df,hue='gender',palette='Blues_r',edgecolor='#0000ff',**{'alpha':0.8,'linewidth':2})
plt.show()

The above plots show how the average scores vary across the race/ethnicity groups, split by gender.

In [None]:
plt.rcParams['figure.facecolor'] = "#ffffe6"
plt.rcParams['axes.facecolor'] = "#ffffe6"
plt.figure(figsize=(12,6))
plt.title('PARENTS LEVEL OF EDUCATION')
sns.countplot(x='parental level of education',data=df,palette='inferno')
plt.tight_layout()

The above plot shows that most of the parents went to some college or have an associate's degree, and very few went on to higher studies.

In [None]:
plt.rcParams['figure.facecolor'] = "#ffe6f9"
plt.rcParams['axes.facecolor'] = "#ffe6f9"
plt.figure(figsize=(12,6))
plt.title('PARENTS LEVEL OF EDUCATION vs CHILDREN\'s TOTAL SCORE')
sns.barplot(x='parental level of education',y='Total score',data=df,palette='magma')
plt.tight_layout()

From the above plot it's clear that **if the parental education is better their children tend to score better overall** (the total of math, reading, and writing).
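
This trend is easier to read when the education levels appear in a meaningful order rather than in the order they happen to occur in the data. A sketch under assumptions: `edu_order` is my own guess at a lowest-to-highest ranking (for example, whether "some college" sits below "associate's degree" is debatable), and `toy` holds invented scores.

```python
import pandas as pd

# Assumed ordering of the education levels, lowest to highest.
edu_order = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree",
]

# Hypothetical rows; the scores are invented.
toy = pd.DataFrame({
    'parental level of education': ["master's degree", 'high school',
                                    'some college', 'high school'],
    'Total score': [230, 180, 205, 190],
})

# An ordered categorical makes groupby results follow edu_order.
toy['parental level of education'] = pd.Categorical(
    toy['parental level of education'], categories=edu_order, ordered=True)

by_edu = (toy.groupby('parental level of education', observed=True)['Total score']
             .mean())
print(by_edu)
```

For the plots themselves, the same list can be passed as `order=edu_order` to `sns.barplot` or `sns.countplot`.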

## If you like this notebook an upvote will be appreciated.
## Thank you