# Defining Student Grades in Portugal

## Objective:

In this project, students' academic performance is predicted from social, financial and other factors.


### Variables

1.	school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2.	sex - student's sex (binary: 'F' - female or 'M' - male) 
3.	age - student's age (numeric: from 15 to 22) 
4.	address - student's home address type (binary: 'U' - urban or 'R' - rural) 
5.	famsize - family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3) 
6.	Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7.	Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
8.	Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
9.	Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') 
10.	Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') 
11.	reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other') 
12.	guardian - student's guardian (nominal: 'mother', 'father' or 'other') 
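13.	traveltime - home to school travel time (numeric: 1 - less than 15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - more than 1 hour)
14.	studytime - weekly study time (numeric: 1 - less than 2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - more than 10 hours)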
15.	failures - number of past class failures (numeric: n if 1<=n<3, else 4) 
16.	schoolsup - extra educational support (binary: yes or no) 
17.	famsup - family educational support (binary: yes or no) 
18.	paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) 
19.	activities - extra-curricular activities (binary: yes or no) 
20.	nursery - attended nursery school (binary: yes or no) 
21.	higher - wants to take higher education (binary: yes or no) 
22.	internet - Internet access at home (binary: yes or no) 
23.	romantic - with a romantic relationship (binary: yes or no) 
24.	famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent) 
25.	freetime - free time after school (numeric: from 1 - very low to 5 - very high) 
26.	goout - going out with friends (numeric: from 1 - very low to 5 - very high) 
27.	Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high) 
28.	Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) 
29.	health - current health status (numeric: from 1 - very bad to 5 - very good) 
30.	absences - number of school absences (numeric: from 0 to 93) 

These grades are related to the course subject, Math or Portuguese:

1.	G1 - first period grade (numeric: from 0 to 20) 
2.	G2 - second period grade (numeric: from 0 to 20) 
3.	G3 - final grade (numeric: from 0 to 20, output target) 


## 1. Downloading data and libraries

In [None]:
## Importing libraries
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt  # pyplot is the plotting interface, not the top-level matplotlib package

In [None]:
# Importing the data
df = pd.read_csv('../input/student-alcohol-consumption/student-mat.csv')
df.head()

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
## Select the variables we need and rename them for the sake of simplicity.
# .copy() makes the subset independent, so later in-place changes do not trigger SettingWithCopyWarning
df = df[['age', 'sex', 'address', 'famsize', 'Medu','Fedu',
         'studytime','famsup', 'activities', 'romantic', 'absences', 'G3']].copy()
df.columns = ['age', 'sex', 'addr', 'family_size', 'm_edu', 'f_edu',
              'study_time', 'fam_sup', 'activities', 'romantic_rel', 'absences', 'final_grade' ]
df.head()
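
Assigning to `df.columns` only works if the new names are listed in exactly the same order as the selected columns. A minimal sketch of an alternative, assuming a small made-up frame (`toy` and its values are hypothetical, not the real dataset): rename with an explicit old-to-new mapping so each pair is visible.

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of student-mat.csv
toy = pd.DataFrame({
    'age': [18, 17, 15],
    'Medu': [4, 1, 1],
    'famsup': ['no', 'yes', 'no'],
    'G3': [6, 6, 10],
    'school': ['GP', 'GP', 'MS'],
})

# Old name -> new name; columns that are not listed keep their names
rename_map = {'Medu': 'm_edu', 'famsup': 'fam_sup', 'G3': 'final_grade'}

# Select the columns of interest, copy them, then rename by mapping
subset = toy[['age', 'Medu', 'famsup', 'G3']].copy().rename(columns=rename_map)
print(list(subset.columns))
```

With a mapping, reordering the selected columns cannot silently attach a name to the wrong column.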

## 2. Data Visualisation

In [None]:
df01 = df[['absences','final_grade','romantic_rel']]
sb.pairplot(df01, hue='romantic_rel', height=2.5);
## As the pair plots below show, students with no romantic relationship and fewer absences tend to have higher grades.

In [None]:
df02 = df[['absences','final_grade','activities']]
sb.pairplot(df02, hue='activities', height=2.5)

In [None]:
df03 = df[['absences','final_grade','sex']]
sb.pairplot(df03, hue='sex', height=2.5)
# It seems females have more absences.

In [None]:
df04 = df[['absences','final_grade','addr']]
sb.pairplot(df04, hue='addr', height=2.5)
# It looks like students in urban areas have more absences but better grades.

In [None]:
# Recent seaborn versions require x and y to be passed as keyword arguments
sb.jointplot(x="absences", y="final_grade", data=df, kind='reg')
# As seen in the graph below, there is only a weak correlation between absences and final grade.
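
The strength of the relationship in a regression plot can be put into a number with Pearson's r. The following is a minimal sketch on synthetic values (the arrays are made up for illustration and are not taken from the dataset); `pearson_r` is a hypothetical helper name.

```python
import numpy as np

# Synthetic values for illustration only; not taken from the dataset
absences = np.array([0, 2, 4, 6, 10, 14, 20, 30], dtype=float)
final_grade = np.array([11, 15, 10, 13, 9, 12, 8, 11], dtype=float)

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    denom = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return (x_centered * y_centered).sum() / denom

r = pearson_r(absences, final_grade)
print(round(r, 3))
```

On the real data, `df['absences'].corr(df['final_grade'])` should give the same quantity directly.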

## 3. Exploratory Data Analysis

In [None]:
# Encode the binary categorical variables as 0/1. ML encoding utilities could be used here too; a manual mapping is just one way to do it.
df.replace(to_replace = {
    'sex': {'F':0, 'M':1},
    'addr':{'U':1, 'R':0},
    'family_size': {'LE3':0, 'GT3':1},
    'fam_sup': {'no':0, 'yes':1},
    'activities': {'no':0, 'yes':1},
    'romantic_rel': {'no':0, 'yes':1}}, inplace = True)
df.head()
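
As an alternative to a hand-written mapping, pandas can generate the 0/1 columns itself. This is a sketch on a made-up frame (`toy` is hypothetical), not the method used in this notebook; note that `get_dummies` renames the encoded columns.

```python
import pandas as pd

# Hypothetical rows; the real notebook would pass its own df here
toy = pd.DataFrame({
    'age': [18, 17, 15],
    'sex': ['F', 'M', 'F'],
    'addr': ['U', 'R', 'U'],
})

# One 0/1 indicator column per category; drop_first=True removes the
# alphabetically first category of each variable to avoid redundant columns
encoded = pd.get_dummies(toy, columns=['sex', 'addr'], drop_first=True, dtype=int)
print(encoded)
```

For these two variables the result matches the manual mapping above: `sex_M` is 1 for male students and `addr_U` is 1 for urban addresses.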

In [None]:
df.corr()

As shown above, the Pearson correlation results indicate no strong correlation between the variables.
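
A full correlation matrix can be hard to scan. A minimal sketch, assuming a small made-up frame (`toy` is hypothetical), that keeps only each variable's correlation with the target and orders them by absolute strength:

```python
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'study_time': [1, 2, 2, 3, 4, 1],
    'absences': [10, 4, 6, 2, 0, 12],
    'age': [15, 16, 17, 18, 16, 17],
    'final_grade': [8, 11, 10, 14, 16, 7],
})

# Take the target's column of the Pearson matrix, drop the trivial
# self-correlation, and sort by magnitude regardless of sign
corr_with_target = (
    toy.corr()['final_grade']
       .drop('final_grade')
       .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_target)
```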

## 4. Multiple Linear Regression

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p + e$$
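
To make the equation concrete, the sketch below builds noise-free synthetic data from known coefficients and recovers them by least squares with NumPy. All values and names (`true_beta`, `design`) are illustrative assumptions, not part of the notebook's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two synthetic predictors X1 and X2
X = rng.normal(size=(n, 2))

# Assumed coefficients: beta_0 (intercept), beta_1, beta_2
true_beta = np.array([3.0, 1.5, -2.0])

# A leading column of ones lets beta_0 act as the intercept
design = np.column_stack([np.ones(n), X])

# Noise-free response, i.e. e = 0 for every observation
y = design @ true_beta

# Ordinary least squares: minimise ||y - design @ beta||^2
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_hat)
```

Because no noise was added, the estimated coefficients match the assumed ones up to floating-point error.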

In [None]:
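## Linear regression with statsmodels
import statsmodels.api as sm

# Predictors (already encoded as numbers) and the target
X = df[['age', 'sex', 'addr', 'family_size', 'm_edu', 'f_edu',
              'study_time', 'fam_sup', 'activities', 'romantic_rel', 'absences']]
y = df['final_grade']

# Note: sm.OLS does not add an intercept on its own, so this model is fitted without β0.
# Passing sm.add_constant(X) instead of X would include it.
model = sm.OLS(y, X).fit()
predictions = model.predict(X)

model.summary()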

## Discussion of Results

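First, we look at R², which is reported as 0.85 out of a maximum of 1. Because the model was fitted without an intercept (no constant column was added to `X`), statsmodels reports the uncentered R², which is usually much higher than the ordinary centered value, so this figure should be read with caution. Next, let's pay attention to the t-statistics. Age, sex, address, mother's education and study time have positive, statistically significant coefficients, whereas a romantic relationship is negatively associated with academic performance.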
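
The model above is fitted without an intercept, which changes how R² is computed. The sketch below uses synthetic data (a predictor that is unrelated to the target by construction) to show how the uncentered R² of a no-intercept fit can look high, while the centered R² of a fit with an intercept stays near zero. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# An age-like predictor and a grade-like target that do not depend on each other
x = rng.uniform(15, 22, size=n)
y = 10 + rng.normal(0, 3, size=n)

# Fit without an intercept: y ~ b * x
X0 = x.reshape(-1, 1)
b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
resid0 = y - X0 @ b0
# Uncentered R^2 compares residuals with the raw sum of squares of y
r2_uncentered = 1 - (resid0 ** 2).sum() / (y ** 2).sum()

# Fit with an intercept: y ~ b0 + b1 * x
X1 = np.column_stack([np.ones(n), x])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid1 = y - X1 @ b1
# Centered R^2 compares residuals with the variation of y around its mean
r2_centered = 1 - (resid1 ** 2).sum() / ((y - y.mean()) ** 2).sum()

print(round(r2_uncentered, 3), round(r2_centered, 3))
```

The no-intercept fit gets credit for reproducing the mean level of `y`, which is why its R² is not comparable with the centered value.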