### Student Performance Prediction

The Student Performance Prediction app predicts a student's grade by analysing that student's academic and non-academic information.

 1. <b>train.csv</b> - Put the training data in this CSV file
 2. <b>pred.csv</b> - Put the data to run predictions on in this CSV file
 3. <b>output.csv</b> - The predicted output is written to this file, in the column <b>y_pred</b>

 4. <b>TARGET_VARIABLE</b> - Set this variable to the name of the feature/column to be predicted
 5. <b>NUMERICAL_FEATURES</b> - List all features/columns that are of numeric type

<b>Note:</b> If required, update the variables under the <b>Variables</b> section to configure the inputs before running the notebook.

To run the notebook cell by cell, click on a cell and then click the <b>Run</b> button below the <b>Menu</b> bar. To run all cells, select <b>Cell --> Run All</b> from the Menu bar.

#### Import the library used to read files and load them into a dataframe

In [1]:
import pandas as pd

#### Import the module that runs the app

In [2]:
import model

#### Variables

In [3]:
TARGET_VARIABLE='G3' #variable to be predicted
NUMERICAL_FEATURES=["age", "G1", "G2"] #features to be interpreted as numeric

Default values

In [4]:
INPUT_TRAIN_FILE="train.csv"
INPUT_TEST_FILE="pred.csv"
OUTPUT_PRED_FILE="output.csv"
OUTPUT_PRED="y_pred"
MODEL_FILE="model"
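
Before training, it can help to confirm that the configured names actually appear in the training file's header. The following is a minimal sketch of such a check; `check_config` is a hypothetical helper and not part of the `model` module, and the column list is an abbreviated copy of the example data's header.

```python
def check_config(columns, target, numerical_features):
    """Return a list of problems; an empty list means every name was found."""
    columns = list(columns)
    problems = []
    if target not in columns:
        problems.append(f"target column {target!r} not found")
    for name in numerical_features:
        if name not in columns:
            problems.append(f"numerical feature {name!r} not found")
    return problems


# Abbreviated header of the example data (hypothetical subset of the real columns).
example_columns = ["school", "sex", "age", "absences", "G1", "G2", "G3"]

# The notebook's default configuration passes the check.
print(check_config(example_columns, "G3", ["age", "G1", "G2"]))

# A misspelled target and feature are both reported.
print(check_config(example_columns, "G4", ["age", "Gx"]))
```

With the real file, the column names could be obtained with `pd.read_csv(INPUT_TRAIN_FILE, nrows=0).columns`, which reads only the header row.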

#### Description of the example data
Attributes of the train.csv and pred.csv (Portuguese language course) datasets:

1. school - student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira)
2. sex - student's sex (binary: 'F' - female or 'M' - male)
3. age - student's age (numeric: from 15 to 22)
4. address - student's home address type (binary: 'U' - urban or 'R' - rural)
5. famsize - family size (binary: 'LE3' - less than or equal to 3 or 'GT3' - greater than 3)
6. Pstatus - parent's cohabitation status (binary: 'T' - living together or 'A' - apart)
7. Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
8. Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
9. Mjob - mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
10. Fjob - father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other')
11. reason - reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other')
12. guardian - student's guardian (nominal: 'mother', 'father' or 'other')
13. traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
14. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
15. failures - number of past class failures (numeric: n if 1<=n<3, else 4)
16. schoolsup - extra educational support (binary: yes or no)
17. famsup - family educational support (binary: yes or no)
18. paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
19. activities - extra-curricular activities (binary: yes or no)
20. nursery - attended nursery school (binary: yes or no)
21. higher - wants to take higher education (binary: yes or no)
22. internet - Internet access at home (binary: yes or no)
23. romantic - in a romantic relationship (binary: yes or no)
24. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
25. freetime - free time after school (numeric: from 1 - very low to 5 - very high)
26. goout - going out with friends (numeric: from 1 - very low to 5 - very high)
27. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
28. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
29. health - current health status (numeric: from 1 - very bad to 5 - very good)
30. absences - number of school absences (numeric: from 0 to 93)

These grades are related to the course subject, Portuguese:

31. G1 - first period grade (numeric: from 0 to 20)
32. G2 - second period grade (numeric: from 0 to 20)
33. G3 - final grade (numeric: from 0 to 20, output target)
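
Many attributes in this list are described as numeric even though they are coded levels (for example `Medu`), while the default `NUMERICAL_FEATURES` names only `age`, `G1` and `G2`. The sketch below uses three rows copied from the training preview, restricted to a few columns, to show which columns are integer-like on the page. How the `model` module treats integer-like columns that are not listed in `NUMERICAL_FEATURES` is not documented here, so this is only an inspection aid; `integer_like_columns` is a hypothetical helper.

```python
import csv
import io

# Three rows from the training preview, restricted to a handful of columns.
SAMPLE = """school,sex,age,Medu,Mjob,G1,G2,G3
GP,F,18,4,at_home,0,11,11
GP,F,17,1,at_home,9,11,11
MS,M,17,3,services,10,10,10
"""


def integer_like_columns(text):
    """Return the names of columns whose every value parses as an integer."""
    rows = list(csv.DictReader(io.StringIO(text)))
    result = []
    for name in rows[0].keys():
        try:
            for row in rows:
                int(row[name])
        except ValueError:
            continue  # at least one value is not an integer
        result.append(name)
    return result


print(integer_like_columns(SAMPLE))
```

Here `Medu` is reported alongside `age` and the grades, which is why the explicit `NUMERICAL_FEATURES` list matters: it states which of the integer-like columns are meant to be read as quantities.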

#### Preview of the data

In [5]:
pd.read_csv(INPUT_TRAIN_FILE)

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,reason,guardian,traveltime,studytime,failures,schoolsup,famsup,paid,activities,nursery,higher,internet,romantic,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,course,mother,2,2,0,yes,no,no,no,yes,yes,no,no,4,3,4,1,1,3,4,0,11,11
1,GP,F,17,U,GT3,T,1,1,at_home,other,course,father,1,2,0,no,yes,no,no,no,yes,yes,no,5,3,3,1,1,3,2,9,11,11
2,GP,F,15,U,LE3,T,1,1,at_home,other,other,mother,1,2,0,yes,no,no,no,yes,yes,yes,no,4,3,2,2,3,3,6,12,13,12
3,GP,F,15,U,GT3,T,4,2,health,services,home,mother,1,3,0,no,yes,no,yes,yes,yes,yes,yes,3,2,2,1,1,5,0,14,14,14
4,GP,F,16,U,GT3,T,3,3,other,other,home,father,1,2,0,no,yes,no,no,yes,yes,no,no,4,3,2,1,2,5,0,11,13,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
644,MS,F,19,R,GT3,T,2,3,services,other,course,mother,1,3,1,no,no,no,yes,no,yes,yes,no,5,4,2,1,2,5,4,10,11,10
645,MS,F,18,U,LE3,T,3,1,teacher,services,course,mother,1,2,0,no,yes,no,no,yes,yes,yes,no,4,3,4,1,1,1,4,15,15,16
646,MS,F,18,U,GT3,T,1,1,other,other,course,mother,2,2,0,no,no,no,yes,yes,yes,no,no,1,1,1,1,1,5,6,11,12,9
647,MS,M,17,U,LE3,T,3,1,services,services,course,mother,2,1,0,no,no,no,no,no,yes,yes,no,2,4,5,3,4,2,6,10,10,10


#### Training

In [6]:
model.train(INPUT_TRAIN_FILE, TARGET_VARIABLE, NUMERICAL_FEATURES)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
en,Elastic Net,0.765,1.6104,1.2059,0.8498,0.2108,0.0792,0.005
lasso,Lasso Regression,0.788,1.6494,1.2198,0.846,0.2074,0.0822,0.005
br,Bayesian Ridge,0.7994,1.6585,1.2409,0.8388,0.2041,0.0825,0.006
huber,Huber Regressor,0.8319,1.7507,1.2735,0.8311,0.2108,0.0848,0.019
omp,Orthogonal Matching Pursuit,0.8584,1.7807,1.2854,0.8268,0.2028,0.0866,0.005
et,Extra Trees Regressor,0.8404,1.8627,1.3075,0.82,0.2071,0.0872,0.113
rf,Random Forest Regressor,0.8316,1.7901,1.2905,0.8196,0.2211,0.0877,0.103
ridge,Ridge Regression,0.9134,1.9687,1.3607,0.8058,0.2067,0.0912,0.005
gbr,Gradient Boosting Regressor,0.8583,1.9153,1.3417,0.805,0.219,0.0916,0.035
lightgbm,Light Gradient Boosting Machine,0.89,1.9616,1.3591,0.8012,0.2397,0.0906,0.015


Finalising the model ...
Transformation Pipeline and Model Successfully Saved
Saved model file: model


#### Prediction

In [7]:
model.pred(INPUT_TEST_FILE)

Loading model file: model
Transformation Pipeline and Model Successfully Loaded
Output is written to the file: output.csv


#### Preview of the predicted data

In [8]:
pd.read_csv(OUTPUT_PRED_FILE)

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3,y_pred
0,GP,F,15,R,LE3,T,2,2,health,services,...,1,3,1,3,4,0,11,10,11,11.0
1,MS,F,18,U,GT3,T,1,2,other,other,...,4,4,2,3,5,9,9,8,8,9.0
2,GP,M,15,U,GT3,A,3,4,services,other,...,4,4,1,1,1,0,16,16,16,16.0
3,GP,M,15,U,GT3,T,4,4,services,teacher,...,3,3,1,1,5,0,14,15,15,15.0
4,GP,M,18,U,GT3,T,4,2,teacher,other,...,3,2,1,4,5,2,15,16,16,16.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,GP,F,17,U,GT3,T,3,3,at_home,other,...,2,5,2,5,5,2,11,12,11,12.0
61,MS,F,16,R,GT3,T,2,2,at_home,other,...,4,4,2,3,5,2,12,11,12,12.0
62,MS,M,18,R,GT3,T,3,2,other,other,...,5,5,5,5,5,8,9,10,11,10.0
63,GP,F,15,R,GT3,T,2,2,at_home,other,...,3,1,1,1,2,8,14,13,12,14.0
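
Since pred.csv in this example also carries the true `G3`, the predictions can be checked by hand. The sketch below applies the standard definitions of MAE, MSE, RMSE and R2 to the nine rows visible in the preview above. It is not how the `model` module computes the figures in the training table, and nine rows are too few to draw conclusions from; `regression_metrics` is a hypothetical helper.

```python
import math

# G3 (actual) and y_pred (rounded) from the nine rows shown in the preview.
actual = [11, 8, 16, 15, 16, 11, 12, 11, 12]
predicted = [11, 9, 16, 15, 16, 12, 12, 10, 14]


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R2 from two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mse = ss_res / n                           # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                   # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```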


### Customisation

#### Variables

In [9]:
TARGET_VARIABLE='G3' #variable to be predicted
NUMERICAL_FEATURES=["age", "G1", "G2"] #features to be interpreted as numeric

Default values

In [10]:
INPUT_TRAIN_FILE="student-mat.csv"
INPUT_TEST_FILE="pred.csv"
OUTPUT_PRED_FILE="output.csv"
OUTPUT_PRED="y_pred"
MODEL_FILE="model"
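
In this customised run the training file changes to student-mat.csv while the prediction file stays pred.csv. It seems reasonable to expect both files to share the same feature columns, although that requirement is an assumption and is not stated by the `model` module. A minimal sketch of a header comparison, using hypothetical helpers and shortened in-memory headers in place of the real files:

```python
import csv
import io


def read_header(fileobj):
    """Return the column names from the first row of a CSV file object."""
    return next(csv.reader(fileobj))


def feature_mismatch(train_header, pred_header, target):
    """Compare the feature columns of two headers, ignoring the target column."""
    train_features = set(train_header) - {target}
    pred_features = set(pred_header) - {target}
    missing = sorted(train_features - pred_features)  # in train, absent from pred
    extra = sorted(pred_features - train_features)    # in pred, absent from train
    return missing, extra


# In-memory stand-ins for the two files (hypothetical, heavily shortened headers).
train_file = io.StringIO("school,sex,age,G1,G2,G3\n")
pred_file = io.StringIO("school,sex,age,G1,G3\n")

missing, extra = feature_mismatch(read_header(train_file), read_header(pred_file), "G3")
print("missing from prediction file:", missing)
print("unexpected in prediction file:", extra)
```

With the real files, `open(INPUT_TRAIN_FILE, newline="")` and `open(INPUT_TEST_FILE, newline="")` could be passed to `read_header` instead of the `StringIO` objects.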

#### Data preview

In [11]:
pd.read_csv(INPUT_TRAIN_FILE)

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
0,GP,F,18,U,GT3,A,4,4,at_home,teacher,...,4,3,4,1,1,3,6,5,6,6
1,GP,F,17,U,GT3,T,1,1,at_home,other,...,5,3,3,1,1,3,4,5,5,6
2,GP,F,15,U,LE3,T,1,1,at_home,other,...,4,3,2,2,3,3,10,7,8,10
3,GP,F,15,U,GT3,T,4,2,health,services,...,3,2,2,1,1,5,2,15,14,15
4,GP,F,16,U,GT3,T,3,3,other,other,...,4,3,2,1,2,5,4,6,10,10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
390,MS,M,20,U,LE3,A,2,2,services,services,...,5,5,4,4,5,4,11,9,9,9
391,MS,M,17,U,LE3,T,3,1,services,services,...,2,4,5,3,4,2,3,14,16,16
392,MS,M,21,R,GT3,T,1,1,other,other,...,5,5,3,3,3,3,3,10,8,7
393,MS,M,18,R,LE3,T,3,2,services,other,...,4,4,1,3,4,5,0,11,12,10


#### Training

In [12]:
model.train(INPUT_TRAIN_FILE, TARGET_VARIABLE, NUMERICAL_FEATURES)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
lasso,Lasso Regression,0.9899,3.2566,1.7373,0.8257,0.422,0.0678,0.005
en,Elastic Net,1.0227,3.2284,1.7364,0.8257,0.4325,0.0711,0.004
knn,K Neighbors Regressor,1.1597,3.1451,1.7512,0.8232,0.3824,0.0991,0.007
br,Bayesian Ridge,1.138,3.419,1.7908,0.8114,0.4274,0.0847,0.006
gbr,Gradient Boosting Regressor,1.0928,3.1081,1.7077,0.8102,0.3879,0.0904,0.029
et,Extra Trees Regressor,1.1563,3.374,1.7919,0.8067,0.424,0.0912,0.085
rf,Random Forest Regressor,1.0744,3.169,1.7376,0.806,0.3944,0.0876,0.082
omp,Orthogonal Matching Pursuit,1.216,3.837,1.9022,0.7878,0.4477,0.09,0.006
lightgbm,Light Gradient Boosting Machine,1.2087,3.4025,1.7966,0.7876,0.4182,0.1023,0.009
huber,Huber Regressor,1.1051,3.9277,1.9044,0.7876,0.4571,0.0752,0.019


Finalising the model ...
Transformation Pipeline and Model Successfully Saved
Saved model file: model
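
The comparison table above appears to be ordered by R2, highest first, but the top row is not the best on every metric. The sketch below copies four rows from that table, trimmed to four metric columns, and picks the best model under different metrics. Which model `model.train` actually finalises is not shown in the output, so this only illustrates how the choice of metric changes the ranking; `best_model` is a hypothetical helper.

```python
import csv
import io

# Four rows copied from the comparison table (columns trimmed to four metrics).
TABLE = """Model,MAE,MSE,RMSE,R2
Lasso Regression,0.9899,3.2566,1.7373,0.8257
Elastic Net,1.0227,3.2284,1.7364,0.8257
K Neighbors Regressor,1.1597,3.1451,1.7512,0.8232
Gradient Boosting Regressor,1.0928,3.1081,1.7077,0.8102
"""


def best_model(table_text, metric, higher_is_better):
    """Return the name of the model with the best value for the given metric."""
    rows = list(csv.DictReader(io.StringIO(table_text)))
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: float(row[metric]))["Model"]


print("best by MAE: ", best_model(TABLE, "MAE", higher_is_better=False))
print("best by RMSE:", best_model(TABLE, "RMSE", higher_is_better=False))
```

Among these rows, Lasso Regression has the lowest MAE while Gradient Boosting Regressor has the lowest RMSE.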


#### Prediction

In [13]:
model.pred(INPUT_TEST_FILE)

Loading model file: model
Transformation Pipeline and Model Successfully Loaded
Output is written to the file: output.csv


#### Preview of the predicted data

In [14]:
pd.read_csv(OUTPUT_PRED_FILE)

Unnamed: 0,school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,...,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3,y_pred
0,GP,F,15,R,LE3,T,2,2,health,services,...,1,3,1,3,4,0,11,10,11,10.0
1,MS,F,18,U,GT3,T,1,2,other,other,...,4,4,2,3,5,9,9,8,8,8.0
2,GP,M,15,U,GT3,A,3,4,services,other,...,4,4,1,1,1,0,16,16,16,16.0
3,GP,M,15,U,GT3,T,4,4,services,teacher,...,3,3,1,1,5,0,14,15,15,15.0
4,GP,M,18,U,GT3,T,4,2,teacher,other,...,3,2,1,4,5,2,15,16,16,16.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,GP,F,17,U,GT3,T,3,3,at_home,other,...,2,5,2,5,5,2,11,12,11,12.0
61,MS,F,16,R,GT3,T,2,2,at_home,other,...,4,4,2,3,5,2,12,11,12,11.0
62,MS,M,18,R,GT3,T,3,2,other,other,...,5,5,5,5,5,8,9,10,11,10.0
63,GP,F,15,R,GT3,T,2,2,at_home,other,...,3,1,1,1,2,8,14,13,12,13.0
