# Classification with an Academic Success Dataset

In this project, I use Automated Machine Learning (AutoML) to predict student outcomes in higher education. The goal is a model that classifies each student as Dropout, Enrolled, or Graduate, using features such as demographic details, academic records, and socio-economic indicators. AutoML automates model selection, feature engineering, and hyperparameter tuning, which shortens the path from raw data to a competitive baseline. A model of this kind could help educational institutions identify at-risk students early and provide timely interventions and support.

In [2]:
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

# List every file under the read-only Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/playground-series-s4e6/sample_submission.csv
/kaggle/input/playground-series-s4e6/train.csv
/kaggle/input/playground-series-s4e6/test.csv


In [3]:
# Load the sample submission, the labelled training set, and the unlabelled test set
ss = pd.read_csv("/kaggle/input/playground-series-s4e6/sample_submission.csv")
train = pd.read_csv("/kaggle/input/playground-series-s4e6/train.csv")
test = pd.read_csv("/kaggle/input/playground-series-s4e6/test.csv")

In [4]:
ss.head()

,id,Target
0,76518,Graduate
1,76519,Graduate
2,76520,Graduate
3,76521,Graduate
4,76522,Graduate


In [5]:
train.head()

,id,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Previous qualification (grade),Nacionality,Mother's qualification,...,Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP,Target
0,0,1,1,1,9238,1,1,126.0,1,1,...,0,6,7,6,12.428571,0,11.1,0.6,2.02,Graduate
1,1,1,17,1,9238,1,1,125.0,1,19,...,0,6,9,0,0.0,0,11.1,0.6,2.02,Dropout
2,2,1,17,2,9254,1,1,137.0,1,3,...,0,6,0,0,0.0,0,16.2,0.3,-0.92,Dropout
3,3,1,1,3,9500,1,1,131.0,1,19,...,0,8,11,7,12.82,0,11.1,0.6,2.02,Enrolled
4,4,1,1,2,9500,1,1,132.0,1,19,...,0,7,12,6,12.933333,0,7.6,2.6,0.32,Graduate


In [6]:
test.head()

,id,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Previous qualification (grade),Nacionality,Mother's qualification,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
0,76518,1,1,1,9500,1,1,141.0,1,3,...,0,0,8,0,0,0.0,0,13.9,-0.3,0.79
1,76519,1,1,1,9238,1,1,128.0,1,1,...,0,0,6,6,6,13.5,0,11.1,0.6,2.02
2,76520,1,1,1,9238,1,1,118.0,1,1,...,0,0,6,11,5,11.0,0,15.5,2.8,-4.06
3,76521,1,44,1,9147,1,39,130.0,1,1,...,0,3,8,14,5,11.0,0,8.9,1.4,3.51
4,76522,1,39,1,9670,1,1,110.0,1,1,...,0,0,6,9,4,10.666667,2,7.6,2.6,0.32


In [7]:
train.describe()

,id,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Previous qualification (grade),Nacionality,Mother's qualification,...,Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP
count,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,...,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0,76518.0
mean,38258.5,1.111934,16.054419,1.64441,9001.286377,0.915314,3.65876,132.378766,1.2266,19.837633,...,0.05796,0.137053,5.933414,7.234468,4.007201,9.626085,0.062443,11.52034,1.228218,-0.080921
std,22088.988286,0.441669,16.682337,1.229645,1803.438531,0.278416,8.623774,10.995328,3.392183,15.399456,...,0.40849,0.93383,1.627182,3.50304,2.772956,5.546035,0.462107,2.653375,1.398816,2.251382
min,0.0,1.0,1.0,0.0,33.0,0.0,1.0,95.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.6,-0.8,-4.06
25%,19129.25,1.0,1.0,1.0,9119.0,1.0,1.0,125.0,1.0,1.0,...,0.0,0.0,5.0,6.0,1.0,10.0,0.0,9.4,0.3,-1.7
50%,38258.5,1.0,17.0,1.0,9254.0,1.0,1.0,133.1,1.0,19.0,...,0.0,0.0,6.0,7.0,5.0,12.142857,0.0,11.1,1.4,0.32
75%,57387.75,1.0,39.0,2.0,9670.0,1.0,1.0,140.0,1.0,37.0,...,0.0,0.0,6.0,9.0,6.0,13.244048,0.0,12.7,2.6,1.79
max,76517.0,6.0,53.0,9.0,9991.0,1.0,43.0,190.0,109.0,44.0,...,12.0,19.0,23.0,33.0,20.0,18.0,12.0,16.2,3.7,3.51


In [8]:
train.isnull().sum()

id                                                0
Marital status                                    0
Application mode                                  0
Application order                                 0
Course                                            0
Daytime/evening attendance                        0
Previous qualification                            0
Previous qualification (grade)                    0
Nacionality                                       0
Mother's qualification                            0
Father's qualification                            0
Mother's occupation                               0
Father's occupation                               0
Admission grade                                   0
Displaced                                         0
Educational special needs                         0
Debtor                                            0
Tuition fees up to date                           0
Gender                                            0
Scholarship 

In [9]:
%pip install pycaret

Note: you may need to restart the kernel to use updated packages.


In [10]:
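# Import only the PyCaret functions used below instead of a wildcard import
from pycaret.classification import setup, compare_models, create_model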

In [11]:
# Pin the session id shown in the summary below so the run can be reproduced
setup(data=train, target="Target", session_id=3640)

,Description,Value
0,Session id,3640
1,Target,Target
2,Target type,Multiclass
3,Target mapping,"Dropout: 0, Enrolled: 1, Graduate: 2"
4,Original data shape,"(76518, 38)"
5,Transformed data shape,"(76518, 38)"
6,Transformed train set shape,"(53562, 38)"
7,Transformed test set shape,"(22956, 38)"
8,Numeric features,37
9,Preprocess,True


<pycaret.classification.oop.ClassificationExperiment at 0x7d58ba45b220>

In [12]:
compare_models()

,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
catboost,CatBoost Classifier,0.8312,0.9438,0.8312,0.8295,0.8291,0.7279,0.7293,28.694
lightgbm,Light Gradient Boosting Machine,0.8307,0.9438,0.8307,0.8301,0.8292,0.7274,0.7289,5.046
xgboost,Extreme Gradient Boosting,0.8289,0.9423,0.8289,0.8272,0.827,0.7244,0.7257,3.611
gbc,Gradient Boosting Classifier,0.8282,0.0,0.8282,0.8274,0.8264,0.7231,0.7247,47.022
rf,Random Forest Classifier,0.8244,0.9377,0.8244,0.823,0.8221,0.7166,0.7184,15.851
et,Extra Trees Classifier,0.8228,0.9366,0.8228,0.8214,0.8203,0.7138,0.7158,10.317
ada,Ada Boost Classifier,0.82,0.0,0.82,0.8207,0.8186,0.7101,0.7119,4.914
lda,Linear Discriminant Analysis,0.8052,0.0,0.8052,0.8098,0.8033,0.6846,0.6889,0.616
ridge,Ridge Classifier,0.7952,0.0,0.7952,0.7911,0.782,0.6596,0.672,0.469
lr,Logistic Regression,0.7719,0.0,0.7719,0.7639,0.7601,0.6236,0.6321,6.69
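
In the comparison table, the Recall column repeats the Accuracy column for every model. That is what support-weighted averaging of per-class recall produces, and it is assumed here to be the averaging PyCaret applies to multiclass targets (the notebook does not show this). The sketch below works through it, together with Cohen's kappa, on an invented 3-class confusion matrix.

```python
# Toy confusion matrix: rows are true classes, columns are predicted classes.
# The counts are invented for illustration and do not come from this notebook.
confusion = [
    [50, 5, 5],    # true Dropout
    [10, 20, 10],  # true Enrolled
    [5, 5, 90],    # true Graduate
]
n_classes = len(confusion)
total = sum(sum(row) for row in confusion)

# Accuracy: share of samples on the diagonal.
accuracy = sum(confusion[k][k] for k in range(n_classes)) / total

# Weighted recall: per-class recall, weighted by each class's share of samples.
weighted_recall = sum(
    (sum(row) / total) * (row[k] / sum(row)) for k, row in enumerate(confusion)
)

# Cohen's kappa: agreement beyond what the row/column marginals give by chance.
row_totals = [sum(row) for row in confusion]
col_totals = [sum(row[j] for row in confusion) for j in range(n_classes)]
chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
kappa = (accuracy - chance) / (1 - chance)

print(f"accuracy={accuracy:.4f} weighted_recall={weighted_recall:.4f} kappa={kappa:.4f}")
```

Each class weight cancels against that class's recall denominator, so weighted recall always reduces to accuracy. Kappa is lower than accuracy because it discounts the agreement expected by chance.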



<catboost.core.CatBoostClassifier at 0x7d588a29fd30>

In [13]:
pd.set_option("display.max_columns",None)
train.head()

,id,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Previous qualification (grade),Nacionality,Mother's qualification,Father's qualification,Mother's occupation,Father's occupation,Admission grade,Displaced,Educational special needs,Debtor,Tuition fees up to date,Gender,Scholarship holder,Age at enrollment,International,Curricular units 1st sem (credited),Curricular units 1st sem (enrolled),Curricular units 1st sem (evaluations),Curricular units 1st sem (approved),Curricular units 1st sem (grade),Curricular units 1st sem (without evaluations),Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP,Target
0,0,1,1,1,9238,1,1,126.0,1,1,19,5,5,122.6,0,0,0,1,0,1,18,0,0,6,6,6,14.5,0,0,6,7,6,12.428571,0,11.1,0.6,2.02,Graduate
1,1,1,17,1,9238,1,1,125.0,1,19,19,9,9,119.8,1,0,0,1,0,0,18,0,0,6,8,4,11.6,0,0,6,9,0,0.0,0,11.1,0.6,2.02,Dropout
2,2,1,17,2,9254,1,1,137.0,1,3,19,2,3,144.7,0,0,0,1,1,0,18,0,0,6,0,0,0.0,0,0,6,0,0,0.0,0,16.2,0.3,-0.92,Dropout
3,3,1,1,3,9500,1,1,131.0,1,19,3,3,2,126.1,1,0,0,1,0,1,18,0,0,7,9,7,12.59125,0,0,8,11,7,12.82,0,11.1,0.6,2.02,Enrolled
4,4,1,1,2,9500,1,1,132.0,1,19,37,4,9,120.1,1,0,0,1,0,0,18,0,0,7,12,6,12.933333,0,0,7,12,6,12.933333,0,7.6,2.6,0.32,Graduate


In [15]:
best_model=create_model("catboost")

Fold,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.8266,0.9461,0.8266,0.8242,0.8243,0.7205,0.7218
1,0.829,0.9432,0.829,0.8282,0.8274,0.7248,0.7261
2,0.8344,0.9422,0.8344,0.8337,0.8324,0.7327,0.7346
3,0.8312,0.9429,0.8312,0.8289,0.829,0.728,0.7293
4,0.8348,0.9456,0.8348,0.8322,0.8322,0.7334,0.7349
5,0.8305,0.9425,0.8305,0.829,0.8286,0.727,0.7283
6,0.8322,0.9434,0.8322,0.8305,0.8301,0.7294,0.7308
7,0.8336,0.945,0.8336,0.8308,0.8308,0.7313,0.7329
8,0.8252,0.9409,0.8252,0.8239,0.8231,0.7181,0.7197
9,0.8346,0.9462,0.8346,0.8334,0.833,0.7337,0.7349
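
The per-fold scores can be summarised by hand as a sanity check. The sketch below copies the Accuracy column from the fold table and computes its mean and spread. Using the population standard deviation is an assumption about how such a summary would be reported.

```python
from statistics import mean, pstdev

# Accuracy for folds 0-9, copied from the cross-validation table.
fold_accuracy = [0.8266, 0.8290, 0.8344, 0.8312, 0.8348,
                 0.8305, 0.8322, 0.8336, 0.8252, 0.8346]

cv_mean = mean(fold_accuracy)
cv_std = pstdev(fold_accuracy)  # population standard deviation (assumed convention)
cv_range = max(fold_accuracy) - min(fold_accuracy)

print(f"mean accuracy = {cv_mean:.4f}, std = {cv_std:.4f}, range = {cv_range:.4f}")
```

The mean rounds to 0.8312, the CatBoost accuracy reported by `compare_models()`. The folds differ by less than one percentage point (0.8252 to 0.8348), so the estimate is stable across splits.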



In [17]:
import pickle
with open('model.pkl', 'wb') as file:
    pickle.dump(best_model, file)
with open('model.pkl', 'rb') as file:
    loaded_model = pickle.load(file)
print(loaded_model)

<catboost.core.CatBoostClassifier object at 0x7d58886203a0>


In [18]:
pred=best_model.predict(test)
pred

array([[0],
       [2],
       [2],
       ...,
       [0],
       [0],
       [0]])

In [20]:
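# Inverse of the target mapping reported by setup(): integer code -> class name
d = {0: "Dropout", 1: "Enrolled", 2: "Graduate"}
# CatBoost returns an (n, 1) column vector; flatten it before assigning to a column
ss["Target"] = pred.ravel()
ss["Target"] = ss["Target"].map(d)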
ss

,id,Target
0,76518,Dropout
1,76519,Graduate
2,76520,Graduate
3,76521,Enrolled
4,76522,Enrolled
...,...,...
51007,127525,Dropout
51008,127526,Dropout
51009,127527,Dropout
51010,127528,Dropout
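
The decoding step amounts to flattening a column of integer codes and looking each one up in the label mapping. A minimal pure-Python sketch follows, using made-up predictions (the real array comes from `best_model.predict(test)`) and adding a check that every code was mapped.

```python
from collections import Counter

# Mapping taken from the "Target mapping" row of the setup summary.
label_names = {0: "Dropout", 1: "Enrolled", 2: "Graduate"}

# Made-up predictions in the same (n, 1) column-vector layout as `pred`.
raw_predictions = [[0], [2], [2], [1], [1]]

codes = [row[0] for row in raw_predictions]          # flatten to one code per row
decoded = [label_names.get(code) for code in codes]  # unmapped codes become None

# Fail early if any prediction code has no label.
assert None not in decoded, "found a prediction code with no label"
print(Counter(decoded))
```

In pandas, a code missing from the dictionary would surface as NaN after `.map(d)`, so a check such as `ss["Target"].isna().any()` before writing the CSV serves the same purpose.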


In [21]:
ss.to_csv("submission.csv",index=False)