Here's a description of the features in the dataset:

1. **Gender**: The gender of the student (e.g., Boy, Girl).
2. **Age**: The age group of the student (e.g., 21-25, 16-20).
3. **Education Level**: The education level of the student (e.g., University, College, School).
4. **Institution Type**: The type of institution the student attends (e.g., Government, Non Government).
5. **IT Student**: Whether the student is an IT student or not (Yes/No).
6. **Location**: Whether the student lives in a town (Yes/No).
7. **Load-shedding**: Level of load-shedding experienced by the student (e.g., Low, High).
8. **Financial Condition**: Financial condition of the student (e.g., Mid, Poor).
9. **Internet Type**: Type of internet connection used by the student (e.g., Wifi, Mobile Data).
10. **Network Type**: Type of network used by the student (e.g., 4G, 3G).
11. **Class Duration**: Daily class duration in hours (e.g., 3-6, 1-3, 0).
12. **Self Lms**: Whether the student's institution has its own learning management system (LMS) (Yes/No).
13. **Device**: Device used by the student (e.g., Tab, Mobile).
14. **Adaptivity Level**: Target variable representing the student's adaptivity level (Low, Moderate, High).


In [1]:
import numpy as np
import pandas as pd 

In [2]:
df = pd.read_csv("datasets/students_adaptability_level_online_education.csv")
df.head()

| | Gender | Age | Education Level | Institution Type | IT Student | Location | Load-shedding | Financial Condition | Internet Type | Network Type | Class Duration | Self Lms | Device | Adaptivity Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Boy | 21-25 | University | Non Government | No | Yes | Low | Mid | Wifi | 4G | 3-6 | No | Tab | Moderate |
| 1 | Girl | 21-25 | University | Non Government | No | Yes | High | Mid | Mobile Data | 4G | 1-3 | Yes | Mobile | Moderate |
| 2 | Girl | 16-20 | College | Government | No | Yes | Low | Mid | Wifi | 4G | 1-3 | No | Mobile | Moderate |
| 3 | Girl | 11-15 | School | Non Government | No | Yes | Low | Mid | Mobile Data | 4G | 1-3 | No | Mobile | Moderate |
| 4 | Girl | 16-20 | School | Non Government | No | Yes | Low | Poor | Mobile Data | 3G | 0 | No | Mobile | Low |


In [3]:
df.shape

(1205, 14)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1205 entries, 0 to 1204
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Gender               1205 non-null   object
 1   Age                  1205 non-null   object
 2   Education Level      1205 non-null   object
 3   Institution Type     1205 non-null   object
 4   IT Student           1205 non-null   object
 5   Location             1205 non-null   object
 6   Load-shedding        1205 non-null   object
 7   Financial Condition  1205 non-null   object
 8   Internet Type        1205 non-null   object
 9   Network Type         1205 non-null   object
 10  Class Duration       1205 non-null   object
 11  Self Lms             1205 non-null   object
 12  Device               1205 non-null   object
 13  Adaptivity Level     1205 non-null   object
dtypes: object(14)
memory usage: 131.9+ KB


In [5]:
df.isnull().sum()

Gender                 0
Age                    0
Education Level        0
Institution Type       0
IT Student             0
Location               0
Load-shedding          0
Financial Condition    0
Internet Type          0
Network Type           0
Class Duration         0
Self Lms               0
Device                 0
Adaptivity Level       0
dtype: int64

In [6]:
df.columns

Index(['Gender', 'Age', 'Education Level', 'Institution Type', 'IT Student',
       'Location', 'Load-shedding', 'Financial Condition', 'Internet Type',
       'Network Type', 'Class Duration', 'Self Lms', 'Device',
       'Adaptivity Level'],
      dtype='object')

In [7]:
# Separating the target variable from the data
target_variable = df['Adaptivity Level']
target_variable.head()

0    Moderate
1    Moderate
2    Moderate
3    Moderate
4         Low
Name: Adaptivity Level, dtype: object
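Because the models below are compared by accuracy alone, it can help to check how balanced the three Adaptivity Level classes are before training. The sketch below uses a small hypothetical Series as a stand-in; on the real data the equivalent call would be `df['Adaptivity Level'].value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical stand-in for df['Adaptivity Level'] (the real column has 1205 rows)
target = pd.Series(["Moderate", "Moderate", "Low", "Moderate", "High", "Low"],
                   name="Adaptivity Level")

# Share of each class; a dominant class makes raw accuracy less informative
class_share = target.value_counts(normalize=True)
print(class_share)

# Accuracy of always predicting the most frequent class: a simple baseline
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.2%}")
```

Any model worth keeping should beat this majority-class baseline by a clear margin.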

In [8]:
# Independent variables
predictor_variable = df[['Gender', 'Age', 'Education Level', 'Institution Type', 'IT Student',
       'Location', 'Load-shedding', 'Financial Condition', 'Internet Type',
       'Network Type', 'Class Duration', 'Self Lms', 'Device']]

predictor_variable.head()

| | Gender | Age | Education Level | Institution Type | IT Student | Location | Load-shedding | Financial Condition | Internet Type | Network Type | Class Duration | Self Lms | Device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Boy | 21-25 | University | Non Government | No | Yes | Low | Mid | Wifi | 4G | 3-6 | No | Tab |
| 1 | Girl | 21-25 | University | Non Government | No | Yes | High | Mid | Mobile Data | 4G | 1-3 | Yes | Mobile |
| 2 | Girl | 16-20 | College | Government | No | Yes | Low | Mid | Wifi | 4G | 1-3 | No | Mobile |
| 3 | Girl | 11-15 | School | Non Government | No | Yes | Low | Mid | Mobile Data | 4G | 1-3 | No | Mobile |
| 4 | Girl | 16-20 | School | Non Government | No | Yes | Low | Poor | Mobile Data | 3G | 0 | No | Mobile |
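Listing all 13 predictor columns by hand works, but dropping the target column selects the same predictors and cannot drift out of sync if columns are added or renamed. A minimal sketch on a small hypothetical frame with a subset of the real columns:

```python
import pandas as pd

# Small hypothetical frame using a subset of the real column names
df = pd.DataFrame({
    "Gender": ["Boy", "Girl"],
    "Device": ["Tab", "Mobile"],
    "Adaptivity Level": ["Moderate", "Low"],
})

# Every column except the target becomes a predictor
predictor_variable = df.drop(columns="Adaptivity Level")
target_variable = df["Adaptivity Level"]
print(predictor_variable.columns.tolist())
```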


## Encoding 

In [9]:
from sklearn.preprocessing import LabelEncoder

In [10]:
# Apply LabelEncoder to the target variable; the predictor variables are one-hot encoded with pd.get_dummies in the next cell.
le = LabelEncoder()
target_variable = le.fit_transform(target_variable)
target_variable

array([2, 2, 2, ..., 2, 1, 2])
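LabelEncoder assigns integer codes in sorted (alphabetical) order of the class names, which is why the first rows (Moderate, ..., Low) appear above as 2 and 1. The sketch below shows the mapping and how `inverse_transform` turns encoded predictions back into labels. Note that the alphabetical codes (High=0, Low=1, Moderate=2) do not follow the natural Low < Moderate < High order; that is harmless for the classifiers used here, which treat the target as unordered classes.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# The three Adaptivity Level classes, given in arbitrary order
le.fit(["Moderate", "Low", "High"])

# Classes are stored sorted, so High=0, Low=1, Moderate=2
print(le.classes_)
print(le.transform(["Moderate", "Low", "High"]))

# Map encoded predictions back to the original labels
print(le.inverse_transform([2, 1]))
```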

In [11]:
predictor_variable = pd.get_dummies(predictor_variable, drop_first=True)
predictor_variable.head()

| | Gender_Girl | Age_11-15 | Age_16-20 | Age_21-25 | Age_26-30 | Age_6-10 | Education Level_School | Education Level_University | Institution Type_Non Government | IT Student_Yes | ... | Financial Condition_Poor | Financial Condition_Rich | Internet Type_Wifi | Network Type_3G | Network Type_4G | Class Duration_1-3 | Class Duration_3-6 | Self Lms_Yes | Device_Mobile | Device_Tab |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | True | False | False | False | True | True | False | ... | False | False | True | False | True | False | True | False | False | True |
| 1 | True | False | False | True | False | False | False | True | True | False | ... | False | False | False | False | True | True | False | True | True | False |
| 2 | True | False | True | False | False | False | False | False | False | False | ... | False | False | True | False | True | True | False | False | True | False |
| 3 | True | True | False | False | False | False | True | False | True | False | ... | False | False | False | False | True | True | False | False | True | False |
| 4 | True | False | True | False | False | False | True | False | True | False | ... | True | False | False | True | False | False | False | False | True | False |
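With `drop_first=True`, `pd.get_dummies` drops the first category of each column in sorted order, which is why `Gender_Boy` and `Education Level_College` do not appear in the output above. The sketch below uses hypothetical rows with two of the real column names (the "Computer" device value is an illustrative assumption). Passing `dtype=int` produces 0/1 columns directly, which would make the separate `astype(int)` step unnecessary.

```python
import pandas as pd

# Hypothetical rows using two of the real column names
toy = pd.DataFrame({
    "Gender": ["Boy", "Girl", "Girl"],
    "Device": ["Tab", "Mobile", "Computer"],
})

# drop_first=True removes the first sorted category of each column
# (Gender_Boy, Device_Computer); dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(toy, drop_first=True, dtype=int)
print(dummies)
```

Dropping one category per column avoids perfectly collinear dummy columns, which matters mainly for linear models such as logistic regression.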


In [12]:
# Converting into numbers
predictor_variable = predictor_variable.astype(int)
predictor_variable.head()

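| | Gender_Girl | Age_11-15 | Age_16-20 | Age_21-25 | Age_26-30 | Age_6-10 | Education Level_School | Education Level_University | Institution Type_Non Government | IT Student_Yes | ... | Financial Condition_Poor | Financial Condition_Rich | Internet Type_Wifi | Network Type_3G | Network Type_4G | Class Duration_1-3 | Class Duration_3-6 | Self Lms_Yes | Device_Mobile | Device_Tab |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |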


## Standardization

In [13]:
# Scaling our data
from sklearn.preprocessing import StandardScaler

# Initialize StandardScaler
scaler = StandardScaler()

# Standardize every column (note: the scaler is fitted on all rows, i.e. before the train/test split)
predictor_variable = pd.DataFrame(scaler.fit_transform(predictor_variable), columns=predictor_variable.columns)
predictor_variable.head()

| | Gender_Girl | Age_11-15 | Age_16-20 | Age_21-25 | Age_26-30 | Age_6-10 | Education Level_School | Education Level_University | Institution Type_Non Government | IT Student_Yes | ... | Financial Condition_Poor | Financial Condition_Rich | Internet Type_Wifi | Network Type_3G | Network Type_4G | Class Duration_1-3 | Class Duration_3-6 | Self Lms_Yes | Device_Mobile | Device_Tab |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.904155 | -0.643676 | -0.547624 | 1.490612 | -0.244554 | -0.210224 | -0.886107 | 1.281618 | 0.68129 | -0.580864 | ... | -0.501296 | -0.275487 | 1.167367 | -0.719467 | 0.744875 | -1.517027 | 2.170461 | -0.459408 | -2.296964 | 6.258328 |
| 1 | 1.106005 | -0.643676 | -0.547624 | 1.490612 | -0.244554 | -0.210224 | -0.886107 | 1.281618 | 0.68129 | -0.580864 | ... | -0.501296 | -0.275487 | -0.856629 | -0.719467 | 0.744875 | 0.659184 | -0.460732 | 2.176717 | 0.435357 | -0.159787 |
| 2 | 1.106005 | -0.643676 | 1.82607 | -0.670865 | -0.244554 | -0.210224 | -0.886107 | -0.780264 | -1.467805 | -0.580864 | ... | -0.501296 | -0.275487 | 1.167367 | -0.719467 | 0.744875 | 0.659184 | -0.460732 | -0.459408 | 0.435357 | -0.159787 |
| 3 | 1.106005 | 1.553576 | -0.547624 | -0.670865 | -0.244554 | -0.210224 | 1.128532 | -0.780264 | 0.68129 | -0.580864 | ... | -0.501296 | -0.275487 | -0.856629 | -0.719467 | 0.744875 | 0.659184 | -0.460732 | -0.459408 | 0.435357 | -0.159787 |
| 4 | 1.106005 | -0.643676 | 1.82607 | -0.670865 | -0.244554 | -0.210224 | 1.128532 | -0.780264 | 0.68129 | -0.580864 | ... | 1.994828 | -0.275487 | -0.856629 | 1.389919 | -1.342507 | -1.517027 | -0.460732 | -0.459408 | 0.435357 | -0.159787 |
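StandardScaler computes z = (x - mean) / std per column, using the population standard deviation (ddof=0). For a 0/1 dummy column in which a share p of rows equals 1, a 1 maps to sqrt((1 - p) / p) and a 0 maps to -sqrt(p / (1 - p)), so rare categories get large positive values (for example Device_Tab = 6.258328 above). A small check on one hypothetical dummy column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One hypothetical 0/1 dummy column: 1 of 4 rows has the category (p = 0.25)
x = np.array([[0.0], [0.0], [0.0], [1.0]])

scaled = StandardScaler().fit_transform(x)

# Same result by hand: z = (x - mean) / std with the population std (ddof=0)
manual = (x - x.mean()) / x.std(ddof=0)
print(scaled.ravel())
print(np.allclose(scaled, manual))
```

Tree-based models (Random Forest, Decision Tree, Gradient Boosting) are largely unaffected by this per-column rescaling; it matters mostly for Logistic Regression and SVM.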


## Model building

In [14]:
from sklearn.model_selection import train_test_split

X = predictor_variable
y = target_variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
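In this notebook the StandardScaler is fitted on all 1205 rows before the split, so statistics from the test rows leak into training. A minimal sketch of the leakage-free alternative wraps the scaler and the model in a scikit-learn `Pipeline`, so the scaler is fitted on the training split only; `stratify=y` additionally keeps the class proportions equal in both splits. Synthetic data from `make_classification` stands in for the encoded predictors and the 3-class target.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded predictors and the 3-class target
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

# stratify=y keeps the class proportions the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# The scaler is fitted inside the pipeline, i.e. on the training rows only
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)

print("Scaler means (first 3 columns):", np.round(pipe.named_steps["scaler"].mean_[:3], 3))
print("Test accuracy:", pipe.score(X_test, y_test))
```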

## Random Forest

In [15]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(random_state=42)

# Train the classifier on the training data
rf_classifier.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred = rf_classifier.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy_rf = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy_rf)

Accuracy: 0.9128630705394191
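A fitted Random Forest exposes impurity-based feature importances through `feature_importances_`, which can hint at which encoded columns drive its predictions. The sketch uses synthetic data and hypothetical column names; on the real data the index would be `X_train.columns` and the model `rf_classifier`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_classes=3, random_state=42)
# Hypothetical column names standing in for the dummy columns
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(random_state=42).fit(X, y)

# One importance per column, normalized so they sum to 1
importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` on the test split is a common cross-check.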


## Logistic Regression

In [16]:
from sklearn.linear_model import LogisticRegression

# Initialize the logistic regression model
logistic_regression_model = LogisticRegression(random_state=42)

# Train the model on the training data
logistic_regression_model.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred = logistic_regression_model.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy_lr = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy_lr)

Accuracy: 0.7219917012448133
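Beyond hard labels, logistic regression returns one probability per class via `predict_proba`, which shows how confident the model is on each student; `predict` simply picks the most probable class. A sketch on synthetic 3-class data (raising `max_iter` above its default of 100 is an assumption to avoid possible convergence warnings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)

# max_iter raised from the default of 100 to reduce convergence warnings
lr = LogisticRegression(max_iter=1000).fit(X, y)

# One column per class (ordered as lr.classes_); each row sums to 1
proba = lr.predict_proba(X[:5])
print(lr.classes_)
print(proba.round(3))

# predict() returns the class with the highest probability
print(np.array_equal(lr.predict(X[:5]), lr.classes_[proba.argmax(axis=1)]))
```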


## SVM

In [17]:
from sklearn.svm import SVC

# Initialize the SVM classifier
svm_classifier = SVC(kernel='linear', random_state=42)  # You can choose different kernels like 'rbf', 'poly', etc.

# Train the classifier on the training data
svm_classifier.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred = svm_classifier.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy_svm = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy_svm)

Accuracy: 0.7261410788381742
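The comment in the SVM cell notes that other kernels can be used. A linear kernel can only draw straight decision boundaries, so comparing kernels on the same split is a quick way to see whether a non-linear boundary helps. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit one SVC per kernel on the same split and record test accuracy
scores = {}
for kernel in ["linear", "rbf", "poly"]:
    clf = SVC(kernel=kernel, random_state=42).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

print(scores)
```

`rbf` is the SVC default; its `C` and `gamma` parameters usually need tuning before a kernel comparison is conclusive.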


## Decision Trees

In [18]:
from sklearn.tree import DecisionTreeClassifier

# Initialize the Decision Tree classifier
dt_classifier = DecisionTreeClassifier(random_state=42)

# Train the classifier on the training data
dt_classifier.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred = dt_classifier.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy_dt = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy_dt)

Accuracy: 0.9045643153526971
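An unrestricted decision tree keeps splitting until its leaves are pure, so it can fit the training data almost perfectly. Comparing training and test accuracy shows how much it overfits, and `max_depth` is one way to limit that. A sketch on synthetic data (on the real one-hot data, duplicate rows with different labels can keep training accuracy below 1.0):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (depth reached, train accuracy, test accuracy) for each depth limit
results = {}
for max_depth in [None, 5]:
    dt = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    dt.fit(X_train, y_train)
    results[max_depth] = (dt.get_depth(),
                          dt.score(X_train, y_train),
                          dt.score(X_test, y_test))
    print(f"max_depth={max_depth}: depth={results[max_depth][0]}, "
          f"train={results[max_depth][1]:.3f}, test={results[max_depth][2]:.3f}")
```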


## Gradient Boosting

In [19]:
from sklearn.ensemble import GradientBoostingClassifier

# Initialize the Gradient Boosting classifier
gb_classifier = GradientBoostingClassifier(random_state=42)

# Train the classifier on the training data
gb_classifier.fit(X_train, y_train)

# Predict the labels for the testing data
y_pred = gb_classifier.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy_gb = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy_gb)

Accuracy: 0.7842323651452282
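GradientBoostingClassifier runs with defaults here (100 boosting stages, learning rate 0.1, trees of depth 3), which may underfit this data. `staged_predict` yields predictions after each stage, so accuracy can be tracked as stages are added. The sketch uses synthetic data; choosing the stage count on the test set is for illustration only, and a separate validation split would be used in practice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

gb = GradientBoostingClassifier(n_estimators=200, random_state=42)
gb.fit(X_train, y_train)

# Accuracy after each boosting stage (1, 2, ..., 200 trees per class)
stage_acc = [accuracy_score(y_test, pred) for pred in gb.staged_predict(X_test)]
best_stage = int(np.argmax(stage_acc)) + 1
print(f"Best test accuracy {max(stage_acc):.3f} after {best_stage} stages")
```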


## Accuracy Score Overview

In [20]:
# Random Forest
print("Random Forest Accuracy: {:.2f}%".format(accuracy_rf * 100))

# Logistic Regression
print("Logistic Regression Accuracy: {:.2f}%".format(accuracy_lr * 100))

# SVM
print("SVM Accuracy: {:.2f}%".format(accuracy_svm * 100))

# Decision Tree
print("Decision Tree Accuracy: {:.2f}%".format(accuracy_dt * 100))

# Gradient Boosting
print("Gradient Boosting Accuracy: {:.2f}%".format(accuracy_gb * 100))


Random Forest Accuracy: 91.29%
Logistic Regression Accuracy: 72.20%
SVM Accuracy: 72.61%
Decision Tree Accuracy: 90.46%
Gradient Boosting Accuracy: 78.42%
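The five cells above repeat the same fit/predict/score pattern. A dictionary of models keeps the comparison in one loop, and `classification_report` adds per-class precision, recall and F1, which accuracy alone hides. The sketch uses synthetic data and two of the five models; on the real data, passing `target_names=le.classes_` would label the rows High, Low and Moderate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

accuracies = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    accuracies[name] = accuracy_score(y_test, y_pred)
    print(f"{name} Accuracy: {accuracies[name]:.2%}")
    # Per-class precision, recall and F1
    print(classification_report(y_test, y_pred))
```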


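These results show that Random Forest achieved the highest test accuracy (91.29%), narrowly ahead of Decision Tree (90.46%), while Gradient Boosting (78.42%), SVM (72.61%), and Logistic Regression (72.20%) performed noticeably worse. All figures come from a single 80/20 train/test split with 241 test rows.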
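On 241 test rows, the gap between Random Forest (220 correct) and Decision Tree (218 correct) is only two students, so a single split may not separate them reliably. Stratified k-fold cross-validation reports a mean and spread over several splits instead. A sketch on synthetic data; on the real data, the scaler would ideally sit inside a `Pipeline` so each fold is scaled using its own training rows only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=42)

# 5 folds, each keeping the class proportions of the full dataset
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```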