<a href="https://colab.research.google.com/github/shankencedric/cs180proj/blob/main/MantisMinds%E2%84%A2%EF%B8%8F_Bucu_Legara_Project_THX.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MantisMinds™️** Final Project
### COMBINING HUMAN AND MACHINE INTELLIGENCE FOR EDUCATIONAL PERFORMANCE PREDICTION
Final project for CS 180 Artificial Intelligence under Ma'am Lyn Gabud, AY 2023-2024 Semester 2.

**BUCU, Clarisse Bianca C.** ---------- 2020-08925 (WFX)

**LEGARA, Sean Ken Cedric G.** ---- 2021-08117 (WFX)

# **Part 1: Introduction**

## Background & Motivation

Predicting student performance is crucial for tailoring educational interventions. Traditional methods often fall short, highlighting the need for innovative approaches. Machine learning (ML) offers a promising avenue, but integrating and interpreting predictions from both human educators and automated systems remains underexplored.

## Problem Statement

The research questions is as follows:

- How does the integration of human predictions in a Random Forest Regressor machine learning model for predicting student performance in exams affect the overall accuracy of the prediction model?

# **Part 2: Importing the Dataset**


## Data sources

The dataset used in the project is "[Performance vs. Predicted Performance](https://www.kaggle.com/datasets/daphnelenders/performance-vs-predicted-performance/)" from Kaggle user CALATHEA21. This dataset contains information about high school students, their human-predicted performance on an exam, and their actual performance.

The dataset is in English, and consists of 856 rows (students). The data for the human-predicted performance was collected by giving 107 participants 8 student profiles each whose final grades they then had to predict. Additionally, a good number of the predicting participants were subjected to some “Stereotype Activation”, suggesting that boys perform less well in school than girls.

Most of the rest of the information in the dataset (student profiles and their actual grade) were derived from another already existing dataset, “[Student Alcohol Consumption](https://www.kaggle.com/datasets/uciml/student-alcohol-consumption)”.

In [None]:
# import ML packages
import sklearn
import pandas as pd

# for cleaner table printing (optional)
pd.set_option('display.expand_frame_repr', False)

# connect to google drive
from google.colab import drive
drive.mount('/content/drive')

# change to correct path
%cd "/content/drive/MyDrive/Year 3 Sem 2/CS 180/Project"

# import csv file of the "Performance vs. Predicted Performance" dataset
orig_dataset = pd.read_csv('Performance vs. Predicted Performance.csv')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Year 3 Sem 2/CS 180/Project


## Data aspects

The features (columns) in the dataset include:

- **index** (int) - the index of each student. This can be used to match to students of the original dataset, “Student Alcohol Consumption”.

- **ParticipantID** (int) - the ID of the participant who made the performance predictions for the corresponding student.

- **name** (string) - fake names for each student, purely to make the prediction task more engaging for participants.

- **sex** (string) - the sex of each student, either female (F) or male (M).

- **studytime** (int) - denotes how long a student studied for their exam, numerically encoded to range from 1 to 3 (where 1 = less than 2 hours, 2 = 2-5 hours, 3 = more than 5 hours).

- **freetime** (int) - the student's amount of freetime, ranges from 1 (low) to 3 (high).

- **romantic** (binary) - denotes whether the student is in a romantic relationship or not.

- **Walc** (int) - the weekly alcohol consumption of student, ranges between 1 (none) to 4 (very high).

- **goout** (int) - amount of time the student goes out per week, ranges from 1 (never) to 4 (thrice or more per week).

- **Parents_edu** (int) - the highest education level of the student's parents. Ranges from 1 to 4, where 4 = highest level of education.

- **absences** (int) - the number of absences per student, ranges from 0 to 7, where 7 represents absences >=7 due to large number of absences being infrequent.

- **reason** (string) - the reason why the student chose to attend their current school. The levels are close to “home”, the school's “reputation”, the school's “course”/curriculum and “other”.

- **G3** (int) - the actual grade each student received for the final exam of the course, ranging from 0-20.

- **Pass** (binary) - a binary variable showing whether G3 is a passing grade (i.e. >=10) or not.

- **Predicted Grade** (binary) - the grade the student was predicted to receive by the participant.

- **Predicted Rank** (int) - how highly the student is ranked (between 1-8) according to their corresponding participant’s grade predictions.

- **Stereotype Activation** (string) - can be one of three forms:
 - “None” - No form of Stereotype Activation was presented.
 - “CaseBased” - Participants were shown three student profiles prior to the prediction task. One belonged to a female student, who was shown to have a high grade for the exam. The other two, belonged to male students, who were shown to have fairly low grades.
 - “Statistics” - Prior to the prediction task, some statistics were shown suggesting that boys perform less well in school than girls.

- **Predicted_Pass_PassFailStrategy** (binary) - one version of the biased binary decision label. This version was obtained by checking whether "Predicted Grade" was >=10 or not.

- **Predicted_Pass_RankingStrategy** (binary) - another version of the biased binary decision label. This version was obtained using "Predicted Rank".

In [None]:
# check the dataset
orig_dataset.head()

Unnamed: 0,index,ParticipantID,name,sex,studytime,freetime,romantic,Walc,goout,Parents_edu,absences,reason,G3,Pass,PredictedGrade,PredictedRank,StereotypeActivation,Predicted_Pass_PassFailStrategy,Predicted_Pass_RankingStrategy
0,132,1,Anna,F,1,2,no,1,2,4,0,course,15,True,17,4,,True,True
1,724,1,Michael,M,1,1,no,4,4,4,1,reputation,11,True,10,7,,True,False
2,637,1,David,M,1,2,no,4,2,2,0,other,11,True,13,6,,True,True
3,884,1,Brian,M,1,1,no,4,4,3,7,home,9,False,10,8,,True,False
4,194,1,Jenny,F,2,2,no,1,4,2,0,reputation,14,True,18,3,,True,True


# **Part 3: Preprocessing**

## Dropping unecessary features

Some columns are not included in the scope of the project and thus have to be dropped. In particular, the following columns are irrelevant:

- **index** : This is merely for identification of students, hence not relevant.
- **ParticipantID** : This is merely for identification of participants, hence not relevant.
- **name** : This is merely to make the prediction task more engaging for participants, hence not relevant.
- **Pass** : We are concerned only with the G3 integer itself, since that is what we aim to predict in the project. Hence, **Pass** is not relevant.
- **PredictedRank** : We will not be predicting ranks, only individual grades. Hence, **PredictedRank** is not relevant.
- **Predicted_Pass_PassFailStrategy, Predicted_Pass_RankingStrategy** : Biased binary decision labels will not be tackled in our project, hence not relevant.

In [None]:
# Drop unnecessary columns
dataset = orig_dataset.drop(['index', 'ParticipantID', 'name', 'Pass', 'PredictedRank', 'Predicted_Pass_PassFailStrategy', 'Predicted_Pass_RankingStrategy'], axis=1)

# check the dataset
dataset.head()

Unnamed: 0,sex,studytime,freetime,romantic,Walc,goout,Parents_edu,absences,reason,G3,PredictedGrade,StereotypeActivation
0,F,1,2,no,1,2,4,0,course,15,17,
1,M,1,1,no,4,4,4,1,reputation,11,10,
2,M,1,2,no,4,2,2,0,other,11,13,
3,M,1,1,no,4,4,3,7,home,9,10,
4,F,2,2,no,1,4,2,0,reputation,14,18,


## Renaming column labels
We will also be renaming the column labels to be more informative and format-consistent, for the sake of easy human interpretation.

In [None]:
# rename the columns
dataset.rename(columns = {
    'sex' : 'Sex',
    'studytime' : 'StudyTime',
    'freetime' : 'FreeTime',
    'romantic' : 'Romantic',
    'Walc' : 'WeeklyAlcoIntake',
    'goout' : 'GoOut',
    'Parents_edu' : 'ParentsEdu',
    'absences' : 'Absences',
    'reason' : 'Reason',
    'G3' : 'FinalGrade'
}, inplace=True)

# check the dataset
dataset.head()

Unnamed: 0,Sex,StudyTime,FreeTime,Romantic,AlcoIntakeWeekend,GoOut,ParentsEdu,Absences,Reason,FinalGrade,PredictedGrade,StereotypeActivation
0,F,1,2,no,1,2,4,0,course,15,17,
1,M,1,1,no,4,4,4,1,reputation,11,10,
2,M,1,2,no,4,2,2,0,other,11,13,
3,M,1,1,no,4,4,3,7,home,9,10,
4,F,2,2,no,1,4,2,0,reputation,14,18,


## Formatting the data
Lastly, we will convert all string data into numbers to aid the machine learning process later on. Note that we will have to fill all null values with `0` first.

In [None]:
# Replace all nulls and NaNs with 0.
dataset.fillna(0, inplace=True)

In [None]:
# Convert all data to numbers.
dataset['Sex'].replace({'F' : 0, 'M' : 1}, inplace=True)
dataset['Romantic'].replace({'no' : 0, 'yes' : 1}, inplace=True)
dataset['Reason'].replace({'course' : 0, 'home' : 3, 'other' : 2, 'reputation' : 1}, inplace=True)
dataset['StereotypeActivation'].replace({'CaseBased' : 1, 'Statistics' : 2}, inplace=True)

# Print new possible values
print('Possible values of Sex:', dataset['Sex'].unique())
print('Possible values of Romantic:', dataset['Romantic'].unique())
print('Possible values of Reason:', dataset['Reason'].unique())
print('Possible values of StereotypeActivation:', dataset['StereotypeActivation'].unique())

# check final preprocessed dataset
dataset.head()

Possible values of Sex: [0 1]
Possible values of Romantic: [0 1]
Possible values of Reason: [0 1 2 3]
Possible values of StereotypeActivation: [0 1 2]


Unnamed: 0,Sex,StudyTime,FreeTime,Romantic,AlcoIntakeWeekend,GoOut,ParentsEdu,Absences,Reason,FinalGrade,PredictedGrade,StereotypeActivation
0,0,1,2,0,1,2,4,0,0,15,17,0
1,1,1,1,0,4,4,4,1,1,11,10,0
2,1,1,2,0,4,2,2,0,2,11,13,0
3,1,1,1,0,4,4,3,7,3,9,10,0
4,0,2,2,0,1,4,2,0,1,14,18,0


## Creating the dataset for comparison

Since our research goal is to determine how the human predictions affect the accuracy of the machine learning model, we must create a copy of the dataset which does not include the human predictions, to be called `datasetNoHuman`, that the same machine learning model will also be trained to. Later on, the result of the two models will be compared to each other to determine if the integration of human predictions led to enhanced accuracy or not.

In [None]:
# Duplicating the dataset and dropping the human-predictions-related columns on one of them
datasetNoHuman = dataset.drop(['PredictedGrade', 'StereotypeActivation'], axis=1)

# Check the 2 datasets
print("Dataset with human predictions:\n", dataset.head())
print("Dataset with no human predictions", datasetNoHuman.head())

Dataset with human predictions:
    Sex  StudyTime  FreeTime  Romantic  AlcoIntakeWeekend  GoOut  ParentsEdu  Absences  Reason  FinalGrade  PredictedGrade  StereotypeActivation
0    0          1         2         0                  1      2           4         0       0          15              17                     0
1    1          1         1         0                  4      4           4         1       1          11              10                     0
2    1          1         2         0                  4      2           2         0       2          11              13                     0
3    1          1         1         0                  4      4           3         7       3           9              10                     0
4    0          2         2         0                  1      4           2         0       1          14              18                     0
Dataset with no human predictions    Sex  StudyTime  FreeTime  Romantic  AlcoIntakeWeekend  GoOut  Pare

# **Part 4: Splitting the Dataset**

## Separating the target feature
We first need to separate the target feature (`FinalGrade`) as the *Y* vector from the rest of the features which comprise the *X* matrix. Note that this is done on both datasets.

In [None]:
# separating the target feature 'FinalGrade' in the w/ human dataset
datasetX = dataset.drop('FinalGrade', axis=1)
datasetY = dataset['FinalGrade']

# check separation results
print(datasetX.head())
print(datasetY.head())

# separating the target feature 'FinalGrade' in the no human dataset
datasetNoHumanX = datasetNoHuman.drop('FinalGrade', axis=1)
datasetNoHumanY = datasetNoHuman['FinalGrade']

# check separation results
print(datasetNoHumanX.head())
print(datasetNoHumanY.head())

   Sex  StudyTime  FreeTime  Romantic  AlcoIntakeWeekend  GoOut  ParentsEdu  Absences  Reason  PredictedGrade  StereotypeActivation
0    0          1         2         0                  1      2           4         0       0              17                     0
1    1          1         1         0                  4      4           4         1       1              10                     0
2    1          1         2         0                  4      2           2         0       2              13                     0
3    1          1         1         0                  4      4           3         7       3              10                     0
4    0          2         2         0                  1      4           2         0       1              18                     0
0    15
1    11
2    11
3     9
4    14
Name: FinalGrade, dtype: int64
   Sex  StudyTime  FreeTime  Romantic  AlcoIntakeWeekend  GoOut  ParentsEdu  Absences  Reason
0    0          1         2         0      

## The 60-20-20 Split
Then, we need to split the dataset into the training, validation, and testing sets with a 60-20-20 respective split. We can do this easily with the `train_test_split` function imported from `sklearn`, but take note that we will be using a constant value (named `randState`) in the `random_state` parameter such that the output is consistent throughout reruns. Note also that we will call this function twice: first to get an 80-20 split (20% for the testing set, 80% for the rest), then to use the rest to get the 60-20 split (60% for the training set, 20% for the validation set).

In [None]:
from sklearn.model_selection import train_test_split
randState = 69

# First split to get testing set (20%) and the rest (80%) [note that the rest is actually the training and validation sets combined]
xRest, xTest, yRest, yTest = train_test_split(datasetX, datasetY, train_size=(80/100), test_size=(20/100), random_state=randState)
xRestNH, xTestNH, yRestNH, yTestNH = train_test_split(datasetNoHumanX, datasetNoHumanY, train_size=(80/100), test_size=(20/100), random_state=randState)

# Second split to get training (60%) and validation sets (20%)
xTrain, xValid, yTrain, yValid = train_test_split(xRest, yRest, train_size=(60/80), test_size=(20/80), random_state=randState)
xTrainNH, xValidNH, yTrainNH, yValidNH = train_test_split(xRestNH, yRestNH, train_size=(60/80), test_size=(20/80), random_state=randState)

# Check shapes
print(f"[Dataset] Training set: {xTrain.shape}")
print(f"[Dataset] Validation set: {xValid.shape}")
print(f"[Dataset] Testing set: {xTest.shape}\n")

print(f"[Dataset with no human predictions] Training set: {xTrainNH.shape}")
print(f"[Dataset with no human predictions] Validation set: {xValidNH.shape}")
print(f"[Dataset with no human predictions] Testing set: {xTestNH.shape}")

[Dataset] Training set: (513, 11)
[Dataset] Validation set: (171, 11)
[Dataset] Testing set: (172, 11)

[Dataset with no human predictions] Training set: (513, 9)
[Dataset with no human predictions] Validation set: (171, 9)
[Dataset with no human predictions] Testing set: (172, 9)


# **Part 4: Training & Validation**

## The Machine Learning Model
As mentioned in the introduction, the machine learning model we will be using is **Random Forest Regressor**. This was the model chosen due to how it excels in predicting continuous values, which in our case would be a student's numerical grade. It is also known to perform well when dealing with data that exhibits non-linear trends. Since we are still not aware how the individual features in the dataset affect the target value, and hence unsure if the trends were linear, this model was a good fit for determining the potentially complex relationships between the inputs.

In [None]:
# importing the model & other needed libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ParameterGrid
from sklearn.metrics import mean_squared_error

## Grid Search for Hyperparameter Tuning

We chose to create a grid search function to iteratively look for the best combination of hyperparameters among our given candidates in a Parameter Grid. Running the training code below results in the accuracy metrics (Mean Squared Error (MSE)) for both models being printed out.

We ran the training code multiple times, and with each iteration adjusting the Parameter Grid values to improve/lessen the MSE until we got the results currently displayed below.

In [None]:
# Parameter grid: to define all parameter value combinations to test
paramGrid = ParameterGrid({
    'n_estimators': [100, 350, 500],
    'max_depth': [None, 50, 200],
    'min_samples_split': [30, 60, 100],
    'min_samples_leaf': [2, 10, 25]
})

# Create grid search function for getting the
def gridSearch(xTrain, yTrain, xValid, yValid):
  bestScore = float('inf')
  bestParams = None
  for params in paramGrid:
    # Train model with the each hyperparameters
    rf = RandomForestRegressor(**params, n_jobs=-1, random_state=randState)
    rf.fit(xTrain, yTrain)

    # Evaluate on the validation set
    yValidPred = rf.predict(xValid)
    predScore = mean_squared_error(yValid, yValidPred)

    # Record if it is the best so far
    if predScore < bestScore:
      bestScore = predScore
      bestParams = params

  return bestScore, bestParams

# Grid search: to find the optimal hyperparameters
bestScore, bestParams = gridSearch(xTrain, yTrain, xValid, yValid)
bestScoreNH, bestParamsNH = gridSearch(xTrainNH, yTrainNH, xValidNH, yValidNH)

print(f"[Dataset] Found the best hyperparams as:\n{bestParams}")
print(f"[Dataset with no human predictions] Found the best hyperparams as:\n{bestParamsNH}\n")

# Train the data with the optimal hyperparameters
tunedModel = RandomForestRegressor(**bestParams, random_state=randState)
tunedModel.fit(xTrain, yTrain)
tunedModelNH = RandomForestRegressor(**bestParamsNH, random_state=randState)
tunedModelNH.fit(xTrainNH, yTrainNH)

# Evaluate data on the training set
trainPred = tunedModel.predict(xTrain)
trainPredNH = tunedModelNH.predict(xTrainNH)
print(f"[Dataset] Score (TRAINING): {mean_squared_error(yTrain, trainPred)}")
print(f"[Dataset with no human predictions] Score (TRAINING): {mean_squared_error(yTrainNH, trainPredNH)}\n")

# Evaluate data on the validation set
validPred = tunedModel.predict(xValid)
validPredNH = tunedModelNH.predict(xValidNH)
print(f"[Dataset] Score (VALIDATION): {mean_squared_error(yValid, validPred)}")
print(f"[Dataset with no human predictions] Score (VALIDATION): {mean_squared_error(yValidNH, validPredNH)}\n")

# Evaluate data on the testing set
testPred = tunedModel.predict(xTest)
testPredNH = tunedModelNH.predict(xTestNH)
print(f"[Dataset] Score (TESTING): {mean_squared_error(yTest, testPred)}")
print(f"[Dataset with no human predictions] Score (TESTING): {mean_squared_error(yTest, testPredNH)}\n")

[Dataset] Found the best hyperparams as:
{'max_depth': None, 'min_samples_leaf': 25, 'min_samples_split': 100, 'n_estimators': 500}
[Dataset with no human predictions] Found the best hyperparams as:
{'max_depth': 50, 'min_samples_leaf': 2, 'min_samples_split': 60, 'n_estimators': 100}

[Dataset] Score (TRAINING): 7.146913299823962
[Dataset with no human predictions] Score (TRAINING): 6.670418817848976

[Dataset] Score (VALIDATION): 7.383019776512701
[Dataset with no human predictions] Score (VALIDATION): 7.350502910322323

[Dataset] Score (TESTING): 6.248553225347643
[Dataset with no human predictions] Score (TESTING): 6.165145462942863



# **Part 5: Results & Conclusions**

# **Demo Code**
### The contents of this cell is also a file found in the [project repository](https://github.com/shankencedric/cs180proj).

In [31]:
def predictHuman(Sex, StudyTime, FreeTime, Romantic, WeeklyAlcoIntake, GoOut, ParentsEdu, Absences, Reason, FinalGrade, PredictedGrade, StereotypeActivation):
  if PredictedGrade == None and StereotypeActivation == None:
    # Predict with no human predictions
    inp = pd.DataFrame([Sex, StudyTime, FreeTime, Romantic, WeeklyAlcoIntake, GoOut, ParentsEdu, Absences, Reason],
                       columns=['Sex', 'StudyTime', 'FreeTime', 'Romantic', 'WeeklyAlcoIntake', 'GoOut', 'ParentsEdu', 'Absences', 'Reason'])
    pred = tunedModelNH.predict(inp)
  else:
    # Predict with human predictions
    inp = pd.DataFrame([Sex, StudyTime, FreeTime, Romantic, WeeklyAlcoIntake, GoOut, ParentsEdu, Absences, Reason, PredictedGrade, StereotypeActivation],
                       columns=['Sex', 'StudyTime', 'FreeTime', 'Romantic', 'WeeklyAlcoIntake', 'GoOut', 'ParentsEdu', 'Absences', 'Reason'])
    pred = tunedModel.predict(inp)
  return pred


if __name__ == "__main__":
  sex = input("Sex (0 for M, 1 for F): ")
  studytime = input("How long did you study for the exam (1 if <2 hours, 2 if 2-5 hours, 3 if >5 hours): ")
  freetime = input("How much is your free time (1 is low, 3 is high): ")
  romantic = input("Are you in any form of romantic relationship (0 if no, 1 if yes): ")
  alocintake = input("How much alcohol do you consume (1 is low, 4 is high): ")
  goout = input("How much do you go out (1 if never, 4 if very often): ")
  parentsedu = input("Highest educ level of your parents (1 lowest, 4 highest): ")
  absences = input("Your absences (raw amount from 1 to 7, if >7 just put 7): ")
  reason = input("Your main reason for choosing to attend this school (0 if course, 1 if school reputation, 2 if other, 3 if close to home): ")
  finalgrade = input("(Optional) The grade you got for this exam (1-20, or press enter to skip): ")
  predictedgrade = input("(Optional) Enter a grade someone who saw all your input above will predict (1-20, or press enter to skip): ")
  stereotypeactivation = input("(Optional) The stereotype activation of the predictor (0 if none, 1 if case-based, 2 if statistics, or press enter to skip): ")

  if (len(finalgrade) == 0): finalgrade = None
  if (len(predictedgrade) == 0): predictedgrade = None
  if (len(stereotypeactivation) == 0): stereotypeactivation = None

  prediction = predictHuman(sex, studytime, freetime, romantic, alocintake, goout, parentsedu, absences, reason, finalgrade, predictedgrade, stereotypeactivation)
  print("The machine predicted you will have a grade of:", prediction)
  if finalgrade != '':
    print("Note that this is", finalgrade - prediction, "off your actual grade")

Sex (0 for M, 1 for F): 1
How long did you study for the exam (1 if <2 hours, 2 if 2-5 hours, 3 if >5 hours): 1
How much is your free time (1 is low, 3 is high): 1
Are you in any form of romantic relationship (0 if no, 1 if yes): 1
How much alcohol do you consume (1 is low, 4 is high): 1
How much do you go out (1 if never, 4 if very often): 1
Highest educ level of your parents (1 lowest, 4 highest): 1
Your absences (raw amount from 1 to 7, if >7 just put 7): 1
Your main reason for choosing to attend this school (0 if course, 1 if school reputation, 2 if other, 3 if close to home): 1
(Optional) The grade you got for this exam (1-20, or press enter to skip): 
(Optional) Enter a grade someone who saw all your input above will predict (1-20, or press enter to skip): 1
(Optional) The stereotype activation of the predictor (0 if none, 1 if case-based, 2 if statistics, or press enter to skip): 


ValueError: Shape of passed values is (11, 1), indices imply (11, 9)