# Labeling ILOs Automatically

A rough outline of this notebook:
1. Wrap an existing (already trained) transformer model in a `FunctionTransformer` to make it compatible with the scikit-learn API.
2. Load and encode the intended learning outcome (ILO) data made public by Yuheng Li et al. $[1]$.
3. Perform a hyper-parameter search using Bayesian optimization over an MLP that classifies ILOs according to Bloom's taxonomy.
4. Use the resulting MLP to classify ILOs from our own course data.

We currently use a fixed pretrained transformer, selected following Yuheng Li et al. $[1]$. In the future, it would be interesting to extend the hyper-parameter search to cover alternative pretrained transformers as well.

In [7]:
from typing import Sequence
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import FunctionTransformer

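class TextEncoder(FunctionTransformer):
    """Scikit-learn compatible wrapper around a pretrained sentence transformer."""

    # Note: the default model is instantiated once, when the class is defined,
    # and is shared by every TextEncoder created without an explicit encoder.
    def __init__(self, encoder: SentenceTransformer = SentenceTransformer('all-MiniLM-L6-v2')):
        super().__init__(func=self._encode)
        self.encoder = encoder

    def _encode(self, X: Sequence[str]):
        # Returns an array of shape (n_sentences, embedding_dim).
        return self.encoder.encode(sentences=list(X), convert_to_numpy=True)

def make_encoder(encoder: SentenceTransformer | None = None):
    # Fall back to the default model when no encoder is given.
    return TextEncoder(encoder=encoder) if encoder is not None else TextEncoder()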

In [2]:
import pandas as pd
import numpy as np

df = pd.read_csv('sample_full.csv')
df.dropna(subset=['Learning_outcome'], inplace=True)
df.replace(to_replace=np.nan, value=0, inplace=True)
df.replace(to_replace=1., value=1, inplace=True)
df.replace(to_replace=0., value=0, inplace=True)
numeric = df.select_dtypes(include=np.number)
df[numeric.columns] = df[numeric.columns].astype(dtype=int)
df

| | Learning_outcome | Remember | Understand | Apply | Analyze | Evaluate | Create |
|---|---|---|---|---|---|---|---|
| 0 | Analyze the health economic implications of e... | 0 | 0 | 0 | 1 | 0 | 0 |
| 1 | Apply research skills to operate effectively ... | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | Assess and synthesise diverse information abo... | 0 | 0 | 0 | 0 | 1 | 1 |
| 3 | Describe the general characteristics of the m... | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | Evaluate the different models of perioperativ... | 0 | 0 | 0 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 21375 | Write/type simple sentences using hiragana, ka... | 0 | 0 | 1 | 0 | 0 | 0 |
| 21376 | Writing of assessment reports and giving feedb... | 0 | 0 | 0 | 0 | 1 | 0 |
| 21377 | You will develop the ability to work in a team... | 0 | 0 | 1 | 0 | 0 | 0 |
| 21378 | You will develop their oral presentation skill... | 0 | 0 | 0 | 0 | 0 | 1 |


In [8]:
targets = numeric.columns
encoder = make_encoder()
X = encoder.transform(X=df['Learning_outcome'])
y = df[[col for col in df.columns if col != 'Learning_outcome']].to_numpy()

print(X.shape, y.shape)

(21380, 384) (21380, 6)


In [4]:
from skopt import BayesSearchCV
from skopt.space import Real, Integer
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.1)

classifier = BayesSearchCV(
    estimator=MLPClassifier(max_iter=10000, batch_size='auto', early_stopping=True, solver='adam', hidden_layer_sizes=(500, 350)),
    search_spaces={'n_iter_no_change': Integer(low=1, high=10, prior='uniform'),
                   'learning_rate_init': Real(low=0.001, high=0.1, prior='uniform'),
                   'validation_fraction': Real(low=0.1, high=0.5, prior='uniform'),
                   'alpha': Real(low=0.0001, high=2., prior='uniform')},
    n_iter=50,
    scoring='accuracy',
    cv=5,
    refit=True,
    verbose=0,
    n_jobs=-3,
    n_points=5
)
classifier.fit(X=train_X, y=train_y);

Because this is a multilabel classification task, *accuracy* is an unintuitive metric that is very harsh on the model: in the multilabel case, scikit-learn's `accuracy_score` computes subset accuracy, so a prediction counts as correct only if all six labels match. To keep the score interpretable, we also fit a dummy classifier that assigns each label uniformly at random, and use it as a baseline to compare against our best model.

In [9]:
from sklearn.dummy import DummyClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import accuracy_score

dummy = MultiOutputClassifier(estimator=DummyClassifier(strategy='uniform')).fit(X=train_X, Y=train_y)
baseline = accuracy_score(y_true=test_y, y_pred=dummy.predict(X=test_X))
best = accuracy_score(y_true=test_y, y_pred=classifier.predict(X=test_X))
print(f'Best score: {best}\nBaseline: {baseline}')

Best score: 0.6188026192703461
Baseline: 0.015434985968194575
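
To make the gap between the two numbers easier to read, here is a small sketch of what exact-match (subset) accuracy measures and how it differs from per-label accuracy. The helper names and the toy label vectors are made up for illustration and are not part of the notebook. The last line computes the chance of guessing six independent binary labels correctly with fair coin flips, which is close to the baseline reported above.

```python
def subset_accuracy(y_true, y_pred) -> float:
    """Fraction of samples whose entire label vector is predicted exactly."""
    matches = sum(tuple(t) == tuple(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred) -> float:
    """Fraction of individual label cells predicted correctly."""
    total = sum(len(t) for t in y_true)
    correct = sum(a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return correct / total


# Toy data (made up): three ILOs, six Bloom levels each.
y_true = [[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
y_pred = [[0, 0, 0, 1, 0, 0], [0, 0, 1, 0, 0, 1], [0, 0, 0, 0, 1, 0]]

# Only the first ILO matches on all six labels, although 16 of 18 cells are right.
print(subset_accuracy(y_true, y_pred))
print(per_label_accuracy(y_true, y_pred))

# Chance of matching six independent binary labels with uniform random guesses.
chance = 0.5 ** 6
print(chance)
```

Under this reading, a subset accuracy of about 0.62 means that roughly six in ten test ILOs received exactly the right set of Bloom levels.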


Should we need to invoke the model in the future, we can always save it and reload it later.

In [10]:
import pickle
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(encoder, classifier)

with open('ILOClassifier.pkl', 'wb') as file:
    pickle.dump(pipeline, file)
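
Reloading is not shown in this notebook. A minimal sketch of the round trip, assuming the same file-based approach as above: `save_model` and `load_model` are hypothetical helper names, and a plain dictionary stands in for the fitted pipeline so that the example runs without the trained model.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path) -> None:
    """Serialize a model object to disk with pickle."""
    with open(path, 'wb') as file:
        pickle.dump(model, file)


def load_model(path):
    """Load a previously pickled model object from disk."""
    with open(path, 'rb') as file:
        return pickle.load(file)


# Stand-in for the fitted pipeline (assumption: any picklable object works the same way).
stand_in = {'steps': ['textencoder', 'bayessearchcv']}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'stand_in.pkl'
    save_model(stand_in, path)
    restored = load_model(path)

print(restored == stand_in)
```

When loading the real `ILOClassifier.pkl`, the `TextEncoder` class has to be defined or importable in the loading session, because pickle stores a reference to the class rather than its code. Pickle files should only be loaded from trusted sources.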

Now we load our course data and label all ILOs.

In [11]:
import json

with open('P:\\FSE_UCM\\recommender systeem MSLAS\\course data\\course_data.json', 'r') as file:
    course_data = json.load(file)

for course in course_data:
    ilos = list(course['ilo'])

    if ilos:
        # Each row of P holds the predicted probability of every Bloom level for one ILO.
        P = pipeline.predict_proba(X=ilos)
        course['proba'] = {ilo: {target.lower(): float(p) for target, p in zip(targets, row)} for ilo, row in zip(ilos, P)}

course_data

[{'code': 'COR1002',
  'title': 'Philosophy of Science',
  'level': 1,
  'period': [2, 5],
  'ects': 5.0,
  'type': 'COR',
  'device_free': False,
  'prereq_desc': ' None . ',
  'rec_desc': ' It is strongly recommended not to take this course in your first or second semester. ',
  'desc': 'Typical issues in this course are: What is the role of observation in science? What is a scientific explanation? What roles do theories and experiments play in science? What is the nature of scientific progress? Can we rationally decide between scientific viewpoints? In what ways are the social sciences similar to or different from the natural sciences? The course presents an introduction to major issues in the philosophy of science. It can be divided into four parts. In the first, we will deal with traditional positions on the objectivity and methodology of science, like those of logical empiricism. The second focuses on objections to this received view as formulated by critical rationalism and by T

In [12]:
with open('P:\\FSE_UCM\\recommender systeem MSLAS\\course data\\augmented_course_data_v2.json', 'w') as file:
    json.dump(course_data, file)
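
The `proba` entries written above hold, for every ILO, one probability per Bloom level. One way to turn them back into labels is sketched below. The 0.5 threshold, the fallback to the single most probable level, and the function name `labels_from_proba` are assumptions for illustration rather than part of this notebook, and the probabilities in the example are made up.

```python
def labels_from_proba(proba: dict, threshold: float = 0.5) -> list:
    """Return the Bloom levels whose probability reaches the threshold.

    If no level reaches it, fall back to the single most probable level.
    """
    chosen = [level for level, p in proba.items() if p >= threshold]
    if not chosen and proba:
        chosen = [max(proba, key=proba.get)]
    return chosen


# Made-up probabilities for two hypothetical ILOs.
confident = {'remember': 0.02, 'understand': 0.10, 'apply': 0.05,
             'analyze': 0.08, 'evaluate': 0.91, 'create': 0.64}
uncertain = {'remember': 0.10, 'understand': 0.40, 'apply': 0.30,
             'analyze': 0.20, 'evaluate': 0.05, 'create': 0.01}

print(labels_from_proba(confident))
print(labels_from_proba(uncertain))
```

Storing probabilities instead of hard labels keeps this choice open: a stricter or looser threshold can be applied later without rerunning the model.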

### Summary

We took a pretrained sentence transformer, a language model using a transformer neural network trained to take sentences and encode them as fixed-length vectors, and used it to transform all ILOs into vectors. Then, we used a multilayer perceptron (MLP) to classify said vectors according to Bloom's taxonomy.

To obtain a model that generalizes well, we performed a hyper-parameter search using a Bayesian optimization framework. Given that the model performed well in cross-validation and reached an accuracy of about 0.62 on the held-out test set, against about 0.015 for the random baseline, we expect it to perform well on new ILOs, should the course catalog be updated. However, as the authors of the dataset pointed out, the labels are noisy: no two human experts labeling ILOs according to Bloom's taxonomy will agree in every case, which introduces noise into the examples they produce.

# Bibliography

[1] Li, Y., Rakovic, M., Poh, B. X., Gasevic, D., & Chen, G. (2022). Automatic Classification of Learning Objectives Based on Bloom’s Taxonomy. In A. Mitrovic & N. Bosch (Eds.), Proceedings of the 15th International Conference on Educational Data Mining (pp. 530-537). International Educational Data Mining Society. https://doi.org/10.5281/zenodo.6853191

**by Dennis Alexander Mertens Velasquez**