#  Lab 07 - Knowledge Tracing

## Introduction

During the last lectures and lab session, you delved into one notable application of machine learning in education, namely knowledge tracing. Machine-learning models optimized for this task aim to estimate how well a student is learning a portfolio of skills. Monitoring this knowledge with automated models makes it possible to personalize online learning platforms, focusing the assessment on skills the student is weak in and accelerating the learning of certain skills.

You are asked to work on the ASSISTments data set presented last week and to complete the following tasks:

- Compare three knowledge tracing models (BKT, AFM, PFA) in terms of AUC and RMSE.
- Generate and discuss the learning curves for a BKT model on a specific set of skills. 

You can use [pyBKT](https://github.com/CAHLR/pyBKT) and [pyAFM](https://github.com/cmaclell/pyAFM/) throughout this tutorial.


In [None]:
# Principal package imports
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import scipy as sc

# Scikit-learn package imports
from sklearn import feature_extraction, model_selection, metrics

# PyBKT package imports
from pyBKT.models import Model

# PyAFM package imports
from pyafm.custom_logistic import CustomLogistic

### YOUR ADDITIONAL IMPORT STATEMENTS BELOW (please, do not make any imports elsewhere in the notebook) ###

## The Data Set
---

ASSISTments is a free tool for assigning and assessing math problems and homework. Teachers can select and assign problem sets. Once they get an assignment, students can complete it at their own pace and with the help of hints, multiple chances, and immediate feedback. Teachers get instant results broken down by individual student or for the whole class. More information on the platform can be found [here](https://www.commonsense.org/education/website/assistments). 

In this homework, we will play with a simplified version of a dataset collected from the ASSISTments tool, saved in a CSV file with the following columns:  


| Name | Description |
| --- | --- |
| user_id | The ID of the student who is solving the problem. |
| order_id | The temporal ID (timestamp) associated with the student's answer to the problem. |
| problem_id | The ID of the problem. |
| skill_name | The name of the skill associated with the problem. |
| correct | The student's performance on the problem: 1 if the problem's answer is correct at the first attempt, 0 otherwise. |
| prior_success | The number of prior problems on that skill the student correctly answered at the first attempt. |
| prior_failure | The number of prior problems on that skill the student wrongly answered at the first attempt. |
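
To make the meaning of `prior_success` and `prior_failure` concrete, here is a minimal sketch of how such counters could be derived from `correct`. The toy frame below is hypothetical and assumes rows are already sorted chronologically within each student; the provided CSV already contains both columns, so this is for illustration only.

```python
import pandas as pd

# Hypothetical toy interactions (not taken from the real data set)
toy = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 2],
    'order_id':   [1, 2, 3, 4, 5],
    'skill_name': ['Mode', 'Mode', 'Mode', 'Mode', 'Mode'],
    'correct':    [1, 0, 1, 0, 1],
})

toy = toy.sort_values('order_id')
grouped = toy.groupby(['user_id', 'skill_name'])['correct']

# Correct answers strictly before the current interaction
toy['prior_success'] = grouped.cumsum() - toy['correct']
# Earlier attempts on the skill minus earlier successes
toy['prior_failure'] = grouped.cumcount() - toy['prior_success']

print(toy)
```

The sum `prior_success + prior_failure` is the number of earlier opportunities the student had on that skill.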

Load the data set. 

In [None]:
DATA_DIR = "./../../data/"
data = pd.read_csv(DATA_DIR + 'as_hw_cmp.csv')

Compute the total number of interactions, the number of unique students, and the number of unique skills.

In [None]:
### YOUR CODE HERE ###

What are those skills?

In [None]:
### YOUR CODE HERE ###

<a id="section1"></a>
## 1  Knowledge Tracing: Model Performance Comparison 
----

In this section, we ask you to evaluate (i) a Bayesian Knowledge Tracing (BKT) model, (ii) an Additive Factor Model (AFM), and (iii) a Performance Factor Analysis (PFA) model on the skills 'Circle Graph', 'Venn Diagram', and 'Mode', by performing a user-stratified 10-fold cross validation (all interactions of a given student fall in the same fold) and monitoring the Root Mean Squared Error (RMSE) and the Area Under the ROC Curve (AUC) as performance metrics. Then, we ask you to visually report the RMSE and AUC scores achieved by the three student models in the user-stratified 10-fold cross validation, in such a way that the models' performance can be easily and appropriately compared.
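
As a reminder of what the two metrics measure, the sketch below computes them by hand with NumPy on made-up labels and predicted probabilities. The helper names `rmse` and `auc` are hypothetical; in the tasks themselves, `sklearn.metrics.mean_squared_error` and `sklearn.metrics.roc_auc_score` (or the model's own evaluation methods) can be used instead.

```python
import numpy as np

def rmse(y_true, y_prob):
    # Root of the mean squared difference between labels and predicted probabilities
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_prob) ** 2)))

def auc(y_true, y_prob):
    # Probability that a random correct answer gets a higher score than a
    # random incorrect one (ties count as one half)
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

# Made-up example values
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2]

rmse_value = rmse(y_true, y_prob)
auc_value = auc(y_true, y_prob)
print(rmse_value, auc_value)
```

A lower RMSE and a higher AUC both indicate better predictions; an AUC of 0.5 corresponds to random ranking.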

For your convenience, you will be guided in completing this section through six main tasks:
- Task 1.1: Group k-fold initialization.
- Task 1.2: BKT evaluation.
- Task 1.3: AFM evaluation.
- Task 1.4: PFA evaluation.
- Task 1.5: Performance metrics plotting.
- Task 1.6: Performance metrics discussion. 

<a id="section1.1"></a>
### Task 1.1

Given that the main objective of this homework section is to evaluate three student knowledge tracing models under a `user-stratified 10-fold cross validation`, in this task, we ask you to complete the body of a function named `create_iterator`. This function should create an iterator object that splits the student interactions included in `data` into `10 folds` such that the same student does not appear in multiple folds. To do so, you can initialize a scikit-learn GroupKFold iterator with non-overlapping groups and return the iterator, i.e., `model_selection.GroupKFold(...).split(...)`. 

For convenience, we present an illustrative example assuming that (i) you have four data samples and that (ii) the first two data samples belong to group 0 and the last two data samples belong to group 2. The data samples associated with a group should not appear in multiple folds or, in other words, the data samples associated with a group should all appear in the same fold. Below is one way to use scikit-learn's GroupKFold object to create folds that meet this property (here, we simulate this scenario with only 2 folds):

```python
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
groups = np.array([0, 0, 2, 2])
group_kfold = model_selection.GroupKFold(n_splits=2).split(X, y, groups)
```

Finally, we provide an illustrative example, unrelated to the task, of how this iterator can then be used to generate training and test folds:

```python
for train_index, test_index in group_kfold:
    print('TRAIN:', train_index, 'TEST:', test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print(X_train, '-', X_test, '-', y_train, '-', y_test)
```

The above for loop generates the following output. It can be observed that the data samples belonging to a group all appear in the same fold, as expected. 

```
TRAIN: [0 1] TEST: [2 3]
[[1 2] [3 4]] - [[5 6] [7 8]] - [1 2] - [3 4]
TRAIN: [2 3] TEST: [0 1]
[[5 6] [7 8]] - [[1 2] [3 4]] - [3 4] - [1 2]
```

More information about the GroupKFold iterator is available in the [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html) documentation.

In [None]:
def create_iterator(data):
    '''
    Create an iterator to split interactions in data in 10 folds, with the same student not appearing in multiple folds.
    :param data:        Dataframe with student's interactions.
    :return:            An initialized GroupKFold iterator.
    '''
    ### YOUR CODE HERE ###
    raise NotImplementedError()

Check the outputs of this function and the properties of the iterator. Make sure that, at each iteration, (i) no user_id appears in both the train and the test set (no overlap) and (ii) every user_id in the data set is contained in either the train or the test set (union). Across all iterations, each user should appear in the test set exactly once. Feel free to check any other property you find useful.
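
The kind of checks described above can be sketched on toy data as follows. The group ids are hypothetical stand-ins for `user_id` values, and 3 folds are used instead of 10 to keep the example small; the same assertions can be adapted to the iterator returned by `create_iterator`.

```python
import numpy as np
from sklearn import model_selection

# Hypothetical user ids, one per interaction
groups = np.array([10, 10, 11, 12, 12, 12, 13, 14])
X = np.zeros((len(groups), 1))
y = np.zeros(len(groups))

tested = []
splitter = model_selection.GroupKFold(n_splits=3).split(X, y, groups)
for train_index, test_index in splitter:
    train_users = set(groups[train_index].tolist())
    test_users = set(groups[test_index].tolist())
    # No user appears on both sides of the split
    assert train_users.isdisjoint(test_users)
    # Together, train and test cover every user
    assert train_users | test_users == set(groups.tolist())
    tested.extend(sorted(test_users))

# Each user is tested exactly once across all folds
assert sorted(tested) == sorted(set(groups.tolist()))
print('All checks passed')
```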

In [None]:
### YOUR CODE HERE ###

<a id="section1.2"></a>
### Task 1.2

In this task, we ask you to evaluate a `BKT model` with all default parameters, namely `Model(seed=0)` in pyBKT, through a `user-stratified 10-fold cross-validation`, computing the following performance metrics: `RMSE` and `AUC`. To do so, you should use the `create_iterator` function, defined in Task 1.1, to create the training and test set for each fold, starting from the interactions in `data`. 

No plotting is needed; it is enough to print the scores for each metric in the cell.

Please, note that this task may require a long running time (e.g., about 40 to 90 minutes), depending on your implementation and device. Just as an indication, on a Dell XPS 13, one fold lasts around 7 minutes.

See the pyBKT [documentation](https://github.com/CAHLR/pyBKT) for details on how to use the `BKT model`.
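
For intuition about what a BKT model computes, here is a minimal pure-Python sketch of the standard BKT prediction and update equations for a single student and skill. The function name `bkt_trace` and the parameter values are hypothetical and chosen only for illustration; pyBKT estimates such parameters from the data, and this sketch is not its implementation.

```python
def bkt_trace(responses, p_init, p_learn, p_guess, p_slip):
    """Return the predicted P(correct) before each response and the final P(known)."""
    p_known = p_init
    predictions = []
    for correct in responses:
        # Prediction: the skill is known and no slip, or unknown and a lucky guess
        p_correct = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        predictions.append(p_correct)
        # Posterior probability that the skill was known, given the observation
        if correct:
            posterior = p_known * (1 - p_slip) / p_correct
        else:
            posterior = p_known * p_slip / (1 - p_correct)
        # Learning transition before the next opportunity
        p_known = posterior + (1 - posterior) * p_learn
    return predictions, p_known

# Arbitrary illustrative parameters and responses
preds, final_known = bkt_trace([1, 1, 0, 1], p_init=0.5, p_learn=0.1, p_guess=0.2, p_slip=0.1)
print(preds, final_known)
```

Each correct answer raises the estimated mastery and each incorrect answer lowers it, before the learning transition is applied.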

In [None]:
### YOUR CODE HERE (please, feel free to add extra cells to solve this task, after this first one) ###
rmse_bkt, auc_bkt = [], []
for iteration, (train_index, test_index) in enumerate(create_iterator(data)):
    # Split data in training and test sets
    X_train, X_test = ...
    # Initialize the model
    model = ...
    # Fit the model
    %time ...
    # Compute RMSE (use the model evaluation methods)
    train_rmse = ...
    test_rmse = ...
    rmse_bkt.append(test_rmse)
    # Compute AUC (use the model evaluation methods)
    train_auc = ...
    test_auc = ...
    auc_bkt.append(test_auc)
    # Print progress
    print('Iteration:', iteration, 'RMSE', (train_rmse, test_rmse), 'AUC', (train_auc, test_auc))

Compute the mean and standard deviation for both metrics.
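
As a small reminder of the NumPy calls involved, the sketch below summarizes a list of per-fold scores. The values are placeholders, not real results. Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation, and whichever convention you choose should be used consistently across models.

```python
import numpy as np

# Placeholder per-fold scores (not real results)
fold_scores = [0.70, 0.72, 0.68, 0.74]

mean_score = np.mean(fold_scores)
std_population = np.std(fold_scores)      # divides by n
std_sample = np.std(fold_scores, ddof=1)  # divides by n - 1

print(mean_score, std_population, std_sample)
```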

In [None]:
### YOUR CODE HERE ###

<a id="section1.3"></a>
### Task 1.3

In this task, we ask you to evaluate an `AFM model` with all default parameters (e.g., no custom bounds, default l2 regularization, and fit_intercept=True) through a `user-stratified 10-fold cross-validation`, computing the following performance metrics: `RMSE` and `AUC`. To do so, exactly as you should have done for the BKT model in Task 1.2, you should use the `create_iterator` function, defined in Task 1.1, to create the training and test set for each fold, starting from the interactions in `data`. 
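
As a reminder, and assuming the usual formulation, AFM models the log-odds of a correct answer by student $i$ on item $j$ as a sum of a student term, a skill term, and a skill-specific learning rate multiplied by the number of prior opportunities. The symbols below are notation introduced here for illustration: $\theta_i$ is the student proficiency, $q_{jk}$ indicates whether item $j$ involves skill $k$, $\beta_k$ is the skill easiness, $\gamma_k$ is the skill learning rate, and $T_{ik}$ is the number of prior opportunities (here, `prior_success + prior_failure`).

$$
\log\frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_k q_{jk}\,\bigl(\beta_k + \gamma_k\, T_{ik}\bigr)
$$

These three groups of terms correspond to the student, skill, and opportunity feature blocks built by the utility functions in this task.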

No plotting is needed; it is enough to print the scores for each metric in the cell.

The following cells include some utility functions that are needed to generate the `X` and `y` data in a format that is accepted by pyAFM model objects. To complete this task, you can build on top of the `X` and `y` created for you with the following cells. Please, refer to Tutorial 6 for further information on pyAFM. 

See [this example](https://github.com/cmaclell/pyAFM/blob/6150afdef7ab2eabff6c439accb5f9f81af34129/afms_workflow_predict.py#L11) for ideas on how to use the `AFM model`.
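
The utility functions in this task rely on scikit-learn's `DictVectorizer` to turn lists of dictionaries into numeric columns. The sketch below shows that encoding on a hypothetical three-interaction example, so the layout of the resulting `X` matrix is easier to picture.

```python
from sklearn import feature_extraction

# Hypothetical interactions: one dict per row, keyed by skill name
skills = [{'Mode': 1}, {'Venn Diagram': 1}, {'Mode': 1}]
opportunities = [{'Mode': 0}, {'Venn Diagram': 0}, {'Mode': 1}]

qv = feature_extraction.DictVectorizer()
ov = feature_extraction.DictVectorizer()
Q = qv.fit_transform(skills).toarray()         # one indicator column per skill
O = ov.fit_transform(opportunities).toarray()  # one opportunity-count column per skill

print(qv.feature_names_)
print(Q)
print(O)
```

Each row has a 1 in the column of its skill in `Q` and the number of prior opportunities in the matching column of `O`; all other entries are 0.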

In [None]:
def read_as_student_step(data):    
    skills, opportunities, corrects, user_ids = [], [], [], []
    
    for row_id, (_, row) in enumerate(data.iterrows()):
        
        # Get attributes for the current interaction 
        user_id = row['user_id']
        skill_name = row['skill_name']
        correct = row['correct']
        prior_success = row['prior_success']
        prior_failure = row['prior_failure']
        
        # Update the number of opportunities this student had with this skill
        opportunities.append({skill_name: prior_success + prior_failure})
        
        # Record the skill involved in the current interaction
        skills.append({skill_name: 1})

        # Answer info
        corrects.append(correct)
        
        # Student info
        user_ids.append({user_id: 1})
        
    return (skills, opportunities, corrects, user_ids)

In [None]:
def prepare_data_afm(skills, opportunities, corrects, user_ids):

    sv = feature_extraction.DictVectorizer()
    qv = feature_extraction.DictVectorizer()
    ov = feature_extraction.DictVectorizer()
    S = sv.fit_transform(user_ids)
    Q = qv.fit_transform(skills)
    O = ov.fit_transform(opportunities)
    X = sc.sparse.hstack((S, Q, O))
    y = np.array(corrects)

    return (X.toarray(), y)

Prepare the X and y arrays to be used to evaluate the AFM model. 

In [None]:
%time skills, opportunities, corrects, user_ids = read_as_student_step(data)
%time X, y = prepare_data_afm(skills, opportunities, corrects, user_ids)

In [None]:
### YOUR CODE HERE (please, feel free to add extra cells to solve this task, after this first one) ###
rmse_afm, auc_afm = [], []
for iteration, (train_index, test_index) in enumerate(create_iterator(data)):
    # Split data in training and test sets
    X_train, X_test = ...
    y_train, y_test = ...
    # Initialize and fit the model
    afm = ...
    %time ...
    # Make predictions 
    y_train_pred = ...
    y_test_pred = ...
    # Compute RMSE (use metrics package methods)
    train_rmse = ...
    test_rmse = ...
    rmse_afm.append(test_rmse)
    # Compute AUC (use metrics package methods)
    train_auc = ...
    test_auc = ...
    auc_afm.append(test_auc)
    # Print progress
    print('Iteration:', iteration, 'RMSE', (train_rmse, test_rmse), 'AUC', (train_auc, test_auc))

<a id="section1.4"></a>
### Task 1.4

In this task, we ask you to evaluate a `PFA model` with all default parameters (e.g., no custom bounds, default l2 regularization, and fit_intercept=True) through a `user-stratified 10-fold cross-validation`, computing the following performance metrics: `RMSE` and `AUC`. To do so, exactly as you should have done for the BKT and AFM models in Tasks 1.2 and 1.3, you should use the `create_iterator` function, defined in Task 1.1, to create the training and test set for each fold, starting from the interactions in `data`. 
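
As a reminder, PFA replaces the single opportunity count of AFM with separate counts of prior successes and prior failures, each with its own skill-specific weight. With the feature blocks built in this task (student, skill, successes, failures), the model can be written as follows. The notation is introduced here for illustration: $s_{ik}$ and $f_{ik}$ are the prior successes and failures of student $i$ on skill $k$ (`prior_success` and `prior_failure`), and $\gamma_k$ and $\rho_k$ are their weights. Note that some presentations of PFA omit the student term $\theta_i$.

$$
\log\frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_k q_{jk}\,\bigl(\beta_k + \gamma_k\, s_{ik} + \rho_k\, f_{ik}\bigr)
$$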

No plotting is needed; it is enough to print the scores for each metric in the cell.

The following cells include some utility functions that are needed to generate the `X` and `y` data in a format that is accepted by pyAFM model objects. To complete this task, you can build on top of the `X` and `y` created for you with the following cells. Please, refer to Tutorial 6 for further information on pyAFM. 

In [None]:
def read_as_success_failure(data):
    n_succ, n_fail = [], []

    # Create the n_succ and n_fail variables required by pyAFM
    for i, row in data.iterrows():
        n_succ.append({row['skill_name']: int(row['prior_success'])})
        n_fail.append({row['skill_name']: int(row['prior_failure'])})
        
    return n_succ, n_fail

In [None]:
def prepare_data_pfa(skills, corrects, user_ids, n_succ, n_fail):
    
    s = feature_extraction.DictVectorizer()
    q = feature_extraction.DictVectorizer()
    succ = feature_extraction.DictVectorizer()
    fail = feature_extraction.DictVectorizer()
    S = s.fit_transform(user_ids)
    Q = q.fit_transform(skills)
    succ = succ.fit_transform(n_succ)
    fail = fail.fit_transform(n_fail)
    X = sc.sparse.hstack((S, Q, succ, fail))
    y = np.array(corrects)

    return (X.toarray(), y)

Prepare the X and y arrays to be used to evaluate the PFA model. 

In [None]:
%time n_succ, n_fail = read_as_success_failure(data)
%time X, y = prepare_data_pfa(skills, corrects, user_ids, n_succ, n_fail)

In [None]:
### YOUR CODE HERE (please, feel free to add extra cells to solve this task, after this first one) ###
rmse_pfa, auc_pfa = [], []
for iteration, (train_index, test_index) in enumerate(create_iterator(data)):
    # Split data in training and test sets
    X_train, X_test = ...
    y_train, y_test = ...
    # Initialize and fit the model
    pfa = ...
    %time ...
    # Make predictions 
    y_train_pred = ...
    y_test_pred = ...
    # Compute RMSE (use metrics package methods)
    train_rmse = ...
    test_rmse = ...
    rmse_pfa.append(test_rmse)
    # Compute AUC (use metrics package methods)
    train_auc = ...
    test_auc = ...
    auc_pfa.append(test_auc)
    # Print progress
    print('Iteration:', iteration, 'RMSE', (train_rmse, test_rmse), 'AUC', (train_auc, test_auc))

<a id="section1.5"></a>
### Task 1.5

In this task, we ask you to visually report the RMSE and AUC scores achieved by the three student models in the user-stratified 10-fold cross validation performed in Tasks 1.2, 1.3, and 1.4, respectively, in such a way that the models' performance can be easily and appropriately compared. 
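
One possible way to prepare per-fold scores for plotting is to reshape them into a long-format dataframe with one row per model and fold. The sketch below uses placeholder numbers (not real results) and hypothetical variable names.

```python
import pandas as pd

# Placeholder per-fold AUC scores (not real results)
fold_scores = {
    'BKT': [0.61, 0.63, 0.60],
    'AFM': [0.66, 0.64, 0.65],
    'PFA': [0.70, 0.69, 0.71],
}

# One row per (model, fold) pair
scores = pd.DataFrame(
    [
        {'model': model_name, 'fold': fold, 'auc': value}
        for model_name, values in fold_scores.items()
        for fold, value in enumerate(values)
    ]
)
print(scores)
```

A frame in this shape can be passed directly to seaborn, for example `sns.boxplot(data=scores, x='model', y='auc')`, which shows the spread across folds rather than only the mean.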

In [None]:
### YOUR CODE HERE (please, feel free to add extra cells to solve this task, after this first one) ###
raise NotImplementedError()

### Task 1.6

Please compare and discuss the performance metric scores achieved by the three student models.

YOUR ANSWER HERE

<a id="section2"></a>
## 2  Knowledge Tracing: Learning Curves Comparison 
----

In this section, you should fit a Bayesian Knowledge Tracing (BKT) model on the three skills included in the `data` data set and compute the corresponding predictions. Then, for each skill included in `data`, you should visually report and discuss (i) the learning curve and (ii) the bar plot representing the number of students who reached a given number of opportunities for that skill, obtained through the BKT model fitted on that skill, in such a way that the learning patterns can be easily and appropriately compared across skills. No comparison with other baseline models is required.

For your convenience, you will be guided in completing this section through three main tasks:
- Task 2.1: BKT fit and prediction. 
- Task 2.2: Learning curves and bar plots generation.
- Task 2.3: Learning curves and bar plots discussion. 

<a id="section2.1"></a>
### Task 2.1

In this task, we ask you to fit a BKT model with all default parameters, i.e., `Model(seed=0)` in pyBKT, on the full `data` data set (no split into train and test sets is needed, as we are not assessing the predictive performance of the model here). Once your BKT model is fitted, we ask you to create a dataframe named `predictions` with four columns: `user_id`, `skill_name`, `y_true`, and `y_pred_bkt`. This dataframe should include one row per interaction in `data`, where `user_id` is the ID of the student associated with that interaction, `skill_name` is the name of the skill involved in that interaction, `y_true` is the student's true performance on that interaction (1 if correct at the first attempt, 0 otherwise), and `y_pred_bkt` is the prediction made by the fitted BKT model for that interaction.  

Please, note that this task may require a long running time (e.g., about 10 to 20 minutes), depending on your implementation and device. Just as an indication, on a Dell XPS 13, the fit process lasts around 7 minutes.  

In [None]:
### YOUR CODE HERE (please, feel free to add extra cells to solve this task, after this first one) ###
# Initialize the model
model = ...

# Fit the model on the entire dataset
%time ...

In [None]:
### YOUR CODE HERE ###
# Make predictions
predictions = ...

# Rename the dataframe columns as per instructions
predictions.columns = ['user_id', 'skill_name', 'y_true', 'y_pred_bkt']

Print the first ten rows as a double check.

In [None]:
### YOUR CODE HERE ###

<a id="section2.2"></a>
### Task 2.2

In this task, for each skill, we ask you to visually report and discuss (i) the `learning curve` and (ii) the `bar plot` representing the number of students who reached a given number of opportunities (similar to the visualizations done in Tutorial 6), obtained by the BKT model fitted on that skill, in such a way that the learning patterns can be easily and appropriately compared across skills. To do so, we ask you to use the predictions you stored in the dataframe `predictions`.    

No comparison with other baseline models is required.

Please, refer to Tutorial 6 for further information on learning curve and bar plotting for student's knowledge tracing models.
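
As a rough sketch of the aggregation behind such plots, the code below computes, per skill and opportunity index, the mean observed and predicted correctness together with the number of students who reached that opportunity. The toy frame is hypothetical and only mimics the columns of `predictions`; it assumes rows are in chronological order within each student. Tutorial 6 may use a different convention (for instance, plotting error rates instead of correctness).

```python
import pandas as pd

# Hypothetical stand-in for the `predictions` dataframe
toy = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 2, 3],
    'skill_name': ['Mode'] * 6,
    'y_true':     [0, 1, 1, 0, 1, 1],
    'y_pred_bkt': [0.4, 0.5, 0.7, 0.4, 0.5, 0.4],
})

# Opportunity index: 0 for a student's first attempt on a skill, 1 for the second, ...
toy['opportunity'] = toy.groupby(['user_id', 'skill_name']).cumcount()

curve = (
    toy.groupby(['skill_name', 'opportunity'])
       .agg(y_true=('y_true', 'mean'),
            y_pred_bkt=('y_pred_bkt', 'mean'),
            n_students=('user_id', 'nunique'))
       .reset_index()
)
print(curve)
```

The `y_true` and `y_pred_bkt` columns give the points of the learning curve, and `n_students` gives the heights of the bar plot. Later opportunities are averaged over fewer students, which is worth keeping in mind when interpreting the tail of the curve.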

In [None]:
### YOUR CODE HERE (please, feel free to add extra cells to solve this task, after this first one) ###
raise NotImplementedError()

### Task 2.3

Please discuss all visualizations (learning curves and bar plots) obtained with the BKT model. 

YOUR ANSWER HERE