OKT: Open-ended Knowledge Tracing

OKT provides the first exploration of open-ended knowledge tracing by studying the new task of predicting students’ exact open-ended responses to questions. This repository contains the code for the paper "Open-Ended Knowledge Tracing for Computer Science Education" by Naiming Liu, Zichao Wang, Richard G. Baraniuk, and Andrew Lan, to be presented at EMNLP 2022.

A block diagram of OKT is shown here:



  • python 3.8.12
  • torch 1.10.0
  • transformers 4.6.1
  • scikit-learn 0.24.2
  • numpy 1.22.1
  • munch 2.5.0
  • nltk 3.7
  • neptune-client 0.14.2


We use the CSEDM dataset and preprocess the data by (1) removing all code that cannot be parsed into an abstract syntax tree and (2) converting the student code to vector representations using ASTNN. You can download the preprocessed data with the commands below.

Download preprocessed data

cd scripts
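
The first preprocessing step described above (dropping code that cannot be parsed into an abstract syntax tree) can be sketched as follows. This is only an illustration, not the repository's preprocessing code: Python's built-in `ast` module stands in for whichever parser is used on the student code, and the names `parses_as_python` and `keep_parsable` are hypothetical.

```python
import ast
from typing import Callable, List


def parses_as_python(source: str) -> bool:
    """Return True if `source` parses into a Python AST.

    Assumption: Python's parser is a stand-in here; the dataset's own
    language would need its own parser.
    """
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


def keep_parsable(
    submissions: List[str],
    can_parse: Callable[[str], bool] = parses_as_python,
) -> List[str]:
    """Keep only the submissions that the given parser accepts."""
    return [code for code in submissions if can_parse(code)]


# Hypothetical submissions: the second one has a syntax error.
submissions = ["x = 1 + 2", "def f(:", "print('ok')"]
kept = keep_parsable(submissions)
```

Passing a different `can_parse` function lets the same filter be reused with another language's parser.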

Fine-tuned/Pre-trained models

Download fine-tuned GPT models

We provide two fine-tuned GPT-2 models for testing the performance of the pre-trained response generation model. One is fine-tuned on the funcom dataset; the other starts from the first and is further fine-tuned on CSEDM. The models can be downloaded with the following commands.

cd scripts

Training LSTM and classifier

To pre-train the knowledge estimation model (LSTM) and the classifier, run python main_student_model on the command line. All parameters can be changed in the configs_student_model.yaml file.

Training OKT

To train the OKT model, run python on the command line. All parameters can be changed in the configs_okt.yaml file. We use Neptune to track our experiment results. If you also want to use it, change neptune_project and neptune_api in the parameter list to your own Neptune credentials.
Note: To use other knowledge tracing (KT) models instead of the LSTM for knowledge estimation in OKT, you need pre-trained KT models. Two KT models (AKT and DKVMN) are integrated in our code but are commented out, so uncomment them first. To use them, follow the AKT and DKVMN repositories to pre-train the corresponding KT models.

Results and Evaluation

We use two metrics, CodeBLEU and Dist-N, and integrate their code into this repo. To learn more about the evaluation metrics, please see their corresponding websites. Trained models and generation results are saved in a newly created directory checkpoints\$TIME, where $TIME is the current time in data_time format. The directory contains two models (lstm for knowledge tracing and model for the generative model) and an eval_log.pkl file, which records the CodeBLEU score, Dist-1, and the generated student answers alongside the ground-truth answers for comparison. A set of trained results can be downloaded here.

cd scripts
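
For readers unfamiliar with Dist-N, the sketch below shows the usual definition: the number of distinct n-grams divided by the total number of n-grams across a set of generated responses. It is not the implementation integrated in this repo; the whitespace tokenization, the corpus-level aggregation, and the function names `ngrams` and `dist_n` are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def dist_n(responses: Iterable[Sequence[str]], n: int) -> float:
    """Distinct n-grams divided by total n-grams over all responses.

    Assumption: the ratio is computed over the whole set of responses,
    not averaged per response.
    """
    total = 0
    distinct = set()
    for tokens in responses:
        grams = ngrams(tokens, n)
        total += len(grams)
        distinct.update(grams)
    return len(distinct) / total if total else 0.0


# Hypothetical generated answers, tokenized on whitespace.
generated = ["int x = 0 ;".split(), "int y = 0 ;".split()]
dist_1 = dist_n(generated, 1)  # 6 distinct unigrams out of 10
dist_2 = dist_n(generated, 2)  # 6 distinct bigrams out of 8
```

Higher values indicate more diverse generations; a set of identical responses drives the score toward zero as the set grows.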

Code for creating the plots in the paper (visualizations of knowledge states in the latent space and their trajectories) can be found in the notebooks directory.


Please cite our paper if you find it helpful to your work!

  title={Open-Ended Knowledge Tracing},
  author={Liu, Naiming and Wang, Zichao and Baraniuk, Richard G and Lan, Andrew},
  journal={arXiv preprint arXiv:2203.03716},

