prediction of models #3
Comments
You're right, and the other models are the same. The only reason is that the NPA model, which uses a BiLSTM to encode past interactions, can't be trained to predict all the responses in a given sequence because of its bi-directional property. However, all the other models can be trained by computing losses for all interactions in a sequence (not only the last one), and this actually makes training much faster. I'm going to fix it later, but you can also fix it and send a PR if you want.
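To make the difference concrete, here is a minimal sketch (not the repository's actual code; the toy model, names, and shapes are assumptions) of computing the loss only on the last interaction versus on every interaction of a sequence with a unidirectional model:

```python
import torch
import torch.nn as nn

batch, seq_len, num_skills, hidden = 4, 20, 50, 64

# Toy DKT-style model: a unidirectional LSTM over (skill, correctness) interactions
# that outputs, at every step, logits for the next response on each skill.
class TinyDKT(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(2 * num_skills, hidden)  # skill id + correctness
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_skills)

    def forward(self, interactions):                     # (batch, seq_len)
        h, _ = self.rnn(self.emb(interactions))
        return self.out(h)                               # (batch, seq_len, num_skills)

model = TinyDKT()
interactions = torch.randint(0, 2 * num_skills, (batch, seq_len))
next_skills = torch.randint(0, num_skills, (batch, seq_len))   # skill asked at the next step
next_correct = torch.randint(0, 2, (batch, seq_len)).float()   # response at the next step

logits = model(interactions)
pred = logits.gather(-1, next_skills.unsqueeze(-1)).squeeze(-1)  # (batch, seq_len)
bce = nn.BCEWithLogitsLoss()

# Loss on the last interaction only (what the question refers to).
loss_last = bce(pred[:, -1], next_correct[:, -1])

# Loss on every interaction in the sequence, which is valid for unidirectional
# models and makes training converge much faster, as described above.
loss_all = bce(pred, next_correct)
```

A bidirectional encoder such as the BiLSTM in NPA would leak future interactions into each step's prediction, which is why the all-steps loss can't be used there.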
Thank you for your clarification.
Hi there! Has anyone by chance made an implementation that allows predicting a complete sequence instead of one target_id for a single sequence? If so, I would be very grateful if you could share it. I'll also take this opportunity to confirm: considering that the current code only makes predictions for one target_id, can we compare the obtained results with the state of the art (where whole sequences are considered for prediction)? I apologize in advance if I'm missing some implementation detail. Regards.
@bernardoleite First of all, I may not have time to do the implementation for now. I'm actually planning to refactor the whole repository using Pytorch Lightning and to add some recent KT models, but I don't have enough time to do that. I'm also considering using EduData instead of my own pre-processed datasets.

For your second question, I think that making predictions for only one target_id is the right way to evaluate models, but most of the other results and papers actually divide the whole sequence into several sub-sequences of fixed length and make predictions for each sub-sequence. This gives worse performance than one-by-one prediction. For example, when you want to predict the first question of the second sub-sequence, the input does not include any previous interactions from the first sub-sequence. However, if a model makes predictions for a single target_id at a time, you can feed in as many previous interactions as possible (up to the maximum sequence length the model was trained with), which should give a better prediction result.
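As an illustration of the one-by-one scheme described above, here is a hedged sketch (the model interface and names like `max_len` are assumptions, not code from this repository) that predicts each response using as much preceding history as fits into the model's maximum trained length:

```python
import torch

def predict_one_by_one(model, interactions, next_skills, max_len):
    """Predict every response in a sequence one target_id at a time,
    feeding a sliding window of up to `max_len` previous interactions."""
    model.eval()
    preds = []
    with torch.no_grad():
        for t in range(interactions.size(0)):
            start = max(0, t + 1 - max_len)
            window = interactions[start : t + 1].unsqueeze(0)  # (1, <= max_len)
            logits = model(window)                             # (1, window, num_skills)
            # Probability of a correct response on the skill asked next.
            preds.append(torch.sigmoid(logits[0, -1, next_skills[t]]))
    return torch.stack(preds)
```

Compared with chopping the data into fixed-length sub-sequences, this never discards history at sub-sequence boundaries, which is why it tends to score higher.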
@seewoo5 I believe it's a good option to do the refactoring using Pytorch Lightning (I'm becoming a fan of it too). Regarding the second question, I am more enlightened now. Thanks for the comprehensive explanation. Regards,
Hello @seewoo5,
Thanks for sharing your implementations. I checked your code for DKT and DKVMN, and it seems your models predict only one target_id for a single sequence. Am I right?
Thanks.
Chunpai