This repository has been archived by the owner on Aug 11, 2024. It is now read-only.

prediction of models #3

Closed
Chunpai opened this issue Feb 4, 2021 · 5 comments

Comments

@Chunpai

Chunpai commented Feb 4, 2021

Hello @seewoo5,

Thanks for sharing your implementations. I checked your code on DKT and DKVMN. It seems your models predict only one target_id for a single sequence. Am I right?

Thanks.
Chunpai

@seewoo5
Owner

seewoo5 commented Feb 5, 2021

You're right, and the other models are the same. The only reason is the NPA model, which uses a BiLSTM to encode past interactions: because of its bidirectional property, it can't be trained to predict all the responses in a given sequence. However, all the other models can be trained by computing losses for all interactions in a sequence (not only the last one), which actually makes training much faster. I'm planning to fix this later, but you're welcome to fix it yourself and send a PR if you'd like.
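For reference, per-step training for a unidirectional model (e.g. DKT) might look roughly like the following minimal sketch. The function name, tensor shapes, and masking convention are assumptions for illustration, not code from this repository:

```python
import torch
import torch.nn as nn

def per_step_bce_loss(logits, targets, mask):
    """Compute BCE loss over every interaction in the sequence,
    not only the last one. Padded positions are masked out.

    logits, targets, mask: (batch, seq_len) float tensors,
    where mask is 1.0 for real interactions and 0.0 for padding.
    """
    loss_fn = nn.BCEWithLogitsLoss(reduction="none")
    elementwise = loss_fn(logits, targets)  # (batch, seq_len)
    masked = elementwise * mask             # zero out padded steps
    return masked.sum() / mask.sum()        # mean over real steps only

# Toy example: batch of 2 sequences of length 3; the last step of
# the second sequence is padding.
logits = torch.tensor([[0.2, -1.0, 0.5], [1.5, 0.0, 0.0]])
targets = torch.tensor([[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
mask = torch.tensor([[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]])
loss = per_step_bce_loss(logits, targets, mask)
```

A bidirectional encoder like the BiLSTM in NPA can't use this objective directly, since each step's prediction would see future interactions through the backward pass.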

@Chunpai
Author

Chunpai commented Feb 5, 2021

Thank you for your clarification.

@bernardoleite

bernardoleite commented Nov 24, 2021

Hi there!

Has anyone by chance made an implementation that allows predicting a complete sequence instead of one target_id per sequence? If so, I would be very grateful if you could share it.

Also, I'd like to take this opportunity to confirm something: given that the current code only makes predictions for one target_id, can we compare the obtained results with the state of the art (where whole sequences are used for prediction)? I apologize in advance if I've missed some implementation detail.

Regards.

@seewoo5
Owner

seewoo5 commented Dec 21, 2021

@bernardoleite First of all, I may not have time to do the implementation for now. I'm planning to refactor the whole repository using PyTorch Lightning and to add some recent KT models, but I don't have enough time for that at the moment. I'm also considering using EduData instead of my own pre-processed datasets.

For your second question, I think that making predictions for one target_id at a time is the right way to evaluate models, but most other results and papers actually divide the whole sequence into several sub-sequences of fixed length and make predictions for each sub-sequence. This gives worse performance than one-by-one prediction: for example, when you predict the first question of the second sub-sequence, the input does not include any interactions from the first sub-sequence. However, if a model makes predictions for a single target_id at a time, you can feed in as many previous interactions as possible (up to the maximum length the model was trained with), which should give a better prediction result.
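To make the difference concrete, here is a small sketch contrasting the history each prediction sees under the two evaluation schemes. The function names are illustrative, not from this repository:

```python
def subsequence_inputs(seq, max_len):
    """Fixed-length splitting: each chunk starts with empty history,
    so the first question of a new chunk sees no past interactions."""
    inputs = []
    for start in range(0, len(seq), max_len):
        chunk = seq[start:start + max_len]
        for i in range(len(chunk)):
            inputs.append(chunk[:i])  # history available at prediction time
    return inputs

def sliding_inputs(seq, max_len):
    """One-by-one prediction: each target sees up to max_len - 1
    previous interactions, regardless of chunk boundaries."""
    return [seq[max(0, i - (max_len - 1)):i] for i in range(len(seq))]

seq = list(range(6))  # interaction ids 0..5
# With max_len=3, predicting interaction 3 under fixed splitting sees
# no history (it opens the second chunk), while the sliding scheme
# sees interactions 1 and 2.
split_history = subsequence_inputs(seq, 3)[3]    # → []
sliding_history = sliding_inputs(seq, 3)[3]      # → [1, 2]
```

The richer history available under the sliding scheme is why one-by-one prediction tends to score better than fixed sub-sequence evaluation.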

@bernardoleite

@seewoo5 I believe refactoring with PyTorch Lightning is a good option (I'm becoming a fan of it too). Regarding the second question, I am much clearer on it now. Thanks for the comprehensive explanation.

Regards,
Bernardo

@seewoo5 seewoo5 closed this as completed Sep 6, 2022