
Regarding PPL and GEN modes, could you please provide more details or clarify your question? Thanks #646

Answered by Leymore
niexufei asked this question in Q&A

In principle, PPL and GEN modes need not give the same score on multiple-choice questions. The language model performs next-token prediction: in PPL mode the candidates for the next token are restricted to A / B / C / D, whereas in GEN mode the next token can be anything in the entire vocabulary.
When the model's instruction-following ability is weak, it may fail to output A / B / C / D at all; and a model fine-tuned in an unusual way may first produce a long explanation and only then give A / B / C / D. Either case can change the answer extracted from the generated text, and therefore the accuracy.
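To make the difference concrete, here is a minimal Python sketch under a simplifying assumption: PPL-style scoring is reduced to comparing the next-token scores of the four option letters, and GEN-style decoding to a greedy argmax over the whole (toy) vocabulary. The vocabulary and logit values in `TOY_LOGITS` are invented for illustration, and the function names are hypothetical, not part of any evaluation framework. A real PPL evaluator may instead compare the perplexity of each full answer string; this sketch only mirrors the next-token view described above.

```python
import math

# Hypothetical next-token logits after a multiple-choice prompt.
# The vocabulary and values are made up for illustration only.
TOY_LOGITS = {"A": 1.2, "B": 2.0, "C": 0.3, "D": 0.1, "The": 2.5, "\n": 0.5}
OPTIONS = ("A", "B", "C", "D")


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert logits to probabilities (subtracting the max for stability)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def ppl_style_choice(logits: dict[str, float], options=OPTIONS) -> str:
    """PPL-style (simplified): only the option tokens compete."""
    restricted = softmax({o: logits[o] for o in options})
    return max(restricted, key=restricted.get)


def gen_style_next_token(logits: dict[str, float]) -> str:
    """GEN-style (greedy): every token in the vocabulary competes."""
    probs = softmax(logits)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print("PPL choice:", ppl_style_choice(TOY_LOGITS))
    print("GEN next token:", repr(gen_style_next_token(TOY_LOGITS)))
```

Here the model's preferred option is B, but its single most likely next token over the full vocabulary is "The", so a GEN evaluator has to keep generating and then pull a letter out of free text, which is where the two modes can start to diverge.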
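The extraction problem can be illustrated with two hypothetical post-processing rules applied to made-up GEN outputs. Neither rule is claimed to be what any particular evaluation framework uses; the helper names and sample outputs are assumptions for illustration. They only show how the same kind of generation can yield different extracted answers, or none at all.

```python
import re
from typing import Optional

# Matches a standalone capital A-D (word boundaries on both sides).
OPTION_PATTERN = re.compile(r"\b([ABCD])\b")


def extract_first_option(text: str) -> Optional[str]:
    """Hypothetical rule: take the first standalone option letter."""
    match = OPTION_PATTERN.search(text)
    return match.group(1) if match else None


def extract_last_option(text: str) -> Optional[str]:
    """Hypothetical rule: take the last standalone option letter."""
    matches = OPTION_PATTERN.findall(text)
    return matches[-1] if matches else None


# Made-up model outputs for the three situations described above.
OUTPUTS = {
    "direct": "B",
    "explain_first": "Option A looks plausible, but the passage rules it out. Answer: C",
    "off_format": "I think the second choice is correct.",
}

if __name__ == "__main__":
    for name, text in OUTPUTS.items():
        print(name, extract_first_option(text), extract_last_option(text))
```

For a direct answer both rules agree on B. When the model explains first, the first-letter rule picks up the distractor A while the last-letter rule finds C. When the model ignores the A / B / C / D format, nothing is extracted, and such an item typically counts as incorrect.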

Answer selected by niexufei