Evaluate a merged expert model's perplexity #53
Comments
@ontocord Can you please review the description of this issue? This is an important one, so I'd like to make sure we're aligned on the details.
The perplexity script was run on all combinations of the models and datasets. I will post the full results of the 36 experiments below, but to highlight the confusing part: perplexity on the training set is higher for the expert than for the base model. When I looked in WandB I found the corresponding run (screenshot below).

[WandB screenshot]

All of the datasets showed the expert having higher perplexity than the base on both the training split and the validation_domain split. distilgpt2 had very high perplexity on the datasets, as expected. Here are the complete results:

[results table]
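For reference, here is a minimal sketch of how a grid of runs like this could be driven. The model and dataset names are placeholders, and `evaluate_perplexity()` is a hypothetical stand-in for the project's actual script, not its real code:

```python
import itertools
import random

def evaluate_perplexity(model_name: str, dataset_name: str, split: str) -> float:
    """Hypothetical stand-in; the real computation is sketched further
    below in this issue using the Hugging Face Trainer."""
    return random.uniform(5.0, 50.0)  # dummy value, not a real measurement

# Placeholder axes; the actual grid totaled 36 runs.
models = ["merged-expert", "base-model", "distilgpt2"]
datasets = ["domain-a", "domain-b", "domain-c"]
splits = ["train", "validation_domain"]

for combo in itertools.product(models, datasets, splits):
    print(*combo, f"ppl={evaluate_perplexity(*combo):.2f}")
```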
Thanks @Stillerman! Let's close this issue?
The goal is to compute perplexity for a few models:
The model in (1) can be created using the script in this PR. The list of experts is:
The models in (2, 3) should be prepared in the following issue.
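For context, here is a minimal sketch of one way such a merged model can be built, assuming uniform parameter averaging of the expert checkpoints; the PR's actual merge script may differ, and the checkpoint names below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM

expert_names = ["org/expert-a", "org/expert-b"]  # placeholder checkpoints
states = [
    AutoModelForCausalLM.from_pretrained(name).state_dict()
    for name in expert_names
]

# Average every tensor in the state dict (assumes identical architectures).
merged_state = {
    key: torch.stack([s[key].float() for s in states]).mean(dim=0)
    for key in states[0]
}

merged = AutoModelForCausalLM.from_pretrained(expert_names[0])
merged.load_state_dict(merged_state)
merged.save_pretrained("merged-expert")
```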
The evaluation should be done on the evaluation fold of each expert's dataset, excluding the Pile portion. The datasets are at MDEL HF. Perplexity can be computed with the Hugging Face Trainer's evaluate() method (see example here).
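A minimal sketch of that computation follows. The model and dataset names are placeholders, and the `source` column used to filter out Pile rows is an assumption about the dataset schema, not a confirmed field:

```python
import math

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/pythia-1b-deduped"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset; the real ones live under the MDEL HF organization.
raw = load_dataset("some_org/some_domain_dataset", split="validation")
# Hypothetical filter: assumes a "source" column that marks Pile rows.
raw = raw.filter(lambda row: row["source"] != "pile")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

eval_ds = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eval_out", per_device_eval_batch_size=4),
    eval_dataset=eval_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

metrics = trainer.evaluate()
# eval_loss is the mean token-level cross-entropy, so exp(loss) is perplexity.
print("perplexity:", math.exp(metrics["eval_loss"]))
```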
The deliverables of this issue should be: