Better drop_last fix, updated results table (RES-2222) #521
Conversation
Have you tried running with …
Looks good. Thanks for catching the error with the first fix. Approved, barring
- some nit-pick comments about grammar and punctuation,
- and confirmation that this fix works with the lr-range test.
| sparse_80%_kd_onecycle_lr_rigl | 75.3 | 72.83 | 38.29 | 79.72/80.88 | 87.31/82.81 | 87.65 | 89.19/85.25 | 54.3 | 88.89 | 80.74/80.59 | 53.12 | 8.482 | 2.138 |
| bert_1mi | 80.13 | 77.17 | 45.81 | 84.27/84.63 | 88.26/83.82 | 91.21 | 90.54/87.20 | 65.34 | 91.86 | 87.41/87.43 | 53.52 | 5.013 | 1.612 |
| bert_100k | 75.36 | 71.68 | 39.56 | 78.88/79.08 | 82.71/76.23 | 87.77 | 89.31/85.57 | 58.12 | 87.61 | 83.95/83.84 | 42.25 | 8.619 | 2.154 |
| sparse_80%_kd_onecycle_lr_rigl | 75.3 | 72.57 | 36.49 | 79.23/79.66 | 86.55/81.86 | 88.23 | 89.39/85.64 | 54.51 | 90.6 | 81.31/81.24 | 50.7 | 8.482 | 2.138 |
These updated results are without the "simple but hard to beat" baseline, correct?
@@ -129,6 +129,38 @@
    model_name_or_path="/mnt/efs/results/pretrained-models/transformers-local/bert_100k",  # noqa: E501
)

# the name 'simple' is in reference to the paper
This may be more of a nit-pick style comment, but I believe in using proper grammar and punctuation in comments, especially longer ones; it makes them easier to read. This one could be formatted as:
# The name 'simple' is in reference to the paper "On the stability of finetuning BERT"
# where they propose a "simple but hard to beat" approach
# https://openreview.net/pdf?id=nzpLWnVAyah
#
# How to decide num_train_epochs: They say rte for 20 epochs is good ...
Notice how I try to use the full line length as well. There should be an extension in VS Code that helps automatically format comments for line length.
Also, could you please edit the comment for clarity? It's somewhat hard to follow the logic you're expressing.
projects/transformers/run_utils.py (Outdated)
@@ -708,6 +676,22 @@ def init_model(model_args, config, tokenizer, finetuning=False):
    return model


def toggle_drop_last_factory(method):
I believe this method is better described as a wrapper than a factory. Would you agree?
I like this solution, btw. It seems like a good way to modify the function without having to create a new subclass of Trainer.
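For reference, here is a minimal sketch of the wrapper pattern under discussion, assuming it forces `dataloader_drop_last` off before delegating to the wrapped bound method. The restore-on-exit step and the exact names are assumptions, not necessarily what the PR does:

```python
import functools


def toggle_drop_last_wrapper(method):
    """Wrap a bound Trainer method so drop_last is off while it runs."""

    @functools.wraps(method)
    def toggled_method(*args, **kwargs):
        trainer = method.__self__  # the Trainer the method is bound to
        original = trainer.args.dataloader_drop_last
        trainer.args.dataloader_drop_last = False  # keep all eval labels
        try:
            return method(*args, **kwargs)
        finally:
            # Restoring the flag afterwards is an assumption here; it keeps
            # training dataloaders unaffected by the evaluation override.
            trainer.args.dataloader_drop_last = original

    return toggled_method
```

Because the wrapper only rebinds attributes on an existing Trainer instance, no subclass is needed.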
projects/transformers/run_utils.py (Outdated)
@@ -751,6 +735,13 @@ def init_trainer(

    trainer = trainer_class(**trainer_kwargs)

    # issue: labels gettings set to -100 due to drop_last
Could you please adjust the capitalization and punctuation here as well?
# Issue: labels get set to -100 due to drop_last
# Fix: override the evaluate and predict functions
# The previous fix covers any time we call either method.
# This fix should cover every time HF calls it.
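Concretely, applying the wrapper inside `init_trainer` would presumably amount to something like this sketch (the exact call site in the PR may differ):

```python
# Wrap the bound methods so every caller, including HF-internal code
# (e.g. evaluation triggered from inside Trainer.train), sees
# dataloader_drop_last=False.
trainer.evaluate = toggle_drop_last_wrapper(trainer.evaluate)
trainer.predict = toggle_drop_last_wrapper(trainer.predict)
```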
Looks great
projects/transformers/run_utils.py (Outdated)
def toggle_drop_last_wrapper(method):
    """
    Return a function that turns drop_last off before it is called. Used for ensuring
    trainer.args.dataloader_drop_last is False during evaluation steps. After
Thanks for the style fixes. One last thing: It seems you accidentally added an indent here in the comments.
Replaced the old drop_last fix with a more durable solution, which overrides the predict and evaluate methods of the trainer. This fix covers the cases where those methods are called internally in HF code, not just by us.
Updated the README results table: bert_100k, bert_1mi, and sparse_80%_kd_onecycle_lr_rigl. The bert_1mi results include an extra 10 runs on the wnli task.
There are some new experiments I'm playing with, like the "simple but hard to beat" baseline in finetuning.py; that code might still move around. I wanted to get this PR going for the results and the new fix.
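For illustration only, a finetuning setup in the spirit of that baseline might look like the sketch below. The field names follow Hugging Face's `TrainingArguments`, but the specific values are assumptions drawn from the paper's recommendations (train longer, e.g. 20 epochs on rte, with a small learning rate), not the actual contents of finetuning.py:

```python
from transformers import TrainingArguments

# Hypothetical "simple but hard to beat" baseline arguments, after
# "On the stability of finetuning BERT"
# (https://openreview.net/pdf?id=nzpLWnVAyah): replace the usual
# 3-epoch recipe with longer training at a small learning rate.
simple_baseline_args = TrainingArguments(
    output_dir="finetuning_rte_simple",  # placeholder path
    num_train_epochs=20,                 # the paper suggests 20 epochs for rte
    learning_rate=2e-5,                  # assumed small, stable learning rate
    per_device_train_batch_size=16,      # assumed
    evaluation_strategy="epoch",
)
```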