
Better drop_last fix, updated results table (RES-2222) #521

Merged: 13 commits into numenta:master on Jun 2, 2021

Conversation

benja-matic (Contributor)

Replaced the old drop_last fix with a more durable solution, which overrides the predict and evaluate methods of the trainer. This fix covers the cases where those methods are called internally in HF code, not just by us.

Updated the README results table: bert_100k, bert_1mi, and sparse_80%_kd_onecycle_lr_rigl. The bert_1mi results are calculated using an extra 10 runs on the wnli task.

There are some new experiments I'm playing with, like the "simple but hard to beat" baseline in finetuning.py. That stuff might still move around. Wanted to get this PR going for the results and the new fix.

@mvacaporale (Contributor)

Have you tried running with tiny_bert_linear_lr_range_test? It would be good to know this can fully run without any issues as the lr-range tests validate every epoch.

@mvacaporale (Contributor) left a comment

Looks good. Thanks for catching the error with the first fix. Approved barring

  • some nitpick comments about grammar and punctuation,
  • and confirmation that this fix works with the lr-range test.

| sparse_80%_kd_onecycle_lr_rigl | 75.3 | 72.83 | 38.29 | 79.72/80.88 | 87.31/82.81 | 87.65 | 89.19/85.25 | 54.3 | 88.89 | 80.74/80.59 | 53.12 | 8.482 | 2.138 |
| bert_1mi | 80.13 | 77.17 | 45.81 | 84.27/84.63 | 88.26/83.82 | 91.21 | 90.54/87.20 | 65.34 | 91.86 | 87.41/87.43 | 53.52 | 5.013 | 1.612 |
| bert_100k | 75.36 | 71.68 | 39.56 | 78.88/79.08 | 82.71/76.23 | 87.77 | 89.31/85.57 | 58.12 | 87.61 | 83.95/83.84 | 42.25 | 8.619 | 2.154 |
| sparse_80%_kd_onecycle_lr_rigl | 75.3 | 72.57 | 36.49 | 79.23/79.66 | 86.55/81.86 | 88.23 | 89.39/85.64 | 54.51 | 90.6 | 81.31/81.24 | 50.7 | 8.482 | 2.138 |
Contributor

These updated results are without the "simple but hard to beat" baseline, correct?

@@ -129,6 +129,38 @@
model_name_or_path="/mnt/efs/results/pretrained-models/transformers-local/bert_100k", # noqa: E501
)

# the name 'simple' is in reference to the paper
Contributor

This may be more of a nitpick style comment, but I believe in using proper grammar and punctuation in comments, especially longer ones; it makes them easier to read. This one could be formatted as

# The name 'simple' is in reference to the paper "On the stability of finetuning BERT"
# where they propose a "simple but hard to beat" approach
#       https://openreview.net/pdf?id=nzpLWnVAyah
#
# How to decide num_train_epochs: They say rte for 20 epochs is good ...

Notice how I try to use the full line length as well. There should be an extension in VS Code to help automatically format comments for line length.

Contributor

Also, could you please edit the comment for clarity? It's somewhat hard to follow the logic you're expressing.

@@ -708,6 +676,22 @@ def init_model(model_args, config, tokenizer, finetuning=False):
return model


def toggle_drop_last_factory(method):
Contributor

I believe this method is better described as a wrapper than a factory. Would you agree?

Contributor

I like this solution, by the way. It seems like a good way to modify the function without having to create a new subclass of Trainer.

@@ -751,6 +735,13 @@ def init_trainer(

trainer = trainer_class(**trainer_kwargs)

# issue: labels gettings set to -100 due to drop_last
Contributor

Could you please adjust the capitalization and punctuation here as well?

# Issue: labels get set to -100 due to drop_last
# Fix: override the evaluate and predict functions 
# The previous fix covers any time we call either method.
# This fix should cover every time HF calls it.
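
The comment above describes the core of the fix: override the trainer's bound evaluate and predict methods so drop_last is forced off whenever they run, including when Hugging Face code calls them internally. A minimal sketch of that idea, assuming a Trainer-like object that exposes an args.dataloader_drop_last flag (the DummyTrainer here is purely illustrative, not the repo's actual code):

```python
import functools

def toggle_drop_last_wrapper(method):
    """Return a version of a bound method that runs with drop_last disabled.

    Evaluation needs to see every label, so dataloader_drop_last is forced
    off for the duration of the call and restored afterwards.
    """
    trainer = method.__self__  # the trainer instance the method is bound to

    @functools.wraps(method)
    def inner(*args, **kwargs):
        original = trainer.args.dataloader_drop_last
        trainer.args.dataloader_drop_last = False  # keep the last partial batch
        try:
            return method(*args, **kwargs)
        finally:
            trainer.args.dataloader_drop_last = original  # restore for training
    return inner

# Illustrative stand-in for a HF Trainer, just to show the wiring.
class DummyArgs:
    def __init__(self):
        self.dataloader_drop_last = True

class DummyTrainer:
    def __init__(self):
        self.args = DummyArgs()

    def evaluate(self):
        # Real code would build an eval dataloader here; we just report the flag.
        return self.args.dataloader_drop_last

trainer = DummyTrainer()
trainer.evaluate = toggle_drop_last_wrapper(trainer.evaluate)
```

Because the wrapper replaces the attribute on the trainer instance itself, any internal call to trainer.evaluate or trainer.predict goes through it, which is why this covers the cases the earlier fix missed.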

@lucasosouza (Contributor) left a comment

Looks great

def toggle_drop_last_wrapper(method):
"""
Return a function that turns drop_last off before it is called. Used for ensuring
trainer.args.dataloader_drop_last is False during evaluation steps. After
@mvacaporale (Contributor) commented on May 28, 2021:

Thanks for the style fixes. One last thing: it seems you accidentally added an indent here in the comments.

@benja-matic benja-matic merged commit e8c0c3b into numenta:master Jun 2, 2021