Add new GLUE example with no Trainer. #10555

Merged
merged 3 commits into from Mar 10, 2021

Conversation

@sgugger (Collaborator) commented Mar 5, 2021

What does this PR do?

This PR adds a new GLUE example that does not use the Trainer, leveraging accelerate for distributed training. The necessary instructions are added to the text-classification README.

cc @JetRunner as it should be of interest to you.
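
For readers new to accelerate, the core training-loop pattern the script follows looks roughly like this (a minimal sketch with assumed variable names, not the exact code from the PR):

from accelerate import Accelerator

accelerator = Accelerator()
# model, optimizer and train_dataloader are assumed to have been built earlier in the script
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for batch in train_dataloader:
    outputs = model(**batch)
    loss = outputs.loss
    accelerator.backward(loss)  # replaces loss.backward() so the same code runs on CPU, single/multi-GPU or TPU
    optimizer.step()
    optimizer.zero_grad()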

@JetRunner (Contributor) left a comment

@sgugger Very impressive! accelerate seems very user-friendly and I definitely like it.

I think the arguments should be consistent with transformers==2.11.0 (which is pretty close to Google's old run_glue.py). Not because I'm nostalgic, but because I think we should respect researchers' habits and try not to go against them. Also, I think all the features in the old script are necessary and we shouldn't cut them out. Something compatible with the latest version of transformers but with the same parameters as the old run_glue.py would be best, IMO. Researchers typically don't care about fancy features like picking the best checkpoint automatically, but let's at least not have fewer features than Google's run_glue.

Also, making predictions (--do_predict) is critical for researchers, so I don't quite understand why it's supported in the Trainer version of run_glue but not here.
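
For illustration, a --do_predict loop on top of accelerate could look roughly like this (a sketch; the flag name, predict_dataloader and is_regression are assumptions, not code from this PR):

import torch

model.eval()
predictions = []
for batch in predict_dataloader:
    with torch.no_grad():
        logits = model(**batch).logits
    preds = logits.squeeze() if is_regression else logits.argmax(dim=-1)
    # gather predictions from all processes so the main process can write them out
    predictions.extend(accelerator.gather(preds).cpu().tolist())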

required=True,
)
parser.add_argument(
"--use_slow_tokenizer",
@JetRunner (Contributor) commented Mar 6, 2021

--use_native/traditional/python/legacy_tokenizer?
"Slow tokenizer" sounds terrible, but it's actually not that slow.

BTW, some users told me the fast tokenizers occasionally output different results from the native ones. But I can't really reproduce that. Should we use the native tokenizers by default just to be safe?

@sgugger (Collaborator, Author) replied

I believe the different results are actually bug fixes but it's hard to say without any example.
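
A quick way to check whether the two tokenizers disagree on a given input (a sketch; the checkpoint and sentence are just examples):

from transformers import AutoTokenizer

text = "Hello, world!"
fast = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
slow = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
# Prints True when the two tokenizers agree on this input
print(fast(text)["input_ids"] == slow(text)["input_ids"])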

# In distributed training, the .from_pretrained methods guarantee that only one local process can concurrently
# download model & vocab.
config = AutoConfig.from_pretrained(args.model_name_or_path, num_labels=num_labels, finetuning_task=args.task_name)
tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path, use_fast=not args.use_slow_tokenizer)
@JetRunner (Contributor) replied

I personally prefer use_slow by default.

@sgugger (Collaborator, Author) replied

Fast tokenizers have been the default in the library since 4.0.0.

optimizer.zero_grad()
progress_bar.update(1)
completed_steps += 1

@JetRunner (Contributor) commented

Logging/saving during training is a necessity for researchers. We should have a --logging_steps and a --saving_steps here.

@JetRunner (Contributor) commented

This kind of logging is very useful for researchers. Let's add it back?

https://github.com/google-research/bert/blob/master/run_classifier.py#L871
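
Roughly what that would look like inside the new training loop (a sketch; the --logging_steps/--saving_steps arguments and the logger variable are assumptions, not part of this PR):

if completed_steps % args.logging_steps == 0:
    # per-process loss; gather across processes if an exact average is needed
    logger.info(f"step {completed_steps}: loss = {loss.item():.4f}")
if completed_steps % args.saving_steps == 0 and accelerator.is_main_process:
    # unwrap the model from any distributed wrapper before saving a checkpoint
    accelerator.unwrap_model(model).save_pretrained(f"{args.output_dir}/step_{completed_steps}")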

@JetRunner (Contributor) commented

In a nutshell, I'll burst into tears if we can just have Google's run_classifier.py back but with accelerate :)

@JetRunner (Contributor) left a comment

Some additional comments.

Comment on lines 137 to 139
parser.add_argument(
"--no_correct_bias_in_adam", action="store_true", help="If passed, no bias correction is applied in AdamW."
)
@JetRunner (Contributor) commented

Not really necessary to me? Researchers can just tweak it manually if they need that.
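
For reference, tweaking it manually just means passing correct_bias to transformers' AdamW (a sketch; the parameter grouping used in the script is omitted here):

from transformers import AdamW

# correct_bias=False reproduces the original BERT optimizer, which skips Adam's bias correction
optimizer = AdamW(model.parameters(), lr=args.learning_rate, correct_bias=False)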

# Some have all caps in their config, some don't.
label_name_to_id = {k.lower(): v for k, v in model.config.label2id.items()}
if list(sorted(label_name_to_id.keys())) == list(sorted(label_list)):
label_to_id = {i: label_name_to_id[label_list[i]] for i in range(num_labels)}
@JetRunner (Contributor) commented Mar 6, 2021

Let's add a warning too? Users should know when their label assignment differs from the default config in datasets.
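
Something along these lines would do (a sketch; logger is assumed to be configured earlier in the script):

logger.warning(
    f"Reordering the labels to match the model's config: {label_to_id}. "
    "Note that this differs from the dataset's default label order."
)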

if (
model.config.label2id != PretrainedConfig(num_labels=num_labels).label2id
and args.task_name is not None
and is_regression
@JetRunner (Contributor) commented

and is_regression? Do you mean and not is_regression?

Comment on lines +277 to +279
"Your model seems to have been trained with labels, but they don't match the dataset: ",
f"model labels: {list(sorted(label_name_to_id.keys()))}, dataset labels: {list(sorted(label_list))}."
"\nIgnoring the model labels as a result.",
@JetRunner (Contributor) commented

Add "Please make sure the label assignment is consistent with {list(sorted(label_list))}."? Then we should also let users know how to override the label assignment.

@JetRunner (Contributor) commented Mar 6, 2021

Maybe we should tag other researchers (even external) to give some feedback. cc @VictorSanh @TevenLeScao

@sgugger (Collaborator, Author) commented Mar 9, 2021

Addressed most of your comments, except the logging/saving steps. I do not have time to add them right now, so I suggest we merge the current version and let someone from the community finish it.

@JetRunner (Contributor) left a comment

LGTM! I can make some improvements incl. logging/saving steps.

@LysandreJik (Member) left a comment

LGTM, thanks @sgugger. Looking forward to future improvements by @JetRunner too!

@sgugger merged commit efb5c0a into master on Mar 10, 2021
@sgugger deleted the new_glue_script branch on March 10, 2021 at 14:29
Iwontbecreative pushed a commit to Iwontbecreative/transformers that referenced this pull request Jul 15, 2021
* Add new GLUE example with no Trainer.

* Style

* Address review comments