Add new GLUE example with no Trainer. #10555
Conversation
@sgugger Very impressive! `accelerate` seems very user-friendly and I definitely like it.

I think the arguments should be consistent with `transformers==2.11.0` (which is pretty close to Google's old `run_glue.py`). Not because I'm nostalgic, but because we should respect researchers' habits and try not to go against them. Also, I think all the features in the old script are necessary and we shouldn't cut them out. Having something compatible with the latest version of `transformers` but with the same parameters as the old `run_glue.py` would be best, IMO. Researchers typically don't care about fancy features like picking the best checkpoint automatically, but let's at least not have fewer features than Google's `run_glue`.

Also, making predictions (`--do_predict`) is critical for researchers, so I don't really understand why it's supported in the Trainer version of `run_glue` but not here.
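For reference, a prediction pass along these lines would already cover it (just a sketch, not code from this PR: the `--do_predict` flag, `processed_datasets["test"]` and the output format are assumptions layered on top of the objects the example already creates):

```python
# Hypothetical --do_predict pass, reusing accelerator/model/data_collator/is_regression
# from the example script; the flag name and output format are assumptions.
import os

import torch
from torch.utils.data import DataLoader

if args.do_predict:
    test_dataloader = DataLoader(
        processed_datasets["test"], collate_fn=data_collator, batch_size=args.per_device_eval_batch_size
    )
    test_dataloader = accelerator.prepare(test_dataloader)

    model.eval()
    all_predictions = []
    for batch in test_dataloader:
        with torch.no_grad():
            outputs = model(**batch)
        predictions = outputs.logits.argmax(dim=-1) if not is_regression else outputs.logits.squeeze()
        # Gather predictions from all processes before writing them out.
        all_predictions.append(accelerator.gather(predictions).cpu())

    if accelerator.is_main_process:
        all_predictions = torch.cat(all_predictions)
        with open(os.path.join(args.output_dir, f"predictions_{args.task_name}.txt"), "w") as writer:
            for index, item in enumerate(all_predictions.tolist()):
                writer.write(f"{index}\t{item}\n")
```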
    required=True,
)
parser.add_argument(
    "--use_slow_tokenizer",
`--use_native/traditional/python/legacy_tokenizer`? "slow tokenizer" sounds terrible, but it's actually not that slow.

BTW, some users told me the fast tokenizers occasionally output different results from the native ones, but I can't really reproduce that. Should we use the native tokenizers by default just to be safe?
I believe the different results are actually bug fixes, but it's hard to say without an example.
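If someone wants to check on a concrete task, a comparison can be as simple as the following (a sketch; the checkpoint and column names are placeholders, not tied to this PR):

```python
# Compare fast and slow tokenizer outputs on a GLUE task to spot discrepancies.
from datasets import load_dataset
from transformers import AutoTokenizer

checkpoint = "bert-base-cased"  # placeholder checkpoint
fast_tok = AutoTokenizer.from_pretrained(checkpoint, use_fast=True)
slow_tok = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)

dataset = load_dataset("glue", "mrpc", split="validation")
mismatches = 0
for example in dataset:
    fast_ids = fast_tok(example["sentence1"], example["sentence2"])["input_ids"]
    slow_ids = slow_tok(example["sentence1"], example["sentence2"])["input_ids"]
    if fast_ids != slow_ids:
        mismatches += 1
print(f"{mismatches} mismatching examples out of {len(dataset)}")
```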
# In distributed training, the .from_pretrained methods guarantee that only one local process can concurrently
# download model & vocab.
config = AutoConfig.from_pretrained(args.model_name_or_path, num_labels=num_labels, finetuning_task=args.task_name)
tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path, use_fast=not args.use_slow_tokenizer)
I personally prefer `use_slow` by default.
Fast tokenizers have been the default of the library since 4.0.0.
optimizer.zero_grad()
progress_bar.update(1)
completed_steps += 1
Logging/saving during training is a necessity for researchers. We should have a `--logging_steps` and a `--saving_steps` here. This kind of logging is very useful; let's add them back? https://github.com/google-research/bert/blob/master/run_classifier.py#L871
In a nutshell, I'll burst into tears if we can just have Google's …
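For what it's worth, wiring those flags into the existing loop could look roughly like this (a sketch only: `--logging_steps`/`--saving_steps` are hypothetical flags, and `loss`, `completed_steps`, `accelerator`, `lr_scheduler`, `logger` and `os` are assumed to come from the example script):

```python
# Inside the training loop, right after completed_steps is incremented.
if args.logging_steps > 0 and completed_steps % args.logging_steps == 0:
    logger.info(f"step {completed_steps}: loss = {loss.item():.4f}, lr = {lr_scheduler.get_last_lr()[0]:.2e}")

if args.saving_steps > 0 and completed_steps % args.saving_steps == 0:
    accelerator.wait_for_everyone()
    unwrapped_model = accelerator.unwrap_model(model)
    if accelerator.is_main_process:
        unwrapped_model.save_pretrained(
            os.path.join(args.output_dir, f"step_{completed_steps}"), save_function=accelerator.save
        )
```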
Some additional comments.
parser.add_argument(
    "--no_correct_bias_in_adam", action="store_true", help="If passed, no bias correction is applied in AdamW."
)
Not really necessary, in my opinion; researchers can just tweak it manually if they need that.
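For illustration, the manual tweak could be as simple as this (a sketch assuming `transformers.AdamW`, which exposes a `correct_bias` argument; `optimizer_grouped_parameters` is the parameter grouping from the example):

```python
# Sketch: disable bias correction by editing the optimizer construction directly,
# instead of adding a --no_correct_bias_in_adam flag.
from transformers import AdamW

optimizer = AdamW(
    optimizer_grouped_parameters,  # same parameter groups as in the example
    lr=args.learning_rate,
    correct_bias=False,  # original BERT-style AdamW without bias correction
)
```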
# Some have all caps in their config, some don't.
label_name_to_id = {k.lower(): v for k, v in model.config.label2id.items()}
if list(sorted(label_name_to_id.keys())) == list(sorted(label_list)):
    label_to_id = {i: label_name_to_id[label_list[i]] for i in range(num_labels)}
Let's add a warning too? Users should know their label assignment is not the default config in `datasets`.
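Such a warning could look roughly like this (a sketch only; the wording is an assumption, `logger` is the one from the example):

```python
# Sketch: warn when the label mapping from the model config is reused instead of
# the default label order of the dataset.
logger.warning(
    f"Reusing the label mapping from the model config: {label_name_to_id}. "
    "Note that this may differ from the default label order of the dataset."
)
```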
if (
    model.config.label2id != PretrainedConfig(num_labels=num_labels).label2id
    and args.task_name is not None
    and is_regression
`and is_regression`? Do you mean `and not is_regression`?
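If that reading is right, the fixed block would be (a sketch combining the lines quoted above, assuming the intent is to skip the remapping for regression tasks such as STS-B):

```python
if (
    model.config.label2id != PretrainedConfig(num_labels=num_labels).label2id
    and args.task_name is not None
    and not is_regression  # only difference: regression tasks keep the default behavior
):
    # Some have all caps in their config, some don't.
    label_name_to_id = {k.lower(): v for k, v in model.config.label2id.items()}
    if list(sorted(label_name_to_id.keys())) == list(sorted(label_list)):
        label_to_id = {i: label_name_to_id[label_list[i]] for i in range(num_labels)}
```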
"Your model seems to have been trained with labels, but they don't match the dataset: ", | ||
f"model labels: {list(sorted(label_name_to_id.keys()))}, dataset labels: {list(sorted(label_list))}." | ||
"\nIgnoring the model labels as a result.", |
Add `Please make sure the label assignment is consistent with {list(sorted(label_list))}.`? Then we should let the users know how to override the label assignment?
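In other words, the full warning could end with something like this (a sketch of the suggested wording only, not the code in the PR):

```python
# Sketch: append the suggested hint to the existing warning message.
logger.warning(
    "Your model seems to have been trained with labels, but they don't match the dataset: "
    f"model labels: {list(sorted(label_name_to_id.keys()))}, dataset labels: {list(sorted(label_list))}."
    "\nIgnoring the model labels as a result. "
    f"Please make sure the label assignment is consistent with {list(sorted(label_list))}."
)
```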
Maybe we should tag other researchers (even external) to give some feedback. cc @VictorSanh @TevenLeScao
Addressed most of your comments except the logging/saving steps. I do not have time to add this right now, so I suggest we merge the current version and someone from the community can finish it.
LGTM! I can make some improvements incl. logging/saving steps.
LGTM, thanks @sgugger. Looking forward to future improvements by @JetRunner too!
* Add new GLUE example with no Trainer.
* Style
* Address review comments
What does this PR do?

This PR adds a new GLUE example that does not use the `Trainer`, leveraging accelerate for distributed training. The necessary instructions are added in the text-classification README.

cc @JetRunner as it should be of interest to you.
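For readers who have not used accelerate before, the core pattern the new example relies on looks roughly like this (a simplified sketch, not the actual script; model, dataloaders, optimizer and scheduler construction are elided):

```python
# Simplified sketch of the accelerate training pattern used by the no-Trainer example.
from accelerate import Accelerator

accelerator = Accelerator()

# prepare() wraps the objects so the same loop runs on CPU, a single GPU or multiple GPUs.
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

model.train()
for epoch in range(args.num_train_epochs):
    for batch in train_dataloader:
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
```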