
Issue: Adding new tokens to bert tokenizer in QA #11200

Closed
andreabac3 opened this issue Apr 12, 2021 · 14 comments · Fixed by #11322

@andreabac3
Contributor

andreabac3 commented Apr 12, 2021

WARNING: This issue is a replica of another issue I opened; I apologize if I opened it in the wrong place.

Hello Hugging Face team (@sgugger, @joeddav, @LysandreJik),
I have a problem with this codebase:
notebooks/examples/question_answering.ipynb (link)
ENV: Google Colab; transformers 4.5.0; datasets 1.5.0; torch 1.8.1+cu101
I am trying to add some domain tokens to the bert-base-cased tokenizer:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering, Trainer

model_checkpoint = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# extend the vocabulary with domain-specific tokens
list_of_domain_tokens = ["token1", "token2", "token3"]
tokenizer.add_tokens(list_of_domain_tokens)
...
...
model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)
print(model.device)  # cpu
# grow the embedding matrix to match the enlarged vocabulary
model.resize_token_embeddings(len(tokenizer))
trainer = Trainer(...)

Then, during the trainer.fit() call, it reports the error attached below.
Can you please tell me where I'm wrong?
The tokenizer output is the usual BERT input expressed in the form of List[List[int]], e.g. input_ids and attention_mask,
so I can't figure out where the problem is with the device:
Input, output and indices must be on the current device
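
For clarity, a minimal sketch of the kind of tokenizer output I mean (the texts here are only illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# batched question/context pairs, as in the QA notebook
enc = tokenizer(["Where is the Eiffel Tower?"], ["The Eiffel Tower is in Paris."])
print(enc["input_ids"])       # List[List[int]]
print(enc["attention_mask"])  # List[List[int]]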

Kind Regards,
Andrea

@sgugger
Collaborator

sgugger commented Apr 12, 2021

I am unable to reproduce: the notebook with your added code works smoothly on my side.

@andreabac3
Contributor Author

Thank you @sgugger.
I apologize; I can't figure out what's going on on my side.
I have now cloned the notebook again and the example works.
In the next few days I will test it again and tell you more about it.

Thank you again for your help
Kind Regards,
Andrea

@andreabac3
Contributor Author

Hi @sgugger,
I worked on the notebook and I found the problem.
I have not yet had the opportunity to test it with the original SQuAD dataset, but this happens both on Colab and on my machine.
I warn you, it seems an absurd and paradoxical situation; moreover, I do not touch the device placement in any way.
I can provide you with a video of the notebook running.
As you can see from the screenshot, I am forced to keep two versions of the training args: the original one from the notebook and one customized by me.

If I perform these operations, I get the error:

  1. I instantiate my training args
  2. I instantiate the Trainer
  3. I run trainer.fit and get the error: Input, output and indices must be on the current device

To work around it, I have to:
instantiate the original training args from the notebook, instantiate the trainer, run the fit to check that it has started, and then do it all over again with the training args I customized (roughly as in the sketch below).
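
In code, the workaround looks roughly like this (just a sketch; original_args, custom_args and the dataset names are placeholders, not the notebook's actual variable names):

# first fit with the notebook's original training args...
trainer = Trainer(model=model, args=original_args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()  # check that training starts, then interrupt it

# ...then re-create the Trainer with my customized args and fit again
trainer = Trainer(model=model, args=custom_args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()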

[Screenshot from 2021-04-16 at 09:29:34]

Kind regards,
Andrea

@andreabac3
Contributor Author

Hi @sgugger,
I can confirm that the same bug happens in the original notebook with this TrainingArguments (I have tested with SQuAD v2); the temporary fix is to start the training with the original args, stop it, and then run with the customized ones.

@sgugger
Collaborator

sgugger commented Apr 19, 2021

It looks like a bug in Colab (from the screenshots I assume that is what you are using for training?), since I didn't get any error on my side when executing this as a notebook.

@andreabac3
Contributor Author

andreabac3 commented Apr 19, 2021

Hi @sgugger
Have you tested the notebook with the trainer args replaced by the following?

args = TrainingArguments(
    f"my-experiment",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=250,
    num_train_epochs=2,
    weight_decay=0.01,
    fp16=True,
    gradient_accumulation_steps=2,
    eval_accumulation_steps=2,
    fp16_opt_level='O2',
    fp16_full_eval=True,
    save_strategy='epoch',
    metric_for_best_model='eval_loss',
    logging_strategy='epoch'
)

I ask because I encountered the same issue on my machine.
Can you kindly test with it, please?
To test it: remove the old trainer args, use the ones attached above, and run trainer.fit.

Kind regards,
Andrea

@sgugger
Collaborator

sgugger commented Apr 19, 2021

Ah, you're right, I must have made a mistake. This comes from the option fp16_full_eval=True.

@stas00 I'm not sure what the best place is for fixing this, but if someone uses fp16_full_eval=True with training, the model is never sent to the proper device and training fails.

@stas00
Contributor

stas00 commented Apr 19, 2021

But there is no do_train in the args at #11200 (comment)

The logic is very explicit: the model is not placed on the device only in the non-train case, when fp16_full_eval=True is used:

        if (
            self.is_model_parallel
            or (args.deepspeed and args.do_train)
            or (args.fp16_full_eval and not args.do_train)
            or (self.sharded_ddp in [ShardedDDPOption.ZERO_DP_2, ShardedDDPOption.ZERO_DP_3])
        ):
            self.place_model_on_device = False

You need to add do_train=True to your TrainingArguments, otherwise it defaults to eval only because you have evaluation_strategy set.
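
For example, taking the arguments you posted above and only adding do_train=True (everything else unchanged):

args = TrainingArguments(
    f"my-experiment",
    do_train=True,                 # tell the Trainer the model will be trained, so it is placed on the device
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=250,
    num_train_epochs=2,
    weight_decay=0.01,
    fp16=True,
    gradient_accumulation_steps=2,
    eval_accumulation_steps=2,
    fp16_opt_level='O2',
    fp16_full_eval=True,
    save_strategy='epoch',
    metric_for_best_model='eval_loss',
    logging_strategy='epoch'
)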

@andreabac3
Contributor Author

Hi @stas00 & @sgugger,

You need to add do_train=True to your TrainingArguments, otherwise it defaults to eval only because you have evaluation_strategy set.

Ok so do_train=True is also compatible with fp16_full_eval=True?
My objective is to train the model and pick the best one at the lowest point of eval loss.

Regarding the notebook, can I use the same Trainer object for fit and predict? These booleans are never set in the notebook. I mean, when I call trainer.predict(), does the trainer take care of setting model.eval() and torch.no_grad()?

Thank you both,
Andrea

@stas00
Contributor

stas00 commented Apr 19, 2021

Ok so do_train=True is also compatible with fp16_full_eval=True?

Why did you think it shouldn't be compatible?

The only reason there is a special case for non-training is to avoid placing the full model on the device before it has been half()'ed: it might not fit on the GPU at full size, but might fit after half().
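
In pseudo-steps, the eval-only path does roughly this (a sketch of the idea, not the actual Trainer source):

import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("bert-base-cased")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# fp16_full_eval without training: halve the weights first, then move them,
# so the full fp32 model never has to fit on the GPU
model = model.half().to(device)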

Regarding the notebook, can I use the same Trainer object for fit and predict? These booleans are never set in the notebook. I mean, when I call trainer.predict(), does the trainer take care of setting model.eval() and torch.no_grad()?

Of course. It was designed so that you pass all the init args at once and can then call all of its functions.
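
Roughly, as a usage sketch (the model/args/dataset variables are placeholders):

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()                              # training
predictions = trainer.predict(test_dataset)  # predict() switches to eval mode and runs under torch.no_grad() for you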

@andreabac3
Contributor Author

@stas00 OK, clear. I have just checked and the trainer works perfectly.
What do you think about adding a warning to alert the user when trainer.fit is called with trainer.do_train = False?

It's clear from the performance point of view, as you said, but the documentation doesn't bring out these things; that's why I opened the issue.

Kind regards,
Andrea

@stas00
Contributor

stas00 commented Apr 19, 2021

Oh, I see. Until recently do_train was sort of optional when using users' custom code, and what you're saying is that we would then need to require do_train=True if trainer.train() is called. But we have started relying on do_train for more than just knowing whether to call train() from scripts. This makes sense to me.

@sgugger, do you agree with adding this?

def train(...):
    [...]
    if not self.args.do_train:
        raise ValueError("To use `train` please make sure you set `do_train=True` when instantiating the Trainer object")

@sgugger
Collaborator

sgugger commented Apr 19, 2021

I would rather avoid adding this, as users are used to not having to set that argument to True when not using the example scripts. Can we just add the proper line in train to put the model on the device if that was not done already?

(Sorry, I didn't catch that you were using do_train in the PR where you added that test; I should have caught it and commented there.)

@stas00
Contributor

stas00 commented Apr 19, 2021

We will probably have to rethink the design then, since it's not a simple "put it on the device if it wasn't already": there are multiple cases where it shouldn't happen. For now I have added a hardcoded workaround: #11322
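
The idea of the workaround is roughly the following (a sketch only, not necessarily the exact code in the PR):

# inside Trainer.train(): if the model was deliberately kept off the device
# because of the fp16_full_eval eval-only special case, move it there now
if self.args.fp16_full_eval and not self.args.do_train:
    self.model = self.model.to(self.args.device)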
