Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tokenizers] convert_to_tensors: don't reconvert when the type is already right #8283

Merged
merged 7 commits into from
Nov 19, 2020

Conversation

stas00
Copy link
Contributor

@stas00 stas00 commented Nov 4, 2020

I was trying to fix this warning:

src/transformers/tokenization_utils_base.py:608: UserWarning: To copy construct from a tensor, 
it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), 
rather than torch.tensor(sourceTensor).
  tensor = as_tensor(value)

which appeared when running:

python eval_rag.py --model_name_or_path facebook/rag-sequence-nq --model_type rag_sequence --evaluation_set output/biencoder-nq-dev.questions --gold_data_path output/biencoder-nq-dev.pages --predictions_path output/retrieval_preds.tsv --eval_mode retrieval --k 1

This appears to have happened since convert_to_tensors was called with data which was already a tensor of the right type.

  • and ended up fixing it for pt and also adding the same fix for tf/jax/np. basically skip the conversion if the value is already of the required type and avoid the pytorch warning.
  • added tests for converting the already converted
  • while at it added a missing test for test_batch_encoding_with_labels_jax

I understand lambda isn't welcome, so I had to define a few helper functions for numpy/jax. partial would have done the trick, but isinstance doesn't accept keyword args.

@LysandreJik, @mfuntowicz

@stas00 stas00 changed the title don't reconvert when the type is already right [tokenizers] don't reconvert when the type is already right Nov 4, 2020
@stas00 stas00 changed the title [tokenizers] don't reconvert when the type is already right [tokenizers] convert_to_tensors: don't reconvert when the type is already right Nov 4, 2020
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this looks like a good additional check. @mfuntowicz, what do you think?

@stas00
Copy link
Contributor Author

stas00 commented Nov 9, 2020

ping

@mfuntowicz
Copy link
Member

mfuntowicz commented Nov 10, 2020

Looks good to me. Thanks for handling this one @stas00 and sorry for the delay.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I find it better this way too. Thanks @stas00!

@stas00 stas00 merged commit 42111f1 into huggingface:master Nov 19, 2020
moussaKam pushed a commit to moussaKam/transformers that referenced this pull request Nov 20, 2020
…eady right (huggingface#8283)

* don't reconvert when the type is already right

* better name

* adjust logic as suggested

* merge
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants