
Clip floating point constants to bf16 range to avoid inf conversion #20562

Closed
wants to merge 26 commits into from

Conversation

sangeethabal
Contributor

When running the HuggingFace BERT (any size) fine-tuning tutorial with transformers >= 4.21.0 and XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1, I see NaNs in the loss after the first step.

What does this PR do?

This PR addresses an issue where the model code passes a value that is out of the bfloat16 range when XLA_USE_BF16=1 or XLA_DOWNCAST_BF16=1 is set, so the conversion casts it to -inf.

The NaNs likely come from the transformers library change #17306. That PR replaced many lines that used to use -float("inf") (or other small constants) with torch.finfo().min. For torch.float32 the min value is -3.4028234663852886e+38, which is smaller than the bfloat16 minimum of -3.3895313892515355e+38. So the problem is that torch.finfo(torch.float32).min = -3.4028234663852886e+38 gets converted to -inf under the bf16 downcast. When the original encoder_extended_attention_mask is 1, encoder_extended_attention_mask becomes (1.0 - 1.0) * -inf, which is NaN (by the IEEE rule 0.0 * Inf = NaN).
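
For illustration, a minimal sketch of how the -inf and the NaN appear. The bf16 downcast that XLA performs is simulated here with a plain .to(torch.bfloat16) cast, which is an assumption about the behaviour, not code from this PR:

import torch

# The fp32 minimum lies outside the bf16 finite range, so the downcast overflows to -inf.
fill = torch.tensor(torch.finfo(torch.float32).min)  # -3.4028234663852886e+38
fill_bf16 = fill.to(torch.bfloat16)                   # tensor(-inf, dtype=torch.bfloat16)

# Where the original mask value is 1, the extended mask becomes (1.0 - 1.0) * -inf = NaN.
mask = torch.tensor([1.0], dtype=torch.bfloat16)
print((1.0 - mask) * fill_bf16)                       # tensor([nan], dtype=torch.bfloat16)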

This PR ensures the fill value is torch.finfo(torch.bfloat16).min = -3.3895313892515355e+38 rather than -inf, so the results no longer contain NaNs.
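
A minimal sketch of the intended behaviour after the fix, under the same casting assumption as above:

import torch

# The bf16 minimum survives the downcast as a finite value, so the masked product is
# zero rather than NaN.
fill = torch.tensor(torch.finfo(torch.bfloat16).min)  # -3.3895313892515355e+38
fill_bf16 = fill.to(torch.bfloat16)                    # still -3.3895e+38, not -inf

mask = torch.tensor([1.0], dtype=torch.bfloat16)
print((1.0 - mask) * fill_bf16)                        # zero (a finite value), not NaN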

The following lines check for the XLA_USE_BF16 or XLA_DOWNCAST_BF16 environment variable and set the dtype accordingly:

if is_torch_tpu_available():
    if os.environ.get("XLA_USE_BF16"):
        return torch.bfloat16
    if os.environ.get("XLA_DOWNCAST_BF16"):
        if t.dtype == torch.float:
            return torch.bfloat16
        if t.dtype == torch.double:
            return torch.float32
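
For illustration only, a sketch of how the selected dtype might then be used when building the additive attention mask; the variable names below are assumptions, not the exact ones in the PR:

import torch

attention_mask = torch.tensor([[1.0, 1.0, 0.0]])
target_dtype = torch.bfloat16  # e.g. what the check above returns when XLA_USE_BF16=1

# Positions to attend to become 0; masked positions get a large-but-finite negative fill
# that stays finite after the bf16 downcast (instead of torch.finfo(torch.float32).min).
extended_attention_mask = (1.0 - attention_mask) * torch.finfo(target_dtype).min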

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Dec 3, 2022

The documentation is not available anymore as the PR was closed or merged.

@sangeethabal sangeethabal changed the title Clip floating point constants to bf16 or fp16 range to avoid inf conversion Clip floating point constants to bf16 range to avoid inf conversion Dec 3, 2022
Collaborator

@sgugger sgugger left a comment


Thanks for your PR! Just have a small comment on style and we should be good to go.

# Fixes issue where the model code passes a value that is out of range for XLA_USE_BF16=1
# and XLA_DOWNCAST_BF16=1 so the conversion would cast it to -inf
if is_torch_tpu_available():
    if os.environ.get("XLA_USE_BF16"):
Collaborator


Can we just make an explicit comparison here? We usually dislike relying on Python bool conversion magic :-) We have some constants in the utils that group all kinds of truthy values, which could be useful here.

Contributor Author


Sure, let me change that and make it an explicit comparison.
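
For illustration, the explicit comparison could look roughly like the sketch below. It assumes the truthy-value constant sgugger refers to is ENV_VARS_TRUE_VALUES from transformers.utils, the helper name mask_dtype is made up for the example, and the is_torch_tpu_available() guard is omitted for brevity:

import os

import torch

# Sketch only: compare the raw environment value against the set of truthy strings
# explicitly instead of relying on Python bool conversion.
from transformers.utils import ENV_VARS_TRUE_VALUES  # e.g. {"1", "ON", "YES", "TRUE"}


def mask_dtype(t: torch.Tensor) -> torch.dtype:
    if os.environ.get("XLA_USE_BF16", "0").upper() in ENV_VARS_TRUE_VALUES:
        return torch.bfloat16
    if os.environ.get("XLA_DOWNCAST_BF16", "0").upper() in ENV_VARS_TRUE_VALUES and t.dtype == torch.float:
        return torch.bfloat16
    return t.dtype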

ydshieh and others added 25 commits December 5, 2022 21:19
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix small nit

* add last file
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove is_encoder_decoder from some vision models

* cleanup more

* cleanup more

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* biogpt initial commit

* updated init

* fix faster decoding with use_cache

* 1. fix input_ids and input_embeds with correct device
2. added _keys_to_ignore_on_load_missing
3. updated prepare_inputs_for_generation

* add activation_dropout and scale_embedding

* replace fsmt attention with bart attention

* added test

* run make fix-copies

* doc init and fix build

* updated README with proper information

* 1. added tips to docs
2. updated BioGptTokenizer func

* 1. added tokenizer test
2. refactor tokenizer

* make fixup

* add biogpt fairseq to hf converter

* updated layer names more
similar to original checkpoints

* config update doc string and set defaults

* added "#copied" from bart model and
updated doc strings

* enable model_input_names in tokenizer

* 1.  positionalembedding depending on attention_mask
2. added attention mask to prepare for generation

* added test to verify past and generation

* BioGptLMHeadModel -> BioGptForCausalLM

* fix typo

* tokenization and test
Copyright and updated assertion

* updated Copyright and
one func at time in line

* Copyright updates and
minor doc fix

* replace assertion with ValueError

* rm extra space

* added code syntax

* revert cmnt position change

* add tokenizer to auto

* updated doc string

* tokenizer doc string update

* biogpt hub model update to microsoft/biogpt

* make fixup

* rm cmnt to fix flake8 5.0.4 vs 6 error
* Expected output for the test changed

* fix failing asr test
* add support for `from_pt`

* add tf_flax utility file

* Update src/transformers/modeling_tf_flax_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove flax related modifications

* add test

* remove FLAX related commits

* fixup

* remove safetensor todos

* revert deletion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Make convert_to_onnx runable as script again

Fix `convert_graph_to_onnx.py` relative import so it can be run as a script again.

* Trigger CI
* add type annotations for esm chunk_utils

use isinstance builtin instead of 'type(x) is y'; add assertions to aid in type inferencing; use bools instead of ints in _get_minimal_slice_set for improved type clarity; refactor to avoid re-assigning to the same variable with a different type

* add type annotations for esm data_transforms

refactor to avoid re-assigning to the same variable with a different type

* add type annotations for esm feats utils

refactor to avoid re-assigning to the same variable with a different type

* add type annotations for esm loss utils

* add/fix type annotations for esm rigit_utils

refactor to avoid re-assigning to the same variable with a different type; fix Callable, Tuple type hints; match conditional structure to other methods; fix return type on Rotation.cat and Rotation.unsqueeze

* add type annotations for esm tensor_utils

overload for tree_map; use isinstance builtin instead of 'type(x) is y'; export dict_multimap, flatten_final_dims, permute_final_dims in openfold_utils

* add type annotations for esm protein utils

add FIXME for attempted string mutation; add missing None check in get_pdb_headers; fix potentially unbound variable 'chain_tag' in to_pdb; modify get_pdb_headers return type

* add type annotations for esm residue constants

hints on collection constants; remove magic trailing comma to reduce number of lines; change list -> tuple for rigid_group_atom_positions for improved hinting

* code style fixup

Co-authored-by: Matt <rocketknight1@gmail.com>
* rembert onnx config

* formatting

Co-authored-by: Ho <erincho@bcd0745f972b.ant.amazon.com>
* Fix link to table transformer detection microsoft model

* Fix doc styles
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix whisper and speech to text doc
# What does this PR do?
Previously the documentation was badly indented for both models and indicated that
> If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value of `inputs_embeds`.
Which is only valid for the forward pass of the `ForConditionalGeneration`, not for the model alone.

* other fixes
* remove set-output

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add v1 with tests

* add checker

* simplified version

* update docstring

* better version

* fix docstring + change order

* make style

* tests + change conditions

* final tests

* modify docstring

* Update src/transformers/feature_extraction_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* replace by `ValueError`

* fix logic

* apply suggestions

* `dtype` is not needed

* adapt suggestions

* remove `_parse_args_to_device`

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* [Whisper] Fix decoder ids methods

* enum property
* add whisper conversion script

* update conversion script

* update arg names

* fix missing encoder_ffn_dim

* fixup

* ast nits
@sgugger
Collaborator

sgugger commented Dec 5, 2022

Oh, looks like something went wrong in your rebase (see the diff showing lots of files). You can either force-push a commit (with --force) to repair the git history, or close this PR and open a fresh one.
