Fixing slow pipeline tests #14260
Conversation
output_lengths = self._get_feat_extract_output_lengths(attention_mask.sum(-1)).to(torch.long)
# Effectively attention_mask.sum(-1), but not inplace to be able to run
# on inference mode.
mask = attention_mask.cumsum(dim=-1)[:, -1]
(nit) - could we call it `non_padded_lengths`? The idea here is to extract the sub-sampled length from the "real", non-padded input length.
Suggested change:
- mask = attention_mask.cumsum(dim=-1)[:, -1]
+ non_padded_lengths = attention_mask.cumsum(dim=-1)[:, -1]
True. Do you know any other way to do that operation? It's very surprising that `.sum` is in-place, and I'm worried that using `cumsum` instead is super wasteful.
I tried to grep for it in our code, but I couldn't find anything of that sort.
Hmm - I don't really know, to be honest... `torch.sum(...)` doesn't work either? But I think using `.cumsum(...)` is totally fine as well.
No, `torch.sum(..)` doesn't work.
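For context, a minimal sketch (not taken from the PR) showing that the last column of the cumulative sum gives the same per-row lengths as `.sum(-1)`:

```python
import torch

attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])

# The last column of the running sum equals the per-row total,
# i.e. the number of non-padded tokens in each sequence.
non_padded_lengths = attention_mask.cumsum(dim=-1)[:, -1]

assert torch.equal(non_padded_lengths, attention_mask.sum(-1))
print(non_padded_lengths)  # tensor([3, 5])
```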
src/transformers/models/unispeech_sat/modeling_unispeech_sat.py
def sequential_inference(self, **inputs):
    """
    Inference used for models that need to process sequences in a sequential fashion, like the SQA models which
    handle conversational query related to a table.
    """
    with torch.no_grad():
Nice!
Would be happy if we could give `mask` a better name. Apart from that, thanks a lot for enabling inference mode for all models :-)
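For illustration, here is a rough sketch of the general pattern this PR moves the pipelines towards (the helper name is hypothetical, not the actual pipeline code): the forward pass runs under `torch.inference_mode()` rather than `torch.no_grad()`.

```python
import torch
from torch import nn

def run_inference(model: nn.Module, model_inputs: dict):
    # torch.inference_mode() (PyTorch >= 1.9) disables autograd like
    # torch.no_grad(), but also skips version-counter and view bookkeeping,
    # so it is slightly faster. The trade-off: tensors created inside it
    # cannot be modified in-place outside of it.
    with torch.inference_mode():
        return model(**model_inputs)
```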
if self.training:
    if torch.isinf(hidden_states).any() or torch.isnan(hidden_states).any():
        clamp_value = torch.finfo(hidden_states.dtype).max - 1000
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
@stas00 Is it okay to remove this at inference time?
In theory, yes. In practice, it depends on how the model was pre-trained.
The model weights don't change during inference, so we don't need to keep things in check all the time.
However, if the pre-trained model's weights lead to an overflow within a single iteration during training, as is the case with some mt5 models under mixed precision, then this can occur just as well during inference.
This is primarily an issue with models pre-trained in bf16 and then fine-tuned or run for inference in fp16 (mixed or non-mixed precision).
If a model was pre-trained with fp16/mixed precision, it's pretty certain the clamping won't be needed.
To give you a more informed answer I'd need to run some tests with the actual DETR models and check their activation magnitudes at the point you're asking about. That should be fairly trivial using https://huggingface.co/transformers/debugging.html#underflow-and-overflow-detection, which can be plugged into the HF Trainer and the examples with a single command-line arg: --debug underflow_overflow.
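For reference, the detector linked above can also be attached to a model directly outside the Trainer (a sketch based on the linked docs; the checkpoint name is just an example):

```python
from transformers import AutoModel
from transformers.debug_utils import DebugUnderflowOverflow

model = AutoModel.from_pretrained("t5-small")  # example checkpoint suspected of overflowing

# Registers forward hooks on every module; when an inf/nan shows up in
# activations or weights it reports the offending batch along with the
# per-module min/max absolute values of the preceding frames.
debug_overflow = DebugUnderflowOverflow(model)
```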
To be honest, I think this code was just badly copy-pasted, so I'm more in favor of disabling this hack for training (as it is done now)
OK, if everyone is in favor, then let's do this.
> To be honest, I think this code was just badly copy-pasted, so I'm more in favor of disabling this hack for training (as it is done now)

You must have meant for inference, right, Patrick?
Good for merge for me
* Fiixng slow pipeline tests
* Remove the image-segmentaiton override.
* Fixing clamping only in training.
* Wav2vec2.
* Remove last mention of `no_grad`.
* Fixing copies.
* Rename.
Some tests were broken because of pytorch `inference_mode`. This should cover all cases of `inplace` tensor modifications afaik. Let me know if there are better ways to fix those.
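To make the failure mode concrete, here is a minimal illustration (not code from this PR) of the in-place restriction that was breaking the tests:

```python
import torch

with torch.inference_mode():
    mask = torch.ones(2, 5)

# In-place updates to tensors created under inference_mode are rejected
# once we are back outside of it; PyTorch raises a RuntimeError along the
# lines of "Inplace update to inference tensor outside InferenceMode is
# not allowed", which is why the fix avoids in-place ops on such tensors.
try:
    mask.add_(1)
except RuntimeError as err:
    print(err)
```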
What does this PR do?
Fixes # (issue)
Who can review?
@stas00
@patrickvonplaten
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.