
Fix probability computation in WhisperNoSpeechDetection when recomputing scores #29248

Merged · 3 commits into huggingface:main on Apr 3, 2024

Conversation

@cifkao (Contributor) commented Feb 23, 2024

What does this PR do?

Fix #29313.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@patrickvonplaten @sanchit-gandhi @ylacombe

@ArthurZucker (Collaborator) left a comment:

For Whisper, the num_beams > 1 path seems a bit tricky. Would you mind adding a test to make sure we have the expected new results?

@cifkao (Contributor, Author) commented Mar 30, 2024

@ArthurZucker I added a slow test where I set the logprob_threshold to 0 to make sure the no-speech detection is triggered. Without the fix, all the outputs are empty because all the no-speech probabilities are >1.
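The mechanism behind that test setup can be sketched numerically (illustrative values only, not the actual test code; the thresholds are hypothetical): average log-probabilities are never positive, so a logprob_threshold of 0 fails for every segment and forces the no-speech probability to decide, and before the fix that probability was an exponentiated raw logit that could exceed any threshold.

```python
# Illustrative sketch (not the actual test): why logprob_threshold=0.0
# guarantees the no-speech check is exercised, and how the pre-fix bug
# then discarded every segment.
import math

# Average log-probabilities are always <= 0, so a threshold of 0 fails
# for every segment and triggers the no-speech fallback.
avg_logprob = math.log(0.9)            # even a confident segment: ~ -0.105
triggers_fallback = avg_logprob < 0.0  # True for any real segment

# Pre-fix: raw logits were exponentiated as if they were log-probs,
# producing "probabilities" above 1 that exceed any threshold.
buggy_no_speech_prob = math.exp(3.2)   # ~24.5, impossible as a probability
no_speech_threshold = 0.6              # hypothetical threshold value
dropped = triggers_fallback and buggy_no_speech_prob > no_speech_threshold
print(dropped)  # True: every segment is wrongly treated as silence
```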

@ylacombe (Collaborator) commented Apr 1, 2024

Hey @cifkao, thanks for the PR!
Could you make sure to rebase the branch to fix the failing test?

> This scenario arises when the model is called with language set and with num_beams > 1.

Also, could you point out why this scenario applies when language is set? I understand the case for num_beams > 1 but don't see the point for the other case!
Many thanks!


@cifkao (Contributor, Author) commented Apr 1, 2024

> Also, could you point out why this scenario applies when language is set? I understand the case for num_beams > 1 but don't see the point for the other case!

@ylacombe The bug manifests only when both conditions (num_beams > 1 and language set) are true. The latter condition causes decoding to start after the language token, which is why we need to call the model again to get the logits at that position (which are then incorrectly treated as log-probabilities). With language unset, the decoding starts from the beginning, so we never enter that branch.
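The distinction can be illustrated with a toy example (illustrative numbers, independent of the library code): exponentiating a raw logit can yield a value above 1, while exponentiating a properly normalized log-softmax score always yields a probability in [0, 1].

```python
# Toy illustration (not the library code) of logits vs log-probabilities.
import math

logits = [2.0, 0.5, -1.0]  # raw decoder scores for three tokens

# Bug: exp(raw logit) is not a probability and can exceed 1.
bad_prob = math.exp(logits[0])                      # exp(2.0) ~ 7.39

# Fix: normalize first (log-softmax), then exponentiate.
log_norm = math.log(sum(math.exp(x) for x in logits))
good_prob = math.exp(logits[0] - log_norm)          # ~ 0.79, a valid probability

assert bad_prob > 1.0 and 0.0 < good_prob < 1.0
```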

@sanchit-gandhi (Contributor) left a comment:

Thanks for this fix @cifkao!

    if input_ids.shape[1] == self.begin_index:
        if self.start_of_trans_offset > 1:
            with torch.no_grad():
                logits = self.model(**self.inputs).logits

            no_speech_index = self.begin_index - self.start_of_trans_offset
            no_speech_scores = logits[:, no_speech_index]
            is_scores_logprobs = False

Nice catch!
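How a flag like this is consumed can be sketched as follows (a simplified stand-in, not the actual generation code; the function name is hypothetical): scores that are already log-probabilities are exponentiated directly, while raw logits must be softmax-normalized first.

```python
# Simplified stand-in (not the actual generation code) for how an
# is_scores_logprobs flag selects the probability computation.
import math

def no_speech_prob(scores, no_speech_index, is_scores_logprobs):
    """Return the probability of the no-speech token from a row of scores."""
    if is_scores_logprobs:
        # Scores are already normalized log-probabilities.
        return math.exp(scores[no_speech_index])
    # Scores are raw logits: softmax-normalize before reading a probability.
    norm = sum(math.exp(s) for s in scores)
    return math.exp(scores[no_speech_index]) / norm

scores = [2.0, 0.5, -1.0]
print(no_speech_prob(scores, 0, is_scores_logprobs=False))  # ~0.79, valid
print(no_speech_prob(scores, 0, is_scores_logprobs=True))   # ~7.39: why the flag matters
```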

@@ -2615,6 +2615,59 @@ def test_whisper_longform_multi_batch_hard_prev_cond(self):
         for i in range(num_samples):
             assert decoded_all[i] == EXPECTED_TEXT[i]

+    @slow
+    def test_whisper_longform_no_speech_detection(self):

Thanks for adding this slow test! Just to confirm, before the fix all the transcriptions are empty due to the no-speech probabilities exceeding 1?

@cifkao (Contributor, Author):

Exactly.

@sanchit-gandhi (Contributor):

Ready for final review from @ArthurZucker

@ArthurZucker (Collaborator) left a comment:

🤯 awesome catch!

@ArthurZucker ArthurZucker merged commit 240e106 into huggingface:main Apr 3, 2024
21 checks passed
@ArthurZucker (Collaborator):

Thank you for taking the time to add a test 🔥

@sanchit-gandhi (Contributor):

Thanks for the contribution @cifkao!

ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
…uting scores (#29248)

* Fix is_scores_logprobs in WhisperNoSpeechDetection

* Add test_whisper_longform_no_speech_detection

* Fix typo
itazap pushed a commit that referenced this pull request May 14, 2024
…uting scores (#29248)

* Fix is_scores_logprobs in WhisperNoSpeechDetection

* Add test_whisper_longform_no_speech_detection

* Fix typo

Successfully merging this pull request may close these issues.

Incorrect no-speech probability in WhisperNoSpeechDetection when language is set and num_beams > 1
5 participants