Bugfix: Illogical "Avoid computing higher temperatures on no_speech" #1903

Open · wants to merge 2 commits into main

Conversation

@Purfview commented Dec 17, 2023

Bugfix for #1279

The bug: since #1279, a window is treated as "silence" when decoding has failed due to compression_ratio_threshold [+no_speech_threshold], while further down the code it is no longer treated as "silence".

"Silence" should apply only when decoding has failed due to logprob_threshold [+no_speech_threshold].

As described there:

parser.add_argument("--no_speech_threshold", type=optional_float, default=0.6, help="if the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence")

And in the code there:

```python
if no_speech_threshold is not None:
    # no voice activity check
    should_skip = result.no_speech_prob > no_speech_threshold
    if (
        logprob_threshold is not None
        and result.avg_logprob > logprob_threshold
    ):
        # don't skip if the logprob is high enough, despite the no_speech_prob
        should_skip = False
```
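
For context, here is roughly what the corresponding check in decode_with_fallback looks like (paraphrased, not the exact diff of this PR): before the fix, any failed decoding with a high no_speech_prob is treated as "silence" and the higher temperatures are skipped; the fix makes that conditional on logprob_threshold as well, matching the should_skip check above.

```python
# Paraphrased from decode_with_fallback() in whisper/transcribe.py;
# see the PR diff for the exact change.
needs_fallback = False
if (
    compression_ratio_threshold is not None
    and decode_result.compression_ratio > compression_ratio_threshold
):
    needs_fallback = True  # too repetitive
if (
    logprob_threshold is not None
    and decode_result.avg_logprob < logprob_threshold
):
    needs_fallback = True  # average log probability is too low

# Before: needs_fallback was cleared whenever no_speech_prob was high, even if
# the failure came from compression_ratio_threshold alone.
# After (this bugfix): only treat the window as "silence" when the decoding
# also failed due to logprob_threshold.
if (
    no_speech_threshold is not None
    and decode_result.no_speech_prob > no_speech_threshold
    and logprob_threshold is not None
    and decode_result.avg_logprob < logprob_threshold
):
    needs_fallback = False  # silence
```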

@Purfview (Author)

Related: SYSTRAN/faster-whisper#621

@Purfview (Author) commented Dec 17, 2023

I think this bug can trigger hallucination loops: on some hallucinations the prompt reset on high temperature wouldn't be triggered, because higher temperatures are not computed on what is not actually "silence".
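
To spell it out: the prompt is only reset when the segment ends up being decoded at a high temperature, roughly like this in the transcribe() loop (paraphrased):

```python
# Paraphrased from the transcribe() loop in whisper/transcribe.py.
# If the buggy check stops the fallback at temperature 0.0, result.temperature
# stays at 0.0, this reset never fires, and the hallucinated tokens keep being
# fed as the prompt for the following windows.
if not condition_on_previous_text or result.temperature > 0.5:
    # do not feed the prompt tokens if a high temperature was used
    prompt_reset_since = len(all_tokens)
```

(faster-whisper exposes that 0.5 value as prompt_reset_on_temperature, which is what the DEBUG lines below refer to.)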

@Purfview (Author)

@TheoBoyer @jongwook , would be great if you could have a look.

@TheoBoyer (Contributor)

This change is consistent with the rest of the code, so I'm not against it.

The original PR indeed skipped processing based on the logprob_threshold, but it was also contingent on logprob_threshold being set. @jongwook modified this. I assume the intention was to make the process independent of whether a threshold is set, but there may be reasons for this change that I'm unaware of.

However, I'm skeptical about involving logprob_threshold in silence discrimination in the first place.
The approach figure in the original paper clearly shows that there shouldn't be any decoding after no_speech.

[Figure: the "Approach" overview diagram from the Whisper paper]

PR #1279 was created because no_speech does not depend on token decoding; hence, regardless of the tokens decoded, no_speech_prob will remain unchanged.
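
(For reference, no_speech_prob is read from the softmax at the <|startoftranscript|> position on the very first forward pass, roughly like this in whisper/decoding.py, so it is fixed before any token is sampled:)

```python
# Paraphrased from DecodingTask._main_loop() in whisper/decoding.py.
# no_speech_prob comes from the first forward pass, at the SOT position,
# so it does not change with whatever tokens are decoded afterwards.
if i == 0 and self.tokenizer.no_speech is not None:  # save no_speech_probs
    probs_at_sot = logits[:, self.sot_index].float().softmax(dim=-1)
    no_speech_probs = probs_at_sot[:, self.tokenizer.no_speech].tolist()
```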

In the (too) few experiments I conducted, the model seemed capable of hallucinating high-probability tokens during silences. It would be beneficial if someone could further investigate the relevance of incorporating logprob_threshold in silence discrimination. I'm also interested to know if any related experiments already exist.

@Purfview (Author) commented Dec 18, 2023

> However, I'm skeptical about involving logprob_threshold in silence discrimination in the first place.

no_speech_threshold alone is pretty unreliable; the model can generate a no_speech_prob close to 1.0 on perfectly fine speech.

@Purfview (Author)

> I think this bug can trigger hallucination loops: on some hallucinations the prompt reset on high temperature wouldn't be triggered, because higher temperatures are not computed on what is not actually "silence".

My guess was right; today I encountered one:

DEBUG: Compression ratio threshold is not met with temperature 0.0 (6.677966 > 2.400000)
[04:17.320 --> 04:29.020]  been doing it for a long time. I'm a professional. I'm a professional. I'm a
[04:29.020 --> 04:29.340]  professional. I'm a professional. I'm a professional. I'm a professional. I'm
[04:29.340 --> 04:34.560]  a professional. I'm a professional. I'm a professional. I'm a professional. I'm
[04:34.560 --> 04:38.360]  a professional. I'm a professional. I'm a professional. I'm a professional. I'm
[04:38.360 --> 05:03.750]  a professional. I'm a professional. I'm a professional. I'm a professional. I'm

No hallucination loop with this bugfix:

DEBUG: Compression ratio threshold is not met with temperature 0.0 (6.677966 > 2.400000)
DEBUG: Compression ratio threshold is not met with temperature 0.2 (8.533333 > 2.400000)
DEBUG: Compression ratio threshold is not met with temperature 0.4 (8.884615 > 2.400000)
[04:17.320 --> 04:22.640]  got me feeling natural. Finding a natural-seeming way to fail at any given task.
[04:23.700 --> 04:27.140]  In each of the commercials that I'm in, I'm the one who simply can't go on
[04:27.140 --> 04:33.340]  without the product. It's ridiculous that we don't have the product. Show them.
DEBUG: Reset prompt. prompt_reset_on_temperature threshold is met 0.600000 > 0.500000
DEBUG: Log probability threshold is not met with temperature 0.0 (-1.344815 < -1.000000)
DEBUG: Log probability threshold is not met with temperature 0.2 (-1.150256 < -1.000000)
[04:33.340 --> 04:35.340]  No, you shouldn't.
[04:36.020 --> 04:36.300]  Please.
[04:36.560 --> 04:37.520]  You wanna see?
[04:38.020 --> 04:39.080]  Yeah, I wanna see.
[04:43.260 --> 04:44.120]  She's amazing.
[05:03.870 --> 05:05.110]  I just...
[05:05.110 --> 05:05.650]  I...

Commit: Bugfix for openai#1279

It's "silence" when decoding has failed due to `compression_ratio_threshold` too, when further down the code it's not "silence" anymore.

"Silence" should be only when decoding has failed due to `logprob_threshold`.

As described there:
https://github.com/openai/whisper/blob/8bc8860694949db53c42ba47ddc23786c2e02a8b/whisper/transcribe.py#L421

And in the code there:
https://github.com/openai/whisper/blob/8bc8860694949db53c42ba47ddc23786c2e02a8b/whisper/transcribe.py#L243-L251

@Purfview (Author)

Another example of hallucination fix: #1962
