Resolve Inference Selection Bug Affecting Transcription Quality #1377

TheoBoyer · 2023-05-21T15:11:04Z

Currently, when none of the inferences made at different temperatures satisfies all of the "non-fallback" conditions, the last one is returned (with the default arguments, the one with the highest temperature).

This can lead to weird behavior, for example:

import whisper

model = whisper.load_model("small")  # model size chosen for illustration
audio = "audio.mp3"  # hypothetical input file

# I want to transcribe this piece of audio
print(model.transcribe(audio)["text"])  # Works fine

# I want to transcribe this piece of audio, and I want it to be very good, so I increase the min logprob
print(model.transcribe(audio, logprob_threshold=-0.2)["text"])  # Result is bad: the returned inference is the last one, with the highest temperature

This PR doesn't change the behaviour when one of the inferences satisfies all of the conditions.
When none of them does (for example, when the avg_logprob condition still isn't satisfied after re-computing with greater temperatures), the returned result is the one with the highest avg_logprob rather than the last one.
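To make the proposed rule concrete, here is a minimal sketch of a temperature-fallback loop. It is not the PR's actual diff: `decode(temperature)` is a hypothetical callable, `DecodingResult` is a simplified stand-in with only the fields used here, and the no-speech check that Whisper also applies is left out. Only the final `return` differs from "return the last result".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class DecodingResult:
    # Hypothetical stand-in holding only the fields this sketch needs.
    text: str
    temperature: float
    avg_logprob: float
    compression_ratio: float


def needs_fallback(
    result: DecodingResult,
    logprob_threshold: Optional[float],
    compression_ratio_threshold: Optional[float],
) -> bool:
    # Text that compresses too well is likely repetitive: retry hotter.
    if (
        compression_ratio_threshold is not None
        and result.compression_ratio > compression_ratio_threshold
    ):
        return True
    # Average log probability below the threshold: retry hotter.
    if logprob_threshold is not None and result.avg_logprob < logprob_threshold:
        return True
    return False


def decode_with_fallback(
    decode: Callable[[float], DecodingResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold: Optional[float] = -1.0,
    compression_ratio_threshold: Optional[float] = 2.4,
) -> DecodingResult:
    results: List[DecodingResult] = []
    for t in temperatures:
        result = decode(t)
        results.append(result)
        if not needs_fallback(result, logprob_threshold, compression_ratio_threshold):
            # Unchanged behaviour: the first acceptable result is returned.
            return result
    # Every fallback failed: return the result with the highest avg_logprob
    # instead of simply the last (highest-temperature) one.
    return max(results, key=lambda r: r.avg_logprob)
```

Note that this final pick looks only at avg_logprob, so a result that failed just the compression check can still be the one returned.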

hoonlight · 2023-07-17T15:01:05Z

I've been testing with this PR, and the improvement is bigger than I thought.
In the majority of the samples I tested, I was able to clearly observe an improvement in transcription accuracy.

@jongwook, could you please review this PR?

guillaumekln · 2023-07-19T08:56:16Z

This change makes sense, but I think it should take compression_ratio_threshold into account.

For example, only consider the results whose compression ratio is below compression_ratio_threshold and select the one with the best log prob among them. If all compression ratios are above compression_ratio_threshold, then it can pick the best log prob from all results.
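A minimal sketch of this rule, assuming it only runs once every temperature has failed and that each result exposes `avg_logprob` and `compression_ratio` (as Whisper's `DecodingResult` does). `pick_when_all_fallbacks_failed` is a hypothetical name, and `SimpleNamespace` objects stand in for real decoding results:

```python
from types import SimpleNamespace
from typing import Any, Sequence


def pick_when_all_fallbacks_failed(
    results: Sequence[Any], compression_ratio_threshold: float = 2.4
) -> Any:
    # Prefer results that are not overly repetitive.
    candidates = [
        r for r in results if r.compression_ratio <= compression_ratio_threshold
    ]
    # If every result is above the threshold, fall back to all of them.
    if not candidates:
        candidates = list(results)
    # Among the candidates, take the best average log probability.
    return max(candidates, key=lambda r: r.avg_logprob)


# The second result has a lower logprob but is not repetitive, so it wins.
example = [
    SimpleNamespace(avg_logprob=-0.30, compression_ratio=3.1),
    SimpleNamespace(avg_logprob=-0.45, compression_ratio=1.8),
]
chosen = pick_when_all_fallbacks_failed(example)
```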

TheoBoyer · 2023-12-18T10:25:34Z

This PR was merged into SYSTRAN/faster-whisper#356 and seems to somewhat improve transcription quality. Is there anything I can do to help with the review process?

pacaklu · 2024-01-16T16:56:13Z

Hello guys,

I have also encountered this bug recently, so let me share my opinion on how to solve it.

So let's assume that none of the predictions meets both the compression_ratio and the logprob criteria.

I agree with @guillaumekln that, if one or more of the predictions meets the compression_ratio criterion, we should select the one among them with the best logprob.
If none of the predictions meets the compression_ratio criterion, simply selecting the one with the best logprob can be tricky, because a small improvement in logprob can come with a huge increase in compression_ratio.

Here is a real-data example that I found with whisper small, version 2 (it would be easy to create lots of mock examples, but I did not want to do that):

temperature: 0.0
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.311024
compression: 5.588235294
---------------------------
temperature: 0.2
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.34232
compression: 5.588235294
---------------------------
temperature: 0.4
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.343599
compression: 5.588235294
---------------------------
temperature: 0.6
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.338365
compression: 5.588235294
---------------------------
temperature: 0.8
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.39118579
compression: 5.588235294
---------------------------
temperature: 1.0
encoded text: Yeah. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
logprob: -0.399336
compression: 2.411764705

So in this case, the first result would be selected even though it has a very bad compression_ratio, whereas the correct choice would obviously be the last one, where the model's hallucination is at least not as severe.
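For context, Whisper's `compression_ratio` helper divides the UTF-8 byte length of the decoded text by the length of its zlib-compressed form, so repetitive output such as the repeated "Yeah." compresses very well and gets a high ratio. A minimal reproduction of that formula (the example strings are built for illustration and may differ in whitespace from the model's actual output):

```python
import zlib


def compression_ratio(text: str) -> float:
    # Raw UTF-8 size divided by zlib-compressed size; higher means more repetitive.
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


repetitive = " ".join(["Yeah."] * 16)  # like the temperature 0.0 to 0.8 outputs
shorter = " ".join(["Yeah."] * 7)  # like the temperature 1.0 output
print(compression_ratio(repetitive), compression_ratio(shorter))
```

Because zlib output for such short, repetitive strings is roughly the same size regardless of how many repeats there are, the ratio grows almost linearly with the number of repeats.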

So I suggest calculating a tradeoff_factor that takes both compression_ratio and logprob into account:
tradeoff_factor = (logprob of the prediction / logprob threshold) * (compression_ratio of the prediction / compression_ratio threshold)
and then selecting the prediction with the lowest value of this tradeoff_factor.

For the example above, the factors are:
[0.72419, 0.79707, 0.800047, 0.7878599, 0.9106927, 0.40129]
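The arithmetic can be checked with a short script, assuming the thresholds logprob_threshold = -1 and compression_ratio_threshold = 2.4 (the defaults used in the implementation below) and the logprob and compression values reported above; `tradeoff_factor` here is a hypothetical standalone helper:

```python
# (temperature, avg_logprob, compression_ratio) as reported in the comment.
reported = [
    (0.0, -0.311024, 5.588235294),
    (0.2, -0.34232, 5.588235294),
    (0.4, -0.343599, 5.588235294),
    (0.6, -0.338365, 5.588235294),
    (0.8, -0.39118579, 5.588235294),
    (1.0, -0.399336, 2.411764705),
]

LOGPROB_THRESHOLD = -1.0
COMPRESSION_RATIO_THRESHOLD = 2.4


def tradeoff_factor(avg_logprob: float, compression_ratio: float) -> float:
    # Both logprob values are negative, so the first ratio is positive;
    # for both terms, lower means a better prediction.
    return (avg_logprob / LOGPROB_THRESHOLD) * (
        compression_ratio / COMPRESSION_RATIO_THRESHOLD
    )


factors = [tradeoff_factor(lp, cr) for _, lp, cr in reported]
best_temperature = reported[factors.index(min(factors))][0]
print([round(f, 4) for f in factors], best_temperature)
```

With these inputs the temperature 1.0 result is selected. The fifth factor computes to about 0.9108 from the listed logprob, slightly off the quoted 0.9106927, which does not change the selection.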

Possible implementation:

from typing import List

from whisper.decoding import DecodingResult


def _select_best_prediction(
    decoded_results: List[DecodingResult],
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> DecodingResult:
    """Select the best prediction from results decoded at various temperatures.

    Assumes logprob_threshold < 0 and compression_ratio_threshold > 0,
    so that the tradeoff factor below is well defined.
    """
    assert len(decoded_results) > 0
    predictions_meeting_compression = [
        pred
        for pred in decoded_results
        if pred.compression_ratio <= compression_ratio_threshold
    ]

    # Case 1: at least one prediction has a compression ratio at or below
    # the threshold.
    # Then select the one among them with the best avg_logprob.
    if predictions_meeting_compression:
        return max(predictions_meeting_compression, key=lambda x: x.avg_logprob)

    # Case 2: no prediction has a compression ratio at or below the threshold.
    # Then calculate tradeoff_factor between logprob and compression ratio as
    # (logprob of the prediction / logprob threshold) *
    # (compression_ratio of the prediction / compression_ratio threshold)
    # and select the prediction with the lowest value of this factor.
    def tradeoff_factor(pred: DecodingResult) -> float:
        return (pred.avg_logprob / logprob_threshold) * (
            pred.compression_ratio / compression_ratio_threshold
        )

    return min(decoded_results, key=tradeoff_factor)

TheoBoyer added 2 commits May 21, 2023 16:30

Return best option on fallback

a72a044

When the avg_logprob condition isn't satisfied and the result is re-computed with a greater temperature, the best option is returned

Return the best only if all fallbacks failed

f677284

TheoBoyer changed the title ~~Return best text~~ Resolve Inference Selection Bug Affecting Transcription Quality May 21, 2023

hoonlight mentioned this pull request Jul 17, 2023

Return result with best log prob when all temperature fallbacks failed SYSTRAN/faster-whisper#356

Merged

Merge branch 'main' into return_best_text

bf2612f
