
Large-v3 model hallucinates, large-v2 doesn't #777

Open
Arche151 opened this issue Apr 2, 2024 · 8 comments

Comments

@Arche151

Arche151 commented Apr 2, 2024

I have two scripts in which the large-v3 model hallucinates, for instance by making up things that weren't said or by repeating a single word some 50 times. When I replace large-v3 with large-v2 in the scripts, the transcription works fine.

Script 1:

import os
import subprocess
from faster_whisper import WhisperModel

audio_file = "/tmp/audio_recording.wav"
recording_state_file = "/tmp/recording_state"

def start_recording():
    subprocess.Popen(["arecord", "-f", "cd", audio_file])
    open(recording_state_file, 'w').close()

def stop_recording():
    subprocess.call(["pkill", "arecord"])
    if os.path.exists(recording_state_file):
        os.remove(recording_state_file)
    transcribe_audio()
    os.remove(audio_file)

def is_recording():
    return os.path.exists(recording_state_file)

def transcribe_audio():
    model = WhisperModel("large-v3", device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_file)
    transcription = " ".join([segment.text for segment in segments]).strip()
    subprocess.Popen(["xclip", "-selection", "c"], stdin=subprocess.PIPE).communicate(input=transcription.encode())
    # Notify the user that transcription is complete and copied to clipboard
    subprocess.call(["notify-send", "Transcription Complete", "The transcription has been copied to the clipboard."])

def main():
    if is_recording():
        stop_recording()
    else:
        start_recording()

if __name__ == "__main__":
    main() 

Script 2:

import sys

from faster_whisper import WhisperModel

model_size = "large-v3"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

def transcribe_audio(file_path):
    segments, info = model.transcribe(file_path)
    transcript = " ".join(segment.text.strip() for segment in segments)
    print(transcript)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        audio_file = sys.argv[1]
        transcribe_audio(audio_file)
    else:
        print("Please provide an audio file path as an argument.")

Does anyone know of a fix? Is something wrong with my scripts or with the faster-whisper large-v3 model?

@trungkienbkhn
Collaborator

@Arche151, could you try again with compute_type="default" (or remove this argument when initializing the whisper model)?
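A minimal sketch of the suggested change, assuming the same model setup as in the scripts above: compute_type="default" lets CTranslate2 pick a precision the hardware supports instead of forcing int8 quantization, which can affect output quality.

```python
from faster_whisper import WhisperModel

# "default" lets CTranslate2 choose the compute type for the device,
# instead of the int8 quantization used in the original scripts.
model = WhisperModel("large-v3", device="cpu", compute_type="default")
```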

@Arche151
Author

Arche151 commented Apr 2, 2024

@Arche151, could you try again with compute_type="default" (or remove this argument when initializing the whisper model)?

Thanks for the quick reply and suggestion!

I'll try that and report back.

@Purfview
Contributor

Purfview commented Apr 2, 2024

Large-v3 model hallucinates, large-v2 doesn't

It's known that large-v3 hallucinates much more than large-v2; see:
Whisper-v3 Hallucinations on Real World Data

@Arche151
Author

Arche151 commented Apr 3, 2024

Large-v3 model hallucinates, large-v2 doesn't

It's known that large-v3 hallucinates much more than large-v2; see: Whisper-v3 Hallucinations on Real World Data

Damn, that sucks. In that case, there's of course nothing faster-whisper can change about that. Then I guess I'll stay with large-v2.

Thanks for linking the article!

@Purfview
Contributor

Purfview commented Apr 4, 2024

Then I guess I'll stay with large-v3.

Did you mean "large-v2"?

On my Standalone Faster-Whisper I've added automatic offsets to Whisper's pseudo-VAD thresholds when a "v3" model is in use; you can try these parameters when using large-v3:

compression_ratio_threshold=2.2
log_prob_threshold=-0.7
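A sketch of how these values plug into faster-whisper's transcribe() call (both are existing parameters of that API; the defaults are 2.4 and -1.0, so the values above are tighter and make the decoder discard repetitive or low-confidence output sooner):

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cpu", compute_type="int8")

segments, info = model.transcribe(
    "audio.wav",  # hypothetical input file
    compression_ratio_threshold=2.2,  # default 2.4; lower = stricter repetition filter
    log_prob_threshold=-0.7,          # default -1.0; higher = stricter confidence filter
)
print(" ".join(segment.text.strip() for segment in segments))
```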

@terryops

terryops commented Apr 7, 2024

Then I guess I'll stay with large-v3.

Did you mean "large-v2"?

On my Standalone Faster-Whisper I've added automatic offsets to Whisper's pseudo-VAD thresholds when a "v3" model is in use; you can try these parameters when using large-v3:

compression_ratio_threshold=2.2 log_prob_threshold=-0.7

Does large-v3 with your parameters yield better results than large-v2?

@Arche151
Author

Arche151 commented Apr 7, 2024

Then I guess I'll stay with large-v3.

Did you mean "large-v2"?
On my Standalone Faster-Whisper I've added automatic offsets to Whisper's pseudo-VAD thresholds when a "v3" model is in use; you can try these parameters when using large-v3:
compression_ratio_threshold=2.2 log_prob_threshold=-0.7

Does large-v3 with your parameters yield better results than large-v2?

I didn't try it for long enough to be able to say. I just went back to large-v2 after reading the Deepgram article.

@Purfview
Contributor

Purfview commented Apr 7, 2024

does it yield better result than large-v2 using your parameters with large-v3?

You tell me, as I don't use large-v3. IMO large-v2 is better.
