
Large-v3 model hallucinates, large-v2 doesn't #777

Open
Arche151 opened this issue Apr 2, 2024 · 8 comments

Comments

@Arche151

Arche151 commented Apr 2, 2024

I have two scripts in which the large-v3 model hallucinates, for instance by making up things that weren't said or by repeating a single word some 50 times. When I replace large-v3 with large-v2 in the scripts, the transcription works fine.

Script 1:

import os
import subprocess
from faster_whisper import WhisperModel

audio_file = "/tmp/audio_recording.wav"
recording_state_file = "/tmp/recording_state"

def start_recording():
    subprocess.Popen(["arecord", "-f", "cd", audio_file])
    open(recording_state_file, 'w').close()

def stop_recording():
    subprocess.call(["pkill", "arecord"])
    if os.path.exists(recording_state_file):
        os.remove(recording_state_file)
    transcribe_audio()
    os.remove(audio_file)

def is_recording():
    return os.path.exists(recording_state_file)

def transcribe_audio():
    model = WhisperModel("large-v3", device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_file)
    transcription = " ".join([segment.text for segment in segments]).strip()
    subprocess.Popen(["xclip", "-selection", "c"], stdin=subprocess.PIPE).communicate(input=transcription.encode())
    # Notify the user that transcription is complete and copied to clipboard
    subprocess.call(["notify-send", "Transcription Complete", "The transcription has been copied to the clipboard."])

def main():
    if is_recording():
        stop_recording()
    else:
        start_recording()

if __name__ == "__main__":
    main() 

Script 2:

import sys

from faster_whisper import WhisperModel

model_size = "large-v3"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

def transcribe_audio(file_path):
    segments, info = model.transcribe(file_path)
    transcript = " ".join(segment.text.strip() for segment in segments)
    print(transcript)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        audio_file = sys.argv[1]
        transcribe_audio(audio_file)
    else:
        print("Please provide an audio file path as an argument.")

Does anyone know of a fix? Is something wrong with my scripts or with the faster-whisper large-v3 model?

@trungkienbkhn
Collaborator

@Arche151, could you try again with compute_type="default" (or remove this argument when initializing the whisper model)?
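A minimal sketch of the suggested change, assuming the same model setup as in the scripts above: compute_type="default" lets CTranslate2 pick a precision the hardware supports instead of forcing int8 quantization, which can affect output quality.

```python
from faster_whisper import WhisperModel

# "default" lets CTranslate2 choose the compute type for the device,
# instead of the int8 quantization used in the original scripts.
model = WhisperModel("large-v3", device="cpu", compute_type="default")
```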

@Arche151
Author

Arche151 commented Apr 2, 2024

@Arche151, could you try again with compute_type="default" (or remove this argument when initializing the whisper model)?

Thanks for the quick reply and suggestion!

I'll try that and report back.

@Purfview
Contributor

Purfview commented Apr 2, 2024

Large-v3 model hallucinates, large-v2 doesn't

It's known that large-v3 hallucinates much more than large-v2; see:
Whisper-v3 Hallucinations on Real World Data

@Arche151
Author

Arche151 commented Apr 3, 2024

Large-v3 model hallucinates, large-v2 doesn't

It's known that large-v3 hallucinates much more than large-v2; see: Whisper-v3 Hallucinations on Real World Data

Damn, that sucks. In that case, there's of course nothing faster-whisper can change about that. Then I guess I'll stay with large-v2.

Thanks for linking the article!

@Purfview
Contributor

Purfview commented Apr 4, 2024

Then I guess I'll stay with large-v3.

Did you mean "large-v2"?

On my Standalone Faster-Whisper I've added automatic offsets to Whisper's pseudo-VAD thresholds when a "v3" model is in use; you can try these parameters when using large-v3:

compression_ratio_threshold=2.2
log_prob_threshold=-0.7
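A sketch of how these values plug into faster-whisper's transcribe() call (both are existing parameters of that API; the defaults are 2.4 and -1.0, so the values above are tighter and make the decoder discard repetitive or low-confidence output sooner):

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cpu", compute_type="int8")

segments, info = model.transcribe(
    "audio.wav",  # hypothetical input file
    compression_ratio_threshold=2.2,  # default 2.4; lower = stricter repetition filter
    log_prob_threshold=-0.7,          # default -1.0; higher = stricter confidence filter
)
print(" ".join(segment.text.strip() for segment in segments))
```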

@terryops

terryops commented Apr 7, 2024

Then I guess I'll stay with large-v3.

Did you mean "large-v2"?

On my Standalone Faster-Whisper I've added automatic offsets to Whisper's pseudo-VAD thresholds when a "v3" model is in use; you can try these parameters when using large-v3:

compression_ratio_threshold=2.2 log_prob_threshold=-0.7

Does large-v3 with your parameters yield better results than large-v2?

@Arche151
Author

Arche151 commented Apr 7, 2024

Then I guess I'll stay with large-v3.

Did you mean "large-v2"?
On my Standalone Faster-Whisper I've added automatic offsets to Whisper's pseudo-VAD thresholds when a "v3" model is in use; you can try these parameters when using large-v3:
compression_ratio_threshold=2.2 log_prob_threshold=-0.7

Does large-v3 with your parameters yield better results than large-v2?

I didn't try it for long enough to be able to say. I just went back to large-v2 after reading the Deepgram article.

@Purfview
Contributor

Purfview commented Apr 7, 2024

does it yield better result than large-v2 using your parameters with large-v3?

You tell me, as I don't use large-v3. IMO large-v2 is better.
