
Suppress non-speech-related token outputs #473

Merged

ggerganov merged 2 commits into ggerganov:master from shibukazu:feature/non-speech-token-suppression

Feb 8, 2023

Contributor

shibukazu commented Feb 5, 2023

Problem

Outputs unrelated to speech, such as "(笑)" ("(laughs)") or "[Bell]", often appear in the transcription.

Change

Suppressed the output of non-speech-related tokens, based on OpenAI's implementation
(https://github.com/openai/whisper/blob/7858aa9c08d98f75575035ecd6481f462d66ca27/whisper/tokenizer.py#L224-L253).
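
As a rough illustration of the idea, the sketch below masks the logits of tokens that match a list of non-speech symbols, so the sampler can never pick them. It is not the code from this PR: the function name `suppress_non_speech`, the `token_to_id` map, and the symbol list passed in are assumptions made for the example, and the lookup of both `symbol` and `" " + symbol` only loosely follows the linked OpenAI tokenizer code.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's actual code): for every non-speech symbol,
// look up the token for the symbol itself and for the symbol preceded by a
// space, and set the matching logits to -inf so they cannot be sampled.
void suppress_non_speech(std::vector<float> & logits,
                         const std::map<std::string, int> & token_to_id,
                         const std::vector<std::string> & symbols) {
    for (const std::string & sym : symbols) {
        const std::string candidates[] = { sym, " " + sym };
        for (const std::string & candidate : candidates) {
            const auto it = token_to_id.find(candidate);
            if (it == token_to_id.end()) {
                continue; // this spelling is not a single token in the vocabulary
            }
            const int id = it->second;
            if (id >= 0 && static_cast<std::size_t>(id) < logits.size()) {
                logits[id] = -INFINITY;
            }
        }
    }
}
```

Calling this once per decoding step, before sampling, leaves ordinary word tokens untouched and removes only the tokens that are exactly one of the listed symbols.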


          add non-speech-token suppression

2d3332e

RndyP commented Feb 5, 2023

I was going to suggest this as well. Good idea. However, shouldn't this be controlled by a bool in whisper_full_params?

Contributor Author

shibukazu commented Feb 5, 2023

You're right. I will add a parameter to control the suppression.
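
A minimal sketch of what such a switch could look like is shown below. The struct `decode_params_sketch`, its field name, and the default value are assumptions for illustration only; they are not taken from `whisper_full_params` or from this PR's diff.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parameter struct; the field name and the
// default are assumptions of this sketch.
struct decode_params_sketch {
    bool suppress_non_speech_tokens = false;
};

// Mask the given token ids only when the caller has opted in.
void apply_suppression(std::vector<float> & logits,
                       const std::vector<int> & non_speech_ids,
                       const decode_params_sketch & params) {
    if (!params.suppress_non_speech_tokens) {
        return; // suppression disabled: leave the logits untouched
    }
    for (const int id : non_speech_ids) {
        if (id >= 0 && static_cast<std::size_t>(id) < logits.size()) {
            logits[id] = -INFINITY;
        }
    }
}
```

Keeping the check in one place means callers that want bracketed annotations in their output only need to leave the flag off.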


          add suppress non-speech_tokens param

a8f0bd4

Contributor Author

shibukazu commented Feb 5, 2023 (edited)

I noticed that non-speech token suppression slows down decoding when the input audio file contains no voice (silence only).
I think this is because the suppression makes it difficult for the model to express a silent state (e.g. [SILENT], [無音] ("[silence]")).
Does anyone have a good idea?
I then realized that users can mitigate this slowdown by setting the temperature_inc parameter to 0.
So this is not a problem.
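
One plausible reading of why this helps, sketched below under stated assumptions: if a decoder retries a segment at progressively higher temperatures whenever the result fails its quality checks, then an increment of 0 leaves only a single attempt. The function `temperature_schedule` and the upper bound of 1.0 are assumptions of this example, not code quoted from the project.

```cpp
#include <vector>

// Hypothetical helper: list the temperatures a fallback loop would try.
// With temperature_inc <= 0 only the starting temperature is used, so a
// failed decode is never retried.
std::vector<float> temperature_schedule(float temperature, float temperature_inc) {
    std::vector<float> temps;
    temps.push_back(temperature);
    if (temperature_inc > 0.0f) {
        // Assumed upper bound of 1.0, with a small tolerance for float rounding.
        for (float t = temperature + temperature_inc; t < 1.0f + 1e-6f; t += temperature_inc) {
            temps.push_back(t);
        }
    }
    return temps;
}
```

Under this assumption, silent audio that repeatedly fails the quality checks costs one decoding pass instead of several, at the price of losing the fallback on difficult speech segments.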

albino1 mentioned this pull request

Potential whisper.cpp GPU support via the Const-me Windows Implementation SubtitleEdit/subtitleedit#6651

Closed

ggerganov merged commit cfc06bf into ggerganov:master

ggerganov added a commit that referenced this pull request


          whisper : by default disable non-speech tokens suppression (#473)

a94897b

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..

rock3125 pushed a commit to rock3125/whisper.cpp that referenced this pull request


          whisper : suppress non-speech-related token outputs (ggerganov#473)

16c539e

* add non-speech-token suppression

* add suppress non-speech_tokens param

rock3125 pushed a commit to rock3125/whisper.cpp that referenced this pull request


          whisper : by default disable non-speech tokens suppression (ggerganov…

a6bac7c

…#473)

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..

anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request


          whisper : suppress non-speech-related token outputs (ggerganov#473)

53080b5

* add non-speech-token suppression

* add suppress non-speech_tokens param

anandijain pushed a commit to anandijain/whisper.cpp that referenced this pull request


          whisper : by default disable non-speech tokens suppression (ggerganov…

d172645

…#473)

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..

bobqianic mentioned this pull request

Why has whisper.cpp become less "fun" and useful for transcribing lyrics over music #1240

Closed

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request


          whisper : suppress non-speech-related token outputs (ggerganov#473)

4090c65

* add non-speech-token suppression

* add suppress non-speech_tokens param

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request


          whisper : by default disable non-speech tokens suppression (ggerganov…

6a5c8e8

…#473)

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request


          whisper : suppress non-speech-related token outputs (ggerganov#473)

a552fef

* add non-speech-token suppression

* add suppress non-speech_tokens param

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this pull request


          whisper : by default disable non-speech tokens suppression (ggerganov…

…#473)

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request


          whisper : suppress non-speech-related token outputs (ggerganov#473)

2759dda

* add non-speech-token suppression

* add suppress non-speech_tokens param

landtanin pushed a commit to landtanin/whisper.cpp that referenced this pull request


          whisper : by default disable non-speech tokens suppression (ggerganov…

09b2dc3

…#473)

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request


          whisper : suppress non-speech-related token outputs (ggerganov#473)

3b1fe77

* add non-speech-token suppression

* add suppress non-speech_tokens param

iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request


          whisper : by default disable non-speech tokens suppression (ggerganov…

dbacde0

…#473)

This seems to be causing hallucinations in the end of the audio, e.g.:

"Thank you for listening"
"Amen"
..
