SIGFPE on certain audio files #39

Closed
tazz4843 opened this issue Oct 11, 2022 · 3 comments · Fixed by #41
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@tazz4843
Contributor

Hey there! I'm testing out whisper.cpp to see if it would be suitable for production use. However, I'm running into a SIGFPE on certain audio files, namely those that do not produce any output from the model. Because of the way my system is set up, I'm unable to provide any test files that reproduce this bug.

However, I was able to build the library with debug symbols and trigger the exception. It seems to be a divide-by-zero error on line 2349 of whisper.cpp:

int progress_cur = (100*seek)/whisper_n_len(ctx);

The GDB output is as follows:

Thread 21 "scripty_stt_ser" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff7085700 (LWP 3869)]
0x0000555555599123 in whisper_full (ctx=0x5555556f6a80, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2349
2349            int progress_cur = (100*seek)/whisper_n_len(ctx);

Unfortunately, despite compiling with debug symbols (-g flag), bt gave no extra info beyond that:

(gdb) bt
#0  0x0000555555599123 in whisper_full (ctx=0x5555556f6a80, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2349
#1  0x0000555555593cf6 in whisper_rs::whisper_ctx::WhisperContext::full (self=<optimized out>, params=..., data=...) at src/whisper_ctx.rs:390

Let me know if there's anything else I can do to help!

@ggerganov
Owner

It is very likely that the length of the audio is 0.
The function whisper_n_len returns the length of the spectrogram, and it will be 0 if the audio is empty.

There should be a check for division by zero.

When using -g, you also want to avoid -O3 so that values do not show up as <optimized out>.

@ggerganov added the "enhancement" (New feature or request) and "good first issue" (Good for newcomers) labels on Oct 11, 2022
@tazz4843
Contributor Author

I'm willing to try implementing this. How would you want the division-by-zero check to be handled? I.e., should it throw an error, or should it just silently skip?

@ggerganov
Owner

Right after computing the mel spectrogram, check if the length is less than 100 (i.e. 1 second) and, if so, return 0:

whisper.cpp/whisper.cpp

Lines 2315 to 2320 in 8d94358

// compute log mel spectrogram
if (whisper_pcm_to_mel(ctx, samples, n_samples, params.n_threads) != 0) {
    fprintf(stderr, "%s: failed to compute log mel spectrogram\n", __func__);
    return -1;
}

Add a short comment explaining that we do not process audio shorter than 1 second.
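For illustration, the suggested guard could look roughly like the standalone sketch below. It does not use the real whisper.cpp API: kMinFrames, too_short_to_process, and progress_percent are hypothetical names, and n_len stands in for the value returned by whisper_n_len(ctx). The only facts taken from this thread are the threshold of 100 frames (1 second) and the progress expression from line 2349.

```cpp
#include <cassert>

// Hypothetical threshold: 100 spectrogram frames correspond to 1 second,
// per the suggestion above.
static const int kMinFrames = 100;

// True when the spectrogram is too short to process. In whisper_full this
// would be checked right after whisper_pcm_to_mel succeeds, followed by an
// early "return 0" so the progress computation is never reached.
bool too_short_to_process(int n_len) {
    // we do not process audio shorter than 1 second
    return n_len < kMinFrames;
}

// Same expression as line 2349, with n_len standing in for whisper_n_len(ctx).
// Precondition: n_len > 0, which the guard above guarantees.
int progress_percent(int seek, int n_len) {
    assert(n_len > 0);
    return (100 * seek) / n_len;
}
```

With the early return in place, an empty input (n_len == 0) never reaches the division, so the SIGFPE cannot occur; anything of 100 frames or more is processed as before.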

tazz4843 added a commit to tazz4843/whisper.cpp that referenced this issue Oct 11, 2022
anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue Apr 28, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this issue Oct 24, 2023
kultivator-consulting pushed a commit to KultivatorConsulting/whisper.cpp that referenced this issue Feb 12, 2024
…tate-struct-with-lifetime

refactor: delete map for State and expose struct with lifetime