
whisper: optimize fft() function #2242

Merged · 1 commit merged into ggerganov:master on Jun 18, 2024
Conversation

mkycoder (Contributor)
In the log_mel_spectrogram_worker_thread() function, we allocate enough memory for the fft_in and fft_out buffers up front, so that fft() can avoid allocating memory for even, odd, even_fft and odd_fft on every call.
In my testing, this optimization reduces the mel spectrogram time from 125 ms to 100 ms for 5 minutes of audio.

Before:

    std::vector<float> fft_in(frame_size, 0.0);
    std::vector<float> fft_out(2 * frame_size);

After:

    std::vector<float> fft_in(frame_size * 2, 0.0);
    std::vector<float> fft_out(frame_size * 2 * 2 * 2);
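For context, here is a minimal sketch of the idea behind the change: a recursive radix-2 fft() that carves all of its even/odd scratch space out of the caller's two preallocated buffers instead of constructing fresh std::vector objects at every level. This is an illustration, not the exact merged code, and it assumes N is a power of two (the real function also has to handle other frame sizes):

    #include <cmath>
    #include <vector>

    // in : N real samples, followed by at least N - 1 floats of scratch
    // out: 2*N floats of interleaved complex output, followed by scratch
    //      (a buffer of 8*N floats in total covers the whole recursion)
    static void fft(float * in, int N, float * out) {
        if (N == 1) {
            out[0] = in[0];
            out[1] = 0.0f;
            return;
        }

        const int half_N = N/2;

        // even-indexed samples go into the scratch right behind the input
        float * even = in + N;
        for (int i = 0; i < half_N; i++) {
            even[i] = in[2*i];
        }

        // the child transform writes right behind this level's output
        float * even_fft = out + 2*N;
        fft(even, half_N, even_fft);

        // the even branch is finished, so its input scratch can be reused
        float * odd = even;
        for (int i = 0; i < half_N; i++) {
            odd[i] = in[2*i + 1];
        }

        float * odd_fft = even_fft + 2*half_N;
        fft(odd, half_N, odd_fft);

        // butterfly: X[k] = E[k] + w^k O[k], X[k+N/2] = E[k] - w^k O[k]
        const float pi = 3.14159265358979323846f;
        for (int k = 0; k < half_N; k++) {
            const float theta = 2.0f*pi*k/N;
            const float re =  std::cos(theta);
            const float im = -std::sin(theta);

            const float re_odd = odd_fft[2*k + 0]*re - odd_fft[2*k + 1]*im;
            const float im_odd = odd_fft[2*k + 0]*im + odd_fft[2*k + 1]*re;

            out[2*k + 0] = even_fft[2*k + 0] + re_odd;
            out[2*k + 1] = even_fft[2*k + 1] + im_odd;

            out[2*(k + half_N) + 0] = even_fft[2*k + 0] - re_odd;
            out[2*(k + half_N) + 1] = even_fft[2*k + 1] - im_odd;
        }
    }

    // caller, as in the PR: one allocation per worker thread instead of
    // one per recursion level
    // std::vector<float> fft_in (frame_size*2, 0.0f);
    // std::vector<float> fft_out(frame_size*2*2*2);
    // fft(fft_in.data(), frame_size, fft_out.data());

The key trick is that odd reuses even's slot: by the time the odd samples are copied in, the even half's recursion has already finished with that scratch, which is what keeps the input requirement down to 2*N floats.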
ggerganov (Owner)

Is this correct - shouldn't it be just frame_size * 2 * 2?

mkycoder (Contributor, Author) · Jun 18, 2024

Suppose frame_size is 8.
For fft_in, the recursion needs input buffers of size 8, 4, 2, 1, so 8 + 4 + 2 + 1 = 15 floats in total; frame_size * 2 = 16 is enough.
fft_out is a little more complicated: a level of size m needs 2*m floats for out plus m each for even_fft and odd_fft, i.e. 8*2 + 8 + 8 = 32 at the top, then 8 + 4 + 4 = 16, 4 + 2 + 2 = 8, and 2 + 1 + 1 = 4 at the deeper levels, so 32 + 16 + 8 + 4 = 60 in total; frame_size * 2 * 2 * 2 = 64 is enough.
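Both totals are geometric series, which is why the fixed multipliers work for any power-of-two frame_size: the input side sums to 2*N - 1 < 2*N and the output side to 4*(2*N - 1) = 8*N - 4 < 8*N. A small throwaway program (my own check, not code from the PR) can total the per-level sums and compare them against the allocations:

    #include <cstdio>

    // floats needed in fft_in for a size-m transform: the m inputs plus the
    // scratch of one size-m/2 child (even and odd reuse the same slot)
    static int in_need(int m) {
        return m == 1 ? 1 : m + in_need(m/2);
    }

    // floats counted in fft_out, summed per level as in the comment above:
    // 2*m for out plus m each for even_fft and odd_fft
    static int out_need(int m) {
        return m == 1 ? 4 : 4*m + out_need(m/2);
    }

    int main() {
        for (int N = 8; N <= 1024; N *= 2) {
            std::printf("N=%4d  in %5d <= %5d  out %5d <= %5d\n",
                        N, in_need(N), 2*N, out_need(N), 8*N);
        }
        return 0;
    }

For N = 8 this reproduces 15 <= 16 and 60 <= 64, matching the numbers above.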

ggerganov merged commit bf4cb4a into ggerganov:master on Jun 18, 2024. 49 checks passed.
bygreencn added a commit to bygreencn/whisper.cpp that referenced this pull request Jun 27, 2024
* bobqianic/fix-decoding: (436 commits)
  Add files via upload
  Update whisper.cpp
  whisper : optimize fft() function (ggerganov#2242)
  talk-llama : sync llama.cpp
  whisper : use ggml_backend_sched (ggerganov#2239)
  fix : remove extra files
  scripts : sync ggml-blas
  build : update make / cmake
  sync : ggml
  move BLAS to a separate backend (cont) (llama/6210)
  Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
  scripts : stop sync whisper example from ggml
  cmake : fix sycl build (#0)
  ggml : remove OpenCL (#0)
  sycl : sync (#0)
  cuda : enable CUDA graphs (#0)
  talk-llama : sync llama.cpp
  cmake : fix CUDA build (#0)
  sync : ggml
  ggml : fix and optimize ppc64le (ggml/849)
  ...