How to speed up synthesis? #134
It isn't easy to speed up synthesis with the implemented algorithm. To speed it up, you would need a different algorithm. I have proposed one for this purpose, but it has not been released yet, so you would have to implement it yourself if you need it. Another approach is to reduce the sampling frequency. A sampling rate of 24 kHz (or 22.05 kHz) is reasonable because it does not degrade the sound quality, and lowering the rate is straightforward.
@mmorise Thanks. I'm currently synthesizing at 16 kHz with an mgc order of 59. With an FFT length of 256 the output quality is good, but when I decrease the FFT length to 128 the quality gets worse. If I use an mgc order of 23, do you think the quality will be good with an FFT length of 128?
I read it but don't really understand it, lol.
I think the appropriate FFT length depends on the F0 of the input signal; the mgc order would not affect the best FFT length.
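For reference, WORLD's spectral-envelope estimator CheapTrick chooses its FFT size from the sampling rate and the F0 floor, not from any mel-cepstral order. The sketch below reproduces that rule as we understand it from GetFFTSizeForCheapTrick in cheaptrick.cpp (written from memory, so check the source before relying on it): the analysis window must span about three periods of the lowest F0. The helper `lowest_f0_floor` is our own hypothetical inverse and is not part of WORLD.

```python
import math


def fft_size_for_f0_floor(fs: int, f0_floor: float) -> int:
    """FFT size rule modeled on WORLD's GetFFTSizeForCheapTrick (from memory).

    The window must cover roughly three pitch periods at the lowest F0, so the
    FFT size grows as f0_floor falls and as fs rises. The mgc order plays no part.
    """
    return int(2.0 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1))))


def lowest_f0_floor(fs: int, fft_size: int) -> float:
    """Hypothetical inverse: the F0 floor must exceed this value for the rule
    above to return an FFT size no larger than fft_size (a power of two)."""
    return 3.0 * fs / (fft_size - 1)


if __name__ == "__main__":
    for f0_floor in (71.0, 200.0, 400.0):
        print(f0_floor, fft_size_for_f0_floor(16000, f0_floor))
    for n in (128, 256):
        print(n, round(lowest_f0_floor(16000, n), 1))
```

Under this rule, at 16 kHz an FFT length of 256 is only long enough when the F0 stays above roughly 188 Hz, and 128 needs an F0 above roughly 378 Hz, which would explain a quality drop at 128 regardless of the mgc order.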
Did I understand it correctly? (Please delete this comment if I shouldn't post it here, since your paper has not been released yet :))
Thank you again.
There are several tunings for 16-kHz speech synthesis. For example, the number of band-pass filters is three. Fig. 1 in the paper shows how the excitation signal is generated. After that, the algorithm processes the excitation signal with a simple overlap-add (OLA) algorithm. This idea is similar to mixed excitation.
No. This algorithm uses a zero-phase spectrum to compensate for the original signal completely.
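The paper's band-pass filters are not described in this thread, so the following is only a generic mixed-excitation sketch under stated assumptions: a pulse train and white noise are mixed separately in three frequency bands, each band weighted by a hypothetical per-band aperiodicity value (0 = periodic, 1 = noise). Ideal FFT-domain band masks stand in for real band-pass filters, and the band edges are illustrative assumptions, not the paper's values.

```python
import numpy as np


def three_band_excitation(pulse, noise, fs, band_edges_hz, band_aperiodicity):
    """Mix a pulse train and noise band by band (hypothetical sketch).

    band_edges_hz: ascending edges covering 0..fs/2, e.g. [0, 2000, 4000, 8000].
    band_aperiodicity: one value per band in [0, 1]; 0 keeps only the pulse,
    1 keeps only the noise. Ideal FFT masks replace real band-pass filters.
    """
    pulse = np.asarray(pulse, dtype=float)
    noise = np.asarray(noise, dtype=float)
    n = len(pulse)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    pulse_spec = np.fft.rfft(pulse)
    noise_spec = np.fft.rfft(noise)
    mixed = np.zeros_like(pulse_spec)
    last = len(band_aperiodicity) - 1
    for b, ap in enumerate(band_aperiodicity):
        lo, hi = band_edges_hz[b], band_edges_hz[b + 1]
        # The top band also includes the Nyquist bin.
        upper = freqs <= hi if b == last else freqs < hi
        mask = (freqs >= lo) & upper
        # Power-complementary weights: (1 - ap) + ap = 1.
        mixed[mask] = (np.sqrt(1.0 - ap) * pulse_spec[mask]
                       + np.sqrt(ap) * noise_spec[mask])
    return np.fft.irfft(mixed, n)


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    pulse = np.zeros(1024)
    pulse[::160] = 1.0  # 100 Hz pulse train
    noise = rng.standard_normal(1024)
    excitation = three_band_excitation(
        pulse, noise, fs, [0, 2000, 4000, 8000], [0.05, 0.3, 0.8])
    print(excitation.shape)
```

The resulting frames would then be joined by overlap-add; how the paper sets the per-band weights is not stated here.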
@mmorise Many thanks for your answer. I found the minimum-phase code in WORLD; how can I get the zero-phase spectrum?
The zero-phase spectrum of a spectrum X[k] is defined as |X[k]|. In this synthesis method, zero phase is used as the phase spectrum of the excitation signals. After the excitation signal is generated, the minimum-phase spectrum computed from the spectral envelope is applied.
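To make the two phase conventions concrete, here is a NumPy sketch. `zero_phase` keeps only |X[k]|. `minimum_phase_from_envelope` derives a minimum-phase spectrum from a one-sided power envelope with the standard real-cepstrum (homomorphic) method, which we believe is also the idea behind WORLD's minimum-phase code. `shape_excitation` is a hypothetical reading of the order described above (zero-phase excitation, then the envelope's minimum-phase spectrum), not the paper's implementation.

```python
import numpy as np


def zero_phase(spectrum):
    """Zero-phase version of X[k]: keep |X[k]| and set every phase to 0."""
    return np.abs(spectrum)


def minimum_phase_from_envelope(power_envelope):
    """Minimum-phase spectrum H[k] with |H[k]|^2 equal to power_envelope.

    power_envelope: one-sided, strictly positive, length fft_size // 2 + 1.
    Real-cepstrum method: log-amplitude -> cepstrum -> fold -> FFT -> exp.
    """
    power_envelope = np.asarray(power_envelope, dtype=float)
    fft_size = 2 * (len(power_envelope) - 1)
    half = fft_size // 2
    log_amplitude = 0.5 * np.log(power_envelope)
    cepstrum = np.fft.irfft(log_amplitude, fft_size)
    # Fold negative quefrencies onto positive ones to get a causal cepstrum.
    folded = np.zeros(fft_size)
    folded[0] = cepstrum[0]
    folded[1:half] = 2.0 * cepstrum[1:half]
    folded[half] = cepstrum[half]
    return np.exp(np.fft.rfft(folded))


def shape_excitation(excitation_frame, power_envelope):
    """Hypothetical: give the excitation frame zero phase, then apply the
    envelope's minimum-phase spectrum and return one time-domain frame."""
    fft_size = 2 * (len(power_envelope) - 1)
    excitation_spec = zero_phase(np.fft.rfft(excitation_frame, fft_size))
    shaped = excitation_spec * minimum_phase_from_envelope(power_envelope)
    return np.fft.irfft(shaped, fft_size)
```

The envelope must be strictly positive because of the logarithm; WORLD-style envelopes usually satisfy this, but a small floor may be needed for other sources.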
Hi @mmorise, how is the pulse generated? Is it generated from the pitch using logic similar to GetPulseLocationsForTimeBase in the WORLD code?
Yes. The pulses are placed at the temporal positions of the vocal-cord vibrations calculated by GetPulseLocationsForTimeBase in the synthesis function. Specifically, an amplitude of 1 is assigned at these positions.
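A simplified sketch of the idea behind pulse placement, under our own assumptions: integrate the per-sample F0 into a phase and put a pulse of amplitude 1 wherever the wrapped phase starts a new period. WORLD's GetPulseLocationsForTimeBase works from interpolated F0 in a similar spirit but, as far as we know, also returns fractional time shifts and handles unvoiced regions differently; `pulse_train_from_f0` is a hypothetical name and is not that code.

```python
import numpy as np


def pulse_train_from_f0(f0, fs):
    """Hypothetical sketch: amplitude-1 pulses at each new vibration period.

    f0: per-sample F0 in Hz (0 = unvoiced, which advances no phase here).
    """
    f0 = np.asarray(f0, dtype=float)
    phase = np.cumsum(2.0 * np.pi * f0 / fs)
    wrapped = np.mod(phase, 2.0 * np.pi)
    # A large drop in the wrapped phase marks the start of a new period.
    wraps = np.where(np.diff(wrapped) < -np.pi)[0] + 1
    pulses = np.zeros(len(f0))
    pulses[wraps[f0[wraps] > 0]] = 1.0
    return pulses


if __name__ == "__main__":
    fs = 16000
    pulses = pulse_train_from_f0(np.full(1650, 100.0), fs)
    print(np.flatnonzero(pulses))  # roughly every 160 samples at 100 Hz
```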
Hi @mmorise, thanks for your kind reply. My savior is online now. xD I annotated my questions in the figure.
Sorry for asking so many questions; I look forward to your replies.
Sorry, I misunderstood. If you don't have it, I'll explain it again, but please give me some time because I have forgotten the details. I didn't implement a C++ version, which would be helpful for practical use, because it may come close to a patent held by another company; this was a precaution to avoid patent trouble. I think patent trouble is unlikely, but if you implement this program in C++, please do so at your own risk.
Thank you for your quick reply! Great, the MATLAB code is open source. I'll dive into the MATLAB code first.
Hi @mmorise, the MATLAB code is concise and clear. Now I understand the idea and the implementation details of the paper. Thank you!!
Hi, I tried using WORLD for synthesis on mobile phones. The audio quality is good, but the speed is not fast enough. Is there any way to speed up synthesis? I call synthesisrealtime with a very small FFT length, and I noticed that there are 7 forward/inverse FFTs when processing just one frame. Is it possible to reduce that number? Thanks in advance.