How to speed up synthesis? #134

Closed
OnceJune opened this issue Dec 22, 2021 · 14 comments

Comments

@OnceJune

Hi, I tried using WORLD for synthesis on mobile phones. The audio quality is good, but the speed is not fast enough. Is there any way to speed up synthesis? I call synthesisrealtime with a very small FFT length, and I noticed that there are 7 forward/inverse FFTs when processing a single frame. Is it possible to reduce that number? Thanks in advance.

@mmorise
Owner

mmorise commented Dec 22, 2021

It isn't easy to speed up the synthesis with the currently implemented algorithm. If you want faster synthesis, you would need to implement a different algorithm, and I have proposed one for this purpose. Since this algorithm has not been released yet, you must implement it yourself if needed.
https://ieeexplore.ieee.org/document/9023206

Another approach is to reduce the sampling frequency. Sampling at 24 kHz (or 22.05 kHz) is a reasonable choice that does not degrade the sound quality, and it is straightforward.

@OnceJune
Author

@mmorise Thanks. Currently I'm synthesizing at 16 kHz with MGC order 59. I've tried an FFT length of 256, which gives good audio quality. When I decrease the FFT length to 128, the quality gets worse. If I use MGC order 23, do you think the quality will be good with an FFT length of 128?

@OnceJune
Author

https://ieeexplore.ieee.org/document/9023206

I read it but did not fully understand it, lol.

@mmorise
Owner

mmorise commented Dec 23, 2021

I think the appropriate FFT length depends on the F0 of the input signal, and the order of the MGC would not affect the best FFT length.
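
To make the dependence on F0 concrete, here is a minimal Python sketch. It assumes the FFT-size rule used for CheapTrick analysis in WORLD is `2^(1 + floor(log2(3 * fs / f0_floor + 1)))` with a default F0 floor of 71 Hz. Both are recalled from the WORLD source and should be checked against it; the function names below are hypothetical.

```python
import math


def cheaptrick_fft_size(fs, f0_floor=71.0):
    """Assumed WORLD rule: 2 ** (1 + floor(log2(3 * fs / f0_floor + 1)))."""
    return 2 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1.0)))


def lowest_f0_for_fft_size(fs, fft_size):
    """F0 floor must stay strictly above this bound for the rule to allow fft_size."""
    return 3.0 * fs / (fft_size - 1)


if __name__ == "__main__":
    # How high the F0 floor must be at 16 kHz for each FFT length.
    for fft_size in (128, 256, 512, 1024):
        bound = lowest_f0_for_fft_size(16000, fft_size)
        print(fft_size, round(bound, 1))
```

Under this assumed rule, an FFT length of 128 at 16 kHz only fits an F0 floor above roughly 378 Hz, and 256 needs a floor above roughly 188 Hz. That may be one reason the quality dropped at 128 regardless of the MGC order.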

@OnceJune
Author

https://ieeexplore.ieee.org/document/9023206

Do I understand it correctly? (Please delete this comment if I shouldn't write it here, since your paper is not released yet :))

  1. Prepare 7 band-pass filters;
  2. Prepare MVN;
  3. Prepare the pulse (is it minimum phase using sp?);
  4. Multiply 1 & 3;
  5. Conv 2 & 3;
  6. Multiply each subband from 4 by 1-ap, then sum together;
  7. Multiply each subband from 5 by interpolated ap, then sum together;
  8. Add 6 & 7.

Thank you again.

@mmorise
Owner

mmorise commented Dec 24, 2021

There are several tunings for 16-kHz speech synthesis. For example, the number of band-pass filters is three. Fig. 1 in the paper shows how to generate the excitation signal. After that, the algorithm processes the excitation signal with a simple overlap-add (OLA) algorithm. This idea is similar to mixed excitation.

Prepare the pulse (is it minimum phase using sp?);

No. This algorithm uses a zero-phase spectrum to compensate for the original signal completely.
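
As a rough illustration of the mixed-excitation idea mentioned above, the following Python sketch blends a pulse train and white noise with a per-sample weight. It is a generic single-band sketch, not the algorithm from the paper. The linear weighting `(1 - ap) * pulse + ap * noise` follows steps 6 to 8 of the question and is an assumption, and all function names are hypothetical. Band splitting (three bands at 16 kHz) and the OLA stage are left out.

```python
import random


def pulse_train(length, period):
    """Amplitude 1 every `period` samples, 0 elsewhere."""
    return [1.0 if i % period == 0 else 0.0 for i in range(length)]


def white_noise(length, seed=0):
    """Gaussian white noise with a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def mixed_excitation(pulse, noise, ap):
    """Blend per sample: (1 - ap) * pulse + ap * noise (single band, assumed weighting)."""
    if not (len(pulse) == len(noise) == len(ap)):
        raise ValueError("pulse, noise and ap must have the same length")
    return [(1.0 - a) * p + a * n for p, n, a in zip(pulse, noise, ap)]


if __name__ == "__main__":
    n = 320
    excitation = mixed_excitation(pulse_train(n, 160), white_noise(n), [0.2] * n)
    print(len(excitation))
```

With `ap` at 0 the output is the pure pulse train, and with `ap` at 1 it is pure noise; values in between trade one for the other.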

@OnceJune
Author

OnceJune commented Jan 7, 2022

@mmorise Many thanks for your answer. I found the minimum-phase code in WORLD. How can I obtain the zero-phase spectrum?

@mmorise
Owner

mmorise commented Jan 7, 2022

The zero-phase spectrum of a spectrum X[k] is defined as |X[k]|. In this synthesis, we use zero phase as the phase spectrum of the excitation signals. After generating the excitation signal, the minimum-phase spectrum generated from the spectral envelope is used.
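
The definition can be checked numerically with a small Python sketch. It uses a naive DFT from the standard library so it stays self-contained; the function names are hypothetical and nothing here is taken from the WORLD code. It keeps |X[k]|, drops the phase, and transforms back.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for a tiny demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def zero_phase_signal(x):
    """Keep |X[k]|, set every phase to zero, and return to the time domain."""
    magnitude = [abs(v) for v in dft(x)]
    return [v.real for v in idft(magnitude)]


if __name__ == "__main__":
    delayed_pulse = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    print([round(v, 3) for v in zero_phase_signal(delayed_pulse)])
```

The resulting signal has the same magnitude spectrum as the input, is symmetric around sample 0 (circularly), and has its largest value at sample 0.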

@mmorise mmorise closed this as completed Feb 6, 2022
@bfs18

bfs18 commented Nov 2, 2023

Hi @mmorise, how is the pulse generated? Is it generated from the pitch with logic similar to GetPulseLocationsForTimeBase in the WORLD code?

@mmorise
Owner

mmorise commented Nov 2, 2023

Yes, the pulse is generated from the temporal positions of the vocal cord vibrations calculated by GetPulseLocationsForTimeBase in the synthesis function. In detail, an amplitude of 1 is given at these positions.
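
A simplified Python sketch of this idea follows. It accumulates F0 / fs per sample and places a pulse of amplitude 1 whenever a new cycle starts. This is not a port of GetPulseLocationsForTimeBase: the real function also deals with F0 interpolation and unvoiced frames, so treat the names and logic below as illustrative assumptions.

```python
import math


def pulse_positions(f0_per_sample, fs):
    """Sample indices where the accumulated phase (in cycles) enters a new cycle."""
    positions = []
    cycles = 0.0
    for i, f0 in enumerate(f0_per_sample):
        previous = math.floor(cycles)
        cycles += f0 / fs  # phase advance of this sample, in cycles
        if math.floor(cycles) > previous:
            positions.append(i)
    return positions


def unit_pulse_excitation(f0_per_sample, fs):
    """Signal that is 1.0 at every pulse position and 0.0 elsewhere."""
    signal = [0.0] * len(f0_per_sample)
    for i in pulse_positions(f0_per_sample, fs):
        signal[i] = 1.0
    return signal


if __name__ == "__main__":
    f0 = [110.0] * 1500  # constant 110 Hz contour, one value per sample
    print(pulse_positions(f0, 16000))
```

For a constant 110 Hz contour at 16 kHz, the pulses land about 145 to 146 samples apart, which matches the period of 16000 / 110 samples. A contour of zeros never advances the phase, so this sketch emits no pulses for it.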

@bfs18

bfs18 commented Nov 3, 2023

Hi @mmorise, thanks for your kind reply. My savior is online now. xD
I'm implementing the algorithm, but due to my limited knowledge of audio signal processing, I have some questions about the details. Besides, this post is somewhat misleading.

I annotated the questions in the figure.
[Image: 20231103-120047]

  1. Is the filter applied via sliding-window multiplication and summation (temporal convolution)?
  2. Does the * symbol indicate temporal convolution? And is the temporal convolution implemented frame by frame via FFT? If so, this part runs the FFT N times, which is time-consuming.
  3. Is Ap the AperiodicRatio in the WORLD code? Does the symbol indicate scalar multiplication?
  4. Is c = sqrt(number of samples in the frame)?
  5. Is envelope shaping implemented by multiplying the temporal signal by the interpolated AperiodicRatio?
  6. Is step 2 of the algorithm the same as I depicted? The spectrum is first transformed into a minimum-phase spectrum, which is then multiplied by the FFT of the excitation signal of the corresponding frame, and finally the IFFT is performed.
  7. What is the number of taps of the filters used in 1?
  8. How is v/uv used in this algorithm?
  9. How do you "calculate the filter and the convolution in advance", as mentioned in Section III?

I'm sorry for so many questions and I look forward to your replies.
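
One possible reading of question 9, offered only as an assumption about what Section III means: because the pulse signal is amplitude 1 at the pulse positions, filtering it is equivalent to overlap-adding a precomputed filter response at each position, so no per-frame convolution or FFT is needed for that part. The Python sketch below checks this equivalence for a toy 3-tap filter; the names and tap values are hypothetical.

```python
def convolve(signal, taps):
    """Direct FIR convolution: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n in range(len(out)):
        for k, h in enumerate(taps):
            if 0 <= n - k < len(signal):
                out[n] += h * signal[n - k]
    return out


def overlap_add_responses(length, positions, response):
    """Add one copy of the precomputed response at every pulse position."""
    out = [0.0] * (length + len(response) - 1)
    for p in positions:
        for j, h in enumerate(response):
            out[p + j] += h
    return out


if __name__ == "__main__":
    taps = [0.25, 0.5, 0.25]  # toy filter, computed once in advance
    positions = [2, 5, 6]
    pulses = [1.0 if i in positions else 0.0 for i in range(10)]
    direct = convolve(pulses, taps)
    fast = overlap_add_responses(len(pulses), positions, taps)
    print(direct == fast)
```

The overlap-add version costs one pass over the response per pulse instead of one multiply-accumulate per tap for every output sample, which is where the saving would come from if this reading is right.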

@mmorise mmorise reopened this Nov 3, 2023
@mmorise
Owner

mmorise commented Nov 3, 2023

Sorry, I misunderstood.
Do you have a MATLAB license? If so, you can download a MATLAB implementation (please see TestWORLDRequiem.m for the usage).
https://www.isc.meiji.ac.jp/~mmorise/world/english/download.html

If you don't have it, I'll explain it again, but please give me some time because I have forgotten the details.

I didn't implement a C++ version, which would be helpful for practical use, because it may be close to a patent held by another company. This is a precaution to avoid patent trouble. I think it is unlikely to cause patent trouble, but if you implement this program in C++, please use it at your own risk.

@bfs18

bfs18 commented Nov 3, 2023

Thank you for your quick reply! Great, the MATLAB code is open source. I'll dive into the MATLAB code first.

@bfs18

bfs18 commented Nov 4, 2023

Hi @mmorise, the MATLAB code is concise and clear. Now I understand the idea and the implementation details of the paper. Thank you!

@mmorise mmorise closed this as completed Jan 25, 2024