Add proof of concept shallow AR model implementation #31
Conversation
for shallow AR
also fixed typo, testset
This reverts commit 1b6cb5b.
I've added a SAR recipe that does not use MLPG. New samples were uploaded:
svs-world-conv-sar-wo-mlpg works okay, but the pitch seems less stable than in the models with MLPG.
Based on my experiments, the shallow AR model works better than the previous model for the NIT-SONG070 dataset. However, I couldn't find good configurations for the other datasets yet. I will leave that for future work. Let me merge the PR, and let's continue testing on different datasets and configurations.
Fixes #15
Summary
This PR adds support for the shallow AR model proposed in https://www.researchgate.net/publication/324609423_Autoregressive_neural_F0_model_for_statistical_parametric_speech_synthesis.
There are several differences from the original paper (e.g., continuous vs. quantized F0, joint feature modeling vs. F0-only modeling). However, I believe the core idea is implemented.
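The core idea can be sketched as follows: the network's frame-wise prediction is refined by a low-order (shallow) autoregressive filter over previously generated outputs, so each frame depends explicitly on the recent past. This is a minimal NumPy sketch for one feature dimension; the function name, the fixed coefficients, and the generation loop are illustrative assumptions, not the actual nnsvs implementation, where the AR coefficients are learned jointly with the network.

```python
import numpy as np

def shallow_ar_generate(frame_pred, ar_coefs):
    """Autoregressively generate y[t] = frame_pred[t] + sum_k ar_coefs[k] * y[t-1-k].

    frame_pred: per-frame predictions from the (non-AR) network, shape (T,)
    ar_coefs:   shallow AR filter coefficients, length K (hypothetical, fixed here;
                in the real model they are learned parameters)
    """
    K = len(ar_coefs)
    T = len(frame_pred)
    y = np.zeros(T)
    for t in range(T):
        # Start from the frame-wise prediction ...
        y[t] = frame_pred[t]
        # ... and add the contribution of the K previously generated frames.
        for k in range(K):
            if t - 1 - k >= 0:
                y[t] += ar_coefs[k] * y[t - 1 - k]
    return y

# Example: a constant frame-wise prediction smoothed by a 1st-order AR term.
pred = np.ones(5)
y = shallow_ar_generate(pred, ar_coefs=[0.5])
# y approaches 2.0 geometrically: [1.0, 1.5, 1.75, 1.875, 1.9375]
```

Because the filter order K is small (hence "shallow"), the added cost at generation time is negligible, while the output gains the smooth, time-correlated structure that MLPG would otherwise impose as a post-processing step.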
How it works
The following figure compares spectrograms from 1) the previous model (top) and 2) the shallow AR model (bottom). As you can see, the shallow AR model better captures the time-varying nature of F0: vibrato is modeled well without any explicit vibrato parameters.
Samples
Resources