Support diffusion-based acoustic models #175
Merged
Conversation
for now NPSSMDNMultistreamModel is supported
so one can easily configure the models; also clarified which parts need to be adjusted when using custom hed
- use variance predictor mdn for time-lag
- add comments
- myconfig -> world
with diffusion-based acoustic model config
r9y9 changed the title from "Diffusion-based acoustic models" to "Support diffusion-based acoustic models" on Nov 27, 2022
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
+ Coverage   64.08%   64.39%   +0.31%
==========================================
  Files          39       43       +4
  Lines        5346     6005     +659
==========================================
+ Hits         3426     3867     +441
- Misses       1920     2138     +218
```
Uploaded demo samples: https://r9y9.github.io/projects/nnsvs/#bonus-samples
This was referenced Nov 27, 2022
The tests are all green. Good to go.
Summary
Diffusion-related: `NPSSMDNMultistreamParametricModel` and `MDNMultistreamSeparateF0MelModel` can now be configured with diffusion-based models. Luckily, thanks to the modularized design of NNSVS, no changes were needed to the synthesis scripts, and only very small changes were needed to the training script.
Samples: https://r9y9.github.io/projects/nnsvs/#bonus-samples
Recipe configs
This PR also updates some training configs based on my recent experiments.
fixes #167
Limitations
Notes on design choice
While the original DiffSinger used a self-attention-based encoder for the diffusion model, I decided to use a simpler encoder based on Sinsy's acoustic model architecture (FFConvLSTM). I found it works well with a significantly smaller memory footprint.
How to use
Please check `recipes/namine_ritsu_utagoe_db/dev-48k-melf0` as an example. `config.yaml` contains settings for both feature types:

- Mel features:
- WORLD:

You can also specify the above on the command line.
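As a sketch of what the command-line route might look like (the option names below are assumptions, not the recipe's actual interface; check the recipe's `run.sh` and `config.yaml` for the options it really accepts):

```shell
cd recipes/namine_ritsu_utagoe_db/dev-48k-melf0

# Hypothetical overrides -- the "--acoustic-model" key is an assumption.
# Mel features:
./run.sh --acoustic-model acoustic_diffusion_mel
# WORLD features:
./run.sh --acoustic-model acoustic_diffusion_world
```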
Steps
Up to stage 3 is done as usual.
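Assuming the Kaldi-style `--stage`/`--stop-stage` convention that NNSVS recipe scripts follow, this corresponds to something like:

```shell
# Data preparation through feature extraction and normalization
# (stages 0-3), run from the recipe directory.
./run.sh --stage 0 --stop-stage 3
```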
SiFi-GAN training
NOTE: Training for 200k steps should be enough for testing purposes. Try 600k steps only if you want to maximize performance.
Mel features:
WORLD
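A hedged sketch of kicking off vocoder training; the stage number is an assumption, since the recipe's `run.sh` defines the actual stage layout:

```shell
# Train the SiFi-GAN vocoder (stage number is hypothetical).
./run.sh --stage 4 --stop-stage 4
```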
Training diffusion-based acoustic model
Mel features
WORLD:
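Similarly, acoustic model training might be invoked as follows (the stage number is again an assumption; consult the recipe):

```shell
# Train the diffusion-based acoustic model (stage number is hypothetical).
./run.sh --stage 5 --stop-stage 5
```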
Synthesize waveforms
Mel features
WORLD
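Finally, synthesis could look like the following (stage number is an assumption):

```shell
# Synthesize waveforms with the trained acoustic model and vocoder
# (stage number is hypothetical).
./run.sh --stage 6 --stop-stage 6
```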