Add proof of concept shallow AR model implementation #31

r9y9 · 2020-11-02T12:14:50Z

Fixes #15

Summary

This PR adds support for shallow AR models, which were proposed in https://www.researchgate.net/publication/324609423_Autoregressive_neural_F0_model_for_statistical_parametric_speech_synthesis.

There are several differences from the original paper (e.g., continuous vs. quantized F0, joint feature modeling vs. F0 only). However, I think I have implemented the core idea.

How it works

The following figure compares the spectrograms of 1) the previous model (top) and 2) the shallow AR model (bottom). As you can see, the shallow AR model better captures the time-varying nature of F0: vibrato is well modeled without explicitly modeling vibrato parameters.
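A rough sketch of the shallow AR idea, assuming a linear AR term of order K with per-dimension coefficients: the network predicts a mean mu_t, and the output is modeled roughly as y_t = mu_t + sum_k a_k y_{t-k} + noise. During training, the targets can be FIR-filtered into residuals so that frames stay conditionally independent and training remains parallel; at synthesis, the inverse (all-pole IIR) recursion restores the AR structure. The helper names `sar_fir` and `sar_iir` are hypothetical, the coefficients are passed in as fixed arrays here (in a real model they would be trainable), and the PR's actual code (the commit list mentions FIR helpers in dsp.py and a `preprocess_target` interface) may differ in details.

```python
import numpy as np


def sar_fir(y, coefs):
    """Hypothetical training-time transform: strip the linear AR part from targets.

    y: (T, D) target features; coefs: (K, D) AR coefficients per dimension.
    Returns e with e[t] = y[t] - sum_{k=1..K} coefs[k-1] * y[t-k],
    treating frames before t = 0 as zero.
    """
    y = np.asarray(y, dtype=float)
    e = y.copy()
    for k in range(1, coefs.shape[0] + 1):
        # FIR tap k: subtract the k-frame-delayed target, scaled per dimension
        e[k:] -= coefs[k - 1] * y[:-k]
    return e


def sar_iir(mu, coefs):
    """Hypothetical generation-time inverse: run the AR recursion on predicted means.

    y[t] = mu[t] + sum_{k=1..K} coefs[k-1] * y[t-k], i.e. an all-pole (IIR) filter.
    """
    mu = np.asarray(mu, dtype=float)
    K = coefs.shape[0]
    y = np.zeros_like(mu)
    for t in range(mu.shape[0]):
        y[t] = mu[t]
        # Feed back previously generated frames (only those that exist)
        for k in range(1, min(K, t) + 1):
            y[t] += coefs[k - 1] * y[t - k]
    return y
```

Because `sar_iir` exactly inverts `sar_fir`, a model trained on FIR-filtered targets can have its predictions mapped back to the original feature space at synthesis time; only this last step is sequential.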

Samples

The previous model: https://soundcloud.com/r9y9/20200522-haru-ga-kita-3-nit-song070?in=r9y9/sets/dnn-based-singing-voice
Shallow AR model: https://soundcloud.com/r9y9/20200522-haru-ga-kita-4-nit-song070-sar-test?in=r9y9/sets/dnn-based-singing-voice

Resources


r9y9 · 2020-11-06T16:19:47Z

I've added a SAR recipe that does not use MLPG. New samples were uploaded:

svs-world-conv: https://soundcloud.com/r9y9/2020117-yuki-nit-song070-svs-world-conv
svs-world-conv-sar: https://soundcloud.com/r9y9/2020117-yuki-nit-song070-svs-world-conv-sar
svs-world-conv-sar-wo-mlpg: https://soundcloud.com/r9y9/2020117-yuki-nit-song070-svs-world-conv-sar-wo-mlpg

svs-world-conv-sar-wo-mlpg works okay, but its pitch seems less stable than that of the models with MLPG.
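For context on the MLPG comparison: MLPG (maximum likelihood parameter generation) turns predicted static and delta means and variances into a static trajectory by solving a linear system, and because the delta terms couple neighboring frames, the result is smoothed over time. Below is a minimal sketch for a single feature dimension, assuming diagonal variances and the common delta window [-0.5, 0, 0.5] (not necessarily the windows the recipes use). `mlpg_1d` and `delta_matrix` are hypothetical names, not nnsvs functions.

```python
import numpy as np


def delta_matrix(T, win=(-0.5, 0.0, 0.5)):
    """(T, T) matrix computing delta features with a centered window.

    Frames outside [0, T) are treated as zero, a simple boundary choice.
    """
    D = np.zeros((T, T))
    half = len(win) // 2
    for t in range(T):
        for j, w in enumerate(win):
            tau = t + j - half
            if 0 <= tau < T:
                D[t, tau] += w
    return D


def mlpg_1d(mu_static, mu_delta, var_static, var_delta, win=(-0.5, 0.0, 0.5)):
    """Hypothetical MLPG for one dimension with static + delta streams.

    Solves (W^T P W) c = W^T P mu, where W = [I; D] maps the static
    trajectory c to stacked [static; delta] features and P = diag(1 / var).
    """
    T = len(mu_static)
    W = np.vstack([np.eye(T), delta_matrix(T, win)])
    mu = np.concatenate([mu_static, mu_delta])
    prec = 1.0 / np.concatenate([var_static, var_delta])
    A = W.T @ (prec[:, None] * W)  # W^T P W: (T, T), positive definite
    b = W.T @ (prec * mu)          # W^T P mu: (T,)
    return np.linalg.solve(A, b)
```

The dense solve costs O(T^3) and is only meant for illustration; practical implementations exploit the banded structure of W^T P W.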

r9y9 · 2020-11-06T16:26:30Z

Based on my experiments, shallow AR models work better than the previous model on the NIT SONG 070 dataset. However, I haven't found good configurations for other datasets yet, so I will leave that for future work.

Let me merge the PR; let's continue testing on different datasets and configurations.

r9y9 added 12 commits November 2, 2020 20:50

Add FIR implementations to dsp.py

c867bb9

base: add preprocess_target interface

ba6999f

for shallow AR

Add torchaudio dependency

92a7a99

Add Conv1dResnetSAR; a shallow AR model with conv1dresnet structure

8687c69

Fix typo

1612efd

egs: add a shallow AR recipe for nit-song070

15a764f

Fix 00-svs-world recipe to use hed v3

686a86e

also fixed typo, testset

disable pretrained model by default

27e281f

refactor shallow ar inference

72d7fe5

Add LSTMRNNSAR

cfadb54

FIx kiritan recipe

1b6cb5b

Merge remote-tracking branch 'origin/master' into shallow-ar

465eae5

r9y9 mentioned this pull request Nov 2, 2020 in Implementation status and planned TODOs #1 (Closed, 40 tasks)

r9y9 added 9 commits November 3, 2020 12:16

Revert "FIx kiritan recipe"

27c79c2

This reverts commit 1b6cb5b.

Merge remote-tracking branch 'origin/master' into shallow-ar

6483c4d

Merge remote-tracking branch 'origin/master' into shallow-ar

36e43f7

Merge remote-tracking branch 'origin/master' into shallow-ar

861f8fd

rollback

3b9f6d5

Merge remote-tracking branch 'origin/master' into shallow-ar

06f57ce

Merge remote-tracking branch 'origin/master' into shallow-ar

dd62ecb

update recipe to the new style

dfd16bc

Add a SAR recipe that does not use MLPG

420c951

r9y9 merged commit c2121d6 into master Nov 6, 2020

r9y9 deleted the shallow-ar branch November 6, 2020 16:27
