Improved acoustic model support: introducing autoregressive structure #15
Hello, I'm trying to change the acoustic model from Conv1dResnet to RMDN as a preliminary step towards shallow AR, and I'm suffering from a shortage of GPU memory. The MDN returns pi, sigma, and mu, so the total number of returned parameters is B x G x (2 x D_out + 1) (B is the batch size, G is the number of Gaussian components, and D_out is the number of dimensions of the target variable). In my experience, the upper limit of the per-frame value G x (2 x D_out + 1) seems to be about 4000 when using a GPU with 8 GB of memory. Because the out_dim of the default acoustic model is 199, we can set the number of Gaussian components to at most 10 or so, but this may not be enough to get good results. Is there any need to train f0, mgc, and bap separately, or is there a good method to reduce memory consumption?
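For reference, the parameter count above works out as follows (a minimal sketch; `mdn_output_dim` is an illustrative helper, not part of the codebase):

```python
def mdn_output_dim(num_gaussians: int, out_dim: int) -> int:
    # Each Gaussian component contributes a mean and a (diagonal) sigma
    # per output dimension, plus one mixture weight (pi):
    # G * (2 * D_out + 1) values per frame.
    return num_gaussians * (2 * out_dim + 1)

# With the default acoustic feature dimension of 199:
print(mdn_output_dim(10, 199))  # 3990, near the ~4000 practical limit
print(mdn_output_dim(16, 199))  # 6384, already well past it
```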
As I mentioned in #20, the number of Gaussians wouldn't be a large value (>16) in general, so I suppose using a mixture density network doesn't increase GPU memory usage dramatically. If we want to save GPU memory, we can use a smaller batch size, and that has been okay in my experience. If the batch size matters, we can implement a gradient accumulation trick.
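The gradient accumulation trick mentioned above can be sketched with a generic PyTorch training loop (the `model`, `loader`, and `criterion` names are placeholders, not the project's actual training code):

```python
import torch

def train_epoch(model, loader, criterion, optimizer, accum_steps=4):
    """Accumulate gradients over `accum_steps` mini-batches so the
    effective batch size is accum_steps * batch_size while keeping the
    memory footprint of a single small mini-batch."""
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        loss = criterion(model(x), y)
        # Scale the loss so the accumulated gradient matches the
        # average over the large effective batch.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```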
If we want to try the shallow-AR model, we would need to model F0 separately, at least. I didn't do that for simplicity, and my time was limited. Anyway, it's worth trying. If we implement a separate stream-wise training strategy, we can reduce GPU memory usage accordingly.
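Stream-wise training could start by slicing the acoustic feature matrix into its streams and fitting a separate model per stream. A minimal sketch (the stream layout and sizes below are illustrative, not the exact nnsvs configuration):

```python
import numpy as np

# Hypothetical stream layout along the feature axis: [mgc | lf0 | vuv | bap].
STREAM_SIZES = {"mgc": 180, "lf0": 1, "vuv": 1, "bap": 5}

def split_streams(features: np.ndarray) -> dict:
    """Split a (T, D) acoustic feature matrix into per-stream matrices."""
    streams, start = {}, 0
    for name, size in STREAM_SIZES.items():
        streams[name] = features[:, start:start + size]
        start += size
    return streams

feats = np.zeros((100, sum(STREAM_SIZES.values())))
parts = split_streams(feats)
print({k: v.shape for k, v in parts.items()})
# → {'mgc': (100, 180), 'lf0': (100, 1), 'vuv': (100, 1), 'bap': (100, 5)}
```

Each per-stream model (e.g. an autoregressive F0 model) then only needs the memory for its own output dimensionality.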
Thank you for your comment. I'll try a smaller number of Gaussians and investigate the gradient accumulation trick.
I'm revisiting the autoregressive model. I managed to make my implementation work reasonably well, but not significantly better than the current models. #129

Pros

Cons
As in the shallow AR model proposed by Xin Wang.
This issue was part of #1, but I raised a new issue since this is one of the most important action items for improving singing voice synthesis quality. Specific discussion and progress can be tracked in this thread. Any comments and suggestions are welcome.
MDN + AR