
About Cyclic Normalizing Flow #1

Open
nuts-kun opened this issue Jun 4, 2023 · 3 comments

@nuts-kun

nuts-kun commented Jun 4, 2023

Hi, thank you for your great work! :)
I could not find your email address in the paper or your Google Scholar bio, so let me ask my question in this issue instead.

I have a question about Cyclic Normalizing Flow.
In Equation 3 of the paper, the cycle consistency loss is defined as the KL divergence between $p(z''|x)$ and $p(z'|x)$.
Here, since $p(z''|x) = f(f^{-1}(p(z'|x)))$ and the flow used in VITS is volume-preserving, I think the following holds unless Dropout is used in the normalizing flow part (setting aside the effect of padding at the edges):
$L_{cc} = KL[\,p(z''|x) \,\|\, p(z'|x)\,] = KL[\,p(z'|x) \,\|\, p(z'|x)\,] = 0$
In the original VITS implementation the Dropout rate is 0, and the paper does not mention using Dropout in the normalizing flow part, so do the experiments in the paper use Dropout?
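
For concreteness, here is a minimal PyTorch sketch of what I mean (the mean-only coupling layer is a toy stand-in for VITS's residual coupling block, not the actual code):

```python
import torch
import torch.nn as nn

class MeanOnlyCoupling(nn.Module):
    """Toy volume-preserving (mean-only) coupling layer; a stand-in for
    VITS's residual coupling block, not the actual implementation."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Linear(channels // 2, channels // 2)

    def forward(self, x):  # f
        xa, xb = x.chunk(2, dim=-1)
        return torch.cat([xa, xb + self.net(xa)], dim=-1)

    def inverse(self, y):  # f^{-1}
        ya, yb = y.chunk(2, dim=-1)
        return torch.cat([ya, yb - self.net(ya)], dim=-1)

flow = MeanOnlyCoupling(4)
z_prime = torch.randn(8, 4)
z_double_prime = flow(flow.inverse(z_prime))  # z'' = f(f^{-1}(z'))
print(torch.allclose(z_double_prime, z_prime, atol=1e-6))  # True: z'' == z'
```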

I know you are currently busy with ICASSP, but I would be grateful if you could reply when you are free :)

@intory89
Owner

intory89 commented Jul 1, 2023

As you suggested, the representations in the forward and backward directions should be equal. In practice, however, a mismatch occurs because the input vector in each direction is different: the linguistic representation is produced by the prior encoder, while the posterior representation is created by the posterior encoder. Therefore, we wanted to match the forward and backward directions using only the linguistic representation.
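
Schematically, the two directions are fed different vectors, so the composed map is not an exact identity. A toy illustration (illustrative names and a trivial additive flow, not our released implementation):

```python
import torch

torch.manual_seed(0)

# Stand-ins for the two encoders' outputs (illustrative, not the actual model):
z = torch.randn(8, 4)    # posterior representation (posterior encoder, from speech)
z_p = torch.randn(8, 4)  # linguistic representation (prior encoder, from text)

shift = torch.randn(4, requires_grad=True)  # trivial volume-preserving flow f(x) = x + shift

z_prime = z - shift           # backward: z' = f^{-1}(z), fed the posterior
z_double_prime = z_p + shift  # forward:  z'' = f(z_p), fed the linguistic input

# Because z_p != z', this is not f(f^{-1}(z')), so the consistency term
# between z'' and z' is non-zero and carries a gradient:
loss_cc = (z_double_prime - z_prime).pow(2).mean()  # MSE stand-in for the KL term
loss_cc.backward()
print(loss_cc.item(), shift.grad.abs().max().item())  # both non-zero
```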

@nuts-kun
Author

nuts-kun commented Jul 6, 2023

Thanks for your reply :)

I agree with your point.
As other works such as NaturalSpeech also show, enhancing the prior and reducing the posterior are really important for improving TTS quality.
But my question is only about how the model is trained with the cycle consistency loss.
In my understanding, the cycle consistency loss should be 0, so its gradients should also be 0.
If the gradients are 0, the model is not updated by this loss term.
Therefore, I wonder how this loss helps to improve the model.
Could you tell me about this point?
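
To illustrate why I think the gradients vanish, here is the same point with the simplest possible volume-preserving flow, $f(x) = x + b$ (a toy illustration with an MSE stand-in for the KL term, not the actual training code):

```python
import torch

torch.manual_seed(0)

b = torch.randn(4, requires_grad=True)   # the flow's only parameter
z_prime = torch.randn(8, 4)

z_double_prime = (z_prime - b) + b       # z'' = f(f^{-1}(z')): -b and +b cancel
loss_cc = (z_double_prime - z_prime).pow(2).mean()
loss_cc.backward()

print(loss_cc.item())                    # ~0 (floating-point error only)
print(b.grad.abs().max().item())         # 0.0: the parameter receives no update
```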

Also, if you could share your code, it would be a great help for my understanding.
Thank you :)

@nuts-kun
Author

Hi, any thoughts on the points above?
