
training an autoencoder without z #66

Closed
james20141606 opened this issue Mar 27, 2020 · 7 comments

@james20141606

I noticed that in both the ddsp/ddsp/training/gin/models/ae.gin and ddsp/ddsp/training/gin/models/ae_abs.gin settings, the model uses z as the latent space. I tried replacing Autoencoder.decoder = @decoders.ZRnnFcDecoder() with Autoencoder.decoder = @decoders.RnnFcDecoder() so the model does not use z, to test its performance. Is this the right way? I found that without z, using ae_abs.gin (which jointly learns an encoder for f0), the loss becomes NaN after around 2000 steps. I suspect this issue comes from the missing z latent...
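For context, the change being described would look roughly like this in gin. This is a sketch, not copied from the repo: the encoder binding and the input_keys parameter are assumptions beyond the two lines quoted above.

```
# Sketch of a no-z configuration (names beyond the quoted lines are assumed).
Autoencoder.encoder = None   # don't compute an unused z latent
Autoencoder.decoder = @decoders.RnnFcDecoder()
RnnFcDecoder.input_keys = ('ld_scaled', 'f0_scaled')   # no 'z' key
```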

@jesseengel
Contributor

We have trained autoencoder models without latents, but you may need to adjust some things. You also want to make sure that the encoder is not producing latents if the decoder is not using them.

@james20141606
Author

Thanks for the reply! I only changed the decoder; I think the encoder should have no influence on the result (even if it still produces a latent that the decoder doesn't use, no gradient will flow through that part, so it would only affect training speed). Is that right?
I also have a somewhat unrelated question about loudness generation. I know DDSP extracts loudness through fixed rules; what if we want to train a neural network to generate loudness? I found that an ordinary network might fail...
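The gradient argument above can be checked with a toy finite-difference sketch (pure Python; the model and names here are hypothetical, not DDSP code): a parameter that only feeds an unused latent branch gets exactly zero gradient, while the decoder weight does not.

```python
# Toy model: an "encoder" parameter w_z produces a latent z, but the
# no-z "decoder" ignores it, so the loss is flat in w_z.
def loss(w_dec, w_z, x):
    z = w_z * x          # latent branch (computed but unused below)
    y = w_dec * x        # decoder uses only the conditioning input
    target = 2.0 * x
    return (y - target) ** 2

def grad(f, w, eps=1e-6):
    # Central finite difference as a stand-in for backprop.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

x = 3.0
g_dec = grad(lambda w: loss(w, 0.5, x), 1.0)  # nonzero: decoder weight matters
g_z = grad(lambda w: loss(1.0, w, x), 0.5)    # zero: unused latent branch
print(g_dec, g_z)  # g_z is exactly 0.0
```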

@jesseengel
Contributor

jesseengel commented Mar 29, 2020 via email

@james20141606
Author

I agree that generating loudness from audio with a neural network should not be that hard. But what about generating loudness from some other kind of signal instead of audio?

@jesseengel
Contributor

jesseengel commented Mar 29, 2020 via email

@james20141606
Author

Wow, sounds interesting! Looking forward to your next research!
Do you mean a model like WaveNet might be useful for modeling loudness/f0 from a non-audio signal? I'm afraid that would make the model heavier and harder to train.

@jesseengel
Contributor

You can also train small autoregressive models like a simple RNN, since the signal is much simpler than an audio waveform. I'm going to close this issue for now.
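As a concrete sketch of how simple this signal is (pure Python; the synthetic data and the fitting approach are my own illustration, not DDSP code): even a one-tap linear autoregressive predictor, fit by ordinary least squares, tracks a smooth loudness-like envelope closely.

```python
import math

# Synthetic loudness-like envelope: a slow sinusoid in dB (hypothetical data).
signal = [-30.0 + 10.0 * math.sin(2 * math.pi * t / 50.0) for t in range(200)]

# Fit a first-order autoregressive model x[t] ~ a * x[t-1] + b
# by ordinary least squares (closed form for one feature plus a bias).
xs = signal[:-1]
ys = signal[1:]
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

# Mean squared one-step-ahead prediction error.
err = sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
print(a, b, err)
```

The residual is small relative to the signal's variance (an AR(2) model would capture a pure sinusoid exactly); the point is only that a loudness curve is far simpler to model autoregressively than a raw audio waveform.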
