
Other feature representations besides mel-spect #28

Closed
Irislucent opened this issue May 5, 2022 · 4 comments

@Irislucent

I'm doing music-related research, and the mel-spectrogram doesn't seem to be the best data representation for the task I'm working on, so I'm considering switching to CQT.
I trained DiffWave on music mel-spectrograms and it yielded very impressive results. I'm wondering whether it makes sense to use input representations other than mel-spectrograms, such as CQT (assuming the representation carries enough information)?

@sharvil
Contributor

sharvil commented May 5, 2022

I haven't tried CQT inputs with DiffWave, but I have tried learnt representations. Those experiments were successful so I'd be surprised if CQT didn't work out.

If possible, please consider submitting a PR to add a CQT preprocessing step. I'm sure others working with music would appreciate it. :)

@Irislucent
Author

Sure, I will! But I haven't gotten any meaningful results yet; training takes quite a lot of time, and that's wearing down my confidence.

@Irislucent
Author

I'm curious, when you tried those learnt representations, did you change any hyperparameters, or even the model, to make it work? Did you encounter any difference from training with mel-spectrograms?

@sharvil
Copy link
Contributor

sharvil commented May 5, 2022

I didn't change any hyperparameters. I was using a quantized learnt representation (from a VQ-VAE), which is quite different from mel spectrograms. Since adjacent quantized frames are typically discontinuous, I added a convnet to try to smooth out the conditioning signal before it's sent to the rest of the network. That was the only change I remember making.

The experiment was successful in the sense that DiffWave was able to act as a decoder for the quantized inputs. Unfortunately, my VQ-VAE model was poorly tuned so the audio quality was worse than with mel spectrograms.
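The smoothing step described above could be sketched as a small 1-D convnet applied over the frame axis of the conditioning sequence. This is a hypothetical PyTorch module under assumed dimensions (the kernel sizes, depth, and channel count are guesses, not the architecture actually used in that experiment):

```python
import torch
import torch.nn as nn

class ConditioningSmoother(nn.Module):
    """Hypothetical smoother for a discontinuous conditioning signal.

    Two 1-D convolutions over the time (frame) axis, preserving the
    channel count and sequence length so the output can drop into the
    same place the raw conditioning features went.
    """

    def __init__(self, channels=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # x: (batch, channels, frames) -> same shape
        return self.net(x)

# Example: smooth a batch of dequantized VQ-VAE codes before conditioning.
cond = torch.randn(1, 80, 100)
smoothed = ConditioningSmoother(channels=80)(cond)
```

Because the module is shape-preserving, it can sit between the dequantized codes and the existing conditioner upsampling without other changes to the network.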

@sharvil sharvil closed this as completed Jul 23, 2022