Reconstruction Loss #32

Open
AndreyBocharnikov opened this issue Feb 6, 2023 · 3 comments
Labels
question Further information is requested

Comments

@AndreyBocharnikov

❓ Questions

Hello, my question is about the reconstruction loss in the frequency domain. Section 3.4 states that you use a "mel-spectrogram using a normalized STFT". What type of normalization is meant here? Is it sufficient to set the normalized flag of torchaudio.transforms.MelSpectrogram, which normalizes "by magnitude after stft"?
Also, in practice the STFT loss is sometimes computed on the log mel-spectrogram for better convergence, so I want to clarify: in your implementation, is S_i from formula (1) a mel-spectrogram or a log mel-spectrogram?

AndreyBocharnikov added the question label Feb 6, 2023
@AndreyBocharnikov
Author

Clarification for the first question: because the spectrogram here is normalized via that argument, I think the answer to my question is yes.
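To make the normalization concrete, here is a minimal sketch in plain Python. It assumes (my reading of torchaudio, not something confirmed by the maintainers) that `normalized=True` divides the STFT by the L2 norm of the analysis window, `sqrt(sum(window ** 2))`, and that the window is the periodic Hann window that `torch.hann_window` returns by default. The helper names `hann_window` and `window_norm` are made up for this illustration.

```python
import math


def hann_window(n):
    """Periodic Hann window of length n (assumed to match torch.hann_window's default)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def window_norm(n):
    """L2 norm of the window, i.e. the assumed divisor when normalized=True."""
    return math.sqrt(sum(w * w for w in hann_window(n)))


# The divisor grows with the window length, so dividing by it keeps the
# spectrogram magnitudes on a comparable scale across the multi-scale windows.
for n_fft in (64, 128, 256, 512, 1024, 2048):
    print(n_fft, round(window_norm(n_fft), 3))
```

If this reading is right, the flag matters most for a multi-scale loss, because without it the longer windows would contribute much larger magnitudes than the shorter ones.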

@AndreyBocharnikov
Author

Clarification for the second question: formula (5) in the SoundStream paper takes the log of the mel-spectrogram when computing the L2 part of the multi-scale spectral reconstruction loss, so the question remains: did you remove it on purpose?
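For reference, this is the SoundStream loss as I understand it, written out so the difference is explicit. It is transcribed from my reading of that paper and should be checked against the original; the notation ($S_t^s$ for frame $t$ of a 64-bin mel-spectrogram with window length $s$, $\hat{x}$ for the reconstruction, $\alpha_s$ for the per-scale weight) is my own rendering.

```latex
\mathcal{L}_{\mathrm{rec}}
  = \sum_{s \in \{2^{6}, \dots, 2^{11}\}} \sum_{t}
      \bigl\lVert S_t^{s}(x) - S_t^{s}(\hat{x}) \bigr\rVert_{1}
    + \alpha_s \sum_{t}
      \bigl\lVert \log S_t^{s}(x) - \log S_t^{s}(\hat{x}) \bigr\rVert_{2},
  \qquad \alpha_s = \sqrt{s/2}
```

The variant I am asking about keeps the L1 term as is and replaces the second term with $\lVert S_t^{s}(x) - S_t^{s}(\hat{x}) \rVert_{2}$, that is, the same expression with the two logarithms dropped.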

@AndreyBocharnikov
Author

And one more question about the multi-scale spectral reconstruction loss: when constructing MelSpectrogram with n_mels = 64 and a window size below 512, I get the following warning (four times):

/opt/conda/lib/python3.10/site-packages/torchaudio/functional/functional.py:539: UserWarning: At least one mel filterbank has all zero values. The value for n_mels (64) may be set too high. Or, the value for n_freqs (65) may be set too low.

Is this expected behavior, and should I leave this loss as it is?
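To show where the warning comes from, here is a rough pure-Python sketch that rebuilds an HTK-style triangular mel filterbank and counts the filters that never rise above zero on the FFT bin grid. It is an approximation of what I believe torchaudio does, not its actual code. The 24 kHz sample rate, `f_min = 0`, and the function name `count_empty_mel_filters` are assumptions for illustration; `n_freqs = n_fft // 2 + 1` matches the 65 in the warning for `n_fft = 128`.

```python
import math


def hz_to_mel(f):
    """HTK mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of the HTK mel scale."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def count_empty_mel_filters(n_fft, n_mels=64, sample_rate=24000):
    """Count triangular mel filters that are zero at every FFT bin frequency."""
    n_freqs = n_fft // 2 + 1
    f_max = sample_rate / 2
    # Linearly spaced FFT bin centre frequencies from 0 to Nyquist.
    bins = [f_max * k / (n_freqs - 1) for k in range(n_freqs)]
    # n_mels + 2 filter edges, equally spaced on the mel axis.
    m_max = hz_to_mel(f_max)
    edges = [mel_to_hz(m_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    empty = 0
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        # Triangle: rising slope from left to center, falling from center to right.
        peak = max(
            max(0.0, min((f - left) / (center - left), (right - f) / (right - center)))
            for f in bins
        )
        if peak == 0.0:
            empty += 1
    return empty


for n_fft in (32, 64, 128, 256, 512, 1024, 2048):
    print(n_fft, count_empty_mel_filters(n_fft))
```

Under these assumptions the low-frequency triangles are narrower than the spacing between FFT bins for the short windows, so no bin falls strictly inside them and they come out all zero. That would mean those filters contribute nothing to the loss at the small scales, which is why I am asking whether this is intended.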
