Hello, my question is about the reconstruction loss in the frequency domain. Section 3.4 states that you use a "mel-spectrogram using a normalized STFT". What type of normalization is meant here? Is it sufficient to set the `normalized` flag of `torchaudio.transforms.MelSpectrogram`, which normalizes "by magnitude after stft"?
Also, in practice the STFT loss is sometimes computed on the log mel-spectrogram for better convergence, so I want to clarify: in your implementation, is S_i from formula 1 a mel-spectrogram or a log mel-spectrogram?
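For context on the first question: as far as I can tell, `normalized=True` in torchaudio divides the STFT by the L2 norm of the window, so the scaling depends on the window length. Whether that is the normalization the paper means is the open point. Below is a minimal sketch of that factor, assuming a periodic Hann window; `hann_window` and `window_l2_norm` are illustrative helper names, not torchaudio functions.

```python
import math

def hann_window(n):
    # Periodic Hann window (assumed here; other windows give other norms).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def window_l2_norm(n):
    # Square root of the sum of squared window samples: the factor the STFT
    # would be divided by under window-based normalization.
    return math.sqrt(sum(w * w for w in hann_window(n)))

# The factor grows with the window length, so each scale of a multi-scale
# loss would be rescaled differently.
for n in (64, 512, 2048):
    print(n, window_l2_norm(n))
```

For a periodic Hann window the sum of squares is 3n/8, so the factor grows with the square root of the window length.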
Clarification for the second question: in formula (5) of the SoundStream paper, the log of the mel-spectrogram is taken when computing the L2 part of the multi-scale spectral reconstruction loss. So the question remains: did you remove it on purpose?
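To make the two variants concrete, here is a minimal sketch of a single-scale loss with an L1 term plus a weighted L2 term, where a flag switches the L2 term between raw and log magnitudes. This is not the authors' implementation: `spectral_loss`, `use_log`, `alpha`, and `eps` are hypothetical names, and the per-frame L2 norm is my assumption about how the terms are reduced.

```python
import math

def spectral_loss(s_ref, s_est, alpha, use_log, eps=1e-5):
    """L1 term plus alpha-weighted L2 term for one scale.

    s_ref, s_est: mel-spectrograms as lists of frames (lists of floats).
    use_log: if True, the L2 term compares log magnitudes;
             if False, it compares raw magnitudes.
    """
    l1 = 0.0
    l2 = 0.0
    for frame_ref, frame_est in zip(s_ref, s_est):
        # L1 term is always computed on raw magnitudes.
        l1 += sum(abs(a - b) for a, b in zip(frame_ref, frame_est))
        if use_log:
            diffs = [math.log(a + eps) - math.log(b + eps)
                     for a, b in zip(frame_ref, frame_est)]
        else:
            diffs = [a - b for a, b in zip(frame_ref, frame_est)]
        # Per-frame L2 norm, summed over frames (assumed reduction).
        l2 += math.sqrt(sum(d * d for d in diffs))
    return l1 + alpha * l2

ref = [[1.0, 2.0], [0.5, 0.1]]
est = [[1.0, 1.0], [0.5, 0.2]]
print(spectral_loss(ref, est, alpha=1.0, use_log=False))
print(spectral_loss(ref, est, alpha=1.0, use_log=True))
```

In this toy input the error in the low-energy bin (0.1 vs 0.2) barely moves the raw L2 term but counts about as much as the high-energy error in the log version, which is why the choice matters for convergence.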
One more question about the multi-scale spectral reconstruction loss: when constructing `MelSpectrogram` with 64 `n_mels` and window_size < 512, I get the following warning (four of them): `/opt/conda/lib/python3.10/site-packages/torchaudio/functional/functional.py:539: UserWarning: At least one mel filterbank has all zero values. The value for n_mels (64) may be set too high. Or, the value for n_freqs (65) may be set too low.` Is this expected behavior, and should I leave the loss as it is?
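The warning can be reproduced without torchaudio. With a 128-point FFT there are only 128 // 2 + 1 = 65 frequency bins, and the lowest of the 64 triangular mel filters are narrower than the spacing between bins, so no bin falls inside them. The sketch below counts such empty filters. It assumes the HTK mel formula, filters spanning 0 Hz to Nyquist, and a 24 kHz sample rate (my assumption; the sample rate is not stated above). `count_empty_mel_filters` is a hypothetical helper, not a torchaudio function.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (assumed).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def count_empty_mel_filters(n_fft, n_mels, sample_rate):
    n_freqs = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # Centre frequency of every STFT bin, from 0 Hz to Nyquist.
    bin_freqs = [nyquist * k / (n_freqs - 1) for k in range(n_freqs)]
    # n_mels + 2 edge points, equally spaced on the mel scale.
    m_max = hz_to_mel(nyquist)
    edges = [mel_to_hz(m_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    empty = 0
    for j in range(n_mels):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        # Triangular weight of filter j at each bin frequency.
        weights = [max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                   for f in bin_freqs]
        if not any(w > 0.0 for w in weights):
            empty += 1
    return empty

for n_fft in (128, 2048):
    print(n_fft, count_empty_mel_filters(n_fft, n_mels=64, sample_rate=24000))
```

Under these assumptions the small FFT leaves some filters empty while the large one does not, so the warning looks like a consequence of pairing 64 mel bands with short windows rather than a bug in the loss code.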