Use torchaudio melscale 'slaney' instead of librosa in WaveRNN pipeline preprocessing #1444
Conversation
examples/pipeline_wavernn/main.py (outdated)

@@ -270,11 +270,11 @@ def main(args):
     transforms = torch.nn.Sequential(
         torchaudio.transforms.Spectrogram(**melkwargs),
         LinearToMel(
             torchaudio.transforms.MelScale(
If you have extra bandwidth, can you add a test that verifies that librosa.feature.melspectrogram(S=spectrogram, ...) and MelScale()(spectrogram) produce the same result?
We do have a test for MelSpectrogram here:
audio/test/torchaudio_unittest/transforms/librosa_compatibility_test_impl.py
Lines 62 to 87 in 35d68fd
@nested_params(
    [
        param(n_fft=400, hop_length=200, n_mels=64),
        param(n_fft=600, hop_length=100, n_mels=128),
        param(n_fft=200, hop_length=50, n_mels=32),
    ],
    [param(norm=norm) for norm in [None, 'slaney']],
    [param(mel_scale=mel_scale) for mel_scale in ['htk', 'slaney']],
)
def test_MelSpectrogram(self, n_fft, hop_length, n_mels, norm, mel_scale):
    sample_rate = 16000
    waveform = get_sinusoid(
        sample_rate=sample_rate, n_channels=1,
    ).to(self.device, self.dtype)
    expected = librosa.feature.melspectrogram(
        y=waveform[0].cpu().numpy(),
        sr=sample_rate, n_fft=n_fft,
        hop_length=hop_length, n_mels=n_mels, norm=norm,
        htk=mel_scale == "htk")
    result = T.MelSpectrogram(
        sample_rate=sample_rate, window_fn=torch.hann_window,
        hop_length=hop_length, n_mels=n_mels,
        n_fft=n_fft, norm=norm, mel_scale=mel_scale,
    ).to(self.device, self.dtype)(waveform)[0]
    self.assertEqual(result, torch.from_numpy(expected), atol=5e-4, rtol=1e-5)
It would be nice to add one for MelScale as well.
Talking about MelSpectrogram, how about using MelSpectrogram directly?
> Talking about MelSpectrogram, how about using MelSpectrogram directly?
Makes sense.
Fixed it and checked manually.
@vincentqb Review please.
LGTM, but we need to make sure the convergence of the model has not been affected. @discort -- have you rerun the training loop to see? I'm rerunning the model on my side to see whether the convergence profile changed.
Thanks for getting back to me @vincentqb
Alright, LGTM, thanks!
Thanks!
Use torchaudio melscale 'slaney' instead of librosa in WaveRNN pipeline preprocessing (pytorch#1444)

* Use torchaudio melscale instead of librosa
cc #593