Use torchaudio melscale 'slaney' instead of librosa in WaveRNN pipeline preprocessing #1444
Conversation
examples/pipeline_wavernn/main.py (outdated)

@@ -270,11 +270,11 @@ def main(args):
     transforms = torch.nn.Sequential(
         torchaudio.transforms.Spectrogram(**melkwargs),
         LinearToMel(
             torchaudio.transforms.MelScale(
If you have extra bandwidth, can you add a test that verifies that librosa.feature.melspectrogram(S=spectrogram, ...) and MelScale()(spectrogram) produce the same result?
We do have a test for MelSpectrogram here:
audio/test/torchaudio_unittest/transforms/librosa_compatibility_test_impl.py
Lines 62 to 87 in 35d68fd
@nested_params(
    [
        param(n_fft=400, hop_length=200, n_mels=64),
        param(n_fft=600, hop_length=100, n_mels=128),
        param(n_fft=200, hop_length=50, n_mels=32),
    ],
    [param(norm=norm) for norm in [None, 'slaney']],
    [param(mel_scale=mel_scale) for mel_scale in ['htk', 'slaney']],
)
def test_MelSpectrogram(self, n_fft, hop_length, n_mels, norm, mel_scale):
    sample_rate = 16000
    waveform = get_sinusoid(
        sample_rate=sample_rate, n_channels=1,
    ).to(self.device, self.dtype)
    expected = librosa.feature.melspectrogram(
        y=waveform[0].cpu().numpy(),
        sr=sample_rate, n_fft=n_fft,
        hop_length=hop_length, n_mels=n_mels, norm=norm,
        htk=mel_scale == "htk")
    result = T.MelSpectrogram(
        sample_rate=sample_rate, window_fn=torch.hann_window,
        hop_length=hop_length, n_mels=n_mels,
        n_fft=n_fft, norm=norm, mel_scale=mel_scale,
    ).to(self.device, self.dtype)(waveform)[0]
    self.assertEqual(result, torch.from_numpy(expected), atol=5e-4, rtol=1e-5)
It would be nice to add one for MelScale as well.
Talking about MelSpectrogram, how about using MelSpectrogram directly?
> Talking about MelSpectrogram, how about using MelSpectrogram directly?
Makes sense.
Fixed it and checked manually.
@vincentqb Review please.
LGTM, but we need to make sure the convergence of the model has not been affected. @discort -- have you rerun the training loop to see? I'm rerunning the model on my side to see whether the convergence profile changed.
Thanks for getting back to me @vincentqb
Alright, LGTM, thanks!
Thanks!
Use torchaudio melscale 'slaney' instead of librosa in WaveRNN pipeline preprocessing (pytorch#1444)

* Use torchaudio melscale instead of librosa
cc #593