Which library is torchaudio consistent with? #80

Hi, I'm currently updating my torch codebase from librosa to torchaudio for transforms, to take advantage of the (much) faster torch STFT implementation on the GPU. However, I'm running into several cases where Spectrogram vs. librosa.core.spectrum._spectrogram and MelSpectrogram vs. librosa.feature.melspectrogram give different results. Does this repo ensure consistency with another Python audio library for those transformations? I think it would be good to be consistent with another widely used library. I'm currently figuring out the params needed to make the outputs match, and I can open a PR if that sounds useful.

For example:
```
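import librosa
import numpy as np
import torch
import torchaudio

sound, sample_rate = torchaudio.load('wav_file.wav')
soundcuda = sound.cuda()  # input for the torch path below
sound_librosa = sound.cpu().numpy().squeeze().T

sample_rate = 16000  # overrides the loaded rate; assumes a 16 kHz file
n_mels = 40
window_stride = 0.01
window_size = 0.025
hop_length = int(sample_rate * window_stride)
n_fft = int(sample_rate * window_size)
# Assumed: a Hann window of length n_fft, which is librosa's default
window = torch.hann_window(n_fft).cuda()

# librosa path: power spectrogram, then an explicit mel filterbank
stft_librosa = librosa.stft(y=sound_librosa,
                            hop_length=hop_length,
                            n_fft=n_fft)
spectro_librosa, n_fft = librosa.core.spectrum._spectrogram(y=sound_librosa,
                            hop_length=hop_length,
                            n_fft=n_fft, power=2)
mel_basis = librosa.filters.mel(sr=sample_rate,
                                n_mels=n_mels,
                                n_fft=n_fft,
                                norm=None, # non-standard
                                htk=True) # non-standard
check = np.dot(mel_basis, spectro_librosa)

# torch path: this uses the older torch.stft output with real/imag pairs in the
# last dimension; newer PyTorch needs return_complex=True and .abs().pow(2)
stft_torch = torch.stft(soundcuda,
                        hop_length=hop_length,
                        n_fft=n_fft,
                        window=window).transpose(1, 2)
spectro_torch = stft_torch.pow(2).sum(-1)
melscale = torchaudio.transforms.MelScale(n_mels=n_mels)
check2 = melscale(spectro_torch)

# Expected: check == check2 (up to axis layout and floating-point tolerance)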
```

The torchaudio MelScale corresponds to the non-default librosa options norm=None, htk=True in librosa.filters.mel (https://librosa.github.io/librosa/_modules/librosa/filters.html#mel). To get matching output I also removed the default spectrogram normalization at https://github.com/pytorch/audio/blob/master/torchaudio/transforms.py#L198, which has no equivalent option in librosa.
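To illustrate why the htk flag changes the filterbank, here is a minimal sketch of the two Hz-to-mel mappings. It assumes the standard HTK formula and the Slaney-style piecewise scale that librosa documents as its default (htk=False). The function names are hypothetical helpers, not part of either library.

```python
import math


def hz_to_mel_htk(freq_hz: float) -> float:
    """HTK mel formula (what htk=True selects in librosa)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def hz_to_mel_slaney(freq_hz: float) -> float:
    """Slaney-style scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp  # mel value at 1 kHz (15.0)
    logstep = math.log(6.4) / 27.0   # log-region step size
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


if __name__ == "__main__":
    # The two scales use different units, so the band edges derived from
    # them (and hence the filterbank rows) land at different frequencies.
    for f in (100.0, 1000.0, 4000.0, 8000.0):
        print(f, hz_to_mel_htk(f), hz_to_mel_slaney(f))
```

Because the mel band edges are spaced evenly on whichever scale is chosen, two filterbanks built with different scales will not match even when n_mels, n_fft, and the sample rate agree.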

There are also functional inconsistencies between the librosa and torchaudio calls: librosa.feature.melspectrogram returns the mel spectrogram itself, whereas torchaudio's MelSpectrogram also converts the spectrogram to the dB scale.
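When comparing the two outputs, one side has to be converted so both are on the same scale. A minimal sketch of that conversion, assuming the usual 10*log10 power-to-dB convention with a small floor to avoid log(0); the helper names and default values here are hypothetical, and the reference level and clipping each library applies may differ.

```python
import math


def power_to_db(power, ref=1.0, amin=1e-10):
    """Convert power values to decibels relative to `ref`.

    Values below `amin` are floored so the logarithm stays finite.
    """
    ref_db = 10.0 * math.log10(max(ref, amin))
    return [10.0 * math.log10(max(p, amin)) - ref_db for p in power]


def db_to_power(db, ref=1.0):
    """Inverse of power_to_db (ignoring the amin floor)."""
    return [ref * 10.0 ** (d / 10.0) for d in db]


if __name__ == "__main__":
    frame = [1.0, 100.0, 0.0]
    print(power_to_db(frame))               # dB values for one frame
    print(db_to_power(power_to_db(frame)))  # back to power; 0.0 comes back as amin
```

Applying the same conversion to the librosa output, or undoing it on the torchaudio output, puts both results in the same units before an element-wise comparison.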

