torchaudio.compliance.kaldi.spectrogram is very different from torchaudio.transforms.spectrogram #157

vincentqb · 2019-07-19T19:49:50Z

Does torchaudio.compliance.kaldi.spectrogram currently support only vectors?

When feeding a tensor of shape torch.Size([2, 276858]) the result is not what's expected, yet there is no error. I would expect a "train pattern" to be visible, as in the second figure below.

This is what kaldi gives

This is what torchaudio.transforms.spectrogram gives

The "train pattern" is also visible on academo.org.


vincentqb · 2019-07-19T20:24:53Z

Code to generate the two figures below. Sound file here.

import torch
import torchaudio
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

filename = "assets/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
tensor, frequency = torchaudio.load(filename)

spec = torchaudio.transforms.Spectrogram()(tensor)
plt.imshow(spec.log2().transpose(1,2)[0,:,:].numpy(), cmap='gray')
plt.show()

spec = torchaudio.compliance.kaldi.spectrogram(tensor)
plt.imshow(spec.log2().transpose(0,1).numpy(), cmap='gray')
plt.show()

vincentqb · 2019-07-19T21:02:35Z

The error appears unrelated to multiple channels, since I get similar results with a single channel:

spec = torchaudio.compliance.kaldi.spectrogram(tensor[0,:].view(1,-1))
plt.imshow(spec.log2().transpose(0,1).numpy(), cmap='gray')
plt.show()

Note also that I had to pass the tensor with shape torch.Size([1, 276858]) and not torch.Size([276858]). The channel flag specifies which one of the channels will be processed (the last by default). Thanks @jamarshon for pointing this out!

vincentqb · 2019-07-19T21:46:19Z

The main issue is that the result from kaldi looks like noise, and the fact that the train pattern is not visible in the spectrogram is unexpected.

cpuhrsch · 2019-07-22T16:29:34Z

Try smaller inputs (zeros, ones, arange, etc.). In general, though, we want to standardize on kaldi, and whatever they produce is what we produce.

jamarshon · 2019-07-22T17:14:04Z

@vincentqb I could investigate the kaldi.spectrogram flags further to get a closer result, but is this more similar to what you would expect?

import torch
import torchaudio
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

filename = "/Users/jamarshon/Documents/GitHub/audio/test/assets/steam-train-whistle-daniel_simon.mp3"
s, sr = torchaudio.load(filename)
EPSILON = torch.tensor(torch.finfo(torch.float).eps, dtype=torch.get_default_dtype())

spec = torchaudio.transforms.Spectrogram()(s)
x = torch.max(EPSILON, spec).log2().transpose(1,2)[0,:,:]
plt.imshow(x.numpy(), cmap='gray')
plt.show()

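# kaldi.spectrogram takes frame sizes in milliseconds, so convert the
# 400-sample window to ms and use half of it as the frame shift.
n_fft = 400.0
fl = n_fft / sr * 1000.0  # frame length in ms
fs = fl / 2.0             # frame shift in ms
spec2 = torchaudio.compliance.kaldi.spectrogram(
    s, dither=0.0, window_type='hanning',
    frame_length=fl, frame_shift=fs, remove_dc_offset=False,
    round_to_power_of_two=False, sample_frequency=sr)
y = spec2.t()  # (frames, bins) -> (bins, frames) for plotting
plt.imshow(y.numpy(), cmap='gray')
plt.show()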

Spec1:

Spec2:
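The millisecond conversion in the snippet above is easy to get wrong, so here is a minimal standalone sketch of the arithmetic. The helper names `samples_to_ms` and `ms_to_samples` are hypothetical and not part of torchaudio, and the rounding used when converting back to samples is an assumption; the thread does not show how the library itself rounds.

```python
def samples_to_ms(num_samples: int, sample_rate: float) -> float:
    """Convert a window size in samples to a duration in milliseconds."""
    return num_samples / sample_rate * 1000.0


def ms_to_samples(duration_ms: float, sample_rate: float) -> int:
    """Convert a duration in milliseconds back to a whole number of samples.

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    return int(round(duration_ms * 0.001 * sample_rate))


n_fft = 400       # window size in samples, as in the snippet above
hop = n_fft // 2  # half-window shift, mirroring fs = fl / 2.0

for sr in (16000, 44100):
    frame_length_ms = samples_to_ms(n_fft, sr)
    frame_shift_ms = samples_to_ms(hop, sr)
    print(sr, frame_length_ms, frame_shift_ms,
          ms_to_samples(frame_length_ms, sr))
```

The same 400-sample window corresponds to a different duration at each sample rate, which is why the snippet above passes `sample_frequency=sr` along with the computed frame length and shift.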

vincentqb · 2019-07-22T18:27:53Z

Great, that's good enough. Thanks!

mahmoodn · 2019-12-15T17:12:22Z

@vincentqb
Could you upload the WAV file again? I cannot find it; the link is broken.

vincentqb · 2019-12-17T16:56:42Z

> @vincentqb
> Could you upload the WAV file again? I cannot find it; the link is broken.

The file can still be accessed here.

vincentqb · 2019-12-26T14:59:40Z

For reference, this is enough to produce a reasonable spectrogram.

spec = torchaudio.compliance.kaldi.spectrogram(tensor, dither=0.)
plt.imshow(spec.t().numpy(), cmap='gray')
plt.show()

EDIT: no log needed here, since kaldi.spectrogram already returns log-scaled power values.
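One plausible reading of why `dither=0.` matters, not confirmed in this thread: Kaldi-style dithering adds Gaussian noise whose standard deviation is the dither value, which is negligible for samples on a 16-bit integer scale but comparable to the signal itself if the loaded waveform is scaled to roughly [-1, 1]. The sketch below uses only the standard library; the function `snr_db_after_dither`, the 440 Hz test tone, and the two amplitude scales are all illustrative assumptions rather than anything taken from torchaudio.

```python
import math
import random


def snr_db_after_dither(amplitude, dither=1.0, n=16000,
                        freq=440.0, sample_rate=16000.0, seed=0):
    """Signal-to-noise ratio (dB) of a sine tone against additive
    Gaussian noise with standard deviation `dither`."""
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        s = amplitude * math.sin(2.0 * math.pi * freq * i / sample_rate)
        d = dither * rng.gauss(0.0, 1.0)
        signal_power += s * s
        noise_power += d * d
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical scales: waveform normalized to [-1, 1] vs. 16-bit integer range.
snr_unit = snr_db_after_dither(amplitude=1.0)
snr_int16 = snr_db_after_dither(amplitude=32767.0)
print(f"unit-scale SNR: {snr_unit:.1f} dB, int16-scale SNR: {snr_int16:.1f} dB")
```

Under these assumptions the unit-scale tone ends up below the noise floor while the integer-scale tone stays far above it, which would be consistent with the noise-like image reported earlier in the thread.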


vincentqb changed the title ~~torchaudio.compliance.kaldi.spectrogram gives incorrect results for non-vector input~~ torchaudio.compliance.kaldi.spectrogram gives result different from torchaudio.transforms.spectrogram Jul 19, 2019

vincentqb changed the title ~~torchaudio.compliance.kaldi.spectrogram gives result different from torchaudio.transforms.spectrogram~~ torchaudio.compliance.kaldi.spectrogram gives results different from torchaudio.transforms.spectrogram Jul 19, 2019

vincentqb changed the title ~~torchaudio.compliance.kaldi.spectrogram gives results different from torchaudio.transforms.spectrogram~~ torchaudio.compliance.kaldi.spectrogram is very different from torchaudio.transforms.spectrogram Jul 19, 2019

vincentqb closed this as completed Jul 22, 2019

Oktai15 mentioned this issue Dec 18, 2019

Dithering constant #371

Closed

pablomainar mentioned this issue Feb 13, 2020

Problems with Kaldi MFCCs #328

Open

Oktai15 mentioned this issue Feb 18, 2021

torchaudio.compliance.kaldi.fbank #1245

Open

mpc001 pushed a commit to mpc001/audio that referenced this issue Aug 4, 2023

Fix unused --test-batch-size command line argument

8d9f910

Fixes pytorch#157
