Dithering constant #371

Oktai15 · 2019-12-18T08:59:13Z

Why do torchaudio.compliance.kaldi.fbank and torchaudio.compliance.kaldi.spectrogram have such a large default dither value (1.0)? It very often just fills the whole output with noise.

It's common to use a dither close to 0, e.g. 0.00001 in QuartzNet and Jasper, which are near-SOTA ASR models (https://github.com/NVIDIA/NeMo/blob/master/examples/asr/configs/quartznet15x5.yaml).

Note that even the torchaudio tutorial uses dither = 0.0: https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html.

Also look at this issue and how it was resolved: #157
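To put numbers on the "noise everywhere" effect, here is a small pure-Python sketch. It assumes Kaldi-style dithering means adding zero-mean Gaussian noise with standard deviation equal to the dither value to every sample. The test signal and helper names are hypothetical. On a full-scale sine in [-1, 1], dither=1.0 gives an SNR of about -3 dB (the noise is louder than the signal), while dither=1e-5 gives about 97 dB.

```python
import math
import random


def dither(frame, dither_value, rng):
    """Add zero-mean Gaussian noise with std `dither_value` to every sample
    (assumed Kaldi-style dithering)."""
    return [x + rng.gauss(0.0, 1.0) * dither_value for x in frame]


def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical test signal: one second of a full-scale 440 Hz sine at 16 kHz,
# normalized to [-1, 1] like the tensors returned by torchaudio.load.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

rng = random.Random(0)
for d in (1.0, 1e-5):
    print(f"dither={d:g}: SNR = {snr_db(clean, dither(clean, d, rng)):.1f} dB")
```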


vincentqb · 2019-12-23T16:36:55Z

> Why do torchaudio.compliance.kaldi.fbank and torchaudio.compliance.kaldi.spectrogram have such a large default dither value (1.0)? It very often just fills the whole output with noise.

Can you provide an example of code that produces noisy output with the default value?

> It's common to use a dither close to 0, e.g. 0.00001 in QuartzNet and Jasper, which are near-SOTA ASR models (https://github.com/NVIDIA/NeMo/blob/master/examples/asr/configs/quartznet15x5.yaml).

Can you provide the default value used by Kaldi or other software?

Oktai15 · 2019-12-23T21:50:56Z

Regarding an example: as I already mentioned, @vincentqb, see your own issue #157.

vincentqb · 2019-12-26T15:01:28Z

Thanks for pointing this out. We should make sure that spectrogram, fbank, and mfcc use the same default.

From #157, it does seem like a value of 1 is large. If dither is set to zero, though, the user should specify energy_floor. Any thoughts on what a good default would be, and on what other software does?
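For context on why energy_floor matters once dither is 0, here is a sketch of a Kaldi-style frame log energy. It assumes the recipe log(max(sum(x^2), eps)), with eps the float32 machine epsilon, clamped below at log(energy_floor) when energy_floor > 0. The function name and frames are hypothetical, not torchaudio's API. Without dither, a frame of digital silence lands at log(eps) ≈ -15.9, an extreme outlier, while energy_floor=1.0 bounds it. Because the floor is an absolute energy, the quiet frame also shows that a floor of 1.0 clamps much more of a [-1, 1] signal than of an int16-scale one.

```python
import math

# Assumed epsilon inside the log: float32 machine epsilon (2**-23).
FLOAT32_EPS = 1.1920928955078125e-07


def frame_log_energy(frame, energy_floor=0.0):
    """Kaldi-style log energy of one frame: log(max(sum(x^2), eps)),
    clamped below at log(energy_floor) when energy_floor > 0."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(max(energy, FLOAT32_EPS))
    if energy_floor > 0.0:
        log_energy = max(log_energy, math.log(energy_floor))
    return log_energy


silent = [0.0] * 400  # hypothetical 25 ms frame of digital silence at 16 kHz
quiet = [1e-3] * 400  # hypothetical very quiet, but nonzero, frame

for floor in (0.0, 1.0):
    print(f"energy_floor={floor}: silent={frame_log_energy(silent, floor):.2f}, "
          f"quiet={frame_log_energy(quiet, floor):.2f}")
```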

vincentqb · 2019-12-27T15:10:03Z

Addresses part of #263

popcornell · 2020-02-29T22:48:47Z

I second this: the current default makes the torchaudio.compliance.kaldi features unusable out of the box.
I spent an hour looking for bugs in my labels, only to find that my model was essentially being fed noise because of the default dither value.

Oktai15 · 2020-03-01T00:02:39Z

@popcornell I know that feeling, bro (I had the same problem, which is why I created this issue).

vincentqb · 2020-03-02T16:27:35Z

@popcornell @Oktai15 We're looking at what a good value would be. What would you consider a reasonable nonzero value? What values do other packages use that you like?

@cpuhrsch: I would have used 0. by default, but the implementation explicitly says to specify energy_floor in that case, see here. Do you have more context? Note that all the tests were done with dither=0.: see here for the original testing, and here, where the second parameter (dither) in the test file names is always 0., e.g. spec-XXX-0-....

In the absence of more information, I'd suggest dither=1e-5.

```python
import torchaudio
import matplotlib.pyplot as plt  # in Jupyter, run `%matplotlib inline` first

filename = "steam-train-whistle-daniel_simon.mp3"
s, sr = torchaudio.load(filename)

# Reference: torchaudio's own spectrogram, shown on a log2 scale
spec0 = torchaudio.transforms.Spectrogram()(s)[0]
plt.imshow(spec0.log2().numpy(), cmap='gray')
plt.show()

# Kaldi-compatible log spectrogram without dither
spec1 = torchaudio.compliance.kaldi.spectrogram(s, dither=0.)
plt.imshow(spec1.t().numpy(), cmap='gray')
plt.show()

# Kaldi-compatible log spectrogram with a small dither
spec2 = torchaudio.compliance.kaldi.spectrogram(s, dither=1e-5)
plt.imshow(spec2.t().numpy(), cmap='gray')
plt.show()

# Mean absolute percent difference (printed as a fraction; x100 for percent)
print(2*((spec1 - spec2).abs()/(spec1.abs() + spec2.abs())).mean())
# We see an average absolute percentage difference of 0.25%.
```
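The printed metric is a symmetric relative difference, 2|a - b| / (|a| + |b|) averaged over all elements, so the raw value is a fraction rather than a percentage. Below is a plain-Python sketch of the same expression on hypothetical values; unlike the tensor version in the snippet, it skips pairs where both values are zero instead of producing NaN.

```python
def symmetric_mean_relative_difference(a, b):
    """Mean of 2*|a - b| / (|a| + |b|) over paired elements.

    Returns a fraction; multiply by 100 to express it as a percentage.
    Pairs where both values are zero are skipped to avoid 0/0.
    """
    terms = [
        2.0 * abs(x - y) / (abs(x) + abs(y))
        for x, y in zip(a, b)
        if abs(x) + abs(y) > 0.0
    ]
    return sum(terms) / len(terms)


# Hypothetical log-spectrum values; only the middle entry differs.
print(symmetric_mean_relative_difference([2.0, -4.0, 10.0], [2.0, -4.4, 10.0]))
```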

vincentqb · 2020-03-03T19:38:42Z

Based on this discussion, we'll simply set dither to 0 and energy_floor to 1 by default. This also behaves very similarly to a small dither value; see below.

```python
import torchaudio
import matplotlib.pyplot as plt  # in Jupyter, run `%matplotlib inline` first

filename = "/Users/vincentqb/audio/test/assets/steam-train-whistle-daniel_simon.mp3"
s, sr = torchaudio.load(filename)

# Reference: torchaudio's own spectrogram, shown on a log2 scale
spec0 = torchaudio.transforms.Spectrogram()(s)[0]
plt.imshow(spec0.log2().numpy(), cmap='gray')
plt.show()

# Proposed new defaults: no dither, energy floor of 1
spec1 = torchaudio.compliance.kaldi.spectrogram(s, dither=0., energy_floor=1.)
plt.imshow(spec1.t().numpy(), cmap='gray')
plt.show()

# Alternative: a very small dither
spec2 = torchaudio.compliance.kaldi.spectrogram(s, dither=1e-6)
plt.imshow(spec2.t().numpy(), cmap='gray')
plt.show()

# Mean absolute percent difference (printed as a fraction; x100 for percent)
print(2*((spec1 - spec2).abs()/(spec1.abs() + spec2.abs())).mean())
# We see an average absolute percentage difference of 0.16%.
```

csukuangfj · 2020-05-08T03:44:34Z

> Why do torchaudio.compliance.kaldi.fbank and torchaudio.compliance.kaldi.spectrogram have such a large default dither value (1.0)?

Kaldi uses 1 as the default dither value. That is fine for Kaldi, because Kaldi waveforms are in the range [-32768, 32767], so 1 is small relative to the maximum value 32767.

However, in torchaudio, torchaudio.load(filename) returns a tensor with values in the range [-1, 1]. So if you keep Kaldi's default value of 1, you will distort the audio signal.
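The scale argument can be checked with a small pure-Python sketch, assuming Gaussian dithering and a normalization factor of 32768 between int16 and [-1, 1] audio (the signal and helper are hypothetical). Dithering int16-scale audio with 1.0 and then normalizing gives the same result as dithering normalized audio with 1/32768 ≈ 3.05e-5, the same order of magnitude as the 1e-5 suggested earlier in the thread.

```python
import math
import random

INT16_SCALE = 32768.0  # assumed int16 -> [-1, 1] normalization factor


def add_dither(samples, dither_value, seed):
    """Add Gaussian noise with std `dither_value`; the same seed gives the same draws."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, 1.0) * dither_value for x in samples]


# Hypothetical quiet sine (amplitude 0.1) in the normalized [-1, 1] range.
normalized = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]

# Option 1: scale back to Kaldi's int16 range and keep Kaldi's default dither of 1.0.
int16_dithered = add_dither([x * INT16_SCALE for x in normalized], 1.0, seed=0)

# Option 2: stay normalized and shrink the dither by the same factor.
normalized_dithered = add_dither(normalized, 1.0 / INT16_SCALE, seed=0)

# With identical noise draws, the two differ only by the overall scale factor.
max_gap = max(abs(a / INT16_SCALE - b) for a, b in zip(int16_dithered, normalized_dithered))
print(f"equivalent normalized dither: {1.0 / INT16_SCALE:.3g}, max gap: {max_gap:.1e}")
```

In torchaudio terms, this suggests either multiplying the loaded waveform by 32768 before calling torchaudio.compliance.kaldi.fbank, or passing a dither of about 3e-5 on the normalized waveform. The two are not identical end to end: scaling the waveform also shifts every log feature by 2·ln(32768) ≈ 20.8 and changes the effective strength of energy_floor.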

vincentqb self-assigned this Dec 23, 2019

vincentqb mentioned this issue Dec 27, 2019: Problems with Kaldi MFCCs #328 (Open)

vincentqb mentioned this issue Mar 3, 2020: Change default value of dither #453 (Merged)

vincentqb closed this as completed in #453 Mar 6, 2020

popcornell mentioned this issue Jun 28, 2020: Source Separation Integration: sum(sources + background_noise) != mixture with mels. lhotse-speech/lhotse#38 (Closed)

mthrok mentioned this issue Jul 16, 2020: Revise parameters for Kaldi mfcc compatibility test #689 (Open)

mthrok mentioned this issue Feb 15, 2021: RFC: The future of Kaldi compliance module #1269 (Open)

alephpi mentioned this issue Mar 18, 2024: Issues on VBx reimplementation BUTSpeechFIT/VBx#67 (Closed)
