Dithering constant #371
Comments
Can you provide an example of code with noisy output using default value?
Can you provide Kaldi's or other software's default value?
About the example: as I already mentioned, @vincentqb, check your issue #157.
Thanks for pointing this out. We should make sure that From #157, it does seem like a value of 1 is large. If
Addresses part of #263
Second this, the default right now makes the whole
@popcornell I know that feel, bro (I had the same problem, which is why I created this issue).
@popcornell @Oktai15 -- We're looking at what would be a good value to use. What would you say would be a reasonable nonzero value? What values do other packages use, and which do you like?

@cpuhrsch -- I would have used 0. by default but the implementation says explicitly to specify In the absence of more information, I'd suggest

```python
import torchaudio
import matplotlib
# %matplotlib inline  # Jupyter magic; uncomment when running in a notebook
import matplotlib.pyplot as plt

filename = "steam-train-whistle-daniel_simon.mp3"
s, sr = torchaudio.load(filename)

# Reference: torchaudio's own spectrogram of the first channel
spec0 = torchaudio.transforms.Spectrogram()(s)[0]
plt.imshow(spec0.log2().numpy(), cmap='gray')
plt.show()

# Kaldi-compatible spectrogram with dithering disabled
spec1 = torchaudio.compliance.kaldi.spectrogram(s, dither=0.)
plt.imshow(spec1.t().numpy(), cmap='gray')
plt.show()

# Kaldi-compatible spectrogram with a small dither
spec2 = torchaudio.compliance.kaldi.spectrogram(s, dither=1e-5)
plt.imshow(spec2.t().numpy(), cmap='gray')
plt.show()

# Mean absolute percent difference
print(2*((spec1 - spec2).abs()/(spec1.abs() + spec2.abs())).mean())
# We see an average absolute percentage difference of 0.25%.
```
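For anyone unsure what the last metric measures, here is a small plain-Python sketch of the same symmetric formula, 2*|a - b| / (|a| + |b|), averaged over all elements. `mean_abs_pct_diff` is a hypothetical helper written for illustration. Unlike the tensor expression above, it skips pairs where both values are zero, which is an assumption made here to avoid dividing by zero.

```python
def mean_abs_pct_diff(a, b):
    """Symmetric mean absolute percent difference, returned as a fraction."""
    terms = []
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:  # assumption: ignore pairs that are both exactly zero
            terms.append(2.0 * abs(x - y) / denom)
    return sum(terms) / len(terms) if terms else 0.0


# Identical inputs give no difference at all.
print(mean_abs_pct_diff([1.0, 2.0], [1.0, 2.0]))
# A 1-unit gap on values near 100 gives about 1% (2 * 1 / 201).
print(mean_abs_pct_diff([100.0], [101.0]))
```

The result is a fraction, so multiply by 100 to read it as a percentage.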
Based on this discussion, we'll simply set
Kaldi uses However, in torchaudio, torchaudio.load(filename) returns a tensor with values in the range [-1, 1]. So if you still use the default value
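To illustrate the scale argument, here is a minimal sketch in plain Python. It assumes that dithering amounts to adding zero-mean Gaussian noise whose standard deviation equals the dither value, and that Kaldi-style input sits on a 16-bit integer scale (peak around 32767). Both are assumptions made for illustration, and `dither_snr_db` is a hypothetical helper, not part of torchaudio or Kaldi.

```python
import math
import random


def dither_snr_db(amplitude, dither, n=16000, seed=0):
    """Signal-to-noise ratio (dB) of a 440 Hz sine against additive dither noise."""
    rng = random.Random(seed)
    signal = [amplitude * math.sin(2 * math.pi * 440 * t / 16000) for t in range(n)]
    # Assumed dither model: Gaussian noise scaled by the dither value
    noise = [dither * rng.gauss(0.0, 1.0) for _ in range(n)]
    p_signal = sum(x * x for x in signal) / n
    p_noise = sum(v * v for v in noise) / n
    return 10.0 * math.log10(p_signal / p_noise)


# Waveform in [-1, 1] with dither=1.0: the noise is as strong as the signal.
print(dither_snr_db(1.0, 1.0))
# Same dither on a 16-bit integer scale: the noise is negligible.
print(dither_snr_db(32767.0, 1.0))
# Waveform in [-1, 1] with a small dither such as 1e-5.
print(dither_snr_db(1.0, 1e-5))
```

Under these assumptions, the first case comes out below 0 dB while the other two land well above 80 dB, which matches the observation that dither=1.0 swamps audio normalized to [-1, 1].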
Why do `torchaudio.compliance.kaldi.fbank` and `torchaudio.compliance.kaldi.spectrogram` have such a large default `dither` parameter (=1.0)? It very often just turns the whole output into noise. It's common to use a dither close to 0, e.g. 0.00001 in QuartzNet and Jasper, which are near-SOTA ASR models (https://github.com/NVIDIA/NeMo/blob/master/examples/asr/configs/quartznet15x5.yaml).
I want to note that even the torchaudio tutorial uses dither = 0.0: https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html.
Also look at this issue and how it was resolved: #157