Migrate kaldi fbank #672

bhargavkathivarapu · 2020-06-01T06:14:55Z

Hi,

This PR migrates the Kaldi fbank test (#597) from test/test_compliance_kaldi.py to test/kaldi_compatibility_impl.py.

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

mthrok · 2020-06-01T16:42:03Z

Thanks for working on this!
I think the CI issue is intermittent (the webhook probably did not get delivered to CircleCI), so it should work if you push a commit again.

I tried your change and it almost worked. Here are the fixes I suggest. (Note that you need to re-add parameterized to environment.yml when running tests on CI.)

[nit] I moved import json. PEP8 recommends grouping imports in this order: standard library, blank line, third-party libraries, blank line, local modules.
It turned out that when passing an iterable as a single parameter, it has to be wrapped in a tuple or a parameterized.param object. (I also renamed the helper function to reflect this change.)
There was an issue with the order of the decorators. Flipping them fixed it.

diff --git a/test/kaldi_compatibility_impl.py b/test/kaldi_compatibility_impl.py
index aed5a35..880004d 100644
--- a/test/kaldi_compatibility_impl.py
+++ b/test/kaldi_compatibility_impl.py
@@ -1,4 +1,5 @@
 """Test suites for checking numerical compatibility against Kaldi"""
+import json
 import shutil
 import unittest
 import subprocess
@@ -9,8 +10,7 @@ import torchaudio.functional as F
 import torchaudio.compliance.kaldi

 from . import common_utils
-from parameterized import parameterized
-import json
+from parameterized import parameterized, param


 def _not_available(cmd):
@@ -49,9 +49,9 @@ def _run_kaldi(command, input_type, input_value):
     return torch.from_numpy(result.copy())  # copy supresses some torch warning


-def _load_jsonl(path):
+def _load_params(path):
     with open(path, 'r') as file:
-        return [json.loads(line) for line in file]
+        return [param(json.loads(line)) for line in file]


 class Kaldi(common_utils.TestBaseMixin):
@@ -75,8 +75,8 @@ class Kaldi(common_utils.TestBaseMixin):
         kaldi_result = _run_kaldi(command, 'ark', tensor)
         self.assert_equal(result, expected=kaldi_result)

+    @parameterized.expand(_load_params(common_utils.get_asset_path('kaldi_test_fbank_args.json')))
     @unittest.skipIf(_not_available('compute-fbank-feats'), '`compute-fbank-feats` not available')
-    @parameterized.expand(_load_jsonl(common_utils.get_asset_path('kaldi_test_fbank_args.json')))
     def test_fbank(self, kwargs):
         """fbank should be numerically compatible with compute-fbank-feats"""
         wave_file = common_utils.get_asset_path('kaldi_file.wav')
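
The wrapping fix in the diff above can be illustrated without the parameterized library. The following is a minimal sketch, assuming (as the comment above suggests) that each parameter set is star-unpacked into the test's positional arguments. The names load_params and fake_test and the two JSON lines are hypothetical, and a plain 1-tuple is used in place of parameterized.param.

```python
import io
import json


def load_params(file):
    """Read JSON Lines and wrap each decoded dict in a 1-tuple."""
    return [(json.loads(line),) for line in file if line.strip()]


def fake_test(kwargs):
    """Stand-in for a test method that expects a single dict argument."""
    return sorted(kwargs)


# Two hypothetical argument sets, one JSON object per line.
jsonl = io.StringIO('{"num_mel_bins": 4, "dither": 0.0}\n{"num_mel_bins": 23}\n')
params = load_params(jsonl)

# Star-unpacking a 1-tuple hands the whole dict over as one argument.
results = [fake_test(*p) for p in params]

# Star-unpacking a bare dict would instead yield its keys as separate arguments.
bare = json.loads('{"num_mel_bins": 4, "dither": 0.0}')
unpacked_keys = [*bare]
```

Without the wrapper, a function that takes one kwargs argument would receive one positional argument per dictionary key, which is why the bare dicts did not work as parameters.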

bhargavkathivarapu · 2020-06-02T08:47:56Z

@mthrok, I made the changes.
With the existing threshold for fbank (rtol=1e-4 and atol=1e-8), 9 out of 97 argument configurations do not satisfy the threshold.

The old Kaldi compliance test uses rtol=1e-1 and atol=1e-3 for fbank. Should we change the threshold to this?

mthrok · 2020-06-02T14:26:17Z

Hi @bhargavkathivarapu

Looking at the log, most of them are still close enough, but one of them looks way off.

____________________________________________________________________________________________________ TestKaldi_CPU_Float32.test_fbank_64 _____________________________________________________________________________________________________

a = (<test.common_utils.TestKaldi_CPU_Float32 testMethod=test_fbank_64>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

/home/moto/conda/envs/PY3.8-cuda101/lib/python3.8/site-packages/parameterized/parameterized.py:530:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/kaldi_compatibility_impl.py:87: in test_fbank
    self.assert_equal(result, expected=kaldi_result, rtol=1e-4, atol=1e-8)
test/kaldi_compatibility_impl.py:60: in assert_equal
    self.assertEqual(output, expected, rtol=rtol, atol=atol)
../pytorch/torch/testing/_internal/common_utils.py:1083: in assertEqual
    self.assertTrue(result, msg=message)
E   AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.0001 and atol=1e-08, found 1 element(s) (out of 5) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.8112268447875977 (10.725326538085938 vs. 9.91409969329834), which occurred at index (0, 3).
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
compute-fbank-feats --blackman-coeff=3.0442 --energy-floor=4.0677 --frame-length=1.0625 --frame-shift=1.125 --high-freq=5086 --htk-compat=true --low-freq=1013 --num-mel-bins=4 --preemphasis-coefficient=0.99 --raw-energy=false --remove-dc-offset=false --round-to-power-of-two=true --snip-edges=true --subtract-mean=false --use-energy=true --use-log-fbank=true --use-power=false --vtln-high=4997 --vtln-low=4836 --vtln-warp=1.9525 --window-type=hamming --dither=0.0 scp:- ark:-
LOG (compute-fbank-feats[5.5.689~1-2c7e7]:main():compute-fbank-feats.cc:185)  Done 1 out of 1 utterances.

I think some argument values are outside the expected range, which makes me question the validity of the original test.
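
To put the two thresholds in perspective, here is a small sketch that applies an allclose-style criterion, |actual - expected| <= atol + rtol * |expected|, to the single pair of values reported in the failure message. It assumes the test helper uses this usual criterion, and it assumes which of the two values is the expected one; within_tolerance is a hypothetical name.

```python
def within_tolerance(actual, expected, rtol, atol):
    """allclose-style check: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# The pair reported in the failure message; which side is "expected" is assumed.
actual, expected = 10.725326538085938, 9.91409969329834

# Threshold currently used by the migrated test.
strict = within_tolerance(actual, expected, rtol=1e-4, atol=1e-8)
# Threshold used by the old compliance test.
loose = within_tolerance(actual, expected, rtol=1e-1, atol=1e-3)

print(strict, loose)
```

Under these assumptions the pair fails the strict threshold but passes the loose one, so relaxing the tolerance to the old values would hide a difference of about 0.81.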

$ compute-fbank-feats --help

Create Mel-filter bank (FBANK) feature files.
Usage:  compute-fbank-feats [options...] <wav-rspecifier> <feats-wspecifier>

Options:
  --allow-downsample          : If true, allow the input waveform to have a higher frequency than the specified --sample-frequency (and we'll downsample). (bool, default = false)
  --allow-upsample            : If true, allow the input waveform to have a lower frequency than the specified --sample-frequency (and we'll upsample). (bool, default = false)
  --blackman-coeff            : Constant coefficient for generalized Blackman window. (float, default = 0.42)
  --channel                   : Channel to extract (-1 -> expect mono, 0 -> left, 1 -> right) (int, default = -1)
  --debug-mel                 : Print out debugging information for mel bin computation (bool, default = false)
  --dither                    : Dithering constant (0.0 means no dither). If you turn this off, you should set the --energy-floor option, e.g. to 1.0 or 0.1 (float, default = 1)
  --energy-floor              : Floor on energy (absolute, not relative) in FBANK computation. Only makes a difference if --use-energy=true; only necessary if --dither=0.0.  Suggested values: 0.1 or 1.0 (float, default = 0)
  --frame-length              : Frame length in milliseconds (float, default = 25)
  --frame-shift               : Frame shift in milliseconds (float, default = 10)
  --high-freq                 : High cutoff frequency for mel bins (if <= 0, offset from Nyquist) (float, default = 0)
  --htk-compat                : If true, put energy last.  Warning: not sufficient to get HTK compatible features (need to change other parameters). (bool, default = false)
  --low-freq                  : Low cutoff frequency for mel bins (float, default = 20)
  --max-feature-vectors       : Memory optimization. If larger than 0, periodically remove feature vectors so that only this number of the latest feature vectors is retained. (int, default = -1)
  --min-duration              : Minimum duration of segments to process (in seconds). (float, default = 0)
  --num-mel-bins              : Number of triangular mel-frequency bins (int, default = 23)
  --output-format             : Format of the output files [kaldi, htk] (string, default = "kaldi")
  --preemphasis-coefficient   : Coefficient for use in signal preemphasis (float, default = 0.97)
  --raw-energy                : If true, compute energy before preemphasis and windowing (bool, default = true)
  --remove-dc-offset          : Subtract mean from waveform on each frame (bool, default = true)
  --round-to-power-of-two     : If true, round window size to power of two by zero-padding input to FFT. (bool, default = true)
  --sample-frequency          : Waveform data sample frequency (must match the waveform file, if specified there) (float, default = 16000)
  --snip-edges                : If true, end effects will be handled by outputting only frames that completely fit in the file, and the number of frames depends on the frame-length.  If false, the number of frames depends only on the frame-shift, and we reflect the data at the ends. (bool, default = true)
  --subtract-mean             : Subtract mean of each feature file [CMS]; not recommended to do it this way.  (bool, default = false)
  --use-energy                : Add an extra dimension with energy to the FBANK output. (bool, default = false)
  --use-log-fbank             : If true, produce log-filterbank, else produce linear. (bool, default = true)
  --use-power                 : If true, use power, else use magnitude. (bool, default = true)
  --utt2spk                   : Utterance to speaker-id map (if doing VTLN and you have warps per speaker) (string, default = "")
  --vtln-high                 : High inflection point in piecewise linear VTLN warping function (if negative, offset from high-mel-freq (float, default = -500)
  --vtln-low                  : Low inflection point in piecewise linear VTLN warping function (float, default = 100)
  --vtln-map                  : Map from utterance or speaker-id to vtln warp factor (rspecifier) (string, default = "")
  --vtln-warp                 : Vtln warp factor (only applicable if vtln-map not specified) (float, default = 1)
  --window-type               : Type of window ("hamming"|"hanning"|"povey"|"rectangular"|"sine"|"blackmann") (string, default = "povey")
  --write-utt2dur             : Wspecifier to write duration of each utterance in seconds, e.g. 'ark,t:utt2dur'. (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

Compared with the defaults, some of the values in the failing command are very far off.
For example, --frame-length, --frame-shift and --energy-floor seem very off to me.
Do you have any insight into this?

bhargavkathivarapu · 2020-06-02T16:12:05Z

@mthrok The fbank argument generation script is at test/compliance/generate_fbank_data.py (assuming these arguments were generated by this script).

It uses the following configuration to generate those 3 args:

wave_len = 20
'energy_floor': '%.4f' % (random.random() * 5),
'frame_length': '%.4f' % (float(random.randint(3, wave_len - 1)) / 16000 * 1000),
'frame_shift': '%.4f' % (float(random.randint(1, wave_len - 1)) / 16000 * 1000)

I am not sure about the exact range of values these arguments should take.
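
The arithmetic in that snippet can be worked through directly. This is a minimal sketch that only reuses the constants quoted above (wave_len = 20, a divisor of 16000) and the defaults listed in the help output (25 ms and 10 ms); the helper names samples_to_ms and ms_to_samples are hypothetical.

```python
SAMPLE_RATE = 16000  # Hz, the divisor that appears in the generation snippet


def samples_to_ms(num_samples):
    """Format a sample count as milliseconds, the way the snippet does."""
    return '%.4f' % (float(num_samples) / SAMPLE_RATE * 1000)


def ms_to_samples(ms):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * SAMPLE_RATE / 1000)


wave_len = 20

# Smallest and largest values the snippet can generate.
frame_length_range = (samples_to_ms(3), samples_to_ms(wave_len - 1))
frame_shift_range = (samples_to_ms(1), samples_to_ms(wave_len - 1))

# The Kaldi defaults from the help output, expressed in samples.
default_frame_length = ms_to_samples(25)
default_frame_shift = ms_to_samples(10)
```

The generated frames are therefore at most 19 samples long (about 1.19 ms), while the defaults correspond to 400 and 160 samples. The failing command's --frame-length=1.0625 and --frame-shift=1.125 correspond to 17 and 18 samples.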

mthrok · 2020-06-03T00:33:39Z

So I got some advice from an experienced Kaldi user:

--frame-length and --frame-shift should be integer values around 10 - 50 [ms] if perturbing.
--preemphasis-coefficient should be less than 1.0, around 0.90 - 0.99.
--low-freq should be around 20 - 50 (and less than --high-freq).
--high-freq: a typical value is -200 for 8000 Hz audio (i.e. [3800 == (8000 / 2 - 200)] Hz). We should pick numbers found in Kaldi examples.

So I think the failing parameter sets are not valid use cases. We should remove them.
For now, we can leave the passing cases as is and revisit their validity later as a follow-up.
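
As a rough illustration of how the advice above could be turned into a check, here is a sketch that flags arguments outside those ranges. The function name, the dictionary keys and the exact bounds are assumptions taken loosely from the advice, not an existing torchaudio or Kaldi utility. The treatment of a non-positive --high-freq as an offset from Nyquist follows the help text, and the 16000 Hz sample rate is the documented default.

```python
def implausible_args(args, sample_rate=16000):
    """Return the names of arguments that fall outside the advised ranges."""
    problems = []
    # Frame length and shift: integer values of roughly 10 to 50 ms.
    for name in ('frame_length', 'frame_shift'):
        value = float(args[name])
        if not (value.is_integer() and 10 <= value <= 50):
            problems.append(name)
    # Pre-emphasis coefficient: roughly 0.90 to 0.99.
    if not 0.90 <= args['preemphasis_coefficient'] <= 0.99:
        problems.append('preemphasis_coefficient')
    # A non-positive high_freq is an offset from the Nyquist frequency.
    high = args['high_freq']
    effective_high = high if high > 0 else sample_rate / 2 + high
    # Low cutoff: roughly 20 to 50 Hz and below the high cutoff.
    if not (20 <= args['low_freq'] <= 50 and args['low_freq'] < effective_high):
        problems.append('low_freq')
    return problems


# Values taken from the failing command line shown earlier in the thread.
failing = {'frame_length': 1.0625, 'frame_shift': 1.125,
           'preemphasis_coefficient': 0.99, 'low_freq': 1013, 'high_freq': 5086}
# Default values taken from the help output.
defaults = {'frame_length': 25, 'frame_shift': 10,
            'preemphasis_coefficient': 0.97, 'low_freq': 20, 'high_freq': 0}

failing_problems = implausible_args(failing)
default_problems = implausible_args(defaults)
```

With these assumed bounds, the failing configuration is flagged for its frame length, frame shift and low cutoff, while the defaults raise nothing.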

bhargavkathivarapu · 2020-06-03T13:55:22Z

@mthrok, I removed those 9 failing cases from the JSON. Now all unit tests pass.

mthrok

Looks good, thanks!

mthrok · 2020-06-03T14:42:16Z

@bhargavkathivarapu I forgot to mention this, but could you do the honors and remove the migrated test (test_compliance_kaldi.py::Test_Kaldi::test_fbank) and data (ark) files?

bhargavkathivarapu · 2020-06-03T17:06:44Z

Yeah, I will remove all compliance files at once, after all tests are migrated.

bhargavkathivarapu added 2 commits June 1, 2020 11:40

Migrate fbank

6137edc

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

change yml

49f0d8d

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

bhargavkathivarapu mentioned this pull request Jun 1, 2020

Migrating kaldi tests #671

Closed

change yml and minor changes

27fb115

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

bhargavkathivarapu added 2 commits June 3, 2020 19:07

delete failing tests

cd22aea

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

delete failing tests 2

216a1b9

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>

mthrok approved these changes Jun 3, 2020

mthrok merged commit 8a03087 into pytorch:master Jun 3, 2020

mthrok mentioned this pull request Jun 3, 2020

Revise parameters for Kaldi fbank compatibility test #679

Open

bhargavkathivarapu mentioned this pull request Jun 7, 2020

kaldi compliance files cleanup for spec,fbank,mfcc #703

Merged

mthrok mentioned this pull request Jul 14, 2020

Issue 764: Switch Pitch Detection Test to use On the Fly Generation instead of file. #783

Merged
