Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrating kaldi tests #671

Closed
wants to merge 3 commits into from

Conversation

bhargavkathivarapu
Copy link
Contributor

Hi ,
Regarding #597 , This PR moves tests from test/test_compliance_kaldi.py to test/kaldi_compatibility_impl.py

Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>
Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>
Signed-off-by: Bhargav Kathivarapu <bhargavkathivarapu31@gmail.com>
@bhargavkathivarapu
Copy link
Contributor Author

@vincentqb ,

fbank, mfcc , spectrogram outputs between kaldi and torchaudio kaldi are not matching in a few cases as per the threshold defined . In other cases they are matching as per threshold .

For resample what is the kaldi command to get resampled waveform ? , for now I kept the kaldi result of resample to None

@bhargavkathivarapu bhargavkathivarapu marked this pull request as ready for review May 31, 2020 11:33
@mthrok
Copy link
Collaborator

mthrok commented May 31, 2020

Hi @bhargavkathivarapu

Thanks for working on this.

First, let's put migration of different types of tests into separate PRs and start from existing one. bank or mfcc.

Second, instead of simple text file, can you use JSON Lines format? (with .json extension) so that no string parsing is required in test code. then we do not have to add _get_func_args, _parse, and TEST_PREFIX.
JSON Lines format is one JSON object (dict with proper arg name and values) per line, which can be read like

def _load_jsonl(path):
    with open('file.json', 'r') as file:
        return [json.loads(line) for line in file]

Third, when working on different types of test, can you use different parameter files? One JSON Line file for fbank, one for mfcc etc...

Forth, the current implementation stops when there is one test error. Let's try using parameterized.expand to list all such failures.
@vincentqb This is my temporary suggestion for parameterization. PyTorch has some discussion going on pytorch/pytorch#11578 but this is stall. Once they implement their own parameterization, we can migrate it.

Combining the second and forth suggestions, the required change should be fairly simple.

From

def test_fbank(self):

to

@parameterized.expand(_load_jsonl(path_to_parameter_jsonl_file))
def test_fbank(self, kwargs):

And you will need to add dependencies to CI config here and here

@@ -48,6 +89,8 @@ def _run_kaldi(command, input_type, input_value):


class Kaldi(common_utils.TestBaseMixin):
test_8000_filepath = common_utils.get_asset_path('kaldi_file_8000.wav')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this inside of test each test. The intended usage of asset file is not immediately clear when someone else work on adding a new test, so it is not very suited to be common property.

README.md Show resolved Hide resolved
@bhargavkathivarapu
Copy link
Contributor Author

@mthrok ,Submitted another PR ( #672 ) just for fbank , includes below

  1. Separate JSON file for each fbank args
  2. Adding parameterized dependency to linux and windows yml files
  3. Using the load json and parameterized.expand in tests

But the CI tests are not getting trigged for that pull request
I tried reverting the yaml to previous configuration without parametezied , but the commit is not triggering CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants