Modify asr.sh #5020

kamo-naoyuki · 2023-03-16T20:36:17Z

This PR includes my self-answer to #4944.

I changed asr.sh to raise an error if --train_set is equal to --valid_set, or if --train_set is also included in --test_sets.
If --valid_set is included in --test_sets, the --eval_valid_set option is enabled.
If --eval_valid_set is enabled, dump/org/${valid_set} is evaluated at the decoding stage instead of dump/${valid_set}.
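
The checks described above can be sketched roughly as follows. This is only an illustration and not the code in asr.sh: the function name `check_sets` and its output convention (printing `true` or `false` for the eval_valid_set state) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from asr.sh): validates the set names and
# prints "true" if the validation set is also listed in the test sets.
check_sets() {
    local train_set="$1" valid_set="$2" test_sets="$3"
    local eval_valid_set=false dset

    # The training set must not double as the validation set.
    if [ "${train_set}" = "${valid_set}" ]; then
        echo "Error: train_set and valid_set must differ: ${train_set}" >&2
        return 1
    fi

    for dset in ${test_sets}; do
        # The training set must not appear among the test sets.
        if [ "${dset}" = "${train_set}" ]; then
            echo "Error: train_set is included in test_sets: ${train_set}" >&2
            return 1
        fi
        # Reusing the validation set as a test set is allowed and is recorded.
        if [ "${dset}" = "${valid_set}" ]; then
            eval_valid_set=true
        fi
    done

    echo "${eval_valid_set}"
}
```

For example, `check_sets train dev "dev test"` prints `true`, while `check_sets train train "test"` fails with an error message.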

Some might still think we should filter out the too-long/too-short utterances inside the Python training process, but in the end I concluded that it is a bad idea. Please let me go in this direction.

I also changed the behavior of the --skip_train option:

Before this PR: The data preparation for ${train_set} and ${valid_set} still runs. This is inconvenient when using a pre-trained model and only evaluation is required.
After this PR: The data preparation for ${train_set} and ${valid_set} can also be skipped.
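
A minimal sketch of the new behaviour, assuming the data preparation stage simply loops over a list of set names. The helper name `select_prep_sets` is hypothetical, and the real script may build its list differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from asr.sh): decides which sets go through
# data preparation depending on the value of skip_train ("true" or "false").
select_prep_sets() {
    local skip_train="$1" train_set="$2" valid_set="$3" test_sets="$4"

    if "${skip_train}"; then
        # Evaluation only: the training and validation sets are not prepared.
        echo "${test_sets}"
    else
        echo "${train_set} ${valid_set} ${test_sets}"
    fi
}
```

With `skip_train=true`, only the test sets are returned, which corresponds to the pre-trained model use case mentioned above.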

Finally, I added a new feature type: --feats_type raw_copy. This is almost the same as --feats_type raw, but it skips format_wav_scp.py.

This type is useful if the file format is already correct. Sometimes we create a new evaluation set from an existing dataset, e.g. by applying a speech enhancement method. In this case, the data may already be in the correct format, so we might want to skip format_wav_scp.py.

Please note that if a user specifies --feats_type raw_copy, the user is responsible for guaranteeing that the data format correctly follows our requirements.
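
The difference between the two feature types can be illustrated with a small dry-run sketch. The function `plan_format_step` is hypothetical and only prints what would happen for a given set; it does not call format_wav_scp.py and does not reflect the actual layout of asr.sh.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not taken from asr.sh): reports whether the
# audio formatting step would run for a set under a given feats_type.
plan_format_step() {
    local feats_type="$1" dset="$2"

    case "${feats_type}" in
        raw)
            echo "${dset}: run format_wav_scp.py"
            ;;
        raw_copy)
            # The user guarantees that the data is already in the right format.
            echo "${dset}: skip format_wav_scp.py and use the data as-is"
            ;;
        *)
            echo "Error: feats_type not covered by this sketch: ${feats_type}" >&2
            return 1
            ;;
    esac
}
```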

codecov · 2023-03-16T21:03:12Z

Codecov Report

Merging #5020 (09ce640) into master (7964a2a) will increase coverage by 3.55%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5020      +/-   ##
==========================================
+ Coverage   73.47%   77.02%   +3.55%     
==========================================
  Files         606      606              
  Lines       53748    53748              
==========================================
+ Hits        39491    41400    +1909     
+ Misses      14257    12348    -1909

Flag	Coverage Δ
test_integration_espnet1	`66.29% <ø> (+<0.01%)`	⬆️
test_integration_espnet2	`47.96% <ø> (?)`
test_python	`66.85% <ø> (+0.02%)`	⬆️
test_utils	`23.28% <ø> (ø)`


see 61 files with indirect coverage changes


kamo-naoyuki · 2023-03-17T09:40:18Z

@popcornell

Now we can use the same valid_set and test_set without modification. If ${valid_set} is included in ${test_sets}, it is simply replaced with org/${valid_set} in the evaluation stages.
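
As a rough illustration of this replacement (the helper name `resolve_eval_dir` is an assumption, and the real script may do this inline), the directory used for a test set could be chosen like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from asr.sh): returns the directory that is
# evaluated for a given test set. The validation set is read from the
# org/ copy of the dump directory.
resolve_eval_dir() {
    local dumpdir="$1" valid_set="$2" dset="$3"

    if [ "${dset}" = "${valid_set}" ]; then
        echo "${dumpdir}/org/${dset}"
    else
        echo "${dumpdir}/${dset}"
    fi
}
```

For example, `resolve_eval_dir dump dev dev` prints `dump/org/dev`, while `resolve_eval_dir dump dev test` prints `dump/test`.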

As mentioned in the comment above, I also modified the behaviour of --skip_train. If --skip_train true is specified, the data preparation for the train_set can be skipped. This is useful for the chime7 challenge.

popcornell · 2023-03-17T09:55:02Z

This is great!

mergify bot added ESPnet2 CI Travis, Circle CI, etc labels Mar 16, 2023

kamo-naoyuki force-pushed the refactor branch 4 times, most recently from 0a92365 to 9830e79 Compare March 17, 2023 02:28

kamo-naoyuki added auto-merge Enable auto-merge ASR Automatic speech recogntion Refactoring Refactoring labels Mar 17, 2023

kamo-naoyuki force-pushed the refactor branch 4 times, most recently from 0fd3f61 to 08b1676 Compare March 17, 2023 06:02

refactoring asr.sh

09ce640

kamo-naoyuki force-pushed the refactor branch from 08b1676 to 09ce640 Compare March 17, 2023 07:18

kamo-naoyuki changed the title ~~Refactoring asr.sh~~ Modify asr.sh Mar 17, 2023

kamo-naoyuki added New Features and removed Refactoring Refactoring labels Mar 17, 2023

mergify bot merged commit 496365d into espnet:master Mar 17, 2023
24 checks passed

kamo-naoyuki deleted the refactor branch March 17, 2023 09:40

YoshikiMas mentioned this pull request Jul 15, 2023

Issue in preparing RESULTS.md for ASR recipes #5303

Closed
