This PR includes my self-answer to #4944.

- `asr.sh` now raises an error if `--train_set` is equal to `--valid_set`, or if `--train_set` is also included in `--test_sets`.
- If `--valid_set` is included in `--test_sets`, the `--eval_valid_set` option is enabled.
- If `--eval_valid_set` is enabled, `dump/org/${valid_set}` is evaluated at the decoding stage instead of `dump/${valid_set}`.
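
The checks listed above can be sketched in plain shell as follows. This is only an illustration under assumptions, not the code in `asr.sh`: the helper names `in_list`, `check_sets`, and `decode_data_dir` are hypothetical, and the fallback to `dump/${dset}` for sets other than the validation set is assumed.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of asr.sh.

# in_list WORD "SPACE SEPARATED LIST": succeeds if WORD is one of the items.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_sets TRAIN_SET VALID_SET "TEST_SETS"
# Fails on an invalid combination; otherwise prints whether eval_valid_set
# would be enabled.
check_sets() {
    train_set=$1; valid_set=$2; test_sets=$3
    if [ "${train_set}" = "${valid_set}" ]; then
        echo "Error: --train_set must differ from --valid_set" >&2
        return 1
    fi
    if in_list "${train_set}" "${test_sets}"; then
        echo "Error: --train_set must not be included in --test_sets" >&2
        return 1
    fi
    if in_list "${valid_set}" "${test_sets}"; then
        echo "eval_valid_set=true"
    else
        echo "eval_valid_set=false"
    fi
}

# decode_data_dir DSET VALID_SET EVAL_VALID_SET
# Prints the directory evaluated at the decoding stage for DSET.
decode_data_dir() {
    if [ "$3" = true ] && [ "$1" = "$2" ]; then
        echo "dump/org/$1"   # validation set evaluated as a test set
    else
        echo "dump/$1"       # assumed location for every other set
    fi
}
```

For example, `check_sets train dev "dev test"` reports that `eval_valid_set` would be enabled, while `check_sets train dev "train test"` fails because the training set also appears in the test sets.
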

Some might still think we should filter the long/short utterances in the training Python process, but in the end I concluded that it's a bad idea. Please let me go in this direction.

I also changed the behavior of the `--skip_train` option:

- Previously, processing of `${train_set}` and `${valid_set}` still ran. This is inconvenient when using a pre-trained model and only evaluation is required.
- Now, processing of `${train_set}` and `${valid_set}` can also be skipped.

Finally, I added a new feat-type:
`--feats_type raw_copy`. This is almost the same as `--feats_type raw`, but it can skip `format_wav_scp.py`.

This type is useful if the file format is already correct. Sometimes we create a new evaluation set from an existing dataset, e.g. by applying a speech enhancement method. In this case, the data set may already be in the correct format, so we might want to skip `format_wav_scp.py`.

Please note that if a user specifies `--feats_type raw_copy`, the user is responsible for guaranteeing that the data format correctly follows our requirements.
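
To make the combined effect of `--skip_train` and `--feats_type raw_copy` concrete, here is a minimal sketch of how the data preparation could be planned. It is an assumption-laden illustration, not the logic in `asr.sh`: the function `plan_data_prep` and the action labels `format` and `skip_format` are hypothetical, and only the `raw` and `raw_copy` types are handled.

```sh
#!/bin/sh
# Hypothetical helper for illustration; it is not part of asr.sh.

# plan_data_prep SKIP_TRAIN FEATS_TYPE TRAIN_SET VALID_SET "TEST_SETS"
# Prints one "<dataset> <action>" line per dataset that would be prepared.
plan_data_prep() {
    skip_train=$1; feats_type=$2; train_set=$3; valid_set=$4; test_sets=$5

    # With skip_train enabled, the training and validation sets are left out.
    if [ "${skip_train}" = true ]; then
        dsets=${test_sets}
    else
        dsets="${train_set} ${valid_set} ${test_sets}"
    fi

    # raw runs format_wav_scp.py; raw_copy trusts the existing data format.
    case "${feats_type}" in
        raw)      action=format ;;
        raw_copy) action=skip_format ;;
        *)
            echo "feats_type not covered by this sketch: ${feats_type}" >&2
            return 1
            ;;
    esac

    # Word splitting on ${dsets} is intentional: it is a space-separated list.
    for dset in ${dsets}; do
        echo "${dset} ${action}"
    done
}
```

Calling `plan_data_prep true raw_copy train dev "test_enh"` lists only `test_enh`, marked as skipping the formatting step, which corresponds to the evaluation-only use case with a pre-trained model and an already formatted evaluation set.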