Fix both utt2sid and utt2lid when removing long/short data #4609
Codecov Report
@@ Coverage Diff @@
## master #4609 +/- ##
========================================
Coverage 83.06% 83.07%
========================================
Files 508 508
Lines 43646 43775 +129
========================================
+ Hits 36253 36364 +111
- Misses 7393 7411 +18
Thank you for the report and the fix.
With your change, the fix is not performed when there is no `utt2lid` or `utt2sid`,
so I changed the strategy. Could you check my suggestion?
Co-authored-by: Tomoki Hayashi <hayashi.tomoki@g.sp.m.is.nagoya-u.ac.jp>
No problem! Thanks 😄
Thank you for your kind contribution!
Stage 3 of `tts.sh` removes long/short data. At this point, the authors use `utils/fix_data_dir.sh` to remove the entries of the removed utterances as follows:
espnet/egs2/TEMPLATE/tts1/tts.sh
Lines 500 to 509 in e5d133c
However, if both `utt2sid` and `utt2lid` exist, `_fix_opts` is overwritten with `"--utt_extra_files utt2lid "` because the authors used the assignment operator. So I tried to fix the above code as follows:

However, after I tested it, it seems that `utils/fix_data_dir.sh` does not support multiple options. So I suggest the following workaround in this PR, because `utils/fix_data_dir.sh` is part of `kaldi`:
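
To make the overwrite problem concrete, here is a minimal, self-contained sketch. It is not the code from this PR or from `tts.sh`: the helper names `build_overwrite` and `build_accumulate` are hypothetical, and the idea of passing all file names as one space-separated value to `--utt_extra_files` is an assumption that should be checked against `utils/fix_data_dir.sh` itself. The sketch only builds strings and never calls the Kaldi script.

```shell
#!/usr/bin/env bash
# Illustration only; helper names are hypothetical, not ESPnet code.

# Mimics the assignment-based logic: each branch assigns a new value,
# so when both files exist only the last assignment survives.
build_overwrite() {
    local dir="$1" opts=""
    if [ -e "${dir}/utt2sid" ]; then
        opts="--utt_extra_files utt2sid "
    fi
    if [ -e "${dir}/utt2lid" ]; then
        opts="--utt_extra_files utt2lid "
    fi
    printf '%s' "${opts}"
}

# Collects every existing extra file into one space-separated list,
# intended (as an assumption) to be used as a single option value.
build_accumulate() {
    local dir="$1" files="" f
    for f in utt2sid utt2lid; do
        if [ -e "${dir}/${f}" ]; then
            files="${files}${f} "
        fi
    done
    # Strip the single trailing space left by the loop.
    printf '%s' "${files% }"
}

# Demo on a temporary directory that contains both files.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/utt2sid" "${demo_dir}/utt2lid"
overwrite_result="$(build_overwrite "${demo_dir}")"
accumulate_result="$(build_accumulate "${demo_dir}")"
echo "overwrite : ${overwrite_result}"
echo "accumulate: ${accumulate_result}"
rm -rf "${demo_dir}"
```

With both files present, the first helper keeps only the `utt2lid` option, while the second keeps both names. If neither file exists, the second helper returns an empty string, so the caller would still need to decide whether to pass the option at all.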