Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix both utt2sid and utt2lid when removing long/short data #4609

Merged
merged 3 commits into from Sep 5, 2022

Conversation

jonghwanhyeon
Copy link
Contributor

@jonghwanhyeon jonghwanhyeon commented Aug 30, 2022

Stage 3 of tts.sh removes long/short data. At this time, authors use utils/fix_data_dir.sh to remove the entry of the removed utterance as follows:

# fix_data_dir.sh leaves only utts which exist in all files
_fix_opts=""
if [ -e "${data_feats}/org/${dset}/utt2sid" ]; then
_fix_opts="--utt_extra_files utt2sid "
fi
if [ -e "${data_feats}/org/${dset}/utt2lid" ]; then
_fix_opts="--utt_extra_files utt2lid "
fi
# shellcheck disable=SC2086
utils/fix_data_dir.sh ${_fix_opts} "${data_feats}/${dset}"

However, if there exist both utt2sid and utt2lid, _fix_opts are overwritten as "--utt_extra_files utt2lid " because authors used the assignment operator. So I tried to fix the above code as follows:

# fix_data_dir.sh leaves only utts which exist in all files
_fix_opts=""
if [ -e "${data_feats}/org/${dset}/utt2sid" ]; then
    _fix_opts+="--utt_extra_files utt2sid "
fi
if [ -e "${data_feats}/org/${dset}/utt2lid" ]; then
    _fix_opts+="--utt_extra_files utt2lid "
fi
# shellcheck disable=SC2086
utils/fix_data_dir.sh ${_fix_opts} "${data_feats}/${dset}"

However, after I tested it, it seems that utils/fix_data_dir.sh does not support multiple options. So, I suggest the following workaround in this PR becuase utils/fix_data_dir.sh is a part of kaldi:

# fix_data_dir.sh leaves only utts which exist in all files
if [ -e "${data_feats}/org/${dset}/utt2sid" ]; then
    utils/fix_data_dir.sh --utt_extra_files utt2sid "${data_feats}/${dset}"
fi
if [ -e "${data_feats}/org/${dset}/utt2lid" ]; then
    utils/fix_data_dir.sh --utt_extra_files utt2lid "${data_feats}/${dset}"
fi

@mergify mergify bot added the ESPnet2 label Aug 30, 2022
@sw005320 sw005320 added Bugfix TTS Text-to-speech labels Aug 30, 2022
@sw005320 sw005320 added this to the v.202209 milestone Aug 30, 2022
@codecov
Copy link

codecov bot commented Aug 30, 2022

Codecov Report

Merging #4609 (4cb824c) into master (f0edec2) will increase coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 4cb824c differs from pull request most recent head dfed926. Consider uploading reports for the commit dfed926 to get more accurate results

@@           Coverage Diff            @@
##           master    #4609    +/-   ##
========================================
  Coverage   83.06%   83.07%            
========================================
  Files         508      508            
  Lines       43646    43775   +129     
========================================
+ Hits        36253    36364   +111     
- Misses       7393     7411    +18     
Flag Coverage Δ
test_integration_espnet1 66.36% <ø> (ø)
test_integration_espnet2 49.53% <ø> (+<0.01%) ⬆️
test_python 70.61% <ø> (+<0.01%) ⬆️
test_utils 23.28% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
espnet2/enh/espnet_enh_s2t_model.py 81.72% <0.00%> (-0.64%) ⬇️
espnet2/tasks/abs_task.py 75.52% <0.00%> (+0.03%) ⬆️
espnet2/tasks/enh_s2t.py 96.66% <0.00%> (+0.05%) ⬆️
espnet2/bin/asr_inference.py 84.98% <0.00%> (+0.08%) ⬆️
espnet2/asr/espnet_model.py 81.38% <0.00%> (+0.27%) ⬆️
espnet2/enh/loss/wrappers/fixed_order.py 91.30% <0.00%> (+0.39%) ⬆️
espnet2/enh/espnet_model.py 86.56% <0.00%> (+1.08%) ⬆️
espnet2/asr/ctc.py 96.72% <0.00%> (+1.63%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@kan-bayashi kan-bayashi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your report and fixing.
In your case, the fixing is not performed when there is no utt2lid or utt2sid.
So I changed the strategy. Could you check my suggestion?

egs2/TEMPLATE/tts1/tts.sh Show resolved Hide resolved
egs2/TEMPLATE/tts1/tts.sh Outdated Show resolved Hide resolved
egs2/TEMPLATE/tts1/tts.sh Outdated Show resolved Hide resolved
egs2/TEMPLATE/tts1/tts.sh Show resolved Hide resolved
Co-authored-by: Tomoki Hayashi <hayashi.tomoki@g.sp.m.is.nagoya-u.ac.jp>
@jonghwanhyeon
Copy link
Contributor Author

No problem! Thanks 😄

@kan-bayashi kan-bayashi merged commit 66edc59 into espnet:master Sep 5, 2022
@kan-bayashi
Copy link
Member

Thank you for your kind contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants