CHiME-7 DASR fixes from participants feedback #4999

Merged
merged 123 commits into from Mar 23, 2023

Conversation

@popcornell (Contributor) commented Mar 12, 2023

These fixes mainly address @kamo-naoyuki's comments.

I have added mixer6 dev ihm.
Reformatted the Kaldi utterance IDs so that utterances are easier to debug: as suggested, they now all follow {speaker}-{record_id}-{start}_{end}-{mic-channel}.
Other small fixes too.
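
As a minimal sketch of the naming scheme above (not the recipe's actual code), the two hypothetical helpers below compose and split such an ID. The example field values, and the assumption that no field contains a `-`, are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compose an utterance ID as
# {speaker}-{record_id}-{start}_{end}-{mic-channel}.
make_utt_id() {
  local speaker=$1 record_id=$2 start=$3 end=$4 channel=$5
  printf '%s-%s-%s_%s-%s\n' "$speaker" "$record_id" "$start" "$end" "$channel"
}

# Hypothetical helper: split an ID back into its five fields.
# Assumption: no field contains '-', and start/end are joined by a single '_'.
parse_utt_id() {
  local speaker record_id span channel
  local IFS=-
  read -r speaker record_id span channel <<< "$1"
  printf '%s %s %s %s %s\n' "$speaker" "$record_id" "${span%_*}" "${span#*_}" "$channel"
}

make_utt_id P05 S02 0004060 0004382 U01             # prints P05-S02-0004060_0004382-U01
parse_utt_id "P05-S02-0004060_0004382-U01"          # prints P05 S02 0004060 0004382 U01
```

Putting the speaker first follows the usual Kaldi convention that utterance IDs start with the speaker ID, so sorting by utterance ID also groups utterances by speaker.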

Added an evaluation script.

I also added memory figures from @boeddeker to the README.

codecov bot commented Mar 13, 2023

Codecov Report

Merging #4999 (656038f) into master (aeddd8e) will decrease coverage by 0.39%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4999      +/-   ##
==========================================
- Coverage   77.03%   76.65%   -0.39%     
==========================================
  Files         606      606              
  Lines       53755    53721      -34     
==========================================
- Hits        41409    41178     -231     
- Misses      12346    12543     +197     
Flag                      Coverage Δ
test_integration_espnet1  66.29% <ø> (ø)
test_integration_espnet2  47.96% <ø> (ø)
test_python               66.39% <ø> (-0.48%) ⬇️
test_utils                23.01% <ø> (-0.27%) ⬇️

See 21 files with indirect coverage changes.

mergify bot (Contributor) commented Mar 15, 2023

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Mar 15, 2023
either the "style" of the baseline GSS ones or the ones belonging to close-talk mics.

To evaluate the new enhanced data, e.g. `data/chime6/dev/my_enhanced`, you can run `./asr.sh --feats_type raw_copy --skip_train true --test_sets chime6/dev/enhanced`.

Collaborator

Sorry, I wrote them really roughly.

  • Actually we can't run asr.sh directly, because it requires train_set etc.
  • I think it's better to create a wrapping script for this purpose e.g. name as eval.sh to run asr.sh with required options.
  • It's enough to select one from my 1 and 2 ideas.
  • It's better to write an example to run the eval.sh

e.g.

utils/copy_data_dir.sh  data/kaldi/chime6/dev/gss data/kaldi/chime6/dev/my_enhanced
# rewrite wav.scp
./eval.sh --data_sets "kaldi/chime6/dev/my_enhanced"

I think these should be done in the next PR.
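
The wrapper idea above could look roughly like the following sketch. It only prints the `asr.sh` call instead of executing it, so it can be inspected without the recipe in place. The function name `build_eval_cmd` and the placeholder values for `--train_set` and `--valid_set` are hypothetical; the real recipe would have to supply its own set names and any further required options.

```bash
#!/usr/bin/env bash
# Hypothetical eval.sh sketch: assemble an asr.sh call for decoding only.
# TRAIN_SET / VALID_SET are placeholders, because asr.sh expects these
# options to be set even when training is skipped.
build_eval_cmd() {
  local data_sets=$1
  local train_set=${TRAIN_SET:-placeholder_train}
  local valid_set=${VALID_SET:-placeholder_valid}
  printf '%s ' ./asr.sh \
    --train_set "$train_set" \
    --valid_set "$valid_set" \
    --test_sets "$data_sets" \
    --feats_type raw_copy \
    --skip_train true
  printf '\n'
}

# Dry run: show the command that a real wrapper would execute.
build_eval_cmd "kaldi/chime6/dev/my_enhanced"
```

Keeping the fixed options in one place means a participant only has to pass the data set to evaluate.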

Contributor Author

Let me delete the last bit, then.

Collaborator

I think the diarization part has priority, so please just keep this comment in mind for now.

Contributor Author

I think that
run.sh --stage 3 --asr-tt-set kaldi/chime6/dev/gss --decode-only 1 --use-pretrained popcornell/chime7_task1_asr1_baseline --asr-dprep-stage 4

could do it actually.

Collaborator

Yeah, I think so. That's also okay, but it seems a little bit complicated.

Collaborator

Also note that run.sh does not provide an option for --feats_type raw_copy.

Collaborator

Another idea is copying the dumpdir.

utils/copy_data_dir.sh  dump/kaldi/chime6/dev/gss dump/kaldi/chime6/dev/my_enhanced
echo raw > dump/kaldi/chime6/dev/my_enhanced/feats_type
# Rewrite dump/kaldi/chime6/dev/my_enhanced/wav.scp
# --skip_data_prep true --skip_train true --test_sets kaldi/chime6/dev/my_enhanced
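
The "rewrite wav.scp" step in the snippets above is left open in the thread. One possible sketch, assuming each line is a plain `<utt_id> <path>` pair and that every enhanced file is named `<utt_id>.wav` inside one directory (both are assumptions, as is the helper name), is:

```bash
#!/usr/bin/env bash
# Hypothetical helper: point every utterance ID in a Kaldi-style wav.scp
# to "<enh_dir>/<utt_id>.wav". The file naming scheme is an assumption.
rewrite_wav_scp() {
  local wav_scp=$1 enh_dir=$2
  awk -v dir="$enh_dir" '{ print $1, dir "/" $1 ".wav" }' "$wav_scp"
}

# Self-contained demo on a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 'utt1 /orig/gss/utt1.flac' 'utt2 /orig/gss/utt2.flac' > "$tmp/wav.scp"
rewrite_wav_scp "$tmp/wav.scp" /my/enhanced > "$tmp/wav.scp.new"
cat "$tmp/wav.scp.new"
rm -rf "$tmp"
```

Writing to a new file first and moving it into place afterwards avoids truncating `wav.scp` while it is still being read.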

Contributor Author

So, assuming that there is no ffmpeg or sox piping, I could use raw_copy to avoid the copying, right?

Collaborator

> So assuming that there is no ffmpeg or sox piping I could use raw_copy to avoid the copying right ?

Yes, that's right.
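
The "no ffmpeg or sox piping" condition can be checked mechanically, since piped entries in a Kaldi `wav.scp` end with a `|` character. A small sketch (the helper name is hypothetical):

```bash
#!/usr/bin/env bash
# Succeeds if any wav.scp entry is a command pipe, i.e. the line ends in '|'.
has_piped_entries() {
  grep -q '|[[:space:]]*$' "$1"
}

# Self-contained demo with one plain and one piped wav.scp.
tmp=$(mktemp -d)
printf '%s\n' 'utt1 /data/utt1.wav' > "$tmp/plain.scp"
printf '%s\n' 'utt1 sox /data/utt1.flac -t wav - |' > "$tmp/piped.scp"
for f in plain piped; do
  if has_piped_entries "$tmp/$f.scp"; then
    echo "$f: piped entries found"
  else
    echo "$f: plain file paths only"
  fi
done
rm -rf "$tmp"
```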

@popcornell
Contributor Author

Can we merge?

@kamo-naoyuki (Collaborator) commented Mar 22, 2023

By the way, could you also put the number of utterances for each set, in its final state, in the README? In particular, I'd like to know the number of utterances in the training set, since this recipe is complicated and I wonder whether I generated everything correctly.
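
For comparing a local run against such README figures, a sketch like the one below counts utterances per data directory. It assumes Kaldi-style directories in which `utt2spk` holds one line per utterance; the helper name and the demo directory layout are hypothetical.

```bash
#!/usr/bin/env bash
# Print "<set> <number of utterances>" for every utt2spk found below a root.
count_utts() {
  local root=$1 f
  find "$root" -type f -name utt2spk | LC_ALL=C sort | while read -r f; do
    printf '%s %s\n' "$(dirname "${f#"$root"/}")" "$(awk 'END { print NR }' "$f")"
  done
}

# Self-contained demo with two tiny fake data directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/chime6/dev/gss" "$tmp/dipco/dev/gss"
printf '%s\n' 'u1 s1' 'u2 s1' 'u3 s2' > "$tmp/chime6/dev/gss/utt2spk"
printf '%s\n' 'u1 s1' > "$tmp/dipco/dev/gss/utt2spk"
count_utts "$tmp"
rm -rf "$tmp"
```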

@kamo-naoyuki (Collaborator) commented Mar 22, 2023

Please let me know when you think it's ready, and I'll merge it quickly.

@popcornell
Contributor Author

I have added the details of the number of utterances for each dataset.
I think you can merge

@kamo-naoyuki
Collaborator

Many thanks!

@kamo-naoyuki kamo-naoyuki added the auto-merge Enable auto-merge label Mar 22, 2023
@popcornell
Contributor Author

Well, thank you, actually!

@popcornell
Contributor Author

The tests fail because some link is not reachable.

@mergify mergify bot merged commit b579d70 into espnet:master Mar 23, 2023
24 checks passed
for dset in chime6 dipco; do
lhotse kaldi export -p ${manifests_root}/${dset}/${dset_part}/${dset}-${mic}_recordings_${dset_part}.jsonl.gz ${manifests_root}/${dset}/${dset_part}/${dset}-${mic}_supervisions_${dset_part}.jsonl.gz data/kaldi/${dset}/${dset_part}/${mic}
for dset in chime6 dipco mixer6; do
lhotse kaldi export ${manifests_root}/${dset}/${dset_part}/${dset}-${mic}_recordings_${dset_part}.jsonl.gz ${manifests_root}/${dset}/${dset_part}/${dset}-${mic}_supervisions_${dset_part}.jsonl.gz data/kaldi/${dset}/${dset_part}/${mic}


I had to add the --prefix-spk-id flag to lhotse kaldi export; otherwise, the created data dir was not sorted by speaker ID.
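
To illustrate the sorting issue with toy data (not output from lhotse), the sketch below checks whether the speaker column of a `utt2spk` file is in sorted order. The helper name and the example IDs are hypothetical.

```bash
#!/usr/bin/env bash
# Succeeds when the speaker column (2nd field) of a Kaldi utt2spk file is
# already in C-locale sorted order; fails otherwise.
spk_column_sorted() {
  awk '{ print $2 }' "$1" | LC_ALL=C sort -c 2>/dev/null
}

tmp=$(mktemp -d)
# Sorted by utterance ID, but the speaker column reads spkB, spkA.
printf '%s\n' 'rec1-0001 spkB' 'rec1-0002 spkA' > "$tmp/no_prefix"
# Same segments with the speaker ID prepended to each utterance ID.
printf '%s\n' 'spkA-rec1-0002 spkA' 'spkB-rec1-0001 spkB' > "$tmp/with_prefix"

spk_column_sorted "$tmp/no_prefix"   || echo "no_prefix: speakers out of order"
spk_column_sorted "$tmp/with_prefix" && echo "with_prefix: speakers in order"
rm -rf "$tmp"
```

With the speaker ID as a prefix, sorting by utterance ID and sorting by speaker agree, which is what Kaldi-style tooling generally expects.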

Labels
ASR Automatic speech recogntion auto-merge Enable auto-merge ESPnet2 Installation README Recipe