Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) #5228
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -67,7 +67,7 @@ nbpe=500 | |
asr_max_epochs=8 | ||
# put popcornell/chime7_task1_asr1_baseline if you want to test with pretrained model | ||
use_pretrained= | ||
decode_only=0 | ||
decode_only="" | ||
diar_score=0 | ||
|
||
. ./path.sh | ||
|
@@ -80,9 +80,15 @@ asr_batch_size=$(calc_int 128*$ngpu) # reduce 128 bsz if you get OOMs errors | |
asr_max_lr=$(calc_float $ngpu/10000.0) | ||
asr_warmup=$(calc_int 40000.0/$ngpu) | ||
|
||
if [ $decode_only -eq 1 ]; then | ||
if [ $decode_only == "dev" ]; then | ||
Review comment: Will you add a check if
Reply: This is nice but will require too much change in the codebase at this stage.
Reply: yeah, indeed. We may not need to update this at the moment.
||
# apply gss only on dev | ||
gss_dsets="chime6_dev,dipco_dev,mixer6_dev" | ||
asr_tt_set="kaldi/chime6/dev/gss kaldi/dipco/dev/gss/ kaldi/mixer6/dev/gss/" | ||
elif | ||
[ $decode_only == "eval" ]; then | ||
# apply gss only on dev | ||
Review comment: The comment should be updated as
Reply: updated
||
gss_dsets="chime6_eval,dipco_eval,mixer6_eval" | ||
asr_tt_set="kaldi/chime6/eval/gss kaldi/dipco/eval/gss/ kaldi/mixer6/eval/gss/" | ||
fi | ||
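The branch above compares an unquoted `$decode_only`, which defaults to an empty string, so `[` can report a missing operand when the option is not set, and any value other than `dev` or `eval` is silently ignored. A minimal sketch of how the argument could be validated, assuming a hypothetical helper named `select_decode_sets` that is not part of this PR:

```shell
# Hypothetical helper (not part of the PR): validate --decode-only and
# set the matching GSS / ASR test sets used later in the recipe.
select_decode_sets() {
  local decode_only="$1"
  case "$decode_only" in
    "")
      # Empty means "train + decode": keep the recipe defaults.
      return 0
      ;;
    dev|eval)
      gss_dsets="chime6_${decode_only},dipco_${decode_only},mixer6_${decode_only}"
      asr_tt_set="kaldi/chime6/${decode_only}/gss kaldi/dipco/${decode_only}/gss/ kaldi/mixer6/${decode_only}/gss/"
      ;;
    *)
      # Reject anything else, e.g. the old numeric value "1".
      echo "Unknown --decode-only value: '${decode_only}' (expected dev, eval, or empty)" >&2
      return 1
      ;;
  esac
}
```

Calling `select_decode_sets "$decode_only" || exit 1` right after option parsing would fail early on a stale `--decode-only 1` instead of falling through both branches.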
|
||
if [ ${stage} -le 0 ] && [ $stop_stage -ge 0 ]; then | ||
|
@@ -100,16 +106,24 @@ fi | |
if [ ${stage} -le 1 ] && [ $stop_stage -ge 1 ]; then | ||
# parse all datasets to lhotse | ||
for dset in chime6 dipco mixer6; do | ||
for dset_part in train dev; do | ||
for dset_part in "train" "dev" "eval"; do | ||
|
||
|
||
if [ $dset == dipco ] && [ $dset_part == train ]; then | ||
continue # dipco has no train set | ||
fi | ||
|
||
if [ $decode_only == 1 ] && [ $dset_part == train ]; then | ||
if [ -n "$decode_only" ] && [ $dset_part == train ]; then | ||
continue | ||
fi | ||
|
||
if [ ! -d $chime7_root/$dset/audio/$dset_part ]; then | ||
log "Skipping $dset $dset_part because it does not exist on disk. This is | ||
fine if you don't have evaluation set yet." | ||
continue | ||
fi | ||
|
||
if [ $use_chime6_falign == 1 ] && [ $dset == chime6 ]; then | ||
if [ $use_chime6_falign == 1 ] && [ $dset == chime6 ] && [ $dset_part != train ]; then | ||
if ! [ -d ./CHiME7_DASR_falign ]; then | ||
log "Getting forced alignment annotation for CHiME-6 Scenario" | ||
git clone https://github.com/chimechallenge/CHiME7_DASR_falign | ||
|
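The stage-1 skip rules in the hunk above can be read as a single predicate. A minimal sketch, assuming a hypothetical helper `should_skip_part` (not in the PR) and the `$chime7_root/$dset/audio/$dset_part` layout that the diff checks for:

```shell
# Hypothetical helper mirroring the stage-1 skip rules shown in the diff.
# Returns 0 (true) when the dataset part should be skipped.
should_skip_part() {
  local root="$1" dset="$2" dset_part="$3" decode_only="$4"
  # DiPCo has no training set.
  if [ "$dset" = "dipco" ] && [ "$dset_part" = "train" ]; then
    return 0
  fi
  # In decode-only mode (dev or eval) the training data is not parsed.
  if [ -n "$decode_only" ] && [ "$dset_part" = "train" ]; then
    return 0
  fi
  # The part is missing on disk, e.g. the evaluation set is not available yet.
  if [ ! -d "$root/$dset/audio/$dset_part" ]; then
    return 0
  fi
  return 1
}
```

Quoting every expansion keeps the tests well-formed when `decode_only` is empty, which is its default in this PR.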
@@ -153,6 +167,7 @@ if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then | |
fi | ||
|
||
log "Running Guided Source Separation for ${dset_name}/${dset_part}, results will be in ${gss_dump_root}/${dset_name}/${dset_part}" | ||
# shellcheck disable=SC2039 | ||
local/run_gss.sh --manifests-dir $manifests_root --dset-name $dset_name \ | ||
--dset-part $dset_part \ | ||
--exp-dir $gss_dump_root \ | ||
|
@@ -220,6 +235,11 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then | |
--lm_train_text "data/${asr_train_set}/text" ${pretrained_affix} | ||
fi | ||
|
||
if [ ${decode_only} == "eval" ]; then | ||
log "Scoring not available for eval set till the end of the challenge." | ||
exit | ||
fi | ||
|
||
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then | ||
# final scoring | ||
log "Scoring ASR predictions for CHiME-7 DASR challenge." | ||
|
@@ -256,7 +276,7 @@ if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then | |
# the content of this output folder is what you should send for evaluation to the | ||
# organizers. | ||
done | ||
split=dev | ||
split=dev # participants cannot evaluate eval | ||
LOG_OUT=${asr_exp}/${inference_tag}/scoring/scoring.log | ||
python local/da_wer_scoring.py -s ${asr_exp}/${inference_tag}/chime7dasr_hyp/$split \ | ||
-r $chime7_root -p $split -o ${asr_exp}/${inference_tag}/scoring -d $diar_score 2>&1 | tee $LOG_OUT | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -132,12 +132,12 @@ ln -s ../asr1/chime7_task1 . | |
|
||
#### Main Track with Pyannote-based Diarization System | ||
To reproduce our results which use our pre-trained ASR model [https://huggingface.co/popcornell/chime7_task1_asr1_baseline](https://huggingface.co/popcornell/chime7_task1_asr1_baseline) and pre-trained | ||
[pyannote segmentation model](https://huggingface.co/popcornell/pyannote-segmentation-chime6-mixer6) | ||
[pyannote segmentation model](https://huggingface.co/popcornell/pyannote-segmentation-chime6-mixer6), on the dev set, | ||
you can run: | ||
```bash | ||
./run.sh --chime7-root YOUR_PATH_TO_CHiME7_ROOT --stage 2 --ngpu YOUR_NUMBER_OF_GPUs \ | ||
--use-pretrained popcornell/chime7_task1_asr1_baseline \ | ||
--decode-only 1 --gss-max-batch-dur 30-360-DEPENDING_ON_GPU_MEM \ | ||
--decode-only dev --gss-max-batch-dur 30-360-DEPENDING_ON_GPU_MEM \ | ||
Review comment: skip_train and test_sets instead of decode-only.
||
--pyan-use-pretrained popcornell/pyannote-segmentation-chime6-mixer6 | ||
``` | ||
You can also play with diarization hyperparameters such as: | ||
|
@@ -147,16 +147,31 @@ You can also play with diarization hyperparameters such as: | |
|
||
as said merge-closer can have quite an impact on the final WER. | ||
|
||
**NOTE** | ||
We found the diarization baseline to be highly sensitive to the `diar-merge-closer` parameter and | ||
to the CUDA/CUDNN version used. <br> | ||
For example, the best results on our side were obtained with `conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia`. | ||
This however was by using Ampere devices (A100) on our side, and the results might | ||
change for you if your machine is different. <br> | ||
See [this Pyannote issue](https://github.com/pyannote/pyannote-audio/issues/1370) related to replicability of the diarization baseline, where we have | ||
reported the full specs of our system and the conda environment used. <br> | ||
|
||
To enhance replicability, we provide in this [repository](https://github.com/popcornell/CHiME7DASRDiarizationBaselineJSONs) our pre-computed outputs | ||
for the diarization baseline. | ||
You can use them in this recipe by passing `--download-baseline-diarization 1`; this | ||
will skip your "local" diarization baseline and instead download our predictions directly. | ||
|
||
--- | ||
If you want to run this recipe from scratch, **including dataset generation** and pyannote segmentation | ||
model fine-tuning you can run it from stage 0: | ||
model fine-tuning you can run it from stage 0 (use `--decode-only eval` for evaluation set): | ||
```bash | ||
./run.sh --chime6-root YOUR_PATH_TO_CHiME6 --dipco-root PATH_WHERE_DOWNLOAD_DIPCO \ | ||
--mixer6-root YOUR_PATH_TO_MIXER6 --stage 0 --ngpu YOUR_NUMBER_OF_GPUs \ | ||
--use-pretrained popcornell/chime7_task1_asr1_baseline \ | ||
--decode-only 1 --gss-max-batch-dur 30-360-DEPENDING_ON_GPU_MEM \ | ||
--decode-only dev --gss-max-batch-dur 30-360-DEPENDING_ON_GPU_MEM \ | ||
--pyan-use-pretrained popcornell/pyannote-segmentation-chime6-mixer6 | ||
``` | ||
|
||
--- | ||
**If you want only to generate data you can run only stage 0.** | ||
```bash | ||
|
Review comment: we can use ${skip_train} and ${test_sets} instead of decode_only.
Reply: As before, we cannot extensively change the recipe at this point. I added the check for the argument.
Note that this would also require changing both README.md files and notifying participants.