CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs #5183
Codecov Report
@@ Coverage Diff @@
## master #5183 +/- ##
==========================================
+ Coverage 65.94% 74.54% +8.60%
==========================================
Files 640 640
Lines 57267 57267
==========================================
+ Hits 37762 42688 +4926
+ Misses 19505 14579 -4926
Flags with carried forward coverage won't be shown. 107 files have indirect coverage changes.
I think we can ignore this issue, because the default value of [...]. But some description should probably be added to explain when this argument should be set to 1.
…1_diar # Conflicts: # egs2/chime7_task1/asr1/run.sh
for more information, see https://pre-commit.ci
FYI, I changed the scope of this PR to be broader.
I left some comments in the code on parts that may cause issues, but I'm not 100% sure myself.
egs2/chime7_task1/diar_asr1/run.sh
Outdated
@@ -51,7 +52,7 @@ gss_max_batch_dur=90

# ASR config
use_pretrained=
-decode_only=1
+decode_only=
When it is empty, I'm afraid line 182 will raise an error.
--decode_only $decode_only --gss-max-batch-dur $gss_max_batch_dur \
I should add quotes, good catch
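A minimal sketch of why the quotes matter here. `count_args` is a hypothetical stand-in for the script being called, not part of the recipe; it only reports how many arguments it received. With `decode_only` empty, the unquoted expansion disappears entirely, so an option parser would likely read the next flag as the value of `--decode_only`.

```shell
# count_args is a hypothetical stand-in for the called script; it only
# reports how many arguments it received.
count_args() {
    echo "$#"
}

decode_only=
gss_max_batch_dur=90

# Unquoted: the empty expansion vanishes, leaving 3 arguments
# (--decode_only, --gss-max-batch-dur, 90).
unquoted=$(count_args --decode_only $decode_only --gss-max-batch-dur $gss_max_batch_dur)

# Quoted: the empty string survives as its own argument, leaving 4.
quoted=$(count_args --decode_only "$decode_only" --gss-max-batch-dur "$gss_max_batch_dur")

echo "unquoted: $unquoted, quoted: $quoted"
```

Whether the called script then accepts an empty value for the flag is a separate question that depends on its option parser.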
egs2/chime7_task1/diar_asr1/run.sh
Outdated
@@ -166,5 +181,5 @@ if [ ${stage} -le 4 ] && [ $stop_stage -ge 4 ]; then
--use-pretrained $use_pretrained \
--decode_only $decode_only --gss-max-batch-dur $gss_max_batch_dur \
--gss-iterations 5 \
-    --diar-score 1
+    --diar-score 1 \
Based on my previous experience, this extra \ will raise an error. You need an extra blank line.
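The effect can be reproduced with a small sketch. It assumes the line following the backslash is the closing `fi` of the stage block, which the excerpt above does not show; the two fragments are hypothetical and use `echo` in place of the real command.

```shell
# Two hypothetical script fragments that differ only in the line after the
# trailing backslash. Inside single quotes the backslash is kept literally.
bad='if true; then
    echo --diar-score 1 \
fi'

good='if true; then
    echo --diar-score 1 \

fi'

# Each fragment runs in a child shell. In `bad`, the backslash joins "fi"
# onto the echo line as an ordinary argument, so the if block is never
# closed and the child shell fails with a syntax error.
if sh -c "$bad" >/dev/null 2>&1; then bad_status=ok; else bad_status=syntax-error; fi
if sh -c "$good" >/dev/null 2>&1; then good_status=ok; else good_status=syntax-error; fi

echo "without blank line: $bad_status, with blank line: $good_status"
```

Dropping the trailing backslash from the last argument line avoids the problem without relying on a blank line.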
egs2/chime7_task1/diar_asr1/run.sh
Outdated
log "Using organizer-provided JSON manifests from the baseline diarization system."
#git clone
#mv
stage=4
Just want to double check if we can skip stage 3 if we have this. How can we download baseline diarization results?
Yeah, you are right, I changed it.
I have now also added the downloading part.
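For context, a rough sketch of how the `stage` / `stop_stage` test from the hunk above gates each block, and why setting `stage=4` skips stage 3. `run_stages` is a hypothetical helper written for this illustration, and the total of four stages is an assumption; the real recipe may define more.

```shell
# run_stages is a hypothetical helper that lists which stages would run for
# a given stage/stop_stage pair, using the same test as the recipe.
# The body is a subshell ( ... ) so its variables do not leak to the caller.
run_stages() (
    stage=$1
    stop_stage=$2
    ran=""
    for s in 1 2 3 4; do
        if [ "${stage}" -le "$s" ] && [ "${stop_stage}" -ge "$s" ]; then
            ran="${ran}${s} "
        fi
    done
    # Strip the trailing space before printing.
    echo "${ran% }"
)

run_stages 1 4   # every stage runs
run_stages 4 4   # only stage 4 runs, so stage 3 is skipped
```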
@@ -147,14 +147,17 @@ You can also play with diarization hyperparameters such as:

as said merge-closer can have quite an impact on the final WER.

**NOTE**
We found the diarization baseline to be highly sensitive to the
Not a complete note?
completed
egs2/chime7_task1/asr1/run.sh
Outdated
@@ -67,7 +67,7 @@ nbpe=500
asr_max_epochs=8
# put popcornell/chime7_task1_asr1_baseline if you want to test with pretrained model
use_pretrained=
-decode_only=0
+decode_only=
I'm concerned that when it is empty, line 206, data_opts+=" --decode-only $decode_only", will raise some error, e.g. Error: No positional arguments are required.
now it is an empty string.
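One way to avoid a dangling flag, sketched under assumptions: `build_data_opts` is a hypothetical helper that mirrors the `data_opts` line quoted above and appends the flag only when `decode_only` has a value. It is not necessarily what this PR ended up doing.

```shell
# build_data_opts is a hypothetical helper mirroring the data_opts line
# quoted in the review; it adds --decode-only only when a value is set.
# The body is a subshell ( ... ) so its variables do not leak to the caller.
build_data_opts() (
    decode_only=$1
    data_opts=""
    if [ -n "${decode_only}" ]; then
        data_opts="${data_opts} --decode-only ${decode_only}"
    fi
    echo "${data_opts}"
)

build_data_opts ""   # prints an empty line: no flag without a value
build_data_opts 1    # prints " --decode-only 1"
```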
I think this PR is ready.
FYI: there is still the numpy problem:
@simpleoier Can we merge this?
LGTM!
But there are many changes in unrelated files, probably due to the black version. Also, @popcornell, there is one failing test, check_kaldi_symlinks. Can you fix it?
@sw005320 Can we merge this?
I can change them back.
@popcornell my black is
I guess then it is fine?! The black version is not enforced explicitly.
The |
Yes, this part is not an issue. If they are necessary changes, it would be better to move this file to the local (recipe-specific) directory. The issue is that they look very different due to the different formats, and I could not figure out the above changes.
Chime7task1 diar eval
Made a fresh PR because it is faster. Moved to #5228.
@Emrys365 does this fix it?