Support SOT training on LibriMix data. #4861
Conversation
Thanks @pengchengguo !
@@ -0,0 +1,90 @@
# network architecture
The PIT model may use a different data preparation than SOT, so it is a bit surprising to find this config here, especially since this recipe folder is under `sot_asr1`. I don't know. Maybe we can add some explicit notes about this config, e.g. which data preparation it needs, what its purpose is, how it differs, and how to use it.
The motivation for including PIT is to compare PIT and SOT on the same dataset (overlapped speech with a certain time delay). I am fine with removing it and keeping only the SOT config here~
frontend_conf:
    frontend_conf:
        upstream: wavlm_local
        path_or_url: "/home/work_nfs6/pcguo/asr/librimix/hub/wavlm_large.pt"
This path is for your local experiments.
Fixed.
frontend_conf:
    frontend_conf:
        upstream: wavlm_local
        path_or_url: "/home/work_nfs6/pcguo/asr/librimix/hub/wavlm_large.pt"
This path in the config is specific to your own setup.
Fixed.
This pull request is now in conflict :(
Codecov Report
@@ Coverage Diff @@
## master #4861 +/- ##
=======================================
Coverage 75.86% 75.86%
=======================================
Files 615 615
Lines 54684 54689 +5
=======================================
+ Hits 41487 41491 +4
- Misses 13197 13198 +1
- Does it have to be `sot_asr1`? Since it just changes the config, I think we can move this to `asr1`, rename `local/data.sh` to `local/data_sot.sh`, and call it from the original `local/data.sh`. If this conflicts with the existing data directory, your current idea is better.
- It would be better to add some descriptions to README.md, for example: why we need the CSV files, what the SOT text format is, and how `<sc>` symbols are treated in the tokenizer through the `--add_nonsplit_symbol` option.
egs2/TEMPLATE/asr1/asr.sh (Outdated)
@@ -766,6 +772,9 @@ if ! "${skip_data_prep}"; then
            _opts="--non_linguistic_symbols ${nlsyms_txt}"
            if ${sot_asr} && [ "${token_type}" = char ]; then
It is better to briefly explain the SOT-based text format with an example here
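For reference, the SOT label format from the paper linked in this PR serializes the transcripts of all speakers in order of their start times, joined by the speaker-change token `<sc>`. A minimal sketch (function and variable names here are illustrative, not ESPnet's actual implementation):

```python
def make_sot_text(utterances):
    """Build a serialized-output-training (SOT) reference transcript.

    utterances: list of (start_time_sec, transcript) tuples, one per speaker.
    Transcripts are sorted by start time and joined with the <sc> token.
    """
    ordered = sorted(utterances, key=lambda u: u[0])
    return " <sc> ".join(text for _, text in ordered)

print(make_sot_text([(1.2, "how are you"), (0.0, "hello world")]))
# → hello world <sc> how are you
```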
only char case?
Then, again, it's better to explain it.
- The main difference between `sot_asr1` and `asr1` is the data simulation process and the training data. If we merged these two directories, we may need to introduce additional files (e.g. `data/{train_sot,dev_sot,test_sot}`, `run_sot.sh`). I think it is ok to add some files, but the results in `README.md` could not be compared with each other (due to the different training data: fully overlapped vs. partially overlapped).
- I will add more descriptions about SOT.
- "Only char case?": in our experiments, SOT only converges well on char units, so we only considered this case. I will check compatibility with other units like BPE.
OK, thanks.
only char case?

When we prepare the text, `<sc>` is inserted between speakers, with a space on each side.

- For the word case, `<sc>` is automatically regarded as a single word unit, so no special processing is needed.
- For the BPE case, we use `_opts_spm+=" --user_defined_symbols=<sc>"` to take care of it, so it will not be mixed into other BPE subwords.
- For the char case, we need to add this special argument so that the text processing knows it is a single unit and does not split it into `<`, `s`, `c`, `>`.
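To make the char case concrete, here is a small sketch of a character tokenizer that keeps designated non-split symbols atomic. This is an illustrative simplification of the `--add_nonsplit_symbol` behavior, not ESPnet's actual code:

```python
import re

def char_tokenize(text, nonsplit_symbols=("<sc>",)):
    """Split text into characters while keeping non-split symbols atomic.

    Whitespace is mapped to the conventional <space> token; any symbol in
    nonsplit_symbols (e.g. the speaker-change token <sc>) is emitted whole.
    """
    pattern = "|".join(re.escape(s) for s in nonsplit_symbols)
    tokens = []
    # Split on the symbols while keeping them via the capture group.
    for piece in re.split(f"({pattern})", text):
        if piece in nonsplit_symbols:
            tokens.append(piece)
        else:
            tokens.extend("<space>" if ch == " " else ch for ch in piece)
    return tokens

print(char_tokenize("a <sc> b"))
# → ['a', '<space>', '<sc>', '<space>', 'b']
```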
@@ -0,0 +1,154 @@
842M  Libri2Mix/wav16k/max/dev/s2
where is it used?
I will remove it.
@@ -0,0 +1,3001 @@
mixture_ID,source_1_path,source_1_gain,source_2_path,source_2_gain,noise_path,noise_gain,offset
Can you explain why this is needed? Is it to avoid the full overlap in the original LibriMix?
Yes, the updated CSV file includes an additional "offset" column, which is used as the time delay when simulating overlapped data. Here we randomly generate offsets from 1 s to 1.5 s.
The motivation for fixing the offsets in the CSV is to allow people to do a fair comparison among different models on the same simulated data.
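A minimal sketch of how the extra "offset" column could be consumed when simulating a mixture: source 2 is delayed by the offset before the gains are applied and the signals are summed. The CSV row and the dummy signals below are illustrative, not real LibriMix data:

```python
import csv
import io

# Hypothetical row in the extended LibriMix metadata format; the trailing
# "offset" column (seconds) delays source 2 relative to source 1.
CSV_TEXT = """\
mixture_ID,source_1_path,source_1_gain,source_2_path,source_2_gain,noise_path,noise_gain,offset
mix_0001,s1/a.wav,1.0,s2/b.wav,0.8,noise/n.wav,0.1,1.25
"""

SAMPLE_RATE = 16000

def mix_with_offset(src1, src2, gain1, gain2, offset_sec, sr=SAMPLE_RATE):
    """Mix two sample lists, delaying src2 by offset_sec (partial overlap)."""
    delay = int(round(offset_sec * sr))
    length = max(len(src1), delay + len(src2))
    out = [0.0] * length
    for i, s in enumerate(src1):
        out[i] += gain1 * s
    for i, s in enumerate(src2):
        out[delay + i] += gain2 * s
    return out

row = next(csv.DictReader(io.StringIO(CSV_TEXT)))
# Tiny constant signals stand in for the real wav files.
mixture = mix_with_offset([1.0] * 8, [1.0] * 8,
                          float(row["source_1_gain"]),
                          float(row["source_2_gain"]),
                          float(row["offset"]))
print(len(mixture))  # 1.25 s delay at 16 kHz → 20000 + 8 = 20008 samples
```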
This pull request is now in conflict :(
@pengchengguo, can we try to finish this PR?
Sorry for the delay, will finish it this weekend.
There is a conflict due to the recent refactoring of
Sure, I am checking it now.
where was it used?
Removed
LGTM, thanks a lot!
Sorry, it seems the change (remove useless
OK!
Xuankai and I would like to include SOT training (https://arxiv.org/pdf/2003.12687.pdf) in ESPnet.
We modify the CSV files of the original LibriMix repo by adding random overlap offsets of 1–1.5 s. This PR provides a unified simulation setup and a comparable benchmark for non-fully-overlapped multi-speaker ASR.
Note that it is not ready for merging yet. We will add results and pre-trained models later.
TODO: