
Fixes + Channel Selection for CHiME-7 Task #4934

Merged
merged 39 commits into espnet:master on Feb 14, 2023

Conversation

popcornell (Contributor)

No description provided.

from torch.utils.data import DataLoader, Dataset


class EnvelopeVariance(torch.nn.Module):
Contributor Author:

Channel selection is based on envelope variance right now.
It is not guaranteed to work well because of overlapped speech.
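For context, here is a minimal sketch of what envelope-variance (EV) channel ranking can look like. It is only an illustration, not the recipe's actual EnvelopeVariance module: the mel filterbank settings, compression, and normalization below are assumptions and may differ from the code under review.

```python
# Minimal sketch of envelope-variance (EV) channel ranking (illustrative only;
# the settings here are assumptions, not the recipe's EnvelopeVariance module).
import torch
import torchaudio


def envelope_variance_scores(wavs: torch.Tensor, fs: int = 16000) -> torch.Tensor:
    """wavs: (channels, samples) -> one score per channel (higher = presumably better)."""
    # Sub-band energy envelopes from a mel spectrogram.
    melspec = torchaudio.transforms.MelSpectrogram(
        sample_rate=fs, n_fft=400, hop_length=200, n_mels=40
    )(wavs)  # (channels, n_mels, frames)
    env = melspec.clamp(min=1e-8) ** (1.0 / 3.0)  # cube-root compression
    # Normalize each sub-band envelope by its mean over time, then average the
    # per-band temporal variances to get one score per channel.
    env = env / env.mean(dim=-1, keepdim=True)
    return env.var(dim=-1).mean(dim=-1)  # (channels,)


# e.g. keep the top 80% of channels by EV score before GSS:
# scores = envelope_variance_scores(multichannel_audio)
# keep = torch.topk(scores, max(1, int(0.8 * scores.numel()))).indices
```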

fi

sox_conda=`command -v ../../../tools/venv/bin/sox 2>/dev/null`
sox_conda=`command -v $(dirname $(which python))/sox 2>/dev/null`
Contributor Author:

hopefully this fixes the sox issue.

Collaborator:

FYI, conda sets up some useful shell environment variables: CONDA_PREFIX, CONDA_EXE, etc.

If sox was installed by conda, the path should be ${CONDA_PREFIX}/bin/sox.

Contributor Author:

Many thanks, I was not aware of CONDA_PREFIX. It seems much better to use that; it is cleaner.

Contributor Author:

@kamo-naoyuki I followed your suggestion and added a script + JSON file to check the MD5 checksum of each file: https://github.com/espnet/espnet/blob/cfbb957d9c71c5c7aed27a1d4b2b85b62721381a/egs2/chime7_task1/asr1/local/check_data_gen.py. However, I also had to add a .json file to this recipe. Is that ok?
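For readers unfamiliar with the approach, a check of this kind boils down to hashing each generated file and comparing it against a reference JSON. The sketch below is hypothetical (the function names and JSON layout are assumptions), not the actual local/check_data_gen.py.

```python
# Hypothetical sketch of an MD5 data check (not the actual local/check_data_gen.py);
# it assumes a JSON file mapping relative paths to reference MD5 digests.
import hashlib
import json
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_data(data_root: str, checksum_json: str) -> bool:
    expected = json.loads(Path(checksum_json).read_text())  # {rel_path: md5}
    ok = True
    for rel_path, ref_md5 in expected.items():
        if md5_of(Path(data_root) / rel_path) != ref_md5:
            print(
                f"MD5 checksum for {rel_path} is not the same. "
                "Data has not been generated correctly."
            )
            ok = False
    return ok
```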

codecov bot commented Feb 12, 2023

Codecov Report

Merging #4934 (5179f7a) into master (34d6117) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4934   +/-   ##
=======================================
  Coverage   76.56%   76.56%           
=======================================
  Files         603      603           
  Lines       53756    53756           
=======================================
  Hits        41158    41158           
  Misses      12598    12598           
| Flag | Coverage Δ |
|------|------------|
| test_integration_espnet1 | 66.33% <ø> (ø) |
| test_integration_espnet2 | 47.58% <ø> (ø) |
| test_python | 66.44% <ø> (ø) |
| test_utils | 23.35% <ø> (ø) |

Flags with carried forward coverage won't be shown.


popcornell mentioned this pull request Feb 12, 2023
popcornell marked this pull request as ready for review Feb 13, 2023, 17:36
popcornell (Contributor Author) commented Feb 13, 2023

This is ready for review.
Automatic channel selection now further improves the results (I keep the top 80% of channels on each dataset).
See also the README: https://github.com/espnet/espnet/blob/cfbb957d9c71c5c7aed27a1d4b2b85b62721381a/egs2/chime7_task1/asr1/README.md

| Dataset | split | front-end | SA-WER (%) | macro SA-WER (%) |
|---------|-------|-----------|------------|------------------|
| CHiME-6 | dev   | GSS (EV top 80%) | 34.5 | 30.8 |
| DiPCo   |       | GSS (EV top 80%) | 36.8 |      |
| Mixer-6 |       | GSS (EV top 80%) | 21.2 |      |

vs. the classic way of using all outer mics on CHiME-6:

| Dataset | split | front-end | SA-WER (%) | macro SA-WER (%) |
|---------|-------|-----------|------------|------------------|
| CHiME-6 | dev   | GSS (all outer) | 35.5 | 31.8 |
| DiPCo   |       | GSS (all outer) | 39.3 |      |
| Mixer-6 |       | GSS (all)       | 20.6 |      |

@@ -139,12 +139,9 @@ def get_feats(self, data, ref_len=None):
x = torch.from_numpy(x).float().to(self.device)
x = x.view(1, -1)

feat = self.model.wav2vec2.extract_features(
Contributor Author:

I had to apply black here otherwise tests were failing.

@@ -12,6 +12,8 @@ def make_history_mask(xp, block):
"""
batch, length = block.shape
arange = xp.arange(length)
history_mask = (arange[None] <= arange[:, None])[None,]
Contributor Author:

ditto

popcornell (Contributor Author):

Some tests are failing because of a broken download.

popcornell (Contributor Author):

@sw005320 @simpleoier can we merge this PR?

sw005320 added the auto-merge (Enable auto-merge) label Feb 14, 2023
sw005320 (Contributor):

I quickly scanned the changes, and they look good to me.
After the CI check, I'll merge it.
I'll try to find time to check more details later.

popcornell (Contributor Author):

If somebody could test it, that would be great.
I did it on my side, but there is a lot of stuff going on, and some issues will surely be encountered.
Thanks to @kamo-naoyuki for the help and to @YoshikiMas for testing.

mergify bot merged commit 5d4615f into espnet:master Feb 14, 2023
kamo-naoyuki (Collaborator):

I checked local/check_data_gen.py, but I got

MD5 Checksum for mixer6/transcriptions/train_call/20090717_104045_LDC_120312.json is not the same. Data has not been generated correctly.You can retry to generate it or re-download it.If this does not work, please reach us. 
MD5 Checksum for mixer6/transcriptions/train_call/20090805_100942_LDC_120221.json is not the same. Data has not been generated correctly.You can retry to generate it or re-download it.If this does not work, please reach us. 
MD5 Checksum for mixer6/transcriptions/train_call/20090810_160537_LDC_120235.json is not the same. Data has not been generated correctly.You can retry to generate it or re-download it.If this does not work, please reach us. 
...

(Maybe all JSON files under train_call fail.) I think some problems still exist in this recipe.

kamo-naoyuki (Collaborator):

For a quick check, I'll paste the head of my JSON file here: mixer6/transcriptions/train_call/20090921_130118_LDC_120501.json

[
    {
        "start_time": "1956.580",
        "end_time": "1957.290",
        "words": "hi",
        "speaker": "120501"
    },
    {
        "start_time": "1957.720",
        "end_time": "1959.310",
        "words": "i can\u2019t really hear",
        "speaker": "120501"
    },
    {
        "start_time": "1960.420",
        "end_time": "1962.030",
        "words": "lena",
        "speaker": "120501"
    },

I'm suspicious about the \u2019. Is it correct?

popcornell (Contributor Author) commented Feb 14, 2023

For sure. @kamo-naoyuki, can you give me your path to train_call relative to your root folder as provided by LDC?

\u2019 is correct if it appears in the original data as provided by LDC. However, it should disappear after text normalization, as this is applied:

jiwer_chime6_scoring = jiwer.Compose(
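As an illustration of the kind of jiwer normalization referred to above (the transforms below are assumptions, not necessarily the exact ones in the CHiME-7 scoring code), a curly apostrophe can be mapped away like this:

```python
# Illustrative jiwer normalization (the actual CHiME-7 transforms may differ):
# curly apostrophes are mapped to ASCII before scoring.
import jiwer

scoring_norm = jiwer.Compose(
    [
        jiwer.SubstituteRegexes({"\u2019": "'"}),  # map ’ to '
        jiwer.ToLowerCase(),
        jiwer.RemoveMultipleSpaces(),
        jiwer.Strip(),
    ]
)

print(scoring_norm("i can\u2019t really hear"))  # -> "i can't really hear"
```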

Do you still have it after running the data generation step?
EDIT: it should disappear only in the transcriptions used for scoring. Mine is:

[
    {
        "start_time": "1956.580",
        "end_time": "1957.290",
        "words": "hi",
        "speaker": "120501"
    },
    {
        "start_time": "1957.720",
        "end_time": "1959.310",
        "words": "i can\u2019t really hear",
        "speaker": "120501"
    },
    {
        "start_time": "1960.420",
        "end_time": "1962.030",
        "words": "lena",
        "speaker": "120501"
    },

kamo-naoyuki (Collaborator):

> For sure. @kamo-naoyuki, can you give me your path to train_call relative to your root folder as provided by LDC?

splits/train_call

popcornell (Contributor Author):

I have the same now, but this does not really explain why the MD5 checksum differs.
There must be some difference in the JSON at this point. Can you send it to me on Slack so I can compare?

popcornell (Contributor Author):

UPDATE: the annotation I have differs from the one LDC gave to participants (in a non-significant way, but it changes the hash).
I will recompute the hashes with the LDC annotation; that is what I can do.
Also, the check_data script should return an error rather than just printing.
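A possible way to do that (illustrative only, not the current script, and the helper name below is hypothetical): fail hard on a mismatch so the recipe stops instead of only printing a warning.

```python
# Illustrative only (not the current script): raise on a checksum mismatch
# so the recipe aborts instead of just printing a message.
import hashlib
from pathlib import Path


def verify_or_fail(path: Path, ref_md5: str) -> None:
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    if md5 != ref_md5:
        raise RuntimeError(
            f"MD5 checksum for {path} is not the same. Data has not been "
            "generated correctly; retry the generation or re-download it."
        )
```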

popcornell (Contributor Author):

@kamo-naoyuki Did the check fail for other stuff too?

[gpu_gss]:
[gss]:

We would like to thank Dr. Naoyuki Kamo for his precious help.
Collaborator:

Thank you for mentioning me, but sorry, I don't have a Ph.D. :->

Contributor Author:

Oh I am sorry. I can remove it. Honestly you deserve an honorary one ;)

Labels: ASR (Automatic speech recognition), auto-merge (Enable auto-merge), ESPnet1, ESPnet2, README, Recipe