
Fixes + Channel Selection for CHiME-7 Task #4934

Merged
merged 39 commits into espnet:master on Feb 14, 2023

Conversation

popcornell (Contributor)

No description provided.

from torch.utils.data import DataLoader, Dataset


class EnvelopeVariance(torch.nn.Module):
Contributor Author:

Channel selection is based on envelope variance right now.
It is not guaranteed to work well because of overlapped speech.
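For context, here is a minimal sketch of what envelope-variance (EV) channel ranking can look like. It is only an illustration, not the recipe's actual EnvelopeVariance module: the mel filterbank settings, compression, and normalization below are assumptions and may differ from the code under review.

```python
# Minimal sketch of envelope-variance (EV) channel ranking (illustrative only;
# the settings here are assumptions, not the recipe's EnvelopeVariance module).
import torch
import torchaudio


def envelope_variance_scores(wavs: torch.Tensor, fs: int = 16000) -> torch.Tensor:
    """wavs: (channels, samples) -> one score per channel (higher = presumably better)."""
    # Sub-band energy envelopes from a mel spectrogram.
    melspec = torchaudio.transforms.MelSpectrogram(
        sample_rate=fs, n_fft=400, hop_length=200, n_mels=40
    )(wavs)  # (channels, n_mels, frames)
    env = melspec.clamp(min=1e-8) ** (1.0 / 3.0)  # cube-root compression
    # Normalize each sub-band envelope by its mean over time, then average the
    # per-band temporal variances to get one score per channel.
    env = env / env.mean(dim=-1, keepdim=True)
    return env.var(dim=-1).mean(dim=-1)  # (channels,)


# e.g. keep the top 80% of channels by EV score before GSS:
# scores = envelope_variance_scores(multichannel_audio)
# keep = torch.topk(scores, max(1, int(0.8 * scores.numel()))).indices
```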

fi

sox_conda=`command -v ../../../tools/venv/bin/sox 2>/dev/null`
sox_conda=`command -v $(dirname $(which python))/sox 2>/dev/null`
Contributor Author:

hopefully this fixes the sox issue.

Collaborator:

FYI, conda sets up some useful shell environment variables: CONDA_PREFIX, CONDA_EXE, etc.

If sox was installed by conda, the path should be ${CONDA_PREFIX}/bin/sox.

Contributor Author:

Many thanks, I was not aware of CONDA_PREFIX. It seems much better to use that; it is cleaner.

Contributor Author:

@kamo-naoyuki I followed your suggestion and added a script + JSON file to check the MD5 checksum of each file: https://github.com/espnet/espnet/blob/cfbb957d9c71c5c7aed27a1d4b2b85b62721381a/egs2/chime7_task1/asr1/local/check_data_gen.py. However, I also had to add a .json file to this recipe. Is that ok?
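For readers unfamiliar with the approach, a check of this kind boils down to hashing each generated file and comparing it against a reference JSON. The sketch below is hypothetical (the function names and JSON layout are assumptions), not the actual local/check_data_gen.py.

```python
# Hypothetical sketch of an MD5 data check (not the actual local/check_data_gen.py);
# it assumes a JSON file mapping relative paths to reference MD5 digests.
import hashlib
import json
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_data(data_root: str, checksum_json: str) -> bool:
    expected = json.loads(Path(checksum_json).read_text())  # {rel_path: md5}
    ok = True
    for rel_path, ref_md5 in expected.items():
        if md5_of(Path(data_root) / rel_path) != ref_md5:
            print(
                f"MD5 checksum for {rel_path} is not the same. "
                "Data has not been generated correctly."
            )
            ok = False
    return ok
```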

codecov bot commented Feb 12, 2023

Codecov Report

Merging #4934 (5179f7a) into master (34d6117) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4934   +/-   ##
=======================================
  Coverage   76.56%   76.56%           
=======================================
  Files         603      603           
  Lines       53756    53756           
=======================================
  Hits        41158    41158           
  Misses      12598    12598           
| Flag | Coverage Δ |
|------|------------|
| test_integration_espnet1 | 66.33% <ø> (ø) |
| test_integration_espnet2 | 47.58% <ø> (ø) |
| test_python | 66.44% <ø> (ø) |
| test_utils | 23.35% <ø> (ø) |

Flags with carried forward coverage won't be shown.


popcornell mentioned this pull request Feb 12, 2023
popcornell marked this pull request as ready for review Feb 13, 2023, 17:36
popcornell (Contributor Author) commented Feb 13, 2023

This is ready for review.
Automatic channel selection now further improves the results (I keep the top 80% of channels on each dataset).
See also the README: https://github.com/espnet/espnet/blob/cfbb957d9c71c5c7aed27a1d4b2b85b62721381a/egs2/chime7_task1/asr1/README.md

| Dataset | split | front-end | SA-WER (%) | macro SA-WER (%) |
|---------|-------|-----------|------------|------------------|
| CHiME-6 | dev   | GSS (EV top 80%) | 34.5 | 30.8 |
| DiPCo   |       | GSS (EV top 80%) | 36.8 |      |
| Mixer-6 |       | GSS (EV top 80%) | 21.2 |      |

vs. the classic way of using all outer mics on CHiME-6:

| Dataset | split | front-end | SA-WER (%) | macro SA-WER (%) |
|---------|-------|-----------|------------|------------------|
| CHiME-6 | dev   | GSS (all outer) | 35.5 | 31.8 |
| DiPCo   |       | GSS (all outer) | 39.3 |      |
| Mixer-6 |       | GSS (all)       | 20.6 |      |

@@ -139,12 +139,9 @@ def get_feats(self, data, ref_len=None):
x = torch.from_numpy(x).float().to(self.device)
x = x.view(1, -1)

feat = self.model.wav2vec2.extract_features(
Contributor Author:

I had to apply black here otherwise tests were failing.

@@ -12,6 +12,8 @@ def make_history_mask(xp, block):
"""
batch, length = block.shape
arange = xp.arange(length)
history_mask = (arange[None] <= arange[:, None])[None,]
Contributor Author:

ditto

popcornell (Contributor Author):

Some tests are failing because of a broken download.

popcornell (Contributor Author):

@sw005320 @simpleoier can we merge this PR?

sw005320 added the auto-merge (Enable auto-merge) label Feb 14, 2023
sw005320 (Contributor):

I quickly scanned the changes, and they look good to me.
After the CI check, I'll merge it.
I'll try to find time to check more details later.

popcornell (Contributor Author):

If somebody could test it, that would be great.
I did it on my side, but there is a lot of stuff going on, and some issues will surely be encountered.
Thanks to @kamo-naoyuki for the help and to @YoshikiMas for testing.

mergify bot merged commit 5d4615f into espnet:master Feb 14, 2023
kamo-naoyuki (Collaborator):

I checked local/check_data_gen.py, but I got

MD5 Checksum for mixer6/transcriptions/train_call/20090717_104045_LDC_120312.json is not the same. Data has not been generated correctly.You can retry to generate it or re-download it.If this does not work, please reach us. 
MD5 Checksum for mixer6/transcriptions/train_call/20090805_100942_LDC_120221.json is not the same. Data has not been generated correctly.You can retry to generate it or re-download it.If this does not work, please reach us. 
MD5 Checksum for mixer6/transcriptions/train_call/20090810_160537_LDC_120235.json is not the same. Data has not been generated correctly.You can retry to generate it or re-download it.If this does not work, please reach us. 
...

(Maybe all JSON files under train_call fail.) I think some problems still exist in this recipe.

kamo-naoyuki (Collaborator):

For a quick check, I'll paste the head of my JSON file here: mixer6/transcriptions/train_call/20090921_130118_LDC_120501.json

[
    {
        "start_time": "1956.580",
        "end_time": "1957.290",
        "words": "hi",
        "speaker": "120501"
    },
    {
        "start_time": "1957.720",
        "end_time": "1959.310",
        "words": "i can\u2019t really hear",
        "speaker": "120501"
    },
    {
        "start_time": "1960.420",
        "end_time": "1962.030",
        "words": "lena",
        "speaker": "120501"
    },

I'm suspicious about the \u2019. Is it correct?

popcornell (Contributor Author) commented Feb 14, 2023

For sure. @kamo-naoyuki, can you give me your path to train_call relative to your root folder as provided by LDC?

\u2019 is correct if it appears in the original data as provided by LDC. However, it should disappear after text normalization, as this is applied:

jiwer_chime6_scoring = jiwer.Compose(
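As an illustration of the kind of jiwer normalization referred to above (the transforms below are assumptions, not necessarily the exact ones in the CHiME-7 scoring code), a curly apostrophe can be mapped away like this:

```python
# Illustrative jiwer normalization (the actual CHiME-7 transforms may differ):
# curly apostrophes are mapped to ASCII before scoring.
import jiwer

scoring_norm = jiwer.Compose(
    [
        jiwer.SubstituteRegexes({"\u2019": "'"}),  # map ’ to '
        jiwer.ToLowerCase(),
        jiwer.RemoveMultipleSpaces(),
        jiwer.Strip(),
    ]
)

print(scoring_norm("i can\u2019t really hear"))  # -> "i can't really hear"
```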

Do you still have it after running the data generation step?
EDIT: it should disappear only in the transcriptions used for scoring. Mine is:

[
    {
        "start_time": "1956.580",
        "end_time": "1957.290",
        "words": "hi",
        "speaker": "120501"
    },
    {
        "start_time": "1957.720",
        "end_time": "1959.310",
        "words": "i can\u2019t really hear",
        "speaker": "120501"
    },
    {
        "start_time": "1960.420",
        "end_time": "1962.030",
        "words": "lena",
        "speaker": "120501"
    },

kamo-naoyuki (Collaborator):

> For sure. @kamo-naoyuki, can you give me your path to train_call relative to your root folder as provided by LDC?

splits/train_call

popcornell (Contributor Author):

I have the same now, but this does not really explain why the MD5 checksum differs.
There must be some difference in the JSON at this point. Can you send it to me on Slack so I can compare?

popcornell (Contributor Author):

UPDATE: the annotation I have differs from the one LDC gave to participants (in a non-significant way, but it changes the hash).
I will recompute the hashes with the LDC annotation; that is what I can do.
Also, the check_data script should return an error rather than just printing.
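A possible way to do that (illustrative only, not the current script, and the helper name below is hypothetical): fail hard on a mismatch so the recipe stops instead of only printing a warning.

```python
# Illustrative only (not the current script): raise on a checksum mismatch
# so the recipe aborts instead of just printing a message.
import hashlib
from pathlib import Path


def verify_or_fail(path: Path, ref_md5: str) -> None:
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    if md5 != ref_md5:
        raise RuntimeError(
            f"MD5 checksum for {path} is not the same. Data has not been "
            "generated correctly; retry the generation or re-download it."
        )
```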

popcornell (Contributor Author):

@kamo-naoyuki Did the check fail for other stuff too?

[gpu_gss]:
[gss]:

We would like to thank Dr. Naoyuki Kamo for his precious help.
Collaborator:

Thank you for mentioning me, but sorry, I don't have a Ph.D. :->

Contributor Author:

Oh I am sorry. I can remove it. Honestly you deserve an honorary one ;)

Labels: ASR (Automatic speech recognition), auto-merge (Enable auto-merge), ESPnet1, ESPnet2, README, Recipe