AVSR recipe for Easycom Dataset #5630

ms-dot-k · 2024-01-22T08:21:17Z

What?

New recipe for training audio-visual speech recognition model on Easycom dataset.
The recipe is based on LRS3 avsr recipe which utilizes pre-trained AV-HuBERT model. (Dumped features)

I added data augmentation techniques to the espnet2/asr/encoder/avhubert_encoder.py

acoustic noise perturbation: a babble noise is corrupted with random noisy strengths at the feature level.
modality dropout: audio and video streams are randomly dropped out, so that we can still perform audio-visual or audio-only, visual-only prediction after the model is trained.

The dataset is very challenging due to noise and long-distance voice.
Previous ASR model (wav2vec2.0) trained on 60k hours of data achieves 87.5% WER (https://arxiv.org/pdf/2212.11377.pdf). Therefore, by employing the visual information, we can improve the performance greatly by complementing the insufficient audio information (due to noise, overlapped speech, and long-distance voice) during speech recognition.

The trained model using the recipe was trained on 1,759 hours of data for pre-training (AV-HuBERT) and 438 hours of data for finetuning. Considering the data amount, the current performance seems reasonable.

One possible direction to improve the performance is using more audio-visual data including LRS2, VoxCeleb, and AVSpeech.

for more information, see https://pre-commit.ci

ftshijt

Very cool extension! Many thanks for the effort.

Could you please also add en entry in egs2/README.md for the dataset?

Also, two minor comments as follows:

egs2/easycom/avsr1/local/data.sh

for more information, see https://pre-commit.ci

codecov · 2024-01-26T07:30:35Z

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (27f292d) 76.11% compared to head (0582547) 76.13%.

Files	Patch %	Lines
espnet2/asr/encoder/avhubert_encoder.py	33.33%	26 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5630      +/-   ##
==========================================
+ Coverage   76.11%   76.13%   +0.01%     
==========================================
  Files         743      743              
  Lines       69117    69151      +34     
==========================================
+ Hits        52608    52647      +39     
+ Misses      16509    16504       -5

Flag	Coverage Δ
test_configuration_espnet2	`∅ <ø> (∅)`
test_integration_espnet1	`62.92% <ø> (+0.14%)`	⬆️
test_integration_espnet2	`48.48% <2.56%> (-0.04%)`	⬇️
test_python_espnet1	`18.39% <0.00%> (-0.01%)`	⬇️
test_python_espnet2	`52.66% <33.33%> (+0.03%)`	⬆️
test_utils	`22.15% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

for more information, see https://pre-commit.ci

sw005320 · 2024-01-31T14:42:42Z

Thanks a lot!

ms-dot-k and others added 3 commits January 22, 2024 17:07

AVSR recipe for EASYCOM dataset (training with LRS3 dataset)

02de038

Improve training with modality dropout and noise augmentation, Availa…

3df53b0

…ble for audio-only training and inference

Update db.sh

d854971

Add easycom dataset

mergify bot added ESPnet2 Installation labels Jan 22, 2024

pre-commit-ci bot and others added 2 commits January 22, 2024 08:22

[pre-commit.ci] auto fixes from pre-commit.com hooks

afbaaf2

for more information, see https://pre-commit.ci

README

29364e5

mergify bot added the README label Jan 22, 2024

ms-dot-k and others added 3 commits January 22, 2024 18:31

add model

0c02f35

[pre-commit.ci] auto fixes from pre-commit.com hooks

02454c9

for more information, see https://pre-commit.ci

Merge branch 'easycom' of https://github.com/ms-dot-k/espnet into eas…

c12528f

…ycom

sw005320 requested a review from ftshijt January 22, 2024 12:30

sw005320 added the AV Audio visual processing label Jan 22, 2024

sw005320 added this to the v.202312 milestone Jan 22, 2024

sw005320 reviewed Jan 22, 2024

View reviewed changes

ms-dot-k and others added 6 commits January 22, 2024 22:35

Reflect the comment

ba4ced1

Reflect the comment

dba195b

Modification according to ci

198012e

Modification according to ci

6b72ef1

Modification according to ci

a7d720e

[pre-commit.ci] auto fixes from pre-commit.com hooks

569d6d7

for more information, see https://pre-commit.ci

ftshijt reviewed Jan 25, 2024

View reviewed changes

egs2/easycom/avsr1/local/data.sh Outdated Show resolved Hide resolved

egs2/easycom/avsr1/local/data.sh Outdated Show resolved Hide resolved

ms-dot-k and others added 5 commits January 26, 2024 14:43

Reflecting comments

7aaab99

[pre-commit.ci] auto fixes from pre-commit.com hooks

15e42c3

for more information, see https://pre-commit.ci

Add new dataset in egs2/README.md

27ae6a2

ci

67e05f4

[pre-commit.ci] auto fixes from pre-commit.com hooks

88fbf17

for more information, see https://pre-commit.ci

ms-dot-k and others added 2 commits January 28, 2024 13:44

for ci test

8c69e87

Merge branch 'master' into easycom

a78511d

[pre-commit.ci] auto fixes from pre-commit.com hooks

0582547

for more information, see https://pre-commit.ci

sw005320 merged commit 348139f into espnet:master Jan 31, 2024
26 of 27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AVSR recipe for Easycom Dataset #5630

AVSR recipe for Easycom Dataset #5630

ms-dot-k commented Jan 22, 2024

sw005320 left a comment

sw005320 Jan 22, 2024

ms-dot-k Jan 22, 2024

ftshijt left a comment

codecov bot commented Jan 26, 2024 •

edited

sw005320 commented Jan 31, 2024

+              |---|---|---|---|---|---|---|---|---|
+              |inference_asr_model_valid.acc.ave/test_with_LRS3|694|8886|70.4|18.6|11.0|5.0|34.6|75.4|
+              ## Audio-only Speech Recognition Results (Audio-only) <br> exp/asr_train_avsr_avhubert_large_with_lrs3_noise_extracted_en_bpe1000

AVSR recipe for Easycom Dataset #5630

AVSR recipe for Easycom Dataset #5630

Conversation

ms-dot-k commented Jan 22, 2024

What?

See also

sw005320 left a comment

Choose a reason for hiding this comment

sw005320 Jan 22, 2024

Choose a reason for hiding this comment

ms-dot-k Jan 22, 2024

Choose a reason for hiding this comment

ftshijt left a comment

Choose a reason for hiding this comment

codecov bot commented Jan 26, 2024 • edited

Codecov Report

sw005320 commented Jan 31, 2024

codecov bot commented Jan 26, 2024 •

edited