Add SLUE-VoxPopuli results for WavLM with mBART-50 #4777
Conversation
Codecov Report
|          | master | #4777  | +/-    |
|----------|-------:|-------:|-------:|
| Coverage | 80.32% | 80.31% | -0.01% |
| Files    | 530    | 530    |        |
| Lines    | 46527  | 46527  |        |
| Hits     | 37372  | 37369  | -3     |
| Misses   | 9155   | 9158   | +3     |
Thanks, @akreal! @siddhu001, can you review this PR?
Yes, sure. I started the WavLM experiment; it should finish on Friday. Hopefully the results will be better, since WavLM is more focused on English. I mainly used this setting (but with Conformer) to test different strategies for pretraining the interface between XLS-R and mBART, and did not test mBART effectiveness specifically. There are more differences between this configuration and the WavLM configuration from the recipe. To see the effect of mBART better, I would suggest taking the recipe's WavLM configuration and adding mBART. I can run this experiment next week.
```sh
    --feats_normalize utterance_mvn \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --inference_nj 1 \
```
```sh
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --inference_nj 1 \
    --gpu_inference true \
```
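To put the snippets above in context: below is a minimal sketch, assuming the usual egs2 layout, of how these options are passed from a recipe's run.sh to the shared asr.sh script. The split names and config file names are placeholders, not the exact values used in this recipe. With `--gpu_inference true`, decoding runs on a GPU, which is why `--inference_nj` is kept at 1 rather than spawning many parallel jobs on one device.

```sh
#!/usr/bin/env bash
# Hedged sketch of a typical egs2 run.sh; split names and config paths are
# placeholders, not the exact values used in this recipe.
set -e
set -u
set -o pipefail

asr_config=conf/tuning/train_asr.yaml     # placeholder
inference_config=conf/decode_asr.yaml     # placeholder

./asr.sh \
    --lang en \
    --feats_normalize utterance_mvn \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --inference_nj 1 \
    --gpu_inference true \
    --train_set "train" \
    --valid_set "devel" \
    --test_sets "devel test" \
    "$@"
```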
egs2/slue-voxpopuli/asr1/README.md
## Using XLS-R pretrained speech Encoder and mBART-50 Large pretrained text Encoder-Decoder
- ASR config: [conf/tuning/train_asr_branchformer_xlsr_mbart.yaml](conf/tuning/train_asr_branchformer_xlsr_mbart.yaml)
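As a usage note (not part of the PR itself): assuming the recipe's run.sh forwards extra flags to asr.sh, the configuration listed above could be selected as sketched below; the inference config name is an assumption.

```sh
# Hedged sketch: picking the XLS-R + mBART-50 configuration from the README;
# conf/decode_asr.yaml is an assumed name for the inference config.
./run.sh \
    --asr_config conf/tuning/train_asr_branchformer_xlsr_mbart.yaml \
    --inference_config conf/decode_asr.yaml \
    --gpu_inference true \
    --inference_nj 1
```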
Could you also try to upload this model to Hugging Face? It would be useful for future experimentation on this.
Some time ago I tried to upload the SLURP model and the file size was too large. I can try again once the WavLM experiment finishes. If this does not work, I could share it some other way (Zenodo?).
Yeah, Zenodo will work too if Hugging Face gives some issues.
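For reference, here is a minimal sketch of one way to push a trained checkpoint to the Hugging Face Hub with the huggingface_hub CLI. The repository name and file paths are placeholders, this is not necessarily the recipe's built-in upload stage, and very large checkpoints may still hit size limits (hence the Zenodo fallback).

```sh
# Hedged sketch, assuming a recent huggingface_hub CLI is installed;
# repo name and paths below are placeholders.
pip install -U "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli upload \
    your-username/slue-voxpopuli-wavlm-mbart \
    exp/asr_train/valid.acc.ave.pth \
    valid.acc.ave.pth
```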
Hi @akreal, thanks for the PR. This is very useful, and I would be very interested to understand the effectiveness of mBART. I have only a few minor suggestions; otherwise I think it is ready to be merged.
Hi @siddhu001! Thank you for your review. I'll include the changes once the WavLM configuration finishes training. I'll comment on the effectiveness of mBART next week.
The WavLM experiments finished and the results are only a tiny bit better than this recipe's (GA stands for gradient accumulation steps):
I'm running one last experiment with GA=8; it should finish tomorrow, and then I'll update this PR with the best configuration.
I tried this but the results are quite bad:
So overall, mBART does not look very useful for this dataset, at least without some extra pretraining (for the WavLM-mBART interface and/or the NER task).
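For context on the GA setting discussed above: in ESPnet2 the number of gradient accumulation steps corresponds to the `accum_grad` training option. Below is a minimal sketch, assuming the recipe forwards extra training arguments via `--asr_args`; the config file name is a placeholder, not a file from this PR.

```sh
# Hedged sketch: overriding gradient accumulation (GA) from the command line,
# assuming asr.sh forwards --asr_args to the training script; alternatively,
# set "accum_grad: 4" directly in the training config YAML.
./run.sh \
    --asr_config conf/tuning/train_asr_wavlm_mbart.yaml \
    --asr_args "--accum_grad 4"
```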
@akreal Thanks for computing these results! They look very interesting. The decline in results with the WavLM configuration may also be because the current WavLM configuration is not the best one for Hugging Face tokenization, and the parameters probably need to be tuned. I believe analyzing the model errors and trying additional pretraining for the WavLM-mBART interface, as you suggested, are exciting future directions in this space.
Here are the results:
They are very similar, but I'll keep WavLM + mBART with GA=4.
That's right, I had to change the learning rate (otherwise it did not work at all) but did not tune it.
The branch was force-pushed from 24cfde7 to e75d8dc.
Thanks a lot, @akreal!