Add SLUE-VoxPopuli results for WavLM with mBART-50 #4777

Merged: 1 commit merged into espnet:master from hf-slue-voxpopuli on Nov 29, 2022

Conversation

@akreal (Contributor) commented Nov 22, 2022

No description provided.

@codecov (bot) commented Nov 22, 2022

Codecov Report

Merging #4777 (e75d8dc) into master (ca2193d) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4777      +/-   ##
==========================================
- Coverage   80.32%   80.31%   -0.01%     
==========================================
  Files         530      530              
  Lines       46527    46527              
==========================================
- Hits        37372    37369       -3     
- Misses       9155     9158       +3     
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 66.37% <ø> (ø) |
| test_integration_espnet2 | 48.85% <ø> (-0.02%) ⬇️ |
| test_python | 68.61% <ø> (-0.01%) ⬇️ |
| test_utils | 23.30% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| espnet2/asr/transducer/beam_search_transducer.py | 97.85% <0.00%> (-0.92%) ⬇️ |


@sw005320 added the SLU (Spoken language understanding) label on Nov 22, 2022
@sw005320 added this to the v.202211 milestone on Nov 22, 2022
@sw005320 (Contributor) commented:

Thanks, @akreal!
Is it possible to use WavLM for the speech encoder? The current result is not better than the WavLM baseline, and I want to check the effectiveness of mBART.

@siddhu001, can you review this PR?

@akreal (Contributor, Author) commented Nov 22, 2022

> Is it possible to use WavLM for the speech encoder? The current result is not better than the WavLM baseline, and I want to check the effectiveness of mBART.

Yes, sure. I started the WavLM experiment; it should finish on Friday. Hopefully the results will be better, since WavLM is more focused on English.

I mainly used this setting (but with Conformer) to test different strategies for pretraining the interface between XLS-R and mBART, and did not test mBART effectiveness specifically. There are more differences between this configuration and the recipe's WavLM configuration. To see the effect of mBART more clearly, I would suggest taking the recipe's WavLM configuration and adding mBART on top of it. I can run this experiment next week.
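As a rough, hypothetical sketch (not the exact contents of the config added in this PR), attaching an mBART-50 decoder in an ESPnet2 training config goes through the hugging_face_transformers decoder; the checkpoint name below is an assumption:

```yaml
# Hypothetical sketch only: the checkpoint ID and everything not shown here
# (frontend, encoder, optimizer, tokenization) may differ from the actual
# conf/tuning files in this PR.
decoder: hugging_face_transformers
decoder_conf:
    model_name_or_path: facebook/mbart-large-50
```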

--feats_normalize utterance_mvn \
--asr_config "${asr_config}" \
--inference_config "${inference_config}" \
--inference_nj 1 \
Collaborator (review comment):

Suggested change:

--inference_nj 1 \

--asr_config "${asr_config}" \
--inference_config "${inference_config}" \
--inference_nj 1 \
--gpu_inference true \
Collaborator (review comment):

Suggested change:

--gpu_inference true \


## Using XLS-R pretrained speech Encoder and mBART-50 Large pretrained text Encoder-Decoder

- ASR config: [conf/tuning/train_asr_branchformer_xlsr_mbart.yaml](conf/tuning/train_asr_branchformer_xlsr_mbart.yaml)
Collaborator (review comment):

Could you also try to upload this model to Hugging Face? It would be useful for future experimentation.

Contributor (Author):

Some time ago I tried to upload the SLURP model and the file size was too large. I can try again once the WavLM experiment finishes. If that does not work, I could share it some other way (Zenodo?).

Collaborator:

Yeah, Zenodo will work too if Hugging Face gives some issues.

@siddhu001 (Collaborator) left a comment:

Hi @akreal, thanks for the PR. This is very useful, and I would be very interested to understand the effectiveness of mBART. I have only a few minor suggestions; otherwise I think it is ready to be merged.

@akreal (Contributor, Author) commented Nov 22, 2022

Hi @siddhu001! Thank you for your review. I'll include the changes once the WavLM configuration finishes training. I'll comment on the effectiveness of mBART next week.

@akreal (Contributor, Author) commented Nov 28, 2022

> I started the WavLM experiment; it should finish on Friday. Hopefully the results will be better, since WavLM is more focused on English.

The WavLM experiments finished, and the results are only a tiny bit better than this recipe (GA is the number of gradient accumulation steps):

| Model | GA | Macro F1 (%) | Micro F1 (%) | Macro Label F1 (%) | Micro Label F1 (%) |
|---|---|---|---|---|---|
| This recipe (WavLM) | 2 | 61.0 | 74.5 | 81.6 | 88.0 |
| XLS-R + mBART | 4 | 55.96 | 70.60 | 76.74 | 83.31 |
| WavLM + mBART | 2 | 60.45 | 74.81 | 82.82 | 87.73 |
| WavLM + mBART | 4 | 60.35 | 74.57 | 82.93 | 88.06 |

I'm running one last experiment with GA=8; it should finish tomorrow, and then I'll update this PR with the best configuration.
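For context, gradient accumulation in ESPnet2 is set with the accum_grad key of the training config; below is a minimal sketch with illustrative values, not the exact settings used in these experiments:

```yaml
# Illustrative values only. With accum_grad: 4, gradients from 4 consecutive
# mini-batches are summed before each optimizer step, so the effective batch
# size is roughly 4x the configured one.
accum_grad: 4
batch_type: numel
batch_bins: 4000000
```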

> To see the effect of mBART more clearly, I would suggest taking the recipe's WavLM configuration and adding mBART on top of it.

I tried this but the results are quite bad:

| Model | Macro F1 (%) | Micro F1 (%) | Macro Label F1 (%) | Micro Label F1 (%) |
|---|---|---|---|---|
| This recipe | 61.0 | 74.5 | 81.6 | 88.0 |
| This recipe + mBART Enc & Dec | 57.1 | 72.0 | 80.6 | 86.2 |
| This recipe + mBART Dec | 49.6 | 63.5 | 74.1 | 80.8 |

So overall, mBART does not look very useful for this dataset, at least without some extra pretraining (for the WavLM-mBART interface and/or the NER task).

@siddhu001 (Collaborator) commented:

@akreal Thanks for computing these results! They look very interesting.

The decline in results with the WavLM configuration may also be because the current WavLM configuration is not the best config for Hugging Face tokenization; the parameters probably need to be tuned. I believe analyzing the model errors and trying additional pretraining for the WavLM-mBART interface, as you suggested, are exciting future directions in this space.

@akreal (Contributor, Author) commented Nov 29, 2022

Here are the results:

| Model | GA | Macro F1 (%) | Micro F1 (%) | Macro Label F1 (%) | Micro Label F1 (%) | Avg. |
|---|---|---|---|---|---|---|
| This recipe (WavLM) | 2 | 61.0 | 74.5 | 81.6 | 88.0 | 76.28 |
| XLS-R + mBART | 4 | 55.96 | 70.60 | 76.74 | 83.31 | 71.65 |
| WavLM + mBART | 2 | 60.45 | 74.81 | 82.82 | 87.73 | 76.45 |
| WavLM + mBART | 4 | 60.35 | 74.57 | 82.93 | 88.06 | 76.47 |
| WavLM + mBART | 8 | 59.95 | 74.58 | 82.62 | 88.05 | 76.30 |

They are very similar, but I'll keep WavLM + mBART with GA=4.

> the current WavLM configuration is not the best config for Hugging Face tokenization; the parameters probably need to be tuned

That's right, I had to change the learning rate (otherwise it did not work at all) but did not tune it.
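For reference, the learning rate lives in the optimizer block of the ESPnet2 training config; the values below are purely illustrative and not the settings used in this PR:

```yaml
# Illustrative only; the actual recipe config may use a different optimizer,
# learning rate, and scheduler.
optim: adam
optim_conf:
    lr: 0.0001
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000
```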

@akreal changed the title from "Add SLUE-VoxPopuli results for XLS-R with mBART-50" to "Add SLUE-VoxPopuli results for WavLM with mBART-50" on Nov 29, 2022
@sw005320 added the auto-merge (Enable auto-merge) label on Nov 29, 2022
@sw005320 (Contributor) commented:

Thanks a lot, @akreal!
After the CI check, I'll merge this PR.

@mergify (bot) merged commit 0ad9561 into espnet:master on Nov 29, 2022
@akreal deleted the hf-slue-voxpopuli branch on October 12, 2023
Labels: auto-merge (Enable auto-merge), ESPnet2, README, Recipe, SLU (Spoken language understanding)