
Add Conv2dSubsampling1 module and test it in AphasiaBank ASR recipe #4892

Merged: 5 commits merged into espnet:master on Jan 30, 2023

Conversation

@tjysdsg (Contributor) commented Jan 27, 2023

The Conv2dSubsampling1 module is similar to Conv2dSubsampling, except that its subsampling ratio is 1:1 (i.e., it does not reduce the sequence length).

In AphasiaBank English ASR experiments using E-Branchformer + WavLM, switching from Conv2dSubsampling2 to Conv2dSubsampling1 decreased the CER from 17.7 to 17.0.

To use this module, specify `input_layer: conv2d1` in the encoder configuration.
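For illustration, here is a minimal PyTorch sketch of what a 2-D convolutional input layer with a 1:1 time ratio could look like (an approximation only, not the actual ESPnet implementation; the class name `Conv2dNoSubsampling` and its signature are assumptions):

```python
import torch


class Conv2dNoSubsampling(torch.nn.Module):
    """Illustrative 2-D convolutional input layer that keeps a 1:1 time ratio.

    Sketch only: it follows the spirit of Conv2dSubsampling (two Conv2d + ReLU
    blocks followed by a linear projection), but uses stride 1 with padding so
    that the output sequence length equals the input sequence length.
    """

    def __init__(self, idim: int, odim: int, dropout_rate: float = 0.1):
        super().__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(1, odim, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(odim, odim, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
        )
        # Project the flattened (channel x feature) axis to the model dimension.
        self.out = torch.nn.Linear(odim * idim, odim)
        self.dropout = torch.nn.Dropout(dropout_rate)

    def forward(self, x: torch.Tensor, x_mask: torch.Tensor):
        # x: (batch, time, idim) -> (batch, 1, time, idim)
        x = x.unsqueeze(1)
        x = self.conv(x)  # (batch, odim, time, idim); the time axis is unchanged
        b, c, t, f = x.size()
        x = self.out(x.transpose(1, 2).contiguous().view(b, t, c * f))
        # No downsampling, so the time mask can be returned unchanged.
        return self.dropout(x), x_mask
```

With such a module registered under the `conv2d1` option, selecting it from a training config is just a matter of the `input_layer: conv2d1` line described above.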

@codecov (bot) commented Jan 27, 2023

Codecov Report

Merging #4892 (c1dc65f) into master (3970558) will decrease coverage by 0.04%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master    #4892      +/-   ##
==========================================
- Coverage   76.61%   76.57%   -0.04%     
==========================================
  Files         603      603              
  Lines       53677    53737      +60     
==========================================
+ Hits        41122    41151      +29     
- Misses      12555    12586      +31     
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 66.33% <22.22%> (-0.05%) ⬇️ |
| test_integration_espnet2 | 47.59% <16.66%> (-0.08%) ⬇️ |
| test_python | 66.45% <80.00%> (-0.02%) ⬇️ |
| test_utils | 23.35% <22.22%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| espnet2/asr/encoder/transformer_encoder.py | 92.04% <50.00%> (-0.98%) ⬇️ |
| ...pnet2/asr/encoder/transformer_encoder_multispkr.py | 82.60% <50.00%> (-0.98%) ⬇️ |
| ...et/nets/pytorch_backend/transformer/subsampling.py | 86.17% <77.77%> (-1.99%) ⬇️ |
| espnet2/asr/encoder/branchformer_encoder.py | 94.19% <100.00%> (+0.05%) ⬆️ |
| espnet2/asr/encoder/conformer_encoder.py | 96.35% <100.00%> (+0.05%) ⬆️ |
| espnet2/asr/encoder/e_branchformer_encoder.py | 95.15% <100.00%> (+0.05%) ⬆️ |
| espnet2/asr/encoder/longformer_encoder.py | 79.50% <100.00%> (+0.34%) ⬆️ |
| espnet2/asr/espnet_model.py | 77.69% <0.00%> (-3.36%) ⬇️ |
| espnet2/train/trainer.py | 76.40% <0.00%> (-0.94%) ⬇️ |
| espnet2/train/preprocessor.py | 27.14% <0.00%> (-0.26%) ⬇️ |
| ... and 3 more | |


@simpleoier (Collaborator) left a comment

Better results!
Looks good to me in general!
Just one concern about the name of the new module: Conv2dSubsampling1 doesn't actually do any subsampling.

@tjysdsg (Contributor, Author) commented Jan 27, 2023

> Better results! Looks good to me in general! Just one concern about the name of the new module: Conv2dSubsampling1 doesn't actually do any subsampling.

Oh right, what's your suggestion? Should we just call it Conv2d1 or Conv2dInputLayer?

Review thread on the recipe's results file:

@@ -9,6 +9,24 @@
- Git hash: `39c1ec0509904f16ac36d25efc971e2a94ff781f`
- Commit date: `Wed Dec 21 12:50:18 2022 -0500`

## asr_train_asr_ebranchformer_small_wavlm_large1

- [train_asr_ebranchformer_small_wavlm_large.yaml](conf/tuning/train_asr_ebranchformer_small_wavlm_large.yaml)
Collaborator: Can you add a short note here to explain the difference? E.g., this config uses xxx as the input layer, which does not perform downsampling.

@pyf98 (Collaborator) commented Jan 28, 2023: Are the config name and path correct? They look the same as the previous one.

@tjysdsg (Contributor, Author): Thanks! Fixed in 4734d75.

@sw005320 added this to the v.202301 milestone on Jan 29, 2023
@sw005320 (Contributor) commented:

> Better results! Looks good to me in general! Just one concern about the name of the new module: Conv2dSubsampling1 doesn't actually do any subsampling.

> Oh right, what's your suggestion? Should we just call it Conv2d1 or Conv2dInputLayer?

I think Conv2dSubsampling1 is OK, but it would be better to add a comment in appropriate places noting that it does not do any subsampling.
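For example, the clarification could be as short as a docstring note on the class (hypothetical wording, not necessarily the exact text added in the PR):

```python
import torch


class Conv2dSubsampling1(torch.nn.Module):
    """Convolutional 2-D input layer with a 1:1 subsampling ratio.

    Note: despite the name, this module performs no subsampling; the output
    sequence length equals the input sequence length.
    """
```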

@sw005320 added the Enhancement label on Jan 29, 2023
@tjysdsg (Contributor, Author) commented Jan 29, 2023

@sw005320 I made the comment about the functionality clearer in c1dc65f.

@sw005320 merged commit e37ee27 into espnet:master on Jan 30, 2023
@sw005320 (Contributor):

Thanks a lot!

@tjysdsg deleted the conv2d1 branch on January 30, 2023 at 21:44