[ESPnet2] Intermediate/Self-conditioned CTC #4084

Merged
merged 5 commits into from
Feb 23, 2022

YosukeHiguchi
Contributor

@YosukeHiguchi YosukeHiguchi commented Feb 17, 2022

This PR adds implementations of Intermediate/Self-conditioned CTC (Inter/SC-CTC) to espnet2, along with a strong baseline configuration for training a pure Conformer-CTC model. I'm still running experiments to verify that the implementation is correct, and I will add the results once they finish.

@mergify mergify bot added the ESPnet2 label Feb 17, 2022
@sw005320 sw005320 added ASR Automatic speech recognition New Features labels Feb 17, 2022
@sw005320 sw005320 added this to the v.0.10.7 milestone Feb 17, 2022
@sw005320
Contributor

@brianyan918, can I ask you to review this PR?
Let's help @YosukeHiguchi and merge this PR ASAP

@brianyan918
Contributor

I believe this implementation is well done. I agree with passing the reference to the CTC module through forward() rather than at init. I still need to make the same change in espnet1, because the init-time approach causes issues when using multiple GPUs.
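For readers following the design point above, here is a minimal pure-Python sketch of an encoder that receives the CTC head as an argument to forward() instead of storing it at construction time. It is a toy under stated assumptions, not the espnet2 code: `ToyCTC`, `ToyEncoder`, `back_proj`, and `self_condition` are hypothetical names, the "layer" is a placeholder for a real Conformer/Transformer block, and plain lists stand in for tensors.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class ToyCTC:
    """Hypothetical stand-in for a CTC head: hidden vector -> vocabulary posteriors."""

    def __init__(self, weights):
        self.weights = weights  # vocab_size rows, each with hidden_size floats

    def softmax(self, hidden):
        logits = matvec(self.weights, hidden)
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        return [e / total for e in exps]


class ToyEncoder:
    """Hypothetical encoder that owns no reference to the CTC head."""

    def __init__(self, num_layers, interctc_layers, back_proj):
        self.num_layers = num_layers
        self.interctc_layers = set(interctc_layers)
        self.back_proj = back_proj  # hidden_size rows, each with vocab_size floats

    def layer(self, frames):
        # Placeholder for a real encoder block.
        return [[0.5 * h + 0.1 for h in frame] for frame in frames]

    def forward(self, frames, ctc=None, self_condition=False):
        intermediate_outs = []
        for idx in range(1, self.num_layers + 1):
            frames = self.layer(frames)
            if ctc is not None and idx in self.interctc_layers:
                # Keep the intermediate hidden states for an auxiliary CTC loss.
                intermediate_outs.append((idx, [list(f) for f in frames]))
                if self_condition:
                    # Self-conditioning: add the projected intermediate
                    # posteriors back onto the hidden states.
                    frames = [
                        [h + c for h, c in zip(f, matvec(self.back_proj, ctc.softmax(f)))]
                        for f in frames
                    ]
        return frames, intermediate_outs
```

Because the head arrives per call, the caller decides which CTC object the encoder uses on each forward pass, for example `encoder.forward(frames, ctc=ctc, self_condition=True)`.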

@YosukeHiguchi
Contributor Author

YosukeHiguchi commented Feb 21, 2022

First results on LibriSpeech-100h (w/o an LM and beam-search decoding):

| Model | dev_clean | dev_other | test_clean | test_other |
|---|---|---|---|---|
| CTC | 7.6 | 21.2 | 7.9 | 21.4 |
| InterCTC | 6.8 | 20.0 | 7.1 | 20.3 |
| SC-CTC | 6.8 | 20.2 | 7.2 | 20.5 |

Overall, the results are good, except that SC-CTC is slightly worse than InterCTC, which contradicts our observations with espnet1 (https://arxiv.org/pdf/2110.05249.pdf). I will review my implementation again and try tuning the models.
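To put the gaps in the table above in perspective, the relative error reduction over the CTC baseline can be computed as below. This is a small sketch: the numbers are copied from the table, they are assumed here to be error rates in percent (the comment does not name the metric), and `relative_reduction` and `compare` are illustrative helper names.

```python
# Error rates copied from the table above (assumed to be percentages).
RESULTS = {
    "CTC": {"dev_clean": 7.6, "dev_other": 21.2, "test_clean": 7.9, "test_other": 21.4},
    "InterCTC": {"dev_clean": 6.8, "dev_other": 20.0, "test_clean": 7.1, "test_other": 20.3},
    "SC-CTC": {"dev_clean": 6.8, "dev_other": 20.2, "test_clean": 7.2, "test_other": 20.5},
}


def relative_reduction(baseline, system):
    """Relative error reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline - system) / baseline


def compare(system, baseline="CTC"):
    """Per-split relative reduction of `system` over `baseline`, rounded to 0.1."""
    return {
        split: round(relative_reduction(RESULTS[baseline][split], RESULTS[system][split]), 1)
        for split in RESULTS[baseline]
    }


if __name__ == "__main__":
    for name in ("InterCTC", "SC-CTC"):
        print(name, compare(name))
```

The absolute gap between InterCTC and SC-CTC is at most 0.2 points on any split, which is small next to the gain either method shows over the baseline.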

@codecov

codecov bot commented Feb 21, 2022

Codecov Report

Merging #4084 (2abfaa6) into master (527e093) will increase coverage by 0.00%.
The diff coverage is 44.57%.

@@           Coverage Diff            @@
##           master    #4084    +/-   ##
========================================
  Coverage   80.94%   80.94%            
========================================
  Files         435      435            
  Lines       37425    37651   +226     
========================================
+ Hits        30294    30477   +183     
- Misses       7131     7174    +43     
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 67.13% <ø> (ø) |
| test_integration_espnet2 | 52.07% <25.30%> (-0.37%) ⬇️ |
| test_python | 66.68% <39.75%> (+0.07%) ⬆️ |
| test_utils | 24.45% <ø> (ø) |

| Impacted Files | Coverage Δ |
|---|---|
| espnet2/asr/encoder/conformer_encoder.py | 83.57% <35.71%> (-12.01%) ⬇️ |
| espnet2/asr/encoder/transformer_encoder.py | 82.22% <47.82%> (-11.99%) ⬇️ |
| espnet2/asr/ctc.py | 84.21% <50.00%> (-0.93%) ⬇️ |
| espnet2/asr/espnet_model.py | 79.18% <50.00%> (-4.32%) ⬇️ |
| espnet2/bin/asr_inference.py | 88.00% <50.00%> (-0.35%) ⬇️ |
| espnet2/main_funcs/average_nbest_models.py | 98.30% <0.00%> (-1.70%) ⬇️ |
| espnet2/train/trainer.py | 77.66% <0.00%> (ø) |
| espnet2/enh/layers/dnn_beamformer.py | 97.87% <0.00%> (+1.46%) ⬆️ |
| espnet2/enh/layers/complex_utils.py | 77.06% <0.00%> (+5.79%) ⬆️ |
| espnet2/enh/layers/beamformer.py | 84.63% <0.00%> (+10.81%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sw005320
Contributor

Can you add a test?

@sw005320 sw005320 merged commit 9c24b3a into espnet:master Feb 23, 2022
@sw005320
Contributor

Thanks, @YosukeHiguchi!
It would be great if you could keep working on improving self-conditioned CTC and adding tests.
