Update result for full libri + GigaSpeech using transducer_stateless. #231

csukuangfj (Collaborator)

This PR adds the WERs for a model trained on full LibriSpeech + GigaSpeech. See #213 for more details.

The following tables compare the WERs with and without using multiple datasets:

Baseline (without using multiple datasets)

Time per epoch (~2 hours 46 minutes, using 4 GPUs)

| decoding method | test-clean WER (%) | test-other WER (%) | comment |
|---|---|---|---|
| greedy search (max sym per frame 1) | 2.67 | 6.67 | --epoch 63, --avg 19, --max-duration 100 |
| modified beam search (beam size 4) | 2.67 | 6.57 | --epoch 63, --avg 19, --max-duration 100 |

(tensorboard log: https://tensorboard.dev/experiment/qgvWkbF2R46FYA6ZMNmOjA/#scalars)

With multiple datasets (--giga-prob 0.2)

| decoding method | test-clean WER (%) | test-other WER (%) | comment |
|---|---|---|---|
| greedy search (max sym per frame 1) | 2.64 | 6.55 | --epoch 39, --avg 15, --max-duration 100 |
| modified beam search (beam size 4) | 2.61 | 6.46 | --epoch 39, --avg 15, --max-duration 100 |

(tensorboard log: https://tensorboard.dev/experiment/xmo5oCgrRVelH9dCeOkYBg/)
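To put the two tables side by side, here is a small sketch that computes the relative WER reduction from the baseline to the multi-dataset run. The numbers are copied from the tables; the helper name `relative_reduction` and the dictionary layout are only illustrative and are not part of the recipe.

```python
# WERs (%) copied from the two tables: (test-clean, test-other).
baseline = {
    "greedy search": (2.67, 6.67),
    "modified beam search": (2.67, 6.57),
}
multi_dataset = {
    "greedy search": (2.64, 6.55),
    "modified beam search": (2.61, 6.46),
}


def relative_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent (illustrative helper)."""
    return 100.0 * (before - after) / before


reductions = {}
for method, base_wers in baseline.items():
    new_wers = multi_dataset[method]
    reductions[method] = tuple(
        relative_reduction(b, a) for b, a in zip(base_wers, new_wers)
    )
    clean, other = reductions[method]
    print(f"{method}: test-clean {clean:.2f}%, test-other {other:.2f}%")
```

The relative reductions come out between roughly 1.1% and 2.2%, so the gain from adding GigaSpeech is small but consistent across both test sets and both decoding methods.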

Time per epoch (~4 hours 15 minutes, using 4 GPUs)
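A rough estimate of the total wall-clock training time for each run can be sketched as follows. It assumes that the `--epoch` value in each table equals the number of epochs trained and that the per-epoch time is constant; both are simplifications, and the per-epoch times themselves are approximate.

```python
# Approximate per-epoch times and epoch counts taken from this PR.
# Assumption: the --epoch value equals the number of epochs trained.
runs = {
    "baseline": {"epochs": 63, "minutes_per_epoch": 2 * 60 + 46},
    "multi-dataset": {"epochs": 39, "minutes_per_epoch": 4 * 60 + 15},
}

total_minutes = {
    name: run["epochs"] * run["minutes_per_epoch"] for name, run in runs.items()
}
total_hours = {name: minutes / 60 for name, minutes in total_minutes.items()}

for name in runs:
    print(f"{name}: ~{total_hours[name]:.1f} hours on 4 GPUs")
```

Under these assumptions the baseline takes about 174 hours and the multi-dataset run about 166 hours, so the longer epochs are roughly offset by the smaller number of epochs.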

The training time per epoch increases because each epoch uses more data. However, the model converges in fewer epochs (39 vs. 63). Lowering the probability of selecting data from GigaSpeech will reduce the training time per epoch, but more experiments are needed to see how that affects the WER.
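One way to picture what a selection probability such as `--giga-prob 0.2` means is the sketch below: for each training step, a batch is drawn from GigaSpeech with probability `giga_prob` and from LibriSpeech otherwise. This is a minimal illustration, not the recipe's actual code. The function name `mix_batches`, the stopping rule (stop when the chosen source is exhausted), and the toy integer "batches" are all assumptions made for the example.

```python
import itertools
import random
from typing import Iterable, Iterator, Tuple


def mix_batches(
    libri: Iterable, giga: Iterable, giga_prob: float, seed: int = 0
) -> Iterator[Tuple[str, object]]:
    """Yield (source_name, batch) pairs, picking GigaSpeech with
    probability giga_prob. Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    libri_iter, giga_iter = iter(libri), iter(giga)
    while True:
        use_giga = rng.random() < giga_prob
        source = giga_iter if use_giga else libri_iter
        try:
            batch = next(source)
        except StopIteration:
            # Assumption: end the epoch once the chosen source runs out.
            return
        yield ("giga" if use_giga else "libri"), batch


# Toy demo: endless counters stand in for real dataloaders.
num_steps = 10_000
mixed = mix_batches(itertools.count(), itertools.count(), giga_prob=0.2)
sources = [name for name, _ in itertools.islice(mixed, num_steps)]
giga_fraction = sources.count("giga") / num_steps
print(f"fraction of batches from GigaSpeech: {giga_fraction:.3f}")
```

With a scheme like this, a smaller `giga_prob` means fewer GigaSpeech batches are interleaved per epoch, which is consistent with the expectation that it shortens the epoch.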

danpovey (Collaborator) commented Mar 1, 2022

Cool!
Hopefully it will give more improvement in situations where we have less training data available (or where the model is larger).

csukuangfj merged commit 05cb297 into k2-fsa:master Mar 1, 2022
csukuangfj deleted the update-results-2 branch March 1, 2022 09:01