Update the documentation to include "ctc-decoding" #71

@@ -292,9 +292,18 @@ The commonly used options are:

- ``--method``

  This specifies the decoding method. This script supports seven decoding
  methods. For ``ctc-decoding``, it uses a sentencepiece model to convert
  word pieces to words, and it needs neither a lexicon nor an n-gram LM.
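
  As an illustration of that word-piece-to-word conversion, here is a minimal
  sketch using sentencepiece's ``spm_decode`` command-line tool. The pieces are
  invented for the example, and the model path assumes the BPE model lives in
  the ``--lang-dir`` used later on this page:

  .. code-block:: bash

     $ echo "▁HE LL O ▁WORLD" | spm_decode --model=data/lang_bpe_500/bpe.model --input_format=piece
     HELLO WORLD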

  For example, the following command uses ``ctc-decoding``:

  .. code-block::

     $ cd egs/librispeech/ASR
     $ ./conformer_ctc/decode.py --method ctc-decoding --max-duration 300 --bucketing-sampler False

Review comment: You don't need to specify the option ``--bucketing-sampler``.
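
Following that comment, the same run without the redundant option would be:

.. code-block::

   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/decode.py --method ctc-decoding --max-duration 300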

  The following command uses the attention decoder for rescoring:
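
  A hedged sketch of such an invocation, assuming ``decode.py`` accepts
  ``--num-paths`` and ``--nbest-scale`` on the command line (both appear, with
  values 100 and 0.5, in the parameter dump in the log below):

  .. code-block::

     $ cd egs/librispeech/ASR
     $ ./conformer_ctc/decode.py --method attention-decoder --num-paths 100 --nbest-scale 0.5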

@@ -310,6 +319,67 @@ The commonly used options are:

  It has the same meaning as the one used during training. A larger
  value may cause OOM.

- ``--bucketing-sampler``

  When enabled, the batches will come from buckets of similar duration (this
  saves padding frames).

Review comment: Please move this argument to the training part. It is not used in decoding.
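
Per that comment, a hedged sketch of enabling bucketing at training time
instead, assuming ``conformer_ctc/train.py`` exposes the same dataloader
options that appear in the parameter dump below (``--bucketing-sampler``,
``--num-buckets``):

.. code-block::

   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/train.py --bucketing-sampler True --num-buckets 30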

Here are some results for reference, based on CTC decoding with a vocabulary size of 500:

Usage:

.. code-block:: bash
   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/decode.py \
       --epoch 25 \
       --avg 1 \
       --max-duration 300 \
       --bucketing-sampler 0 \
       --full-libri 0 \
       --exp-dir conformer_ctc/exp \
       --lang-dir data/lang_bpe_500 \
       --method ctc-decoding

Review comments on this command:

- On ``--avg 1``: "Is the averaging not helping anymore?" "Maybe I can have a test." "From past experience, model averaging always helps."
- On ``--bucketing-sampler 0``: "I think the decoding can be faster if you use the bucketing sampler." "Test datasets and validation datasets always use ``SingleCutSampler``." "Ah, good point. We might want to change that; it could speed up the decoding a bit by re-ordering the cuts to get rid of unnecessary padding. I think for training there was a nice speedup, but I don't remember the numbers (something like 1h -> 45min per epoch; it was back in snowfall)." "The current code is verbatim-copied from snowfall. It's a good idea to make the test dataset also support the bucketing sampler."
- On ``--full-libri 0``: "Also, ``--full-libri`` is not used in decoding."
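
Since the reviewers note that model averaging always helps, a hedged variant
that averages the last five checkpoints (assuming ``epoch-21.pt`` through
``epoch-25.pt`` exist in ``conformer_ctc/exp``) would be:

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/decode.py \
       --epoch 25 \
       --avg 5 \
       --max-duration 300 \
       --exp-dir conformer_ctc/exp \
       --lang-dir data/lang_bpe_500 \
       --method ctc-decoding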

The output is given below:

.. code-block:: bash
   2021-09-26 12:44:31,033 INFO [decode.py:537] Decoding started
   2021-09-26 12:44:31,033 INFO [decode.py:538]
   {'lm_dir': PosixPath('data/lm'), 'subsampling_factor': 4, 'vgg_frontend': False, 'use_feat_batchnorm': True,
   'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'num_decoder_layers': 6, 'search_beam': 20, 'output_beam': 8,
   'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True,
   'epoch': 25, 'avg': 1, 'method': 'ctc-decoding', 'num_paths': 100, 'nbest_scale': 0.5,
   'export': False, 'exp_dir': PosixPath('conformer_ctc/exp'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'full_libri': False,
   'feature_dir': PosixPath('data/fbank'), 'max_duration': 100, 'bucketing_sampler': False, 'num_buckets': 30,
   'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False,
   'shuffle': True, 'return_cuts': True, 'num_workers': 2}
   2021-09-26 12:44:31,406 INFO [lexicon.py:113] Loading pre-compiled data/lang_bpe_500/Linv.pt
   2021-09-26 12:44:31,464 INFO [decode.py:548] device: cuda:0
   2021-09-26 12:44:36,171 INFO [checkpoint.py:92] Loading checkpoint from conformer_ctc/exp/epoch-25.pt
   2021-09-26 12:44:36,776 INFO [decode.py:652] Number of model parameters: 109226120
   2021-09-26 12:44:37,714 INFO [decode.py:473] batch 0/206, cuts processed until now is 12
   2021-09-26 12:45:15,944 INFO [decode.py:473] batch 100/206, cuts processed until now is 1328
   2021-09-26 12:45:54,443 INFO [decode.py:473] batch 200/206, cuts processed until now is 2563
   2021-09-26 12:45:56,411 INFO [decode.py:494] The transcripts are stored in conformer_ctc/exp/recogs-test-clean-ctc-decoding.txt
   2021-09-26 12:45:56,592 INFO [utils.py:331] [test-clean-ctc-decoding] %WER 3.26% [1715 / 52576, 163 ins, 128 del, 1424 sub ]
   2021-09-26 12:45:56,807 INFO [decode.py:506] Wrote detailed error stats to conformer_ctc/exp/errs-test-clean-ctc-decoding.txt
   2021-09-26 12:45:56,808 INFO [decode.py:522]
   For test-clean, WER of different settings are:
   ctc-decoding    3.26    best for test-clean

   2021-09-26 12:45:57,362 INFO [decode.py:473] batch 0/203, cuts processed until now is 15
   2021-09-26 12:46:35,565 INFO [decode.py:473] batch 100/203, cuts processed until now is 1477
   2021-09-26 12:47:15,106 INFO [decode.py:473] batch 200/203, cuts processed until now is 2922
   2021-09-26 12:47:16,131 INFO [decode.py:494] The transcripts are stored in conformer_ctc/exp/recogs-test-other-ctc-decoding.txt
   2021-09-26 12:47:16,208 INFO [utils.py:331] [test-other-ctc-decoding] %WER 8.21% [4295 / 52343, 396 ins, 315 del, 3584 sub ]
   2021-09-26 12:47:16,432 INFO [decode.py:506] Wrote detailed error stats to conformer_ctc/exp/errs-test-other-ctc-decoding.txt
   2021-09-26 12:47:16,432 INFO [decode.py:522]
   For test-other, WER of different settings are:
   ctc-decoding    8.21    best for test-other

   2021-09-26 12:47:16,433 INFO [decode.py:680] Done!
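
The reported WER is simply (insertions + deletions + substitutions) divided by
the number of reference words, as the bracketed counts in the log show. For
example, the test-clean figure can be reproduced from the logged counts:

.. code-block:: bash

   $ python3 -c 'print(round(100 * (163 + 128 + 1424) / 52576, 2))'
   3.26
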
Pre-trained Model | ||||||
----------------- | ||||||