Update the documentation to include "ctc-decoding" #71
Conversation
--epoch 25 \
--avg 1 \
--max-duration 300 \
--bucketing-sampler 0 \
I think the decoding can be faster if you use --bucketing-sampler 1 instead, without affecting the results.
Test datasets and validation datasets always use a single cut sampler, I think. Please see

sampler = SingleCutSampler(

The bucketing sampler is used only in the training datasets.
Ah, good point. We might want to change that; it could speed up the decoding a bit by re-ordering the cuts to get rid of unnecessary padding. I think for training there was a nice speedup, but I don't remember the numbers (something like 1h -> 45min per epoch; that was back in snowfall).
The current code is verbatim-copied from snowfall. It's a good idea to make the test dataset also support the bucketing sampler.
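For what it's worth, a rough sketch of what a switchable test dataloader could look like. This assumes lhotse's sampler API of that time (`SingleCutSampler` / `BucketingSampler`), and `make_test_dataloader` is a hypothetical helper name, not code from this repo:

```python
from torch.utils.data import DataLoader
from lhotse.dataset import K2SpeechRecognitionDataset
from lhotse.dataset.sampling import BucketingSampler, SingleCutSampler


def make_test_dataloader(cuts, max_duration: float = 300.0, bucketing: bool = False):
    """Hypothetical helper: build a decoding dataloader with either sampler."""
    if bucketing:
        # Group cuts of similar duration into buckets so each batch
        # carries less padding; shuffle=False keeps decoding deterministic.
        sampler = BucketingSampler(
            cuts, max_duration=max_duration, shuffle=False, num_buckets=30
        )
    else:
        # Current behavior for test/validation sets.
        sampler = SingleCutSampler(cuts, max_duration=max_duration, shuffle=False)
    dataset = K2SpeechRecognitionDataset(return_cuts=True)
    # batch_size=None: the sampler already yields whole batches of cuts.
    return DataLoader(dataset, sampler=sampler, batch_size=None, num_workers=1)
```

As noted above, re-ordering the cuts should only change the speed, not the results.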
$ cd egs/librispeech/ASR
$ ./conformer_ctc/decode.py \
    --epoch 25 \
    --avg 1 \
Is the averaging not helping anymore?
Maybe I can run a test.
From past experience, model averaging always helps.
Would be great if you also add …
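For readers of this thread: `--avg N` averages the parameters of the last N checkpoints before decoding. A minimal sketch of the idea in plain PyTorch; this is an illustration, not icefall's actual helper, and it assumes each checkpoint stores a state_dict of floating-point tensors under a "model" key:

```python
import torch


def average_models(filenames):
    """Element-wise average of model parameters across checkpoints.

    Simplified sketch: assumes every checkpoint file holds a dict with
    a "model" state_dict of floating-point tensors.
    """
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for f in filenames[1:]:
        state = torch.load(f, map_location="cpu")["model"]
        for k in avg:
            avg[k] += state[k]
    for k in avg:
        avg[k] /= len(filenames)
    return avg


# e.g. --epoch 25 --avg 5 would average epochs 21..25 (hypothetical paths):
# model.load_state_dict(average_models([f"exp/epoch-{i}.pt" for i in range(21, 26)]))
```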
@@ -292,9 +292,18 @@ The commonly used options are:

- ``--method``

  This specifies the decoding method. This script support seven decoding methods.
Suggested change:
- This specifies the decoding method. This script support seven decoding methods.
+ This specifies the decoding method. This script supports 7 decoding methods.
As for ctc decoding, it uses a sentence piece model to convert word pieces to words.
And it needs neither a lexicon nor an n-gram LM.
For example, the following command uses CTC topology for rescoring: |
Suggested change:
- For example, the following command uses CTC topology for rescoring:
+ For example, the following command uses CTC topology for decoding:
.. code-block::

   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/decode.py --method ctc-decoding --max-duration 300 --bucketing-sampler False
You don't need to specify the option --bucketing-sampler. It is used only in the training script.
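On the ctc-decoding method itself: the docs above say it converts word pieces to words with a sentence piece model and needs neither a lexicon nor an n-gram LM. A tiny sketch of that conversion step, with a hypothetical model path and made-up token IDs:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe/bpe.model")  # hypothetical path to the trained BPE model

# CTC decoding yields a sequence of word-piece IDs for the best path;
# the sentence piece model maps them straight back to words.
token_ids = [23, 118, 4]  # made-up IDs, for illustration only
print(sp.decode(token_ids))
```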
--avg 1 \
--max-duration 300 \
--bucketing-sampler 0 \
--full-libri 0 \
Also, --full-libri is not needed in decode.py. It is used only in training.
When enabled, the batches will come from buckets of similar duration (saves padding frames).

Here are some results for reference based on CTC decoding when set vocab size as 500:
Suggested change:
- Here are some results for reference based on CTC decoding when set vocab size as 500:
+ Here are some results for CTC decoding with a vocab size of 500:
@@ -310,6 +319,67 @@ The commonly used options are:

  It has the same meaning as the one during training. A larger
  value may cause OOM.

- ``--bucketing-sampler``
Please move this argument to the training part. It is not used in decoding.
It will take some time to do it.
Thanks for all of your suggestions. The docs have been modified. I think these docs can be merged first. I will open a PR when pretrained.py and tdnn_lstm_ctc support ctc decoding.
This PR is a response to #59.
I mainly modify the decoding part in conformer_ctc.rst and add 'ctc-decoding' to the decoding methods.
I wonder whether it is also necessary to modify pretrained.py to support ctc decoding and to make tdnn_lstm_ctc support it. If we do this, it seems many changes are needed.