
Update the documentation to include "ctc-decoding" #71

Merged (8 commits) on Oct 9, 2021
Conversation

luomingshuang
Collaborator

This PR is a response for #59.

I mainly modified the decoding part in conformer_ctc.rst and added 'ctc-decoding' to the decoding part.

I wonder if it is necessary to modify pretrained.py to support CTC decoding and to make tdnn_lstm_ctc support CTC decoding as well. If we do this, it seems that many changes are needed.

--epoch 25 \
--avg 1 \
--max-duration 300 \
--bucketing-sampler 0 \
Collaborator

I think the decoding can be faster if you use --bucketing-sampler 1 instead, without affecting the results.

Collaborator

Test datasets and validation datasets always use a single cut sampler, I think. Please see

The bucketing sampler is used only for the training datasets.

Collaborator

Ah, good point. We might want to change that; it could speed up the decoding a bit by re-ordering the cuts to get rid of unnecessary padding. I think for training there was a nice speedup, but I don't remember the numbers (something like 1h -> 45min per epoch, back in snowfall).

Collaborator

The current code is verbatim-copied from snowfall. It's a good idea to make the test dataset also support the bucketing sampler.

$ cd egs/librispeech/ASR
$ ./conformer_ctc/decode.py \
--epoch 25 \
--avg 1 \
Collaborator

Is the averaging not helping anymore?

Collaborator Author

Maybe I can run a test.

Collaborator

From past experience, model averaging always helps.
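To illustrate what the `--avg N` option does: decoding averages the parameters of the last N saved checkpoints element-wise. The sketch below is a toy stand-in, not icefall's actual averaging code; plain dicts of floats play the role of tensor `state_dict`s, and the parameter names are made up.

```python
# Toy sketch of what checkpoint averaging (--avg N) does: element-wise
# average of the parameters from the last N checkpoints. Plain dicts of
# floats stand in for real tensor state_dicts; this is an illustration,
# not icefall's actual implementation.
def average_checkpoints(state_dicts):
    """Average a list of parameter dicts key by key."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

# Hypothetical parameter values from two consecutive epochs.
epoch_24 = {"encoder.weight": 1.0, "decoder.bias": -0.5}
epoch_25 = {"encoder.weight": 3.0, "decoder.bias": -1.5}
averaged = average_checkpoints([epoch_24, epoch_25])
print(averaged)  # {'encoder.weight': 2.0, 'decoder.bias': -1.0}
```

Averaging tends to smooth out the noise of individual checkpoints, which is why `--avg 1` usually gives worse WER than averaging several epochs.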

@csukuangfj
Collaborator

I wonder if it is necessary to modify pretrained.py to support CTC decoding and to make tdnn_lstm_ctc support CTC decoding as well. If we do this, it seems that many changes are needed.

It would be great if you could also add ctc-decoding to them. Thanks!

@@ -292,9 +292,18 @@ The commonly used options are:

- ``--method``

This specifies the decoding method.
This specifies the decoding method. This script support seven decoding methods.
@csukuangfj (Collaborator) commented Oct 8, 2021

Suggested change
This specifies the decoding method. This script support seven decoding methods.
This specifies the decoding method. This script supports 7 decoding methods.

As for ctc decoding, it uses a sentence piece model to convert word pieces to words.
And it needs neither a lexicon nor an n-gram LM.
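The word-piece-to-word conversion mentioned here is simple in principle. As a hedged sketch (the helper name is hypothetical; real code would call the sentencepiece model's decode method): SentencePiece marks word-initial pieces with "▁" (U+2581), so joining the pieces and mapping that marker back to a space recovers the words.

```python
# Sketch of the word-piece -> word step used by ctc-decoding. The helper
# name is hypothetical; actual code would use the sentencepiece model's
# decode method. SentencePiece prefixes word-initial pieces with "\u2581",
# so concatenating pieces and turning the marker into spaces recovers words.
def pieces_to_words(pieces):
    return "".join(pieces).replace("\u2581", " ").strip().split()

print(pieces_to_words(["\u2581HE", "LLO", "\u2581WORL", "D"]))
# ['HELLO', 'WORLD']
```

This is why ctc-decoding needs only the BPE model: no lexicon or n-gram LM is involved in mapping token sequences back to words.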

For example, the following command uses CTC topology for rescoring:
Collaborator

Suggested change
For example, the following command uses CTC topology for rescoring:
For example, the following command uses CTC topology for decoding:

.. code-block::

$ cd egs/librispeech/ASR
$ ./conformer_ctc/decode.py --method ctc-decoding --max-duration 300 --bucketing-sampler False
Collaborator

You don't need to specify the option --bucketing-sampler. It is used only in the training script.

--avg 1 \
--max-duration 300 \
--bucketing-sampler 0 \
--full-libri 0 \
Collaborator

Also, --full-libri is not needed in decode.py. It is used only in training.


When enabled, the batches will come from buckets of similar duration (saves padding frames).
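A toy computation (not lhotse's actual sampler) shows why buckets of similar duration save padding frames: every cut in a batch is padded up to the longest cut in that batch, so mixing short and long cuts wastes many frames. The durations below are made up for illustration.

```python
# Toy illustration (not lhotse's implementation) of the padding saved by
# bucketing: each batch pads every cut up to the longest cut in the batch.
def padding_frames(batches):
    """Total wasted (padding) frames across all batches."""
    return sum(max(batch) * len(batch) - sum(batch) for batch in batches)

durations = [100, 900, 120, 880, 110, 890]  # hypothetical cut lengths in frames

# Arbitrary order: short and long cuts land in the same batch.
random_batches = [[100, 900, 120], [880, 110, 890]]
# Bucketed: sort by duration first, then form batches of 3.
cuts = sorted(durations)
bucketed_batches = [cuts[i:i + 3] for i in range(0, len(cuts), 3)]

print(padding_frames(random_batches))    # 2370
print(padding_frames(bucketed_batches))  # 60
```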

Here are some results for reference based on CTC decoding when set vocab size as 500:
Collaborator

Suggested change
Here are some results for reference based on CTC decoding when set vocab size as 500:
Here are some results for CTC decoding with a vocab size of 500:

@@ -310,6 +319,67 @@ The commonly used options are:

It has the same meaning as the one during training. A larger
value may cause OOM.
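The `--max-duration` behavior can be sketched as a duration-constrained batcher (an illustration, not the actual lhotse sampler): cuts are accumulated until adding the next one would exceed the limit, which is why a larger value means bigger batches and a higher risk of OOM.

```python
# Illustration (not the actual lhotse sampler) of --max-duration batching:
# accumulate cuts until the next one would push the batch's total duration
# past the limit, then start a new batch.
def batch_by_duration(durations, max_duration):
    batches, current, total = [], [], 0.0
    for d in durations:
        if current and total + d > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(d)
        total += d
    if current:
        batches.append(current)
    return batches

print(batch_by_duration([120, 90, 100, 250, 60], 300))
# [[120, 90], [100], [250], [60]]
```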

- ``--bucketing-sampler``
Collaborator

Please move this argument to the training part. It is not used in decoding.

@luomingshuang
Collaborator Author

It will take some time to do it.

I wonder if it is necessary to modify pretrained.py to support CTC decoding and to make tdnn_lstm_ctc support CTC decoding as well. If we do this, it seems that many changes are needed.

It would be great if you could also add ctc-decoding to them. Thanks!

@luomingshuang
Collaborator Author

luomingshuang commented Oct 9, 2021

Thanks for all of your suggestions. I have made the modifications. I think these docs can be merged first; I will open a new PR when pretrained.py and tdnn_lstm_ctc support CTC decoding.
