Update the documentation to include "ctc-decoding" #71
Conversation
--epoch 25 \
--avg 1 \
--max-duration 300 \
--bucketing-sampler 0 \
I think the decoding can be faster if you use --bucketing-sampler 1 instead, without affecting the results.
Test datasets and validation datasets always use a single cut sampler, I think. Please see

sampler = SingleCutSampler(

The bucketing sampler is used only in the training datasets.
Ah, good point. We might want to change that; it could speed up the decoding a bit by re-ordering the cuts to get rid of unnecessary padding. I think for training there was a nice speedup, but I don't remember the numbers (something like 1h -> 45min per epoch; that was back in snowfall).
The current code is verbatim-copied from snowfall. It's a good idea to make the test dataset also support the bucketing sampler.
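For what it's worth, a rough sketch of what a switchable test dataloader could look like. This assumes lhotse's sampler API of that time (`SingleCutSampler` / `BucketingSampler`), and `make_test_dataloader` is a hypothetical helper name, not code from this repo:

```python
from torch.utils.data import DataLoader
from lhotse.dataset import K2SpeechRecognitionDataset
from lhotse.dataset.sampling import BucketingSampler, SingleCutSampler


def make_test_dataloader(cuts, max_duration: float = 300.0, bucketing: bool = False):
    """Hypothetical helper: build a decoding dataloader with either sampler."""
    if bucketing:
        # Group cuts of similar duration into buckets so each batch
        # carries less padding; shuffle=False keeps decoding deterministic.
        sampler = BucketingSampler(
            cuts, max_duration=max_duration, shuffle=False, num_buckets=30
        )
    else:
        # Current behavior for test/validation sets.
        sampler = SingleCutSampler(cuts, max_duration=max_duration, shuffle=False)
    dataset = K2SpeechRecognitionDataset(return_cuts=True)
    # batch_size=None: the sampler already yields whole batches of cuts.
    return DataLoader(dataset, sampler=sampler, batch_size=None, num_workers=1)
```

As noted above, re-ordering the cuts should only change the speed, not the results.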
$ cd egs/librispeech/ASR
$ ./conformer_ctc/decode.py \
    --epoch 25 \
    --avg 1 \
Is the averaging not helping anymore?
Maybe I can run a test.
From past experience, model averaging always helps.
Would be great if you also add …
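For readers of this thread: `--avg N` averages the parameters of the last N checkpoints before decoding. A minimal sketch of the idea in plain PyTorch; this is an illustration, not icefall's actual helper, and it assumes each checkpoint stores a state_dict of floating-point tensors under a "model" key:

```python
import torch


def average_models(filenames):
    """Element-wise average of model parameters across checkpoints.

    Simplified sketch: assumes every checkpoint file holds a dict with
    a "model" state_dict of floating-point tensors.
    """
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for f in filenames[1:]:
        state = torch.load(f, map_location="cpu")["model"]
        for k in avg:
            avg[k] += state[k]
    for k in avg:
        avg[k] /= len(filenames)
    return avg


# e.g. --epoch 25 --avg 5 would average epochs 21..25 (hypothetical paths):
# model.load_state_dict(average_models([f"exp/epoch-{i}.pt" for i in range(21, 26)]))
```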
@@ -292,9 +292,18 @@ The commonly used options are:

- ``--method``

  This specifies the decoding method. This script support seven decoding methods.
Suggested change:
- This specifies the decoding method. This script support seven decoding methods.
+ This specifies the decoding method. This script supports 7 decoding methods.
As for ctc decoding, it uses a sentence piece model to convert word pieces to words.
And it needs neither a lexicon nor an n-gram LM.
For example, the following command uses CTC topology for rescoring: |
Suggested change:
- For example, the following command uses CTC topology for rescoring:
+ For example, the following command uses CTC topology for decoding:
.. code-block::

   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/decode.py --method ctc-decoding --max-duration 300 --bucketing-sampler False
You don't need to specify the option --bucketing-sampler. It is used only in the training script.
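On the ctc-decoding method itself: the docs above say it converts word pieces to words with a sentence piece model and needs neither a lexicon nor an n-gram LM. A tiny sketch of that conversion step, with a hypothetical model path and made-up token IDs:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe/bpe.model")  # hypothetical path to the trained BPE model

# CTC decoding yields a sequence of word-piece IDs for the best path;
# the sentence piece model maps them straight back to words.
token_ids = [23, 118, 4]  # made-up IDs, for illustration only
print(sp.decode(token_ids))
```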
--avg 1 \
--max-duration 300 \
--bucketing-sampler 0 \
--full-libri 0 \
Also, --full-libri is not needed in decode.py. It is used only in training.
When enabled, the batches will come from buckets of similar duration (saves padding frames).

Here are some results for reference based on CTC decoding when set vocab size as 500:
Suggested change:
- Here are some results for reference based on CTC decoding when set vocab size as 500:
+ Here are some results for CTC decoding with a vocab size of 500:
@@ -310,6 +319,67 @@ The commonly used options are:

  It has the same meaning as the one during training. A larger
  value may cause OOM.

- ``--bucketing-sampler``
Please move this argument to the training part. It is not used in decoding.
It will take some time to do it.
Thanks for all of your suggestions. The docs have been modified. I think these docs can be merged first. I will open a PR when pretrained.py and tdnn_lstm_ctc support ctc decoding.
This PR is a response to #59.
I mainly modify the decoding part in conformer_ctc.rst and add 'ctc-decoding' to the decoding methods.
I wonder whether it is also necessary to modify pretrained.py to support ctc decoding and to make tdnn_lstm_ctc support it. If we do this, it seems many changes are needed.