
Support computing nbest oracle WER. #10

Merged (11 commits) on Aug 20, 2021

Conversation

csukuangfj (Collaborator)

The nbest oracle WER can help us evaluate different n-best rescoring methods,
as it is the best WER we could get if we had a perfect rescoring method.
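For reference, here is a minimal sketch of the idea in plain Python (not icefall's actual implementation; refs and nbest_lists are hypothetical inputs holding lists of words):

def edit_distance(ref, hyp):
    # Levenshtein distance between two word lists, single-row DP.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def oracle_wer(refs, nbest_lists):
    # For each utterance, keep the hypothesis closest to the reference;
    # the resulting WER lower-bounds what any rescoring method can achieve.
    errors = sum(min(edit_distance(ref, hyp) for hyp in hyps)
                 for ref, hyps in zip(refs, nbest_lists))
    return errors / sum(len(ref) for ref in refs)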

@@ -56,6 +57,15 @@ def get_parser():
"consecutive checkpoints before the checkpoint specified by "
"'--epoch'. ",
)

parser.add_argument(
"--scale",
Collaborator

If this scale is only used for the nbest-oracle mode, perhaps that should be clarified, e.g. via the name and the documentation? Right now it is a bit unclear whether it would affect other things.

Collaborator Author

I think it is also useful for other n-best rescoring methods, e.g., attention-decoder rescoring. Tuning this value can
change the number of unique paths in an n-best list, which can potentially affect the final WER.

I'm adding more documentation to clarify its usage.
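For reference, a minimal sketch of how the scale interacts with n-best sampling (illustrative only: the function below is made up, though k2.random_paths is a real k2 API):

import k2

def sample_nbest_paths(lattice: k2.Fsa, scale: float, num_paths: int):
    # Scale the arc scores before sampling. A smaller scale flattens the
    # score distribution, so k2.random_paths draws more diverse paths and
    # the resulting n-best list contains more unique ones.
    saved = lattice.scores.clone()
    lattice.scores *= scale
    paths = k2.random_paths(lattice, use_double_scores=True, num_paths=num_paths)
    lattice.scores = saved  # restore the original scores
    return paths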

@csukuangfj (Collaborator Author) commented Aug 18, 2021

The following screenshot shows the nbest oracle WER for different scale values on the LibriSpeech test-clean and test-other datasets.

Note:

  • A cell like "1.96 || 4.83" means the WER for test-clean is 1.96 and the WER for test-other is 4.83.
  • "lattice from HLG decoding" means the lattice comes from decoding with HLG alone, without LM rescoring and without the attention decoder.
  • "HLG + 4-gram whole lattice rescoring" means the lattice is the one obtained after 4-gram whole-lattice rescoring.
  • In both cases, the transformer attention decoder is not used.
  • For the model we are using to test the nbest oracle WER, its WER with the attention decoder is 2.76 || 6.4.

[Screenshot: nbest oracle WER for different scale values, 2021-08-18]

@csukuangfj (Collaborator Author) commented Aug 18, 2021

The number of unique paths increases when we use a smaller scale value. The following screenshots show this change.

@@ -0,0 +1,27 @@

Collaborator Author

This is how a pre-trained model can be used to transcribe a sound file.
@danpovey

It depends on

  • torchaudio, for reading sound files
  • kaldifeat, for feature extraction
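For reference, a minimal sketch of those two dependencies in action (the file path and option values here are illustrative, not necessarily what the script uses):

import torchaudio
import kaldifeat

# Read a sound file; torchaudio returns (num_channels, num_samples).
wave, sample_rate = torchaudio.load("test_wavs/1089-134686-0001.flac")

# Build an 80-dim fbank computer with kaldifeat.
opts = kaldifeat.FbankOptions()
opts.frame_opts.samp_freq = sample_rate
opts.mel_opts.num_bins = 80
fbank = kaldifeat.Fbank(opts)

features = fbank(wave[0])  # a (num_frames, 80) tensor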

Collaborator Author

Only HLG decoding with the transformer encoder output is added.
Do we need to use the attention decoder for rescoring?


Collaborator

This is great-- thanks!
Regarding using the attention decoder for rescoring-- yes, I'd like you to add that, because this will probably
be a main feature of the tutorial, and I think having good results is probably worthwhile.


features = features.unsqueeze(0)
logging.info("Decoding started")
features = fbank(waves)
Collaborator Author

Replacing torchaudio.compliance.kaldi with kaldifeat,
since kaldifeat makes it easier to extract features from
multiple sound files at the same time.
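A short sketch of why (setup values are illustrative): kaldifeat accepts a list of 1-D waveforms of different lengths, so a batch of files needs no padding at the feature-extraction stage.

import torchaudio
import kaldifeat

opts = kaldifeat.FbankOptions()
opts.frame_opts.samp_freq = 16000
opts.mel_opts.num_bins = 80
fbank = kaldifeat.Fbank(opts)

filenames = ["a.flac", "b.flac", "c.flac"]  # hypothetical files
waves = [torchaudio.load(f)[0][0] for f in filenames]  # 1-D tensors of varying lengths
features = fbank(waves)  # a list of (num_frames_i, 80) tensors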

Collaborator

Nice. Adding kaldifeat to Lhotse is still on my radar. I might remove all the other Kaldi-related feature extractors at the same time, but I don't think I'll be able to do it before the tutorial.

@csukuangfj (Collaborator Author)

Now it supports transcribing multiple files with LM rescoring and attention decoder rescoring.

Ready for review.

@danpovey (Collaborator) left a comment

Great!
Perhaps we can mention concretely where one might obtain this checkpoint, words.txt and HLG.pt, if someone were to try to run this without having trained the system? E.g. download location?

@csukuangfj (Collaborator Author) commented Aug 19, 2021

@pkufool

Could you please upload the following files:

  • best model with model averaging, without optimizer and scheduler information (a sketch of producing such a file follows this list)
  • data/lang_bpe/HLG.pt
  • data/lang_bpe/words.txt
  • data/lm/G_4_gram.pt
  • data/lang_bpe/tokens.txt (so we know the SOS and EOS IDs)
  • data/lang_bpe/bpe.model
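For the first item, a minimal sketch of how such a file could be produced, assuming each checkpoint stores its weights under a "model" key (icefall's actual helpers may differ):

import torch

filenames = ["exp/epoch-25.pt", "exp/epoch-26.pt"]  # hypothetical checkpoints
avg = None
for f in filenames:
    state = torch.load(f, map_location="cpu")["model"]
    if avg is None:
        avg = {k: v.clone().float() for k, v in state.items()}
    else:
        for k in avg:
            avg[k] += state[k].float()
for k in avg:
    avg[k] /= len(filenames)

# Save only the averaged weights: no optimizer or scheduler state.
torch.save({"model": avg}, "exp/averaged.pt")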

@csukuangfj (Collaborator Author)

Perhaps we can mention concretely where one might obtain this checkpoint, words.txt and HLG.pt, if someone were to try to run this without having trained the system? E.g. download location?

I just added some detailed documentation to show how to download and use a pre-trained model, uploaded by @pkufool.

You can find a preview by visiting
https://github.com/k2-fsa/icefall/blob/acefc703226997b0ecc543e7464cf698220ed4e2/egs/librispeech/ASR/conformer_ctc/README.md


I will also create a Colab notebook to show how to use the pre-trained model.


Ready to merge.

@danpovey (Collaborator)

Wow-- very nice and complete documentation!
LGTM.

@csukuangfj (Collaborator Author)

Here are the logs from using the CPU to transcribe the test waves. They are useful if someone wants to compare
the decoding time between CUDA and CPU from the logs, without running the code.

(1) HLG decoding

$ CUDA_VISIBLE_DEVICES= ./conformer_ctc/pretrained.py \
--checkpoint ./tmp/conformer_ctc/exp/pretraind.pt \
--words-file ./tmp/conformer_ctc/data/lang_bpe/words.txt \
--HLG ./tmp/conformer_ctc/data/lang_bpe/HLG.pt \
./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac \
./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac \
./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac
2021-08-20 11:44:02,306 INFO [pretrained.py:217] device: cpu
2021-08-20 11:44:02,306 INFO [pretrained.py:219] Creating model
2021-08-20 11:44:03,210 INFO [pretrained.py:238] Loading HLG from ./tmp/conformer_ctc/data/lang_bpe/HLG.pt
2021-08-20 11:44:08,006 INFO [pretrained.py:255] Constructing Fbank computer
2021-08-20 11:44:08,008 INFO [pretrained.py:265] Reading sound files: ['./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0002.flac']
2021-08-20 11:44:08,017 INFO [pretrained.py:271] Decoding started
2021-08-20 11:44:18,029 INFO [pretrained.py:300] Use HLG decoding
2021-08-20 11:44:18,392 INFO [pretrained.py:339]
./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac:
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac:
GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


2021-08-20 11:44:18,393 INFO [pretrained.py:341] Decoding Done

(2) HLG decoding + LM rescoring

$ CUDA_VISIBLE_DEVICES= ./conformer_ctc/pretrained.py \
--checkpoint ./tmp/conformer_ctc/exp/pretraind.pt \
--words-file ./tmp/conformer_ctc/data/lang_bpe/words.txt \
--HLG ./tmp/conformer_ctc/data/lang_bpe/HLG.pt \
--method whole-lattice-rescoring \
--G ./tmp/conformer_ctc/data/lm/G_4_gram.pt \
--ngram-lm-scale 0.8 \
./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac \
./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac \
./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac
2021-08-20 11:46:26,077 INFO [pretrained.py:217] device: cpu
2021-08-20 11:46:26,077 INFO [pretrained.py:219] Creating model
2021-08-20 11:46:26,980 INFO [pretrained.py:238] Loading HLG from ./tmp/conformer_ctc/data/lang_bpe/HLG.pt
2021-08-20 11:46:32,169 INFO [pretrained.py:246] Loading G from ./tmp/conformer_ctc/data/lm/G_4_gram.pt
2021-08-20 11:47:16,114 INFO [pretrained.py:255] Constructing Fbank computer
2021-08-20 11:47:16,118 INFO [pretrained.py:265] Reading sound files: ['./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0002.flac']
2021-08-20 11:47:16,129 INFO [pretrained.py:271] Decoding started
2021-08-20 11:47:26,052 INFO [pretrained.py:305] Use HLG decoding + LM rescoring
2021-08-20 11:47:27,805 INFO [pretrained.py:339]
./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac:
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac:
GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


2021-08-20 11:47:27,806 INFO [pretrained.py:341] Decoding Done

(3) HLG decoding + LM rescoring + attention decoder rescoring

$ CUDA_VISIBLE_DEVICES= ./conformer_ctc/pretrained.py \
--checkpoint ./tmp/conformer_ctc/exp/pretraind.pt \
--words-file ./tmp/conformer_ctc/data/lang_bpe/words.txt \
--HLG ./tmp/conformer_ctc/data/lang_bpe/HLG.pt \
--method attention-decoder \
--G ./tmp/conformer_ctc/data/lm/G_4_gram.pt \
--ngram-lm-scale 1.3 \
--attention-decoder-scale 1.2 \
--lattice-score-scale 0.5 \
--num-paths 100 \
--sos-id 1 \
--eos-id 1 \
./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac \
./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac \
./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac
2021-08-20 11:50:58,383 INFO [pretrained.py:217] device: cpu
2021-08-20 11:50:58,383 INFO [pretrained.py:219] Creating model
2021-08-20 11:50:59,271 INFO [pretrained.py:238] Loading HLG from ./tmp/conformer_ctc/data/lang_bpe/HLG.pt
2021-08-20 11:51:05,072 INFO [pretrained.py:246] Loading G from ./tmp/conformer_ctc/data/lm/G_4_gram.pt
2021-08-20 11:51:49,799 INFO [pretrained.py:255] Constructing Fbank computer
2021-08-20 11:51:49,803 INFO [pretrained.py:265] Reading sound files: ['./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0002.flac']
2021-08-20 11:51:49,813 INFO [pretrained.py:271] Decoding started
2021-08-20 11:52:00,036 INFO [pretrained.py:313] Use HLG + LM rescoring + attention decoder rescoring
2021-08-20 11:52:02,372 INFO [pretrained.py:339]
./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac:
AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac:
GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN

./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION


2021-08-20 11:52:02,372 INFO [pretrained.py:341] Decoding Done

@csukuangfj csukuangfj merged commit 9d0cc9d into k2-fsa:master Aug 20, 2021
@danpovey (Collaborator)

BTW, for this thing where we transcribe the waves, it would be nice to know how much we are being affected by batches being too irregular. It should be possible to find out how big the WER impact of this is by changing the lhotse options for the sampler used in our test code.
In speechbrain, Mirco was working on ways to make the conformer code independent of the batching:
speechbrain/speechbrain#933
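For instance, a sketch of the kind of sampler change being suggested (the manifest path is hypothetical; sampler behavior as I understand lhotse's current API):

from lhotse import load_manifest
from lhotse.dataset import BucketingSampler, SingleCutSampler

cuts = load_manifest("data/cuts_test-clean.json.gz")  # hypothetical path

# Groups cuts of similar duration, so batches are fairly regular.
regular = BucketingSampler(cuts, max_duration=200, num_buckets=30)

# Draws cuts regardless of duration, so batches mix lengths freely.
irregular = SingleCutSampler(cuts, max_duration=200)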
