Add CTC HLG decoding for zipformer #1287

Merged
merged 3 commits on Oct 2, 2023

csukuangfj
Collaborator

@csukuangfj csukuangfj commented Oct 2, 2023

You can find a pre-trained zipformer CTC model, together with H.fst, HL.fst, and HLG.fst, in the following repo:
https://huggingface.co/csukuangfj/sherpa-onnx-zipformer-ctc-en-2023-10-02

HLG decoding

cd egs/librispeech/ASR

./zipformer/onnx_pretrained_ctc_HLG.py \
 --nn-model sherpa-onnx-zipformer-ctc-en-2023-10-02/model.onnx \
 --words sherpa-onnx-zipformer-ctc-en-2023-10-02/words.txt \
 --HLG ./sherpa-onnx-zipformer-ctc-en-2023-10-02/HLG.fst \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/0.wav  \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/1.wav \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/2.wav 

HL decoding

cd egs/librispeech/ASR

./zipformer/onnx_pretrained_ctc_HL.py \
 --nn-model sherpa-onnx-zipformer-ctc-en-2023-10-02/model.onnx \
 --words sherpa-onnx-zipformer-ctc-en-2023-10-02/words.txt \
 --HL ./sherpa-onnx-zipformer-ctc-en-2023-10-02/HL.fst \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/0.wav  \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/1.wav \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/2.wav 

H decoding

cd egs/librispeech/ASR

./zipformer/onnx_pretrained_ctc_H.py \
 --nn-model sherpa-onnx-zipformer-ctc-en-2023-10-02/model.onnx \
 --tokens sherpa-onnx-zipformer-ctc-en-2023-10-02/tokens.txt \
 --H ./sherpa-onnx-zipformer-ctc-en-2023-10-02/H.fst \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/0.wav  \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/1.wav \
 ./sherpa-onnx-zipformer-ctc-en-2023-10-02/test_wavs/2.wav 

@csukuangfj csukuangfj added the ctc label Oct 2, 2023
@csukuangfj csukuangfj added ctc and removed ctc labels Oct 2, 2023
@csukuangfj csukuangfj merged commit 109354b into k2-fsa:master Oct 2, 2023
34 checks passed
@csukuangfj csukuangfj deleted the ctc-zipformer-hlg branch October 2, 2023 06:00
@armusc
Contributor

armusc commented Oct 27, 2023

Hi,
I was trying HLG decoding with a Kaldi decoder, i.e. onnx_pretrained_ctc_HLG.py.
The ONNX model is fine, because decoding without HLG gives good results.
I built the HLG using convert-k2-to-openfst.py.
Printing the ilabels from the HLG best-path decoding, I can see that the token IDs are shifted by 1 with respect to the tokens decoded without HLG (this is somewhat explained in the comment "# are shifted by 1 during graph construction").
The words on the olabel side are definitely not correct; they correspond to the "shifted by 1" tokenization.
Should I have done something different in the HLG conversion?

Thanks
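
The off-by-one observation above can be illustrated with a minimal sketch. It assumes, as the quoted code comment suggests, that the graph stores each token ID plus 1 on the input side so that label 0 stays reserved for epsilon. The helper name `ilabels_to_token_ids` and the toy token table are hypothetical and are not taken from icefall; the sketch also does not perform any CTC blank or repeat removal.

```python
from typing import Dict, List

EPS = 0  # OpenFst convention: label 0 is reserved for epsilon


def ilabels_to_token_ids(ilabels: List[int]) -> List[int]:
    """Drop epsilons and undo the (assumed) +1 shift applied at graph construction."""
    return [label - 1 for label in ilabels if label != EPS]


# Hypothetical toy token table, indexed by the model's own token IDs.
id2token: Dict[int, str] = {0: "<blk>", 1: "HE", 2: "LLO", 3: "WORLD"}

# Input labels as they might appear on a best path through the graph.
ilabels = [0, 2, 3, 0, 4]
token_ids = ilabels_to_token_ids(ilabels)
tokens = [id2token[t] for t in token_ids]

print(token_ids)
print(tokens)
```

If the olabels come out wrong as well, as described above, fixing the shift after decoding is not enough: under this assumption the graph itself was built with a different label convention than the decoding script expects.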

@csukuangfj
Collaborator Author

@armusc

Please have a look at

./local/prepare_lang_fst.py \
--lang-dir $lang_dir \
--ngram-G ./data/lm/G_3_gram.fst.txt

You need to use ./local/prepare_lang_fst.py to generate H.fst, HL.fst and HLG.fst.

Any other scripts won't work, e.g., convert-k2-to-openfst.py will not work here.

Sorry for the confusion. We will update the documentation to describe how to use it.

@armusc
Contributor

armusc commented Oct 28, 2023

thanks, it works

@armusc
Contributor

armusc commented Oct 28, 2023

Is HLG decoding with the Kaldi faster-decoder supposed to give the same results as k2/icefall 1best decoding on the CTC output?
I observe a degradation, especially in deletions, regardless of the beam or max-active value.
In icefall 1best, a lattice is computed first and the 1best path is extracted from it. I find it beneficial to set the hlg_scale parameter (usually around 0.3 is optimal); increasing this value leads to more deletions. (And by the way, would it be possible to use the Kaldi latgen-decoder on the CTC output to extract the 1best from a Kaldi lattice? Would that make any difference?)
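
The effect of hlg_scale described above can be sketched with a toy example. It assumes the decoder ranks each path by its acoustic log-score plus hlg_scale times its graph log-score, which is one plausible reading of the parameter and not a statement about the exact icefall or Kaldi implementation. The two hypotheses and all scores are made up for illustration.

```python
from typing import List, Tuple

# (hypothesis, acoustic log-score, graph log-score); all values are made up.
Path = Tuple[str, float, float]

paths: List[Path] = [
    # Full hypothesis: better acoustic fit, but more words means more graph cost.
    ("THE CAT SAT ON THE MAT", -10.0, -24.0),
    # One word deleted: worse acoustic fit, but cheaper in the graph.
    ("THE CAT SAT ON MAT", -14.0, -16.0),
]


def best_path(paths: List[Path], hlg_scale: float) -> str:
    """Return the hypothesis maximizing acoustic + hlg_scale * graph score."""
    return max(paths, key=lambda p: p[1] + hlg_scale * p[2])[0]


for scale in (0.3, 1.0):
    print(scale, "->", best_path(paths, scale))
```

With these numbers the full hypothesis wins at a scale of 0.3 and the shorter one wins at 1.0, which mirrors the reported trend that a larger graph scale favors deletions. A decoder that applies no such scaling would behave like the 1.0 case in this sketch.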

@csukuangfj
Copy link
Collaborator Author

> and btw, would it be possible to use the kaldi latgen-decoder on the ctc output to extract the 1 best from a kaldi lattice

Yes, it is possible. I am working on it. Instead of only the 1best, you can get the N-best, where N >= 1.
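
To make the relationship between 1best and N-best concrete, here is a minimal sketch on a toy lattice. It is not the Kaldi lattice decoder and does not use any Kaldi or k2 API; the lattice layout, the scores, and the function names `all_paths` and `nbest` are hypothetical. It enumerates every path of a small acyclic lattice, which is only practical for toy inputs.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# lattice[state] -> list of (next_state, word, log_score); a toy acyclic lattice.
Arc = Tuple[int, str, float]
lattice: Dict[int, List[Arc]] = {
    0: [(1, "A", -1.0), (1, "THE", -0.5)],
    1: [(2, "CAT", -0.2), (2, "CAP", -1.5)],
    2: [],  # final state
}


def all_paths(
    lattice: Dict[int, List[Arc]], state: int, final: int
) -> Iterator[Tuple[float, List[str]]]:
    """Yield (total log-score, words) for every path from state to final."""
    if state == final:
        yield 0.0, []
        return
    for next_state, word, score in lattice[state]:
        for tail_score, tail_words in all_paths(lattice, next_state, final):
            yield score + tail_score, [word] + tail_words


def nbest(
    lattice: Dict[int, List[Arc]], n: int, start: int = 0, final: int = 2
) -> List[Tuple[float, List[str]]]:
    """Return the n highest-scoring paths, best first."""
    return heapq.nlargest(n, all_paths(lattice, start, final), key=lambda p: p[0])


for score, words in nbest(lattice, 2):
    print(round(score, 2), " ".join(words))
```

Calling `nbest(lattice, 1)` returns only the single best path, so 1best is the N = 1 special case of the same extraction.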
