How to use an external RNN-LM (mono-lingual) with a bilingual ASR? #1569

Closed
sangeet2020 opened this issue Mar 26, 2024 · 3 comments

@sangeet2020

Hi K2 team,

Thank you so much for your amazingly efficient toolkit for streaming-focused ASR.

I have trained an EN-DE bilingual streaming ASR model using this recipe.
However, I am not really satisfied with the performance on the English side, and I want to use an externally trained RNN LM (trained using this recipe) to improve the WER on the English side only.

I tried --decoding-method modified_beam_search_lm_shallow_fusion with the English RNN-LM, but ran into errors because the vocab sizes differ.
The vocab size for bilingual ASR training is 1000 (500 for EN and 500 for DE), while the vocab size of the English RNN-LM is 500.

I wonder if it's possible to use a monolingual RNN LM with a bilingual ASR model.

Alternatively, is it possible to combine two RNN-LMs, or somehow interpolate them?
I saw some related discussions here: kaldi-asr/kaldi#2069.
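For reference, one common way to combine two LMs is linear interpolation of their next-token probabilities. The sketch below is only an illustration under a strong assumption: both LMs score the same token set in the same order, which is not the case for two monolingual LMs with separate BPE vocabularies unless their IDs are first mapped to a shared vocabulary. The function name `interpolate_log_probs` and the toy numbers are hypothetical.

```python
import math


def interpolate_log_probs(logp_a, logp_b, weight_a):
    """Return log(weight_a * p_a + (1 - weight_a) * p_b) for each token."""
    if not 0.0 < weight_a < 1.0:
        raise ValueError("weight_a must be strictly between 0 and 1")
    if len(logp_a) != len(logp_b):
        raise ValueError("both LMs must score the same token set")
    log_wa = math.log(weight_a)
    log_wb = math.log(1.0 - weight_a)
    mixed = []
    for la, lb in zip(logp_a, logp_b):
        x, y = log_wa + la, log_wb + lb
        hi = max(x, y)
        # Numerically stable log(exp(x) + exp(y)).
        mixed.append(hi + math.log(math.exp(x - hi) + math.exp(y - hi)))
    return mixed


# Toy next-token distributions over a shared 3-token vocabulary (made-up values).
lm_a = [math.log(p) for p in (0.7, 0.2, 0.1)]
lm_b = [math.log(p) for p in (0.1, 0.3, 0.6)]
mixed = interpolate_log_probs(lm_a, lm_b, weight_a=0.5)
```

The result is still a normalized distribution, so it can be used wherever a single LM's log-probabilities would be used.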

Thank You

@marcoyang1998
Collaborator

I think it's possible as long as the German BPE and English BPE tokens are distinguishable.
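As a rough illustration of what "distinguishable" could mean in practice, the sketch below builds a map from bilingual ASR token IDs to English LM token IDs by matching token strings. It assumes you already know which ASR IDs belong to English (here, a contiguous ID range, which is only an assumption about how the 1000-token vocabulary was built). The helper name `build_asr_to_lm_id_map` and the toy vocabularies are hypothetical and not part of icefall.

```python
def build_asr_to_lm_id_map(asr_tokens, lm_tokens, english_asr_ids):
    """Return {asr_id: lm_id} for English pieces present in both vocabularies."""
    lm_id_of = {tok: i for i, tok in enumerate(lm_tokens)}
    mapping = {}
    for asr_id in english_asr_ids:
        tok = asr_tokens[asr_id]
        if tok in lm_id_of:
            mapping[asr_id] = lm_id_of[tok]
    return mapping


# Toy vocabularies: ASR IDs 0..3 are assumed to be English, 4..6 German.
asr_tokens = ["<blk>", "▁the", "▁cat", "s", "▁der", "▁hund", "e"]
lm_tokens = ["<blk>", "s", "▁cat", "▁the"]
id_map = build_asr_to_lm_id_map(asr_tokens, lm_tokens, range(0, 4))
print(id_map)
```

German IDs never appear in the map, so the LM is never queried for them.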

You also need to know which language you are decoding; otherwise you might end up rescoring a German utterance with the English RNNLM.
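A minimal sketch of language-gated shallow fusion, under stated assumptions: the language of the utterance is known in advance (the `is_english` flag), an ASR-to-LM token ID map is available, and ASR tokens with no LM counterpart receive a fixed floor score (`unmapped_logp`, a made-up value) so that they are not favored merely because the LM cannot score them. None of these names come from icefall.

```python
def fuse_scores(asr_logp, lm_logp, asr_to_lm, lm_scale, is_english,
                unmapped_logp=-20.0):
    """Add scaled LM log-probs to per-token ASR log-probs for English only."""
    if not is_english:
        # Leave German utterances untouched: the English LM is not applied.
        return list(asr_logp)
    fused = []
    for asr_id, score in enumerate(asr_logp):
        lm_id = asr_to_lm.get(asr_id)
        # Tokens the LM does not know get the floor score instead of no penalty.
        lm_score = lm_logp[lm_id] if lm_id is not None else unmapped_logp
        fused.append(score + lm_scale * lm_score)
    return fused


# Toy example: 3 ASR tokens, 2 LM tokens; ASR token 2 has no LM counterpart.
asr_logp = [-1.0, -2.0, -0.5]
lm_logp = [-0.1, -3.0]
asr_to_lm = {0: 1, 1: 0}
english = fuse_scores(asr_logp, lm_logp, asr_to_lm, lm_scale=0.5, is_english=True)
german = fuse_scores(asr_logp, lm_logp, asr_to_lm, lm_scale=0.5, is_english=False)
```

How `is_english` is obtained (metadata, a language-ID model, or a per-dataset switch) is left open here.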

@sangeet2020
Author

sangeet2020 commented Mar 28, 2024

But wouldn't the different vocab sizes of the BPE models for the ASR and the RNN-LM create an issue in the first place?

When loading the RNN LM:

            model = RnnLmModel(
                vocab_size=params.vocab_size,
                embedding_dim=params.rnn_lm_embedding_dim,
                hidden_dim=params.rnn_lm_hidden_dim,
                num_layers=params.rnn_lm_num_layers,
                tie_weights=params.rnn_lm_tie_weights,
            )

params.vocab_size is the size of the SentencePiece tokenizer from the ASR model (1000 in my case), which is different from the actual RNN LM vocab size (500 in my case). How can I overcome this?

@marcoyang1998
Collaborator

You need to change the code. I only meant that it's theoretically possible to use a mono-lingual RNNLM to rescore a multi-lingual ASR model.
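One possible shape of that code change, as a sketch only: keep the LM's vocab size in its own field instead of reusing `params.vocab_size`. The field name `rnn_lm_vocab_size` and the helper `rnn_lm_constructor_kwargs` are hypothetical, and the numeric values other than 1000 and 500 are placeholders. Token IDs passed to the LM would still need to be mapped from the ASR vocabulary to the LM vocabulary.

```python
from types import SimpleNamespace


def rnn_lm_constructor_kwargs(params):
    """Collect RnnLmModel keyword arguments using the LM's own vocab size."""
    return {
        # Hypothetical field: the LM's vocab size, not the ASR tokenizer's.
        "vocab_size": params.rnn_lm_vocab_size,
        "embedding_dim": params.rnn_lm_embedding_dim,
        "hidden_dim": params.rnn_lm_hidden_dim,
        "num_layers": params.rnn_lm_num_layers,
        "tie_weights": params.rnn_lm_tie_weights,
    }


# Stand-in for the parsed command-line parameters (placeholder values).
params = SimpleNamespace(
    vocab_size=1000,          # bilingual ASR tokenizer
    rnn_lm_vocab_size=500,    # English RNN LM tokenizer
    rnn_lm_embedding_dim=2048,
    rnn_lm_hidden_dim=2048,
    rnn_lm_num_layers=3,
    rnn_lm_tie_weights=True,
)
kwargs = rnn_lm_constructor_kwargs(params)
# model = RnnLmModel(**kwargs)  # RnnLmModel as in the snippet quoted earlier
```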

@JinZr JinZr closed this as completed Apr 20, 2024