LM Rescoring for Seamless text decoder #366

Open
Sameep-c opened this issue Feb 27, 2024 · 1 comment

Comments

@Sameep-c

Can we use an external language model such as KenLM to rescore the output of the Seamless M4T text decoder, for tasks such as ASR or S2T translation?

@avidale
Contributor

avidale commented Feb 27, 2024

Of course we can!
The challenging part would be properly aligning the tokens of the language model with those of Seamless. I am not sure there is code that you can apply out of the box for this, but it is certainly a solvable task.

But I think that LM rescoring with Seamless doesn't make as much sense as with CTC-based ASR models, because the Seamless text decoder is already an autoregressive transformer language model on its own.
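One way to sidestep the token alignment problem is to rescore whole hypotheses instead of individual decoding steps: take an n-best list of detokenized outputs, score each string with the external LM, and re-rank by a weighted sum. The following is only a minimal sketch of that idea, not code from Seamless. The names `rescore_nbest`, `toy_lm_log_prob`, and `lm_weight` are hypothetical, the n-best list is assumed to have been produced elsewhere (for example by beam search), and a toy unigram model stands in for the external LM so the example runs without extra dependencies.

```python
from typing import Callable, Dict, List, Tuple

# (detokenized text, decoder log-probability); assumed to come from beam search.
Hypothesis = Tuple[str, float]


def rescore_nbest(
    hypotheses: List[Hypothesis],
    lm_log_prob: Callable[[str], float],
    lm_weight: float = 0.3,
) -> List[Hypothesis]:
    """Re-rank hypotheses by decoder score + lm_weight * LM score, best first."""
    rescored = [
        (text, decoder_score + lm_weight * lm_log_prob(text))
        for text, decoder_score in hypotheses
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for an external LM: a unigram model over whitespace-split words.
# A real LM (e.g. KenLM) would be plugged in through the same callable interface.
TOY_UNIGRAM_LOG_PROBS: Dict[str, float] = {"recognize": -2.0, "speech": -2.0}
UNKNOWN_WORD_LOG_PROB = -6.0


def toy_lm_log_prob(text: str) -> float:
    """Sum of per-word log-probabilities; unknown words get a fixed penalty."""
    return sum(
        TOY_UNIGRAM_LOG_PROBS.get(word, UNKNOWN_WORD_LOG_PROB)
        for word in text.lower().split()
    )


# Made-up n-best list in which the decoder slightly prefers the wrong hypothesis.
nbest = [("wreck a nice beach", -3.8), ("recognize speech", -4.0)]
ranked = rescore_nbest(nbest, toy_lm_log_prob, lm_weight=0.3)
print(ranked[0][0])  # prints: recognize speech
```

Because the LM only ever sees complete strings, its tokenization does not need to match the decoder's. If KenLM is used, note that its Python `Model.score` returns log10 probabilities, so the scores would likely need converting to natural log (or `lm_weight` tuning accordingly) before being mixed with decoder scores. The weight itself is an assumption here and would normally be tuned on held-out data.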
