
How to adapt or train AV-HuBERT for other languages? #92

Open
cooelf opened this issue May 18, 2023 · 1 comment

Comments


cooelf commented May 18, 2023

Thanks for the awesome work! I am wondering if it is possible to make AV-HuBERT work for other languages, e.g., Chinese.

I notice that there is a multilingual version in the paper. Is it compatible with different languages? If not, could you provide any suggestions, assuming there is a Chinese lip-movement dataset?

Thanks!

chevalierNoir (Contributor) commented

@cooelf Yes, using AV-HuBERT for other languages should also work. You can choose a pre-trained checkpoint (large or base) and fine-tune it on a Chinese lip-reading dataset following the fine-tuning command; refer to this for how to prepare the data. Alternatively, pre-training a Chinese AV-HuBERT model from scratch is also doable if you have a sufficiently large amount of audio-visual data.
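For reference, a fine-tuning run along those lines would be launched through fairseq's hydra entry point; the following is only a sketch (the config name, data/label paths, tokenizer model, and checkpoint path are placeholders, not values from the repo):

```shell
# Hypothetical sketch: fine-tune a pre-trained AV-HuBERT checkpoint on a
# Chinese lip-reading dataset via fairseq-hydra-train. All paths and the
# config name below are placeholders to be replaced with your own setup.
cd avhubert
fairseq-hydra-train \
  --config-dir ./conf/finetune \
  --config-name base_finetune.yaml \
  task.data=/path/to/chinese/data \
  task.label_dir=/path/to/chinese/labels \
  task.tokenizer_bpe_model=/path/to/zh_sentencepiece.model \
  model.w2v_path=/path/to/pretrained_avhubert.pt \
  hydra.run.dir=/path/to/experiment/finetune \
  common.user_dir=`pwd`
```

Since the target-language vocabulary differs from English, the tokenizer/unit dictionary (e.g., a Chinese character or SentencePiece vocabulary) would need to be rebuilt during data preparation before fine-tuning.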

We mentioned a multilingually pre-trained AV-HuBERT in the paper, but that model was not released because it is not as good as the English-only one on the LRS3 benchmark. FYI, we did multilingual fine-tuning of AV-HuBERT in our follow-up work, and you can find the model checkpoints in this repo.
