Thanks for the awesome work! I am wondering if it is possible to make AV-HuBERT work for other languages, e.g., Chinese.
I notice that there is a multilingual version in the paper. Is it compatible with different languages? Otherwise, could you provide any suggestions, assuming there is a Chinese lip movement dataset.
Thanks!
@cooelf Yes, using AV-HuBERT for other languages should also work. You can take a pre-trained checkpoint (large or base) and fine-tune it on a Chinese lip-reading dataset following the fine-tuning command, and refer to this for how to prepare the data. Alternatively, pre-training a Chinese version of AV-HuBERT from scratch is also doable if you have a sufficiently large amount of audio-visual data.
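For reference, fine-tuning in this repo follows the standard `fairseq-hydra-train` pattern. A minimal sketch of what fine-tuning on a new-language dataset might look like, assuming the data has already been prepared in the expected manifest/label format (all paths and the config name below are placeholders, not actual values from this repo):

```shell
# Sketch only: fine-tune a pre-trained AV-HuBERT checkpoint on a custom
# (e.g., Chinese) lip-reading dataset via fairseq's hydra entry point.
# Placeholder values -- substitute your own paths and config name.
cd avhubert

fairseq-hydra-train \
  --config-dir ./conf/ \
  --config-name <finetune-config-name> \
  task.data=/path/to/chinese/data \
  task.label_dir=/path/to/chinese/labels \
  model.w2v_path=/path/to/pretrained/checkpoint.pt \
  hydra.run.dir=/path/to/experiment/finetune/ \
  common.user_dir=`pwd`
```

The key idea is that only the data, label, and checkpoint paths change; the pre-trained model itself is language-agnostic at the feature level, so the fine-tuning recipe carries over to a new language unchanged.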
We mentioned a multilingually pre-trained AV-HuBERT in the paper, but that model was not released because it is not as good as the English-only one on the LRS3 benchmark. FYI, we did fine-tune AV-HuBERT multilingually in our follow-up work, and you can find the model checkpoints in this repo.