Closed
Description
Hi, and thanks for the fantastic job!
I am planning to add support for the Tajik language, whose alphabet overlaps roughly 90% with the Cyrillic alphabet. I have a few questions:
- Do I have to update VOCABS for training, or is it used only at inference?
- Is it possible to take a pre-trained model that supports the Cyrillic alphabet and fine-tune it on the Tajik alphabet, given that the two are almost identical? If so, how do I select a specific pre-trained model for fine-tuning during training?
- What dataset sizes would you recommend for training and for fine-tuning to get good results?
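For context, here is a rough sketch of how I picture the character sets relating to each other. It does not touch the library's VOCABS at all: the Russian alphabet is only my stand-in for whatever "Cyrillic" vocab a pre-trained model might use, the Tajik string is the current 35-letter alphabet as I understand it, and the names `RUSSIAN_LOWER`, `TAJIK_LOWER`, `missing_chars`, and `overlap_ratio` are made up for illustration.

```python
import unicodedata

# Assumption: the Russian alphabet (33 letters) stands in for the "Cyrillic"
# vocab of a pre-trained model; the real vocab may differ.
RUSSIAN_LOWER = unicodedata.normalize("NFC", "абвгдеёжзийклмнопрстуфхцчшщъыьэюя")

# Tajik Cyrillic alphabet (35 letters), as I understand the current standard.
# NFC keeps letters such as "ӣ" and "ӯ" as single code points.
TAJIK_LOWER = unicodedata.normalize("NFC", "абвгғдеёжзиӣйкқлмнопрстуӯфхҳчҷшъэюя")


def missing_chars(target: str, base: str) -> str:
    """Return the characters of `target` that do not occur in `base`, in order."""
    base_set = set(base)
    return "".join(ch for ch in target if ch not in base_set)


def overlap_ratio(target: str, base: str) -> float:
    """Return the fraction of distinct `target` characters also found in `base`."""
    target_set = set(target)
    return len(target_set & set(base)) / len(target_set)


# Letters a Tajik vocab needs on top of the Russian one.
extra_lower = missing_chars(TAJIK_LOWER, RUSSIAN_LOWER)
# Russian letters that Tajik does not use.
unused_lower = missing_chars(RUSSIAN_LOWER, TAJIK_LOWER)
# Share of the Tajik alphabet already covered by the Russian one.
coverage = overlap_ratio(TAJIK_LOWER, RUSSIAN_LOWER)

if __name__ == "__main__":
    print("extra Tajik letters:", extra_lower, extra_lower.upper())
    print("unused Russian letters:", unused_lower)
    print(f"coverage of Tajik by Russian: {coverage:.1%}")
```

If this picture is right, only a handful of characters (plus their uppercase forms) would be new to a Cyrillic model, which is why fine-tuning looks attractive to me.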
Thanks in advance!