In the short term we are focusing on building up our language list by training easy-to-segment LTR languages, as they don't require changes to the training pipeline and are immediately supported in Firefox. These are broken into three groups, based on the number of sentences available in the OPUS datasets.
| Data Availability | Sentence Count |
| --- | --- |
| High Resource | > 80 million |
| Med Resource | 20 - 80 million |
| Low Resource | < 20 million |
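The grouping above can be sketched as a small function. This is only an illustration: the function name `resource_tier` is hypothetical, and the handling of counts that fall exactly on 20 or 80 million is an assumption, since the thresholds above don't say which group those belong to.

```python
HIGH_THRESHOLD = 80_000_000  # "> 80 million" sentences
LOW_THRESHOLD = 20_000_000   # "< 20 million" sentences


def resource_tier(sentence_count: int) -> str:
    """Map an OPUS sentence count to a resource group.

    Assumption: counts of exactly 20 or 80 million are treated as
    "Med Resource", reading "20 - 80 million" as an inclusive range.
    """
    if sentence_count < 0:
        raise ValueError("sentence_count must be non-negative")
    if sentence_count > HIGH_THRESHOLD:
        return "High Resource"
    if sentence_count >= LOW_THRESHOLD:
        return "Med Resource"
    return "Low Resource"
```

For example, `resource_tier(95_000_000)` falls in the "High Resource" group, while `resource_tier(5_000_000)` falls in "Low Resource".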
Assuming that resource availability is roughly equivalent to the quality we will be able to achieve, we get the following table:
High Quality
Medium Quality
Low Quality
Russian (en-ru)
Vietnamese
Norwegian (Bokmål)
Indonesian
Slovak
Basque
Czech (en-cs)
Ukrainian (en-uk)
Galician
Hungarian (en-hu)
Slovenian (en-sl)
Norwegian (Nynorsk)
Turkish (en-tr)
Catalan (ready to ship)
Greek (en-el)
Lithuanian
Finnish (en-fi)
Croatian
Swedish
Serbian
Romanian
Latvian
Danish
Valenciano
Bosnian
We will focus on the potentially "high quality" languages first and follow up with "medium quality". It's unclear how well the "low quality" languages will perform and whether they will meet our shippable criteria, but that can be evaluated.
To request additional languages, post a request on Mozilla Connect, or find an existing request for a language and give it a thumbs up.
Native Speakers
If you are a native speaker (L1 language) of any of these languages and want to help out, feel free to leave a comment on this issue or join us in Firefox Translations on Matrix. We can always use help with qualitative model evaluation and with questions regarding language.