CPU-optimized NMT models for Firefox Translations.
The model files are hosted using Git LFS.
Models are grouped into two categories:
- prod: higher-quality models
- dev: test models under development (these may be of low quality or slow)
When a dev model has satisfactory quality, it is moved to prod.
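Moving a model from dev to prod can be sketched as a small shell function. This is a minimal sketch, not a script from this repository. The function name promote_model and the one-directory-per-language-pair layout (for example models/dev/lten) are assumptions. Using git mv records the move as a rename, so the files stay under Git LFS as long as the LFS tracking patterns also match the new path.

```bash
# Hypothetical helper (not part of this repo): move one language pair
# from models/dev to models/prod. Assumes one directory per pair,
# e.g. models/dev/lten; that naming is an assumption.
promote_model() {
  local pair="$1"
  # Refuse an empty name, which would otherwise point at models/dev itself.
  [ -n "$pair" ] || { echo "usage: promote_model <pair>" >&2; return 1; }
  local src="models/dev/${pair}"
  local dst="models/prod/${pair}"
  if [ ! -d "$src" ]; then
    echo "no dev model for ${pair}" >&2
    return 1
  fi
  # Replacing an existing prod model is left to a manual, reviewed change.
  if [ -e "$dst" ]; then
    echo "prod already has ${pair}; refusing to overwrite" >&2
    return 1
  fi
  mkdir -p models/prod
  # git mv stages the rename so the move shows up in the pull request.
  git mv "$src" "$dst"
}
```

Run from the repository root, promote_model lten would stage the move; it can then be committed and submitted as a pull request like any other change.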
The evaluation runs in CI as part of a pull request.
The PR should add the models to the models/dev or models/prod directory.
The evaluation then runs automatically, and its results are added to the pull request as new commits.
For comparison, the evaluation also runs the Microsoft and Google translation APIs, Argos Translate, NLLB and Opus-MT models, and pushes the results back to the branch (this does not work for pull requests from forks).
It is performed using the evals tool.
Use the Firefox Translations training pipeline or the browsermt/students recipe to train CPU-optimized models. They should have a size and inference speed similar to the models already submitted.
Do not use SacreBLEU or Flores datasets as part of the training data; otherwise the evaluation will be invalid.
To list the SacreBLEU datasets, run sacrebleu --list.
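One way to catch test data leaking into a corpus is to count training lines that also appear verbatim in an evaluation set. The sketch below is a hypothetical helper (count_overlap is not a tool in this repo), and it only detects exact line matches, not near-duplicates or differently tokenized copies. It assumes both files are plain text with one sentence per line.

```bash
# Hypothetical helper: print how many training lines also occur,
# exactly, in an evaluation set (e.g. a Flores or SacreBLEU test set
# saved as plain text). Blank evaluation lines are ignored.
count_overlap() {
  local train="$1" eval_set="$2"
  # First file (eval set): remember its non-empty lines.
  # Second file (training data): count lines seen in the eval set.
  awk 'NR == FNR { if ($0 != "") seen[$0] = 1; next }
       ($0 in seen) { n++ }
       END { print n + 0 }' "$eval_set" "$train"
}
```

For example, after dumping a test set with sacrebleu -t wmt20 -l de-en --echo src > wmt20.de, running count_overlap corpus.de wmt20.de (where corpus.de is a placeholder for your training source side) should print 0 for clean training data.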
All models should be contributed to the dev folder first.
Maintainers: create a pull request to the main branch from another branch in this repo (not a fork).
This pull request should include the models; the CI task adds the evaluation as extra commits.
Contributors: create a pull request to the contrib branch.
When it has been reviewed and merged, a maintainer should create a pull request from contrib to main.
This second PR runs the automatic evaluation and adds the evaluation commits.
You can run the model evaluation locally with bash scripts/update-results.sh.
To use the Google and Microsoft APIs, set the environment variables GCP_CREDS_PATH and AZURE_TRANSLATOR_KEY.
To run it with bergamot only, remove the references to those variables from scripts/update-results.sh and remove microsoft,google from scripts/eval.sh.
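Instead of editing the scripts by hand, the translator list could be derived from whichever credentials are set. This is a hypothetical sketch: select_translators does not exist in this repo, and wiring its output into scripts/eval.sh is left as an assumption. The names bergamot, microsoft and google mirror the ones mentioned above.

```bash
# Hypothetical helper: build a comma-separated translator list.
# bergamot is always included; the cloud APIs are added only when
# their credentials are present in the environment.
select_translators() {
  local list="bergamot"
  if [ -n "${AZURE_TRANSLATOR_KEY:-}" ]; then
    list="${list},microsoft"
  fi
  if [ -n "${GCP_CREDS_PATH:-}" ]; then
    list="${list},google"
  fi
  printf '%s\n' "$list"
}
```

With neither variable set this prints bergamot, which matches the bergamot-only setup described above.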
Prefix of the vocabulary file in the model registry:
- vocab. - the vocabulary is reused for the source and target languages
- srcvocab. and trgvocab. - different vocabularies for the source and target languages

Suffix of the model file in the registry:
- intgemm8.bin - supports the gemm-precision: int8shiftAll inference setting
- intgemm.alphas.bin - supports the gemm-precision: int8shiftAlphaAll inference setting
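The naming conventions above can be read mechanically. The sketch below uses two hypothetical helpers, gemm_precision and vocab_layout, that map a registry file name to its inference setting and vocabulary layout. The example file names in the usage note are illustrative, not taken from the registry.

```bash
# Hypothetical helper: map a model file name to its gemm-precision setting.
gemm_precision() {
  local name="${1##*/}"   # strip any leading directories
  case "$name" in
    *intgemm.alphas.bin) echo "int8shiftAlphaAll" ;;
    *intgemm8.bin)       echo "int8shiftAll" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Hypothetical helper: tell shared from separate vocabularies by prefix.
vocab_layout() {
  local name="${1##*/}"
  case "$name" in
    srcvocab.*|trgvocab.*) echo "separate" ;;
    vocab.*)               echo "shared" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

For instance, gemm_precision model.enlt.intgemm.alphas.bin prints int8shiftAlphaAll, and vocab_layout srcvocab.enja.spm prints separate.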
Example of pulling models from a Taskcluster export task:
cd scripts
SRC=lt TRG=en TASK_ID=SjPZGW9CRYeb9PQr68jCUw bash pull_models.sh
Where TASK_ID is the Taskcluster ID of the export task.
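A quick pre-flight check can catch typos before pull_models.sh runs. This is a hypothetical sketch: check_pull_args is not part of the repo, and it assumes lowercase two- or three-letter language codes and a 22-character Taskcluster ID made of URL-safe base64 characters (A-Z, a-z, 0-9, '-', '_'), like the example ID above.

```bash
# Hypothetical pre-flight check for the pull_models.sh arguments.
check_pull_args() {
  local src="$1" trg="$2" task_id="$3"
  [[ "$src" =~ ^[a-z]{2,3}$ ]] || { echo "bad SRC: $src" >&2; return 1; }
  [[ "$trg" =~ ^[a-z]{2,3}$ ]] || { echo "bad TRG: $trg" >&2; return 1; }
  # Assumed Taskcluster ID shape: 22 URL-safe base64 characters.
  [[ "$task_id" =~ ^[A-Za-z0-9_-]{22}$ ]] || { echo "bad TASK_ID: $task_id" >&2; return 1; }
}
```

It could be chained in front of the example command, e.g. check_pull_args lt en SjPZGW9CRYeb9PQr68jCUw && SRC=lt TRG=en TASK_ID=SjPZGW9CRYeb9PQr68jCUw bash pull_models.sh.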
Models are deployed to Remote Settings to be delivered to Firefox.
Records and attachments are uploaded via a CLI tool that lives in the remote_settings directory of this repository.
See the remote_settings README for more details on publishing models.
Prod models are available in all Firefox channels, including Release. Dev models are available in Nightly only.

Supported language pairs (<-> both directions, -> into English, <- from English):

Prod:
- Bulgarian <-> English
- Catalan <-> English
- Chinese (Simplified) -> English
- Croatian -> English
- Czech <-> English
- Danish <-> English
- Dutch <-> English
- Estonian <-> English
- Finnish <-> English
- French <-> English
- German <-> English
- Greek <-> English
- Hungarian <-> English
- Indonesian <-> English
- Italian <-> English
- Japanese <-> English
- Korean <-> English
- Latvian (Lettish) -> English
- Lithuanian -> English
- Polish <-> English
- Portuguese <-> English
- Romanian <-> English
- Russian <-> English
- Serbian -> English
- Slovak -> English
- Slovenian <-> English
- Spanish <-> English
- Swedish <-> English
- Turkish <-> English
- Ukrainian -> English
- Vietnamese -> English

Dev:
- Arabic <- English
- Bosnian -> English
- Chinese (Simplified) <- English
- Croatian <- English
- Icelandic -> English
- Latvian (Lettish) <- English
- Maltese -> English
- Norwegian Bokmål -> English
- Norwegian Nynorsk -> English
- Persian (Farsi) <-> English
- Slovak <- English
- Ukrainian <- English
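The arrow notation in these lists can be expanded into explicit translation directions. The sketch below is a hypothetical helper (expand_pair is not part of the repo) that reads "<->" as both directions, "->" as translation into English, and "<-" as translation from English, as in the legend above.

```bash
# Hypothetical helper: expand one list line into directed pairs.
expand_pair() {
  local line="${1#- }" left right   # drop the leading "- " bullet
  case "$line" in
    *" <-> "*)   # both directions
      left="${line%% <-> *}"; right="${line##* <-> }"
      printf '%s -> %s\n%s -> %s\n' "$left" "$right" "$right" "$left" ;;
    *" -> "*)    # left language into right language
      left="${line%% -> *}"; right="${line##* -> }"
      printf '%s -> %s\n' "$left" "$right" ;;
    *" <- "*)    # right language into left language
      left="${line%% <- *}"; right="${line##* <- }"
      printf '%s -> %s\n' "$right" "$left" ;;
    *)
      echo "unrecognized line: $1" >&2; return 1 ;;
  esac
}
```

For example, expand_pair '- Arabic <- English' prints English -> Arabic, and a "<->" line prints two directions.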