Update MMS README.md (#5202)

facebookresearch · Jun 14, 2023 · 91c364b · 91c364b
1 parent b2d5b78
commit 91c364b
Showing 1 changed file with 23 additions and 97 deletions.
diff --git a/examples/mms/README.md b/examples/mms/README.md
@@ -9,97 +9,22 @@ An overview of the languages covered by MMS can be found [here](https://dl.fbaip
 ## 🤗 Transformers
 
 MMS has been added to Transformers. For more information, please refer to [Transformers' MMS docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).
-[Click here](https://huggingface.co/models?other=mms) to find all MMS checkpoints on the Hub.
+
+[Click here](https://huggingface.co/models?other=mms) to find all MMS checkpoints on the Hub. 
+
+Checkout the demo here [![Open In HF Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/facebook/MMS) 
 
 ## Finetuned models
 ### ASR
 
-| Model | Languages | Dataset | Model | Dictionary* | Supported languages |
-|---|---|---|---|---|---
-MMS-1B:FL102 | 102 | FLEURS | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_fl102/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102_langs.html) 
-MMS-1B:L1107| 1107 | MMS-lab | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_l1107/eng.txt)  | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107_langs.html) 
-MMS-1B-all| 1162 | MMS-lab + FLEURS <br>+ CV + VP + MLS |  [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_all/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_all_langs.html)
+| Model | Languages | Dataset | Model | Dictionary* | Supported languages |  |
+|---|---|---|---|---|---|---
+MMS-1B:FL102 | 102 | FLEURS | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_fl102/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-1b-fl102)
+MMS-1B:L1107| 1107 | MMS-lab | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_l1107/eng.txt)  | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-1b-l1107)
+MMS-1B-all| 1162 | MMS-lab + FLEURS <br>+ CV + VP + MLS |  [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_all/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_all_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-1b-all)
 
 \* In the `Dictionary` column, we provide the download link for token dictionary in English language. To download token dictionary for a different language supported by the model, modify the language code in the URL appropriately. For example, to get token dictionary of FL102 model for Hindi language, use [this](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_fl102/hin.txt) link. 
 
-**🤗 Transformers**
-
-First, we install transformers and some other libraries
-```
-pip install torch datasets[audio]
-pip install --upgrade transformers
-````
-
-**Note**: In order to use MMS you need to have at least `transformers >= 4.30` installed. If the `4.30` version
-is not yet available [on PyPI](https://pypi.org/project/transformers/) make sure to install `transformers` from 
-source:
-```
-pip install git+https://github.com/huggingface/transformers.git
-```
-
-Next, we load a couple of audio samples via `datasets`. Make sure that the audio data is sampled to 16000 kHz.
-
-```py
-from datasets import load_dataset, Audio
-
-# English
-stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
-stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
-en_sample = next(iter(stream_data))["audio"]["array"]
-
-# Swahili
-stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "sw", split="test", streaming=True)
-stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
-sw_sample = next(iter(stream_data))["audio"]["array"]
-```
-
-Next, we load the model and processor
-
-```py
-from transformers import Wav2Vec2ForCTC, AutoProcessor
-import torch
-
-model_id = "facebook/mms-1b-all"
-
-processor = AutoProcessor.from_pretrained(model_id)
-model = Wav2Vec2ForCTC.from_pretrained(model_id)
-```
-
-Now we process the audio data, pass the processed audio data to the model and transcribe the model output, just like we usually do for Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h)
-
-```py
-inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
-
-with torch.no_grad():
-    outputs = model(**inputs).logits
-
-ids = torch.argmax(outputs, dim=-1)[0]
-transcription = processor.decode(ids)
-# 'joe keton disapproved of films and buster also had reservations about the media'
-```
-
-We can now keep the same model in memory and simply switch out the language adapters by calling the convenient [`load_adapter()`]() function for the model and [`set_target_lang()`]() for the tokenizer. We pass the target language as an input - "swh" for Swahili.
-
-```py
-processor.tokenizer.set_target_lang("swh")
-model.load_adapter("swh")
-
-inputs = processor(sw_sample, sampling_rate=16_000, return_tensors="pt")
-
-with torch.no_grad():
-    outputs = model(**inputs).logits
-
-ids = torch.argmax(outputs, dim=-1)[0]
-transcription = processor.decode(ids)
-# 'wachambuzi wa soka wanamtaja mesi kama nyota hatari zaidi duniani'
-# => In English: "soccer analysts describe Messi as the most dangerous player in the world"
-```
-
-In the same way the language can be switched out for all other supported languages. Please have a look at:
-```py
-processor.tokenizer.vocab.keys()
-```
-
 ### TTS
 1. Download the list of [iso codes](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) of 1107 languages.
 2. Find the iso code of the target language and download the checkpoint. Each folder contains 3 files: `G_100000.pth`,  `config.json`, `vocab.txt`. The `G_100000.pth` is the generator trained for 100K updates, `config.json` is the training config, `vocab.txt` is the vocabulary for the TTS model. 
@@ -117,14 +42,14 @@ wget https://dl.fbaipublicfiles.com/mms/tts/full_model/eng.tar.gz # English (eng
 
 ### LID
 
-\# Languages | Dataset | Model | Dictionary | Supported languages |
-|---|---|---|---|---
-126 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l126.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l126/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l126_langs.html)
-256 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l256.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l256/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l256_langs.html)
-512 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l512.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l512/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l512_langs.html)
-1024 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l1024.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l1024/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l1024_langs.html)
-2048 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l2048.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l2048/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l2048_langs.html)
-4017 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l4017/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017_langs.html)
+\# Languages | Dataset | Model | Dictionary | Supported languages | |
+|---|---|---|---|---|---
+126 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l126.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l126/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l126_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-126)
+256 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l256.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l256/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l256_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-256)
+512 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l512.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l512/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l512_langs.html)| [🤗 Hub](https://huggingface.co/facebook/mms-lid-512)
+1024 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l1024.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l1024/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l1024_langs.html)| [🤗 Hub](https://huggingface.co/facebook/mms-lid-1024)
+2048 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l2048.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l2048/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l2048_langs.html)| [🤗 Hub](https://huggingface.co/facebook/mms-lid-2048)
+4017 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l4017/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017_langs.html)| [🤗 Hub](https://huggingface.co/facebook/mms-lid-4017)
 
 ## Commands to run inference 
 
@@ -185,7 +110,8 @@ Available options:
   cd bindings/python 
   python3 setup.py install
   ```
-  Train a [KenLM language model](https://github.com/flashlight/wav2letter/tree/main/recipes/rasr#language-model) and prepare a lexicon file in [this](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lexicon.txt) format. 
+  Train a [KenLM language model](https://github.com/flashlight/wav2letter/tree/main/recipes/rasr#language-model) and prepare a lexicon file in [this](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lexicon.txt) format. Pretrained languages models from our paper can be found in [🤗 Hub](https://huggingface.co/facebook/mms-cclms/).
+
   ```
    LANG=<iso> # for example - 'eng', 'azj-script_latin'
    PYTHONPATH=. PREFIX=INFER HYDRA_FULL_ERROR=1  python examples/speech_recognition/new/infer.py  --config-dir=examples/mms/asr/config \
@@ -246,10 +172,10 @@ We also provide an Ipython notebook example inside `lid/tutorial` folder [ipynb]
 
 ## Pretrained models
 
-| Model | Link
-|---|---
-MMS-300M | [download](https://dl.fbaipublicfiles.com/mms/pretraining/base_300m.pt)
-MMS-1B | [download](https://dl.fbaipublicfiles.com/mms/pretraining/base_1b.pt)
+| Model | Link | |
+|---|---|---
+MMS-300M | [download](https://dl.fbaipublicfiles.com/mms/pretraining/base_300m.pt) | [🤗 Hub](https://huggingface.co/facebook/mms-300m)
+MMS-1B | [download](https://dl.fbaipublicfiles.com/mms/pretraining/base_1b.pt) | [🤗 Hub](https://huggingface.co/facebook/mms-1b)
 
 Example commands to finetune the pretrained models can be found [here](https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec#fine-tune-a-pre-trained-model-with-ctc).