<a href="https://colab.research.google.com/github/kaizengrowth/machine_learning_projects/blob/main/Testing_metaMMS_for_TTS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Massively Multilingual Speech (MMS)

| Code Credits | Link |
| ----------- | ---- |
| 🎉 Repository | [![GitHub Repository](https://img.shields.io/github/stars/facebookresearch/fairseq?style=social)](https://github.com/facebookresearch/fairseq) |
| Original Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/facebookresearch/fairseq/blob/main/examples/mms/tts/tutorial/MMS_TTS_Inference_Colab.ipynb#scrollTo=vGyb3dGWpmks) |
| 🚀 Online inference | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/mms-meta/MMS) |
| 🔥 Discover More Colab Notebooks | [![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-black?style=flat-square&logo=github)](https://github.com/R3gm/Colab-resources/) |


The Massively Multilingual Speech (MMS) project, led by Meta, aims to expand the language coverage of speech technology from the roughly one hundred languages supported by existing systems to more than 1,000. To do so, the project built a new dataset from readings of publicly available religious texts and combined it with self-supervised learning.

The project has released several models toward this goal: pre-trained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition (ASR) model for 1,107 languages, speech synthesis (TTS) models for the same 1,107 languages, and a language identification (LID) model that recognizes 4,017 languages.

According to Meta, the MMS models outperform existing models while covering about ten times as many languages.

In [1]:
!pip install transformers datasets


In [2]:
import numpy as np
from IPython.display import Audio as audio_show
from datasets import load_dataset, Audio

## Multilingual Automatic Speech Recognition (ASR).
Adapter models to transcribe 1000+ languages

| Model | Languages | Dataset | Checkpoint | Dictionary* | Supported languages | Hugging Face |
|---|---|---|---|---|---|---|
| MMS-1B:FL102 | 102 | FLEURS | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_fl102/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_fl102_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-1b-fl102) |
| MMS-1B:L1107 | 1107 | MMS-lab | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_l1107/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_l1107_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-1b-l1107) |
| MMS-1B-all | 1162 | MMS-lab + FLEURS <br>+ CV + VP + MLS | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_all.pt) | [download](https://dl.fbaipublicfiles.com/mms/asr/dict/mms1b_all/eng.txt) | [download](https://dl.fbaipublicfiles.com/mms/asr/mms1b_all_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-1b-all) |

\* The dictionary links point to the English (`eng`) token dictionary; to get another language's dictionary, replace `eng` in the URL with that language's ISO code.




Load English and French audio samples from Common Voice with the `datasets` library.

In [None]:
from datasets import load_dataset, Audio

# MMS models expect 16 kHz audio, so resample each clip on load
# English
stream_data = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# French
stream_data = load_dataset("mozilla-foundation/common_voice_11_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]


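The `cast_column(..., Audio(sampling_rate=16000))` calls above resample each clip as it is decoded, because the MMS models expect 16 kHz input while Common Voice audio is typically stored at a higher rate (48 kHz). The sketch below shows what resampling means in its simplest form, using linear interpolation with NumPy. `resample_linear` is a hypothetical helper and deliberately naive (it applies no anti-aliasing filter); for real audio, prefer a dedicated resampling library.

```python
import numpy as np


def resample_linear(audio, orig_sr, target_sr=16_000):
    """Naive resampling by linear interpolation (no anti-aliasing low-pass filter)."""
    audio = np.asarray(audio, dtype=np.float64)
    if orig_sr == target_sr:
        return audio.astype(np.float32)
    n_out = int(round(len(audio) * target_sr / orig_sr))
    # Sample times (in seconds) of the input and output signals
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


# One second of a 440 Hz tone sampled at 48 kHz, resampled to 16 kHz
sr_in = 48_000
tone = np.sin(2 * np.pi * 440 * np.arange(sr_in) / sr_in)
tone_16k = resample_linear(tone, sr_in)
print(tone_16k.shape)  # (16000,)
```

The output length scales by `target_sr / orig_sr`, so one second of audio always ends up as 16,000 samples.
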

In [None]:
audio_show(en_sample, rate=16000)

Load the model and processor

In [None]:
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch

model_id = "facebook/mms-1b-all"

processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

Preprocess the audio, pass it through the model, and decode the output logits into a transcription.

In [None]:
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
transcription

'joe keton disapproved of films and buster also had reservations about the media'

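`processor.decode(ids)` turns the per-frame argmax ids into text with CTC greedy decoding. A minimal sketch of that logic follows, assuming the CTC blank is the `<pad>` token and words are separated by `|` (the usual wav2vec 2.0 tokenizer conventions); the vocabulary and frame ids are hypothetical toy values, not the real MMS vocabulary.

```python
from itertools import groupby


def ctc_greedy_decode(ids, id_to_token, blank_token="<pad>", word_delimiter="|"):
    """Collapse repeated ids, drop blanks, and turn the word delimiter into spaces."""
    # 1. Merge consecutive duplicates: [5, 5, 0, 5] -> [5, 0, 5]
    collapsed = [token_id for token_id, _ in groupby(ids)]
    # 2. Map ids to tokens and remove the CTC blank
    tokens = [id_to_token[i] for i in collapsed if id_to_token[i] != blank_token]
    # 3. Join the characters and replace the word delimiter with a space
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real one comes from processor.tokenizer
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "i", 4: "o"}
frame_ids = [2, 2, 0, 3, 1, 1, 0, 2, 4, 4, 0]
print(ctc_greedy_decode(frame_ids, vocab))  # hi ho
```

Duplicates are merged before blanks are removed, so a doubled letter (such as the "ll" in "hello") survives only when the model emits a blank between the two copies.
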
### Language adapters
The same model can stay in memory while you switch languages: call `load_adapter()` on the model to load another language adapter, and `set_target_lang()` on the tokenizer to switch its vocabulary. Both take the target language code, e.g. `"fra"` for French.

In [None]:
audio_show(fr_sample, rate=16000)

In [None]:
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
transcription

"ce dernier est volé tout au long de l'histoire romaine"

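Adapters keep the large shared backbone fixed and swap in a small set of language-specific weights. The toy NumPy sketch below illustrates the idea with a residual bottleneck adapter (layer norm, down-projection, ReLU, up-projection), roughly the shape of the attention adapters in the transformers wav2vec 2.0 code; all sizes and weights are made up. In MMS, switching languages also swaps the language-specific output layer, which is why the tokenizer's target language must change together with the adapter.

```python
import numpy as np

HIDDEN, BOTTLENECK = 16, 4  # toy sizes, not the real MMS dimensions


def make_adapter(seed):
    """Hypothetical per-language adapter weights: a down- and an up-projection."""
    r = np.random.default_rng(seed)
    return {"down": r.normal(size=(HIDDEN, BOTTLENECK)) * 0.1,
            "up": r.normal(size=(BOTTLENECK, HIDDEN)) * 0.1}


def layer_norm(x, eps=1e-5):
    """Normalize each frame to zero mean and unit variance (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def apply_adapter(h, adapter):
    """Residual bottleneck adapter: h + up(relu(down(layer_norm(h))))."""
    z = np.maximum(layer_norm(h) @ adapter["down"], 0.0)
    return h + z @ adapter["up"]


# One shared set of backbone features; only the small adapter changes per language
adapters = {"eng": make_adapter(1), "fra": make_adapter(2)}
hidden = np.random.default_rng(0).normal(size=(10, HIDDEN))  # 10 frames
out_eng = apply_adapter(hidden, adapters["eng"])
out_fra = apply_adapter(hidden, adapters["fra"])
print(out_eng.shape)  # (10, 16)
```

Because the adapter is residual, an adapter whose weights are all zero leaves the backbone features unchanged; each language only adds a small learned correction.
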
In [None]:
## Alternative: use a pipeline
# from transformers import pipeline

# model_id = "facebook/mms-1b-all"
# target_lang = "fra"

# pipe = pipeline(model=model_id, model_kwargs={"target_lang": target_lang, "ignore_mismatched_sizes": True})

In [None]:
## Alternative: set the target language when loading the model
# from transformers import Wav2Vec2ForCTC, AutoProcessor

# model_id = "facebook/mms-1b-all"
# target_lang = "fra"

# processor = AutoProcessor.from_pretrained(model_id, target_lang=target_lang)
# model = Wav2Vec2ForCTC.from_pretrained(model_id, target_lang=target_lang, ignore_mismatched_sizes=True)

### Supported languages
The tokenizer vocabulary is keyed by language code, so its keys list every language the adapters cover.

In [None]:
processor.tokenizer.vocab.keys()

## Spoken Language Identification (LID).

The LID models classify raw audio into a probability distribution over a fixed set of languages, with one output class per language. The available checkpoints range from 126 to 4,017 classes.


| Languages | Dataset | Model | Dictionary | Supported languages | Hugging Face |
|---|---|---|---|---|---|
| 126 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l126.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l126/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l126_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-126) |
| 256 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l256.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l256/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l256_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-256) |
| 512 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l512.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l512/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l512_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-512) |
| 1024 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l1024.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l1024/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l1024_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-1024) |
| 2048 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l2048.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l2048/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l2048_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-2048) |
| 4017 | FLEURS + VL + MMS-lab-U + MMS-unlab | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017.pt) | [download](https://dl.fbaipublicfiles.com/mms/lid/dict/l4017/dict.lang.txt) | [download](https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017_langs.html) | [🤗 Hub](https://huggingface.co/facebook/mms-lid-4017) |



Load an Arabic audio sample

In [None]:
# MMS models expect 16 kHz audio, so resample each clip on load
stream_data = load_dataset("mozilla-foundation/common_voice_11_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]
audio_show(ar_sample, rate=16000)



Load the model and processor

In [None]:
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

model_id = "facebook/mms-lid-4017"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)


Run the processed audio through the model and take the highest-scoring class as the detected language.

In [None]:
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits

lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
detected_lang

'ara'

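The `argmax` above keeps only the single best language. To inspect the model's confidence, apply a softmax to the logits and rank the classes. A NumPy sketch with toy logits and a hypothetical four-class `id2label` mapping (`top_k_languages` is an illustrative helper, not part of transformers):

```python
import numpy as np


def top_k_languages(logits, id2label, k=3):
    """Softmax over classifier logits, then return the k most likely (label, probability) pairs."""
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()
    best = np.argsort(probs)[::-1][:k]
    return [(id2label[int(i)], float(probs[i])) for i in best]


# Toy logits for a hypothetical four-class model
id2label = {0: "eng", 1: "ara", 2: "fra", 3: "arz"}
for lang, p in top_k_languages([0.5, 4.0, -1.0, 3.0], id2label, k=2):
    print(f"{lang}: {p:.3f}")
```

With the real model, pass `outputs[0].numpy()` and `model.config.id2label` instead; closely related languages often appear together near the top of the ranking.
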
Look up the language name for the detected ISO code

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch the list of languages supported by mms-lid-4017
response = requests.get('https://dl.fbaipublicfiles.com/mms/lid/mms1b_l4017_langs.html')
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Each <p> holds "<ISO code>\u2003<language name>"; skip the header row and lines without a separator
data = []
for p in soup.find_all('p'):
    text = p.get_text()
    if 'Iso Code' in text or 'Language Name' in text or '\u2003' not in text:
        continue
    iso_code, name = text.split('\u2003', 1)
    data.append((iso_code.strip(), name.strip()))

df = pd.DataFrame(data, columns=['Iso Code', 'Language Name'])

print(len(df), 'languages')
df[df['Iso Code'] == detected_lang]

4017 languages


|   | Iso Code | Language Name |
|---|---|---|
| 0 | ara | Arabic |

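The scraping code relies on each `<p>` element of the language list holding an ISO code and a language name separated by an em space (`\u2003`). If BeautifulSoup is not available, the same parsing can be done with Python's built-in `html.parser`. The sketch below assumes that page format and runs on a hypothetical inline snippet rather than the live page; `LangListParser` is an illustrative name.

```python
from html.parser import HTMLParser


class LangListParser(HTMLParser):
    """Collect (iso_code, language_name) pairs from <p> elements separated by an em space."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.buf = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p, self.buf = True, []

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self.buf)
            # Skip the header row and any paragraph without the em-space separator
            if "\u2003" in text and "Iso Code" not in text:
                code, name = text.split("\u2003", 1)
                self.rows.append((code.strip(), name.strip()))


# Hypothetical snippet in the format the BeautifulSoup code above assumes
snippet = "<p>Iso Code\u2003Language Name</p><p>ara\u2003Arabic</p><p>fra\u2003French</p>"
parser = LangListParser()
parser.feed(snippet)
parser.close()
print(dict(parser.rows).get("ara"))  # Arabic
```

Calling `parser.feed(response.text)` would apply the same parser to the downloaded page.
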

## Multilingual Text-To-Speech (TTS).
Speech technology across a diverse range of languages

### 1. Preliminaries
This section clones the VITS repository, installs the Python packages the remaining TTS sections need, and compiles the monotonic alignment extension. Run it first.

In [None]:
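%pwd
# Clone VITS; its model code (models.py, commons.py, utils.py) is imported in section 3
!git clone https://github.com/jaywalnut310/vits.git
!python --version
%cd vits/

# Install dependencies (some versions pinned)
!pip install Cython==0.29.21
!pip install librosa==0.8.0
!pip install phonemizer==2.2.1
!pip install scipy
!pip install numpy
!pip install torch
!pip install torchvision
!pip install matplotlib
!pip install Unidecode==1.1.1

# Build the monotonic alignment Cython extension; the compiled module is written
# into a nested monotonic_align/ folder, which has to exist first
%cd monotonic_align/
%mkdir monotonic_align
!python3 setup.py build_ext --inplace
%cd ../
%pwd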


'/content/vits'

**Restart the runtime** before continuing so that the newly installed package versions are loaded.

### 2. Choose a language and download its checkpoint
Find the ISO code for your target language [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html). More details about the languages MMS currently supports for TTS are in this [table](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).

In [None]:
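import os
import subprocess
import locale
# Colab workaround: report UTF-8 as the preferred encoding (avoids locale errors in some runtimes)
locale.getpreferredencoding = lambda: "UTF-8"

def download(lang, tgt_dir="./"):
  """Download the MMS-TTS checkpoint for `lang` and extract it into `tgt_dir`."""
  os.makedirs(tgt_dir, exist_ok=True)
  lang_fn, lang_dir = os.path.join(tgt_dir, lang+'.tar.gz'), os.path.join(tgt_dir, lang)
  # '&&' stops if the download fails; '-C' extracts into tgt_dir instead of the current directory
  cmd = " && ".join([
        f"wget https://dl.fbaipublicfiles.com/mms/tts/{lang}.tar.gz -O {lang_fn}",
        f"tar zxvf {lang_fn} -C {tgt_dir}"
  ])
  print(f"Download model for language: {lang}")
  subprocess.check_output(cmd, shell=True)
  print(f"Model checkpoints in {lang_dir}: {os.listdir(lang_dir)}")
  return lang_dir

LANG = "eng" #@param {'type': 'string'}
ckpt_dir = download(LANG)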

Download model for language: eng
Model checkpoints in ./eng: ['vocab.txt', 'config.json', 'G_100000.pth']

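The `download()` helper shells out to `wget` and `tar`. A pure-Python sketch of the same steps with `urllib.request` and `tarfile` follows. It assumes, as the output above suggests, that each archive unpacks into a top-level `<lang>/` folder; `checkpoint_url`, `extract_checkpoint`, and `download_checkpoint` are hypothetical names.

```python
import os
import tarfile
import urllib.request

BASE_URL = "https://dl.fbaipublicfiles.com/mms/tts"


def checkpoint_url(lang: str) -> str:
    """Archive URL for a language code, following the pattern used by download()."""
    return f"{BASE_URL}/{lang}.tar.gz"


def extract_checkpoint(archive: str, tgt_dir: str) -> None:
    """Unpack a .tar.gz archive into tgt_dir, using the safe 'data' filter when available."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(tgt_dir, filter="data")
        else:
            tar.extractall(tgt_dir)


def download_checkpoint(lang: str, tgt_dir: str = "./") -> str:
    """Download and unpack the checkpoint; returns the path tgt_dir/lang."""
    os.makedirs(tgt_dir, exist_ok=True)
    archive = os.path.join(tgt_dir, f"{lang}.tar.gz")
    urllib.request.urlretrieve(checkpoint_url(lang), archive)
    extract_checkpoint(archive, tgt_dir)
    return os.path.join(tgt_dir, lang)
```

`ckpt_dir = download_checkpoint("eng")` would then play the same role as `download(LANG)`, without depending on `wget` being installed.
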

### 3. Load the checkpoint

In [None]:
from IPython.display import Audio
import os
import re
import glob
import json
import tempfile
import math
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
import numpy as np
import commons
import utils
import argparse
import subprocess
from data_utils import TextAudioLoader, TextAudioCollate, TextAudioSpeakerLoader, TextAudioSpeakerCollate
from models import SynthesizerTrn
from scipy.io.wavfile import write

def preprocess_char(text, lang=None):
    """
    Special treatment of characters in certain languages
    """
    print(lang)
    if lang == 'ron':
        # Romanian: map t-comma-below (ț) to t-cedilla (ţ)
        text = text.replace("ț", "ţ")
    return text

class TextMapper(object):
    def __init__(self, vocab_file):
        self.symbols = [x.replace("\n", "") for x in open(vocab_file, encoding="utf-8").readlines()]
        self.SPACE_ID = self.symbols.index(" ")
        self._symbol_to_id = {s: i for i, s in enumerate(self.symbols)}
        self._id_to_symbol = {i: s for i, s in enumerate(self.symbols)}

    def text_to_sequence(self, text, cleaner_names):
        '''Converts a string of text to a sequence of IDs corresponding to the symbols in the text.
        Args:
        text: string to convert to a sequence
        cleaner_names: names of the cleaner functions to run the text through
        Returns:
        List of integers corresponding to the symbols in the text
        '''
        sequence = []
        clean_text = text.strip()
        for symbol in clean_text:
            symbol_id = self._symbol_to_id[symbol]
            sequence += [symbol_id]
        return sequence

    def uromanize(self, text, uroman_pl):
        iso = "xxx"
        with tempfile.NamedTemporaryFile() as tf, \
             tempfile.NamedTemporaryFile() as tf2:
            with open(tf.name, "w") as f:
                f.write("\n".join([text]))
            cmd = f"perl " + uroman_pl
            cmd += f" -l {iso} "
            cmd +=  f" < {tf.name} > {tf2.name}"
            os.system(cmd)
            outtexts = []
            with open(tf2.name) as f:
                for line in f:
                    line =  re.sub(r"\s+", " ", line).strip()
                    outtexts.append(line)
            outtext = outtexts[0]
        return outtext

    def get_text(self, text, hps):
        text_norm = self.text_to_sequence(text, hps.data.text_cleaners)
        if hps.data.add_blank:
            text_norm = commons.intersperse(text_norm, 0)
        text_norm = torch.LongTensor(text_norm)
        return text_norm

    def filter_oov(self, text):
        val_chars = self._symbol_to_id
        txt_filt = "".join(list(filter(lambda x: x in val_chars, text)))
        print(f"text after filtering OOV: {txt_filt}")
        return txt_filt

def preprocess_text(txt, text_mapper, hps, uroman_dir=None, lang=None):
    txt = preprocess_char(txt, lang=lang)
    is_uroman = hps.data.training_files.split('.')[-1] == 'uroman'
    if is_uroman:
        with tempfile.TemporaryDirectory() as tmp_dir:
            if uroman_dir is None:
                cmd = f"git clone https://github.com/isi-nlp/uroman.git {tmp_dir}"
                print(cmd)
                subprocess.check_output(cmd, shell=True)
                uroman_dir = tmp_dir
            uroman_pl = os.path.join(uroman_dir, "bin", "uroman.pl")
            print(f"uromanize")
            txt = text_mapper.uromanize(txt, uroman_pl)
            print(f"uroman text: {txt}")
    txt = txt.lower()
    txt = text_mapper.filter_oov(txt)
    return txt

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(f"Run inference with {device}")
vocab_file = f"{ckpt_dir}/vocab.txt"
config_file = f"{ckpt_dir}/config.json"
assert os.path.isfile(config_file), f"{config_file} doesn't exist"
hps = utils.get_hparams_from_file(config_file)
text_mapper = TextMapper(vocab_file)
net_g = SynthesizerTrn(
    len(text_mapper.symbols),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    **hps.model)
net_g.to(device)
_ = net_g.eval()

g_pth = f"{ckpt_dir}/G_100000.pth"
print(f"load {g_pth}")

_ = utils.load_checkpoint(g_pth, net_g, None)

Run inference with cpu
load ./eng/G_100000.pth

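Before synthesis, `preprocess_text` and `TextMapper.get_text` turn the input string into a tensor of symbol ids: lower-case the text, drop characters missing from `vocab.txt`, map each character to its line index, and, when `hps.data.add_blank` is set, insert id 0 between every symbol via `commons.intersperse`. A dependency-free sketch of that pipeline (skipping the uroman step), with a hypothetical five-symbol vocabulary:

```python
def intersperse(seq, item):
    """Place `item` between all elements and at both ends: [a, b] -> [item, a, item, b, item]."""
    result = [item] * (len(seq) * 2 + 1)
    result[1::2] = seq
    return result


def text_to_ids(text, symbols, add_blank=True):
    """Lower-case, drop out-of-vocabulary characters, strip, map to ids, optionally add blanks (id 0)."""
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    kept = "".join(ch for ch in text.lower() if ch in symbol_to_id).strip()
    ids = [symbol_to_id[ch] for ch in kept]
    return intersperse(ids, 0) if add_blank else ids


# Hypothetical vocabulary: line i of vocab.txt becomes symbol id i
symbols = ["_", " ", "a", "b", "c"]
print(text_to_ids("Cab!", symbols))  # [0, 4, 0, 2, 0, 3, 0]
```

The `!` is silently dropped because it is not in the vocabulary, which mirrors what `filter_oov` does in the notebook.
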

### 4. Generate audio from text
Specify the sentence to synthesize and run inference.

In [None]:
txt = "Expanding the language coverage of speech technology has the potential to improve access to information for many more people"

print(f"text: {txt}")
txt = preprocess_text(txt, text_mapper, hps, lang=LANG)
stn_tst = text_mapper.get_text(txt, hps)
with torch.no_grad():
    x_tst = stn_tst.unsqueeze(0).to(device)
    x_tst_lengths = torch.LongTensor([stn_tst.size(0)]).to(device)
    hyp = net_g.infer(
        x_tst, x_tst_lengths, noise_scale=.667,
        noise_scale_w=0.8, length_scale=1.0
    )[0][0,0].cpu().float().numpy()

print(f"Generated audio")
Audio(hyp, rate=hps.data.sampling_rate)

text: Expanding the language coverage of speech technology has the potential to improve access to information for many more people
eng
text after filtering OOV: expanding the language coverage of speech technology has the potential to improve access to information for many more people
Generated audio

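`net_g.infer` exposes three sampling knobs. In VITS, `noise_scale` scales the noise added when sampling from the prior, `noise_scale_w` scales the noise in the stochastic duration predictor (when that predictor is enabled), and `length_scale` multiplies every predicted duration, so values above 1.0 give slower speech. The sketch below covers only the duration arithmetic: log-durations are exponentiated, stretched, rounded up to whole spectrogram frames, and each frame becomes `hop_length` waveform samples. `estimate_num_samples` is a hypothetical helper and `hop_length=256` is an assumed default; the real value is `hps.data.hop_length`.

```python
import numpy as np


def estimate_num_samples(log_durations, length_scale=1.0, hop_length=256):
    """Approximate VITS output length from per-token log-durations."""
    # Durations in spectrogram frames, stretched by length_scale and rounded up
    frames = np.ceil(np.exp(np.asarray(log_durations, dtype=np.float64)) * length_scale)
    total_frames = max(int(frames.sum()), 1)  # at least one frame, as in VITS
    # The decoder turns each frame into hop_length waveform samples
    return total_frames * hop_length


log_w = np.zeros(4)  # hypothetical: four tokens, each predicted to last one frame
print(estimate_num_samples(log_w, length_scale=1.0))  # 1024
print(estimate_num_samples(log_w, length_scale=1.5))  # 2048
```

Dividing the sample count by `hps.data.sampling_rate` gives the duration in seconds. Because of the rounding up, `length_scale=1.5` doubles rather than multiplies by 1.5 the length of these one-frame tokens.
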

### Alternative: the `ttsmms` library

In [None]:
!pip install ttsmms


In [None]:
from ttsmms import TTS, download
import IPython.display as ipd
%matplotlib inline

lang_code = 'spa' #@param {'type':'string'}
text = "Ampliar la cobertura lingüística de la tecnología de reconocimiento de voz tiene el potencial de mejorar el acceso a la información para muchas más personas."  #@param {'type':'string'}
save_audio = True #@param {'type':'boolean'}

dir_path = download(lang_code,"./data") # lang_code, dir for save model
tts=TTS(dir_path)
if save_audio:
  wav = tts.synthesis(text,wav_path="example.wav")
  display(ipd.Audio(wav, rate=16000, autoplay=True))
else:
  wav =tts.synthesis(text)
  display(ipd.Audio(wav["x"], rate=wav["sampling_rate"], autoplay=True))

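The `save_audio` branch lets ttsmms write the WAV file itself. To save a waveform array yourself, for example `hyp` from the VITS section or `wav["x"]` here, one option is the standard-library `wave` module, converting floats in [-1, 1] to 16-bit PCM. A sketch, where `write_wav` is a hypothetical helper (`scipy.io.wavfile.write`, already imported in section 3, is an alternative):

```python
import wave

import numpy as np


def write_wav(path, audio, sampling_rate=16_000):
    """Write a float waveform in [-1, 1] to a 16-bit mono PCM WAV file."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sampling_rate)
        wf.writeframes(pcm.tobytes())


# A synthetic half-second tone standing in for `hyp` or wav["x"]
sr = 16_000
t = np.arange(sr // 2) / sr
write_wav("tone.wav", 0.5 * np.sin(2 * np.pi * 220 * t), sr)
```

Clipping before the conversion keeps occasional overshoots from wrapping around to the opposite sign.
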


## Additional Resources

- [VITS](https://github.com/jaywalnut310/vits)
- [TTSMMS](https://github.com/wannaphong/ttsmms)
- [Fairseq](https://github.com/facebookresearch/fairseq)
- [mTTS Demo](https://dpc-mmstts.hf.space)
- [Fairseq MMS example](https://github.com/facebookresearch/fairseq/blob/main/examples/mms/README.md)
