# SeamlessM4T

![img](https://github.com/facebookresearch/seamless_communication/raw/main/seamlessM4T.png)

SeamlessM4T is designed to provide high-quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.

SeamlessM4T covers:
- 📥 101 languages for speech input.
- ⌨️ 96 languages for text input/output.
- 🗣️ 35 languages for speech output.

This unified model enables multiple tasks without relying on multiple separate models:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
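
As a rough sketch, the five tasks can be summarized by their input and output modality. The `"s2tt"` and `"t2tt"` codes are the ones used later in this notebook; the other three lowercase codes are assumed to follow the same pattern. `TASKS`, `describe_task`, and `needs_src_lang` are hypothetical helper names, not part of the library. The rule that text input needs an explicit source language is inferred from the cells below, where only the text-to-text call passes `src_lang`.

```python
# Task code -> (input modality, output modality).
# "s2tt" and "t2tt" appear in this notebook; the remaining codes are
# assumed from the abbreviations in the task list above.
TASKS = {
    "s2st": ("speech", "speech"),
    "s2tt": ("speech", "text"),
    "t2st": ("text", "speech"),
    "t2tt": ("text", "text"),
    "asr": ("speech", "text"),
}


def describe_task(task: str) -> str:
    """Return a one-line summary of what a task consumes and produces."""
    code = task.lower()
    source, target = TASKS[code]
    return f"{code}: {source} in, {target} out"


def needs_src_lang(task: str) -> bool:
    """Assumption: only text-input tasks need an explicit source language."""
    return TASKS[task.lower()][0] == "text"


for code in TASKS:
    print(describe_task(code), "| src_lang needed:", needs_src_lang(code))
```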

Links:
- [GitHub](https://github.com/facebookresearch/seamless_communication/tree/main)
- [Blog](https://ai.meta.com/blog/seamless-m4t)
- [Paper](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf)
- [Demo](https://seamless.metademolab.com/)
- [🤗 Hugging Face space](https://huggingface.co/spaces/facebook/seamless_m4t)

# Quick Start

In [1]:
!pip install fairseq2==0.1 pydub yt-dlp
!git clone https://github.com/facebookresearch/seamless_communication.git
%cd seamless_communication
!git checkout 01c1042841f9bce66902eb2c7512dbdd71d42112 
!pip install .

fatal: destination path 'seamless_communication' already exists and is not an empty directory.
/kaggle/working/seamless_communication
HEAD is now at 01c1042 Update README.md
Processing /kaggle/working/seamless_communication
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: seamless-communication
  Building wheel for seamless-communication (setup.py) ... done
  Created wheel for seamless-communication: filename=seamless_communication-0.1-py3-none-any.whl size=42677 sha256=077f8e83b477ac3ae14f71f9b7db04cfe116f9f59d4138bfe89396ee53af6645
  Stored in directory: /tmp/pip-ephem-wheel-cache-1jsqwz6q/wheels/f0/cf/23/abec8184257cf69b02e15bebac2f2ad64cd2876e4f48b471cf
Successfully built seamless-communication
Installing collected packages: seamless-communication
  Attempting uninstall: seamless-communication
    Found existing installation: seamless-communication 0.1
    Uninstalling seamless-communication-0.1:
      Successfully uninstalled

In [2]:
from seamless_communication.models.inference import Translator
import torch
import os

In [3]:
# Initialize a Translator with the multitask model and the vocoder, both on the first GPU.
translator = Translator(
    "seamlessM4T_large",
    "vocoder_36langs",
    torch.device("cuda:0")
)

Using the cached checkpoint of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached tokenizer of the model 'seamlessM4T_large'. Set `force=True` to download again.
Using the cached checkpoint of the model 'vocoder_36langs'. Set `force=True` to download again.


# Download a test audio file

In [4]:
!wget -P ./audio_samples/ https://github.com/deepanshu88/Datasets/raw/master/Audio/audio_file_test.wav

--2023-08-24 09:00:55--  https://github.com/deepanshu88/Datasets/raw/master/Audio/audio_file_test.wav
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/deepanshu88/Datasets/master/Audio/audio_file_test.wav [following]
--2023-08-24 09:00:56--  https://raw.githubusercontent.com/deepanshu88/Datasets/master/Audio/audio_file_test.wav
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 185388 (181K) [audio/wav]
Saving to: ‘./audio_samples/audio_file_test.wav.1’


2023-08-24 09:00:56 (7.63 MB/s) - ‘./audio_samples/audio_file_test.wav.1’ saved [185388/185388]
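
The `.wav.1` suffix in the log above shows that `wget` saved a second copy because the file already existed from an earlier run. One way to avoid duplicates when re-running the notebook is to download only if the file is missing. The following is a minimal sketch using the standard library; `download_if_missing` and its `fetch` parameter are hypothetical names introduced here for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_if_missing(url, dest_dir, fetch=urlretrieve):
    """Download `url` into `dest_dir` unless a file with that name exists.

    `fetch` is any callable taking (url, filename); it defaults to
    urllib's urlretrieve and can be swapped out for testing.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    target = dest_dir / Path(urlparse(url).path).name
    if not target.exists():
        fetch(url, str(target))
    return target
```

In this notebook it could be called as `download_if_missing("https://github.com/deepanshu88/Datasets/raw/master/Audio/audio_file_test.wav", "./audio_samples")`, which returns the local path whether or not a download was needed.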



# Speech-to-text translation

In [5]:
translated_text, _, _ = translator.predict('/kaggle/working/seamless_communication/audio_samples/audio_file_test.wav', "s2tt", 'eng')

translated_text

CString('it's a lovely day to day and whatever you've got to do would be so happy to be doing it with you')

# Text-to-text translation

In [6]:
text = "سلام امروز هوا چطوره؟"

In [7]:
translated_text, _, _ = translator.predict(text, "t2tt", 'eng', src_lang='pes')

translated_text

CString('Hi. How's the weather today?')
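
To translate several sentences, the same `predict` call can be wrapped in a loop. The sketch below mirrors the call shape used in the cell above (`predict(text, "t2tt", tgt_lang, src_lang=...)`, returning a 3-tuple). `translate_texts` is a hypothetical helper, and `EchoTranslator` is a stand-in so the example runs without loading the model. Converting the result with `str()` assumes the returned `CString` supports it.

```python
def translate_texts(translator, texts, tgt_lang, src_lang):
    """Translate each text with the t2tt task and return plain strings."""
    results = []
    for text in texts:
        # Same call shape as in the notebook; the last two tuple items
        # are ignored here, as they are in the cell above.
        translated_text, _, _ = translator.predict(
            text, "t2tt", tgt_lang, src_lang=src_lang
        )
        # Assumption: str() yields the plain text of the returned object.
        results.append(str(translated_text))
    return results


class EchoTranslator:
    """Hypothetical stand-in so the sketch runs without the real model."""

    def predict(self, text, task, tgt_lang, src_lang=None):
        return f"[{src_lang}->{tgt_lang}] {text}", None, None


print(translate_texts(EchoTranslator(), ["one", "two"], "eng", "pes"))
```

With the real model, the `translator` object created earlier would be passed in place of `EchoTranslator()`.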

# **Learn more about supported languages [here](https://github.com/facebookresearch/seamless_communication/tree/main/scripts/m4t/predict)**