# **EASY IMPLEMENTATION OF TRANSLATION USING SEAMLESSM4T**

---

📎 README

SeamlessM4T is a collection of models designed to provide high-quality translation, so that people from different linguistic communities can communicate through speech and text.


Supported tasks:

*   Speech-to-speech translation (S2ST)
*   Speech-to-text translation (S2TT)
*   Text-to-speech translation (T2ST)
*   Text-to-text translation (T2TT)
*   Automatic speech recognition (ASR)
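In the cells that follow, the translation tasks differ only in the input modality given to the processor and in whether `generate_speech=False` is passed to `model.generate`. The sketch below summarizes that pattern. `TASKS` and `generation_kwargs` are hypothetical names introduced for illustration, and the ASR entry is an assumption (text output, with the target language set to the language spoken in the audio), since this notebook does not demonstrate ASR.

```python
# Hypothetical helper: maps each task acronym to its input modality and to
# whether speech should be generated. Names are illustrative only.
TASKS = {
    "S2ST": {"input": "speech", "generate_speech": True},
    "S2TT": {"input": "speech", "generate_speech": False},
    "T2ST": {"input": "text", "generate_speech": True},
    "T2TT": {"input": "text", "generate_speech": False},
    # Assumption: ASR is speech in, text out, with tgt_lang equal to the spoken language.
    "ASR": {"input": "speech", "generate_speech": False},
}


def generation_kwargs(task: str, tgt_lang: str) -> dict:
    """Build the keyword arguments to pass to generate() for a given task."""
    if task not in TASKS:
        raise ValueError(f"Unknown task: {task}")
    return {"tgt_lang": tgt_lang, "generate_speech": TASKS[task]["generate_speech"]}


print(generation_kwargs("S2TT", "spa"))
```

The returned dictionary could then be unpacked into the call, for example `model.generate(**audio_inputs, **generation_kwargs("S2TT", "spa"))`.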

References:

1. [Hugging Face, SeamlessM4T Large](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t).
2. [Kaggle, SeamlessM4T Usage in Transformers](https://www.kaggle.com/code/yoachlcmb/seamlessm4t-usage-in-transformers).


In [None]:
# @title #1. ✨ Installing dependencies.
!pip install --quiet git+https://github.com/huggingface/transformers.git &> /dev/null
#!pip install --quiet git+https://github.com/google/sentencepiece &> /dev/null
!pip install sentencepiece &> /dev/null


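# Importing the freshly installed packages confirms that the installation succeeded.
import transformers
import sentencepiece
import torch
from transformers import AutoProcessor, SeamlessM4TModel

# Use the first GPU when one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the processor and the medium checkpoint, then move the model to the device.
processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium").to(device)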


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
# @title #2.1 ✨ Speech to Speech.

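!pip install datasets &> /dev/null
from datasets import load_dataset
from IPython.display import Audio
from scipy.io.wavfile import write as write_wav

# Stream a single sample from an Arabic speech corpus instead of downloading the whole dataset.
dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True)
audio_sample = next(iter(dataset))["audio"]

# Turn the raw waveform into model inputs.
audio_inputs = processor(audios=audio_sample["array"], return_tensors="pt").to(device)

# The first element returned by generate() is the translated waveform;
# move it to the CPU and flatten it to a 1-D NumPy array.
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="eng")[0].cpu().numpy().squeeze()

# Sampling rate used to save and play the generated speech.
sampling_rate = 16000

# Save the translated speech to disk and play it inline.
write_wav("speech_to_speech.wav", sampling_rate, audio_array_from_audio)

Audio(audio_array_from_audio, rate=sampling_rate)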

It is strongly recommended to pass the `sampling_rate` argument to this function. Failing to do so can result in silent errors that might be hard to debug.
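The warning above is about the sampling rate of the input audio. The cell uses 16000 Hz for the generated speech, and a corpus recorded at a different rate would presumably need resampling before it reaches the processor. The sketch below shows one way to do that with `scipy.signal.resample_poly`. `resample_audio` is a hypothetical helper, and the 48 kHz input rate is only an example value, not a measured property of the corpus used here.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_audio(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a 1-D waveform from orig_sr to target_sr with a polyphase filter."""
    if orig_sr == target_sr:
        return audio  # nothing to do
    factor = gcd(orig_sr, target_sr)
    # Upsample by target_sr/factor and downsample by orig_sr/factor.
    return resample_poly(audio, target_sr // factor, orig_sr // factor)


# Example input: one second of a 440 Hz tone sampled at 48 kHz (illustrative values).
tone = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
resampled = resample_audio(tone, orig_sr=48000)
print(resampled.shape)
```

In the notebook, the resampled array could replace `audio_sample["array"]`, and `sampling_rate=16000` could then be passed to the processor so that the warning no longer applies.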


In [None]:
# @title #2.2 ✨ Speech to Text.

# Load an audio sample from a Hindi speech corpus.
from datasets import load_dataset
dataset = load_dataset("google/fleurs", "hi_in", split="train", streaming=True)
audio_sample = next(iter(dataset))["audio"]

# Process it, passing the sample's own sampling rate (16 kHz for FLEURS) so the processor can validate it.
audio_inputs = processor(audios=audio_sample["array"], sampling_rate=audio_sample["sampling_rate"], return_tensors="pt").to(device)

# generate_speech=False returns text tokens only; decode the first (and only) sequence.
output_tokens = model.generate(**audio_inputs, tgt_lang="spa", generate_speech=False)
translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(f"\n\n Translation from audio:\n\n {translated_text_from_audio}")




 Translation from audio:

 políticos dijo que ellos habían decidido no a una manera inávisca para determinar la constitución afgana


In [None]:
# @title #2.3 ✨ Text to Speech.

# Process a Spanish sentence. The language code for Spanish is "spa"; "esp" is not in the vocabulary.
text_inputs = processor(text = "Hola, mi perro es pequeño", src_lang="spa", return_tensors="pt").to(device)

sampling_rate = 16000
# Alternative, longer input:
#text_inputs = processor(text = "Podemos definir Hugging Face como una empresa de tecnologia que se dedica al desarrollo de herramientas y plataformas de procesamiento de lenguaje natural o NLP basadas en inteligencia artificial.", src_lang="spa", return_tensors="pt").to(device)

# Generate English speech and flatten it to a 1-D NumPy array.
audio_array_from_text = model.generate(**text_inputs, tgt_lang="eng")[0].cpu().numpy().squeeze()

from IPython.display import Audio

Audio(audio_array_from_text, rate=sampling_rate)

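A mistyped language code such as `esp` (instead of `spa`) is not rejected outright: the processor only warns that the language token will be replaced by the unknown token id, so the output may silently be wrong. A small check before calling the processor can catch this earlier. The sketch below is a minimal illustration; `is_supported_language` and `toy_vocab` are hypothetical, and the `__xxx__` token format is an assumption based on the `__esp__` form that the processor reports for an unknown code.

```python
def is_supported_language(code: str, vocab: dict) -> bool:
    """Return True if the language token for `code` is present in `vocab`."""
    # Assumption: language tokens are stored as "__xxx__", matching the
    # `__esp__` form that the processor reports for an unknown code.
    return f"__{code}__" in vocab


# Toy vocabulary standing in for the real one (hypothetical token ids).
toy_vocab = {"__eng__": 0, "__spa__": 1, "__fra__": 2}

for code in ("spa", "esp"):
    status = "ok" if is_supported_language(code, toy_vocab) else "not found"
    print(f"{code}: {status}")
```

With the loaded model, the real vocabulary could presumably be obtained with `processor.tokenizer.get_vocab()`; that is worth verifying against the installed version of Transformers.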


In [None]:
# @title #2.4 ✨ Text to Text.

# Translate a batch of two French sentences into English text.
text_inputs = processor(text = ["J'aime HF de tout mon coeur.", "La vie est belle."], src_lang="fra", return_tensors="pt").to(device)

# generate_speech=False returns text tokens only.
output_tokens = model.generate(**text_inputs, tgt_lang="eng", generate_speech=False)
# Only the first sentence of the batch is decoded here.
translated_text_from_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(f"\n\nTranslation from text:\n\n {translated_text_from_text}")




Translation from text:

 I love HF with all my heart.
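The cell above sends two sentences to the model but decodes only the first one, which is why a single translation is printed. The sketch below decodes every sequence in the batch using the same `processor.decode` call as the cell. `decode_all` is a hypothetical helper, and the two stub classes exist only so the example runs without downloading the model.

```python
def decode_all(processor, output_tokens) -> list:
    """Decode every sequence of a text-only generate() output, not just the first."""
    # output_tokens[0] holds one row of token ids per input sentence,
    # as in the cell above, where only row 0 is decoded.
    return [
        processor.decode(token_ids, skip_special_tokens=True)
        for token_ids in output_tokens[0].tolist()
    ]


# Stand-ins for the real objects (hypothetical, for illustration only).
class StubTensor:
    def __init__(self, rows):
        self.rows = rows

    def tolist(self):
        return self.rows


class StubProcessor:
    def decode(self, token_ids, skip_special_tokens=True):
        return " ".join(str(token_id) for token_id in token_ids)


stub_output = (StubTensor([[1, 2, 3], [4, 5]]),)
print(decode_all(StubProcessor(), stub_output))
```

In the notebook, `decode_all(processor, output_tokens)` should return one English string per French input sentence.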
