import torchaudio
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Load the recording and resample it to the 16 kHz the model expects
file_name = "asr.wav"
audio, orig_freq = torchaudio.load(file_name)
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16000)
audio_inputs = processor(audios=audio, return_tensors="pt")

# Unpack the processor output into keyword arguments and generate text only
output_tokens = model.generate(**audio_inputs, tgt_lang="cmn", generate_speech=False)
translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
# ASR result: 今天天气真不错
When I set tgt_lang="cmn", the result is correct. The original audio is in Chinese.
But when I set tgt_lang=None, the ASR result is "The weather is really nice today". It has been translated into English!
I expected the model to detect the language of the audio automatically.
lilongwei5054 changed the title from "Must ASR specify the parameter “tgl_lang”?" to "Must ASR specify the parameter “tgt_lang”?" on Dec 25, 2023
ASR with Seamless is treated as a special case of translation, where the source and target languages are the same.
But the Seamless models were not trained to predict the target language on their own, so it is your responsibility to provide the right tgt_lang tag.
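To make the point above concrete, here is a minimal sketch of ASR as same-language translation. The helper name `transcribe` and its `source_lang` parameter are hypothetical, not part of transformers; the helper only wraps the `processor(...)`, `model.generate(...)`, and `processor.decode(...)` calls from the snippet in the question, and it assumes the caller already knows which language is spoken.

```python
def transcribe(model, processor, audio, source_lang):
    """Transcribe 16 kHz audio by asking for output in the spoken language.

    `transcribe` and `source_lang` are illustrative names (assumptions).
    The model does not predict the language itself, so the caller must
    pass a language code such as "cmn" for Mandarin Chinese.
    """
    if not source_lang:
        raise ValueError("source_lang is required: pass the language spoken in the audio")

    # Same processor call as in the question
    audio_inputs = processor(audios=audio, return_tensors="pt")

    # ASR: the target language equals the source language; skip speech synthesis
    output_tokens = model.generate(
        **audio_inputs, tgt_lang=source_lang, generate_speech=False
    )
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

With the `processor`, `model`, and resampled `audio` from the question, `transcribe(model, processor, audio, "cmn")` should return the Chinese transcript rather than an English translation. Failing early on a missing language is a design choice of this sketch, meant to avoid the silent fallback to English described above.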