Incorrect translation while using demo app from the example #237

Open
iamshreeram opened this issue Dec 3, 2023 · 7 comments

@iamshreeram

I'm running the demo app on my Mac M1 to try the model, and the translation is not accurate. When I translate the audio from the provided example, the model generates predictions, but the translated content is entirely unrelated to the input. Below is a screenshot taken after translating the audio to Spanish:

[image: screenshot of the demo app output after translating the audio to Spanish]

I'm seeking clarity on whether this behavior is expected or if there might be an underlying issue causing this discrepancy.

@iamshreeram
Author

The translation demo illustrated on the webpage appears to overstate what the model can do. In practice, the translation does not work as advertised, wouldn't you agree?

Sources: Source 1, Source 2

Additionally, it would be beneficial if you could include some examples for running seamless streaming.

@cocktailpeanut

cocktailpeanut commented Dec 4, 2023

I thought I was the only one having this issue. Why aren't more people talking about this?

Just like @iamshreeram, I get random text that has nothing to do with the original content. My input was a 40-second audio clip, and the result was a random sentence of about 3 words, unrelated to what was said.

At first I thought this was my fault: maybe I had messed up the installation or was running the code the wrong way.

But I can confirm the exact same problem happens on the Hugging Face Space deployed by Meta. You can try it yourself here: https://huggingface.co/spaces/facebook/seamless-m4t-v2-large If even the official demo deployed by the team is not working, I'm sure it's not our fault. There's something seriously wrong with the demo.

P.S.

For the record, the broken tasks are s2st and s2tt, so there seems to be something wrong with the audio processing. The t2st and t2tt tasks work fine.

@iamshreeram
Author

@Vaibhavs10 / @ggerganov, I would appreciate it if you could investigate this issue. The primary features of the model shown in the example do not appear to be functioning correctly, and many similar issues are likely to be reported.

@rodriheck

I think I am facing the same problem. I spent the whole day trying to debug it and even left a comment on their Space:
https://huggingface.co/spaces/facebook/seamless-streaming/discussions/17

@cocktailpeanut

Can the devs do something?

@ylacombe
Contributor

ylacombe commented Dec 20, 2023

Hey there,
I don't know about the official implementation (cc @cndn and @elbayadm) but it seems to work with transformers when I tried to reproduce your errors:

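from transformers import pipeline

# Speech-to-text pipeline backed by SeamlessM4T v2.
pipeline_generator = pipeline(
    "automatic-speech-recognition",
    "facebook/seamless-m4t-v2-large",
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=3,  # CUDA device index; adjust for your hardware
)

# Translate the English clip into Spanish text.
transcript = pipeline_generator(
    "https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav",
    generate_kwargs={"tgt_lang": "spa"},
)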

Output: 'nosotros el pueblo de los estados unidos en orden de formar una más perfecta unión establecer justicia asegurar tranquilidad doméstica proporcionar para la común defensa'
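
For anyone running this on a Mac M1, as in the original report: `device=3` selects a specific CUDA GPU, which will not exist there. Below is a minimal sketch of choosing the device at runtime instead. `pick_device` and `build_asr_pipeline` are hypothetical helper names, not part of transformers, and whether this model runs well on the `mps` backend is untested here.

```python
def pick_device() -> str:
    """Return a device string accepted by transformers.pipeline(device=...)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe; fall back to CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda:0"  # first NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU backend
    return "cpu"


def build_asr_pipeline(model_id: str = "facebook/seamless-m4t-v2-large"):
    """Hypothetical helper: same pipeline as above, with an auto-selected device."""
    # Imported lazily so that pick_device() can be used without transformers.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model_id,
        chunk_length_s=30,
        device=pick_device(),
    )


if __name__ == "__main__":
    print(pick_device())
```

Calling `build_asr_pipeline()` downloads the full checkpoint, so it is worth checking `pick_device()` on its own first.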

@iamshreeram
Copy link
Author

@ylacombe, thank you, your code snippet works as intended. However, the question remains as to why the UI produces incorrect or irrelevant translations.

Additionally, I have two minor questions:

  1. Is there any way to detect the language from audio or text (for example, language recognition based on audio)?
  2. Given input audio, we want translated audio in a specific language. Is there any way to pass the task_str as s2st?
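
On the second question, one possible approach with the transformers port is sketched below. This is not the demo app's own `task_str` handling, which this thread does not show. The sketch assumes the `tgt_lang` and `generate_speech` arguments of `SeamlessM4Tv2Model.generate` as described in the transformers documentation. `TASKS`, `resolve_task`, and `translate` are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from the task strings used in this thread to
# (input modality, whether to synthesize speech on output).
TASKS = {
    "s2st": ("speech", True),   # speech in, speech out
    "s2tt": ("speech", False),  # speech in, text out
    "t2st": ("text", True),     # text in, speech out
    "t2tt": ("text", False),    # text in, text out
}


def resolve_task(task_str: str):
    """Return (modality, generate_speech) for a task string, case-insensitively."""
    key = task_str.lower()
    if key not in TASKS:
        raise ValueError(f"unknown task {task_str!r}; expected one of {sorted(TASKS)}")
    return TASKS[key]


def translate(task_str, source, tgt_lang, src_lang=None, sampling_rate=16000,
              model_id="facebook/seamless-m4t-v2-large"):
    """Sketch: run one of the four tasks through the transformers port.

    `source` is a waveform array for speech tasks or a string for text tasks.
    """
    modality, generate_speech = resolve_task(task_str)

    # Imported lazily: loading the model is slow and needs a large download.
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained(model_id)
    model = SeamlessM4Tv2Model.from_pretrained(model_id)

    if modality == "speech":
        # Assumption: this keyword is `audios` in the release you have installed;
        # some newer transformers releases may expect `audio` instead.
        inputs = processor(audios=source, sampling_rate=sampling_rate,
                           return_tensors="pt")
    else:
        inputs = processor(text=source, src_lang=src_lang, return_tensors="pt")

    output = model.generate(**inputs, tgt_lang=tgt_lang,
                            generate_speech=generate_speech)
    if generate_speech:
        # First element is the generated waveform tensor.
        return output[0].cpu().numpy().squeeze()
    # Text-only output: decode the generated token ids.
    return processor.decode(output[0].tolist()[0], skip_special_tokens=True)
```

Under these assumptions, `translate("s2st", waveform, tgt_lang="spa")` would return a waveform array, while `"s2tt"` would return a string. The default `sampling_rate` of 16000 follows the rate the model documentation gives for its training audio.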
