Default test files are generating repeating text / hallucinating #9

Open
greenman2 opened this issue Apr 21, 2023 · 3 comments


@greenman2

I used the default settings on the Kaggle notebook.

https://huggingface.co/datasets/sanchit-gandhi/whisper-jax-test-files


@sanchit-gandhi
Owner

Looks like a hallucination (a known issue with the Whisper model). It's worth trying again with timestamps enabled.

@sanchit-gandhi
Owner

We could implement a logits processor that prevents tokens from being repeated (similar to what we have in PyTorch: https://github.com/huggingface/transformers/blob/04ab5605fbb4ef207b10bf2772d88c53fc242e83/src/transformers/generation/logits_process.py#L474)

This might be quite difficult to do in a JAX-compiled way, though, without returning the value of each generated token.
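
For illustration only, here is a rough sketch of what such a processor could look like under `jax.jit`. It assumes the generation loop already keeps a fixed-length buffer of token ids (`input_ids`) together with the current length (`cur_len`), which may not match how this library's loop is structured. The function name `block_repeated_ngrams` and the choice of an n-gram ban are hypothetical, not the PyTorch implementation linked above.

```python
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnames=("ngram_size",))
def block_repeated_ngrams(input_ids, scores, cur_len, ngram_size=3):
    """Set to -inf any token that would repeat an n-gram already generated.

    input_ids: (batch, max_len) int buffer; positions >= cur_len are padding.
    scores:    (batch, vocab) logits for the next token.
    cur_len:   number of valid tokens in the buffer (may be a traced value).
    """
    batch, max_len = input_ids.shape
    vocab = scores.shape[-1]
    n = ngram_size

    # All length-n windows over the buffer, shape (batch, num_windows, n).
    starts = jnp.arange(max_len - n + 1)
    windows = input_ids[:, starts[:, None] + jnp.arange(n)[None, :]]

    # The last n-1 generated tokens form the prefix of the next n-gram.
    # A negative start index is clamped to 0 by dynamic_slice; those cases
    # are discarded by the validity mask below.
    last = jax.lax.dynamic_slice_in_dim(input_ids, cur_len - (n - 1), n - 1, axis=1)

    # A window counts only if it lies fully inside the generated tokens
    # and its first n-1 tokens equal the current prefix.
    prefix_match = jnp.all(windows[:, :, : n - 1] == last[:, None, :], axis=-1)
    valid = (starts + n) <= cur_len
    match = prefix_match & valid[None, :]

    # Scatter the final token of each matching window into a ban mask.
    rows = jnp.arange(batch)[:, None]
    hits = jnp.zeros((batch, vocab), jnp.int32).at[rows, windows[:, :, -1]].add(
        match.astype(jnp.int32)
    )
    return jnp.where(hits > 0, -jnp.inf, scores)
```

All shapes are static and `cur_len` is used only in comparisons and a dynamic slice, so the function can be traced once and called inside a compiled loop. Whether it can be wired into this library's generation loop depends on that loop exposing the token buffer, which is the open question raised above.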

@rpatel15-hue

It seems this issue can be mitigated by reducing --compression_ratio_threshold from its default of 2.4; see openai/whisper#192.
How can I change this setting in this library?
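
As a possible stopgap, assuming the library does not expose this threshold, the same check can be applied to the transcribed text after the fact. The sketch below computes the ratio as the UTF-8 byte length divided by the zlib-compressed length, which is how openai/whisper defines it. The helper `looks_repetitive` is a hypothetical name, and this only flags suspicious output; it does not trigger the temperature fallback that openai/whisper performs.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; high means repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Flag text whose compression ratio exceeds the threshold (2.4 by default)."""
    return compression_ratio(text) > threshold
```

Chunks flagged this way could be re-transcribed with different settings or dropped. A lower threshold flags more text, at the cost of more false positives on legitimately repetitive speech.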
