Default test files are generating repeating text / hallucinating #9

greenman2 · 2023-04-21T00:10:47Z

I used the default settings on the Kaggle notebook.

https://huggingface.co/datasets/sanchit-gandhi/whisper-jax-test-files

sanchit-gandhi · 2023-04-21T11:40:15Z

This looks like a hallucination (a known issue with the Whisper model). It's worth trying again with timestamps enabled.

sanchit-gandhi · 2023-04-24T15:08:46Z

We could implement a logits processor that prevents tokens from being repeated (similar to what we have in PyTorch: https://github.com/huggingface/transformers/blob/04ab5605fbb4ef207b10bf2772d88c53fc242e83/src/transformers/generation/logits_process.py#L474)

This might be quite difficult to do in a JAX-compiled way, though, without returning the value of each generated token.
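
One possible direction, as a rough sketch only: if the compiled generation loop already carries a fixed-size token buffer and the current length in its state, a repeat-blocking step can be written with static shapes. Everything here is an assumption for illustration, not this library's API: the function name `ban_repeated_bigrams`, the single-sequence (unbatched) shapes, and the choice to block repeated bigrams rather than longer n-grams.

```python
import jax
import jax.numpy as jnp


def ban_repeated_bigrams(tokens, cur_len, logits):
    """Mask logits of tokens that would repeat an already generated bigram.

    tokens:  int array of shape (max_len,), padded past cur_len (hypothetical layout)
    cur_len: number of valid tokens so far, assumed >= 1
    logits:  float array of shape (vocab_size,) for the next token
    """
    max_len = tokens.shape[0]
    positions = jnp.arange(max_len - 1)
    last = tokens[cur_len - 1]
    # Position i starts a completed bigram (tokens[i], tokens[i + 1])
    # only if i + 1 is still inside the valid prefix.
    matches = (tokens[:-1] == last) & (positions + 1 < cur_len)
    followers = tokens[1:]
    # Count, per vocabulary entry, how often it followed `last` before.
    counts = jnp.zeros(logits.shape[0], dtype=jnp.int32)
    counts = counts.at[followers].add(matches.astype(jnp.int32))
    return jnp.where(counts > 0, -jnp.inf, logits)


# Token 7 already followed token 5, and 5 is the latest token,
# so the logit for 7 is masked out.
tokens = jnp.array([5, 7, 5, 0, 0, 0])
out = jax.jit(ban_repeated_bigrams)(tokens, 3, jnp.zeros(10))
```

All shapes are fixed at trace time, so the function can sit inside a jitted loop without pulling token values back to the host. A batched version could wrap it in `jax.vmap`.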

rpatel15-hue · 2023-04-25T15:29:07Z

It seems this issue can be improved by reducing `--compression_ratio_threshold` from its default of 2.4 (see openai/whisper#192).
How can I modify this setting with this library?
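
Whether this library exposes an equivalent option is not stated in this thread. As a post-processing workaround sketch, the ratio that openai/whisper compares against the threshold is the UTF-8 length of the text divided by its zlib-compressed length, and it can be computed on any returned transcript. The helper name `looks_repetitive` is hypothetical, and the 2.4 default is only the value quoted above.

```python
import zlib


def compression_ratio(text: str) -> float:
    # Highly repetitive text compresses well, which gives a large ratio.
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    # Hypothetical helper: flag transcripts whose ratio exceeds the threshold.
    return compression_ratio(text) > threshold


repeated = "thank you " * 50
normal = "The quick brown fox jumps over the lazy dog."
flags = (looks_repetitive(repeated), looks_repetitive(normal))
```

A flagged chunk could then be re-transcribed with different settings or dropped, depending on the use case.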
