
Got infinite logprob #15

Closed
tkorchagin opened this issue Jan 20, 2023 · 9 comments

@tkorchagin

Using this track:
https://drive.google.com/file/d/1-tunB84weKDR_uIA6raGG9vJS1H34YEL/view?usp=share_link

I got this error:
track.wav
93%|█████████▎| 443400/475781 [00:52<00:03, 8523.11frames/s]Got infinite logprob

base model

@Jeronymous
Member

Can you please test with the latest version (unless you're already on it)?

If it still fails with the latest version, can you give me the full set of options you use (in the CLI or in Python)? Model, language, etc.

@tkorchagin
Author

python3

import whisper_timestamped as whisper
model = whisper.load_model("base")
audio = whisper.load_audio(audio_path)
result = whisper.transcribe(model, audio, language='ru')

@Jeronymous
Member

I was asking for the version of this package.
If you used pip to install, you should see it with:

pip freeze | grep whisper

And you should see version 1.5.4 if you're up to date.
If not try:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/Jeronymous/whisper-timestamped

@tkorchagin
Author

@Jeronymous I use Google Colab, and it resets the virtual machine every time.

I still get the same error. Here are the commands you asked for:

!pip freeze | grep whisper

openai-whisper @ git+https://github.com/openai/whisper.git@9f7aba609971434b9de2a8d34ca2de766976904d
whisper-timestamped @ git+https://github.com/Jeronymous/whisper-timestamped@826778f91f9dbadbd80b6f86df64e7352b0c9796

@samheutmaker

I'm also having the same issue on the latest version.

@Jeronymous
Member

Jeronymous commented Jan 22, 2023

Thanks a lot @tkorchagin for your effort to help narrow this down.
Unfortunately, I am not able to reproduce the issue with your audio ekaterina_koval 05.01.2023, 17-31.wav and the base model,
neither on CPU nor on GPU (I also tried other model sizes, to be sure).

Maybe @samheutmaker can share the audio and option details, so I can check whether I have more luck with his case?

@Jeronymous
Member

@tkorchagin I pushed a new version, where the assertion failure gives more details (the list of logprobs).
I would appreciate it if you could re-run the failing transcription and share the new failure message. Maybe I can see something obvious...

Also, you can pass the option compute_word_confidence=False to transcribe().
This should prevent the failure from occurring (you just won't have word confidences). I'm then interested in seeing the output you get (sharing the JSON file would be awesome), to try to understand why I cannot reproduce your issue.

@Jeronymous
Member

I'm assuming it no longer occurs on the latest version.
Feel free to re-open (and give details) if it occurs again.

@Rtut654

Rtut654 commented Sep 7, 2023

@Jeronymous
I'm getting a similar issue roughly once in every 30 calls to transcribe. I probably can't share the audio since it is user data. I use the default setup with no custom parameters, same as in the example.
Here are the logs:

File "/home/test/rep/rep/lib/python3.8/site-packages/whisper_timestamped/transcribe.py", line 688, in may_flush_segment
    assert min([p.isfinite().item() for p in logprobs]), \
AssertionError: Got infinite logprob among (24) [(286, ' I', -inf), (519, ' think', -5.2494893074035645), (8815, ' television', -16.253812789916992), (815, ' may', -10.851532936096191), (362, ' have', -6.786816120147705), (257, ' a', -9.276606559753418), (562, ' when', -inf), (436, ' they', -5.740912437438965), (1401, ' read', -7.83854341506958), (3642, ' books', -12.358419418334961), (11, ',', -6.387020587921143), (50257, '<|endoftext|>', -11.639397621154785)]

P.S. I understand what it means, but I'd suppose it shouldn't abort the whole transcription as it does now?
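A minimal sketch of the non-fatal behavior being suggested (hypothetical, not the package's actual code): instead of asserting, clamp non-finite logprobs to an arbitrary floor and continue. The function name and floor value are made up for illustration; the tuple shape mirrors the (token_id, token_text, logprob) entries in the assertion message above.

```python
import math

def sanitize_logprobs(logprobs, floor=-20.0):
    """Replace non-finite logprobs with a floor value instead of failing.

    `logprobs` is a list of (token_id, token_text, logprob) tuples, as in
    the assertion message above; `floor` is an arbitrary "very low
    confidence" stand-in for -inf.
    """
    cleaned = []
    for token_id, text, lp in logprobs:
        if not math.isfinite(lp):
            lp = floor  # treat -inf as very low confidence rather than abort
        cleaned.append((token_id, text, lp))
    return cleaned

tokens = [(286, " I", float("-inf")), (519, " think", -5.25)]
print(sanitize_logprobs(tokens))
```

Downstream, the word confidence would then simply come out near zero for the affected word instead of crashing the run.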
