Transcription stopped halfway #36

Closed
nhan000 opened this issue Jul 14, 2023 · 8 comments

Comments

@nhan000

nhan000 commented Jul 14, 2023

I downloaded this 27-minute YouTube video (uploaded it here).

I ran the transcription with this command:
whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp4" --language en --model large-v2 --batch_recursive true

and it stopped at [13:15.860 --> 13:18.860] His greatest achievement was just around the corner.

I downloaded the mp3 file from that YouTube video (uploaded it here)
whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp3" --language en --model large-v2 --batch_recursive true

and it was able to run to [26:44.760 --> 26:46.180] might have been enough.

Interestingly, it didn't transcribe the advertisement at the beginning and at the end of the video.

@Purfview
Owner

Purfview commented Jul 14, 2023

Check whether the .srt subtitle file has been created at the point where you think it has "stopped".

@nhan000
Author

nhan000 commented Jul 14, 2023

The .srt file was created, but the second half is missing. It ends at the same timestamp as the output in the command prompt.
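
The check described above (does the .srt end well before the media does?) can be sketched in Python. This is only an illustration, not part of the tool: the function names `last_cue_end_seconds` and `looks_truncated`, the 60-second tolerance, and the sample cue in SRT timestamp form are assumptions, and the media duration has to be supplied by the caller.

```python
import re

# Matches SRT cue timing lines such as "00:13:15,860 --> 00:13:18,860".
CUE_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def last_cue_end_seconds(srt_text: str) -> float:
    """Return the end time in seconds of the latest cue, or 0.0 if none is found."""
    latest = 0.0
    for match in CUE_RE.finditer(srt_text):
        # Groups 5-8 hold the end timestamp: hours, minutes, seconds, milliseconds.
        hours, minutes, seconds, millis = (int(g) for g in match.groups()[4:])
        latest = max(latest, hours * 3600 + minutes * 60 + seconds + millis / 1000)
    return latest


def looks_truncated(srt_text: str, media_seconds: float, tolerance: float = 60.0) -> bool:
    """True if the last cue ends more than `tolerance` seconds before the media ends."""
    return media_seconds - last_cue_end_seconds(srt_text) > tolerance


# Hypothetical sample: the last cue reported in this thread, written in SRT form.
sample = """1
00:13:15,860 --> 00:13:18,860
His greatest achievement was just around the corner.
"""

# Hypothetical usage: for a roughly 27-minute file, pass media_seconds=27 * 60.
```

Because the ads at the start and end are skipped even in a complete transcript, a generous tolerance avoids false alarms.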

@Purfview
Owner

Purfview commented Jul 14, 2023

Are you running it on CUDA? If so, try the --compute_type=int8 parameter.

@nhan000
Author

nhan000 commented Jul 14, 2023

I added the parameter you gave me, so the command is now:

whisper-faster "C:\Users\ntnha\Videos\4K Video Downloader\Carl Sagan Astronomer of the People.mp4" --language en --model large-v2 --batch_recursive true --compute_type=int8

It ran on CUDA, and it still stopped at the same location.

@Purfview
Owner

I reproduced this issue on my side. I'll check later what can be done about it.
Interestingly, the hallucination starts at the advertisement.

@nhan000
Author

nhan000 commented Jul 14, 2023

Thanks for looking into this, and separately, thanks for making this program. Very noob-friendly for people who are not very techy like me.

The video has 3 advertisement segments:

  • One at the beginning, which Whisper Standalone doesn't transcribe for either the mp4 or the mp3 file.
  • One in the middle (13:19), which it transcribes for the mp3 file but where it stops for the mp4 file.
  • One at the end (26:46), which it also doesn't transcribe for the mp3 file.

@Purfview
Owner

It doesn't get stuck with the --beam_size=5 option.

The ad at the start and end is still ignored; the models were probably trained to ignore that kind of ad. By the way, the tiny and base models do transcribe it.
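
For anyone scripting the retry, here is a minimal sketch that assembles the command from the options used in this thread. The helper name `build_command` is hypothetical, and the beam size option is written as `--beam_size=5` on the assumption that it takes the same double-dash form as `--compute_type`.

```python
import subprocess


def build_command(media_path, beam_size=None, compute_type=None):
    """Build the whisper-faster argument list using only options seen in this thread."""
    cmd = [
        "whisper-faster", media_path,
        "--language", "en",
        "--model", "large-v2",
        "--batch_recursive", "true",
    ]
    if compute_type is not None:
        cmd.append(f"--compute_type={compute_type}")
    if beam_size is not None:
        # Assumed spelling; the thread reports that beam size 5 avoids the stall.
        cmd.append(f"--beam_size={beam_size}")
    return cmd


def run_with_beam_size(media_path, beam_size=5):
    """Run the transcription once with the given beam size; returns the exit code."""
    return subprocess.run(build_command(media_path, beam_size=beam_size)).returncode


# Hypothetical usage (requires whisper-faster on PATH):
# run_with_beam_size(r"C:\path\to\video.mp4")
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.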

@nhan000
Author

nhan000 commented Jul 16, 2023

Thanks a lot! I will keep the beam size parameter in mind and adjust it when I run into issues.

nhan000 closed this as completed Jul 16, 2023