See lazysys for the revamped version.
A command-line tool to convert long-form videos into multiple short-form videos, with burned-in text and subtitles. It also cuts out unwanted silence.
*Demo videos: `original.mp4` and `corrected.mp4`.*
As you can see, the `medium` model works quite well for Hungarian, considering the bad quality of my input. The `large` model could be even better, if you have the hardware. :)
*Demo video: `whisper.mp4`.*
See `lazyshorts -h`.
I use Whisper to transcribe audible voices to text. Obviously, with non-English languages the accuracy can be lower; you can help with that by...
- ...using a different (for now, only Whisper) model (be wary, the `medium` model is hard to run even with 8 GB of RAM); see the transcription sketch after this list.
- ...editing subtitles manually from segment to segment (`{lazyshorts-py} e1 2 45 78 ...`).
- I don't know if running Whisper on a GPU works, but you could try CUDA. See `--whisper_device` and the PyTorch/Whisper documentation. Also, get the CUDA-enabled PyTorch, as I define the CPU one in `requirements.txt`.
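A minimal sketch of what the transcription step looks like with the `openai-whisper` Python API (not the exact code lazyshorts runs); the file name, the `medium` model, `cuda`, and the `hu` language code are just placeholders:

```python
# Minimal Whisper transcription sketch; "input.mp4", "medium", "cuda" and
# the "hu" language code are placeholders, not fixed defaults of lazyshorts.
import whisper

# Pick a bigger model for better accuracy, a smaller one for less RAM.
model = whisper.load_model("medium", device="cuda")  # or device="cpu"

# Whisper extracts the audio track via ffmpeg, so a video file works too.
result = model.transcribe("input.mp4", language="hu")

for segment in result["segments"]:
    print(f"[{segment['start']:7.2f} -> {segment['end']:7.2f}] {segment['text']}")
```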
- `subprocess.run` somehow blocks the UI process; a sketch of a non-blocking `subprocess.Popen` alternative (combined with a `rich` progress display) follows this list.
- We could use `rich` to have nice progress bars, as currently you have to manually poll the status of the renders.
- Cropping is just arbitrary: I wanted to use MediaPipe. It's not easy to even get it to run, and my resources were not enough. Maybe a less demanding model or the cloud is needed? (A rough face-centred cropping sketch also follows this list.)
- Don't combine segments that are shorter than `end_time`; you'll get an exception.
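A minimal sketch of the non-blocking render idea; the `ffmpeg` commands, file names, and the polling loop are illustrative stand-ins, not the actual lazyshorts render code:

```python
# Illustrative only: the ffmpeg commands stand in for the real render step.
import subprocess
import time

from rich.progress import Progress, SpinnerColumn, TextColumn

cmds = [
    ["ffmpeg", "-y", "-i", "input.mp4", "-ss", "0", "-t", "30", "short_1.mp4"],
    ["ffmpeg", "-y", "-i", "input.mp4", "-ss", "30", "-t", "30", "short_2.mp4"],
]

# Popen returns immediately (unlike subprocess.run), so the UI is never blocked.
procs = [subprocess.Popen(c, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
         for c in cmds]

with Progress(SpinnerColumn(), TextColumn("{task.description}")) as progress:
    tasks = [progress.add_task(f"rendering {c[-1]}", total=1) for c in cmds]
    while any(p.poll() is None for p in procs):
        for task, proc in zip(tasks, procs):
            if proc.poll() is not None:  # finished; mark it as done
                progress.update(task, description=f"{proc.args[-1]} done", completed=1)
        time.sleep(0.5)
    for task, proc in zip(tasks, procs):  # mark the last ones that finished
        progress.update(task, description=f"{proc.args[-1]} done", completed=1)
```

(`asyncio.create_subprocess_exec` would be another option if the UI loop is asynchronous.)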
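And a rough sketch of how MediaPipe face detection could pick the horizontal centre of a 9:16 crop; the file names, the single-frame approach, and the centre fallback are assumptions, not something lazyshorts does today:

```python
# Rough sketch: detect a face in one frame and centre a 9:16 crop on it.
# File names and the fallback behaviour are made up for the example.
import cv2
import mediapipe as mp

frame = cv2.imread("frame.png")              # a frame grabbed from the video
height, width = frame.shape[:2]

with mp.solutions.face_detection.FaceDetection(
        model_selection=1, min_detection_confidence=0.5) as detector:
    results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

if results.detections:
    # Relative bounding box of the first detected face, in [0, 1] coordinates.
    box = results.detections[0].location_data.relative_bounding_box
    center_x = int((box.xmin + box.width / 2) * width)
else:
    center_x = width // 2                    # no face: fall back to the centre

crop_w = min(int(height * 9 / 16), width)    # 9:16 crop for short-form video
x0 = min(max(center_x - crop_w // 2, 0), width - crop_w)
cv2.imwrite("cropped.png", frame[:, x0:x0 + crop_w])
```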