
skeskinen/transcribing-video-cutter



Transcribing Video Cutter

Software for cutting videos, podcasts, etc., with some local AI features. Currently supports transcription, with denoising (a.k.a. studio sound) coming later.

Aims to be simple to use. Currently a work in progress and pretty rough.

YouTube demo

Features

  • Fast video cutting. Frame-accurate cuts without re-encoding.
  • Transcription with Whisper (faster-whisper). Speech segments are detected with Silero VAD.
  • Each track is transcribed separately. A visual display shows which audio track is speaking. Also handles multiple tracks speaking simultaneously.
  • Audio mixing of multiple tracks. Add external tracks from audio files.
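
To illustrate the idea of showing which track is speaking, including overlapping speech, here is a minimal sketch in Python. It is not taken from this project's code: the function name `speaking_intervals`, the track names, and the `(start, end)` segment format in seconds are all assumptions made for the example. It sweeps over the segment boundaries of every track and reports which tracks are active in each interval.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), assumed format


def speaking_intervals(
    tracks: Dict[str, List[Segment]],
) -> List[Tuple[float, float, List[str]]]:
    """Return (start, end, active_track_names) for every interval with speech.

    Hypothetical helper: sweeps over all segment boundaries in time order and
    keeps a counter of how many segments of each track are currently open.
    """
    events = []
    for name, segments in tracks.items():
        for start, end in segments:
            if end > start:  # ignore empty or inverted segments
                events.append((start, +1, name))
                events.append((end, -1, name))
    events.sort(key=lambda event: event[0])

    active: Dict[str, int] = {}
    result: List[Tuple[float, float, List[str]]] = []
    prev_time = None
    for time, delta, name in events:
        # Emit the interval that just ended, if anyone was speaking in it.
        if prev_time is not None and time > prev_time and active:
            speakers = sorted(active)
            if result and result[-1][1] == prev_time and result[-1][2] == speakers:
                # Same speakers as the previous interval: extend it.
                result[-1] = (result[-1][0], time, speakers)
            else:
                result.append((prev_time, time, speakers))
        count = active.get(name, 0) + delta
        if count > 0:
            active[name] = count
        else:
            active.pop(name, None)
        prev_time = time
    return result


if __name__ == "__main__":
    example = {
        "english": [(0.0, 2.0), (5.0, 6.0)],
        "spanish": [(1.0, 3.0)],
    }
    for start, end, names in speaking_intervals(example):
        print(f"{start:5.1f} - {end:5.1f}: {', '.join(names)}")
```

In the example data, both tracks are reported as active between 1.0 s and 2.0 s, which is the kind of overlap the timeline display has to handle.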

Issues and disclaimer

Warning: this is a pre-release version. There are still many limitations and pitfalls.

  • Some audio combinations won't work. For example, all tracks must have the same sample rate, and some channel layout combinations are not supported. #1
  • No packaging; you need to set up the environment yourself. Python knowledge will help, and so will requirements.txt. Also requires Qt6 and probably some other system libraries. #7
  • CUDA might work, but you'll need to work for it (I have an AMD card and ROCm doesn't work for me). Currently everything is done on the CPU. #9
  • HEVC / H.265 is not really supported. #8
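
Since all tracks must share one sample rate, it can help to check files before loading them. The following is a minimal sketch, not part of this project: `group_by_sample_rate` is a hypothetical helper, and it only reads WAV files through Python's standard `wave` module. Other formats such as .mp3 would need a different reader.

```python
import wave
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_sample_rate(paths: Iterable[str]) -> Dict[int, List[str]]:
    """Group WAV files by sample rate (hypothetical helper, WAV only).

    More than one key in the result means the files do not share a sample
    rate and would need resampling before being used together.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for path in paths:
        with wave.open(str(path), "rb") as wav:
            groups[wav.getframerate()].append(str(path))
    return dict(groups)
```

If the returned dictionary has a single key, the files are compatible as far as the sample rate is concerned. Channel layout is a separate limitation that this sketch does not check.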

Usage

Some actions can currently only be done with keyboard shortcuts:

  • P - Play / Pause
  • C - Center timeline on selection
  • X - Cut selection; Ctrl + X - Uncut
  • L / K - Seek one frame forwards / backwards
  • Ctrl + A - Select the whole timeline

Usage of the timeline:

  • Left click the top part to seek
  • Click-and-drag with left click to select
  • Click-and-drag with right click to pan
  • Use mouse wheel to zoom in and out

Screenshots

Transcribes different audio tracks separately. Here one track is the original English audio and the other one is a Spanish dub.

Handles pure audio files as well. Here the 3 tracks are loaded from 3 different .mp3 files. Each is 90 min long, for a total of 270 min of audio. The speech segments are available before the whole transcript is done, so you can start editing out silent parts, etc.
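
Once speech segments are known, silent parts are simply the gaps between them. As a rough sketch of that idea (not this project's implementation; `silent_gaps`, the `min_gap` threshold, and the `(start, end)` format in seconds are assumptions for illustration), the gaps can be computed like this:

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), assumed format


def silent_gaps(
    speech: List[Segment], duration: float, min_gap: float = 1.0
) -> List[Segment]:
    """Return gaps of at least `min_gap` seconds that contain no speech.

    Hypothetical helper: `speech` may be unsorted and overlapping, for
    example the combined segments of several tracks.
    """
    # Merge overlapping or touching speech segments first.
    merged: List[Segment] = []
    for start, end in sorted(speech):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Walk the merged segments and collect the silence between them.
    gaps: List[Segment] = []
    cursor = 0.0
    for start, end in merged:
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps


if __name__ == "__main__":
    segments = [(2.0, 5.0), (4.0, 6.0), (6.5, 8.0)]
    print(silent_gaps(segments, duration=12.0))
```

With several tracks, passing the segments of all tracks together yields only the stretches where nobody is speaking, which are the natural candidates for cutting.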
