Skip to content

jaxgeller/yt-karaoke

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

YT-Karaoke

This CLI uses Demucs and Whisperx to extract vocals from a YouTube video, transcribe the lyrics on a per-character level, and then generate a karaoke video with real-time subtitles. For more details see this blog post.

all-star.mp4

Demo of "All Star" generated using the highest settings.

Installation

To get started, you'll need Python 3.8+ and ffmpeg installed. Then, install the CLI with pip

pip install git+https://github.com/jaxgeller/yt-karaoke.git

Or you can checkout the notebook

Open In Colab

Usage

To use the CLI, simply run yt_karaoke with a YouTube URL as the first argument. This may take awhile

yt_karaoke "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

To run with the highest possible accuracy, use

yt_karaoke "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --device cuda --model large --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --shifts 10

Results

Using this notebook, I've generated karaoke videos for the following songs. The results are pretty good, but there's still a lot of room for improvement. If you have any suggestions, please open an issue or PR.

  • All Star - Smash Mouth: Great transcription, separation, and character alignment
  • Buddy Holly - Weezer: Good transcription, but not great in some of the verses. Does pick up on the "Woo-hoos" in the chorus. Character alignment is solid.
  • God's Plan - Drake: Not the best instrumental separation by Demucs or transcription by Whisper, but adlibs are picked up correctly.
  • Parked by the Lake - Dean Summerwind: Good transcription and alignment, but WhisperX might have a bug in aligning numeric characters. Lookout for when "80 miles" shows up in video.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published