A web app that, given a song name and its artist, displays the full karaoke experience to sing along to, complete with the song, lyrics, and a synchronized lyric guide.
Karaoke has 3 main components:
- Music to hear
- Lyrics to read
- A lyric guide in time with the audio
While the first 2 items can be scraped from the Internet with no preparation, the lyric guide (also known as the forced alignment) requires more heavy lifting. Specifically, it needs both the audio and the lyrics beforehand, and for better alignment quality, the audio should first be source-separated so that only the vocals remain.
This directly motivates our design.
- To get the song audio, I used the youtube-dl library to scrape YouTube for the video containing the song's audio, and extracted it via ffmpeg.
- To get the lyrics, I used the requests and beautifulsoup4 libraries to scrape Google Search results, since they conveniently serve the raw lyrics.
- For improved alignment, I passed the song into a source separator. I chose to use Spleeter by Deezer since they provided an easy-to-use Python API.
- To get the alignment, I passed the lyrics and separated vocals into a forced aligner. I chose to use the Gentle Aligner in the form of a Docker container which functions as a REST API. Unfortunately, the official container on Docker Hub is not actively maintained, so I'm using this updated one instead: cnbeining/gentle
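The forced alignment step in the last bullet can be sketched as a single HTTP call. This is a minimal sketch and not the project's actual code: it assumes Gentle is listening on port 8765 and accepts the `audio` and `transcript` form fields at `/transcriptions?async=false`, as shown in Gentle's own README. It uses only the standard library to stay self-contained, whereas the project itself may well use requests. The names `encode_multipart` and `align` are hypothetical.

```python
import json
import uuid
import urllib.request
from pathlib import Path

# Assumption: Gentle is reachable here; 8765 is the port used in Gentle's README.
GENTLE_URL = "http://localhost:8765/transcriptions?async=false"


def encode_multipart(fields):
    """Encode {field_name: (filename, content_bytes)} as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, (filename, content) in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def align(vocals_path, lyrics_path, url=GENTLE_URL):
    """POST the separated vocals and the lyrics to Gentle; return its JSON reply."""
    vocals_path, lyrics_path = Path(vocals_path), Path(lyrics_path)
    body, content_type = encode_multipart({
        "audio": (vocals_path.name, vocals_path.read_bytes()),
        "transcript": (lyrics_path.name, lyrics_path.read_bytes()),
    })
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the aligner container running, `align("vocals.wav", "lyrics.txt")` would return the parsed alignment. When both containers run under Docker Compose, the host in `GENTLE_URL` would be the aligner's service name rather than `localhost`.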
- An HTML form receives the desired song/artist name and the desired mode (for now, only karaoke is supported).
- JavaScript uses the Fetch API to query the Flask backend with one POST request at the /create-alignment endpoint, creating and storing the relevant files in a directory named after the title of the YouTube video the song was downloaded from (this keeps multiple requests for the same song mapped to the same files). For example, say we requested a song whose audio was downloaded from a YouTube video called "Example Song". The directory would be organized as follows:
  - Example Song/
    - lyrics.txt
    - song.wav (used for audio player)
    - song/
      - vocals.wav (used for improved alignment)
      - accompaniment.wav (used for audio player if desired)
    - align.json
- Display the fetched lyrics.
- Fetch the song from the /audio/Example Song endpoint and feed it into a player.
- Fetch the alignment from the /alignment/Example Song endpoint and feed it into an animation pipeline that highlights the appropriate word(s) in time with the song playing.
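The core of the highlighting step is a lookup from the current playback time to the word being sung. The sketch below illustrates that logic in Python, although the app's frontend does this in JavaScript. It assumes the alignment follows Gentle's output shape: a `words` list whose entries carry `case`, `start` and `end` (in seconds), with `case` equal to `"success"` for words that were aligned. The function names are hypothetical.

```python
from bisect import bisect_right


def timed_words(alignment):
    """Return (start, end, position) for every successfully aligned word.

    `position` is the word's index in the full lyric word list, so the
    caller can highlight the right word even when some words have no timing.
    """
    return [
        (w["start"], w["end"], i)
        for i, w in enumerate(alignment.get("words", []))
        if w.get("case") == "success"
    ]


def active_word(words, t):
    """Return the lyric position of the word being sung at time t, or None.

    Assumes `words` is sorted by start time, which holds if the aligner
    reports words in lyric order.
    """
    starts = [start for start, _, _ in words]
    k = bisect_right(starts, t) - 1  # last word starting at or before t
    if k >= 0 and t < words[k][1]:
        return words[k][2]
    return None


# Hand-written sample in the assumed format; not real aligner output.
sample = {
    "words": [
        {"word": "Hello", "case": "success", "start": 0.5, "end": 0.9},
        {"word": "darkness", "case": "not-found-in-audio"},
        {"word": "friend", "case": "success", "start": 1.2, "end": 1.8},
    ]
}
words = timed_words(sample)
print(active_word(words, 0.7))  # 0: "Hello" is being sung
print(active_word(words, 1.0))  # None: gap between aligned words
print(active_word(words, 1.3))  # 2: "friend"
```

Calling `active_word` on each animation frame with the player's current time is enough to drive the highlight. Words the aligner could not place are simply never highlighted in this sketch.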
This entire process takes between 1 and 2 minutes. The very first run takes a bit longer because it needs to download the pretrained models necessary for the aligner to work.
To keep up to date with modern app/service development practices, I decided to containerize this with Docker. The details of the built image can be found in Dockerfile. By itself, the container does everything except forced alignment of lyrics, which is done by another Docker container. So to make the overall architecture work, we need to combine these 2 Docker images into one project with Docker Compose. The implementation of this can be found in docker-compose.yml.
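Under Docker Compose, containers in the same project can reach each other by service name, so the app container needs to know where the aligner lives. The sketch below shows one way the backend could resolve that address. The environment variable names `GENTLE_HOST` and `GENTLE_PORT`, the default service name `gentle`, and the function name are all assumptions for illustration and are not taken from this repository's docker-compose.yml.

```python
import os


def gentle_endpoint(env=None):
    """Build the aligner URL from environment variables, with Compose-friendly defaults."""
    env = os.environ if env is None else env
    host = env.get("GENTLE_HOST", "gentle")  # hypothetical Compose service name
    port = env.get("GENTLE_PORT", "8765")    # port used in Gentle's README
    return f"http://{host}:{port}/transcriptions?async=false"


print(gentle_endpoint({}))
print(gentle_endpoint({"GENTLE_HOST": "localhost"}))
```

Reading the host from the environment keeps the same code usable both inside Compose and when the aligner is run by hand on the local machine.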
Every push to the main branch triggers a GitHub Actions build of the image seanfarhat/karaoke:latest.
- Download Docker
- Clone this repository
- cd into the directory where docker-compose.yml is
- Run the following command in your terminal: docker compose up
- The app is now running on localhost:5000
Alternatively, you can find the Docker image on Docker Hub.
Improve the UI