Enhanced version of carykh's lazykh project

gamingwithevets/lazykh

 
 

This is an enhanced version of Cary Huang (carykh)'s automatic lip-syncing project described in these videos by him:

2020: https://www.youtube.com/watch?v=y3B8YqeLCpY

2022 (includes walkthrough tutorial): https://www.youtube.com/watch?v=ItVnKlwyWwA

Creating a video using this lazykh code is a 5-step process, made even easier by my Python knowledge. (Note: it takes a long time.)

2022/07/31 UPDATE: Cary has added frame-caching to the video drawer! I have re-implemented my quality-of-life features into this new, revised video drawer. <3

2022/11/14 UPDATE: Reuploaded the modified code to the repo since GitHub keeps complaining about the repo not being "up-to-date"...

New features

Of course, I have to list all the new features of this enhanced lazykh code!

Better usage of the argparse module
Cary appears to not really know how to use the argparse module (e.g. an unnecessary --input_file argument). That's why I fixed it for him! After all, I've had Python projects using this module before.

Messages and error handling
When coding, the text is always most important in my opinion. That's why I've added messages during runtime, as well as error handling (which is simply printing "an error occurred" and the traceback, then exiting).
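
To give a rough idea of what that error handling looks like, here's a minimal sketch. The function name run_with_error_report is made up for this example and is not taken from the actual scripts; it only shows the "print a message plus the traceback, then report failure" pattern.

```python
import traceback

def run_with_error_report(task):
    """Run task(); on failure print a notice and the traceback.

    Returns 0 on success and 1 on failure, so the caller can pass the
    value straight to sys.exit().
    """
    try:
        task()
    except Exception:
        print("An error occurred:")
        traceback.print_exc()  # prints the traceback of the current exception
        return 1
    return 0

# Hypothetical usage: the lambda stands in for a real step of the pipeline.
status = run_with_error_report(lambda: print("Rendering frames..."))
```

A script would then end with sys.exit(status) so the shell also sees that something went wrong.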

File placement changes and path checks
To explain this more easily, let's say I'm on Step 3 (see below). I have the text script in one location and the Gentle JSON data in another. With Cary's original code, I have to move both files to the same location, give them the same name, and change their extensions to .txt and .json respectively if they aren't already. Otherwise the script fails, because it expects two files with the same name, one with a TXT extension and one with a JSON extension. That's a literal waste of time! So each script now takes the location of every file it needs, which is way easier and saves time.
I also implemented file/folder path checking. If the specified path to a file/folder doesn't exist, the script will now tell you so instead of throwing an error.
tl;dr: I made it so that each script takes the path to every file it needs, and I implemented file/folder path checks.
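
The path checks described above boil down to something like the following sketch. The helper names (missing_paths, check_inputs) and the exact message wording are assumptions for illustration, not the real code.

```python
import os

def missing_paths(*paths):
    """Return the given paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

def check_inputs(text_file, json_file):
    """Return a readable error message, or None if both inputs exist."""
    missing = missing_paths(text_file, json_file)
    if missing:
        return "Path not found: " + ", ".join(missing)
    return None

# Hypothetical usage: the two files may live anywhere and have any name.
message = check_inputs("scripts/ev.txt", "gentle-output/ev.json")
if message:
    print(message)
```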

More options
Each script now has a -o / --output argument, which lets you specify the path of the generated file/folder. If it isn't specified, the script saves its output where Cary's original code would.
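
As a sketch of how a positional input plus an optional -o / --output can work with argparse: the parser below is a hypothetical stand-in, not a copy of any script in the repo. The _g fallback mirrors what Step 1 (see below) does by default.

```python
import argparse
import os

def build_parser():
    """Hypothetical parser: one positional input and an optional output."""
    parser = argparse.ArgumentParser(description="Example step script")
    parser.add_argument("text_file", help="path to the annotated text script")
    parser.add_argument("-o", "--output", help="where to write the result")
    return parser

def resolve_output(args, suffix="_g"):
    """Use --output if given, else derive a path next to the input file."""
    if args.output:
        return args.output
    root, ext = os.path.splitext(args.text_file)
    return root + suffix + ext

args = build_parser().parse_args(["ev.txt"])
print(resolve_output(args))  # falls back to a name derived from the input
```

Making the input positional means there is no need for a separate --input_file flag.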

More VIDEO options!
For the video drawer script, I've added two new options: -r / --frames to specify the number of frames to generate (the program calculates the frame count on its own if not specified), and -f / --fps to specify the frame rate, because I prefer 60 FPS over 30 FPS.
For the video finisher script, I've added the same frame rate argument, plus the -a / --audio-track argument, which lets you specify the audio track for the video. It's a separate argument so that the final video can also have no audio.
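
Here's a hedged sketch of how the two video drawer options could interact. The flag names are the real ones, but the default of 30 FPS and the "duration times frame rate, rounded up" formula are my assumptions for this example, not necessarily how the script works internally.

```python
import argparse
import math

def build_parser():
    """Hypothetical parser holding only the two frame-related options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--frames", type=int, default=None,
                        help="number of frames to generate")
    parser.add_argument("-f", "--fps", type=int, default=30,
                        help="frame rate (assumed default: 30)")
    return parser

def frame_count(args, duration_seconds):
    """Use --frames if given, else estimate the count from the duration."""
    if args.frames is not None:
        return args.frames
    return math.ceil(duration_seconds * args.fps)

args = build_parser().parse_args(["-f", "60"])
print(frame_count(args, 2.5))  # frames needed for 2.5 seconds at 60 FPS
```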

Preparation

First of all, make sure you have Python installed. No, there aren't any binaries (executables) for this repo. You're going to need some programming experience if you're just an average user who found this cool.

For Step 2 (see below), Gentle (the phoneme detection library used) will only work on *NIX-based operating systems like macOS and Linux (you can use WSL to get it to run on Windows). The remaining steps can be done on any type of operating system (with Python, of course).

After Python is installed (or if it's already installed), let's install Gentle! You should run these commands within the lazykh folder like Cary did.
First, install the Twisted module: pip install Twisted
Then, clone the Gentle GitHub repo: git clone https://github.com/lowerquality/gentle.git

Installing Gentle (for me, at least, and probably for non-Mac users) isn't an easy task. You have to first cd into the Gentle repo folder (that's obvious).
Then, Cary instructs you to run the install.sh script. Unfortunately, for Linux and WSL users (?), this may not work. If it doesn't, open the script with a text editor and execute each command inside one by one, running a command with sudo if needed.

Here comes the long part: installing Kaldi. Once you hit the line that says (cd ext && ./install_kaldi.sh), cd into the ext folder. Now open the install_kaldi.sh script in a text editor, and, again, execute each command inside.
But before running the first make instruction (make clean inside the tools folder of the Kaldi repo), run extras/check_dependencies.sh from the tools folder (don't use cd!). This will give you a list of dependencies you need to install Kaldi.
After installing everything, run that script again, and if it says OK, it's time to build! Go back to the tools folder (if you changed to the extras folder). Before running make, I highly recommend running touch python/.use_default_python, unless you don't mind Kaldi telling you Python 2 is not the default Python.
OK! Time to make. Run make clean && make. This will take a long time, so go watch YouTube or play some games to pass the time.
After it's done, continue executing each command from install_kaldi.sh, now starting from after the make clean and make command. When you reach the make depend command, go do something else to pass the time.
Now you have installed Kaldi.

After installing Kaldi, go back to the install.sh script from Gentle and continue executing commands from after the (cd ext && ./install_kaldi.sh) line.
Then you will run another set of make commands. Run both on one line (chained with &&, like before) so the second one starts as soon as the first finishes, since you can never really know when a make command will finish.

After you're done, hurray! You have installed Gentle.

Now put the Gentle repo folder in the lazykh repo folder if you haven't. Name the folder whatever you want -- for the example below I'll be using the folder name gentle.

The script

Here's Cary's word (modified for easier reading):

The script is a .txt file that tells Gentle what words you're actually saying, and it should match your spoken audio as closely as possible, if not perfectly. Read through exampleVideo/ev.txt for an example.

You'll notice that occasionally, the words in ev.txt are stand-ins for what I actually say. For example, instead of "Cary-ness" (which is what I say), ev.txt contains the words "caring es". Similarly, instead of "Minecraft", it has "Mine craft". This is because Gentle's dictionary only contains common words. If you include a word in the script that Gentle doesn't know, the stick figure will just not lip-sync that word at all, which isn't ideal. The janky solution is to type common words that produce the same mouth shapes as the uncommon word, to get the same desired effect. For example, I might say the word "Amogus" in the audio, but since that word is so recent, Gentle doesn't know it. I might type "Um hoe cuss" as a substitute, and hope Gentle can connect the dots. To find better substitutes, it's helpful to know which phonemes use the same letters (F/V, B/P/M, K/D/G/J/N/S/T/Z, L/Y/H, etc.).

Anything in angle brackets is an emotion that is not spoken aloud. There are only 6 permitted emotions:

explain,happy,sad,angry,confused,rq

Example:

<happy> It would be really cool, to see other Minecraft players
playing around with my giant Earth.
<angry> I just hope they don't destroy it like my brother did last time!

Most of these are pretty self-explanatory. explain is a generally positive emotion where the stick figure is giving the audience information, but not in an over-the-top happy way. rq stands for "rhetorical question", and will give the stick figure a shrug-like pose for questions like "But what is gnocchi anyway?" that are answered directly afterward. When you denote an emotion, the stick figure will become that emotion at that part of the script, and will retain that emotion until the next emotion marker (whether that's 1 line away or 100 lines away).
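
To make the "emotion sticks until the next marker" rule concrete, here's a small sketch. It is not Cary's parser: it works line by line (a marker applies to the whole line it appears on), and the starting emotion of explain is an assumption I picked for the example.

```python
import re

EMOTIONS = {"explain", "happy", "sad", "angry", "confused", "rq"}
MARKER = re.compile(r"<(\w+)>")

def emotions_per_line(script, default="explain"):
    """Return (emotion, spoken text) pairs, one per non-empty line.

    An emotion persists until the next marker, however far away it is.
    """
    current = default
    result = []
    for line in script.splitlines():
        if not line.strip():
            continue
        for name in MARKER.findall(line):
            if name not in EMOTIONS:
                raise ValueError(f"unknown emotion: {name}")
            current = name
        result.append((current, MARKER.sub("", line).strip()))
    return result

example = (
    "<happy> It would be really cool, to see other Minecraft players\n"
    "playing around with my giant Earth.\n"
    "<angry> I just hope they don't destroy it like my brother did last time!"
)
for emotion, text in emotions_per_line(example):
    print(emotion, "|", text)
```

The second line has no marker of its own, so it keeps the happy emotion from the first line.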

Square brackets denote the "topic" of a line. These are integrated into spoken lines, so they should be spoken in the audio file, too. If a line doesn't have any square brackets, this program will assume the entire line is the "topic". In the below example, "tarantula" is spoken and it's the topic. explain is not spoken.

<explain> Despite being over 3 inches in length, the [tarantula] is not large enough to have a measurable gravitational pull on the Sun.

Square brackets are not necessary, and including them or not doesn't affect the timing of the video at all. The only purpose they have is for drawing billboards. When you draw the billboard for the above line, it will be called "tarantula.png", and there will be a subtitle under the image that says "Tarantula". This is useful because if there is another line 5 minutes later that also uses the word "tarantula", you can indicate that [tarantula] is again the topic, so you can reuse the same billboard image. If you don't care too much about the billboards, you can skip the square brackets entirely.
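
Here's a rough sketch of the topic rule: take the bracketed word if there is one, otherwise fall back to the whole line. The function names are hypothetical, and how the real code turns a whole-line topic into a safe file name is not something I'm showing here.

```python
import re

TOPIC = re.compile(r"\[([^\]]+)\]")
MARKER = re.compile(r"<\w+>")

def line_topic(line):
    """Return the bracketed topic, or the whole spoken line if none is marked."""
    match = TOPIC.search(line)
    if match:
        return match.group(1).strip()
    return MARKER.sub("", line).strip()

def billboard_name(line):
    """File name of the billboard image for a line (simplified)."""
    return line_topic(line) + ".png"

line = ("<explain> Despite being over 3 inches in length, the [tarantula] is "
        "not large enough to have a measurable gravitational pull on the Sun.")
print(billboard_name(line))
```

Two lines with the same bracketed topic map to the same file name, which is why the billboard gets reused.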

Single line breaks indicate a change in pose without the emotion changing. There are 5 poses within each emotion category (e.g., 5 angry poses), so it's possible to see the stick figure move his limbs around while still expressing the same emotion. If you enable billboards, single line breaks also indicate a billboard change.

Double line breaks change the background image and flip the entire screen (so if the stick figure was on the left side of the screen, now he's on the right). This is to make the video feel like it's distinctly in a "new section" of discussion. However, double line breaks are never necessary.
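
The two kinds of line breaks can be pictured with this sketch: blank lines split the script into sections, and each remaining line inside a section is one pose. The flipped flag just alternates per section. Which side the stick figure starts on is an assumption of the example, and split_script is a made-up name.

```python
import re

def split_script(script):
    """Split a script into (flipped, pose_lines) sections.

    Sections are separated by blank lines; every line in a section is a pose.
    """
    sections = re.split(r"\n\s*\n", script.strip())
    result = []
    for index, section in enumerate(sections):
        poses = [line for line in section.splitlines() if line.strip()]
        result.append((index % 2 == 1, poses))
    return result

example = "<happy> First pose\nSecond pose\n\n<sad> New section, screen flips"
for flipped, poses in split_script(example):
    print("flipped" if flipped else "normal", poses)
```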

Creating an actual lazykh video

The lazykh repo comes with an example audio and text file you can use, provided by Cary. They are located in the exampleVideo folder.
You can run each script shown here with the -h argument for help. Replace the placeholders in angle brackets with your files' locations.

All runtimes shown here are estimates based on Cary's MacBook, so they may differ depending on the speed of your computer!

Step 1 - Remove the annotations from the script to make it "Gentle-friendly" (Runtime: instant)
This command will create another text file with an appended _g in the same folder as the specified text script.

python3 code/gentleScriptWriter.py <text-file>
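
If you're wondering what "Gentle-friendly" means in practice, this sketch shows the idea: drop the emotion markers (they aren't spoken) and keep the topic words while removing the brackets around them. It's a simplified illustration with a made-up function name, not the actual contents of gentleScriptWriter.py.

```python
import re

def gentle_friendly(script):
    """Strip lazykh annotations so only the spoken words remain."""
    text = re.sub(r"<\w+>", "", script)  # emotion markers are never spoken
    text = re.sub(r"[\[\]]", "", text)   # topics are spoken, brackets are not
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(lines)

print(gentle_friendly("<explain> Despite its size, the [tarantula] is harmless."))
```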

Step 2 - Calculate phoneme timestamps with Gentle (Runtime: 2 min. for a 5 min. video)
This command will create a file with Gentle JSON data. Make sure to specify the "Gentle-friendly" text file!

python3 gentle/align.py <audio-file> <gentle-friendly-text-file> -o <json-file>
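
For the curious, here's a sketch of how phoneme timestamps can be read out of that JSON file. The field names (words, case, start, phones, phone, duration) are how I remember Gentle's output, so treat them as assumptions and check them against your own file. The sample data below is made up.

```python
import json

def phoneme_times(gentle_json):
    """Return (start_time, phone) pairs for every successfully aligned word."""
    data = json.loads(gentle_json)
    timeline = []
    for word in data.get("words", []):
        if word.get("case") != "success":
            continue  # Gentle could not align this word to the audio
        time = word["start"]
        for phone in word["phones"]:
            timeline.append((round(time, 3), phone["phone"]))
            time += phone["duration"]  # each phone starts where the last ended
    return timeline

sample = json.dumps({"words": [
    {"word": "hi", "case": "success", "start": 1.0,
     "phones": [{"phone": "hh_B", "duration": 0.1},
                {"phone": "ay_E", "duration": 0.2}]},
    {"word": "Amogus", "case": "not-found-in-audio"},
]})
print(phoneme_times(sample))
```

Words Gentle doesn't recognize end up without timing data, which is why the stick figure won't lip-sync them.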

Step 3 - Create a simplified timetable (Runtime: 2 seconds for a 5 min. video)
This script will create a CSV file in the same folder as the specified text script. According to Cary, it's solely Gentle code and not his.

python3 code/scheduler.py <text-file> <json-file>

Step 4 - Render the frames (Runtime: 12 min. for a 5 min. video)
This script will create 1080p, 30 FPS image files in a folder in the same directory as the text script. These are only the minimal arguments; run the script with the -h argument to see all of them.

python3 code/videoDrawer.py <text-file> <csv-file>

Step 5 - Convert the image sequence to a video and add audio (Runtime: 8 minutes for a 5-min video)
This script will create video.mp4 in the directory your terminal is currently in and delete all the image files.

python3 code/videoFinisher.py <frame-directory> -a <audio-file>
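
If you'd rather not type the five commands by hand, a small helper like this can assemble them. It only builds and prints the commands, nothing gets executed. The file names are placeholders, and passing -o to the scheduler and the video drawer is my choice for the example so that every output path is explicit.

```python
import os

def build_commands(text_file, audio_file, json_file, csv_file, frames_dir):
    """Return the five step commands as argument lists (nothing is run)."""
    root, ext = os.path.splitext(text_file)
    gentle_text = root + "_g" + ext  # Step 1 appends _g to the script's name
    return [
        ["python3", "code/gentleScriptWriter.py", text_file],
        ["python3", "gentle/align.py", audio_file, gentle_text, "-o", json_file],
        ["python3", "code/scheduler.py", text_file, json_file, "-o", csv_file],
        ["python3", "code/videoDrawer.py", text_file, csv_file, "-o", frames_dir],
        ["python3", "code/videoFinisher.py", frames_dir, "-a", audio_file],
    ]

# Placeholder file names; swap in your own paths.
commands = build_commands("ev.txt", "ev.wav", "ev.json", "ev.csv", "ev_frames")
for command in commands:
    print(" ".join(command))
```

To actually run them, you could pass each list to subprocess.run(command, check=True), which stops at the first step that fails.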

You can watch an example here (edited for your viewing pleasure).

Adding billboards

Running the 5 standard commands listed above will give you a video with no synchronized billboards to the side of the avatar-speaker-talking-guy. If you want to include billboard drawings, do these steps after step 3:

Step 3.1
Run this command to launch a Pygame applet that lets the user draw really crappy scribbles for each line of the script.

python3 code/humanImager.py <text-file>

When the applet gives you a line, you have 30 seconds to draw it in the given zone. You can hit SPACE to advance to the next line early. Also, you can hit ESCAPE to exit if you mess up, and then run the code again. (It will save all the billboards you've finished.)

There will be a folder in the same directory as the specified text script that contains all the billboard files. Feel free to swap them out with any other image, like legitimate artwork, or something from the internet.

Step 4-alt
Run the same command as Step 4, but be sure to add the -b / --billboards argument along with the billboards folder.

python3 code/videoDrawer.py <text-file> <csv-file> -b <billboards-folder>
