<a href="https://colab.research.google.com/github/anothermartz/Easy-Wav2Lip/blob/main/Easy-Wav2Lip_V3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Make sure to click 👆 that button to copy it to your own Google Drive first!

# Wav2Lip-HQ made easy!

GitHub: https://github.com/anothermartz/Easy-Wav2Lip

* Code adapted to Google Colab from [wav2lip-hq-updated-ESRGAN](https://github.com/GucciFlipFlops1917/wav2lip-hq-updated-ESRGAN) by [GucciFlipFlops1917](https://github.com/GucciFlipFlops1917)

* Which fixes and improves the deprecated [Wav2LipHQ](https://github.com/Markfryazino/wav2lip-hq)

* Which is based on the original [Wav2Lip](https://github.com/Rudrabha/Wav2Lip)

Not only was this built on the shoulders of giants, I'm not even very good at coding and I practically used Bing AI chat to do it all for me.

However, I may offer some support in this Discord:<br>
Invite link: https://discord.gg/FNZR9ETwKY<br>
Wav2Lip channel: https://discord.com/channels/667279414681272320/1076077584330280991

# Best practices:
Video files:
* Must have a face in all frames or Wav2Lip will fail
* Use h264 .mp4 - other file types may be supported but this definitely works
* Use a small file in every way (try <720p, <60 seconds, 30fps, etc. - Bigger files may work but are usually the reason it fails)
* Start with a really tiny clip just to get used to the process, don't go throwing in a huge file for your first try.

Audio files:
* Ideally just encode it into your video file
* <b>OR</b>
* Name the audio file the same as the video, e.g. Video.mp4 & Video.wav
* Must be .wav
  
I may include support for other types later as I think Wav2Lip does, but right now the code only accounts for .wav.
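
As a rough illustration of the rule above (a same-named .wav wins, otherwise the audio inside the video is used), here is a minimal sketch. It is not the notebook's exact code, and `find_audio_source` is a hypothetical helper name used only for this example.

```python
import os

def find_audio_source(video_path):
    """Return a same-named .wav if one exists next to the video,
    otherwise return the video itself (its embedded audio track)."""
    base, _ext = os.path.splitext(video_path)
    wav_path = base + ".wav"
    if os.path.isfile(wav_path):
        return wav_path
    return video_path

# Hypothetical path, for illustration only
print(find_audio_source("/content/drive/MyDrive/Video.mp4"))
```

So if Video.wav sits next to Video.mp4 it is picked up automatically, and if not, the video's own audio is used.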

Batch processing:
* Name the files you want processed in a batch so that they end in a number, e.g. Video1.mp4, Video2.mp4, Video3.mp4 etc., and have them all in the same folder.

If you select Video3.mp4 to process, it will look for Video4.mp4 etc. afterwards.
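
The numbering rule can be sketched like this. It is a simplified illustration under the assumption that the batch stops at the first missing number; `next_in_batch` and `batch_sequence` are hypothetical names, not functions from the notebook.

```python
import os
import re

def next_in_batch(video_path):
    """Return the path of the next numbered file,
    or None if the file name does not end in a number."""
    base, ext = os.path.splitext(video_path)
    match = re.search(r"\d+$", base)
    if match is None:
        return None
    digits = match.group()
    next_number = int(digits) + 1
    # Keep any zero padding, e.g. clip007 -> clip008
    return f"{base[:match.start()]}{next_number:0{len(digits)}d}{ext}"

def batch_sequence(first_video):
    """Yield existing files starting at first_video until a number is missing."""
    current = first_video
    while current is not None and os.path.isfile(current):
        yield current
        current = next_in_batch(current)
```

With Video1.mp4, Video2.mp4 and Video4.mp4 in a folder, starting from Video1.mp4 would cover Video1.mp4 and Video2.mp4 and then stop, because Video3.mp4 is missing.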

In [None]:
#@title <h1>Step 1: Setup "Easy-Wav2Lip"</h1> With one button: it's really that easy!
#@markdown 1. 👈 Click that little circle play button - it will ask for Google Drive access
#@markdown 2. Accept if your files are on Google Drive (recommended)
#@markdown <br><br> Alternatively, say "no thanks" and click the folder icon to the far left, right click and upload your files there.<br>If not using Google Drive, you may lose all processed files if not manually downloaded.

#mount Google Drive
print("Mounting Google Drive...")
GDrive = True
from google.colab import drive
try:
  drive.mount('/content/drive')
  print("You should look for your video in the file browser now while the rest is installing")
except:
  from IPython.display import clear_output
  clear_output()
  print("...Not mounting Google Drive \n You should start uploading your video(s) now")
  GDrive = False

print()
print('Downloading and installing requirements - this usually takes 2-3 minutes, scroll down and start setting up Step 2!')
print()

import time
start_time = time.time()

import warnings

import torch
import sys
#check GPU (torch is used for the device name below, so check with torch too)
print("Checking GPU is enabled:")
if not torch.cuda.is_available():
    sys.exit('No GPU in runtime. Please go to the "Runtime" menu, "Change runtime type" and select "GPU".')
else:
    gpu_name = torch.cuda.get_device_name(0)
    gpu_name = gpu_name.replace(' ', '_')
    print(f'GPU is {gpu_name}')

#imports and stuff
import csv
import gdown
import io
import json
import os
import pandas as pd
import re
import shutil
import subprocess

from base64 import b64encode
from numpy.lib import stride_tricks
from IPython.display import HTML, Audio, clear_output
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from tqdm import tqdm

os.system('git clone https://github.com/anothermartz/Easy-Wav2Lip.git')
os.chdir('Easy-Wav2Lip')
os.system('pip3 install -r requirements.txt') 
from wav2lip_models import Wav2Lip
from basicsr.utils.download_util import load_file_from_url
from face_parsing import init_parser
def load_model(path):
    model = Wav2Lip()
    print("Load checkpoint from: {}".format(path))
    checkpoint = torch.load(path)
    s = checkpoint["state_dict"]
    new_s = {}
    for k, v in s.items():
        new_s[k.replace('module.', '')] = v
    model.load_state_dict(new_s)
    model = model.to("cuda")
    return model.eval()
!pip install boto3 --quiet
!pip install realesrgan --quiet
#clear_output()
import boto3
from botocore.exceptions import NoCredentialsError
#pre-download all models so that Step 2 is faster - I don't know how else to download gfpgan and codeformer files than to run them so I include a tiny video to process quickly.
!wget "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth" -O "/content/Easy-Wav2Lip/weights/RealESRGAN_x4plus.pth"
!python inference.py --face "/content/Easy-Wav2Lip/temp/initialize.mp4" --audio "/content/Easy-Wav2Lip/temp/initialize.mp4" --outfile "/content/Easy-Wav2Lip/temp/initialized_gfpgan.mp4" --resize_factor 8 --enhance_face 'gfpgan'
!python inference.py --face "/content/Easy-Wav2Lip/temp/initialize.mp4" --audio "/content/Easy-Wav2Lip/temp/initialize.mp4" --outfile "/content/Easy-Wav2Lip/temp/initialized_codeformer.mp4" --resize_factor 8 --enhance_face 'codeformer'
#!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "face_detection/detection/sfd/s3fd.pth"
#clear_output()
print('Downloading and installing requirements - this usually takes about 3 minutes, scroll down and start setting up Step 2!')
print()

#---------------------------------functions!------------------------------------

def showVideo(file_path):
  """Function to display video in Colab"""
  mp4 = open(file_path,'rb').read()
  data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
  display(HTML("""
  <video controls width=600>
      <source src="%s" type="video/mp4">
  </video>
  """ % data_url))

def get_video_details(filename):
    cmd = ['ffprobe', '-v', 'error', '-show_format', '-show_streams', '-of', 'json', filename]
    # Keep stderr separate so ffprobe warnings cannot corrupt the JSON on stdout
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    info = json.loads(result.stdout)

    # Get video stream
    video_stream = next(stream for stream in info['streams'] if stream['codec_type'] == 'video')

    # Get resolution
    width = int(video_stream['width'])
    height = int(video_stream['height'])
    resolution = width*height

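    # Get fps (ffprobe reports a fraction string such as "30000/1001"; parse it instead of using eval)
    from fractions import Fraction
    fps = float(Fraction(video_stream['avg_frame_rate']))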

    # Get length
    length = float(info['format']['duration'])

    return {'resolution': resolution, 'fps': fps, 'length': length}

def predict_processing_time(input_resolution, input_fps, input_length, resolution_scale, upscaler):
    filename = f'{upscaler}_with_{gpu_name}_processing_stats.csv'
    try:
        # Load the data from the CSV file
        data = pd.read_csv(filename, header=None)
    except FileNotFoundError:
        return None

    # Split the data into input features and target variable
    X = data.iloc[:, :-1]
    y = data.iloc[:, -1]

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Train a random forest regressor on the training data
    regressor = RandomForestRegressor()
    regressor.fit(X_train, y_train)

    # Calculate the R-squared value on the test set
    r_squared = regressor.score(X_test, y_test)

    # Create a new row of data for the new video
    new_video = [input_resolution, input_fps, input_length, resolution_scale]

    # Predict the processing time of the new video
    predicted_time = regressor.predict([new_video])
    
    return predicted_time, r_squared

def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    
    if hours > 0:
        return f'{hours}h {minutes}m {seconds}s'
    elif minutes > 0:
        return f'{minutes}m {seconds}s'
    else:
        return f'{seconds}s'

def store_processing_stats(input_resolution, input_fps, input_length, resolution_scale, upscaler, process_time):
    filename = f'{upscaler}_with_{gpu_name}_processing_stats.csv'
    with open(filename, mode='a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow([input_resolution, input_fps, input_length, resolution_scale, process_time])


def count_lines(stats_file):
    with open(stats_file, 'r') as f:
        return sum(1 for line in f)

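def remove_duplicates(stats_file):
    # The stats file has no header row, so read and write it without one;
    # otherwise the first data row is treated as column names
    df = pd.read_csv(stats_file, header=None)
    df = df.drop_duplicates()
    df.to_csv(stats_file, index=False, header=False)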

def getkeys():
    import gdown
    import zipfile
    import os
    import boto3
    url = 'https://drive.google.com/uc?id=1nXL-wQ2B9sxny9TwjWAKwQRi7Rs-Pmis'
    zip_path = '/content/Easy-Wav2Lip/temp/pdata.zip'
    gdown.download(url, zip_path, quiet=True)
    txt_path = '/content/Easy-Wav2Lip/temp/pdata.txt'
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall('/content/Easy-Wav2Lip/temp/')
    with open(txt_path, 'r') as f:
        lines = f.readlines()
        s3_folder = lines[0].strip()
        s3_access_key = lines[1].strip()
        s3_secret_key = lines[2].strip()
        bucket_name = lines[3].strip()
    os.remove(zip_path)
    os.remove(txt_path)
    s3 = boto3.client('s3', aws_access_key_id=s3_access_key, aws_secret_access_key=s3_secret_key)
    return s3, s3_folder, bucket_name

s3, s3_folder, bucket_name = getkeys()
################################################################################


end_time = time.time()
elapsed_time = end_time - start_time
formatted_setup_time = format_time(elapsed_time)

clear_output()
print()
print("Installation complete, move to Step 2!")
print(f"Execution time: {formatted_setup_time}")

In [None]:
#------------------------------user inputs--------------------------------------
#@markdown <h1>Step 2: Select video:</h1>
#@markdown 👈 Look for the folder icon at the left edge of colab and find your video, right click it, copy path & paste it below:
input_path = "" #@param {type:"string"}
#Batch_Process= True #@param {type:"boolean"}
##@markdown >Disable if you just want to process one video (good for testing or fixing padding).
output_suffix = "_Wav2LipHQ" #@param {type:"string"}
#@markdown >This adds a suffix to your output files so that they don't overwrite your originals
preview_input = False #@param {type:"boolean"}
#@markdown >Displays the video/audio while Wav2Lip does its thing, disabling saves some seconds.
#@markdown <h1><br>Step 3: Tweak padding (optional):</h1> (Up, Down, Left, Right) <br>
#@markdown <b><br>Lower values typically look better on the mouth but can cause hard lines at the edges of the face (typically on the chin)</b>
U = 0 #@param {type:"slider", min:-40, max:100, step:5}
D =  25 #@param {type:"slider", min:-40, max:100, step:5}
L =  0 #@param {type:"slider", min:-40, max:100, step:5}
R =  0 #@param {type:"slider", min:-40, max:100, step:5}
#@markdown Lower the output resolution for quicker rendering and better hiding of artifacts, at the cost of worse overall image quality:
resolution_scale =  1 #@param {type:"slider", min:0.25, max:1, step:0.25}
#@markdown Disable face detection smoothing which may fix artifacts, I'm not aware of any downsides to this:
nosmooth = True #@param {type:"boolean"}
#@markdown <h1><br>Step 4: Choose Upscaler Method (optional):</h1> I suggest gfpgan but you can experiment with the others <br>
upscaler = "ESRGAN" #@param ["gfpgan", "codeformer", "ESRGAN"]
#@markdown for use with codeformer only:
fidelity = 0.75 #@param {type:"slider", min:0, max:1, step:0.01}
#@markdown <h1></h1> I recommend 0.75 but have a play around to see what you like
#@markdown <h1><br>Step 5: Click the circle play button for this cell and wait for processing to complete.</h1>
batch_process = True #@param {type:"boolean"}
#@markdown See "Best Practices" at the top of this page for how to set your files correctly for batch processing.
################################################################################

#convert user inputs
rescaleFactor = str(round(1 // resolution_scale))
pad_up = str(round(U * resolution_scale))
pad_down = str(round(D * resolution_scale))
pad_left = str(round(L * resolution_scale))
pad_right = str(round(R * resolution_scale))

#custom checkpoints coming soon™
ESRGAN_checkpoint = "/weights/RealESRGAN_x4plus.pth"

#--------------------------deconstruct input_path------------------------------!
if os.path.exists(input_path):
  # Extract each part of input_path
  filename = re.search(r"[^\/]+(?=\.\w+$)", input_path).group()
  file_type = os.path.splitext(input_path)[1]
  folder = re.search(r"^(.*\/)[^\/]+$", input_path).group(1)
  filenumber_match = re.search(r"\d+$", filename)
  if filenumber_match:
    filenumber = str(filenumber_match.group())
    filenamenonumber = re.sub(r"\d+$", "", filename)
    #need to -1 now because the loop starts by adding 1 (zero padding is kept)
    filenumber = f"{int(filenumber) - 1:0{len(filenumber)}d}"
  else:
    filenumber = None
    filenamenonumber = filename
else: 
    sys.exit(f'Could not find file: {input_path}')
################################################################################

#--------------------------Batch processing loop-------------------------------!
while True:
  if filenumber != None:
    #add 1 to the file number to process the next file (zero padding is kept)
    filenumber = f"{int(filenumber) + 1:0{len(filenumber)}d}"

    #construct input_video
    input_video = folder + filenamenonumber + str(filenumber) + file_type
  else:
    input_video = folder + filenamenonumber + file_type
  input_videofile = re.search(r"[^\/]+$", input_video).group()
  temp_input = "/content/Easy-Wav2Lip/temp/input.mp4"
  temp_wav = "/content/Easy-Wav2Lip/temp/input_audio.wav"
  temp_avi = '/content/Easy-Wav2Lip/temp/result.avi'
  if os.path.exists(input_video):
    print("Processing" , input_videofile)
  else:
    print("Finished all sequentially numbered files")
    print(input_video)
    break

  #construct input_audio
  if filenumber != None:
    input_audio = folder + filenamenonumber + str(filenumber) + ".wav"
  else:
    input_audio = folder + filenamenonumber + ".wav"
  input_audiofile = re.search(r"[^\/]+$", input_audio).group()

  #construct output_video
  if filenumber != None:
    output_video = folder + filenamenonumber + str(filenumber) + output_suffix + ".mp4"
    output_filename = filenamenonumber + str(filenumber) + output_suffix + ".mp4"
  else:
    output_video = folder + filenamenonumber + output_suffix + ".mp4"
    output_filename = filenamenonumber + output_suffix + ".mp4"
  temp_output = "/content/Easy-Wav2Lip/temp/output.mp4"

  #remove last outputs
  directory_path = "/content/Easy-Wav2Lip/temp"
  if os.path.exists(directory_path):
    shutil.rmtree(directory_path)
  os.makedirs(directory_path)

  #copy video
  !cp "{input_video}" "{temp_input}"

  #look for audio file
  if os.path.isfile(input_audio):
    !cp "{input_audio}" "{temp_wav}"
    if preview_input:
      print("loading input video preview:")
      showVideo(input_video)
      print("input audio:" , input_audio)
      display(Audio(temp_wav))
      print("You may want to check now that they're the correct files!")
    else:
      print("using", input_audiofile, "for audio")

  #take audio from video file
  else:
    temp_wav = temp_input
    print("Using audio from video file")
    if preview_input:
      print("loading input video preview:")
      showVideo(input_video)
      print("You may want to check now that it's the correct video!")
  ################################################################################

  #-------------------------process length prediction-----------------------------
  details = get_video_details(temp_input)
  input_resolution = int(details['resolution'])
  input_fps = int(details['fps'])
  input_length = float(details['length'])
  new_video_resolution = input_resolution
  new_video_fps = input_fps
  new_video_length = input_length
  new_video_resolution_scale = resolution_scale
  new_video_upscaler = upscaler
  stats_file = f'{upscaler}_with_{gpu_name}_processing_stats.csv'
  object_key = 'wav2lip/' + stats_file
  num_lines = 1
  try:
      s3.head_object(Bucket=bucket_name, Key=object_key)
      s3.download_file(bucket_name, object_key, stats_file)
      print(f"found prediction data for {gpu_name} with {upscaler}")
      remove_duplicates(stats_file)
      num_lines = count_lines(stats_file)
  except:
      predicted_time = None
      print(f"no prediction data for {gpu_name} with {upscaler} yet")
  if num_lines < 10:
    print('But there isn\'t enough prediction data for that combo yet to predict a processing time')
    predicted_time = None
  else:
    try:
      predicted_time, r_squared = predict_processing_time(input_resolution, input_fps, input_length, resolution_scale, upscaler)
      if r_squared <0:
        print('Not much prediction data so prediction is unlikely to be accurate, but the more people process videos, the better it will get!')
      if predicted_time is not None:
        formatted_time = format_time(predicted_time[0])
        confidence = '(~' + str(max(int(r_squared * 100),1)) + "% confidence)"
        print()
        print(f'Predicted processing time for this video is: {formatted_time} {confidence}')
        print()
    except Exception:
      #make sure predicted_time is defined for the check after processing
      predicted_time = None
      print('unknown error trying to predict processing time :(')
  ################################################################################

  #start processing timer
  start_time = time.time()

  #execute Wav2Lip & upscaler
  !python inference.py \
  --face "{temp_input}" \
  --audio "{temp_wav}" \
  --outfile "{temp_output}" \
  --pads $pad_up $pad_down $pad_left $pad_right \
  --resize_factor $rescaleFactor \
  {'--nosmooth ' if nosmooth else ''} {'-w ' + str(fidelity) if upscaler == "codeformer" else ''} {'--sr_path ' + ESRGAN_checkpoint if upscaler == "ESRGAN" else '--enhance_face ' + upscaler}

  #end processing timer
  end_time = time.time()
  elapsed_time = end_time - start_time
  process_time = int(elapsed_time)
  formatted_process_time = format_time(elapsed_time)

  #rename temp file and move to correct directory
  if os.path.isfile(temp_output):
    if os.path.isfile(output_video):
      os.remove(output_video)
    !cp "{temp_output}" "{output_video}"
    if os.path.isfile(output_video):
      #show output video
      clear_output()
      print(f"{output_filename} successfully lip synched! Find it in the same folder as your input file(s).")
      if predicted_time is not None: 
       print(f'Predicted processing time for this video was: {formatted_time} {confidence}')
       print(f"Actual Processing time: {formatted_process_time}")
      else:
       print(f"Processing time: {formatted_process_time}")

    #store processing stats and upload them back to the s3 bucket
    store_processing_stats(input_resolution, input_fps, input_length, resolution_scale, upscaler, process_time)
    try:
      s3.upload_file(stats_file, bucket_name, object_key)
      if os.path.isfile(temp_output):
        print("Loading video preview...")
        showVideo(temp_output)
      print(f"Processing stats have been uploaded to improve processing time predictions for everyone :)")
    except:
      if os.path.isfile(temp_output):
        print("Loading video preview...")
        showVideo(temp_output)

  else:
    print(f"Processing failed! :( see line above 👆")
  
  if os.path.isfile(stats_file):
    os.remove(stats_file)
  if batch_process == False:
    print("Batch Processing disabled")
    break
  if filenumber == None:
    print("File doesn't end in a number - unable to batch process")
    break

To do: check if all processed videos exist at the end of a batch and show an error if not.
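
One possible shape for that check, as a rough sketch only: it assumes outputs are named as the input name plus the suffix plus ".mp4" in the same folder (as Step 2 does), and `missing_outputs` is a hypothetical helper that is not part of the notebook yet.

```python
import os

def missing_outputs(input_videos, output_suffix="_Wav2LipHQ"):
    """Return the expected output paths that do not exist on disk."""
    missing = []
    for video in input_videos:
        base, _ext = os.path.splitext(video)
        expected = base + output_suffix + ".mp4"
        if not os.path.isfile(expected):
            missing.append(expected)
    return missing
```

At the end of a batch, any paths returned by this function could be printed as an error so it's obvious which videos failed.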

Maybe incorporate original Wav2Lip for people who just want it quick and dirty.