<a href="https://colab.research.google.com/github/nandaluru/LipSyncing/blob/main/LipSyncing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Mount Google Drive**
In this cell, we mount Google Drive so the Colab runtime can read the input files and model weights stored there.


In [38]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
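The message above appears whenever the cell is re-run in a session where Drive is already mounted. A minimal sketch of a guard that skips the mount in that case is shown below; the helper name `drive_is_mounted` is hypothetical, and checking for a `MyDrive` folder is an assumption about how a mounted Drive looks, not something the notebook itself does.

```python
import os


def drive_is_mounted(mount_point: str = "/content/drive") -> bool:
    """Return True if the mount point already contains a MyDrive folder."""
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


if not drive_is_mounted():
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
        drive.mount("/content/drive")
    except ImportError:
        print("Not running in Colab; skipping the Drive mount.")
```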


# **Clone Wav2Lip Repository from GitHub**

In this cell, we will clone the Wav2Lip repository from GitHub. This repository contains the source code and related files for the Wav2Lip project.


In [39]:
!git clone https://github.com/Rudrabha/Wav2Lip.git

fatal: destination path 'Wav2Lip' already exists and is not an empty directory.
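The `fatal` message above only means the repository was cloned in an earlier run. As a sketch, the clone can be made safe to re-run by skipping it when the target folder already has content; `clone_if_missing` is a hypothetical helper written for this illustration.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/Rudrabha/Wav2Lip.git"


def clone_if_missing(dest: str = "Wav2Lip", url: str = REPO_URL) -> bool:
    """Clone the repository unless dest already exists and is non-empty.

    Returns True if a clone was run, False if it was skipped.
    """
    path = Path(dest)
    if path.is_dir() and any(path.iterdir()):
        return False
    subprocess.run(["git", "clone", url, dest], check=True)
    return True
```

Calling `clone_if_missing()` from the `/content` working directory would then replace the `!git clone` line.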


# **Setup Google Drive**

1. **Create Folders:**
    - In Google Drive, create a folder named `LipSync` in MyDrive.
    - Create a second folder named `model_weights`, also directly in MyDrive.

2. **Upload Files:**
    - Upload `input-video.mp4` and `input-audio.wav` to `LipSync`.
    - Download pretrained model weights from [Wav2Lip GitHub](https://github.com/Rudrabha/Wav2Lip#getting-the-weights) and upload them to `model_weights`.
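Before continuing, it can help to confirm that the uploads landed where the later cells expect them. The sketch below assumes the three paths used elsewhere in this notebook; `missing_files` is a hypothetical helper, not part of Wav2Lip.

```python
from pathlib import Path

DRIVE_ROOT = Path("/content/drive/MyDrive")

# Relative paths that the later cells read from Drive.
EXPECTED = [
    "LipSync/input-video.mp4",
    "LipSync/input-audio.wav",
    "model_weights/wav2lip_gan.pth",
]


def missing_files(root=DRIVE_ROOT, expected=EXPECTED) -> list:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in expected if not (Path(root) / rel).is_file()]


print("Missing:", missing_files() or "nothing")
```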




# **Contents of /content/drive/MyDrive/LipSync**

In this cell, we will list the contents of the `LipSync` directory in your Google Drive.


In [40]:
!ls /content/drive/MyDrive/LipSync

input-audio.wav  input-video.mp4


# **Copy Pretrained Model to Wav2Lip Checkpoints**

In this cell, we copy the pretrained model (`wav2lip_gan.pth`) from the Google Drive folder to the `checkpoints` directory in the Wav2Lip project.


In [41]:
!cp -ri "/content/drive/MyDrive/model_weights/wav2lip_gan.pth" /content/Wav2Lip/checkpoints/

cp: overwrite '/content/Wav2Lip/checkpoints/wav2lip_gan.pth'? y
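Because `cp -i` waits for a `y` at the prompt, the cell cannot run unattended. A non-interactive alternative could look like the sketch below, which overwrites silently; `copy_checkpoint` is a hypothetical helper, and silent overwriting is a design choice that may not suit every workflow.

```python
import shutil
from pathlib import Path


def copy_checkpoint(src: str, dest_dir: str) -> Path:
    """Copy src into dest_dir, creating the directory and overwriting silently."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the copied file.
    return Path(shutil.copy2(src, dest))
```

For this notebook the call would be `copy_checkpoint("/content/drive/MyDrive/model_weights/wav2lip_gan.pth", "/content/Wav2Lip/checkpoints")`.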


# **Install Wav2Lip Dependencies**

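In this cell, we install the Python packages pinned in Wav2Lip's `requirements.txt`. As the output shows, the install stops because `opencv-python==4.1.0.25` is not available from the package index; the next cell then installs `librosa==0.8.0` directly.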


In [42]:
!cd Wav2Lip && pip install -r requirements.txt

Collecting librosa==0.7.0 (from -r requirements.txt (line 1))
  Using cached librosa-0.7.0.tar.gz (1.6 MB)
  Preparing metadata (setup.py) ... done
Collecting numpy==1.17.1 (from -r requirements.txt (line 2))
  Using cached numpy-1.17.1.zip (6.5 MB)
  Preparing metadata (setup.py) ... done
ERROR: Could not find a version that satisfies the requirement opencv-python==4.1.0.25 (from versions: 3.4.0.14, 3.4.10.37, 3.4.11.39, 3.4.11.41, 3.4.11.43, 3.4.11.45, 3.4.13.47, 3.4.15.55, 3.4.16.57, 3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.40, 4.4.0.42, 4.4.0.44, 4.4.0.46, 4.5.1.48, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.62, 4.5.5.64, 4.6.0.66, 4.7.0.68, 4.7.0.72, 4.8.0.74, 4.8.0.76, 4.8.1.78, 4.9.0.80)
ERROR: No matching distribution found for opencv-python==4.1.0.25

In [43]:
!pip install librosa==0.8.0
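The `requirements.txt` install aborts at the exact pin `opencv-python==4.1.0.25`. One possible workaround, sketched below, is to drop the exact pins for the packages that fail so that pip can pick a version that exists for the current Python. `relax_pins` is a hypothetical helper, the example text only contains the three pins visible in the log, and whether unpinned versions work with Wav2Lip has not been verified here.

```python
import re


def relax_pins(requirements_text: str, packages) -> str:
    """Drop '==version' pins for the named packages; leave other lines as-is."""
    wanted = {name.lower() for name in packages}
    out = []
    for line in requirements_text.splitlines():
        match = re.match(r"^\s*([A-Za-z0-9_.\-]+)\s*==", line)
        if match and match.group(1).lower() in wanted:
            out.append(match.group(1))  # keep only the package name
        else:
            out.append(line)
    return "\n".join(out)


# Only the pins that appear in the pip log; the real file may contain more.
example = "librosa==0.7.0\nnumpy==1.17.1\nopencv-python==4.1.0.25"
print(relax_pins(example, ["opencv-python", "numpy"]))
```

The rewritten text could be saved to a separate file and passed to `pip install -r`, leaving the repository's own `requirements.txt` untouched.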



# **Download Face Detection Model (s3fd.pth)**

In this cell, we download the face detection model (`s3fd.pth`) from [Adrian Bulat's website](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth) and save it in the appropriate directory within the Wav2Lip project.


In [44]:
!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "Wav2Lip/face_detection/detection/sfd/s3fd.pth"

--2023-12-31 14:33:35--  https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth
Resolving www.adrianbulat.com (www.adrianbulat.com)... 45.136.29.207
Connecting to www.adrianbulat.com (www.adrianbulat.com)|45.136.29.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89843225 (86M) [application/octet-stream]
Saving to: ‘Wav2Lip/face_detection/detection/sfd/s3fd.pth’


2023-12-31 14:33:41 (17.1 MB/s) - ‘Wav2Lip/face_detection/detection/sfd/s3fd.pth’ saved [89843225/89843225]
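A truncated download would only surface later as a model-loading error, so a quick size check can be worthwhile. The sketch below compares the file size against the `Length` value reported by wget in the log above; `download_is_complete` is a hypothetical helper, and a size match does not rule out corruption.

```python
import os

EXPECTED_BYTES = 89843225  # "Length" reported by wget in the log


def download_is_complete(path: str, expected_bytes: int = EXPECTED_BYTES) -> bool:
    """Return True if the file exists and has exactly the expected size."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes


print(download_is_complete("Wav2Lip/face_detection/detection/sfd/s3fd.pth"))
```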



# **Copy Input Files to Sample Data**

In this cell, we copy the input video (`input-video.mp4`) and input audio (`input-audio.wav`) from the Google Drive directory to the `sample_data` directory.


In [45]:
!cp "/content/drive/MyDrive/LipSync/input-video.mp4" "/content/drive/MyDrive/LipSync/input-audio.wav" sample_data/
!ls sample_data/

anscombe.json		     california_housing_train.csv  input-video.mp4  mnist_train_small.csv
california_housing_test.csv  input-audio.wav		   mnist_test.csv   README.md


# **Run Wav2Lip Inference**

In this cell, we run the Wav2Lip inference script (`inference.py`) with the checkpoint path, input video, and input audio. This first run keeps the original resolution and was interrupted manually (the `^C` in the output); a later cell retries with a resize factor.



In [46]:
!cd Wav2Lip && python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face "/content/drive/MyDrive/LipSync/input-video.mp4" --audio "/content/sample_data/input-audio.wav"

Using cpu for inference.
Reading video frames...
Number of frames available for inference: 796
(80, 5386)
Length of mel chunks: 1680
  0% 0/14 [00:00<?, ?it/s]
  0% 0/50 [00:00<?, ?it/s]^C
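The first line of the log, `Using cpu for inference.`, suggests the runtime had no GPU attached. In Colab, a GPU can usually be requested under Runtime > Change runtime type. The sketch below checks what PyTorch can see before starting a long run; `inference_device` is a hypothetical helper, not part of Wav2Lip.

```python
def inference_device() -> str:
    """Return 'cuda' if PyTorch can see a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Inference would run on:", inference_device())
```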


# **Variations to try**

- Use more padding to include the chin region.

- Use `resize_factor` to reduce the video resolution; there is a chance you will get better results on lower-resolution video, because the model was trained on low-resolution faces.
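When trying several combinations of these options, it may be easier to assemble the command in Python than to edit a long shell line. The sketch below builds the argument list using the same flags that appear in this notebook's cells (`--checkpoint_path`, `--face`, `--audio`, `--resize_factor`, `--pad`); `build_inference_cmd` is a hypothetical helper, and the four-value padding check is an assumption based on the four numbers passed in the notebook.

```python
import shlex


def build_inference_cmd(face, audio, checkpoint="checkpoints/wav2lip_gan.pth",
                        resize_factor=None, pad=None):
    """Assemble the inference.py argument list for one run."""
    cmd = ["python", "inference.py",
           "--checkpoint_path", checkpoint,
           "--face", face,
           "--audio", audio]
    if resize_factor is not None:
        cmd += ["--resize_factor", str(resize_factor)]
    if pad is not None:
        if len(pad) != 4:
            raise ValueError("pad needs four integer values")
        cmd += ["--pad", *map(str, pad)]
    return cmd


cmd = build_inference_cmd("../sample_data/input-video.mp4",
                          "../sample_data/input-audio.wav",
                          resize_factor=3, pad=(2, 5, 0, 0))
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="Wav2Lip", check=True)` to mirror the `cd Wav2Lip && python ...` pattern used in the cells.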

In [47]:
!cd Wav2Lip && python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face "../sample_data/input-video.mp4" --audio "../sample_data/input-audio.wav" --resize_factor 3 --pad 2 5 0 0

Using cpu for inference.
Reading video frames...
Number of frames available for inference: 796
(80, 5386)
Length of mel chunks: 1680
  0% 0/14 [00:00<?, ?it/s]
  0% 0/50 [00:00<?, ?it/s]
  2% 1/50 [00:22<18:09, 22.24s/it]
  4% 2/50 [00:43<17:27, 21.83s/it]
  6% 3/50 [01:06<17:16, 22.06s/it]
  8% 4/50 [01:27<16:40, 21.76s/it]
 10% 5/50 [01:49<16:30, 22.01s/it]
 12% 6/50 [02:11<15:58, 21.77s/it]
 14% 7/50 [02:33<15:49, 22.09s/it]
 16% 8/50 [02:56<15:34, 22.25s/it]
 18% 9/50 [03:20<15:30, 22.69s/it]
 20% 10/50 [03:41<14:51, 22.28s/it]
 22% 11/50 [04:04<14:35, 22.44s/it]
 24% 12/50 [04:25<13:57, 22.04s/it]
 26% 13/50 [04:48<13:41, 22.21s/it]
 28% 14/50 [05:09<13:10, 21.95s/it]
 30% 15/50 [05:31<12:52, 22.06s/it]
 32% 16/50 [05:53<12:26, 21.96s/it]
 34% 17/50 [06:15<12:06, 22.01s/it]
 36% 18/50 [06:37<11:44, 22.01s/it]
 38% 19/50 [06:59<11:19, 21.91s/it]
 40% 20/50 [07:22<11:06, 22.21s/it]
 42% 21/50 [07:43<10:35, 21.90s/it]
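The totals on the two progress bars (14 and 50) are consistent with the 1680 mel chunks being processed in batches of 128 and the 796 frames in batches of 16. Those batch sizes are assumptions that happen to reproduce both totals; they are not stated anywhere in this notebook. Under that reading, the short calculation below also gives a rough duration for the inner bar at the roughly 22 s per iteration seen in the log.

```python
import math


def num_batches(items: int, batch_size: int) -> int:
    """Number of batches needed to cover all items."""
    return math.ceil(items / batch_size)


frames, mel_chunks = 796, 1680      # values printed in the log
frame_batch, mel_batch = 16, 128    # assumed batch sizes (not in the notebook)

inner_total = num_batches(frames, frame_batch)    # 50, matches the inner bar
outer_total = num_batches(mel_chunks, mel_batch)  # 14, matches the outer bar
inner_minutes = round(inner_total * 22 / 60, 1)   # about 18.3 minutes on CPU

print(inner_total, outer_total, inner_minutes)
```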
