<a href="https://colab.research.google.com/github/mucosmo/pythonTutorial/blob/main/Wav2Lip_HQ_pretraining.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Wav2Lip-HQ finetuning

**This notebook doesn't cover inference. For running Wav2Lip-HQ, please refer to [this notebook](https://colab.research.google.com/drive/1bwgV-31JLNFTKCVDnJtTbP4brOUV1xaL?usp=sharing).**

Here we describe how to finetune the Wav2Lip-HQ super-resolution model on your own videos, which is sometimes necessary to obtain high-quality results. You can find more details in [our GitHub repository](https://github.com/Markfryazino/wav2lip-hq).

## First of all, clone the repository and load all required models.

In [None]:
!git clone https://github.com/Markfryazino/wav2lip-hq.git
%cd wav2lip-hq
!pip3 install gdown
!pip3 install -r requirements.txt

!wget "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" -O "face_detection/detection/sfd/s3fd.pth"

In [None]:
import gdown

urls = {
    "wav2lip_gan.pth": "10Iu05Modfti3pDbxCFPnofmfVlbkvrCm",
    "face_segmentation.pth": "154JgKpzCPW82qINcVieuPH3fZ2e0P812",
    "esrgan_yunying.pth": "1aB-jqBikcZPJnFrJXWUEpvF2RFCuerSe",
    "pretrained.state": "1_MGeOLdARWHylC1PCU2p5_FQztD4Bo7B"
}

# Download each checkpoint from Google Drive into checkpoints/.
for name, file_id in urls.items():
    url = f"https://drive.google.com/uc?id={file_id}"
    output = f"checkpoints/{name}"
    gdown.download(url, output, quiet=False)
    print(f"Loaded {name}")

Downloading...
From: https://drive.google.com/uc?id=10Iu05Modfti3pDbxCFPnofmfVlbkvrCm
To: /content/wav2lip-hq/checkpoints/wav2lip_gan.pth
436MB [00:04, 104MB/s] 


Loaded wav2lip_gan.pth


Downloading...
From: https://drive.google.com/uc?id=154JgKpzCPW82qINcVieuPH3fZ2e0P812
To: /content/wav2lip-hq/checkpoints/face_segmentation.pth
53.3MB [00:00, 249MB/s]


Loaded face_segmentation.pth


Downloading...
From: https://drive.google.com/uc?id=1aB-jqBikcZPJnFrJXWUEpvF2RFCuerSe
To: /content/wav2lip-hq/checkpoints/esrgan_yunying.pth
67.0MB [00:00, 93.1MB/s]


Loaded esrgan_yunying.pth


Downloading...
From: https://drive.google.com/uc?id=1_MGeOLdARWHylC1PCU2p5_FQztD4Bo7B
To: /content/wav2lip-hq/checkpoints/pretrained.state
311MB [00:04, 72.1MB/s]


Loaded pretrained.state
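
Google Drive downloads can occasionally fail or leave an empty file behind, so it may be worth checking the result before moving on. The sketch below is not part of the original notebook: `missing_checkpoints` is a hypothetical helper that reports which of the four expected files are absent or empty in `checkpoints/`.

```python
import os

# File names taken from the download cell above.
EXPECTED = [
    "wav2lip_gan.pth",
    "face_segmentation.pth",
    "esrgan_yunying.pth",
    "pretrained.state",
]


def missing_checkpoints(directory, names):
    """Return the names that are absent from `directory` or are empty files."""
    missing = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


missing = missing_checkpoints("checkpoints", EXPECTED)
print("Missing or empty checkpoints:", missing if missing else "none")
```

If anything is reported as missing, rerunning the download cell for that entry is usually the simplest fix.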


## Now upload target video.

You can upload the video through the Google Colab interface or download it from Google Drive, which is often faster.

In [None]:
# If you load files from Drive, run this cell

# Paste the filename and Google Drive ID of your video below.
urls = {
    "yunying_30s.mp4": "1dggydm07RHrxiFUIH_51RXmkMcD_bMPE",
}

for name, file_id in urls.items():
    url = f"https://drive.google.com/uc?id={file_id}"
    output = f"videos/{name}"
    gdown.download(url, output, quiet=False)
    print(f"Loaded {name}")

Downloading...
From: https://drive.google.com/uc?id=1dggydm07RHrxiFUIH_51RXmkMcD_bMPE
To: /content/wav2lip-hq/videos/yunying_30s.mp4
12.0MB [00:00, 70.5MB/s]


Loaded yunying_30s.mp4
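
The cell above expects a bare Google Drive file ID, while Drive's sharing dialog gives you a full link. As a convenience, here is a minimal sketch of how the ID could be pulled out of such a link. `drive_file_id` is a hypothetical helper, and it only handles the two common link shapes shown in its comments.

```python
import re


def drive_file_id(link):
    """Extract the file ID from a Google Drive share link (best effort)."""
    # Links of the form https://drive.google.com/file/d/<ID>/view?...
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", link)
    if match:
        return match.group(1)
    # Links of the form https://drive.google.com/uc?id=<ID>
    match = re.search(r"[?&]id=([A-Za-z0-9_-]+)", link)
    if match:
        return match.group(1)
    raise ValueError(f"Could not find a file ID in: {link}")


link = "https://drive.google.com/file/d/1dggydm07RHrxiFUIH_51RXmkMcD_bMPE/view?usp=sharing"
print(drive_file_id(link))
```

The returned string can be pasted as the value in the `urls` dictionary next to your video's filename.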


## Run Wav2Lip with frame saving.

Set both the `--face` and `--audio` arguments to the path of your target video.

In [None]:
# -p creates missing parent directories and does not fail if they already exist.
!mkdir -p data/gt data/lq data/hq


In [None]:
!python3 inference.py \
        --checkpoint_path "checkpoints/wav2lip_gan.pth" \
        --segmentation_path "checkpoints/face_segmentation.pth" \
        --sr_path "checkpoints/esrgan_yunying.pth" \
        --face "videos/yunying_30s.mp4" \
        --audio "videos/yunying_30s.mp4" \
        --save_frames \
        --gt_path "data/gt" \
        --pred_path "data/lq" \
        --no_sr \
        --no_segmentation

The snippet below resizes the ground-truth frames from `data/gt` to 384 $\times$ 384 resolution and saves them to `data/hq`.

In [None]:
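import os

import cv2
from tqdm import tqdm

# Resize every ground-truth frame to 384x384 and save it under data/hq.
for img_name in tqdm(sorted(os.listdir("data/gt"))):
    img = cv2.imread(os.path.join("data/gt", img_name))
    if img is None:  # skip files that OpenCV cannot read as images
        continue
    img = cv2.resize(img, (384, 384))
    cv2.imwrite(os.path.join("data/hq", img_name), img)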
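
Before training, it can help to confirm that the low-quality and high-quality folders line up. The sketch below assumes that training pairs frames in `data/lq` and `data/hq` by file name, which this notebook does not state explicitly. `unpaired_frames` is a hypothetical helper that lists files present in only one of the two folders.

```python
import os


def unpaired_frames(lq_dir, hq_dir):
    """Return (only_in_lq, only_in_hq) as sorted lists of file names."""
    lq = set(os.listdir(lq_dir))
    hq = set(os.listdir(hq_dir))
    return sorted(lq - hq), sorted(hq - lq)


# Only run the check when both folders exist (i.e. after the cells above).
if os.path.isdir("data/lq") and os.path.isdir("data/hq"):
    only_lq, only_hq = unpaired_frames("data/lq", "data/hq")
    print(f"{len(only_lq)} frames only in data/lq, {len(only_hq)} only in data/hq")
```

Two zeros would suggest that every predicted frame has a resized ground-truth counterpart.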

## Finetune ESRGAN

In general, the longer you train the model, the better. However, a couple of hours is usually enough, so feel free to stop execution if you want to.

Once the cell below has run, the finetuned generator will be stored as `experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/models/net_g_latest.pth`. Download it and pass it as the `--sr_path` argument to `inference.py`.

In [None]:
# Each `!` line runs in its own shell, so the variables must be set on the same command.
!PYTHONPATH="./:${PYTHONPATH}" \
        CUDA_VISIBLE_DEVICES=0 \
        python3 basicsr/train.py -opt train_basicsr.yml
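
If you interrupt training early, you may want to check which generator checkpoint is actually on disk. The following is a minimal sketch rather than part of the original workflow: `newest_generator` is a hypothetical helper that prefers `net_g_latest.pth` and otherwise falls back to the most recently modified `net_g_*.pth` file. The fallback naming pattern is an assumption about how intermediate checkpoints are saved.

```python
import glob
import os

# Directory named in the text above.
MODELS_DIR = "experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/models"


def newest_generator(models_dir):
    """Return the generator checkpoint to use, or None if there is none yet."""
    latest = os.path.join(models_dir, "net_g_latest.pth")
    if os.path.isfile(latest):
        return latest
    # Assumed naming pattern for intermediate checkpoints: net_g_<iteration>.pth
    candidates = glob.glob(os.path.join(models_dir, "net_g_*.pth"))
    return max(candidates, key=os.path.getmtime) if candidates else None


print("Generator checkpoint:", newest_generator(MODELS_DIR))
```

The printed path is the file to download and pass as `--sr_path`. In Colab, `files.download` from `google.colab` is one way to fetch it.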