# Wav2Lip: Accurately Lip-syncing Videos and OpenVINO

Lip-sync technologies are widely used in digital human applications, where they improve the user experience in dialog scenarios.

[Wav2Lip](https://github.com/Rudrabha/Wav2Lip) is a novel approach for generating accurate 2D lip-synced videos in the wild from only one video and one audio clip. Wav2Lip relies on an accurate lip-sync "expert" model and consecutive face frames to generate accurate, natural lip motion.

In this notebook, we show how to enable and optimize the Wav2Lip pipeline with OpenVINO. It is an adaptation of the blog article [Enable 2D Lip Sync Wav2Lip Pipeline with OpenVINO Runtime](https://blog.openvino.ai/blog-posts/enable-2d-lip-sync-wav2lip-pipeline-with-openvino-runtime).

Here is an overview of the Wav2Lip pipeline:

![wav2lip_pipeline](https://cdn.prod.website-files.com/62c72c77b482b372ac273024/669487bc70c2767fbb9b6c8e_wav2lip_pipeline.png)
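
To make the pairing of audio and video in the pipeline more concrete, the sketch below shows one way each video frame can be matched with a fixed-size window of mel-spectrogram frames. The function name `mel_windows`, the window size of 16, and the rate of 80 mel frames per second are assumptions for illustration, not values confirmed by this notebook. The helper scripts used later may differ.

```python
def mel_windows(num_mel_frames, fps, window=16, mel_rate=80.0):
    """Return one (start, end) mel-frame window per video frame.

    `window` and `mel_rate` are assumed defaults, not taken from this notebook.
    """
    if num_mel_frames < window:
        raise ValueError("audio is shorter than a single mel window")
    step = mel_rate / fps  # mel frames that elapse per video frame
    windows = []
    i = 0
    while True:
        start = int(i * step)
        if start + window > num_mel_frames:
            # Not enough audio left: reuse the final full window and stop.
            windows.append((num_mel_frames - window, num_mel_frames))
            break
        windows.append((start, start + window))
        i += 1
    return windows


# Example: 400 mel frames (about 5 s at the assumed rate) and a 25 fps video.
windows = mel_windows(num_mel_frames=400, fps=25)
print(len(windows), windows[:3], windows[-1])
```

Each window overlaps its neighbours, so consecutive video frames see slightly shifted slices of the same audio context.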


#### Table of contents:

- [Prerequisites](#Prerequisites)
- [Convert the model to OpenVINO IR](#Convert-the-model-to-OpenVINO-IR)
- [Compiling models and prepare pipeline](#Compiling-models-and-prepare-pipeline)
- [Interactive inference](#Interactive-inference)

### Installation Instructions

This is a self-contained example that relies solely on its own code.

We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).


## Prerequisites
[back to top ⬆️](#Table-of-contents:)

In [None]:
%pip install -q "openvino>=2024.3.0"
%pip install -q huggingface_hub "torch>=2.1" --extra-index-url https://download.pytorch.org/whl/cpu
%pip install -q "librosa==0.9.2" opencv-contrib-python opencv-python tqdm numba

In [None]:
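import requests

# Fetch the shared helper module used across OpenVINO notebooks.
r = requests.get(
    url="https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py",
)
r.raise_for_status()  # fail early if the download did not succeed

with open("notebook_utils.py", "w") as f:
    f.write(r.text)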

In [None]:
import sys
from pathlib import Path


wav2lip_path = Path("Wav2Lip")

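if not wav2lip_path.exists():
    # git clone creates the target directory itself; creating it up front
    # would leave an empty folder behind (and skip re-cloning) if the clone fails.
    !git clone https://github.com/Rudrabha/Wav2Lip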

sys.path.append(str(wav2lip_path))

Download example files.

In [None]:
from notebook_utils import download_file


download_file("https://github.com/sammysun0711/openvino_aigc_samples/blob/main/Wav2Lip/data_audio_sun_5s.wav?raw=true")
download_file("https://github.com/sammysun0711/openvino_aigc_samples/blob/main/Wav2Lip/data_video_sun_5s.mp4?raw=true")

## Convert the model to OpenVINO IR
[back to top ⬆️](#Table-of-contents:)

You don't need to download checkpoints and load models manually; just call the helper function `download_and_convert_models`. It handles both steps and converts both models to the OpenVINO format.

In [None]:
from ov_wav2lip_helper import download_and_convert_models


OV_FACE_DETECTION_MODEL_PATH = Path("models/face_detection.xml")
OV_WAV2LIP_MODEL_PATH = Path("models/wav2lip.xml")

download_and_convert_models(OV_FACE_DETECTION_MODEL_PATH, OV_WAV2LIP_MODEL_PATH)
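
OpenVINO IR is stored as an `.xml` topology file plus a `.bin` weights file. A quick check after conversion can confirm that both parts are on disk. The function `ir_is_complete` below is a hypothetical helper, and the sketch assumes the `.bin` file is written next to the `.xml` file with the same base name, which this notebook does not state explicitly.

```python
from pathlib import Path


def ir_is_complete(xml_path):
    """Return True if both the .xml file and the matching .bin file exist."""
    xml_path = Path(xml_path)
    return xml_path.is_file() and xml_path.with_suffix(".bin").is_file()


# Same paths as in the cell above; only reads the filesystem.
for model_xml in (Path("models/face_detection.xml"), Path("models/wav2lip.xml")):
    status = "ready" if ir_is_complete(model_xml) else "missing or incomplete"
    print(f"{model_xml}: {status}")
```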

## Compiling models and prepare pipeline
[back to top ⬆️](#Table-of-contents:)

Select a device from the dropdown list for running inference with OpenVINO.

In [None]:
from notebook_utils import device_widget

device = device_widget()

device
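
When the notebook runs without widgets (for example, as a script), the device choice can be resolved in plain Python instead. The sketch below uses a hypothetical helper `resolve_device` that falls back to `"CPU"` when the requested device is unavailable. The device list is hard-coded for illustration; on a real machine it could be taken from `openvino.Core().available_devices`.

```python
def resolve_device(requested, available, fallback="CPU"):
    """Return `requested` if it can be used, otherwise the fallback device."""
    # "AUTO" lets OpenVINO choose a device itself, so it is always accepted here.
    if requested == "AUTO" or requested in available:
        return requested
    return fallback


# Hypothetical device list; replace with openvino.Core().available_devices.
available = ["CPU", "GPU"]
print(resolve_device("GPU", available))
print(resolve_device("NPU", available))
```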

`ov_inference.py` is an adaptation of the original pipeline, which has only a command-line interface. `ov_inference` allows running inference through the Python API with the converted OpenVINO models.
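
Before calling the helper, it can be useful to confirm that the input files exist and that the output directory is present. The function `prepare_run` below is a hypothetical pre-flight check. It assumes `ov_inference` may not create the output directory itself, which this notebook does not confirm either way.

```python
from pathlib import Path


def prepare_run(video, audio, outfile):
    """Check that both inputs exist and create the output directory if needed."""
    for path in (Path(video), Path(audio)):
        if not path.is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    out_path = Path(outfile)
    # Creating the parent directory is harmless if it already exists.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    return out_path
```

Calling `prepare_run("data_video_sun_5s.mp4", "data_audio_sun_5s.wav", "results/result_voice.mp4")` before the inference cell would surface a missing download early rather than partway through the pipeline.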

In [None]:
from ov_inference import ov_inference


ov_inference(
    "data_video_sun_5s.mp4",
    "data_audio_sun_5s.wav",
    face_detection_path=OV_FACE_DETECTION_MODEL_PATH,
    wav2lip_path=OV_WAV2LIP_MODEL_PATH,
    inference_device=device.value,
    outfile="results/result_voice.mp4",
)

The following cells let you compare the original video with the video generated by the Wav2Lip pipeline:

In [None]:
from IPython.display import Video

Video("data_video_sun_5s.mp4")

In [None]:
Video("results/result_voice.mp4")