# **Assignment-1 (Lip Sync using Wav2Lip)**
Important: I recommend running the cells in this notebook sequentially from the beginning. Later cells depend on the files and variables created by earlier ones, so running them in order avoids errors.
Scroll down to the end of the notebook to see the video output.

Feel free to explore the code and results. If you have any questions or run into any difficulties, please don't hesitate to reach out to me via email at jaisuryar0305@gmail.com.

# **Install required dependencies**

In [None]:
!git clone https://github.com/zabique/Wav2Lip
!wget 'https://iiitaphyd-my.sharepoint.com/personal/radrabha_m_research_iiit_ac_in/_layouts/15/download.aspx?share=EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA' -O '/content/Wav2Lip/checkpoints/wav2lip_gan.pth'
!cd Wav2Lip && pip install -r requirements.txt
!pip install ffmpeg-python
!pip install librosa==0.9.1

from IPython.display import clear_output, HTML
clear_output()
print("\nDone")


Done


In [None]:
import cv2
import numpy as np
import os
import shutil
import gdown
import moviepy.editor as mp

**Get the audio and video**

In [None]:
# Define the URL for the video and audio file on Google Drive
vid_url = 'https://drive.google.com/uc?id=1-DPSujTzsAbYQluvXCm0TLOkzKuPedZC'
aud_url = 'https://drive.google.com/uc?id=1-5F6UQK435Fi9DhDh9IJgFWUrYeWJnrg'

# Output file path for video and audio file
vid_file = '/content/Original_Video.mp4'
aud_file = '/content/Audio.wav'

# Download the files from Google Drive
gdown.download(vid_url, vid_file)
print("Video file downloaded successfully!\n\n")
gdown.download(aud_url, aud_file)
print("Audio file downloaded successfully!\n\n")

Downloading...
From: https://drive.google.com/uc?id=1-DPSujTzsAbYQluvXCm0TLOkzKuPedZC
To: /content/Original_Video.mp4
100%|██████████| 3.37M/3.37M [00:00<00:00, 181MB/s]
Video file downloaded successfully!

Downloading...
From: https://drive.google.com/uc?id=1-5F6UQK435Fi9DhDh9IJgFWUrYeWJnrg
To: /content/Audio.wav
100%|██████████| 808k/808k [00:00<00:00, 118MB/s]
Audio file downloaded successfully!
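
The success messages above are printed unconditionally, so a quick check that each file actually exists and is not empty can catch a failed download early. The helper below is a minimal sketch; `require_file` and `min_bytes` are hypothetical names that are not part of the notebook or of `gdown`.

```python
import os
import tempfile

def require_file(path: str, min_bytes: int = 1) -> int:
    """Return the file size, or raise if the file is missing or too small."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Expected a downloaded file at {path}")
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"{path} is only {size} bytes; the download may have failed")
    return size

# Demonstration on a temporary file standing in for a downloaded asset
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as handle:
    handle.write(b"RIFF0000WAVE")
    demo_path = handle.name
print(require_file(demo_path))
os.remove(demo_path)
```

In the notebook this could be called as `require_file(vid_file)` and `require_file(aud_file)` right after the two downloads.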







# **Video Pre-processing**

In [None]:
# Initialize the face detector from OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Input video file and audio file
video_path = '/content/Original_Video.mp4'
audio_path = '/content/Audio.wav'

# Load the input video
cap = cv2.VideoCapture(video_path)

# Initialize variables to track face detection
start_time = None
face_detected = False
segment_number = 0

# Create a directory to save the segments
output_directory = '/content/Face_segments'
os.makedirs(output_directory, exist_ok=True)

# Open the source clips once and reuse them for every segment
source_video = mp.VideoFileClip(video_path)
source_audio = mp.AudioFileClip(audio_path)

def save_segment(number, start, end, log):
    """Log the timestamps and write one face segment (video plus matching audio)."""
    log.write(f"Segment {number}: Start: {start:.2f}s, End: {end:.2f}s\n")
    source_video.subclip(start, end).write_videofile(
        os.path.join(output_directory, f'{number}_face.mp4'), codec='libx264')
    source_audio.subclip(start, end).write_audiofile(
        os.path.join(output_directory, f'{number}_face.wav'))

# Create the timestamps file; the with-block closes it even if an error occurs
with open('/content/timestamps.txt', 'w') as timestamps_file:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

        if len(faces) > 0:
            if not face_detected:
                # A face has just appeared: remember where the segment starts
                start_time = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000
                face_detected = True
        elif face_detected:
            # The face has just disappeared: close and save the segment
            end_time = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000
            segment_number += 1
            save_segment(segment_number, start_time, end_time, timestamps_file)
            face_detected = False

    # If the video ends while a face is still visible, save that final segment too
    if face_detected:
        segment_number += 1
        save_segment(segment_number, start_time, source_video.duration, timestamps_file)

# Release the video capture object and the source clips
cap.release()
source_video.close()
source_audio.close()

print("Segments with visible faces saved in the 'Face_segments' directory.")

Moviepy - Building video /content/Face_segments/1_face.mp4.
MoviePy - Writing audio in 1_faceTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video /content/Face_segments/1_face.mp4

Moviepy - Done !
Moviepy - video ready /content/Face_segments/1_face.mp4
MoviePy - Writing audio in /content/Face_segments/1_face.wav
MoviePy - Done.

Moviepy - Building video /content/Face_segments/2_face.mp4.
MoviePy - Writing audio in 2_faceTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video /content/Face_segments/2_face.mp4

Moviepy - Done !
Moviepy - video ready /content/Face_segments/2_face.mp4
MoviePy - Writing audio in /content/Face_segments/2_face.wav
MoviePy - Done.

Moviepy - Building video /content/Face_segments/3_face.mp4.
MoviePy - Writing audio in 3_faceTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video /content/Face_segments/3_face.mp4

Moviepy - Done !
Moviepy - video ready /content/Face_segments/3_face.mp4
MoviePy - Writing audio in /content/Face_segments/3_face.wav
MoviePy - Done.
Segments with visible faces saved in the 'Face_segments' directory.
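
The segmentation logic in the cell above can be isolated from OpenCV and MoviePy and checked on its own. The sketch below takes one boolean per frame (face detected or not) and groups consecutive `True` frames into `(start, end)` times. It is an illustration only: `group_face_segments` is a hypothetical helper, and it derives times from the frame index and an assumed constant `fps` rather than from `cv2.CAP_PROP_POS_MSEC`. It also closes a segment that is still open when the last frame is reached.

```python
from typing import List, Tuple

def group_face_segments(flags: List[bool], fps: float) -> List[Tuple[float, float]]:
    """Turn per-frame face flags into (start, end) times in seconds."""
    segments = []
    start = None
    for index, has_face in enumerate(flags):
        if has_face and start is None:
            # A face has just appeared: open a new segment
            start = index / fps
        elif not has_face and start is not None:
            # The face has just disappeared: close the open segment
            segments.append((start, index / fps))
            start = None
    if start is not None:
        # The video ended while a face was still visible
        segments.append((start, len(flags) / fps))
    return segments

flags = [True, True, False, False, True, True]
print(group_face_segments(flags, fps=2.0))  # [(0.0, 1.0), (2.0, 3.0)]
```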


# **Implementing Wav2Lip**

In [None]:
import subprocess

# Directory containing the files
directory_path = "/content/Face_segments"

# Collect (video, audio) pairs that share the same base name
file_pairs = []
for filename in os.listdir(directory_path):
    if filename.endswith(".mp4"):
        mp4_file = os.path.join(directory_path, filename)
        wav_file = os.path.join(directory_path, filename.replace(".mp4", ".wav"))
        if os.path.exists(wav_file):
            file_pairs.append([mp4_file, wav_file])

# Loop through each (video, audio) pair
for video_path, audio_path in file_pairs:
    print(f"Processing {video_path} and {audio_path}")
    # Run Wav2Lip inference from inside the cloned repository
    command = [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
        "--face", video_path,
        "--audio", audio_path,
        "--pads", "0", "-3", "0", "0",
    ]
    result = subprocess.run(command, cwd="Wav2Lip")
    if result.returncode != 0:
        # Do not overwrite the segment with a stale result from an earlier run
        print(f"Wav2Lip failed for {video_path}; keeping the original segment")
        continue

    # Overwrite the original segment with the lip-synced result,
    # then delete the segment audio, which is no longer needed
    result_video_path = 'Wav2Lip/results/result_voice.mp4'
    shutil.copy(result_video_path, video_path)
    os.remove(audio_path)

    print(f"Lipsync available in {video_path}")

Processing /content/Face_segments/3_face.mp4 and /content/Face_segments/3_face.wav
Lipsync available in /content/Face_segments/3_face.mp4
Processing /content/Face_segments/2_face.mp4 and /content/Face_segments/2_face.wav
Lipsync available in /content/Face_segments/2_face.mp4
Processing /content/Face_segments/1_face.mp4 and /content/Face_segments/1_face.wav
Lipsync available in /content/Face_segments/1_face.mp4
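
The inference call can also be wrapped in a small helper that returns an argument list, which avoids shell-quoting problems if a path ever contains spaces. This is a sketch under assumptions: `build_wav2lip_command` is a hypothetical name, and it uses only the flags that already appear in the cell above (`--checkpoint_path`, `--face`, `--audio`, `--pads`).

```python
import shlex
from typing import List, Sequence

def build_wav2lip_command(
    face_path: str,
    audio_path: str,
    checkpoint_path: str = "checkpoints/wav2lip_gan.pth",
    pads: Sequence[int] = (0, -3, 0, 0),
) -> List[str]:
    """Return the inference command as an argument list (no shell quoting needed)."""
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint_path,
        "--face", face_path,
        "--audio", audio_path,
        "--pads", *[str(value) for value in pads],
    ]

command = build_wav2lip_command(
    "/content/Face_segments/1_face.mp4",
    "/content/Face_segments/1_face.wav",
)
# Show the command as a single shell-style string for inspection
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, cwd="Wav2Lip")`, whose `returncode` indicates whether inference succeeded.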


**Extract video segments where the face is not visible**

In [None]:
print("Extracting segments of video with no face visible...")
# Path to the original video
original_video_path = '/content/Original_Video.mp4'

# Path to the timestamps file
timestamps_file = '/content/timestamps.txt'

# Load the original video
original_video = mp.VideoFileClip(original_video_path)
duration = original_video.duration

# Create a list to store the video segments
video_segments = []

# Read the timestamps from the file
timestamps = []
with open(timestamps_file, 'r') as file:
    lines = file.readlines()
    for line in lines:
        if "Segment" in line:
            s_time = float(line.split("Start: ")[1].split("s,")[0])
            e_time = float(line.split("End: ")[1].split("s")[0])
            timestamps.append([s_time, e_time])

# Extract and save each gap between consecutive face segments as a separate video
i = 0
while i < len(timestamps) - 1:
    segment = original_video.subclip(timestamps[i][1], timestamps[i+1][0])
    segment_output_path = f'/content/Face_segments/{i+1}_no_face.mp4'
    segment.write_videofile(segment_output_path, codec='libx264')
    i += 1

# Save the tail of the video after the last face segment, if there is one
if timestamps and timestamps[i][1] < duration:
    segment = original_video.subclip(timestamps[i][1], duration)
    segment_output_path = f'/content/Face_segments/{i+1}_no_face.mp4'
    segment.write_videofile(segment_output_path, codec='libx264')

print("Successfully extracted to 'Face_segments' directory")

Extracting segments of video with no face visible...
Moviepy - Building video /content/Face_segments/1_no_face.mp4.
MoviePy - Writing audio in 1_no_faceTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video /content/Face_segments/1_no_face.mp4

Moviepy - Done !
Moviepy - video ready /content/Face_segments/1_no_face.mp4
Moviepy - Building video /content/Face_segments/2_no_face.mp4.
MoviePy - Writing audio in 2_no_faceTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video /content/Face_segments/2_no_face.mp4

Moviepy - Done !
Moviepy - video ready /content/Face_segments/2_no_face.mp4
Moviepy - Building video /content/Face_segments/3_no_face.mp4.
MoviePy - Writing audio in 3_no_faceTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video /content/Face_segments/3_no_face.mp4

Moviepy - Done !
Moviepy - video ready /content/Face_segments/3_no_face.mp4
Successfully extracted to 'Face_segments' directory
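
The timestamp parsing and gap computation can be checked without any video files. The sketch below parses lines in the format written to `timestamps.txt` and returns the intervals that are not covered by a face segment. `parse_timestamp_line` and `no_face_gaps` are hypothetical helpers. Unlike the cell above, this version also reports a gap before the first face segment when one exists; that is an assumption about the desired behaviour, not something the notebook does.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "Segment 0: Start: 0.00s, End: 2.50s"
LINE_PATTERN = re.compile(r"Segment \d+: Start: ([\d.]+)s, End: ([\d.]+)s")

def parse_timestamp_line(line: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds, or None if the line does not match."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

def no_face_gaps(
    face_segments: List[Tuple[float, float]], duration: float
) -> List[Tuple[float, float]]:
    """Return the parts of [0, duration] not covered by any face segment."""
    gaps = []
    cursor = 0.0
    for start, end in sorted(face_segments):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        gaps.append((cursor, duration))
    return gaps

lines = [
    "Segment 0: Start: 0.00s, End: 2.50s\n",
    "Segment 1: Start: 4.00s, End: 6.00s\n",
]
faces = [seg for seg in map(parse_timestamp_line, lines) if seg is not None]
print(no_face_gaps(faces, duration=8.0))  # [(2.5, 4.0), (6.0, 8.0)]
```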


**Merge all video segments, with and without a face, in order**

In [None]:
# Path to the directory containing video files
directory_path = '/content/Face_segments'

# Path to the audio file
audio_path = '/content/Audio.wav'
audio_clip = mp.AudioFileClip(audio_path)

# Get a list of video file paths in the directory
video_files = [file for file in os.listdir(directory_path) if file.endswith('.mp4')]

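# Sort by the numeric prefix, then by name, so that '1_face' < '1_no_face' < '2_face'
# and '10_face' comes after '9_no_face' (a plain alphabetical sort would put it first)
video_files.sort(key=lambda name: (int(name.split('_')[0]), name))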

# Create a list to store VideoFileClip objects
video_clips = []

# Load and append each video to the list
for video_file in video_files:
    video_path = os.path.join(directory_path, video_file)
    video_clip = mp.VideoFileClip(video_path)
    video_clips.append(video_clip)

# Concatenate the video clips to merge them into a single video
merged_video = mp.concatenate_videoclips(video_clips, method="compose")

# Set the duration of the merged video
duration = sum(clip.duration for clip in video_clips)
merged_video = merged_video.set_duration(duration)

# Set the audio to the merged_video
video_with_audio = merged_video.set_audio(audio_clip)

# Write the video to the output file
output_path = '/content/Final_output.mp4'
video_with_audio.write_videofile(output_path, codec='libx264')

**Delete all the resources used for processing**

In [None]:
# Remove all intermediate files and directories, keeping only the original video, the audio, and the output video
os.remove('/content/timestamps.txt')

directory_path = '/content/Face_segments'

try:
    shutil.rmtree(directory_path)
except FileNotFoundError:
    print(f"Directory '{directory_path}' does not exist.")
except Exception as e:
    print(f"An error occurred while deleting the directory: {e}")

# **Final Output**

In [None]:
from base64 import b64encode
# Read the final video; the with-block closes the file afterwards
with open('/content/Final_output.mp4', 'rb') as video_file:
    mp4 = video_file.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f"""
<video width="50%" height="50%" controls>
      <source src="{data_url}" type="video/mp4">
</video>""")
