# DeepFakes 101: Facial Swap (1) - Data Preprocessing

This script generates clean training data for deepfake generation by extracting the faces of the training subjects with the dlib-based face_recognition library. It is based on the implementation by Ovalery16, which is in turn based on the original implementation by ShoanLu.

The main improvements focus on compatibility with the Google Colaboratory environment, so that the notebook runs as a stand-alone module.


This notebook downloads the auxiliary image-processing scripts from Google Drive. The download could be replaced with a simple git clone command.


1. Load dependencies

In [1]:
# Download auxiliary components first


!gdown https://drive.google.com/uc?id=1O0jrWmtAoSN-W8AwmO0GrqhcYPzkZB2Y
!gdown https://drive.google.com/uc?id=1sW6isWgkgiurXtQnYB_iM7OfVlpVlpgV
!gdown https://drive.google.com/uc?id=15ilnghI30jH5IxMj9DGRxlI3yb0JxKQM  
!gdown https://drive.google.com/uc?id=1dLC8I9rElNpexVtIxJ5qETTKR2rAlPfk
!gdown https://drive.google.com/uc?id=111RTFhYZsDEMWDsegpVZKc3KEqsSQ7mM  
!unzip deepfake_aux.zip
!unzip nicolas.zip
!unzip tom.zip
!unzip chris.zip

Downloading...
From: https://drive.google.com/uc?id=1O0jrWmtAoSN-W8AwmO0GrqhcYPzkZB2Y
To: /content/deepfake_aux.zip
100% 906k/906k [00:00<00:00, 60.0MB/s]
Downloading...
From: https://drive.google.com/uc?id=1sW6isWgkgiurXtQnYB_iM7OfVlpVlpgV
To: /content/deepfake.zip
267MB [00:05, 47.1MB/s]
Downloading...
From: https://drive.google.com/uc?id=15ilnghI30jH5IxMj9DGRxlI3yb0JxKQM
To: /content/nicolas.zip
256MB [00:05, 46.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=1dLC8I9rElNpexVtIxJ5qETTKR2rAlPfk
To: /content/tom.zip
320MB [00:05, 63.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=111RTFhYZsDEMWDsegpVZKc3KEqsSQ7mM
To: /content/chris.zip
494MB [00:10, 46.3MB/s]
Archive:  deepfake_aux.zip
   creating: filter/
  inflating: filter/000001.jpg       
  inflating: filter/chrisfilter.jpg  
  inflating: filter/nicfilter.jpg    
  inflating: filter/tomfilter.jpg    
  inflating: filter/tomfilter2.jpg   
   creating: lib_1/
  inflating: lib_1/aligner.py        
  inflating: l
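The `!unzip` calls above are shell commands. Where a pure-Python step is preferred, the standard `zipfile` module can do the same job. The following is a minimal sketch, not part of the original notebook; the helper name `extract_archives` is hypothetical, and it skips archives that are missing instead of stopping.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_names, target_dir="."):
    """Extract each existing zip archive into target_dir and return the names extracted."""
    extracted = []
    for name in archive_names:
        archive = Path(name)
        if not archive.is_file():
            # A missing download is reported, not treated as fatal.
            print("Archive not found, skipping: {}".format(archive))
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(archive.name)
    return extracted
```

In this notebook the call would look like `extract_archives(["deepfake_aux.zip", "nicolas.zip", "tom.zip", "chris.zip"])`.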

In [2]:
#Install dependencies if missing

!pip install face_recognition

!pip install scandir
!pip install h5py
!pip install opencv-python
!pip install scikit-image
!pip install dlib
!pip install tqdm



Collecting face_recognition
  Downloading https://files.pythonhosted.org/packages/3f/ed/ad9a28042f373d4633fc8b49109b623597d6f193d3bbbef7780a5ee8eef2/face_recognition-1.2.3-py2.py3-none-any.whl
Collecting face-recognition-models>=0.3.0 (from face_recognition)
  Downloading https://files.pythonhosted.org/packages/cf/3b/4fd8c534f6c0d1b80ce0973d01331525538045084c73c153ee6df20224cf/face_recognition_models-0.3.0.tar.gz (100.1MB)
     |████████████████████████████████| 100.2MB 1.5MB/s 
Building wheels for collected packages: face-recognition-models
  Building wheel for face-recognition-models (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/d2/99/18/59c6c8f01e39810415c0e63f5bede7d83dfb0ffc039865465f
Successfully built face-recognition-models
Installing collected packages: face-recognition-models, face-recognition
Successfully installed face-recognition-1.2.3 face-recognition-models-0.3.0
Collecting scandir
  Downloading https://files.pythonhosted.org/pac
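The cell comment speaks of installing dependencies "if missing", yet the cell calls pip for every package. One way to see what is actually missing first is to look up each module with `importlib.util.find_spec`. This is a sketch under assumptions: the helper `missing_modules` is hypothetical, and the import names `cv2` and `skimage` are assumed to correspond to the `opencv-python` and `scikit-image` packages.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names, not pip names (assumed mapping: opencv-python -> cv2, scikit-image -> skimage).
required = ["face_recognition", "scandir", "h5py", "cv2", "skimage", "dlib", "tqdm"]
print("Missing modules:", missing_modules(required))
```

Only the modules reported as missing would then need a `!pip install`.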

In [0]:
import os
from pathlib import Path

import cv2
import face_recognition

from lib_1.PluginLoader import PluginLoader
from lib_1.faces_detect import detect_faces
from lib_1.FaceFilter import FaceFilter

# Create the output folder; -p avoids an error if it already exists
!mkdir -p extracted

2. Define directories

In [0]:
# TODO: set the input folder to one of chris, tom, or nicolas
input_directory = "../content/chris/"

output_directory = "../content/extracted/"
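A typo in either path only shows up later, when no images are read or written. A small check can catch it early. This is a sketch, not part of the original notebook; the helper name `prepare_directories` is hypothetical.

```python
from pathlib import Path


def prepare_directories(input_dir, output_dir):
    """Check that the input folder exists and create the output folder if needed."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    if not input_path.is_dir():
        raise FileNotFoundError("Input directory not found: {}".format(input_path))
    # parents=True and exist_ok=True make the call safe to repeat.
    output_path.mkdir(parents=True, exist_ok=True)
    return input_path, output_path
```

It could be called as `prepare_directories(input_directory, output_directory)` right after the two variables are set.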

3. Define extraction functions

In [0]:
def load_filter():
    # TODO: change this path to the reference image of the person you are extracting
    filter_file = '../content/filter/chrisfilter.jpg'
    if os.path.exists(filter_file):
        print('Loading reference image for filtering')
        return FaceFilter(filter_file)
    else:
        print("Filter not detected")
        return None

def get_faces(image):
    faces_count = 0
    filterDeepFake = load_filter()

    for face in detect_faces(image):
        # Skip faces that do not match the reference image (when a filter is loaded)
        if filterDeepFake is not None and not filterDeepFake.check(face):
            print('Skipping not recognized face!')
            continue

        yield faces_count, face
        # Increment so that each accepted face in an image gets its own index
        faces_count += 1
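The internals of `lib_1.FaceFilter` are not shown in this notebook. The extraction log further down prints one number per checked face, and faces scoring roughly 0.65 or more are skipped while those up to about 0.57 are kept, which is consistent with a distance threshold. The sketch below illustrates that idea under stated assumptions: encodings are compared by Euclidean distance, and the threshold is 0.6 (the default tolerance of `face_recognition.compare_faces`). Both function names are hypothetical.

```python
import math


def encoding_distance(reference_encoding, candidate_encoding):
    """Euclidean distance between two face encodings (smaller means more similar)."""
    return math.sqrt(
        sum((r - c) ** 2 for r, c in zip(reference_encoding, candidate_encoding))
    )


def is_same_person(reference_encoding, candidate_encoding, threshold=0.6):
    """Accept the candidate when its distance to the reference is at most the threshold.

    The 0.6 threshold is an assumption, not a value read from lib_1.FaceFilter.
    """
    return encoding_distance(reference_encoding, candidate_encoding) <= threshold
```

A lower threshold makes the filter stricter, at the cost of rejecting more genuine images of the subject.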


In [6]:
os.listdir(input_directory)

['chris_hemsworth_1538559620.jpg',
 '65ca35b3_36ea_4426_bf21_29ca26b9ae8e_getty_1043106482.jpg',
 'entertainment_2015_06_chris_hemsworth_main.jpg',
 'tag_heuer_chris_hemsworth_in_australia.jpg',
 'thor_endgame_beer.jpg',
 '071317_chris_hemsworth_lead.jpg',
 '04_avengers_age_of_ultron.jpg',
 'chris_hemsworth_thor_ragnarok_image (1).jpg',
 'rs_634x1024_180720105902_634_chris_hemsworth_matt_damon.jpg',
 'rs_1080x1080_180306154805_28752625_371116436698863_1911330248528494592_n.jpg',
 'wenn_chrishemsworth_091417_1800x1200_1800x1200.jpg',
 'Thor_and_Odin_in_The_Dark_World.jpg',
 '2.42260696.jpg',
 'anglo_2000x1125_chrishemsworth_thor_e1472477255621_1600x721.jpg',
 'Avengers_Infinity_War_Chris_Hemsworth_Thor_Comforter_1.jpg',
 'Chris_Hemsworth_zagra_Hulka_Hogana_w_biograficznym_filmie_Netflixa_o_wrestlerze_article.jpg',
 'chris_hemsworth_thor_photo_call_hotel_bayrischer_hof_munich_germany_C2JC75.jpg',
 'DxO7Cu8X0AEzX5g.jpg',
 'chris_hemsworth_green_smoothie.jpg',
 'b18e88a1e73f84a8e59fe3ed159
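`os.listdir` returns every entry in the folder, whatever its type. If the folder may contain anything other than images, the listing can be narrowed before any file is read. This is a sketch, not part of the original notebook; the helper `list_image_files` and the extension set are assumptions.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set; adjust as needed


def list_image_files(directory):
    """Return sorted full paths of files in `directory` with an image extension."""
    paths = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        extension = os.path.splitext(name)[1].lower()
        if os.path.isfile(full_path) and extension in IMAGE_EXTENSIONS:
            paths.append(full_path)
    return sorted(paths)
```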

We list the images in the input directory and extract the faces from each of them.

In [7]:
files = os.listdir(input_directory)

from matplotlib import pyplot as plt
from google.colab.patches import cv2_imshow

extractor_name = "Align" # TODO Pass as argument
extractor = PluginLoader.get_extractor(extractor_name)()

"""

#Single Example test

example  ="../content/data/CR_2012.jpg"


image = cv2.imread(example)

for idx, face in get_faces(image):
           resized_image = extractor.extract(image, face, 256)
           output_file = output_directory+"/"+str(Path(example).stem)
           cv2.imwrite(str(output_file) + str(idx) + Path(example).suffix, resized_image)
"""
# Iterating over bare file names is not enough: cv2.imread needs full paths, so build them here.

def find_all_files(directory):
    # Walk the directory tree and yield the full path of every file in it
    for root, dirs, files in os.walk(directory):
        for file in files:
            yield os.path.join(root, file)

folder_img = find_all_files(input_directory)

for filename in folder_img:
    # Handle each image separately so that one bad file does not stop the whole run
    try:
        image = cv2.imread(filename)
        if image is None:
            # cv2.imread returns None instead of raising when a file cannot be decoded
            print('Could not read image: {}'.format(filename))
            continue

        for idx, face in get_faces(image):
            resized_image = extractor.extract(image, face, 256)
            output_file = os.path.join(output_directory, Path(filename).stem)
            cv2.imwrite(str(output_file) + str(idx) + Path(filename).suffix, resized_image)

    except Exception as e:
        print('Failed to extract from image: {}. Reason: {}'.format(filename, e))


Loading Extract from Extract_Align plugin...
Loading reference image for filtering
-----
check
[0.52981545]
Loading reference image for filtering
-----
check
[0.56861599]
Loading reference image for filtering
-----
check
[0.45572015]
check
[0.74860174]
Skipping not recognized face!
check
[0.77143722]
Skipping not recognized face!
check
[0.74811181]
Skipping not recognized face!
check
No faces found in the image!
0.8
Skipping not recognized face!
Loading reference image for filtering
-----
check
[0.49074639]
Loading reference image for filtering
-----
check
[0.51871826]
Loading reference image for filtering
-----
check
[0.43528752]
Loading reference image for filtering
-----
check
[0.52997591]
check
[0.65036366]
Skipping not recognized face!
Loading reference image for filtering
-----
Loading reference image for filtering
-----
Loading reference image for filtering
-----
check
No faces found in the image!
0.8
Skipping not recognized face!
Loading reference image for filtering
-----
chec

In [8]:
# Zip up the results for later use; rename the archive as you see fit. Remember to save it to your own Drive.

!zip -r extracted_chris.zip extracted

  adding: extracted/ (stored 0%)
  adding: extracted/chris_hemsworth_in_singapore_for_hugo_boss0.jpg (deflated 1%)
  adding: extracted/1242911076001_5390045633001_et_socialstudiy_040717_40.jpg (deflated 1%)
  adding: extracted/main_men_in_black_international_m_tessa_thompson_h_chris_hemsworth_suits0.jpg (deflated 1%)
  adding: extracted/2.383694260.jpg (deflated 1%)
  adding: extracted/43d26b6567455e2f5b0b342602b5f6d00.jpg (deflated 1%)
  adding: extracted/actor_chris_hemsworth_meets_fans_on_set_of_thor_ragnarok_in_brisbane_picture_id5980039100.jpg (deflated 1%)
  adding: extracted/kisspng_thor_avengers_age_of_ultron_chris_hemsworth_marve_thor_clipart_free_pictures_5ab10dbbcf44a1.9137888915215528278490.jpg (deflated 1%)
  adding: extracted/thor_hair_color_177279_thor_ragnarok_what_s_with_the_short_hair_on_chris_hemsworth_of_thor_hair_colo0.jpg (deflated 1%)
  adding: extracted/Avengers_Infinity_war_Thor_Chris_Hemsworth_The_Avengers_13030290.jpg (deflated 1%)
  adding: extracted/Thor_Ra
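The same archive can be built from Python with `shutil.make_archive`, which is convenient when the archive name depends on a variable such as the subject being extracted. This is a minimal sketch, not part of the original notebook; the helper name `zip_folder` is hypothetical.

```python
import os
import shutil


def zip_folder(folder, archive_base_name):
    """Create <archive_base_name>.zip containing `folder` and return the archive path."""
    folder = os.path.abspath(folder)
    return shutil.make_archive(
        archive_base_name,
        "zip",
        root_dir=os.path.dirname(folder),   # archive paths are relative to this folder
        base_dir=os.path.basename(folder),  # only this folder is added to the archive
    )
```

For example, `zip_folder("extracted", "extracted_chris")` would write `extracted_chris.zip` with the files stored under `extracted/`. The archive is written to the Colab runtime's own storage, so it still has to be copied to your own Drive to be kept.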