# Face Swapping

This notebook demonstrates how to swap faces between two images using numpy, dlib and OpenCV. (Based on the original article by Matthew Earl, posted on [reddit](https://www.reddit.com/r/programming/comments/3f591x/so_i_wrote_a_script_that_swaps_peoples_faces_in/).)

Note: The sample images are downloaded from wikimedia.org (see the attached license file). The code and notebook are released under the MIT License.

## 1. Detecting facial landmarks using _dlib_

The first step is to extract the facial landmarks of each image: a matrix holding the coordinates of particular facial features such as the eyes, brows, nose and mouth.

Ref.: [One Millisecond Face Alignment with an Ensemble of Regression Trees](http://www.csc.kth.se/~vahidk/papers/KazemiCVPR14.pdf)

Here are the original sample images:
<table><tr><td><img src="./Andrea_V.jpg" alt="sample 1" style="height: 320px;"/></td>
<td><img src="./The_Equestrian_Session.jpg" alt="sample 2" style="height: 320px;"/></td></tr></table>

In [149]:
#==========
# imports
#==========
import cv2
import dlib
import numpy as np
import sys
import os

#=================================
# constants and initial settings
#=================================
sample1 = 'Andrea_V.jpg'
sample2 = 'The_Equestrian_Session.jpg'
predictor_path = 'shape_predictor_68_face_landmarks.dat'
scale_factor = 1
feather_amount = 11

face_points = list(range(17, 68))
mouth_points = list(range(48, 61))
# right_brow_points = list(range(17, 22))
# left_brow_points = list(range(22, 27))
right_brow_points = list(range(18, 22))
left_brow_points = list(range(22, 26))
right_eye_points = list(range(36, 42))
left_eye_points = list(range(42, 48))
nose_points = list(range(27, 35))
jaw_points = list(range(0, 17))

# Points used to line up the images.
align_points = (
    left_brow_points + right_eye_points + left_eye_points + 
    right_brow_points + nose_points + mouth_points
)

# Points from the second image to overlay on the first. 
# The convex hull of each element will be overlaid.
overlay_points = [
    left_eye_points + right_eye_points + left_brow_points + right_brow_points,
    nose_points + mouth_points,
]

# Amount of blur to use during colour correction, as a fraction of the
# pupillary distance.
color_correction_blur_fraction = 0.6

#===============================
# helper classes and functions
#===============================
class TooManyFacesException(Exception):
    pass

class NoFacesException(Exception):
    pass

def read_image(img_path):
    """
    read and resize raw color image to numpy array
    """
    img = cv2.imread(img_path, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (
        img.shape[1] * scale_factor, 
        img.shape[0] * scale_factor
    ))
    return img

def write_image(img_name, img):
    """
    writes image into the generated folder
    """
    generated_pth = 'generated/{}'.format(img_name)
    cv2.imwrite(generated_pth, img)

if not os.path.exists('./generated'):
    os.mkdir('./generated')

# download pretrained dlib predictor dat
import urllib.request
if not os.path.exists(predictor_path):
    urllib.request.urlretrieve (
        'https://github.com/AKSHAYUBHAT/TensorFace/raw/master/openface/models/dlib/{}'.format(
            predictor_path
        ), 
        predictor_path
    )
    
print('> ready!')

> ready!


In [150]:
detector = dlib.get_frontal_face_detector()
# dlib feature extractor with a pre-trained model
predictor = dlib.shape_predictor(predictor_path)

def get_landmarks(img):
    """
    takes an image in the form of a numpy array and returns a 68x2
    matrix; each row holds the x/y-coordinates of a particular
    feature point in the input image.

    The feature extractor (predictor) needs a rough bounding box,
    which is provided by a traditional face detector (detector) that
    returns a list of rectangles, each of which corresponds to a face
    in the image.
    """
    rects = detector(img, 1)
    if len(rects)>1:
        raise TooManyFacesException
    if len(rects)==0:
        raise NoFacesException
    return np.matrix(
        [[p.x, p.y] for p in predictor(img, rects[0]).parts()]
    )

def annotate_landmarks(img, landmarks):
    """
    annotates feature landmarks on the image.
    """
    img = img.copy()
    for idx, point in enumerate(landmarks):
        # cast to plain ints; some OpenCV versions reject numpy integer types here
        pos = (int(point[0, 0]), int(point[0, 1]))
        cv2.putText(img, str(idx), pos,
                    fontFace=cv2.FONT_HERSHEY_SCRIPT_SIMPLEX,
                    fontScale=0.4,
                    color=(0, 0, 255))
        cv2.circle(img, pos, 3, color=(0, 255, 255))
    return img

In [151]:
# generate annotated images for debugging purposes
data = {}
for pth in [sample1, sample2]:
    data[pth] = {}
    data[pth]['raw_image'] = read_image(pth)
    data[pth]['landmarks'] = get_landmarks(data[pth]['raw_image'])
    data[pth]['annotated_image'] = annotate_landmarks(
        data[pth]['raw_image'], 
        data[pth]['landmarks']
    )
    write_image(
        'annotated_{}'.format(pth), 
        data[pth]['annotated_image']
    )
    print('> generated annotated image of \'{}\''.format(pth))

> generated annotated image of 'Andrea_V.jpg'
> generated annotated image of 'The_Equestrian_Session.jpg'


Annotated sample images with landmarks:
<table><tr><td><img src="./generated/annotated_Andrea_V.jpg" alt="annotated sample 1" style="height: 320px;"/></td>
<td><img src="./generated/annotated_The_Equestrian_Session.jpg" alt="annotated sample 2" style="height: 320px;"/></td></tr></table>
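To make the index lists from the constants cell concrete, here is a minimal sketch of how they select rows of a 68x2 landmark array. It uses a synthetic array in place of real dlib output, and `group_center` is a hypothetical helper introduced only for illustration; it is not part of the notebook.

```python
import numpy as np

# Index ranges copied from the constants cell above.
RIGHT_EYE_POINTS = list(range(36, 42))
LEFT_EYE_POINTS = list(range(42, 48))

def group_center(landmarks, indices):
    """Mean (x, y) position of the landmarks selected by `indices`."""
    return np.asarray(landmarks, dtype=np.float64)[indices].mean(axis=0)

# Synthetic stand-in for get_landmarks() output: row i is (2*i, 2*i + 1).
landmarks = np.arange(68 * 2).reshape(68, 2)

left_center = group_center(landmarks, LEFT_EYE_POINTS)
right_center = group_center(landmarks, RIGHT_EYE_POINTS)
# Distance between the two eye centres, used later to size the blur kernel.
eye_distance = np.linalg.norm(left_center - right_center)
print(left_center, right_center, eye_distance)
```

The same fancy-indexing pattern (`landmarks[indices]`) is what the alignment and mask functions rely on when they pick out `align_points` or an `overlay_points` group.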

## 2. Align faces with a Procrustes analysis

Next we rotate, scale and translate the second image so that its facial feature points fit those of the first image as closely as possible.

Ref.: 
* [Ordinary Procrustes Analysis](https://en.wikipedia.org/wiki/Procrustes_analysis#Ordinary_Procrustes_analysis)
* [Singular Value Decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition)
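The steps of an ordinary Procrustes fit can be sketched with plain numpy arrays as follows. This is an illustrative sketch rather than the notebook's own function: `similarity_from_points` is a hypothetical name, it uses `ndarray` and `@` instead of `np.matrix`, and it assumes the two point sets differ only by a proper rotation, a uniform scale and a translation (reflections are not handled).

```python
import numpy as np

def similarity_from_points(src, dst):
    """Return a 3x3 matrix M = [s*R | t] such that s*R*src_i + t ~ dst_i.

    `src` and `dst` are (N, 2) arrays of corresponding points, one per row.
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    # Remove translation by centring both point sets.
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - c_src, dst - c_dst
    # Remove scale by normalising with the standard deviation.
    s_src, s_dst = a.std(), b.std()
    # Rotation from the SVD of the 2x2 correlation matrix.
    U, _, Vt = np.linalg.svd((a / s_src).T @ (b / s_dst))
    R = (U @ Vt).T
    scale = s_dst / s_src
    M = np.eye(3)
    M[:2, :2] = scale * R
    M[:2, 2] = c_dst - scale * (R @ c_src)
    return M

# Build a known transform and check that it is recovered.
theta, s_true, t_true = 0.3, 1.5, np.array([10.0, -4.0])
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
src = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [1.0, 5.0]])
dst = s_true * src @ R_true.T + t_true
M = similarity_from_points(src, dst)
print(np.round(M, 3))
```

With noise-free correspondences like these, the recovered matrix should match the transform used to generate `dst`; with real landmarks the fit is only approximate.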

In [152]:
def transformation_from_points(landmarks, target_landmarks):
    """
    Computes the transformation that maps the target landmarks (p1)
    onto the landmarks (p2). Returns an affine transformation
    [s * R | T] such that:
        sum ||s*R*p1,i + T - p2,i||^2
    is minimized.
    """
    # extract align points and convert matrices into floats 
    p1 = target_landmarks[align_points].astype(np.float64)
    p2 = landmarks[align_points].astype(np.float64)
    # calculate centroids
    c1 = np.mean(p1, axis=0)
    c2 = np.mean(p2, axis=0)
    # subtract centroid from each of the point sets
    p1 -= c1
    p2 -= c2
    # find and use scaling
    s1 = np.std(p1)
    s2 = np.std(p2)
    p1 /= s1
    p2 /= s2
    # calculate rotation using singular value decomposition
    U, S, Vt = np.linalg.svd(p1.T * p2)
    # The R we seek is in fact the transpose of the one given by U * Vt. This
    # is because the above formulation assumes the matrix goes on the right
    # (with row vectors), whereas our solution requires the matrix to be on
    # the left (with column vectors).
    R = (U * Vt).T
    # return affine transformation matrix
    return np.vstack([
        np.hstack((
            (s2 / s1) * R,
            c2.T - (s2 / s1) * R * c1.T
        )),
        np.matrix([0., 0., 1.])
    ])

def warp_image(img, target_img, M):
    """
    use OpenCV's cv2.warpAffine to map the second image 
    onto the first.
    """
    dshape = target_img.shape
    result_img = np.zeros(dshape, dtype=img.dtype)
    cv2.warpAffine(
        img,
        M[:2],
        (dshape[1], dshape[0]),
        dst=result_img,
        borderMode=cv2.BORDER_TRANSPARENT,
        flags=cv2.WARP_INVERSE_MAP
    )
    return result_img

In [153]:
# apply transformation on sample2 image
M = transformation_from_points(    
    data[sample2]['landmarks'],
    data[sample1]['landmarks']
)
warped_img = warp_image(
    data[sample2]['raw_image'],
    data[sample1]['raw_image'], 
    M
)
write_image('warped_{}'.format(sample2), warped_img)
print('> generated warped image of \'{}\''.format(sample2))

> generated warped image of 'The_Equestrian_Session.jpg'


Transformed sample 2 image:
<table><tr><td><img src="./Andrea_V.jpg" alt="sample 1" style="height: 320px;"/></td>
<td><img src="./generated/warped_The_Equestrian_Session.jpg" alt="warped sample 2" style="height: 320px;"/></td></tr></table>
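The `cv2.WARP_INVERSE_MAP` flag means that `M` is read as a map from each output pixel to the source pixel it should be sampled from, which is why the transformation is computed from image 1's landmarks to image 2's. The idea can be sketched in numpy with nearest-neighbour sampling. `warp_inverse_nearest` is a hypothetical helper for illustration only; OpenCV interpolates, so real results differ in detail.

```python
import numpy as np

def warp_inverse_nearest(src, M, out_shape):
    """For every output pixel (x, y), copy the source pixel at M @ (x, y, 1).

    Output pixels whose source position falls outside `src` stay 0.
    """
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # Homogeneous output coordinates, one column per pixel.
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    affine = np.asarray(M, dtype=np.float64)[:2]
    sx, sy = np.rint(affine @ coords).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    flat = np.zeros((out_h * out_w,) + src.shape[2:], dtype=src.dtype)
    flat[valid] = src[sy[valid], sx[valid]]
    return flat.reshape((out_h, out_w) + src.shape[2:])

# A pure translation: output pixel (x, y) reads source pixel (x + 1, y).
src = np.arange(12).reshape(3, 4)
shift = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
out = warp_inverse_nearest(src, shift, (3, 4))
print(out)
```

The content appears shifted one pixel to the left in the output, and the last column has no source pixel, so it stays at zero.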

## 3. Blending face features together

A mask is used to select which parts of image 2 and which parts of image 1 should be shown in the final image.

In [156]:
def draw_convex_hull(img, points, color):
    """
    draw and fill a convex hull.
    """
    points = cv2.convexHull(points)
    cv2.fillConvexPoly(img, points, color=color)
    
def get_face_mask(img, landmarks):
    """
    extracts face mask using the landmark matrix and draws two
    convex polygons in white: one surrounding the eye area,
    and one surrounding the nose and mouth area.
    
    Then it feathers the edge of the mask helping hide any
    remaining discontinuities.
    """
    img = np.zeros(img.shape[:2], dtype=np.float64)
    for grp in overlay_points:
        draw_convex_hull(
            img,
            landmarks[grp],
            color=1
        )
    img = np.array([img, img, img]).transpose((1, 2, 0))
    img = (cv2.GaussianBlur(img, (feather_amount, feather_amount), 0) > 0) * 1.0
    img = cv2.GaussianBlur(img, (feather_amount, feather_amount), 0)
    return img


def get_combined_face_mask(img, landmarks, target_img, target_landmarks, M):
    """
    gets combined face masks of the face parts
    """
    # generate face masks for both images.
    mask_img1 = get_face_mask(target_img, target_landmarks)
    mask_img2 = get_face_mask(img, landmarks)
    # transform the mask of image 2 into image 1's coordinate space
    warped_mask_img2 = warp_image(mask_img2, target_img, M)
    # combine the masks by taking the element-wise maximum; this ensures
    # that the features of image 1 are covered up and the features of
    # image 2 can show through
    combined_masks = np.max([mask_img1, warped_mask_img2], axis=0)
    return combined_masks


def apply_face_mask(img, landmarks, target_img, target_landmarks, M):
    """
    extracts and combines the face parts
    """
    combined_masks = get_combined_face_mask(img, landmarks, target_img, target_landmarks, M)
    warped_img2 = warp_image(img, target_img, M)
    # apply combined face mask to the target image
    combined_img = target_img * (1.0 - combined_masks) + warped_img2 * combined_masks
    return combined_img


In [157]:
# transform and merge the face parts
combined_masks = get_combined_face_mask(
    data[sample2]['raw_image'],
    data[sample2]['landmarks'],
    data[sample1]['raw_image'],
    data[sample1]['landmarks'],
    M
)
write_image(
    'masked_{}'.format(sample1), 
    data[sample1]['raw_image'] * (1.0 - combined_masks)
)
result_img = apply_face_mask(
    data[sample2]['raw_image'],
    data[sample2]['landmarks'],
    data[sample1]['raw_image'],
    data[sample1]['landmarks'],
    M
)
write_image('combined_{}'.format(sample1), result_img)
print('> generated combined image of \'{}\' and \'{}\''.format(
    sample1, 
    sample2
))

> generated combined image of 'Andrea_V.jpg' and 'The_Equestrian_Session.jpg'


Face parts merged together:
<table><tr><td><img src="./generated/masked_Andrea_V.jpg" alt="masked sample 1" style="height: 320px;"/></td>
<td><img src="./generated/combined_Andrea_V.jpg" alt="merged face sample 1" style="height: 320px;"/></td></tr></table>
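The mask arithmetic used in this step is small enough to check by hand. The sketch below uses tiny 2x2 arrays as stand-ins for real masks and images, and `blend` is a hypothetical helper that mirrors the expression in `apply_face_mask`.

```python
import numpy as np

def blend(target, overlay, mask):
    """Per-pixel mix: mask 0 keeps the target, mask 1 shows the overlay."""
    return target * (1.0 - mask) + overlay * mask

# Two feathered masks with values in [0, 1].
mask_a = np.array([[0.0, 0.5],
                   [0.0, 0.0]])
mask_b = np.array([[0.0, 0.0],
                   [1.0, 0.25]])
# The element-wise maximum keeps a pixel masked if either mask covers it.
combined = np.maximum(mask_a, mask_b)

target = np.full((2, 2), 100.0)   # stand-in for image 1
overlay = np.full((2, 2), 200.0)  # stand-in for the warped image 2
result = blend(target, overlay, combined)
print(combined)
print(result)
```

Intermediate mask values, which come from the feathering blur, produce a weighted mix of both images and so soften the seam.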

## 4. Color Correction

In the last step we correct the differences in color tone and brightness between the two images, which cause discontinuities around the edges of the overlaid region.

Ref.: [RGB scaling and color balance](https://en.wikipedia.org/wiki/Color_balance#Scaling_monitor_R.2C_G.2C_and_B)
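The correction scales each pixel of image 2 by the ratio of the two blurred images, with the blur kernel sized as a fraction of the pupillary distance. A minimal sketch of those two pieces follows. `odd_kernel_size` and `scale_colors` are hypothetical helpers, and the "blurred" inputs are hand-made constant arrays standing in for `cv2.GaussianBlur` output.

```python
import numpy as np

def odd_kernel_size(left_eye_center, right_eye_center, fraction=0.6):
    """Kernel size as a fraction of the eye distance, forced to be odd."""
    left = np.asarray(left_eye_center, dtype=np.float64)
    right = np.asarray(right_eye_center, dtype=np.float64)
    size = int(fraction * np.linalg.norm(left - right))
    return size + 1 if size % 2 == 0 else size

def scale_colors(img, img_blur, target_blur):
    """Multiply `img` by target_blur / img_blur, guarding near-zero divisors."""
    img_blur = img_blur.astype(np.float64)
    # Same guard as the notebook: add 128 wherever the blurred value is <= 1.
    img_blur = np.where(img_blur <= 1.0, img_blur + 128.0, img_blur)
    return img.astype(np.float64) * target_blur.astype(np.float64) / img_blur

# Image 2 is locally twice as bright as image 1, so it is scaled by 0.5.
img = np.full((2, 2, 3), 100, dtype=np.uint8)
img_blur = np.full((2, 2, 3), 100, dtype=np.uint8)
target_blur = np.full((2, 2, 3), 50, dtype=np.uint8)
corrected = scale_colors(img, img_blur, target_blur)
print(odd_kernel_size((0, 0), (30, 40), fraction=0.5), corrected[0, 0])
```

The kernel must be odd because `cv2.GaussianBlur` requires odd kernel dimensions. The corrected values are floats and are not clipped to the 0 to 255 range here.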

In [158]:
def correct_colors(img, target_img, target_landmarks):
    """
    crude solution to change the image color to match the color
    of the target image by using an appropriate size gaussian kernel.
    """
    # kernel of 0.6 * pupillary distance
    blur_amount = (
        color_correction_blur_fraction * 
        np.linalg.norm(
           np.mean(target_landmarks[left_eye_points], axis=0) - 
           np.mean(target_landmarks[right_eye_points], axis=0)
        )
    )
    blur_amount = int(blur_amount)
    if blur_amount%2 == 0:
        blur_amount += 1
    img1_blur = cv2.GaussianBlur(target_img, (blur_amount, blur_amount), 0)    
    img2_blur = cv2.GaussianBlur(img, (blur_amount, blur_amount), 0)
    img2_blur += (128 * (img2_blur <= 1.0)).astype(img2_blur.dtype)

    return (
        img.astype(np.float64) * 
        img1_blur.astype(np.float64) /
        img2_blur.astype(np.float64)
    )

def swap_face(img, landmarks, target_img, target_landmarks, M):
    """
    extracts and combines the face parts using color correction
    """
    combined_mask = get_combined_face_mask(img, landmarks, target_img, target_landmarks, M)
    warped_img = warp_image(img, target_img, M)
    write_image('debug1_{}'.format(sample1), warped_img)
    corrected_img = correct_colors(warped_img, target_img, target_landmarks)
    write_image('debug2_{}'.format(sample1), corrected_img)
    write_image('debug3_{}'.format(sample1), target_img)
    write_image('debug4_{}'.format(sample1), corrected_img * combined_mask)
    # apply combined face mask to the target image
    combined_img = target_img * (1.0 - combined_mask) + corrected_img * combined_mask
    return combined_img


In [159]:
# apply the final swap face method
result_img = swap_face(
    data[sample2]['raw_image'],
    data[sample2]['landmarks'],
    data[sample1]['raw_image'],
    data[sample1]['landmarks'],
    M
)
write_image('final_{}'.format(sample1), result_img)
print('> generated face swapped image of \'{}\' and \'{}\''.format(
    sample1, 
    sample2
))

> generated face swapped image of 'Andrea_V.jpg' and 'The_Equestrian_Session.jpg'


Final output:
<table><tr><td><img src="./Andrea_V.jpg" alt="original sample 1" style="height: 320px;"/></td>
<td><img src="./generated/final_Andrea_V.jpg" alt="final face sample 1" style="height: 320px;"/></td><td><img src="./The_Equestrian_Session.jpg" alt="original sample 2" style="height: 320px;"/></td></tr></table>