# Computer Vision 

# Exercise 6: Face Swapping

- TU Chemnitz
    - Fak. für Informatik
        - Professur Künstliche Intelligenz
            - Lehre
                - Bildverstehen
     
Contact:
* julien dot vitay at informatik dot tu-chemnitz dot de
* abbas dot al-ali at informatik dot tu-chemnitz dot de

Course web page:
[https://www.tu-chemnitz.de/informatik/KI/edu/biver/](https://www.tu-chemnitz.de/informatik/KI/edu/biver/)

## Face Swapping

<img src="img/switchingeds.jpg" alt="img/switchingeds.jpg" width="600"/>

* The code for the exercise is adapted from the blog post of Matthew Earl:

<http://matthewearl.github.io/2015/07/28/switching-eds-with-python> 

* Matthew Earl even provides the complete Python code for it:

<https://github.com/matthewearl/faceswap/blob/master/faceswap.py>

* See the copyright notice at the end of this notebook.

* The goal of this exercise is to reimplement it step by step to better understand it. Don't hesitate to read the blog post for additional information.

<table> 
  <tr>
    <td>
        <img src="img/swap-output.jpg" alt="img/swap-output.jpg" width="250"/>        
    </td>      
    <td>
        <img src="img/merkel_clinton.jpg" alt="img/merkel_clinton.jpg" width="200"/>
    </td>
  </tr>
</table> 

### Face Swapping algorithm

1. Detect landmarks in both images.

2. Compute a binary mask over the two faces (eyes + nose + mouth).

3. Estimate the affine transformation between the two masks using **Procrustes analysis**.

4. Inverse warping of the source face to align the facial features with the target head.

5. Crop the warped image to the target mask. 

6. Add the cropped image to the target image.

7. Adapt the colors using **Gaussian color correction**.

### **Task 1:** Installing `dlib`

1. Download dlib at <http://dlib.net>:

    - dlib-19.2 : <http://dlib.net/files/dlib-19.2.tar.bz2>
    - dlib-19.16: <http://dlib.net/files/dlib-19.16.tar.bz2>

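2. Extract the archive (use the line matching the version you downloaded):

```
tar xvjf dlib-19.2.tar.bz2
tar xvjf dlib-19.16.tar.bz2
```

3. Compile and install it (takes a couple of minutes), from the directory matching your version:

```
cd dlib-19.16   # or: cd dlib-19.2
python setup.py install --user
```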

4. You will also need the data for the landmark detector:

<http://sourceforge.net/projects/dclib/files/dlib/v18.10/shape_predictor_68_face_landmarks.dat.bz2>

5. Decompress it and place the `.dat` file in the same directory as your notebook:

```
bunzip2 shape_predictor_68_face_landmarks.dat.bz2
```

In [None]:
from __future__ import print_function

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import cv2

import dlib

### **Task 2:** Opening images


1. Read the two images:
    - `source` <== `clinton.jpg`: contains the face
    - `destination` <== `merkel.jpg`: contains the head 
    

2. Convert them to RGB so that `matplotlib` can display them well (do not use `opencv` for displaying)

3. Display the two images side by side
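
The color conversion in step 2 is needed because OpenCV stores the channels in BGR order, while `matplotlib` expects RGB. For a 3-channel image this amounts to reversing the last axis. The following is a minimal NumPy-only sketch on a synthetic image; `bgr_to_rgb` is a hypothetical helper name, not part of OpenCV:

```python
import numpy as np

def bgr_to_rgb(img):
    """Reverse the channel axis of an (H, W, 3) image (hypothetical helper)."""
    return img[..., ::-1].copy()

# Synthetic 1x2 image in BGR order: a pure blue and a pure red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)

# In RGB order, blue is (0, 0, 255) and red is (255, 0, 0)
print(rgb[0, 0], rgb[0, 1])
```

If the faces look bluish when displayed, the conversion was most likely forgotten.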

In [None]:
source = cv2.imread('clinton.jpg')
source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
destination = cv2.imread('merkel.jpg')
destination = cv2.cvtColor(destination, cv2.COLOR_BGR2RGB)

plt.subplot(121)
plt.imshow(source)
plt.xticks([]), plt.yticks([]) 
plt.title('Source')
plt.subplot(122)
plt.imshow(destination)
plt.xticks([]), plt.yticks([])
plt.title('Destination')
plt.show()

### **Task 3:** Face detection

<img src="img/figure_2.png" alt="img/figure_2.png" width="600"/> 

* The next step is to detect where the face is in each image (if any).

* `dlib` provides a fast frontal face detector based on **Histogram of Oriented Gradients** (HOG) features combined with a linear classifier applied in a sliding window.

* The detector was trained on frontal faces, so strongly rotated or profile faces may be missed.

* OpenCV also has a face detector (`cvHaarDetectObjects()`), based on the **Viola-Jones** face detection algorithm
(<https://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework>), but let's use dlib.

1. Create a `detector` object:

```python
detector = dlib.get_frontal_face_detector()
```

2. Apply the face detector on the `source` image:

```python
source_facerect = detector(source)[0]
```

- It returns a list of bounding boxes, one per face detected in the image (hence the `[0]`, as we have only one face).

- Each bounding box is a `dlib.rectangle` object, which gives access to the coordinates (`top()`, `bottom()`, `left()`, `right()`) and the dimensions (`width()`, `height()`) of the bounding box in the image.

3. You can print the bounding box to see the coordinates of the top-left and bottom-right points:

```python
print(source_facerect)
```

4. To visualize the box, use the axes `ax` returned by `subplot()`:

```python
ax = plt.subplot(121)
plt.imshow(source)
ax.add_patch(
    plt.Rectangle(
        (source_facerect.left(), source_facerect.top()), # Top-left corner
        source_facerect.width(), # Width
        source_facerect.height(), # Height
        edgecolor='r', lw=1.0, fill=False))
plt.xticks([])
plt.yticks([])
```

5. Repeat the same for the `destination`
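
Besides drawing the box, its coordinates can be used to crop the face region out of the image array. The sketch below is NumPy-only and takes the four coordinates as plain integers. It assumes that the corners are inclusive (so that the width is `right - left + 1`) and that a box may extend past the image border, hence the clipping. `crop_box` is a hypothetical helper:

```python
import numpy as np

def crop_box(img, left, top, right, bottom):
    """Crop img to a box with inclusive corners, clipped to the image (hypothetical helper)."""
    h, w = img.shape[:2]
    x0, y0 = max(left, 0), max(top, 0)
    x1, y1 = min(right + 1, w), min(bottom + 1, h)
    # NumPy images are indexed as [row (y), column (x)]
    return img[y0:y1, x0:x1]

img = np.zeros((100, 120, 3), dtype=np.uint8)

# A box entirely inside the image
face = crop_box(img, left=30, top=20, right=79, bottom=69)
print(face.shape)

# A box that sticks out on the left and at the bottom gets clipped
partial = crop_box(img, left=-10, top=80, right=39, bottom=129)
print(partial.shape)
```

With a `dlib.rectangle`, the call would look like `crop_box(source, r.left(), r.top(), r.right(), r.bottom())`.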

In [None]:
detector = dlib.get_frontal_face_detector()

source_facerect = detector(source)[0]
destination_facerect = detector(destination)[0]

print(source_facerect)
print(destination_facerect)
# [(321, 321) (692, 692)]
# [(362, 280) (734, 651)]

ax = plt.subplot(121)
plt.imshow(source)
ax.add_patch(plt.Rectangle(
    (source_facerect.left(), source_facerect.top()), # Top-left corner
    source_facerect.width(), # Width
    source_facerect.height(), # Height
    edgecolor='r', lw=1.0, fill=False))
plt.xticks([])
plt.yticks([])

ax = plt.subplot(122)
plt.imshow(destination)
ax.add_patch(
    plt.Rectangle(
        (destination_facerect.left(), destination_facerect.top()), # Top-left corner
        destination_facerect.width(), # Width
        destination_facerect.height(), # Height
        edgecolor='r', lw=1.0, fill=False))
plt.xticks([])
plt.yticks([])

plt.show()

### **Task 4 (Optional):** Using the face detector on multiple faces

<img src="img/crowd.png" alt="img/crowd.png" width="600"/> 

- Apply the face detector on the image `crowd.jpg` and visualize all the bounding boxes:

In [None]:
crowd = cv2.imread('crowd.jpg')
crowd = cv2.cvtColor(crowd, cv2.COLOR_BGR2RGB)

detector = dlib.get_frontal_face_detector()
crowd_facerect = detector(crowd)

fig = plt.figure()
ax = fig.gca()
plt.imshow(crowd)

for rect in crowd_facerect:
    ax.add_patch(
        plt.Rectangle(
        (rect.left(), rect.top()), # Top-left
        rect.width(), # Width
        rect.height(), # Height
        edgecolor='r', lw=1.0, fill=False))
plt.xticks([]); plt.yticks([])
plt.show()

### **Task 5:** Landmark extraction

<img src="img/landmarks.png" alt="img/landmarks.png" width="300"/> 

* Once the face region is extracted, one can pass it to a **feature extractor** that will *annotate* the relevant landmarks in the face.

* The feature extractor proposed by dlib uses the complex algorithm described in *One Millisecond Face Alignment with an Ensemble of Regression Trees*, by Vahid Kazemi and Josephine Sullivan.

* Its usage is very simple:

```python
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

source_landmarks = []
for p in predictor(source, source_facerect).parts():
    source_landmarks.append([p.x, p.y])
source_landmarks = np.array(source_landmarks, dtype=int)  # np.int was removed from recent NumPy versions
```

`source_landmarks` is then a NumPy array of shape `(68, 2)` with the $(x, y)$ coordinates of 68 different landmarks.

* The file `shape_predictor_68_face_landmarks.dat` must be in the same directory as your script, otherwise give a relative path.

* Each landmark represents a specific part of a face (jaw, eyebrows, eyes, nose, mouth).


1. Visualize the landmarks of the two images. You just need to plot the two columns of the landmarks array against each other.

2. Optional: you can also visualize their numbers
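
The loop that collects the landmark coordinates can also be written as a single list comprehension. The sketch below uses a `namedtuple` as a stand-in for the point objects returned by `parts()`, only so that it runs without dlib; `parts_to_array` is a hypothetical helper:

```python
from collections import namedtuple
import numpy as np

# Stand-in for dlib's point type, used only to keep the example self-contained
Point = namedtuple("Point", ["x", "y"])

def parts_to_array(parts):
    """Stack the (x, y) coordinates of a sequence of points into an (N, 2) integer array."""
    return np.array([(p.x, p.y) for p in parts], dtype=int)

parts = [Point(10, 20), Point(30, 40), Point(50, 60)]
landmarks = parts_to_array(parts)

print(landmarks.shape)
print(landmarks[:, 0])  # first column: x coordinates
print(landmarks[:, 1])  # second column: y coordinates
```

The two columns printed at the end are exactly what has to be passed to `plt.plot` to draw the landmarks over the image.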


In [None]:
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
source_landmarks = []
for p in predictor(source, source_facerect).parts():
    source_landmarks.append([p.x, p.y])
source_landmarks = np.array(source_landmarks, dtype=int)

destination_landmarks = []
for p in predictor(destination, destination_facerect).parts():
    destination_landmarks.append([p.x, p.y])
destination_landmarks = np.array(destination_landmarks, dtype=int)

In [None]:
# Alternatively, load precomputed landmarks
# source_landmarks = np.load('source_landmarks.npy')
# destination_landmarks = np.load('destination_landmarks.npy')

In [None]:
# Plot only the face landmarks
ax = plt.subplot(121)
plt.imshow(source)
plt.plot(source_landmarks[:, 0], source_landmarks[:, 1], '.')
plt.xticks([]); plt.yticks([])

ax = plt.subplot(122)
plt.imshow(destination)
plt.plot(destination_landmarks[:, 0], destination_landmarks[:, 1], '.')
plt.xticks([]); plt.yticks([])
plt.show()

3. The landmarks can be grouped into regions:

In [None]:
FACE_POINTS = list(range(17, 68))
MOUTH_POINTS = list(range(48, 61))
RIGHT_BROW_POINTS = list(range(17, 22))
LEFT_BROW_POINTS = list(range(22, 27))
RIGHT_EYE_POINTS = list(range(36, 42))
LEFT_EYE_POINTS = list(range(42, 48))
NOSE_POINTS = list(range(27, 35))
JAW_POINTS = list(range(0, 17))

4. The next step is to align the two images by estimating the similarity transformation between the two sets of points.

 - we select only the inner points (not the jaw):

In [None]:
# Points used to line up the images
ALIGN_POINTS = (LEFT_BROW_POINTS + RIGHT_EYE_POINTS + LEFT_EYE_POINTS +
                               RIGHT_BROW_POINTS + NOSE_POINTS + MOUTH_POINTS)

5. Plot only the face landmarks:

In [None]:
# Plot only the face landmarks used to line up the images
ax = plt.subplot(121)
plt.imshow(255*np.ones(source.shape))
plt.plot(source_landmarks[ALIGN_POINTS, 0], source_landmarks[ALIGN_POINTS, 1], '.')
plt.xticks([]); plt.yticks([])

ax = plt.subplot(122)
plt.imshow(255*np.ones(destination.shape))
plt.plot(destination_landmarks[ALIGN_POINTS, 0], destination_landmarks[ALIGN_POINTS, 1], '.')
plt.xticks([]); plt.yticks([])
plt.show()
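
Indexing the landmark array with one of the index lists selects the rows of that region, which makes per-region statistics one-liners. As an illustration, here is a sketch on a synthetic `(68, 2)` array (the coordinates are made up) that computes the centers of the two eyes and the distance between them, a quantity that is reused for the color correction at the end of the exercise:

```python
import numpy as np

RIGHT_EYE_POINTS = list(range(36, 42))
LEFT_EYE_POINTS = list(range(42, 48))

# Synthetic landmark array: all zeros except the two eye groups
landmarks = np.zeros((68, 2))
landmarks[RIGHT_EYE_POINTS] = [100.0, 200.0]
landmarks[LEFT_EYE_POINTS] = [160.0, 280.0]

# Indexing with a list of indices selects the rows of one region
right_eye_center = landmarks[RIGHT_EYE_POINTS].mean(axis=0)
left_eye_center = landmarks[LEFT_EYE_POINTS].mean(axis=0)

# Euclidean distance between the two eye centers
eye_distance = np.linalg.norm(left_eye_center - right_eye_center)
print(right_eye_center, left_eye_center, eye_distance)
```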

### **Task 6:**  Affine transformation estimation

<!--(img/figure_4.png)-->

* Now that we have the coordinates of interest points in the two images, we need to estimate the similarity transformation between the two.

* Possible methods include: *least squares method*, *RANSAC* and *Procrustes analysis*. 

* Here Matthew Earl chose to use **Procrustes analysis**, so we do it too ...

### Procrustes Analysis

- The idea behind **Procrustes analysis** is to estimate separately the *translation*, *rotation* and *scaling* components of the similarity transformation.

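- We search for a translation vector $\mathbf{t}$, a rotation matrix $\mathbf{R}$ and a scaling factor $s$ so that the **least squares** function is minimized:

$$
    E_\text{LS}(\mathbf{t}, \mathbf{R}, s) = \sum_{i=1}^{N} || s \, \mathbf{R} \, \mathbf{p}_i + \mathbf{t} - \mathbf{q}_i||^2
$$

with $\mathbf{p}_i$ and $\mathbf{q}_i$ being the coordinates of the $N$ corresponding landmarks in the two images (here the alignment points, a subset of the 68 landmarks). Note that the function `transformation_from_points(points1, points2)` defined below returns the matrix that maps `points2` onto `points1`: with the call used in this exercise, $\mathbf{p}_i$ are the destination landmarks and $\mathbf{q}_i$ the source landmarks. This is the direction that `cv2.WARP_INVERSE_MAP` expects when warping.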

- The details of Procrustes analysis are not needed for this exercise. The function
```python
transformation_from_points(points1, points2)
```
provided here does the job. Just note that it relies on basic geometry and on the *singular value decomposition*:

In [None]:
def transformation_from_points(points1, points2):
    points1 = np.matrix(points1).astype(np.float64)
    points2 = np.matrix(points2).astype(np.float64)
    # The translation t corresponds to the displacement of the centers of mass
    c1 = np.mean(points1, axis=0)
    c2 = np.mean(points2, axis=0)
    # Normalize the mean of the points
    points1 -= c1
    points2 -= c2

    # The scaling corresponds to the ratio between the standard deviations
    s1 = np.std(points1)
    s2 = np.std(points2)
    # Normalize the variance of the points
    points1 /= s1
    points2 /= s2
    # Apply Singular Value decomposition on the correlation matrix of the points
    U, S, Vt = np.linalg.svd(points2.T * points1)
    # The R we seek is in fact the transpose of the one given by U * Vt.
    R = (U * Vt).T
    # Return the affine transformation matrix
    return np.hstack(((s1 / s2) * R, c1.T - (s1 / s2) * R * c2.T))

1. Apply Procrustes analysis to find the similarity transformation between the inner landmarks of the source image and the inner landmarks of the destination image:

```python
M = transformation_from_points(
        source_landmarks[ALIGN_POINTS],
        destination_landmarks[ALIGN_POINTS])
```

2. Print the transformation matrix and analyse its different components.

3. From this matrix, estimate the translation, the scaling factor and the angle of the rotation.
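
As a hint for step 3, a similarity transformation without reflection (an assumption here, the function above does not check for it) has the following structure, from which the three components can be read off. The indices are 1-based in the formula, so $M_{13}$ corresponds to `M[0, 2]` in the code:

$$
\mathbf{M} = \begin{bmatrix} s \cos\theta & -s \sin\theta & t_x \\ s \sin\theta & s \cos\theta & t_y \end{bmatrix}
\quad \Rightarrow \quad
\mathbf{t} = (M_{13}, M_{23}), \qquad
s = \sqrt{M_{11} M_{22} - M_{12} M_{21}}, \qquad
\theta = \operatorname{atan2}(M_{21}, M_{11})
$$

The scaling factor is the square root of the determinant of the left $2 \times 2$ block, because $\det(s \, \mathbf{R}) = s^2$ for a rotation matrix $\mathbf{R}$.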

In [None]:
M = transformation_from_points(
        source_landmarks[ALIGN_POINTS],
        destination_landmarks[ALIGN_POINTS])

print('Transformation matrix between the two faces:')
print(M)
print('')

print('Translation:', M[0, 2], M[1, 2])
print('')
s = np.sqrt(M[0, 0]*M[1,1] - M[0, 1]*M[1, 0])
print('Scaling:', s)
print('')
theta = np.arctan2(M[1, 0], M[0, 0])  # signed angle; arccos would lose the sign
print('Angle:', theta*180/np.pi)
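
A useful sanity check for this kind of estimation is to run it on synthetic points for which the true transformation is known. The sketch below does not reuse `transformation_from_points`; it is an independent, plain-array variant of the same idea (centering, scaling by the standard deviation, SVD), written under the assumption of noise-free points and no reflection. `fit_similarity` is a hypothetical name, and it maps its first argument onto its second:

```python
import numpy as np

def fit_similarity(points_from, points_to):
    """Similarity transform (2x3) mapping points_from onto points_to, Procrustes-style."""
    a = np.asarray(points_from, dtype=np.float64)
    b = np.asarray(points_to, dtype=np.float64)
    # Remove the translation by centering both point sets
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    a0, b0 = a - ca, b - cb
    # The scale is the ratio of the overall standard deviations
    sa, sb = a0.std(), b0.std()
    # Orthogonal Procrustes on the normalized point sets gives the rotation
    U, _, Vt = np.linalg.svd((b0 / sb).T @ (a0 / sa))
    R = U @ Vt
    s = sb / sa
    t = cb - s * (R @ ca)
    return np.hstack([s * R, t[:, None]])

# Synthetic data: rotate by 15 degrees, scale by 1.3, translate by (5, -8)
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(20, 2))
theta_true, s_true, t_true = np.deg2rad(15.0), 1.3, np.array([5.0, -8.0])
R_true = np.array([[np.cos(theta_true), -np.sin(theta_true)],
                   [np.sin(theta_true),  np.cos(theta_true)]])
dst = s_true * src @ R_true.T + t_true

# Recover the components from the estimated matrix
M_est = fit_similarity(src, dst)
s_est = np.sqrt(np.linalg.det(M_est[:, :2]))
theta_est = np.degrees(np.arctan2(M_est[1, 0], M_est[0, 0]))
print(s_est, theta_est, M_est[:, 2])
```

On real landmarks the two faces are not exactly related by a similarity transformation, so the recovered matrix is only the best compromise over all alignment points.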

### **Task 7:**  Extracting masks


In [None]:
OVERLAY_POINTS = [
    LEFT_EYE_POINTS + RIGHT_EYE_POINTS + LEFT_BROW_POINTS + RIGHT_BROW_POINTS,
    NOSE_POINTS + MOUTH_POINTS
]

def get_face_mask(img, landmarks):
    "Extracts a mask on an image around the important regions."
    # Create an empty mask
    mask = np.zeros(img.shape[:2], dtype=np.float64)
    # Compute the mask by computing the convex hull.
    for group in OVERLAY_POINTS:
        points = cv2.convexHull(landmarks[group])
        cv2.fillConvexPoly(mask, points, color=1)
    # Transform the mask into an image
    mask = np.array([mask, mask, mask]).transpose((1, 2, 0))
    # Blur the mask
    mask = (cv2.GaussianBlur(mask, (11, 11), 0) > 0) * 1.0
    mask = cv2.GaussianBlur(mask, (11, 11), 0)
    return mask

1. Use the provided method to extract masks on the two images.

2. Visualize the masks and the pixels under them by multiplying the image with the mask element-wise.

```python
source_masked = source * source_mask
```

- Warning: you have to cast the masked image back to `np.uint8` to visualize it correctly.
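
The reason for the cast is that multiplying a `uint8` image by a floating-point mask yields a floating-point array with values up to 255, whereas `plt.imshow` interprets floating-point color images as lying in $[0, 1]$. A minimal sketch on a synthetic 2x2 image (the values are made up):

```python
import numpy as np

# Synthetic 2x2 RGB image and a soft mask with one value per pixel and channel
img = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.zeros((2, 2, 3), dtype=np.float64)
mask[0, 0] = 1.0   # pixel fully inside the mask
mask[0, 1] = 0.5   # pixel on the blurred border of the mask

# Element-wise product: the result is float64 with values in [0, 255]
masked = img * mask

# Cast back to 8-bit before passing the array to plt.imshow
masked_u8 = masked.astype(np.uint8)
print(masked.dtype, masked_u8[0, 0], masked_u8[0, 1], masked_u8[1, 1])
```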

In [None]:
# Mask over the source image
source_mask = get_face_mask(source, source_landmarks)

# Mask over the destination image
destination_mask = get_face_mask(destination, destination_landmarks)

# Apply the mask on the source image
source_masked = source * source_mask

# Apply the mask on the destination image
destination_masked = destination * destination_mask

# Plot the masks
ax = plt.subplot(221)
plt.imshow(source_mask)
plt.xticks([]); plt.yticks([])
ax = plt.subplot(222)
plt.imshow(destination_mask)
plt.xticks([]); plt.yticks([])
ax = plt.subplot(223)
plt.imshow(source_masked.astype(np.uint8))
plt.xticks([]); plt.yticks([])
ax = plt.subplot(224)
plt.imshow(destination_masked.astype(np.uint8))
plt.xticks([]); plt.yticks([])

plt.show()

### **Task 8:** Warping the source image

- One can warp the source image to the destination using the similarity transform computed before: 

```python
source_warped = cv2.warpAffine(source,
                   M,
                   (destination.shape[1], destination.shape[0]),
                   dst=None,
                   borderMode=cv2.BORDER_TRANSPARENT,
                   flags=cv2.WARP_INVERSE_MAP
                )
```

1. Warp the source image to the destination and show both warped source and destination:
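
To see what inverse warping means, it can help to write it out by hand: every pixel of the output image is mapped through `M` to a location in the source image, and the color found there is copied. The sketch below is a deliberately simplified NumPy version with nearest-neighbour sampling, not how OpenCV implements it. `inverse_warp_nearest` is a hypothetical helper, and pixels that map outside the source are simply left at zero:

```python
import numpy as np

def inverse_warp_nearest(src, M, out_shape):
    """Nearest-neighbour inverse warping: out[y, x] = src at M @ (x, y, 1)."""
    h_out, w_out = out_shape
    out = np.zeros((h_out, w_out) + src.shape[2:], dtype=src.dtype)
    # Homogeneous coordinates (x, y, 1) of every output pixel
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    ys, xs = ys.ravel(), xs.ravel()
    coords = np.stack([xs, ys, np.ones(xs.size)])
    # Map every output pixel back into the source image
    sx, sy = np.rint(np.asarray(M) @ coords).astype(int)
    # Keep only the pixels that fall inside the source image
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    out[ys[valid], xs[valid]] = src[sy[valid], sx[valid]]
    return out

src_img = np.arange(16, dtype=np.uint8).reshape(4, 4)
# Output pixel (x, y) reads source pixel (x + 1, y): the content shifts left by one pixel
M_shift = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])
warped = inverse_warp_nearest(src_img, M_shift, (4, 4))
print(warped)
```

Going from output to source guarantees that every output pixel receives exactly one value, which would not be the case when pushing source pixels forward.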

In [None]:
#######################################
# Warp the source image to the destination
#######################################

source_warped = cv2.warpAffine(source,
                   M,
                   (destination.shape[1], destination.shape[0]),
                   dst=None,
                   borderMode=cv2.BORDER_TRANSPARENT,
                   flags=cv2.WARP_INVERSE_MAP
                )
# Plot the warped images
ax = plt.subplot(121)
plt.imshow(source_warped)
plt.xticks([]); plt.yticks([])
ax = plt.subplot(122)
plt.imshow(destination)
plt.xticks([]); plt.yticks([])
plt.show()

### **Task 9:** Blending the warped source image to the destination using the mask

* One could simply "cut" the warped source image with the destination mask and compose the two images.

* It is much more elegant to first warp the source mask and then "add" it to the destination mask.

* The reason is that Procrustes analysis is only an approximation: the warped source mask does not exactly match the destination mask.

```python
warped_mask = cv2.warpAffine(source_mask,
                   M,
                   (destination.shape[1], destination.shape[0]),
                   dst=None,
                   borderMode=cv2.BORDER_TRANSPARENT,
                   flags=cv2.WARP_INVERSE_MAP
                )

combined_mask = np.max([destination_mask, warped_mask],
                          axis=0) 
```

1. Warp the source mask to the destination

2. Create the combined mask

In [None]:
#######################################
# Warp the source mask to the destination
#######################################

# Warp the mask of the source image to the destination
warped_mask = cv2.warpAffine(source_mask,
                   M,
                   (destination.shape[1], destination.shape[0]),
                   dst=None,
                   borderMode=cv2.BORDER_TRANSPARENT,
                   flags=cv2.WARP_INVERSE_MAP
                )

# Combine the warped source mask and the destination mask
# This avoids "missing" pixels after composition
combined_mask = np.max([destination_mask, warped_mask],axis=0)

3. Crop the warped source using the combined mask.

4. Crop the destination image using `(1 - combined_mask)`.

5. Add the two images pixel by pixel (**blending**).

6. Show the cropped source, the cropped destination and the composition.

* The face regions are aligned, but the textures do not match.
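
The blending in steps 3 to 5 is a per-pixel weighted average of the two images, with the combined mask as weight. A minimal sketch on two constant 1x3 "images" (all values are made up) shows how the element-wise maximum of the masks and the blending behave:

```python
import numpy as np

# Two constant 1x3 RGB "images"
warped_src = np.full((1, 3, 3), 100.0)
dest = np.full((1, 3, 3), 200.0)

# Two soft masks that only partly overlap, replicated over the 3 channels
dest_mask = np.array([1.0, 0.5, 0.0]).reshape(1, 3, 1) * np.ones(3)
warped_mask = np.array([0.0, 1.0, 0.0]).reshape(1, 3, 1) * np.ones(3)

# Element-wise maximum: a pixel is covered if either mask covers it
combined = np.maximum(dest_mask, warped_mask)

# Weighted average of the two images, with the mask as weight
composition = warped_src * combined + dest * (1.0 - combined)
print(combined[0, :, 0])
print(composition[0, :, 0])
```

Where the mask is 1 the result comes entirely from the warped source, where it is 0 entirely from the destination, and the blurred border of the mask produces a smooth transition between the two.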

In [None]:
#######################################
# Compose the warped source image with the destination using the mask
#######################################

# Crop the warped source image
source_cropped = source_warped * combined_mask

# Anti-crop the destination image
destination_cropped = destination * (1 - combined_mask)

# Add the two to compose the two images
composition =  source_cropped + destination_cropped

plt.subplot(131)
plt.imshow(source_cropped.astype(np.uint8))
plt.title('Source')
plt.xticks([]); plt.yticks([])
plt.subplot(132)
plt.imshow(destination_cropped.astype(np.uint8))
plt.title('Destination')
plt.xticks([]); plt.yticks([])
plt.subplot(133)
plt.imshow(composition.astype(np.uint8))
plt.title('Composition')
plt.xticks([]); plt.yticks([])

plt.show()

### **Task 10:** Color correction

* The last step is to correct the color balance of the warped source image to match the color profile of the destination image. 

* Matthew Earl proposes to use a variant of **RGB color balancing**:

    1. The two images are filtered by a very large Gaussian filter (in the code below, its width is 0.6 times the distance between the eyes). This yields a kind of local average of the colors.

    2. The colors of the pixels of the warped source image are "normalized" to locally have the same average as the destination image.
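
Written as a formula, and as a reading of what the function `correct_colours` below computes, the correction is:

$$
    \mathbf{I}_\text{corrected} = \mathbf{I}_\text{source} \odot \frac{G * \mathbf{I}_\text{destination}}{G * \mathbf{I}_\text{source}}
$$

where $G$ is the Gaussian kernel, $*$ denotes convolution, and the product $\odot$ and the division are applied per pixel and per color channel. In a region where the blurred destination is brighter than the blurred source in some channel, the ratio is larger than 1 and the source pixels are brightened accordingly in that channel.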

In [None]:
def correct_colours(source, destination, landmarks):
    "RGB color scaling correction"

    # Compute the size of the Gaussian filter by measuring the distance between the eyes
    blur_amount = 0.6*np.linalg.norm(
                      np.mean(landmarks[LEFT_EYE_POINTS], axis=0) -
                      np.mean(landmarks[RIGHT_EYE_POINTS], axis=0))
    blur_amount = int(blur_amount)
    if blur_amount % 2 == 0:
        blur_amount += 1

    # Blur the two images
    destination_blur = cv2.GaussianBlur(destination, (blur_amount, blur_amount), 0)
    source_blur = cv2.GaussianBlur(source, (blur_amount, blur_amount), 0)

    # Avoid divide-by-zero errors.
    source_blur += (128 * (source_blur <= 1.0)).astype(source_blur.dtype)

    # Compute the color-corrected image
    return (source.astype(np.float64) * destination_blur.astype(np.float64) 
                    / source_blur.astype(np.float64))


1. Using the function `correct_colours`, correct the colors of the warped source image.

2. Show the corrected source and the destination.

3. Blend the corrected source with the destination image.

4. Renormalize the resulting image:

```python
composition = cv2.normalize(composition, None, 0.0, 255.0, cv2.NORM_MINMAX)
```
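
The renormalization is needed because the color correction can produce values outside $[0, 255]$, which would wrap around when cast to `np.uint8`. The following NumPy-only sketch shows what a min-max normalization to $[0, 255]$ computes. `minmax_normalize` is a hypothetical helper, and the handling of constant images is a choice made for this sketch, not a statement about OpenCV:

```python
import numpy as np

def minmax_normalize(img, lo=0.0, hi=255.0):
    """Linearly rescale img so that its minimum maps to lo and its maximum to hi."""
    img = img.astype(np.float64)
    span = img.max() - img.min()
    if span == 0:
        return np.full_like(img, lo)   # constant image: nothing to stretch
    return (img - img.min()) / span * (hi - lo) + lo

# Made-up values, two of them outside the valid 8-bit range
values = np.array([[-20.0, 100.0], [220.0, 280.0]])
normalized = minmax_normalize(values)
print(normalized)
```

Note that this rescales the whole image, so a few extreme pixels slightly change the brightness everywhere; clipping with `np.clip` would be an alternative with a different trade-off.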

In [None]:
#######################################
# Correct the colors
#######################################

# Correct colors in the warped image
source_warped_corrected = correct_colours(source_warped, destination, destination_landmarks)

# Plot the color-matched images
ax = plt.subplot(121)
plt.imshow(source_warped_corrected.astype(np.uint8))
plt.xticks([]); plt.yticks([])
ax = plt.subplot(122)
plt.imshow(destination)
plt.xticks([]); plt.yticks([])
plt.show()

# Add the color-corrected warped image to the destination
composition_corrected = destination * (1 - combined_mask) + source_warped_corrected * combined_mask

# Normalize the image
composition_corrected = cv2.normalize(composition_corrected, None, 0.0, 255.0, cv2.NORM_MINMAX)

plt.figure()
plt.imshow(composition_corrected.astype(np.uint8))
plt.xticks([]); plt.yticks([])

plt.show()

### **Task 11:** Try it on other images

- You are done! Now try it on other images.

<img src="img/merkel_clinton.jpg" alt="img/merkel_clinton.jpg" width="300"/> 

<img src="img/vitayhamker.png" alt="img/vitayhamker.png" width="600"/> 
<img src="img/vamker.png" alt="img/vamker.png" width="300"/> 


### Copyright notice:

```python
# Copyright (c) 2015 Matthew Earl
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
#     The above copyright notice and this permission notice shall be included
#     in all copies or substantial portions of the Software.
#
#     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
#     OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
#     MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
#     NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
#     DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
#     OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
#     USE OR OTHER DEALINGS IN THE SOFTWARE.

"""
This is the code behind the Switching Eds blog post:
    
    http://matthewearl.github.io/2015/07/28/switching-eds-with-python/

The code is available at:

    https://github.com/matthewearl/faceswap/blob/master/faceswap.py

To run the script you'll need to install dlib (http://dlib.net) including its
Python bindings, and OpenCV. 

You'll also need to obtain and unzip the trained model from sourceforge:

    http://sourceforge.net/projects/dclib/files/dlib/v18.10/shape_predictor_68_face_landmarks.dat.bz2
"""
```

---