<a href="https://colab.research.google.com/github/mlfa19/assignments/blob/master/Module%201/03/Estimating_Face_Pose_from_Images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Estimating Head Pose from Images

Let's try to estimate people's head position from photos. The [original dataset](http://www-prima.inrialpes.fr/perso/Gourier/Faces/HPDatabase.html) page shows how the images were collected. Check out the photos of their setup.

In [0]:
# First, let's download the data (or upload if using colab). 
# You may need to click yes to approve running this code if using colab... Can Paul Ruvolo be trusted?

!wget "https://www.dropbox.com/s/9u9znk0utfr7yjf/HeadPoseImageDatabase.tar.gz?dl=0" -O head_pose.tar.gz
!tar -xvzf head_pose.tar.gz > /dev/null

#old: !wget "https://drive.google.com/uc?authuser=0&id=1304LwlF0o_L0N3njQyB1Gy77FGwUGIUg&export=download" -O head_pose.tar.gz

## Load the data

These data comprise multiple people, each at various head positions (both pitch and yaw).

This code loads the images, crops to the face (face boxes provided in the dataset), and downsamples to 20x20 pixels.

In [0]:
import cv2
import os
import re
import numpy as np
import glob

def load_data():
    """ Load the head pose image dataset from here:
        http://www-prima.inrialpes.fr/perso/Gourier/Faces/HPDatabase.html
        
        returns a tuple containing
            person_ids: a list with the subject directory each image came from
            images: the face boxes cropped and resized to (20, 20) grayscale pixels
            pitches: the pitch of the subject's head in each image (degrees)
            yaws: the yaw of the subject's head in each image (degrees) """
    person_ids = []
    images = []
    pitches = []
    yaws = []
    for person_path in glob.glob('Person*'):
        for image_path in glob.glob(os.path.join(person_path, '*.jpg')):
            # raw string avoids invalid-escape warnings; the dot before "jpg" is escaped
            m = re.search(r'([-+][0-9]*)([-+][0-9]*)\.jpg$', image_path)
            pitch = float(m.group(1))
            yaw = float(m.group(2))

            # don't use images with extreme pitches
            if np.abs(pitch) > 30:
                continue
            im = cv2.imread(image_path)
            im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
            # the face box lives in a .txt file with the same base name as the image
            face_box_path = image_path[:-4] + '.txt'
            with open(face_box_path) as f:
                lines = f.readlines()
                # grab the center pixel coordinate of the face (x_c, y_c) and
                # size (w, h) of the face bounding box
                x_c, y_c, w, h = (int(l) for l in lines[3:])
                # use a square cropping by taking the maximum of the two sizes
                big_length = max(w, h)

                # grab the face by indexing into the numpy array
                face_pixels = im[y_c-big_length//2:y_c+big_length//2,
                                 x_c-big_length//2:x_c+big_length//2]
                try:
                    # resize the image to a (20, 20) patch to make it easier
                    # for linear regression (fewer dimensions)
                    face_pixels = cv2.resize(face_pixels, (20, 20))
                    images.append(face_pixels)
                    pitches.append(pitch)
                    yaws.append(yaw)
                    print(person_path, yaw, pitch)
                    person_ids.append(person_path)
                except Exception:
                    # skip images whose crop could not be resized (e.g. an empty crop)
                    continue
    return person_ids, np.array(images), np.array(pitches), np.array(yaws)

person_ids, images, pitches, yaws = load_data()
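
To see how the loader reads pitch and yaw from a file name, here is a minimal sketch that applies the same pattern to a single path. The helper name `parse_angles` and the example file name are hypothetical and used only for illustration; the sketch assumes file names end in two signed angles followed by `.jpg`.

```python
import re

# Same pattern as in load_data(), written as a raw string with the dot escaped.
ANGLE_PATTERN = re.compile(r'([-+][0-9]*)([-+][0-9]*)\.jpg$')

def parse_angles(image_path):
    """Return (pitch, yaw) in degrees parsed from the end of a file name."""
    m = ANGLE_PATTERN.search(image_path)
    if m is None:
        raise ValueError('no pitch/yaw found in ' + image_path)
    return float(m.group(1)), float(m.group(2))

# Hypothetical file name, for illustration only.
example_path = 'Person01/person01100-15+30.jpg'
example_pitch, example_yaw = parse_angles(example_path)
print(example_pitch, example_yaw)
```

The first signed number is read as the pitch and the second as the yaw, matching the order used in `load_data()`.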

## Visualize the data
An important first step is to visualize your data to make sure everything is arranged as expected. The code below plots multiple head positions for the first three people. You can alter this code to look at other examples. 

**Discuss: What do you notice about the images?**


In [0]:
%matplotlib inline
import matplotlib.pyplot as plt

def visualize_person(person_id):
    fig_scale = 2
    fig, ax = plt.subplots(5, 13, figsize=(13*fig_scale, 5*fig_scale))
    plt.suptitle(person_id)
    subplot_idx = 0
    for pitch in np.linspace(-30, 30, 5):
        for yaw in np.linspace(-90, 90, 13):
            subplot_idx += 1
            ax = plt.subplot(5, 13, subplot_idx)
            img_idx = np.argwhere(np.logical_and([p == person_id for p in person_ids], np.logical_and(pitches == pitch, yaws == yaw)))
            if img_idx.size:
                plt.set_cmap('gray')
                ax.imshow(images[img_idx[0]].squeeze(), interpolation='none')
                ax.set_axis_off()
    plt.show()

visualize_person('Person01')
visualize_person('Person02')
visualize_person('Person03')

## Build a model
Read and run the code below.

**Discuss: What equation does "w = ..." represent?**

**Discuss: Do you see any visual pattern in the weights (first plot)?**

**Discuss: What is the point of the line that starts with "X = np.hstack(..."?**

**Discuss: What should the y-axis label be on the last plot?**


In [0]:
# Reshape the images so each is represented by a vector of pixel values
X = images.reshape((images.shape[0], images.shape[1]*images.shape[2]))
num_pixels = len(X[0]) # num pixels
X = np.hstack((X, np.ones((X.shape[0],1))))

w = np.linalg.inv(X.T.dot(X) + 100000000*np.eye(num_pixels+1)).dot(X.T.dot(yaws))


# Plot the weights
plt.imshow(w[:-1].reshape((20, 20)))
plt.colorbar()
plt.show()


# Plot the actual and predicted yaws
plt.scatter(yaws, X.dot(w))
plt.xlabel('actual yaw (degrees)')
plt.ylabel('predicted yaw (degrees)')
plt.show()

# Make another plot!
plt.scatter(np.arange(len(yaws)),X.dot(w)-yaws,c=pitches,cmap="jet")
plt.xlabel('Data point')
plt.ylabel('What should this label say?')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Pitches')
plt.show()
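
The `w = ...` line above forms an explicit matrix inverse. As a side note on numerics, the sketch below computes the same kind of weights on small synthetic data by solving the linear system directly with `np.linalg.solve`, which is usually preferred over `np.linalg.inv`. The names `X_demo`, `y_demo`, and `lam_demo` and the random data are assumptions made for illustration; they are not part of the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the image matrix: 50 samples, 8 "pixel" features.
X_demo = rng.normal(size=(50, 8))
X_demo = np.hstack((X_demo, np.ones((X_demo.shape[0], 1))))  # same hstack step as above
y_demo = rng.normal(size=50)
lam_demo = 10.0  # arbitrary value chosen for the demo

A = X_demo.T.dot(X_demo) + lam_demo * np.eye(X_demo.shape[1])
b = X_demo.T.dot(y_demo)

w_inv = np.linalg.inv(A).dot(b)   # same form as the cell above
w_solve = np.linalg.solve(A, b)   # solves A w = b without forming the inverse

# The two results should agree up to floating-point error.
print(np.allclose(w_inv, w_solve))
```

If you want to try this on the real data, swap `X` and `yaws` in for the synthetic arrays.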

In the code above, we combined our data across all pitches (looking up vs. straight ahead vs. down).

Now, let's consider only the data where the pitch is 0 (looking straight ahead).

**Discuss: Why might we opt to do this?**



## Analyze when looking straight ahead (pitch = 0)

Investigate how the variable `lam` (for lambda) in the code below affects the fitted weights and the model's predictions.

**Discuss: What happens when your value for lam is much lower or higher than the starting value? Consider this in relation to the 3 plots. What happens to the fit of the model (e.g., when is it overfit or underfit)?**

In [0]:
pitch_value = 0
lam = 10000000  # starting value: 10000000; try much smaller and much larger values
X_restricted = X[pitches == pitch_value, :]
w = np.linalg.inv(X_restricted.T.dot(X_restricted) + lam*np.eye(X.shape[1])).dot(X_restricted.T.dot(yaws[pitches == pitch_value]))

# Plot weights
plt.imshow(w[:-1].reshape((20, 20)))
plt.colorbar()
plt.show()

# Plot Predicted and Actual yaw
plt.scatter(yaws[pitches == pitch_value], X_restricted.dot(w))
plt.xlabel('actual yaw (degrees)')
plt.ylabel('predicted yaw (degrees)')
plt.show()

# Plot residuals
plt.scatter(np.arange(len(yaws[pitches == pitch_value])),X_restricted.dot(w)-yaws[pitches == pitch_value])
plt.xlabel('Data point')
plt.ylabel('Residuals')
plt.show()
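
One way to explore the effect of `lam` more systematically is to sweep several values and compare the error on the data used for fitting with the error on data held out from fitting. The sketch below does this on synthetic data only; the helper names `ridge_fit` and `rmse`, the sizes, the noise level, and the list of `lam` values are all assumptions chosen for illustration, not part of the original analysis.

```python
import numpy as np

def ridge_fit(X, y, lam_value):
    """Closed-form fit in the same form as the cells above (penalty on every weight)."""
    A = X.T.dot(X) + lam_value * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T.dot(y))

def rmse(X, y, w_fit):
    """Root-mean-square error of the predictions X.dot(w_fit) against y."""
    return float(np.sqrt(np.mean((X.dot(w_fit) - y) ** 2)))

rng = np.random.default_rng(0)
n_train, n_test, n_features = 40, 200, 60  # more features than training rows

# Synthetic linear data with noise (not the head pose data).
true_w = rng.normal(size=n_features)
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train.dot(true_w) + rng.normal(scale=2.0, size=n_train)
y_test = X_test.dot(true_w) + rng.normal(scale=2.0, size=n_test)

lams = [1e-3, 1e-1, 1e1, 1e3, 1e5]
train_errors = []
test_errors = []
for lam_value in lams:
    w_fit = ridge_fit(X_train, y_train, lam_value)
    train_errors.append(rmse(X_train, y_train, w_fit))
    test_errors.append(rmse(X_test, y_test, w_fit))
    print(lam_value, train_errors[-1], test_errors[-1])
```

To apply the same idea to the head pose data, one option would be to hold out all images from a few people and fit on the rest, then compare the two error columns as `lam` changes.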

## Consider the model and data

Now that you've had a chance to work with these data, refer back to the [original dataset](http://www-prima.inrialpes.fr/perso/Gourier/Faces/HPDatabase.html) and data collection protocol. Check out the photos of their setup. You can also see a few sample videos of people.

**Discuss: What are some of the limitations of this data set and/or the linear regression model?**


You may also want to check out the [Gender Shades Project](http://gendershades.org/overview.html). 
