## 4. Fitting "in-the-wild" videos

#### Prerequisites

- Access to a combined 3D facial shape model of identity and expression
- Access to an "in-the-wild" texture model
- Access to an "in-the-wild" video with landmarks to fit. The per-frame landmarks must be supplied as input and should follow the iBUG 68-point mark-up. It is highly recommended that these are "3D-Aware-2D" (3DA-2D) landmarks, as described, e.g., in Zafeiriou et al., "The 3D Menpo facial landmark tracking challenge", ICCV-W 2017.
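As a quick sanity check on the landmark input, the sketch below encodes the standard 0-based index ranges of the iBUG 68-point mark-up and validates a single frame's landmarks. The names `IBUG68_GROUPS`, `check_ibug68` and `group_points` are hypothetical helpers, not part of `itwmm` or menpo. Left/right are taken as seen in the image, and the check does not care whether points are stored as (x, y) or as menpo's (y, x).

```python
import numpy as np

# iBUG 68-point mark-up: 0-based index ranges (end exclusive).
# Hypothetical helper, not part of itwmm/menpo.
IBUG68_GROUPS = {
    "jaw": range(0, 17),
    "left_eyebrow": range(17, 22),
    "right_eyebrow": range(22, 27),
    "nose": range(27, 36),
    "left_eye": range(36, 42),
    "right_eye": range(42, 48),
    "mouth": range(48, 68),
}


def check_ibug68(landmarks):
    """Validate one frame's landmarks: shape (68, 2) and all values finite."""
    pts = np.asarray(landmarks, dtype=float)
    if pts.shape != (68, 2):
        raise ValueError(f"expected (68, 2) landmarks, got {pts.shape}")
    if not np.all(np.isfinite(pts)):
        raise ValueError("landmarks contain NaN or inf")
    return pts


def group_points(landmarks, name):
    """Return the subset of points belonging to one facial group."""
    pts = check_ibug68(landmarks)
    return pts[np.array(IBUG68_GROUPS[name])]
```

For example, `group_points(lms, "mouth")` returns the 20 mouth points of a `(68, 2)` array `lms`.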

The first two of these can be generated by following the two previous notebooks in this folder:

- `1. Building an "in-the-wild" texture model.ipynb`
- `2. Creating an expressive 3DMM.ipynb`

Running these notebooks to completion generates the following two files in the `DATA_PATH` folder:

- `itw_texture_model.pkl`
- `id_exp_shape_model.pkl`

This notebook shows how to load these files directly and use them for fitting.

Alternatively, you can work out the required formats by studying the aforementioned notebooks and load your own shape and texture models instead.
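If you bring your own models, the pickled dictionaries must at least contain the keys that the loader functions defined further down in this notebook (`load_id_exp_shape_model`, `load_itw_texture_model`) read. The sketch below checks for them before you save or load a custom model; `missing_keys` and `require_keys` are hypothetical helpers. It only checks key presence: the notebook additionally reads `.shape[0]` from `id_ind` and `exp_ind`, so those are assumed to be integer index arrays.

```python
# Keys read by the loader functions in this notebook.
SHAPE_MODEL_KEYS = frozenset({"shape_model", "lms", "id_ind", "exp_ind"})
TEXTURE_MODEL_KEYS = frozenset({"texture_model", "diagonal_range", "feature_function"})


def missing_keys(model_dict, required):
    """Return the sorted list of required keys that model_dict lacks."""
    return sorted(required - set(model_dict))


def require_keys(model_dict, required, name="model"):
    """Raise KeyError listing any missing keys; otherwise return the dict unchanged."""
    missing = missing_keys(model_dict, required)
    if missing:
        raise KeyError(f"{name} dict is missing keys: {missing}")
    return model_dict
```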

In [None]:
from pathlib import Path
import numpy as np

import menpo.io as mio
from menpo.base import LazyList
from menpo.shape import PointCloud
from menpo.visualize import print_progress
from menpo3d.morphablemodel import ColouredMorphableModel

from itwmm import (
    initialize_camera_from_params, initialize_camera,
    fit_video, instance_for_params,
    render_initialization, render_iteration,
)

# Replace DATA_PATH with the path to your data. It should contain the files
#  itw_texture_model.pkl
#  id_exp_shape_model.pkl
# as generated by Notebooks 1. and 2.
DATA_PATH = Path('~/Dropbox/itwmm_src_data/').expanduser()

In [None]:
def prepare_image_and_return_transforms(diagonal, feature_f, image):
    # This variant also returns the crop translation and rescale factor,
    # which we need in order to know how the input image was transformed.
    img, t = image.crop_to_landmarks_proportion(0.4, return_transform=True)
    img, scale = img.rescale_landmarks_to_diagonal_range(diagonal, return_transform=True)
    return {
        'image': feature_f(img),
        't': t.translation_component,
        'scale': scale.scale[0]
    }


def load_id_exp_shape_model(path):
    sm_dict = mio.import_pickle(path)
    shape_model = sm_dict['shape_model']
    lms = sm_dict['lms']
    id_ind = sm_dict['id_ind']
    exp_ind = sm_dict['exp_ind']
    return shape_model, lms, id_ind, exp_ind


def load_itw_texture_model(path):
    tm_dict = mio.import_pickle(path)
    texture_model = tm_dict['texture_model']
    diagonal_range = tm_dict['diagonal_range']
    feature_function = tm_dict['feature_function']
    return texture_model, diagonal_range, feature_function
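`prepare_image_and_return_transforms` rescales each cropped frame so that the bounding-box diagonal of its landmarks equals `diagonal_range`, and keeps the crop translation `t` and the scale factor so that results can be mapped back to the original frame. The sketch below reproduces that arithmetic with plain NumPy. The sign convention for `t` (processed = (original + t) × scale) is an **assumption**: verify it on one frame by comparing `to_processed` applied to the frame's landmarks with the landmarks of the prepared image. `diagonal_scale`, `to_processed` and `to_original` are hypothetical helpers.

```python
import numpy as np


def diagonal_scale(landmarks, diagonal):
    """Isotropic scale that makes the landmarks' bounding-box diagonal equal `diagonal`."""
    pts = np.asarray(landmarks, dtype=float)
    extent = pts.max(axis=0) - pts.min(axis=0)  # bounding-box side lengths
    return diagonal / np.linalg.norm(extent)


def to_processed(points, t, scale):
    # ASSUMED convention: translate by t, then scale uniformly.
    return (np.asarray(points, dtype=float) + t) * scale


def to_original(points, t, scale):
    # Inverse of to_processed under the same assumed convention.
    return np.asarray(points, dtype=float) / scale - t
```

A bounding box with sides 30 and 40 has a diagonal of 50, so rescaling to a diagonal of 100 requires a factor of 2.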

## Prepare data and model

In [None]:
# LOAD SHAPE MODEL
# Note that id_ind and exp_ind are two index mappings into the components of
# this special combined shape model. The first records the index positions of
# the components related to identity, the second those of the (remaining)
# components, which are related to expression.
shape_model, lms, id_ind, exp_ind = load_id_exp_shape_model(DATA_PATH / 'id_exp_shape_model.pkl')

# record the number of ID / EXP params
n_p, n_q = id_ind.shape[0], exp_ind.shape[0]

# LOAD ITW TEXTURE MODEL
# Note we have to know the diagonal setting and feature used in the texture model.
texture_model, diagonal_range, feature_function = load_itw_texture_model(DATA_PATH / 'itw_texture_model.pkl')

# construct our Morphable Model that we can use in the fitting approaches below
mm = ColouredMorphableModel(shape_model, texture_model, lms, 
                            holistic_features=feature_function,
                            diagonal=diagonal_range)
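To make the role of `id_ind` and `exp_ind` concrete, the toy sketch below assembles a single weight vector from identity weights `p` (one vector shared by all frames) and expression weights `q` (one vector per frame), then forms a linear shape instance as mean plus weighted components. It uses a plain NumPy mean and component matrix rather than the menpo model, and it is an illustration of the index bookkeeping only. That `instance_for_params` from `itwmm` works exactly this way is an assumption. `combined_instance` is a hypothetical helper.

```python
import numpy as np


def combined_instance(mean, components, id_ind, exp_ind, p, q):
    """Toy linear model: mean + components.T @ w, with w assembled from p and q.

    mean:       (n_features,) mean shape vector
    components: (n_components, n_features) basis, one component per row
    """
    w = np.zeros(components.shape[0])
    w[id_ind] = p   # identity weights: shared by every frame
    w[exp_ind] = q  # expression weights: specific to one frame
    return mean + components.T @ w
```

With all-zero `p` and `q`, as in the initialization below, the instance is simply the mean shape.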

In [None]:
# Load some frames and prepare them for fitting.
# Note that we have to rescale the images and extract the features used by the
# model ourselves. Unlike previous menpo fitting routines, fit_video is a simpler
# implementation: it requires us to do more explicitly before calling it, but the
# code is much easier to follow.
lim_frames = 100

frame_ids = LazyList.init_from_iterable(
    [p.stem for p in mio.image_paths('video_dir/')][:lim_frames])
frames = frame_ids.map(lambda fid: mio.import_image('video_dir/{}.png'.format(fid)))

transform_info = [prepare_image_and_return_transforms(diagonal_range, feature_function, i) for i in 
                  print_progress(frames, prefix='processing images')]
images = [x.pop('image') for x in transform_info]

n_images = len(images)
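The frame order above comes from the file names. If your frames are numbered without zero padding (`1.png`, `2.png`, ..., `10.png`) and the listing is sorted as plain strings, `10` would come before `2`, which would scramble the temporal order the video fit relies on. A hypothetical `natural_key` helper that sorts digit runs numerically:

```python
import re


def natural_key(stem):
    """Split a file stem into text and integer chunks so that '10' sorts after '2'.

    re.split with a capturing group always puts text at even positions and
    digit runs at odd positions, so keys never compare a str against an int.
    """
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", stem)]
```

It could be used as `sorted((p.stem for p in mio.image_paths('video_dir/')), key=natural_key)[:lim_frames]` before building `frame_ids`. Zero-padded file names make this unnecessary.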

## Initialize parameters

In [None]:
%matplotlib inline
# initialize the shape weights to zero (mean)
p = np.zeros(n_p)
qs = np.zeros([n_images, n_q])

# initialize all cameras with a very large focal length (approximately orthographic)
cameras = [initialize_camera_from_params(img, mm, id_ind, exp_ind, p, q, focal_length=99999999) 
           for img, q in zip(images, qs)]
cs = np.vstack([camera.as_vector() for camera in cameras])
template_camera = cameras[0]

# Check the initialization looks sensible (for first and last frame)
render_initialization(images, mm, id_ind, exp_ind, template_camera, p, qs, cs, 0).view()
render_initialization(images, mm, id_ind, exp_ind, template_camera, p, qs, cs, -1).view(new_figure=True)
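Why a huge focal length approximates an orthographic camera: under a pinhole model a point projects to f·X / (Z + d), where d is the distance from the camera to the face. If the face is placed at d = f / s, this becomes s·X / (1 + s·Z / f), which tends to s·X as f grows, i.e. scaled orthographic (weak-perspective) projection. The sketch checks this numerically. It does not reproduce the `itwmm` camera, and the placement d = f / s is an assumption made for illustration.

```python
import numpy as np


def perspective_project(points_3d, focal_length, distance):
    """Pinhole projection of (N, 3) points shifted `distance` along +z."""
    X, Y, Z = points_3d.T
    depth = Z + distance
    return focal_length * np.stack([X / depth, Y / depth], axis=1)


def weak_perspective_project(points_3d, scale):
    """Orthographic projection (drop z) followed by a uniform scale."""
    return scale * points_3d[:, :2]
```

With `focal_length=99999999`, as above, the depth-dependent correction s·Z / f is negligible for face-sized depth variations.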

## ITW Video fitting

In [None]:
# Actually run the optimisation.
# Return is a list of parameters recovered per-iteration per-frame.
#
# E.g. to access the 3rd frame's parameters at the 6th iteration:
#   params[5][2]  # (lists are 0-based in Python)
#
params = fit_video(images, mm, id_ind, exp_ind, template_camera, 
                   p, qs, cs,
                   lm_group='PTS', n_iters=10, 
                   c_f=3.,
                   c_id=1. * n_images,
                   c_exp=3.,
                   c_l=1.,
                   c_sm=6.,
                   n_samples=1000, compute_costs=True)
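The weights passed to `fit_video` are not documented in this notebook. Judging only by their names, `c_id` and `c_exp` presumably weight priors on the identity and expression parameters, `c_l` a landmark term, `c_f` a feature (texture) term and `c_sm` a temporal smoothness term; these readings are **assumptions**. Scaling `c_id` by `n_images` is consistent with a single identity vector `p` being shared by all frames while every frame adds its own data terms. The toy sketch below shows one plausible form for the regularisers: squared-norm priors (assuming already whitened parameters) and a first-order difference penalty on consecutive expression vectors. It is not the `itwmm` cost function. `smoothness_penalty`, `prior_penalty` and `toy_regularisers` are hypothetical.

```python
import numpy as np


def smoothness_penalty(qs):
    """Sum of squared differences between consecutive frames' expression weights."""
    diffs = np.diff(np.asarray(qs, dtype=float), axis=0)  # (n_images - 1, n_q)
    return float(np.sum(diffs ** 2))


def prior_penalty(params):
    """Squared-norm prior, assuming the parameters are whitened (unit variance)."""
    return float(np.sum(np.asarray(params, dtype=float) ** 2))


def toy_regularisers(p, qs, c_id, c_exp, c_sm):
    """Weighted sum of identity prior, expression prior and temporal smoothness."""
    return c_id * prior_penalty(p) + c_exp * prior_penalty(qs) + c_sm * smoothness_penalty(qs)
```

Under this reading, a larger `c_sm` trades per-frame accuracy for smoother expression trajectories.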

## Inspect results

In [None]:
# Now we render the fitting result for one frame, next to the original frame.
frame_no = 5
iter_no = -1
frames[frame_no].view()
# use the shape of the same processed frame that the parameters were fit on
render_iteration(mm, id_ind, exp_ind, images[frame_no].shape, 
                 template_camera, params, frame_no, iter_no).view(new_figure=True)
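Beyond visual inspection, one way to quantify a fit per frame is to compare the model's projected landmarks with the input landmarks, normalising by the size of the face so that errors are comparable across frames. How to obtain the projected landmarks from `params` via `itwmm` is not shown in this notebook, so the hypothetical `normalised_landmark_error` below simply takes two `(68, 2)` arrays in the same coordinate frame. Normalising by the ground-truth bounding-box diagonal is one common choice, not necessarily the one used by the original authors.

```python
import numpy as np


def normalised_landmark_error(predicted, ground_truth):
    """Mean point-to-point distance divided by the ground-truth bounding-box diagonal."""
    pred = np.asarray(predicted, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    per_point = np.linalg.norm(pred - gt, axis=1)  # Euclidean error per landmark
    extent = gt.max(axis=0) - gt.min(axis=0)       # bounding-box side lengths
    return float(per_point.mean() / np.linalg.norm(extent))
```

Plotting this value against the frame index is a quick way to spot frames where the fit drifts.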