<p style="text-align: center;font-size: 40pt">Error minimization</p>

# Overview 

Requirements
- [Processing point clouds](2-lesson_processing.ipynb)
- [Outliers](3-lesson_outliers.ipynb)

Objectives of this lesson:
- explain the goal of error minimization
- give the mathematical developments of the minimization

Hidden custom latex commands here $ \curvearrowright$

----
[comment]: <> (General commands)
$\DeclareMathOperator*{\argmin}{arg\,min}$
$\DeclareMathOperator{\error}{error}$
$\DeclareMathOperator*{\match}{match}$
$\DeclareMathOperator{\distance}{d}$
$\DeclareMathOperator{\outlier}{outlier}$
$\DeclareMathOperator{\weight}{w}$
$\DeclareMathOperator{\datafilter}{datafilter}$

$\newcommand{\mat}[1]{\mathbf{#1}}$
$\newcommand{\point}[2][]{{}^{#1}\mathbf{#2}}$
$\newcommand{\frame}[1]{\mathcal{#1}}$
$\newcommand{\shape}[2][]{{}^{#1}\mathcal{#2}}$
$\newcommand{\matches}[1]{\mathcal{#1}}$
$\newcommand{\transformation}[3][T]{{}_{#2}^{#3}\mat{#1}}$
$\newcommand{\weights}[1]{\mathcal{#1}}$
$\newcommand{\textcomma}{\quad,}$
$\newcommand{\textdot}{\quad.}$
$\newcommand{\bmat}[1]{\begin{bmatrix}#1\end{bmatrix}}$
----

# Introduction
The aim of <tt>error minimization</tt> is to solve [Equation 1.2](1-lesson_overview.ipynb#error_minimization) from the overview lesson, which is

$$
    \transformation{i}{i+1} \gets \argmin\limits_{\transformation{}{}}\left(\error\left(\transformation{}{}\left(\shape[i]{P'}\right), \mathcal{Q}'\right)\right)
    \textdot
$$

This step relies on the definition of an error metric calculated from the association of features and needs to be resolved using an error model.
The error model can sometimes be the same as the distance metric used at the matching stage, but the main difference is that the error is defined only in the feature space, not in the descriptor space.
This is because only features are influenced by transformation parameters, as listed in [Table 2.4](2-lesson_processing.ipynb#influenceFunc).
So, if the association is based on descriptor distances, another error must be defined to correct the misalignment.
Parameters selected for the minimization should follow an expected deformation model. 
[Zitová and Flusser [2003]](https://www.sciencedirect.com/science/article/abs/pii/S0262885603001379) present two generic types to classify error metrics: global (e.g., rigid, affine transform, perspective projection model) and local (e.g., radial basis functions, elastic registration, fluid registration, diffusion-based, level sets, optical-flow-based registration).

# Shape Morphing
Most of the data association algorithms based on point clouds use a global-rigid error. 
This error metric is parametrized by three translation and three rotation parameters, for a total of six Degrees of Freedom (DoF), when dealing with 3D point clouds.
Recall that there are only three DoF for point clouds in two dimensions.
Point-to-point error uses the most basic primitive and was first introduced in a registration context by [Besl and McKay [1992]](https://ieeexplore.ieee.org/document/121791) and used subsequently in multiple solutions [\[Godin et al., 1994](https://www.spiedigitallibrary.org/conference-proceedings-of-spie/2350/1/Three-dimensional-registration-using-range-and-intensity-information/10.1117/12.189139.short), [Pulli, 1999](https://ieeexplore.ieee.org/document/805346), [Druon et al., 2006](https://ieeexplore.ieee.org/document/4097937), [Pan et al., 2010](https://ieeexplore.ieee.org/document/5476132), [Kim, 2010\]](https://ieeexplore.ieee.org/document/5373827).
During the matching step, it might happen that different kinds of geometric primitives (e.g., point, line, curve, plane, quadric) are matched together.
Multiple error metrics were developed for those situations and we want to bring them under the same concept that we introduce as *Shape Morphing*.
Essentially, when a primitive with higher dimensionality is matched with a lower one, it is morphed via projective geometry to adapt to its counterpart.
[Figure 2.6](#shapeMorph) presents the list of possible combinations for a 2D space and illustrates the concept for different errors. Using the subfigure labeled point-to-line as an example, a point in solid red matches a line in dashed blue.
To generate an alignment error, a virtual point (i.e., the empty blue circle) is generated by projection. 
The same principle applies to point-to-curve and line-to-curve.
Although not depicted in [Figure 2.6](#shapeMorph), their 3D counterparts (i.e., points, planes, quadrics) follow the same projection principle.
<p id="shapeMorph" style="text-align: center;">
    <img src="images/shape_morphing.png" width="50%"/> <br/>
    <b>Figure 2.6:</b> Possible morphing in 2D.
    The real underlying shape is represented in light gray with its approximation in dark blue. 
    The misaligned surface is represented by a point in light red.
    The resulting errors are represented with black arrows.
</p>
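
The point-to-line case can be sketched numerically. The snippet below is a minimal illustration, assuming a line described by one point on it and a direction vector; the function name `project_point_on_line` and the sample values are hypothetical and not part of the lesson material.

```python
import numpy as np

def project_point_on_line(p, a, direction):
    """Project point p onto the line passing through a with the given direction."""
    u = direction / np.linalg.norm(direction)   # unit direction of the line
    return a + np.dot(p - a, u) * u             # virtual point lying on the line

p = np.array([2.0, 3.0])    # misaligned point
a = np.array([0.0, 0.0])    # any point on the matched line
d = np.array([1.0, 0.0])    # line direction (here, the x axis)

virtual = project_point_on_line(p, a, d)
error_vector = virtual - p  # arrow from the point to its virtual counterpart
print(virtual, error_vector)
```

The virtual point plays the role of the empty circle in the figure, and `error_vector` corresponds to the arrow used as alignment error.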

The most represented example is the point-to-plane introduced by [Chen and Medioni [1992]](https://www.sciencedirect.com/science/article/abs/pii/026288569290066C) and then reused in multiple works [\[Champleboux et al., 1992](https://ieeexplore.ieee.org/document/223223), [Gagnon et al., 1994](https://ieeexplore.ieee.org/document/323796), [Bergevin et al., 1996](https://ieeexplore.ieee.org/document/494643), [Gelfand et al., 2003\]](https://ieeexplore.ieee.org/document/1240258).
Its 2D version, point-to-line, is also used in robotics [[Bosse and Zlot, 2009a]](https://www.sciencedirect.com/science/article/abs/pii/S0921889009000992) and a closed-form solution was presented by [Censi [2008]](https://ieeexplore.ieee.org/document/4543181).
Using higher complexity to represent 3D primitives, [Segal et al. [2009]](http://www.roboticsproceedings.org/rss05/p21.html) propose the use of plane-to-plane, while early work of [Feldmar and Ayache [1996]](https://link.springer.com/article/10.1007/BF00054998) uses quadric-to-quadric.

It is also possible to find extensions to those error metrics: point-to-point with extrapolation and damping [[Zinsser et al., 2003]](https://ieeexplore.ieee.org/document/1246775), a mix of point-to-line with odometry error [[Diebel et al., 2004]](https://ieeexplore.ieee.org/document/1389948), a mix of point-to-point, point-to-line or point-to-plane with angle [[Armesto et al., 2010]](https://ieeexplore.ieee.org/document/5509371), and a mix of point-to-point with Boltzmann-Gibbs-Shannon and Burg entropies [[Liu, 2010]](https://ieeexplore.ieee.org/document/5291420).
Entropy-based methods used in medical registration were reviewed by [Pluim et al. [2003]](https://ieeexplore.ieee.org/document/1216223), who list the Shannon, Rodriguez and Loew, Jumarie, and Rényi entropies.
All those techniques rely on mean squared error.

More recently, [Silva et al. [2005]](https://ieeexplore.ieee.org/document/1407879) introduced an error metric called the Surface Interpenetration Measure (SIM), which is more robust to different noise types.
This measure was later applied by [Pan et al. [2010]](https://ieeexplore.ieee.org/document/5476132) for face recognition.
Image registrations mainly use affine transformations including skew and scale deformations like in [[Lowe, 2004]](https://link.springer.com/article/10.1023%2FB%3AVISI.0000029664.99615.94). 
A more complex hierarchy of error models, presented by [Stewart et al. [2003]](https://ieeexplore.ieee.org/document/1242341), increases the transformation parameter complexity from similarity to affine, reduced quadratic and finally quadratic.
Those error models allow them to achieve higher precision on the final alignment, while avoiding heavy computation at the beginning of the minimization.

# Optimization
Once the error model is defined, the problem is to select a strategy or scheme to find the transformation with the minimum error.
Different optimization strategies are reviewed and discussed by [Rusinkiewicz and Levoy [2001]](https://ieeexplore.ieee.org/document/924423).
The authors mention the possible use of Singular Value Decomposition (SVD) [[Arun et al., 1987]](https://ieeexplore.ieee.org/document/4767965), quaternions [[Horn, 1987]](http://josaa.osa.org/abstract.cfm?URI=josaa-4-4-629), orthonormal matrices [[Horn et al., 1988]](https://www.osapublishing.org/josaa/abstract.cfm?uri=josaa-5-7-1127), and dual quaternions [[Walker et al., 1991]](https://www.sciencedirect.com/science/article/abs/pii/104996609190036O) for the point-to-point objective function.
It is noted that the results provided by those solutions are quite similar when the association between points is unknown [[Eggert et al., 1997]](https://link.springer.com/article/10.1007/s001380050048).
This is why these optimization solutions are only briefly listed in this review.
In the case of the point-to-plane error, linearization based on small angle approximation is mainly used following its original implementation [[Chen and Medioni, 1991]](https://ieeexplore.ieee.org/document/132043).
Other objective functions for point cloud alignment rely on histogram correlation [[Bosse and Zlot, 2008]](https://journals.sagepub.com/doi/abs/10.1177/0278364908091366), tensor voting [[Reyes et al., 2007]](https://link.springer.com/article/10.1007/s11263-007-0038-z), or Hough transform [\[Lowe, 2004](https://link.springer.com/article/10.1023%2FB%3AVISI.0000029664.99615.94), [Censi, 2006\]](https://ieeexplore.ieee.org/document/1642044).

## Example: minimizing point-to-point error
In the case of the point-to-point error, the error is the Euclidean distance:

\begin{aligned}
\error(\shape{P}, \shape{Q}) 
&= \sum_{(\point{p}, \point{q})\in\matches{M}'}{\|\point{p}-\point{q}\|_2}\\
&= \sum_{k=1}^K \left\| \point{p}_k - \point{q}_k \right\|_2
\textcomma
\end{aligned}

where $K$ is the number of tuples in $\matches{M}'$.
With some abuse of the notation for homogeneous and cartesian coordinates, the error minimization is then
\begin{aligned}
\transformation{i}{i+1} 
&= \argmin_{\transformation{}{}}
\left(
    \sum_{k=1}^K \left\| \transformation{}{}\point{p}_k - \point{q}_k \right\|_2
\right)\\
&= \argmin_{\transformation{}{}}\left(\sum_{k=1}^K \left\| \mat{R}\point{p}_k + \mat{t} - \point{q}_k \right\|_2\right)
\textdot
\end{aligned}
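
As a rough illustration of the objective above, the sketch below evaluates the point-to-point error for a candidate rotation and translation. It assumes points are stored as columns of a NumPy array and that the matches are already ordered; the function name `point_to_point_error` and the toy data are hypothetical.

```python
import numpy as np

def point_to_point_error(P, Q, R, t):
    """Sum of Euclidean distances between R p_k + t and q_k (points stored as columns)."""
    residuals = R @ P + t[:, None] - Q
    return np.sum(np.linalg.norm(residuals, axis=0))

# Toy 2D example: Q is P shifted by a known translation
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])     # 2 x K, with K = 3
t_true = np.array([1.0, 2.0])
Q = P + t_true[:, None]

err_identity = point_to_point_error(P, Q, np.eye(2), np.zeros(2))  # no correction applied
err_aligned = point_to_point_error(P, Q, np.eye(2), t_true)        # correct translation
print(err_identity, err_aligned)
```

The error drops to zero once the candidate transformation matches the one that generated the data, which is what the minimization searches for.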

In its least-squares form (i.e., minimizing the sum of squared distances), this minimization problem can be solved analytically by computing the centroids (i.e., the average of coordinates) of the point clouds, and the singular value decomposition of the cross-covariance [[Arun et al., 1987]](https://ieeexplore.ieee.org/document/4767965).
More precisely, let us compute the centroids $\mat{\mu}_p$ and $\mat{\mu}_q$ of each point cloud using

$$
\mat{\mu}_p = \frac{1}{K} \sum_{k=1}^K \point{p}_k
$$ 
and 
$$
\mat{\mu}_q = \frac{1}{K} \sum_{k=1}^K \point{q}_k
\textdot
$$

The cross-covariance can then be computed using

$$
\mat{H} = \sum_{k=1}^K (\point{p}_k - \mat{\mu}_p)(\point{q}_k - \mat{\mu}_q)^\top
\textdot
$$

The singular value decomposition (SVD) of the matrix $\mat{H}$ can be used to express the same matrix as a multiplication of three others, such that

$$
\mat{H} = \text{svd}\left(\mat{H}\right) = \mat{U}\mat{\Lambda}\mat{V}^\top
$$

It was demonstrated by [[Arun et al., 1987]](https://ieeexplore.ieee.org/document/4767965) that the optimal transformation can be computed using

$$
    \left\{
        \begin{array}{rl}
            \hspace{-54pt} \mbox{$\hat{\mat{R}} = \mat{V}\mat{U}^\top$} \\
          	\hspace{5pt}\mbox{$\hat{\mat{t}} = \mat{\mu}_q - \hat{\mat{R}} \mat{\mu}_p$.}
        \end{array}
    \right.
$$
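
The steps above (centroids, cross-covariance, SVD, then rotation and translation) can be sketched with NumPy as follows. This is a minimal sketch, not a reference implementation: the function name `rigid_transform_svd` is hypothetical, points are assumed to be stored as columns with matches already ordered, and the determinant check is an added safeguard against reflections that is not described in the text above.

```python
import numpy as np

def rigid_transform_svd(P, Q):
    """Estimate R and t such that R p_k + t is close to q_k (P, Q are D x K arrays)."""
    mu_p = P.mean(axis=1, keepdims=True)      # centroid of the reading
    mu_q = Q.mean(axis=1, keepdims=True)      # centroid of the reference
    H = (P - mu_p) @ (Q - mu_q).T             # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)               # H = U diag(s) V^T
    R = Vt.T @ U.T
    # Assumption (not in the lesson text): flip the last singular direction
    # if the result is a reflection (determinant of -1) instead of a rotation.
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_q - R @ mu_p
    return R, t.ravel()

# Synthetic check: rotate and translate a random cloud, then recover the motion
rng = np.random.default_rng(0)
P = rng.normal(size=(3, 20))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
Q = R_true @ P + t_true[:, None]

R_hat, t_hat = rigid_transform_svd(P, Q)
print(np.allclose(R_hat, R_true), np.allclose(t_hat, t_true))
```

With noise-free, correctly matched points, the recovered transformation should equal the one used to generate the data up to floating-point precision.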

## Example: minimizing point-to-plane error

Another error often used is the point-to-plane error, which is simply the distance between a point and the plane defined by another point and the normal associated with it, such that

$$
     \error(\shape{P}, \shape{Q}) = \sum\limits_{k=1}^K \left\| (\point{p}_k - \point{q}_k) \cdot \mat{n}_k \right\|_2
     \textcomma
$$

where $\mat{n}_k$ is the normal vector of the surface at the 3D point $\point{q}_k$ in <tt>reference</tt>.
The usual method relies on the linearization of the rotation matrix $\mat{R}$ using small angles leading to

$$
    \mat{R} = R(\alpha, \beta, \gamma) \approx 
    \left[ \begin{array}{ccc} 
    1 & -\gamma & \beta \\
    \gamma & 1 & -\alpha \\
    -\beta & \alpha & 1
    \end{array} \right] 
    = [\mat{r}]_\times + \mat{I}
    \textdot
$$
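
To get a feel for how good this approximation is, the sketch below compares an exact rotation matrix with its linearized version for small and large angles. The helper names `skew` and `rotation_xyz` are hypothetical, and the exact rotation assumes a Z-Y-X composition of elementary rotations, which is only one possible convention (all conventions agree to first order).

```python
import numpy as np

def skew(r):
    """Cross-product (skew-symmetric) matrix of a 3D vector r = (alpha, beta, gamma)."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def rotation_xyz(alpha, beta, gamma):
    """Exact rotation, assuming R = Rz(gamma) Ry(beta) Rx(alpha)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rz = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

r_small = np.array([0.01, -0.02, 0.015])   # radians
r_large = np.array([0.5, -0.4, 0.3])       # radians

# Largest element-wise difference between exact and linearized rotations
err_small = np.abs(rotation_xyz(*r_small) - (np.eye(3) + skew(r_small))).max()
err_large = np.abs(rotation_xyz(*r_large) - (np.eye(3) + skew(r_large))).max()
print(err_small, err_large)
```

The discrepancy grows roughly with the square of the angles, which is why the linearization is only trusted for small corrections.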

The full transformation is parametrized by six degrees of freedom, so our optimization vector becomes

$$
    \transformation{}{} = \mat{\tau} = 
    \left[ \begin{array}{c}
    \mat{r} \\
    \mat{t}
    \end{array} \right]  =
    \left[ \begin{array}{c}
    \alpha \\
    \beta \\
    \gamma \\
    t_x \\
    t_y \\
    t_z
    \end{array} \right]
    \textdot
$$

Under these assumptions, the optimal parameters can be obtained by solving the following linear system (see [next section](#Derivation-for-Point-to-Plane-Error) for more details):
<a id="minPointToPlane"></a>

$$
     \hspace{140pt} \mat{G}\mat{G}^\top\mat{\tau} = \mat{G}\mat{h} \hspace{140pt} \mathbf{(2.1)}
$$

where

$$
    \mat{G}= \bmat{\cdots & \begin{matrix} \point{p}_k\times\mat{n}_k \\ \mat{n}_k \end{matrix}& \cdots}
$$

is a $6 \times K$ matrix and 

$$
    \mat{h}=\bmat{\vdots \\ (\point{q}_k-\point{p}_k)\cdot\mat{n}_k \\ \vdots}
$$

is a column vector of $K$ elements.
The linear system of [Equation 2.1](#minPointToPlane) has the classical form $\mat{A}\mat{x} = \mat{b}$, which can be solved for $\mat{x}$ (here, $\mat{\tau}$) using the Cholesky decomposition.
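
The construction of $\mat{G}$ and $\mat{h}$ and the Cholesky-based solve could look like the sketch below. The names `skew` and `solve_point_to_plane` are hypothetical, points and normals are assumed to be stored as columns of $3 \times K$ arrays, and the synthetic data are generated with the linearized rotation so that the true parameters can be recovered exactly.

```python
import numpy as np

def skew(r):
    """Cross-product (skew-symmetric) matrix of a 3D vector."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def solve_point_to_plane(P, Q, N):
    """Solve G G^T tau = G h for tau = [alpha, beta, gamma, t_x, t_y, t_z]."""
    G = np.vstack((np.cross(P, N, axis=0), N))   # 6 x K, columns [p_k x n_k; n_k]
    h = np.sum((Q - P) * N, axis=0)              # K values (q_k - p_k) . n_k
    A = G @ G.T
    b = G @ h
    L = np.linalg.cholesky(A)                    # A = L L^T, requires A positive definite
    y = np.linalg.solve(L, b)                    # solve L y = b
    return np.linalg.solve(L.T, y)               # solve L^T tau = y

rng = np.random.default_rng(0)
K = 50
P = rng.normal(size=(3, K))
N = rng.normal(size=(3, K))
N /= np.linalg.norm(N, axis=0)                   # unit normals
tau_true = np.array([0.01, -0.02, 0.015, 0.1, -0.05, 0.2])
# Assumption: reference points built with the linearized rotation I + [r]x
Q = (np.eye(3) + skew(tau_true[:3])) @ P + tau_true[3:, None]

tau_hat = solve_point_to_plane(P, Q, N)
print(np.allclose(tau_hat, tau_true))
```

The Cholesky factorization fails if $\mat{G}\mat{G}^\top$ is not positive definite, for instance when all normals point in the same direction, so a practical implementation would likely need a fallback for such degenerate geometry.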

### Derivation for Point-to-Plane Error
This section presents a solution for minimizing the point-to-plane error in 3D. We first define our transformation parameter set $\transformation{}{}$ as a 6D vector:

$$
    \transformation{}{} = \mat{\tau} = 
    \left[ \begin{array}{c}
    \mat{r} \\
    \mat{t}
    \end{array} \right]  =
    \left[ \begin{array}{c}
    \alpha \\
    \beta \\
    \gamma \\
    t_x \\
    t_y \\
    t_z
    \end{array} \right]
    \textcomma
$$

where $\alpha$, $\beta$ and $\gamma$ are the rotational components, while $t_x$, $t_y$ and $t_z$ are the translation components.
We also define the objective function for point-to-plane:

<a id="p2plane"></a>
$$
    \hspace{140pt} e_\mathrm{p\Phi} = \sum\limits_{k=1}^K \left\| \left[(\mat{R}\point{p}_k + \mat{t})- \point{q}_k\right] \cdot \mat{n}_k \right\|_2 , \hspace{140pt} \mathbf{(A.2)}
$$

where $\mat{n}_k$ is the normal vector representing the surface at the point $\point{q}_k$ and the index $k$ represents paired points.

The method presented here relies on rotation matrix linearization.
This linearization can be achieved using the small-angle approximation:

<a id="rotLin"></a>
$$
    \hspace{122pt} \mat{R} = R(\alpha, \beta, \gamma) \approx 
    \left[ \begin{array}{ccc} 
    1 & -\gamma & \beta \\
    \gamma & 1 & -\alpha \\
    -\beta & \alpha & 1
    \end{array} \right] 
    = [\mat{r}]_\times + \mat{I}, \hspace{122pt} \mathbf{(A.3)}
$$

where $[\mat{r}]_\times$ is a cross-product operator transforming the vector $\mat{r}$ to a $3 \times 3$ skew-symmetric matrix.
In the context of ICP, the impact of linearization is reduced through the iterative process of the whole registration algorithm.
Combining [Equation A.2](#p2plane) with [Equation A.3](#rotLin), we can approximate the objective function as

$$
    e_\mathrm{p\Phi} \approx \sum\limits_{k=1}^K \left\| [([\mat{r}]_\times + \mat{I})\point{p}_k + \mat{t} - \point{q}_k] \cdot \mat{n}_k \right\|_2 \\
    \hspace{64pt} \approx \sum\limits_{k=1}^K \left\| (\mat{r} \times \point{p}_k) \cdot \mat{n}_k + \point{p}_k \cdot \mat{n}_k + \mat{t} \cdot \mat{n}_k - \point{q}_k \cdot \mat{n}_k \right\|_2 ,
$$

which can be rewritten using the *scalar triple product* and by reorganizing the terms

$$
    e_\mathrm{p\Phi} \approx \sum\limits_{k=1}^K \left\| \mat{r} \cdot \underbrace{(\point{p}_k \times \mat{n}_k)}_{\mat{c}_k} + \mat{t} \cdot \mat{n}_k - \underbrace{(\point{q}_k - \point{p}_k)}_{\mat{d}_k} \cdot \mat{n}_k  \right\|_2 \\
    \hspace{-37pt} \approx \sum\limits_{k=1}^K \left\| \mat{r} \cdot \mat{c}_k + \mat{t} \cdot \mat{n}_k - \mat{d}_k \cdot \mat{n}_k  \right\|_2 ,
$$
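
For reference, the rewriting above relies on the cyclic property of the scalar triple product, applied here to the rotation term (a standard vector identity, stated with the symbols of this section):

$$
    (\mat{r} \times \point{p}_k) \cdot \mat{n}_k = \mat{r} \cdot (\point{p}_k \times \mat{n}_k) = \mat{r} \cdot \mat{c}_k
    \textdot
$$

The remaining terms are grouped as $\point{p}_k \cdot \mat{n}_k - \point{q}_k \cdot \mat{n}_k = -\mat{d}_k \cdot \mat{n}_k$, with $\mat{d}_k = \point{q}_k - \point{p}_k$.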

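We can then minimize the error $e_\mathrm{p\Phi}$ with respect to $\mat{r}$ and $\mat{t}$ by setting the partial derivatives to zero. The factor of 2 comes from differentiating the squared residuals, i.e., the least-squares form of the objective: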

$$
    \frac{\partial e_\mathrm{p\Phi}}{\partial \mat{r}} = \sum\limits_{k=1}^K 2 \mat{c}_k (\mat{r} \cdot \mat{c}_k + \mat{t} \cdot \mat{n}_k - \mat{d}_k \cdot \mat{n}_k) = \mat{0} \\
    \frac{\partial e_\mathrm{p\Phi}}{\partial \mat{t}} = \sum\limits_{k=1}^K 2 \mat{n}_k (\mat{r} \cdot \mat{c}_k + \mat{t} \cdot \mat{n}_k - \mat{d}_k \cdot \mat{n}_k) = \mat{0}
$$

We can assemble those derivatives under the linear form $\mat{A}\mat{\tau}=\mat{b}$, by bringing the terms that do not depend on $\mat{\tau}$ to the right-hand side of the equation

$$
    \sum\limits_{k=1}^K
    \left[ \begin{array}{cc}
    \mat{c}_k (\mat{r} \cdot \mat{c}_k) + \mat{c}_k (\mat{t} \cdot \mat{n}_k)   \\
    \mat{n}_k (\mat{r} \cdot \mat{c}_k) + \mat{n}_k (\mat{t} \cdot \mat{n}_k)  
    \end{array} \right]
    =
    \sum\limits_{k=1}^K 
    \left[ \begin{array}{cc}
    \mat{c}_k (\mat{d}_k \cdot \mat{n}_k)   \\
    \mat{n}_k (\mat{d}_k \cdot \mat{n}_k)
    \end{array} \right]
    \\
    \sum\limits_{k=1}^K 
    \left[ \begin{array}{cc}
    \mat{c}_k \mat{c}_k^\top \mat{r} + \mat{c}_k \mat{n}_k^\top \mat{t}    \\
    \mat{n}_k \mat{c}_k^\top \mat{r} + \mat{n}_k \mat{n}_k^\top \mat{t}  
    \end{array} \right]
    =
    \sum\limits_{k=1}^K 
    \left[ \begin{array}{cc}
    \mat{c}_k (\mat{d}_k \cdot \mat{n}_k)   \\
    \mat{n}_k (\mat{d}_k \cdot \mat{n}_k)
    \end{array} \right]
    \\
    \sum\limits_{k=1}^K 
    \left[ \begin{array}{cc}
    \mat{c}_k \mat{c}_k^\top  & \mat{c}_k \mat{n}_k^\top    \\
    \mat{n}_k \mat{c}_k^\top  & \mat{n}_k \mat{n}_k^\top   
    \end{array} \right]
    \left[ \begin{array}{cc}
    \mat{r} \\
    \mat{t}
    \end{array} \right]
    =
    \sum\limits_{k=1}^K 
    \left[ \begin{array}{cc}
    \mat{c}_k \\
    \mat{n}_k 
    \end{array} \right] (\mat{d}_k \cdot \mat{n}_k)
$$

which brings us to the linear system of equations that we were looking for

<a id="minPointToPlaneBis"></a>
$$
    \hspace{128pt} \underbrace{
    \sum\limits_{k=1}^K 
    \left[ \begin{array}{cc}
    \mat{c}_k   \\
    \mat{n}_k  
    \end{array} \right]
    \left[ \begin{array}{cc}
    \mat{c}_k^\top  &  \mat{n}_k^\top \\
    \end{array} \right]
    }_{\mat{A}_{6 \times 6}}
    \mat{\tau}
     = 
    \underbrace{
    \sum\limits_{k=1}^K
    \left[ \begin{array}{c}
    \mat{c}_k \\
    \mat{n}_k \\
    \end{array} \right] 
    (\mat{d}_k \cdot \mat{n}_k)
    }_{\mat{b}_{6 \times 1}} \hspace{128pt} \mathbf{(A.4)}
$$

Once the matrix $\mat{A}$ and the vector $\mat{b}$ are constructed, the linear system of [Equation A.4](#minPointToPlaneBis) can be solved for $\mat{\tau}$ using the Cholesky decomposition.
Implementing such a solution requires a loop for the summations over $K$ to build $\mat{A}$ and $\mat{b}$.
An alternative formulation relying on dense matrix multiplication can be computed by assembling

$$
    \mat{G}=
    \underbrace{
    \left[\cdots\begin{array}{c}\point{p}_k\times\mat{n}_k\\\mat{n}_k\end{array}\cdots\right]
    }_{6 \times K}
$$
and
$$
    \mat{h}=
    \underbrace{
    \left[\begin{array}{c}\vdots\\(\point{q}_k-\point{p}_k)\cdot\mat{n}_k\\\vdots\end{array}\right]
    }_{K \times 1}
$$

leading to 

$$
    \mat{A}\mat{\tau} = \mat{b} 
    \\
    \hspace{5pt} \Updownarrow 
    \\
    \mat{G}\mat{G}^\top\mat{\tau} = \mat{G}\mat{h},
$$

which is the same formulation as [Equation 2.1](#minPointToPlane) in the previous section.
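
The equivalence between the summation form and the dense form can be checked numerically. The sketch below builds $\mat{A}$ and $\mat{b}$ both ways on random data; the variable names and the synthetic points are illustrative assumptions, with points and normals stored as columns.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 30
P = rng.normal(size=(3, K))                      # reading points
Q = P + 0.01 * rng.normal(size=(3, K))           # matched reference points
N = rng.normal(size=(3, K))
N /= np.linalg.norm(N, axis=0)                   # unit normals

# Summation form: accumulate A and b one match at a time
A_loop = np.zeros((6, 6))
b_loop = np.zeros(6)
for k in range(K):
    c_k = np.cross(P[:, k], N[:, k])             # c_k = p_k x n_k
    g_k = np.concatenate((c_k, N[:, k]))         # stacked vector [c_k; n_k]
    d_k = Q[:, k] - P[:, k]                      # d_k = q_k - p_k
    A_loop += np.outer(g_k, g_k)
    b_loop += g_k * np.dot(d_k, N[:, k])

# Dense form: assemble G (6 x K) and h (K values), then multiply
G = np.vstack((np.cross(P, N, axis=0), N))
h = np.sum((Q - P) * N, axis=0)
A_dense = G @ G.T
b_dense = G @ h

print(np.allclose(A_loop, A_dense), np.allclose(b_loop, b_dense))
```

Both constructions give the same system; the dense form mainly avoids an explicit Python loop, which tends to matter for large point clouds.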

# Conclusion
You should do the following activities to enhance your understanding of the concepts covered in this lesson:
- modify the markdown by adding your own notes using `> my notes`; and
- complete the tables [Symbol definitions](#Symbol-definitions) and [Glossary](#Glossary) and add your own definitions.

Next steps:
- Do the [assignment](../../exercises/registration/1-assignment_implementation.ipynb) related to this lesson
- Continue the lessons

Next lessons (parallel):
- [Use case 1](5-lesson_use_case_1.ipynb)
- [Use case 2](5-lesson_use_case_2.ipynb)
- [Use case 3](5-lesson_use_case_3.ipynb)

## Symbol definitions

| &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Symbol &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Definition                    |
|:--------------------------------:|:-----------------------------------------------------:|
| $\error(\shape{P}, \shape{Q})$   |  Alignment error between $\shape{P}$ and $\shape{Q}$  |
| $\transformation{}{}(\shape{S})$ | Application of the transformation $\transformation{}{}$ to $\shape{S}$ |

## Glossary

| English   | Français   | Definition |
|-----------|------------|------------|
|           |            |            |