
\section{Error Minimization} \label{sec:errorMin}
%=====================================================================

The aim of \texttt{error minimization} is to solve \autoref{eq:error_minimization}:
\begin{equation*}
    \fromTo*{\mgen{T}}{i}{i+1} \gets \argmin\limits_{\mgen{T}}\left(\error\left(\mgen{T}\left(\inframe*{\mgen{P'}}{i}\right), \mgen{Q}'\right)\right).
\end{equation*}
This step relies on an error metric computed from the associated features, which is then minimized according to an error model.
The error model can sometimes be the same as the distance metric used at the matching stage; the main difference is that the error is defined only in the feature space and not in the descriptor space.
This is because only features are influenced by transformation parameters, as listed in \autoref{tab:influenceFunc}.
So, if the association is based on descriptor distances, another error must be defined to correct the misalignment.
Parameters selected for the minimization should follow an expected deformation model. 
\citet{Zitova:2003hq} present two generic types to classify error metrics: global (rigid, affine transform, perspective projection model) and local (radial basis functions, elastic registration, fluid registration, diffusion-based, level sets, optical-flow-based registration).


\subsection{Shape Morphing}

Most data association algorithms based on point clouds use a global rigid error.
This error metric is parametrized by 3 translation and 3 rotation parameters, for a total of 6 \acx{dof} when dealing with 3D point clouds (3 \acx{dof} in 2D).
Point-to-point error uses the most basic primitive and was first introduced in a registration context by \citet{Besl:1992iv} and used subsequently in multiple solutions \citep{Godin:1994uh,Pulli:1999hya,Druon:2006im,YePan:2010ip,Kim:2010fe}.
During the matching step, different kinds of geometric primitives (e.g., point, line, curve, plane, quadric) may be matched together.
Multiple error metrics were developed for those situations and we want to bring them under the same concept that we introduce as \emph{Shape Morphing}.
Essentially, when a primitive with higher dimensionality is matched with a lower one, it is morphed via projective geometry to adapt to its counterpart.
\autoref{fig:shapeMorph} presents the list of possible combinations for a 2D space and illustrates the concept for different errors. Using the subfigure labeled point-to-line as an example, a point in solid red matches a line in dashed blue.
To generate an alignment error, a virtual point (i.e., the empty blue circle) is generated by projection. 
The same principle applies to point-to-curve and line-to-curve.
Although not depicted in \autoref{fig:shapeMorph}, their 3D counterparts (i.e., points, planes, quadrics) follow the same projection principle.
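
As a minimal sketch of this projection for the point-to-line case, assume the line is described by a point $\bm{q}$ lying on it and a unit normal $\bm{n}$ (these symbols are introduced here for illustration only and do not appear in \autoref{fig:shapeMorph}).
The virtual point $\bm{q}'$ and the resulting error vector $\bm{e}$ for a point $\bm{p}$ could then be written as:
\begin{equation*}
    \bm{q}' = \bm{p} - \left[(\bm{p} - \bm{q}) \cdot \bm{n}\right]\bm{n},
    \qquad
    \bm{e} = \bm{p} - \bm{q}' = \left[(\bm{p} - \bm{q}) \cdot \bm{n}\right]\bm{n}.
\end{equation*}
Only the component of the misalignment along $\bm{n}$ contributes to the error, so the point remains free to slide along the line.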

\begin{figure}[htb]
\centering
	\includegraphics[width=1.00\textwidth]{shape_morphing}
\caption[Possible morphing in 2D.]
{Possible morphing in 2D.
The real underlying shape is represented in light gray with its approximation in dark blue. 
The misaligned surface is represented by a point in light red.
The resulting errors are represented with black arrows.
}
\label{fig:shapeMorph}
\end{figure}

The most common example is the point-to-plane error introduced by \citet{Cheng:1992dr} and then reused in multiple works \citep{Champleboux:1992dy,Gagnon:1994eh,Bergevin:1996gh,Gelfand:2003fr}.
Its 2D version, point-to-line, is also used in robotics \citep{Bosse:2009kn} and a closed-form solution was presented by \citet{Censi:2008va}.
Using higher complexity to represent 3D primitives, \citet{Segal:2009ws} propose the use of plane-to-plane, while early work of \citet{Feldmar:1996cc} uses quadric-to-quadric.

It is also possible to find extensions to those error metrics: point-to-point with extrapolation and damping \citep{Zinsser:2003tr}, a mix of point-to-line with odometry error \citep{Diebel:2004jg}, a mix of point-to-point, point-to-line or point-to-plane with angle \citep{Armesto:2010ke}, and a mix of point-to-point with Boltzmann-Gibbs-Shannon and Burg entropies \citep{Liu:2010gz}.
Entropy-based methods used in medical registration were reviewed by \citet{Pluim:2003ig}, covering the Shannon, Rodriguez and Loew, Jumarie, and R\'{e}nyi entropies.
All those techniques rely on mean squared error.

More recently, \citet{Silva:2005it} introduced a novel error called \acx{sim}, which is more robust against different noise types. 
This measure was later applied by \citet{YePan:2010ip} for face recognition. 
Image registration methods mainly use affine transformations, including skew and scale deformations, as in \citep{Lowe:2004kp}. 
A more complex hierarchy of error models, presented by \citet{Stewart:2003df}, increases the transformation parameter complexity from similarity to affine, reduced quadratic and finally quadratic.
Those error models allow them to achieve higher precision on the final alignment, while avoiding heavy computation at the beginning of the minimization.

\subsection{Optimization} \label{sec:optimization}
Once the error model is defined, the problem is to select a strategy or scheme to find the transformation with the minimum error.
Different optimization strategies are reviewed and discussed by \citet{Rusinkiewicz:2001ff}.
The authors mention the possible use of Singular Value Decomposition (SVD) \citep{Arun:1987ue}, quaternions \citep{Horn:1987hf}, orthonormal matrices \citep{Horn:1988bq}, and dual quaternions \citep{Walker:1991kt} for the point-to-point objective function.
It is noted that the results provided by those solutions are quite similar when the association between points is unknown \citep{Eggert:1997wo}.
This is why those optimization solutions are only briefly listed in this review.
In the case of the point-to-plane error, linearization based on small angle approximation is mainly used following its original implementation \citep{Chen:1991cd}.
Other objective functions for point cloud alignment rely on histogram correlation \citep{Bosse:2008tl}, tensor voting \citep{Reyes:2007tn}, or Hough transform \citep{Lowe:2004kp,Censi:2006fd}.


%\subsection{Optimization}
% FP: this section was rephrase in the paragraph above.

%
%Once the error model is defined, the problem is to select a strategy or scheme to find the transformation with the minimum error.
%When a closed-form solution is available, direct minimization can be used but it is unfortunately seldom the case.
%% point-to-point has a close-form solution, right? should we cite it?
%Due to the iterative nature of \icp, it is not necessary to find the global minimum: finding a transformation with smaller error should still lead to a better association.
%Therefore, a first strategy is to use simplifying assumptions like the small-angle approximation or linearization, to compute a closed-form approximation of the optimum transformation.
%% examples of those?
%
%Another approach is to use the typical toolbox of optimization and in particular iterative methods.
%These approches are popular for point cloud registration and include the well known \icp \citep{Chen:1991cd,Besl:1992iv}, \acx{ndt} \citep{Biber:2003ud}, \acx{sa} \citep{YePan:2010ip} and \acx{ga} \citep{Silva:2005it}.
%But we can cite also application of Gauss-Newton, or Levenberg-Marquardt. % refs
%
%Finally, voting schemes have also been applied, often for image registration, for example the  Hough transform \citep{Lowe:2004kp} and \acx{ransac} \citep{Fischler:1981vl}.
%Tensor voting was presented in a context of stereo image registrations \cite{Medioni:2000ud} and later applied to point cloud registrations \cite{Reyes:2007tn}.


%Unfortunately, closed solutions are rarely possible, so two other minimization schemes are used: iteration and votes.
%: what about small angle approximation? 
%Iterative schemes seem to be more applied to point cloud registrations. 
%Within this category fall the well known \icp \citep{Chen:1991cd,Besl:1992iv}, \acx{ndt} \citep{Biber:2003ud}, \acx{sa} \citep{YePan:2010ip} and \acx{ga} \citep{Silva:2005it}. 
%Voting schemes are more the standard in image registration with Hough transform \citep{Lowe:2004kp} and \acx{ransac} \citep{Fischler:1981vl}. 
%: RANSAC is also iterative?
%Tensor voting was presented in a context of stereo image registrations \cite{Medioni:2000ud} and later applied to point cloud registrations \cite{Reyes:2007tn}.

%: add that properly

%solver:
%Singular Value Decomposition (SVD) \cite{YePan:2010ip}
%Quaternions \cite{Liu:2010gz}, \cite{Godin:1994uh}
%list from \cite{Rusinkiewicz:2001p4715} specialized for rigid transformation: SVD, quaternions, orthonormal matrices, dual quaternions
%non linear method: Levenberg-Marquardt, Gauss-Newton,  or by linearizing the angles
%histogram correlation \cite{Bosse:2008tl}
%Hough transform \cite{Censi:2006p4324}

%From \cite{Rusinkiewicz:2001ff}:
%"Solution methods based on singular value decomposition \cite{Arun:1987ue}, quaternions \cite{Horn:1987hf}, orthonormal matrices \cite{Horn:1988bq}, and dual quaternions \cite{Walker:1991kt} have been proposed; \cite{Eggert:1997wo} have evaluated the numerical accuracy and stability of each of these, concluding that the differences among them are small."

%------example------
\paragraph{Example 1}
In the case of the point-to-point error, the error is the sum of squared Euclidean distances between matched points:
\begin{eqnarray*}
    \error(\mgen{P}, \mgen{Q}) &=& \sum_{(\bm{p}, \bm{q})\in\mgen{M}'}{\|\bm{p}-\bm{q}\|_2^2}\\
                               &=& \sum_{k=1}^K \left\| \bm{p}_k - \bm{q}_k \right\|_2^2
\end{eqnarray*}
where $K$ is the number of matched pairs in $\mgen{M}'$.

The error minimization is then:
\begin{eqnarray*}
    \fromTo*{\bm{T}}{i}{i+1} &=& \argmin_{\bm{T}}\left(
    \sum_{k=1}^K \left\| \bm{T}\bm{p}_k - \bm{q}_k \right\|_2^2
    \right)\\
    &=&\argmin_{\bm{T}}\left(\sum_{k=1}^K \left\| \bm{R}\bm{p}_k + \bm{t} - \bm{q}_k \right\|_2^2\right).
\end{eqnarray*}

In that case, this minimization problem can be solved analytically by computing the centroids (average) of the point clouds, and the singular value decomposition of the covariance \citep{Arun:1987ue}.
More precisely, let $\bm{\mu}_p = \frac{1}{K} \sum_{k=1}^K \bm{p}_k$ and $\bm{\mu}_q = \frac{1}{K} \sum_{k=1}^K \bm{q}_k$ be the centroids of both point clouds.
The covariance is then:
\[\bm{H} = \sum_{k=1}^K (\bm{p}_k - \bm{\mu}_p)(\bm{q}_k - \bm{\mu}_q)^\top.\]
Let $\bm{U}\bm{\Lambda}\bm{V}^\top$ be the singular value decomposition of $\bm{H}$.
It can be shown that the optimal transformation can be computed with:
\begin{equation*}
    \left\{\begin{array}{rcl}
        \hat{\bm{R}} &=& \bm{V}\bm{U}^\top\\
        \hat{\bm{t}} &=& \bm{\mu}_q - \hat{\bm{R}} \bm{\mu}_p.
    \end{array}\right.
\end{equation*}
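
The following sketch outlines why this solution is plausible, assuming a least-squares (squared-distance) error; the centered coordinates $\bm{p}'_k = \bm{p}_k - \bm{\mu}_p$ and $\bm{q}'_k = \bm{q}_k - \bm{\mu}_q$ are notation introduced here only for illustration.
Substituting $\hat{\bm{t}}$ removes the translation, and expanding the squared norm leaves a single term that depends on $\bm{R}$:
\begin{eqnarray*}
    \sum_{k=1}^K \left\| \bm{R}\bm{p}'_k - \bm{q}'_k \right\|_2^2
    &=& \sum_{k=1}^K \left( \|\bm{p}'_k\|_2^2 + \|\bm{q}'_k\|_2^2 \right) - 2\,\mathrm{tr}\left(\bm{R}\bm{H}\right).
\end{eqnarray*}
Minimizing the error therefore amounts to maximizing $\mathrm{tr}(\bm{R}\bm{H}) = \mathrm{tr}(\bm{V}^\top\bm{R}\bm{U}\bm{\Lambda})$, which is largest when the orthogonal matrix $\bm{V}^\top\bm{R}\bm{U}$ is the identity, i.e., $\bm{R} = \bm{V}\bm{U}^\top$.
One caveat that the closed form does not address by itself: when $\det(\bm{V}\bm{U}^\top) = -1$, the result is a reflection rather than a rotation; a commonly used remedy is to flip the sign of the last column of $\bm{V}$ before forming $\hat{\bm{R}}$.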

\paragraph{Example 2}
Another commonly used error is the point-to-plane error, which is the squared distance between a point and the plane defined by its matched point and the normal associated with it:
\begin{equation*}
     \error(\mgen{P}, \mgen{Q}) = \sum\limits_{k=1}^K \left\| (\bm{p}_k - \bm{q}_k) \cdot \bm{n}_k \right\|_2^2
\end{equation*}
where $\bm{n}_k$ is the normal vector around the 3D point $\bm{q}_k$ in \reference.
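
As a small numerical illustration (the values below are made up for this example), take $\bm{p}_k = [1.3, 2, 3]^\top$, $\bm{q}_k = [1, 2, 2.5]^\top$ and $\bm{n}_k = [0, 0, 1]^\top$:
\begin{equation*}
    (\bm{p}_k - \bm{q}_k) \cdot \bm{n}_k = [0.3, 0, 0.5]\,[0, 0, 1]^\top = 0.5,
    \qquad
    \left\| \bm{p}_k - \bm{q}_k \right\|_2 = \sqrt{0.3^2 + 0.5^2} \approx 0.58.
\end{equation*}
The offset of $0.3$ tangent to the plane is ignored by the point-to-plane residual, whereas it would be penalized by the point-to-point error.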

The usual method relies on the linearization of the rotation matrix:
\begin{equation*}
\bm{R} = R(\alpha, \beta, \gamma) \approx 
\left[ \begin{array}{ccc} 
1 & -\gamma & \beta \\
\gamma & 1 & -\alpha \\
-\beta & \alpha & 1
\end{array} \right] 
= [\bm{r}]_\times + \bm{I}.
\end{equation*}
The full transformation is parametrized by 6 degrees of freedom:
\begin{equation*}
\mgen{T} = \bm{\tau} = 
\left[ \begin{array}{c}
\bm{r} \\
\bm{t}
\end{array} \right]  =
\left[ \begin{array}{c}
\alpha \\
\beta \\
\gamma \\
t_x \\
t_y \\
t_z
\end{array} \right].
\end{equation*}
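
To give a rough idea of the accuracy of this linearization, consider a rotation of $\gamma = 0.1$\,rad (about $5.7^\circ$) around the $z$ axis only, a value chosen here purely for illustration:
\begin{equation*}
R(0, 0, 0.1) = 
\left[ \begin{array}{ccc}
\cos\gamma & -\sin\gamma & 0 \\
\sin\gamma & \cos\gamma & 0 \\
0 & 0 & 1
\end{array} \right]
\approx
\left[ \begin{array}{ccc}
0.9950 & -0.0998 & 0 \\
0.0998 & 0.9950 & 0 \\
0 & 0 & 1
\end{array} \right],
\qquad
[\bm{r}]_\times + \bm{I} =
\left[ \begin{array}{ccc}
1 & -0.1 & 0 \\
0.1 & 1 & 0 \\
0 & 0 & 1
\end{array} \right].
\end{equation*}
The largest entry-wise difference is about $5 \times 10^{-3}$ (roughly $\gamma^2/2$), and it shrinks quadratically as the angle decreases.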

Under these assumptions, the optimal parameters $\bm{\tau}$ can be obtained by solving the following linear system (see \autoref{appendix:error} for more details):
%
\begin{equation}
%\underbrace{
%\sum\limits_{k=1}^K 
%\left[ \begin{array}{cc}
    %\bm{p}_k\times\bm{n}_k   \\
    %\bm{n}_k  
%\end{array} \right]
%\otimes\left[ \begin{array}{cc}
    %\bm{p}_k\times\bm{n}_k   \\
    %\bm{n}_k  
%\end{array} \right]^\top
%}_{\bm{A}_{6 \times 6}}
%\bm{\tau}
%& = &
%\underbrace{
%-\sum\limits_{k=1}^K
%\left[ \begin{array}{c}
%\bm{c}_k \\
%\bm{n}_k \\
%\end{array} \right] 
%(\bm{d}_k \cdot \bm{n}_k)
%}_{\bm{b}_{6 \times 1}}
%\\
\bm{G}\bm{G}^\top\bm{\tau}  =  \bm{G}\bm{h}
\label{eq:minPointToPlane}
\end{equation}
where
\begin{equation*}
\bm{G}=\left[\cdots\begin{array}{c}\bm{p}_k\times\bm{n}_k\\\bm{n}_k\end{array}\cdots\right]
\end{equation*}
%
is a $6 \times K$ matrix and 
\begin{equation*}
\bm{h}=\left[\begin{array}{c}\vdots\\(\bm{q}_k-\bm{p}_k)\cdot\bm{n}_k\\\vdots\end{array}\right]
\end{equation*}
is a column vector of $K$ elements.
The linear system of \autoref{eq:minPointToPlane} can be solved for $\bm{\tau}$ using the Cholesky decomposition.
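
A minimal sketch of this last step, writing $\bm{A} = \bm{G}\bm{G}^\top$ and $\bm{b} = \bm{G}\bm{h}$ and introducing the lower-triangular factor $\bm{L}$ and the intermediate vector $\bm{y}$ as illustrative notation:
\begin{equation*}
    \bm{A} = \bm{L}\bm{L}^\top, \qquad
    \bm{L}\bm{y} = \bm{b} \quad \text{(forward substitution)}, \qquad
    \bm{L}^\top\bm{\tau} = \bm{y} \quad \text{(back substitution)}.
\end{equation*}
The factorization exists only if $\bm{A}$ is positive definite, which requires $\bm{G}$ to have rank 6.
This is not the case, for example, when all normals $\bm{n}_k$ are parallel, since translations within the plane and the rotation about the normal are then unconstrained.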

%----end example----

\chapter{Derivation for Point-to-Plane Error}
\label{appendix:error}

\vspace{5mm}
This appendix presents a solution for minimizing the point-to-plane error in 3D. We first define our transformation parameter set $\mgen{T}$ as a 6D vector:

\begin{equation}
\mgen{T} = \bm{\tau} = 
\left[ \begin{array}{c}
\bm{r} \\
\bm{t}
\end{array} \right]  =
\left[ \begin{array}{c}
\alpha \\
\beta \\
\gamma \\
t_x \\
t_y \\
t_z
\end{array} \right],
\end{equation}
%
where $\alpha$, $\beta$ and $\gamma$ are the rotational components, while $t_x$, $t_y$ and $t_z$ are the translation components.
%
We also define the objective function for point-to-plane:
\begin{equation} \label{eq:p2plane}
e_\mathrm{p\Phi} = \sum\limits_{k=1}^K \left\| \left[(\bm{R}\bm{p}_k + \bm{t})- \bm{q}_k\right] \cdot \bm{n}_k \right\|_2^2 ,
\end{equation}
%
where $\bm{n}_k$ is the normal vector representing the surface at the point $\bm{q}_k$ and the index $k$ represents paired points.
The method presented here relies on rotation matrix linearization. 
This linearization can be achieved using the small-angle approximation:
%
\begin{equation} \label{eq:rotLin}
\bm{R} = R(\alpha, \beta, \gamma) \approx 
\left[ \begin{array}{ccc} 
1 & -\gamma & \beta \\
\gamma & 1 & -\alpha \\
-\beta & \alpha & 1
\end{array} \right] 
= [\bm{r}]_\times + \bm{I},
\end{equation}
% 
where $[\bm{r}]_\times$ is a cross-product operator transforming the vector $\bm{r}$ to a $3 \times 3$ skew-symmetric matrix.
In the context of \icp, the impact of linearization is reduced through the iterative process of the whole registration algorithm.
Combining \autoref{eq:p2plane} with \autoref{eq:rotLin}, we can approximate the objective function as
%
\begin{align*}
e_\mathrm{p\Phi} &\approx \sum\limits_{k=1}^K \left\| [([\bm{r}]_\times + \bm{I})\bm{p}_k + \bm{t} - \bm{q}_k] \cdot \bm{n}_k \right\|_2^2 \\
 &\approx \sum\limits_{k=1}^K \left\| (\bm{r} \times \bm{p}_k) \cdot \bm{n}_k + \bm{p}_k \cdot \bm{n}_k + \bm{t} \cdot \bm{n}_k - \bm{q}_k \cdot \bm{n}_k \right\|_2^2 ,
\end{align*}
%
which can be rewritten using the \emph{scalar triple product} and by reorganizing the terms as
\begin{align*}
e_\mathrm{p\Phi} &\approx \sum\limits_{k=1}^K \left\| \bm{r} \cdot \underbrace{(\bm{p}_k \times \bm{n}_k)}_{\bm{c}_k} + \bm{t} \cdot \bm{n}_k - \underbrace{(\bm{q}_k - \bm{p}_k)}_{\bm{d}_k} \cdot \bm{n}_k  \right\|_2^2 \\
 &\approx \sum\limits_{k=1}^K \left\| \bm{r} \cdot \bm{c}_k + \bm{t} \cdot \bm{n}_k - \bm{d}_k \cdot \bm{n}_k  \right\|_2^2 .
\end{align*}
%
We can then minimize the error $e_\mathrm{p\Phi}$ with respect to $\bm{r}$ and $\bm{t}$ by setting the partial derivatives to zero:
%
\begin{align*}
\frac{\partial e_\mathrm{p\Phi}}{\partial \bm{r}} &= \sum\limits_{k=1}^K 2 \bm{c}_k (\bm{r} \cdot \bm{c}_k + \bm{t} \cdot \bm{n}_k - \bm{d}_k \cdot \bm{n}_k) = \bm{0} \\
\frac{\partial e_\mathrm{p\Phi}}{\partial \bm{t}} &= \sum\limits_{k=1}^K 2 \bm{n}_k (\bm{r} \cdot \bm{c}_k + \bm{t} \cdot \bm{n}_k - \bm{d}_k \cdot \bm{n}_k) = \bm{0}
\end{align*}
%
We can assemble these derivatives into the linear form $\bm{A}\bm{\tau}=\bm{b}$ by moving the terms that do not depend on $\bm{\tau}$ to the right-hand side of the equation
%
\begin{align*}
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k (\bm{r} \cdot \bm{c}_k) + \bm{c}_k (\bm{t} \cdot \bm{n}_k)   \\
\bm{n}_k (\bm{r} \cdot \bm{c}_k) + \bm{n}_k (\bm{t} \cdot \bm{n}_k)  
\end{array} \right]
&=
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k (\bm{d}_k \cdot \bm{n}_k)   \\
\bm{n}_k (\bm{d}_k \cdot \bm{n}_k)
\end{array} \right]
\\
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k \bm{c}_k^\top \bm{r} + \bm{c}_k \bm{n}_k^\top \bm{t}    \\
\bm{n}_k \bm{c}_k^\top \bm{r} + \bm{n}_k \bm{n}_k^\top \bm{t}  
\end{array} \right]
&=
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k (\bm{d}_k \cdot \bm{n}_k)   \\
\bm{n}_k (\bm{d}_k \cdot \bm{n}_k)
\end{array} \right]
\\
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k \bm{c}_k^\top  & \bm{c}_k \bm{n}_k^\top    \\
\bm{n}_k \bm{c}_k^\top  & \bm{n}_k \bm{n}_k^\top   
\end{array} \right]
\left[ \begin{array}{cc}
\bm{r} \\
\bm{t}
\end{array} \right]
&=
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k \\
\bm{n}_k 
\end{array} \right] (\bm{d}_k \cdot \bm{n}_k)
\end{align*}
%
which brings us to the linear system of equations that we were looking for 
%
\begin{align} \label{eq:minPointToPlaneBis}
\underbrace{
\sum\limits_{k=1}^K 
\left[ \begin{array}{cc}
\bm{c}_k   \\
\bm{n}_k  
\end{array} \right]
\left[ \begin{array}{cc}
\bm{c}_k^\top  &  \bm{n}_k^\top \\
\end{array} \right]
}_{\bm{A}_{6 \times 6}}
\bm{\tau}
 = 
\underbrace{
\sum\limits_{k=1}^K
\left[ \begin{array}{c}
\bm{c}_k \\
\bm{n}_k \\
\end{array} \right] 
(\bm{d}_k \cdot \bm{n}_k)
}_{\bm{b}_{6 \times 1}}
\end{align}
%
Once the matrix $\bm{A}$ and the vector $\bm{b}$ are constructed, the linear system of \autoref{eq:minPointToPlaneBis} can be solved for $\bm{\tau}$ using the Cholesky decomposition.
Implementing such a solution requires a loop over the $K$ pairs to accumulate the summations that build $\bm{A}$ and $\bm{b}$.
An alternative formulation relying on dense matrix multiplication can be computed by assembling
\begin{equation*}
\bm{G}=
\underbrace{
\left[\cdots\begin{array}{c}\bm{p}_k\times\bm{n}_k\\\bm{n}_k\end{array}\cdots\right]
}_{6 \times K}
\end{equation*}
%
and
%
\begin{equation*}
\bm{h}=
\underbrace{
\left[\begin{array}{c}\vdots\\(\bm{q}_k-\bm{p}_k)\cdot\bm{n}_k\\\vdots\end{array}\right]
}_{K \times 1}
\end{equation*}
%
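which satisfy $\bm{G}\bm{G}^\top = \sum_{k=1}^K \bm{g}_k\bm{g}_k^\top = \bm{A}$ and $\bm{G}\bm{h} = \sum_{k=1}^K \bm{g}_k h_k = \bm{b}$ (with $\bm{g}_k$ the $k$-th column of $\bm{G}$ and $h_k$ the $k$-th element of $\bm{h}$), leading to 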
\begin{align*}
\bm{A}\bm{\tau} &= \bm{b} 
\\
&\Updownarrow 
\\
\bm{G}\bm{G}^\top\bm{\tau} &= \bm{G}\bm{h} ,
\end{align*}
%
which is the same formulation as proposed in \autoref{sec:optimization}.\qed