We define the intensity error $e_I$ as:

$$ e_I = I^{*}(x) - I(W(x,p))$$

Where $x$ are 2D positions on the image plane and $W(x,p)$ is the warping function that warps pixels from the image $I$ to the template $I^{*}$.

The warp depends on the camera parameters, the depth at $x$, and the relative pose. As we are using calibrated RGBD cameras, all parameters are given apart from the relative pose, which is what we are trying to find. The warping function is defined as follows:

$$ W(x,p) = \pi(g(\pi^{-1}(x),p))$$

Where $g(x,p)$ is an SE(3) transformation, i.e. a rigid-body rotation and translation, $\pi$ is the camera projection, and $\pi^{-1}$ is the back-projection to 3D using the depth at $x$.
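As an illustration, the warp can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: an ideal pinhole camera with intrinsic matrix `K`, and $g$ represented as a 4x4 homogeneous matrix `T`. The function names and example values are hypothetical and not taken from any particular implementation.

```python
import numpy as np

def backproject(x, z, K):
    """pi^{-1}: pixel (u, v) with depth z -> 3D point in the camera frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(x[0] - cx) * z / fx, (x[1] - cy) * z / fy, z])

def project(X, K):
    """pi: 3D point in the camera frame -> pixel (u, v)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([fx * X[0] / X[2] + cx, fy * X[1] / X[2] + cy])

def warp(x, z, T, K):
    """W(x, p) = pi(g(pi^{-1}(x), p)), with g given as a 4x4 matrix T."""
    X = backproject(x, z, K)
    Y = T[:3, :3] @ X + T[:3, 3]  # rigid-body transformation g(X, p)
    return project(Y, K)

# Assumed example intrinsics (focal length 500 px, principal point at 320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
T[0, 3] = 0.1  # 10 cm translation along the camera x-axis

# The principal point at 2 m depth moves 500 * 0.1 / 2 = 25 px to the right.
x_warped = warp(np.array([320.0, 240.0]), 2.0, T, K)
```

With the identity pose the warp returns the input pixel unchanged, which is a quick sanity check for any implementation of $W$.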

Since the image and the warp are nonlinear functions, we linearize the error and solve for a small increment in the parameters $\Delta p$. We can express the update to the warp in a "compositional" way:

$$  e_I \approx I^{*}(x) - I(W(W(x,\Delta p),p)) $$

After solving the problem for $\Delta p$ we can update the warp as:

$$ W(x,p_1) = W(W(x,\Delta p),p_0) $$

$$ W(x,p_1) = \pi(g(g(\pi^{-1}(x),\Delta p),p_0)) $$

$$ g(x,p_1) = g(g(x,\Delta p), p_0) $$

$$ W(x,p_1) = \pi(g(\pi^{-1}(x),p_1)) $$

Where $g(x,p_1)$ is the concatenation of two SE(3) transformations.
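The compositional update can be sketched as a product of 4x4 matrices. The sketch assumes that $\Delta p = (v, \omega) \in \mathbb{R}^6$ is a twist (translation part first) and evaluates the matrix exponential with a truncated power series, which is adequate for small increments. The helper names `hat`, `exp_se3` and `compose` are hypothetical. The multiplication order follows $g(g(x,\Delta p), p_0)$: the increment is applied to the point first, then the current pose.

```python
import numpy as np

def hat(p):
    """Map a twist p = (v, w) in R^6 to its 4x4 matrix form in se(3)."""
    v, w = p[:3], p[3:]
    P = np.zeros((4, 4))
    P[:3, :3] = np.array([[0.0, -w[2], w[1]],
                          [w[2], 0.0, -w[0]],
                          [-w[1], w[0], 0.0]])
    P[:3, 3] = v
    return P

def exp_se3(p, terms=20):
    """Matrix exponential of hat(p) via a truncated power series."""
    P = hat(p)
    T = np.eye(4)
    term = np.eye(4)
    for k in range(1, terms):
        term = term @ P / k  # P^k / k!
        T = T + term
    return T

def compose(T0, dp):
    """g(x, p1) = g(g(x, dp), p0): apply the increment first, then T0."""
    return T0 @ exp_se3(dp)

T0 = exp_se3(np.array([0.1, 0.0, 0.2, 0.0, 0.3, 0.0]))         # current pose
dp = np.array([0.01, -0.02, 0.005, 0.001, 0.002, -0.003])      # small increment
T1 = compose(T0, dp)                                            # updated pose

X = np.array([0.5, -0.2, 2.0, 1.0])  # homogeneous 3D point
```

Transforming `X` with `T1` gives the same result as transforming it with `exp_se3(dp)` and then with `T0`, and the rotation block of `T1` stays orthonormal.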

To solve the problem, we linearize it around $p$:

$$  e_I \approx I^{*}(x) - I(W(x,p)) - \nabla I \frac{\partial W}{\partial p} \Delta p$$

$$  e_I \approx I^{*}(x) - I(W(x,p)) - \nabla I \nabla \pi \frac{\partial g}{\partial p} \Delta p$$




As $g(x,p)$ is an SE(3) transformation, its derivative is not trivial. However, since we are only interested in a small increment around $p$, we can formulate it as an increment in the Lie algebra:

$$ g = e^{\hat{\Delta p}} \boxplus g(x,p)$$

Where $e$ is the matrix exponential, $\hat{\Delta p}$ is the $4 \times 4$ matrix form of the twist $\Delta p \in \mathbb{R}^6$ (the hat operator, whose rotational block is skew-symmetric) and $\boxplus$ is the group operator of SE(3). We can compute the Jacobian of the expression following "A tutorial on SE(3) transformation parameterizations and on-manifold optimization", appendix A.2 "Applications to Computer Vision".
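For a left-multiplied increment as in the expression above, the transformed point satisfies $e^{\hat{\Delta p}} Y \approx Y + v + \omega \times Y$ with $Y = g(X,p)$ and $\Delta p = (v, \omega)$, so the transformation Jacobian at $\Delta p = 0$ is the $3 \times 6$ matrix $[\,I \mid -[Y]_\times]$. The sketch below builds this Jacobian, chains it with the pinhole projection Jacobian, and compares the result against central finite differences. The twist ordering (translation first), the pinhole model and all names are assumptions made for illustration.

```python
import numpy as np

def skew(a):
    """Cross-product matrix [a]x such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def jacobian_transform(Y):
    """d(exp(hat(dp)) Y) / d(dp) at dp = 0 for dp = (v, w): [I | -[Y]x]."""
    return np.hstack([np.eye(3), -skew(Y)])

def jacobian_projection(Y, fx, fy):
    """Derivative of the pinhole projection at the 3D point Y (2x3)."""
    x, y, z = Y
    return np.array([[fx / z, 0.0, -fx * x / z**2],
                     [0.0, fy / z, -fy * y / z**2]])

def project(Y, fx, fy):
    """Pinhole projection; the principal point is omitted (constant offset)."""
    return np.array([fx * Y[0] / Y[2], fy * Y[1] / Y[2]])

fx, fy = 500.0, 480.0                 # assumed focal lengths in pixels
Y = np.array([0.3, -0.2, 2.0])        # transformed point g(X, p)

# Analytic 2x6 warp Jacobian: projection Jacobian times transformation Jacobian.
J_w = jacobian_projection(Y, fx, fy) @ jacobian_transform(Y)

# Numerical check using the first-order expansion of the exponential.
def perturbed_pixel(dp):
    return project(Y + dp[:3] + np.cross(dp[3:], Y), fx, fy)

eps = 1e-6
J_num = np.column_stack([
    (perturbed_pixel(eps * e) - perturbed_pixel(-eps * e)) / (2 * eps)
    for e in np.eye(6)
])
```

Multiplying `J_w` from the left by the image gradient $\nabla I$ (a $1 \times 2$ row) yields one row of the intensity Jacobian, up to the sign convention used in the linearization.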


#TODO use matrix formulation, include weights

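We can summarize this as:

$$  e_I \approx r + J\Delta p $$

with the residual $r = I^{*}(x) - I(W(x,p))$ and the Jacobian $J = -\nabla I \nabla \pi \frac{\partial g}{\partial p}$.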

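Stacking the residuals and Jacobian rows of all pixels into $r$ and $J$, we are interested in the minimum of the squared error $\|e_I\|^2$, so we set its derivative with respect to $\Delta p$ to 0 and solve for $\Delta p$:

$$ \frac{\partial \|e_I\|^2}{\partial \Delta p} = 2J^Tr + 2J^TJ \Delta p = 0$$

$$ \Delta p = -(J^TJ)^{-1}J^Tr $$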
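A minimal sketch of this step, assuming the per-pixel residuals are stacked into a vector `r` of length $N$ and the Jacobian rows into an $N \times 6$ matrix `J`. Rather than forming the inverse of $J^TJ$ explicitly, the sketch solves the linear system, which is the usual numerically preferable choice. The synthetic data only serves to check the algebra and is not taken from any real image pair.

```python
import numpy as np

def gauss_newton_step(J, r):
    """Solve the normal equations (J^T J) dp = -J^T r for the increment dp."""
    return np.linalg.solve(J.T @ J, -J.T @ r)

# Synthetic check: 100 residuals that are exactly linear in a known increment.
rng = np.random.default_rng(0)
J = rng.normal(size=(100, 6))
dp_true = np.array([0.01, -0.02, 0.03, 0.001, 0.002, -0.003])
r = -J @ dp_true  # chosen so that r + J @ dp_true = 0

dp = gauss_newton_step(J, r)  # recovers dp_true up to rounding
```

In the actual alignment problem the error is only approximately linear, so the step is repeated: update the pose with $\Delta p$, recompute $r$ and $J$, and solve again until the increment becomes small.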


Since we are in an RGBD setting, we can impose an additional constraint on the depth:

$$ e_Z = [h(Z^{*}(x),p)]_z - Z(W(x,p))$$

Where $h$ transforms the 3D point observed at pixel $x$ with depth $Z^{*}(x)$ into the camera coordinate system of $Z$, and $[\cdot]_z$ selects the z-component of the vector.

$$ h = g(\pi^{-1}(x),p) $$

Where $\pi$ is the camera projection and $g(x,p)$ is an SE(3) transformation. Note that $h$ is simply the warping function without the final reprojection:

$$ W(x,p) = \pi(g(\pi^{-1}(x),p))$$


Since the back-projection to 3D does not depend on $p$, we can treat it as a constant:

$$\pi^{-1}(x) = X$$

And write the whole expression as:

$$ e_Z = [g(X,p)]_z - Z(\pi(g(X,p)))$$

Similarly to the intensity error, we can express the depth error in terms of a small increment. However, in the case of depth, the relative pose appears not only in the warping function but also in the transformation function $h$:

$$ e_Z = [g(g(X,\Delta p),p)]_z - Z(\pi(g(g(X,\Delta p),p)))$$


Similarly to the intensity part we can linearize the error around $p$ and solve for a small increment $\Delta p$:

$$ e_Z \approx e_Z(0) + \frac{\partial e_Z}{\partial p}\Delta p = [g(X,p)]_z - Z(\pi(g(X,p))) + \left(\left[\frac{\partial g}{\partial p}\right]_z - \nabla Z \nabla \pi \frac{\partial g}{\partial p}\right)\Delta p$$


We can summarize this as:

$$ e_Z \approx r_Z + ([J_T]_z - J_ZJ_{\pi}J_T)\Delta p$$

With:

$$ r_Z = [g(X,p)]_z - Z(\pi(g(X,p)))$$

being the z-residual, i.e. the difference between the depth $z^{*}$ at pixel $x$, transformed to the camera coordinate system of $Z$, and the depth $z$ at the warped pixel $W(x,p)$,

$$ J_ZJ_{\pi}J_T = \nabla Z \nabla \pi \frac{\partial g}{\partial p} $$

being the "warp Jacobian", analogous to the image warp Jacobian of the intensity part but multiplied by the depth map Jacobian $J_Z = \nabla Z$, and

$$ J_T = \frac{\partial g}{\partial p}$$
being the transformation Jacobian.
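The depth residual and its Jacobian row for a single pixel can be sketched as follows. The sketch assumes a pinhole camera, a nearest-neighbour lookup in the depth map (real implementations typically interpolate and check image bounds), depth gradients from `np.gradient`, and the left-multiplied increment with twist ordering $(v, \omega)$. The function name and all example values are hypothetical.

```python
import numpy as np

def skew(a):
    """Cross-product matrix [a]x."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def depth_residual_and_jacobian(x, z_ref, T, K, Z, dZ_du, dZ_dv):
    """Return r_Z and the 1x6 row [J_T]_z - J_Z J_pi J_T for one pixel."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # X = pi^{-1}(x) using the template depth, then Y = g(X, p).
    X = np.array([(x[0] - cx) * z_ref / fx, (x[1] - cy) * z_ref / fy, z_ref])
    Y = T[:3, :3] @ X + T[:3, 3]
    # Warped pixel W(x, p) and nearest-neighbour lookup (no bounds check).
    u = fx * Y[0] / Y[2] + cx
    v = fy * Y[1] / Y[2] + cy
    ui, vi = int(round(u)), int(round(v))
    r_Z = Y[2] - Z[vi, ui]
    J_T = np.hstack([np.eye(3), -skew(Y)])                    # 3x6
    J_pi = np.array([[fx / Y[2], 0.0, -fx * Y[0] / Y[2]**2],
                     [0.0, fy / Y[2], -fy * Y[1] / Y[2]**2]])  # 2x3
    J_Z = np.array([dZ_du[vi, ui], dZ_dv[vi, ui]])            # depth gradient
    return r_Z, J_T[2] - J_Z @ J_pi @ J_T

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
Z = np.full((480, 640), 2.0)   # fronto-parallel plane at 2 m
dZ_dv, dZ_du = np.gradient(Z)  # np.gradient returns (d/drow, d/dcol)
T = np.eye(4)
T[2, 3] = 0.1                  # point moved 10 cm away from the camera

r_Z, J = depth_residual_and_jacobian(
    np.array([370.0, 240.0]), 2.0, T, K, Z, dZ_du, dZ_dv)
```

For this flat depth map the gradient term vanishes, so the Jacobian row reduces to $[J_T]_z$ and the residual equals the 10 cm offset along the optical axis.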



Collecting the Jacobian terms into $J_z = [J_T]_z - J_ZJ_{\pi}J_T$, we can summarize this as:

$$ e_Z \approx r_Z + J_z\Delta p $$

and arrive at normal equations analogous to those for $e_I$.
 

If we combine the intensity and depth constraints for each observation we get:

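$$e_I + we_Z \approx r_I + wr_Z + (-J_IJ_{\pi}J_T+w([J_{T}]_z - J_ZJ_{\pi}J_T))\Delta p$$

Where $r_I = I^{*}(x) - I(W(x,p))$ is the intensity residual, $J_I = \nabla I$ is the image gradient, and $w$ is a weighting factor that balances the scales of the two errors.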
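A minimal sketch of the combined step, assuming the full per-pixel Jacobian rows of both terms have already been computed and stacked. Here `J_int` stands for the rows $-J_IJ_{\pi}J_T$ and `J_depth` for the rows $[J_T]_z - J_ZJ_{\pi}J_T$; these names, the value of `w` and the synthetic data are hypothetical and only check the algebra.

```python
import numpy as np

def combined_step(r_int, J_int, r_depth, J_depth, w):
    """Combine intensity and depth terms per observation and solve for dp."""
    r = r_int + w * r_depth   # combined residual, shape (N,)
    J = J_int + w * J_depth   # combined Jacobian, shape (N, 6)
    return np.linalg.solve(J.T @ J, -J.T @ r)

# Synthetic check: both residuals are exactly linear in a known increment.
rng = np.random.default_rng(1)
J_int = rng.normal(size=(200, 6))
J_depth = rng.normal(size=(200, 6))
dp_true = np.array([0.02, 0.01, -0.03, 0.002, -0.001, 0.004])
r_int = -J_int @ dp_true
r_depth = -J_depth @ dp_true

dp = combined_step(r_int, J_int, r_depth, J_depth, w=0.5)
```

Because intensity differences and depth differences are measured in different units, the choice of `w` determines how strongly each term influences the solution.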