# Table of Contents
 <p><div class="lev1"><a href="#Identification-of-dynamic-model-parameters"><span class="toc-item-num">1&nbsp;&nbsp;</span>Identification of dynamic model parameters</a></div><div class="lev2"><a href="#Dynamic-modeling"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Dynamic modeling</a></div><div class="lev3"><a href="#Inertial-Parameters"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Inertial Parameters</a></div><div class="lev2"><a href="#Identification"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Identification</a></div><div class="lev3"><a href="#Missing-torque-or-force-sensors"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Missing torque or force sensors</a></div><div class="lev2"><a href="#Parameters"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Parameters</a></div><div class="lev3"><a href="#Standard-and-Base-Parameters"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Standard and Base Parameters</a></div><div class="lev3"><a href="#Essential-Parameters"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Essential Parameters</a></div><div class="lev2"><a href="#Joint-Excitation"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Joint Excitation</a></div><div class="lev2"><a href="#Retrieving-torque-measurements"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Retrieving torque measurements</a></div><div class="lev2"><a href="#Identification-of-difference-between-a-priori-parameters-and-optimal-ones"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Identification of difference between a priori parameters and optimal ones</a></div><div class="lev2"><a href="#Direct-identification-of-standard-parameters"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Direct identification of standard parameters</a></div><div class="lev2"><a href="#Optimization-with-constraints"><span class="toc-item-num">1.8&nbsp;&nbsp;</span>Optimization with constraints</a></div><div class="lev3"><a href="#Feasibility-constraints-as-LMI"><span 
class="toc-item-num">1.8.1&nbsp;&nbsp;</span>Feasibility constraints as LMI</a></div><div class="lev3"><a href="#Constrained-OLS-as-SDP-optimization-problem"><span class="toc-item-num">1.8.2&nbsp;&nbsp;</span>Constrained OLS as SDP optimization problem</a></div><div class="lev3"><a href="#SDP-with-contact-forces"><span class="toc-item-num">1.8.3&nbsp;&nbsp;</span>SDP with contact forces</a></div><div class="lev3"><a href="#SDP-in-Standard-parameter-space"><span class="toc-item-num">1.8.4&nbsp;&nbsp;</span>SDP in Standard parameter space</a></div><div class="lev2"><a href="#References"><span class="toc-item-num">1.9&nbsp;&nbsp;</span>References</a></div>

# Identification of dynamic model parameters

## Dynamic modeling

The dynamic model of a tree structured robot is derived from the Newton-Euler method
(or Lagrangian equations and virtual work principle) and results in the following equations.

__Forward dynamics__:

Given joint torques/forces, find the resulting velocities and accelerations.

$$\begin{equation} \ddot{q} = F(q, \dot{q}, \tau, f_{ext}) \end{equation}$$
with:  
* $F$ - some function
* $q$ - Joint positions in configuration space ($\in \mathbb{R}^n$, generalized coordinates)
* $f_\text{ext}$ - contact forces (at end-effectors)
* $\tau$ - joint torques

The forward dynamics can be solved numerically with the ABA algorithm.

__Inverse dynamics__:

Given the position, velocity and acceleration of joint trajectories, find the necessary
torques/forces of the joints. (Sometimes simply called equations of motion, joint-space form)

$$\begin{equation} \tau = H(q)\ddot{q}+c(q,\dot{q}, f_\text{ext}) \end{equation}$$
with:  
* $H$ - joint-space inertia or mass matrix ($\mathbb{R}^{(6+n_\text{dof}) \times (6+n_\text{dof})}$)
* $c$ - joint-space bias force (the force which is needed to produce zero acceleration)
* $f_\text{ext}$ - Wrench exerted by the external environment on a link ($\mathbb{R}^{6}$)
* $\tau$ - Joint torques ($\mathbb{R}^{6+n_\text{dof}}$)

(The floating base formulation of $\tau$ has a dimension of $6+n_{dof}$ instead of $n_{dof}$ for fixed base, $6$ F/T values for the base link and $n_{dof}$ torque values for each joint.)

They can also be written with explicit bias forces:  
$$\begin{equation} \tau = H(q)\ddot{q}+C(q,\dot{q})\dot{q} + g(q) - \sum_{l \in L} J_{l}^T f_l^{ext} \end{equation}$$
with:  
* $C(q,\dot{q})\dot{q}$ - Coriolis and centrifugal forces ($\mathbb{R}^{6+n_{dof}}$, $C \in \mathbb{R}^{(6+n_{dof}) \times (6+n_{dof})}$) 
* $g$ - Gravity terms ($\mathbb{R}^{6+n_{dof}}$)
* $J_l$ - Jacobian of the contact frame $l$ ($\mathbb{R}^{6 \times (6+n_{dof})}$)
* $f_l^\text{ext}$ - Wrench exerted by the external environment on link $l$ ($\mathbb{R}^{6}$)
* (friction is missing, for that see p.228 Khalil)

The inverse dynamics problem can numerically be solved with the RNEA algorithm (which can also give $c(q,\dot{q}, f_\text{ext})$).

The inverse dynamics can also be written in regressor form (Khosla, 1985) which makes the equations linear in the parameters:

$$\begin{equation} \tau = Y(q, \dot{q}, \ddot{q}) x - \sum_{l \in L} J_{l}^T f_l^{ext} + \rho \end{equation} $$
with:  
* $Y$ - dynamics regressor ($\mathbb{R}^{(6+n_\text{dof}) \times (10\cdot n_\text{links})}$) (Jacobian of $\tau$ with respect to vector $x$)
* $x$ - inertial parameter vector ($\mathbb{R}^{10\cdot n_\text{links}}$)
* $\rho$ - errors due to measurement noise and modelling errors ($\mathbb{R}^{6+n_\text{dof}}$)

### Inertial Parameters

The inertia parameters usually consist of the mass in kg (1 parameter), the location of the center of mass relative to the joint (3: x, y, z) and the rotational inertia around the COM (6: xx, xy, xz, yy, yz, zz), so 10 parameters for each link. Additionally, e.g. 4 friction parameters are possible (total moment of inertia of rotor and gears, viscous and Coulomb coefficients and an offset parameter), (Gautier, 2013, p.1430).
Inverting eq. (4) for $x$ (further down) assumes that the equations are linear in the unknown parameters. This is however only true for Recursive Newton-Euler if the parameters are given in the modified linear form: the inertia tensor needs to be formulated relative to the link frame (URDF gives it relative to the COM) and the COM position needs to be given as first moment of mass, i.e. coordinates times mass. See also Khosla, 1985.
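
As a minimal sketch of this conversion (assuming NumPy; the function names and the parameter ordering `[m, m*c_x, m*c_y, m*c_z, L_xx, L_xy, L_xz, L_yy, L_yz, L_zz]` are illustrative choices for this example and may differ between libraries), URDF-style values (mass, COM, inertia about the COM) could be turned into the linear form like this:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix S(v) such that S(v) @ w == np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def to_linear_params(mass, com, inertia_com):
    """Convert (m, c, I_com) to 10 parameters in which the dynamics are linear."""
    com = np.asarray(com, dtype=float)
    first_moment = mass * com                       # COM coordinates times mass
    # Parallel axis theorem: inertia about the link frame origin
    L = np.asarray(inertia_com, dtype=float) + mass * skew(com).T @ skew(com)
    upper = L[np.triu_indices(3)]                   # xx, xy, xz, yy, yz, zz
    return np.concatenate(([mass], first_moment, upper))

# Made-up link: 2 kg, COM 0.5 m along z, diagonal inertia about the COM
params = to_linear_params(2.0, [0.0, 0.0, 0.5], np.diag([0.1, 0.1, 0.02]))
```

For this made-up link the shifted inertia is `diag(0.6, 0.6, 0.02)` and the first moment is `[0, 0, 1.0]`.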

In CoDyCo/iDynTree notation (https://github.com/robotology/idyntree/blob/master/doc/dcTutorialCpp.md):  
"For floating-base dynamics, the dynamics regressor is a $(6+n_\text{dofs})\times (10 \cdot n_\text{dofs})$ matrix $Y$ such that: $Y \pi = M(q) \frac{d(v)}{dt} + C(q,v)v + g(q)$ with $M(q)$, $C(q,v)$ and $g(q)$ defined in http://wiki.icub.org/codyco/dox/html/dynamics_notation.html. The $\pi$ vector is a $10 \cdot n_\text{dofs}$ inertial parameters vector, such that the elements of the vector from the $((i-1) \cdot 10)$-th to the $((i \cdot 10) - 1))$-th belong to the i-th link. For more details on the inertial parameters vector, check https://hal.archives-ouvertes.fr/hal-01137215/document".


## Identification

The identification problem is to find the parameter vector $x$ (link mass, COM, inertia for each link. Details below) in the inverse dynamics regressor equation.

iDynTree can calculate the dynamics regressor $Y$ numerically (using RNEA) for each state, which is dependent on the joint-link tree and on the system state ($q, \dot{q}, \ddot{q}$). For Walk-Man, the contact forces with the ground at the feet can be measured and together with the Jacobian for each contact frame added to the joint torques. For fixed base dynamics, the only force is coming through the constrained base link (for which the Jacobian is $J = [I \quad 0]$ and the contact forces vanish).
Robotran or openSYMORO gives the regressor as symbolic equations that are dependent on the system state and the dynamical and some structural parameters.

When torques and/or contact forces are known and $Y$ and $J^T$ (of each contact link) are retrieved through iDynTree, we can calculate

$$\begin{equation} Yx = \tau + \sum_{l \in L} J_{l}^T f_l^{ext} \end{equation}$$
$$\begin{equation} x = Y^{-1}\tau + Y^{-1}\sum_{l \in L} J_{l}^T f_l^{ext} \end{equation}$$
using pseudoinverse for the general non-invertible case
$$\begin{equation} \tilde{x} = Y^{+}\tau + Y^{+}\sum_{l \in L} J_{l}^T f_l^{ext} \end{equation}$$

### Missing torque or force sensors

To be able to do identification of the parameters $x$ without having contact force measurements, the HyQ identification code (Andrea del Prete) multiplies the equations with a null-space term to eliminate the contact forces from the equation. (How to get the null-space though?)
 
$$\begin{equation} Yx = \tau + J^T f_\text{ext} \end{equation}$$
$$\begin{equation} N(J^T)Yx = N(J^T)\tau + \underbrace{N(J^T)J^T f_\text{ext}}_{=0} \end{equation}$$
$$\begin{equation} x = [N(J^T)Y]^{+}N(J^T)\tau \end{equation}$$
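
One possible way to obtain such a matrix, sketched here with NumPy under the assumption that $N(J^T)$ only has to satisfy $N(J^T)J^T = 0$, is to take the left singular vectors of $J^T$ that lie outside its range. This is not necessarily how the HyQ code does it; the function name and the random data are made up for illustration.

```python
import numpy as np

def contact_nullspace_projector(J, tol=1e-10):
    """Return N with N @ J.T == 0, built from the left singular vectors of J.T."""
    # J has shape (6*n_contacts, 6+n_dof), so J.T has shape (6+n_dof, 6*n_contacts)
    U, s, _ = np.linalg.svd(J.T, full_matrices=True)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    # Columns of U beyond the rank span the orthogonal complement of range(J.T)
    return U[:, rank:].T

rng = np.random.default_rng(0)
n_dof = 6
J = rng.standard_normal((6, 6 + n_dof))     # hypothetical contact Jacobian
Y = rng.standard_normal((6 + n_dof, 4))     # hypothetical regressor with 4 parameters
x_true = np.array([1.0, -2.0, 0.5, 3.0])
f_ext = rng.standard_normal(6)              # contact wrench, treated as unmeasured
tau = Y @ x_true - J.T @ f_ext              # from Y x = tau + J^T f_ext

N = contact_nullspace_projector(J)
# Least squares on the projected equation N Y x = N tau
x_est, *_ = np.linalg.lstsq(N @ Y, N @ tau, rcond=None)
```

The projection removes as many equations as the contact Jacobian has independent rows, which is one reason why the accuracy can suffer.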

If only contact forces are known, there is also the possibility to just look at the upper equations of the matrix formulation and leave out the joint torques, using only the base-link dynamics (see: Ayusawa, Venture, 2008: "Identifiability and identification of inertial parameters using the underactuated base-link dynamics for legged multibody systems").



Both approaches will most probably reduce the accuracy, especially for the links closer to the missing information.

## Parameters

### Standard and Base Parameters
The parameters $x$ are called the standard parameters as they occur in the normal dynamics equations. The rank of the regressor is usually smaller than its number of columns, i.e. some columns are linearly dependent (matrix is rank deficient). It is not possible to use many numerical linear algebra methods for the identification when the matrix is singular.
Before inverting the equation and estimating the minimum identifiable parameters (also base parameters or simply identifiable parameters), we need to reduce the columns of the regressor $Y$ to the linearly independent ones. This is possible using an algorithm given by Gautier, 1990, using SVD or QR decompositions, or using closed form rules (Khalil). That way, columns that are zero are removed together with their corresponding parameters, and linearly dependent columns are grouped so that the corresponding base parameter is a linear combination of two or more standard parameters. The resulting base parameters are linearly independent but the choice is not unique (there are multiple possibilities of combining the columns). Any of these combinations can be used.

The regressor for the base parameters can be retrieved e.g. from the SVD, where the first $r = \text{rank}(Y_\text{std})$ columns of $V$ are the basis vectors of the identifiable subspace. iDynTree provides computeFloatingBaseIdentifiableSubspace(), which uses the SVD to calculate this orthogonal basis $B$ of the identifiable parameter subspace. It can be used to project from the standard regressor to the base regressor, i.e. $Y_\text{base} = Y_\text{std}B$ (See Traversaro, 2015, p.3).
Similarly, the first $r$ columns of the matrix $Q$ of the QR decomposition of $Y_\text{std}$ represent an orthonormal basis of the column space of $Y_\text{base}$. In order to not be dependent on incomplete data, the linear dependencies can also be computed from a regressor produced from random state data. A stacked regressor matrix can be generated with random $q, \dot{q}, \ddot{q}$, in turn giving the structural linear dependencies that are independent from an excitation trajectory and measurement errors. For computational efficiency, the products $Y_i^T Y_i$ of the random regressors can also be summed up instead of stacking the $Y_i$: the sum equals $Y^T Y$ of the stacked regressor, which has the same null space as $Y$ itself (Traversaro, unpublished?).
Other methods get the linear relationship between removed and remaining columns by QR decomposition or the SVD with a permutation matrix (see Gautier, 1990), which is especially useful if the symbolic relationships are needed. The reduced row echelon form (Klodmann, 2015) or explicit symbolic rules (Khalil, 19?) can also be used, but both do not necessarily give a minimal set of columns.
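
The SVD route with random samples can be sketched as follows. This is only an illustration with NumPy: `toy_regressor` is a made-up 2x4 regressor with two built-in column dependencies and stands in for the regressor that a dynamics library would return.

```python
import numpy as np

def toy_regressor(q, ddq):
    """Hypothetical 2x4 regressor: columns 2 and 3 are combinations of columns 0 and 1."""
    c0 = np.array([ddq[0], 0.0])
    c1 = np.array([np.sin(q[0]), ddq[1]])
    return np.column_stack([c0, c1, 2.0 * c0, c0 - c1])

rng = np.random.default_rng(1)
n_std = 4
YtY = np.zeros((n_std, n_std))
for _ in range(50):
    q, ddq = rng.uniform(-1.0, 1.0, size=(2, 2))   # random state sample
    Y_i = toy_regressor(q, ddq)
    YtY += Y_i.T @ Y_i            # equals Y^T Y of the stacked regressor

# YtY is symmetric; its leading singular vectors span the row space of the stacked Y
_, s, Vt = np.linalg.svd(YtY)
r = int(np.sum(s > 1e-9 * s[0]))  # numerical rank = number of base parameters
B = Vt[:r].T                      # (n_std x r) basis with orthonormal columns

x_std = np.array([1.0, 0.5, -0.2, 0.3])            # made-up standard parameters
x_base = B.T @ x_std                               # projection to base parameters
Y_new = toy_regressor(*rng.uniform(-1.0, 1.0, size=(2, 2)))
Y_base = Y_new @ B                                 # base regressor of a new sample
```

Both parameter sets predict the same torques for any state, i.e. `Y_base @ x_base` equals `Y_new @ x_std`, because the regressor has no component outside the subspace spanned by `B`.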

In order to obtain standard parameters for e.g. simulation models or modifying the URDF file, after identifying the base parameters $\tilde{x}_\text{base}$, they can be projected back to the standard basis using the same projection matrix $B$, obtaining a non-unique $\tilde{x}_\text{std}$. Since $Y_\text{base} = Y_\text{std}B$, this is $\tilde{x}_\text{std} = B \tilde{x}_\text{base}$; the forward projection is $x_\text{base} = B^T x_\text{std}$ if $B$ has orthonormal columns (otherwise, use the pseudoinverse of $B$). The parameters obtained this way are however not necessarily physically meaningful (also called physically consistent), in which case stability assumptions for certain model-based controllers or simulations will fail. Useful torque prediction is however already possible (maybe getting worse if movements are very different while CAD is always similarly good/bad?).

Standard parameters are called physically consistent if masses are strictly positive, the inertia matrix $M$ is positive definite (i.e. $z^{\mathrm{T}} M z > 0$ for any non-zero real vector $z$) and the center of mass is located inside of the link segment. The moments of inertia also need to respect the triangle inequality (but this can possibly be ignored).

Various methods have been proposed to obtain better parameters and to improve the consistency. While improving the identification itself (less noise or noise closer to a Gaussian distribution, more data, better exciting trajectories, selecting data to improve the condition number, etc.) can likely improve the base parameter accuracy, there is still no unique mapping back to the standard parameters. There is rather an affine subspace $V_\text{base} \subset \mathbb{R}^{10 \cdot n_\text{links}}$ of the standard solution vector space that includes all solutions that project to the same identified base parameters, i.e. $V_\text{base} = \{ x \in \mathbb{R}^{10 \cdot n_\text{links}} \,|\, B^T x = \tilde{x}_\text{base}\}$. The set of all standard solutions that are physically consistent might intersect with $V_\text{base}$, in which case $\tilde{x}_\text{base}$ is called physically consistent.
Equivalently, a set of base parameters $x_\text{base}$ is physically consistent if there exists a physically consistent $x$ with $B^T x = x_\text{base}$ (Sousa, 2014).

### Essential Parameters

The base parameters might still have elements that are (get identified as?) almost zero and therefore don't contribute much to the dynamics calculations, possibly because they have not been excited well in the measured experiment. They can therefore be cancelled (parameters and regressor columns deleted) to retrieve a smaller model and set of parameters called the essential parameters (Gautier; there is also another definition for essential parameters in Sheu, 1991). The residual error will grow a bit as the model is expressed with fewer parameters, but it should stay small.

The essential base parameters can be determined using the relative standard deviation of the identified parameter vector, which is computed from the standard deviation of the error of the estimated torques (see Gautier, 2013). Iteratively, the parameter with the largest relative standard deviation is cancelled and the estimation is repeated. This is done until the ratio of max and min relative standard deviation of the parameters reaches a value that is "between 30 and 20". That way, the similarity of estimation "certainty" of the remaining parameters is increased. The error itself is not looked at. Other criteria for stopping the model reduction because of error are the F-statistic (Janot 2014) or the lack-of-fit error (based on mean least squares), but these statistical tests assume that the error is normally distributed, which is not always the case. Janot recommends improving measurements, filtering etc. Also, it is unclear (to me) how these measures are related to the number of samples selected; they seem to change a lot on the same data when e.g. only every other sample is selected.
Alternatively, looking at the percentage error of the estimation in relation to the maximal torque ranges of each joint or the maximum measured torque ranges allows stopping before the prediction error gets too large (torque prediction should ultimately still be good, even if the parameters are not estimated close to the actual values).
However, it is possible that the estimation of the parameters is bad if too few parameters are removed (?).
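
The elimination loop based on the relative standard deviation can be sketched roughly as follows. This is a simplified illustration with NumPy on synthetic data, assuming an ordinary least squares covariance estimate and a stopping ratio of 30; the function name is made up and the details in Gautier, 2013 may differ.

```python
import numpy as np

def essential_parameters(Y, tau, ratio=30.0):
    """Iteratively drop the parameter with the largest relative standard deviation."""
    keep = np.arange(Y.shape[1])
    while True:
        Yk = Y[:, keep]
        x, *_ = np.linalg.lstsq(Yk, tau, rcond=None)
        resid = tau - Yk @ x
        dof = Yk.shape[0] - Yk.shape[1]
        sigma2 = resid @ resid / dof                  # estimated error variance
        cov = sigma2 * np.linalg.inv(Yk.T @ Yk)       # OLS parameter covariance
        rel_std = np.sqrt(np.diag(cov)) / np.abs(x)   # relative std dev per parameter
        if keep.size == 1 or rel_std.max() / rel_std.min() < ratio:
            return keep, x
        keep = np.delete(keep, np.argmax(rel_std))    # cancel the least certain one

rng = np.random.default_rng(5)
Y = rng.standard_normal((200, 4))                     # synthetic full-rank regressor
x_true = np.array([5.0, -3.0, 0.0, 2.0])              # third parameter has no effect
tau = Y @ x_true + 0.01 * rng.standard_normal(200)
keep, x_ess = essential_parameters(Y, tau)
```

In this synthetic case only the parameter without influence on the torques is cancelled, so `keep` contains the indices 0, 1 and 3.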

The torque error $\rho$ in $(4)$ includes measurement error, error due to unmodelled effects (linear and non-linear) and the error from model reduction. Before reduction, only the modelling error $\rho_j$ of sample $j$ can be determined (the real torques are not known) as the mean of the squared differences between the measured torques $\tau_i$ and the corresponding estimated torques $\hat{\tau}_i$ of joint $i$:

$$ \rho_j =\frac{1}{n_\text{dofs}} \sum_{i=1}^{n_\text{dofs}}(\tau_i - \hat{\tau}_i)^2 $$

and $\rho$ being the vector of the errors for all measured samples $j$, $\rho = [\rho_1, \rho_2, \dots, \rho_N]^T$

The percentage error with regard to e.g. the maximum torque (range) is:
$$ \frac{\text{mse}}{\text{mean}(\text{torqranges})} $$

When using the QR decomposition or permutation matrices for determining which std parameters are regrouped into the base parameters, it is known which of the standard parameters belong to each of the base parameters. Having determined the indices of the essential parameters within the base parameters, and knowing the indices of the base parameters within the standard parameters, the essential parameters within the standard parameters can also be determined.
Each base parameter $b$ can correspond to a linear combination $k_1a_1+k_2a_2+\dots+k_na_n$ of $n$ standard parameters $a_1, \dots, a_n$. Each of the $a_i$ is assumed to be essential if $b$ is essential (to be inclusive; some of them can have no influence). The paper is not clear about how the essential parameters within the standard vector are determined; experiments suggest that using the dependent std columns of each essential base parameter improves the results. The weighting of these parameters is non-trivial (see code). The final estimation of the essential parameter vector is described in Gautier, 2013 and boils down to
$$\tilde{x}_\text{std}^\text{opt} = x_\text{std}^\text{cad} + V_1 \Sigma_1^{-1} U_1^T(\tau - \tau^\text{ref})$$
which determines the parameter error relative to the a priori CAD values. Here $U_1 \Sigma_1 V_1^T$ is the SVD of the regressor truncated to its non-zero singular values and $\tau^\text{ref} = Y_\text{std} x_\text{std}^\text{cad}$ are the torques predicted with the CAD parameters.

The essential parameter approach has no formal proof for the claim that potentially "more" consistent parameters are produced. It actually depends a lot on the data quality and a relative closeness of the CAD parameters to the actual ones. It is not a suitable method to use with bad or unknown data and it also increases the model error.

## Joint Excitation

A good estimation should yield physically consistent parameters and only produce small prediction errors (measured torques and simulated torques) with those estimated parameters.
Siciliano recommends sufficiently rich trajectories that should however not excite unmodelled dynamic effects like elasticity.

In order to get meaningful sample data that allows good estimation, trajectories can be generated by optimization of e.g. the prediction error or the condition number of the regressor matrix.
Swevers, Ganseman (1997) use Fourier series on all joints to generate periodic excitation and use a common pulsation (i.e. fundamental frequency) so the excitation as a whole also stays periodic (multiples of the base frequency?). This is said to make it easier to get time series measurements because averaging over periods can be done to reduce noise, the noise can be estimated, and velocities and accelerations can be calculated analytically using the FFT to get nearly noise-free values. The joint angles for the $i$-th of $n$ joints are given by $N_i$ harmonics:

$$ q_i(t) = \sum_{l=1}^{N_i} \frac{a_l^i}{\omega_f l} \text{sin}(\omega_f lt)- \frac{b_l^i}{\omega_f l} \text{cos}(\omega_f lt) + q_{i0}$$

It is therefore parameterized by the Fourier coefficients $a_l^i$ (amplitude of sin), $b_l^i$ (amplitude of cos) and $q_{i0}$ (offset on the position trajectory) for each term $l$ of each joint $i$. The velocity and acceleration follow accordingly by differentiating with respect to time.
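
A small sketch of such a trajectory for one joint, using NumPy and made-up coefficients, evaluates the position from the series above and the velocity and acceleration from its analytic derivatives. The function name and argument layout are illustrative assumptions.

```python
import numpy as np

def fourier_trajectory(t, a, b, q0, w_f):
    """Position, velocity and acceleration of one joint for Fourier coefficients a, b."""
    t = np.asarray(t, dtype=float)[:, None]         # (T, 1) time samples
    l = np.arange(1, len(a) + 1)[None, :]           # (1, N) harmonic numbers
    a = np.asarray(a, dtype=float)[None, :]
    b = np.asarray(b, dtype=float)[None, :]
    wl = w_f * l
    q = np.sum(a / wl * np.sin(wl * t) - b / wl * np.cos(wl * t), axis=1) + q0
    dq = np.sum(a * np.cos(wl * t) + b * np.sin(wl * t), axis=1)
    ddq = np.sum(-a * wl * np.sin(wl * t) + b * wl * np.cos(wl * t), axis=1)
    return q, dq, ddq

w_f = 2.0 * np.pi * 0.1                  # base pulsation for a 10 s period
t = np.linspace(0.0, 10.0, 2001)         # exactly one period
q, dq, ddq = fourier_trajectory(t, a=[0.3, -0.1], b=[0.2, 0.05], q0=0.1, w_f=w_f)
```

Because all harmonics are integer multiples of `w_f`, the trajectory repeats after one period, so `q[0]` and `q[-1]` coincide. An optimizer would vary `a`, `b` and `q0` for every joint.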

Finding the right parameters for ideal trajectories is then a non-linear optimization problem. Finding a globally optimal solution is hard as the search space can be large and the function to evaluate is complex and not analytically differentiable. The "only correct" criterion (Swevers) for experiment design is said to be the covariance matrix of the estimated model parameters, which when using a maximum-likelihood estimation is equal to $(F^T \Sigma^{-1}F)^{-1}$ with $F$ the stacked regressor matrix and $\Sigma$ the covariance matrix of the measurement noise. Any optimization needs to take the model constraints like (self-) collision, maximum velocities and torque and position limits into account.

First using a global method (like the particle swarm optimization ALPSO from pyOpt) that starts from random points and progresses with some heuristics has shown to be effective, and further local optimization (e.g. with SLSQP with gradient approximation) can possibly increase the accuracy a bit further. It is noteworthy that not setting the parameter limits carefully to keep a small search space with many feasible points in it might increase the number of necessary function evaluations very quickly. The success of only using local gradient-based methods is strongly dependent on the starting point, and from the same starting point they will always find the same solution. Also, many more function evaluations need to be done to approximate the gradient. For randomness-based methods, each run might produce solutions with higher or lower quality after a fixed amount of time.
Ding et al. (2015) use an "artificial bee" algorithm to solve the same optimization problem as in Swevers (Fourier series). Other methods based on genetic programming/artificial evolution or simulated annealing have also been used in other papers but have been difficult to try out with existing Python toolkits (not working, missing constraint support, only commercial library backends).

Another simpler method is to select random end-effector trajectory points in space and use smooth curves to move along them. Selecting them can potentially also be done using an optimization method.

It may be beneficial to initially use specifically optimized excitation trajectory experiments to get a well identified general model (that could be better than CAD) but nevertheless have task-specific and selective identification running and updating during the robot lifetime.

For that purpose, instead of generating ideal trajectories, samples from normal movement data of existing behaviors need to be selected. The criterion for adding new data to the existing dataset or not is still to be determined but could be similar to existing optimization criteria like the condition number, model or validation accuracy and controller performance (e.g. tracking errors).
Venture et al. (2009) select data from an existing dataset to identify and calculate the condition number for sub-matrices of the (base) regressor (selecting certain columns corresponding to e.g. a link) to get information about which links are excited by the data. The final measurement data is selected in such a way that all links have low values and, among all selected data blocks, two with similar patterns are avoided (to not repeat certain data too much), which in turn decreases the overall condition number.

## Retrieving torque measurements

The torque measurements should measure the deflection between link $i$ and child link $i+1$. For the position, velocity and acceleration, it is likely necessary to differentiate the velocity and possibly also the position, depending on what measurements are available. Since differentiation has the effect of high-pass filtering and therefore amplifies noise, the data should be low-pass filtered (e.g. with an appropriately designed Butterworth filter). The filter should be applied in a forward and a backward pass so that no phase shift is introduced. It is also possible to use simple median filtering to first remove some outliers.

It is often done in previous work to use multiple experimental setups identifying separate body parts (going up the chain) while fixating the rest of the body to improve the estimation accuracy. For the iCub, the arms were "cut" at the position of the 6 DOF F/T sensors so that the measurement gives the contact forces for a fixed base scenario. The limbs should move with a standardized trajectory excitation that highlights all the influencing columns and has sufficient velocities, accelerations and forces appearing. The dynamic equations don't include friction or other non-linear effects.

Using $N$ measurement samples at time instants $t_1 \dots t_N$, the regressor matrix is extended by v-stacking regressors for each measurement and the same for the torque vectors.

$$\begin{bmatrix} 
   Y(t_1)\\ 
   \vdots \\
   Y(t_N)\\
 \end{bmatrix}x = \begin{bmatrix} 
   \tau(t_1)\\ 
   \vdots \\
   \tau(t_N)\\
 \end{bmatrix} + \begin{bmatrix} 
   \left(\sum_{l \in L} J_{l}^T f_l^{ext}\right)(t_1)\\ 
   \vdots \\
   \left(\sum_{l \in L} J_{l}^T f_l^{ext}\right)(t_N)\\
 \end{bmatrix} $$
$$ x = \begin{bmatrix} 
   Y(t_1)\\ 
   \vdots \\
   Y(t_N)\\
 \end{bmatrix}^{+}\begin{bmatrix} 
   \tau(t_1)\\ 
   \vdots \\
   \tau(t_N)\\
 \end{bmatrix} + \begin{bmatrix} 
   Y(t_1)\\ 
   \vdots \\
   Y(t_N)\\
 \end{bmatrix}^{+}\begin{bmatrix} 
   \left(\sum_{l \in L} J_{l}^T f_l^{ext}\right)(t_1)\\ 
   \vdots \\
   \left(\sum_{l \in L} J_{l}^T f_l^{ext}\right)(t_N)\\
 \end{bmatrix} $$
 
 The resulting matrices should have the following dimensions ($n_\text{dof}$ - Degrees of Freedom, $n_\text{links}$ - Nr. of Links):

 $Y_1^N$: $N \cdot n_\text{dof}\times (10\cdot n_\text{links})$
 
 $\tau^N_1$: $N \cdot n_\text{dof}$
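
The stacking can be illustrated with a deliberately small example. The sketch below assumes a hypothetical single fixed-base pendulum with $\tau = I\ddot{q} + (m l) g \sin(q)$, so there is no contact term and the regressor of one sample is a $1 \times 2$ matrix; all numbers are made up and NumPy's least squares solver plays the role of the pseudoinverse.

```python
import numpy as np

g = 9.81
x_true = np.array([0.12, 0.8])    # [inertia about the joint axis, mass * COM distance]

def regressor(q, ddq):
    """1x2 regressor of a single pendulum: tau = I*ddq + (m*l)*g*sin(q)."""
    return np.array([[ddq, g * np.sin(q)]])

rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 500)
# Two frequencies so that the regressor columns are not nearly proportional
q = 0.8 * np.sin(2.0 * t) + 0.5 * np.sin(5.0 * t)
ddq = -3.2 * np.sin(2.0 * t) - 12.5 * np.sin(5.0 * t)

# v-stack one regressor per sample; the torque vector is stacked the same way
Y = np.vstack([regressor(qi, ai) for qi, ai in zip(q, ddq)])
tau = Y @ x_true + 0.01 * rng.standard_normal(len(t))   # noisy "measurements"

x_est, *_ = np.linalg.lstsq(Y, tau, rcond=None)          # equivalent to pinv(Y) @ tau
```

With 500 samples the stacked regressor has shape `(500, 2)` and the estimate lies close to the made-up true values despite the noise.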
 
 The equations only produce sensible results if there are sufficient measurements and the measurements are reasonably noise free. Otherwise, a maximum-likelihood estimator could be a good option, which models the noise with a known constant variance. Swevers describes one way of using an MLE and that it is robust against non-linear effects in the model such as modeled friction.

## Identification of difference between a priori parameters and optimal ones

Using parameters from the CAD model, it is possible to determine standard parameters that are close to the a priori knowledge (Gautier, 2013, p.1432). The claim is that if the CAD values are physically consistent (and well chosen), the determined values are optimal with regard to error norms (other methods usually increase the residual error) and the estimated parameters are physically consistent (if combined with reduced essential parameters). It turns out however that this is not necessarily true; it only holds if the parameters change very little. No guarantees can be given as there are no constraints on the values. There can be fewer physically inconsistent links though, and a combination with other methods might be worth a try.

Let $x^\text{ref}_\text{std}$ be the a priori CAD parameter vector and $\tau^\text{ref}$ be the torques estimated with these parameters by $\tau^\text{ref} = Y_\text{std}x^\text{ref}_\text{std}$.
Subtracting this from the regressor equation of the identification problem, $\tau = Y_\text{std}x_\text{std} + \rho$, we get:
$$\tau - \tau^\text{ref} = Y_\text{std} (x_\text{std} - x^\text{ref}_\text{std}) + \rho $$
$$ \iff \Delta \tau = Y_\text{std} \Delta x_\text{std} + \rho $$
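
Solving this difference equation with the pseudoinverse gives the correction $\Delta x_\text{std}$ of minimum norm, i.e. the solution closest to the a priori values. A minimal sketch with NumPy on a made-up rank-deficient regressor (all values are hypothetical and noise is left out for clarity):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 2))
# Rank-deficient toy regressor: the third column is the sum of the first two
Y_std = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
x_real = np.array([1.0, 2.0, 0.5])   # made-up "true" parameters
x_ref = np.array([0.9, 2.1, 0.5])    # made-up a priori (CAD) parameters
tau = Y_std @ x_real                 # noise-free measurements

def min_norm_solve(Y, b, rcond=1e-10):
    """Minimum-norm least-squares solution, i.e. pinv(Y) @ b."""
    return np.linalg.lstsq(Y, b, rcond=rcond)[0]

delta_tau = tau - Y_std @ x_ref                    # tau - tau_ref
x_opt = x_ref + min_norm_solve(Y_std, delta_tau)   # solution closest to x_ref
x_pinv = min_norm_solve(Y_std, tau)                # solution closest to zero
```

Both `x_opt` and `x_pinv` reproduce the torques, but `x_opt` stays at least as close to the a priori vector. Neither is guaranteed to be physically consistent.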

## Direct identification of standard parameters

(Gautier, 2013)
It is possible to modify the regressor by removing its smallest singular values and then identify the standard parameters directly. The matrix $\hat{Y}_\text{std}$ of rank $n_b$ closest to $Y_\text{std}$ with respect to the Frobenius norm is $$\hat{Y}_\text{std} = Y_\text{std} - U_{n_b+1:n_\text{std}}\Sigma_{n_b+1:n_\text{std}}{V_{n_b+1:n_\text{std}}}^T = Y_\text{std} - \sum\limits_{k=n_b+1}^{n_\text{std}}{s_k U_k V_k^T}$$
with $U \Sigma V^T = \text{SVD}(Y_\text{std})$, $n_b$ the number of base parameters, $n_\text{std}$ the number of standard parameters, $s_k$ the $k$-th value on the diagonal of $\Sigma$ and $U_k$ and $V_k$ being the $k$-th column of $U$ and $V$.
The identification with the pseudoinverse can then be done as before with the base parameters, now identifying the standard parameters directly.
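
The rank truncation can be sketched with NumPy as follows. The regressor here is synthetic (structurally rank 2, made numerically full rank by noise) and the function name is an illustrative assumption.

```python
import numpy as np

def truncate_to_rank(Y, n_b):
    """Closest rank-n_b matrix to Y in the Frobenius norm (keep n_b largest singular values)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :n_b] * s[:n_b]) @ Vt[:n_b]

rng = np.random.default_rng(4)
A = rng.standard_normal((200, 2))
Y_exact = np.column_stack([A[:, 0], A[:, 1], A[:, 0] - 2.0 * A[:, 1]])  # rank 2
Y_noisy = Y_exact + 1e-3 * rng.standard_normal(Y_exact.shape)           # numerically rank 3
Y_hat = truncate_to_rank(Y_noisy, n_b=2)

x_true = np.array([1.0, 0.5, -0.3])                 # made-up standard parameters
tau = Y_exact @ x_true
# Minimum-norm solution; rcond discards the singular value that was set to zero
x_hat = np.linalg.lstsq(Y_hat, tau, rcond=1e-8)[0]
s_noisy = np.linalg.svd(Y_noisy, compute_uv=False)
```

The Frobenius distance between `Y_noisy` and `Y_hat` equals the discarded singular value `s_noisy[2]`, and `x_hat` is the minimum-norm standard parameter vector that fits the torques.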


## Optimization with constraints

Since the obtained parameters minimize the error norm but otherwise have no constraints, the base parameters might be very close to the "real" ones but also can vary depending on an ill-conditioned regressor and noise. The resulting standard parameters can likely be far off the "real" ones. Even if the base parameters are estimated perfectly (as seen in simulation), the non-unique choice (within null-space of base solution space) through using the inverse base projection matrix back onto the standard solution space will usually result in a solution that is not physically consistent and out of other bounds.
Different authors (Yoshida, Gautier 2000; Ayusawa, Nakamura 2010) suggested to use non-linear optimization methods to find a solution within non-linear or linear constraints that is physically consistent and close to the identified solution. It is possible to solve the classic OLS problem at the same time as the constraints with these formulations, although often a standard solution is found first and then constraints are enforced, increasing the residual error again. Using convex optimization (LMI-SDP) with linear constraints in terms of the base parameters finds the optimal physically consistent solution (Sousa, 2014). With the same method it is also possible to directly identify the standard parameters and formulate direct constraints on them, seemingly without any disadvantage over working within the base space other than more variables and possibly higher optimization complexity. As SDP finds a globally optimal solution, no previous solution e.g. from CAD is necessary. Performance is also said to be very good and usable for high degrees of freedom.

It is possible that a solution in the base parameter space does not have any corresponding consistent set of standard space parameters. A separate correction method (also using SDP) that moves such a solution to the closest one for which a consistent standard solution exists is also given in (Sousa, 2014).

Another method by (Traversaro, 2016) also takes the triangle inequality into account while doing non-linear optimization with a special solver working on manifolds (or also formulating it for general solvers). The method is likely not as performant as using SDP and does not find the global solution. However, it is also claimed to always find a physically consistent solution. Possibly starting at the CAD parameters will improve that. It is not completely clear, also to the authors, if the triangle inequality needs to hold for control or simulation (it is maybe not that important), but it is nonetheless a strict property of physically correct inertia tensors.

In general, optimization should minimize the residual error, while optionally minimizing the error between $x_\text{cad}$ and the solution that we look for $x_\text{opt}$ while constraining the single parameters to the feasibility criterion.

### Feasibility constraints as LMI

Following the notation of (Yoshida and Khalil, 2000) and (Sousa, 2014), the linear constraints for each link to be physically consistent can be given as follows: 

$$\begin{equation}
  \left \{
  \begin{aligned}
    &m_k > 0 \\
    &I_k \succ 0
  \end{aligned} \right.
\end{equation} 
$$
with $m_k$ the mass of link $k$ and $I_k$ the inertia tensor of link $k$ around its center of mass.
As for the linear regression, to rewrite this as an LMI the inertia is expressed relative to the link frame as $L_k$ instead of $I_k$.

$$\begin{equation}L_k = I_k + m_k S(r_k)^T S(r_k)\end{equation}$$
with $S(\cdot)$ being the skew-symmetric matrix operator

$$\begin{equation}
 S(x) = \left [
  \begin{matrix}
    0 & -x_3 & x_2 \\
    x_3 & 0 & -x_1 \\
    -x_2 & x_1 & 0
  \end{matrix} \right ], \,
\text{for } x = \left [
  \begin{matrix}
    x_1 \\
    x_2 \\
    x_3 
  \end{matrix} \right ]
\end{equation} $$
  
Furthermore, for each link $k$ instead of using the vector $r_k = \left [
  \begin{matrix}
    r_{k,x} \\
    r_{k,y} \\
    r_{k,z}
  \end{matrix} \right ]$ which is the center-of-mass relative to the link-frame,
the first moment-of-inertia vector $l_k$ given by
$$l_k = \left [ \begin{matrix}
    l_{k,x} \\
    l_{k,y} \\
    l_{k,z}
  \end{matrix} \right ]
  = m_k r_k = \left [ \begin{matrix}
    m_k r_{k,x} \\
    m_k r_{k,y} \\
    m_k r_{k,z}
  \end{matrix} \right ] $$
is used. Substituting, transposing and using the linearity of $S(\cdot)$, we get
$$\begin{eqnarray}
    L_k & = & I_k + m_k S(\frac{l_k}{m_k})^T S(\frac{l_k}{m_k}) \\
    I_k & = & L_k - \frac{1}{m_k} S(l_k)^T S(l_k)
\end{eqnarray} $$

The feasibility constraints can hence be written as: 
$$\begin{eqnarray}
  \left \{
  \begin{aligned}
    &m_k > 0 \\
    &L_k - \frac{1}{m_k} S(l_k)^T S(l_k) \succ 0
  \end{aligned} \right.
\end{eqnarray} 
$$
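The feasibility test for a single link can be sketched numerically as follows. This is a minimal NumPy illustration, not part of the original method description: the function names `skew` and `is_feasible` and the example link values are assumptions made up for the sketch.

```python
import numpy as np

def skew(x):
    """Skew-symmetric matrix S(x), so that S(x) @ y equals np.cross(x, y)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def is_feasible(m, l, L):
    """Check m > 0 and L - (1/m) S(l)^T S(l) > 0 for one link.

    m: mass, l: first moment vector (m * r), L: inertia about the link frame.
    """
    if m <= 0:
        return False
    I_com = L - skew(l).T @ skew(l) / m
    return bool(np.all(np.linalg.eigvalsh(I_com) > 0.0))

# Hypothetical link: mass, COM offset and inertia about the COM
m = 2.0
r = np.array([0.1, -0.05, 0.2])
I_com = np.diag([0.02, 0.03, 0.01])

# Link-frame quantities as they appear in the parameter vector
l = m * r
L = I_com + m * skew(r).T @ skew(r)

feasible = is_feasible(m, l, L)          # consistent set of parameters
infeasible = is_feasible(m, 3.0 * l, L)  # first moment too large for this L
print(feasible, infeasible)
```

Scaling the first moment while keeping $L_k$ fixed makes the recovered $I_k$ indefinite, which is exactly the kind of solution the constraint is meant to exclude.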

The set of all dynamic parameter vectors that are physically feasible with
respect to the inertial parameters (L) can be defined as
$$D_L = \left\{ x \in \mathbb{R}^n : m_k>0, \; L_k - \frac{1}{m_k} S(l_k)^T S(l_k)\succ 0, \; k=1,\dots,N \right \}$$

(TODO: formulate the actual D-Blocks with Schur complement etc. for the SDP LMI)
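As a hedged sketch of such a block: by the standard Schur complement argument, the two conditions per link are equivalent to a single matrix inequality that is linear in $m_k$, $l_k$ and $L_k$. The symbol $D_k$ and the notation $\mathbf{1}_{3}$ for the $3 \times 3$ identity matrix are chosen here for illustration.

$$\begin{equation}
D_k(x) = \left [
  \begin{matrix}
    L_k & S(l_k)^T \\
    S(l_k) & m_k \mathbf{1}_{3}
  \end{matrix} \right ] \succ 0
  \quad \Longleftrightarrow \quad
  m_k > 0 \;\text{ and }\; L_k - \frac{1}{m_k} S(l_k)^T S(l_k) \succ 0
\end{equation}$$

The equivalence follows from taking the Schur complement of the lower-right block $m_k \mathbf{1}_{3}$. Under this assumption, the feasible set can be written with one LMI per link, $D_L = \left\{ x \in \mathbb{R}^n : D_k(x) \succ 0, \; k=1,\dots,N \right\}$.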

Further constraints can be added easily to the constraint matrix, e.g. for each link's mass (e.g. known or close to CAD mass), global mass (after weighing the whole robot), COM position within bounds (e.g. inside a given mesh hull) or parameter symmetry (manual or automatically derived from CAD params).

### Constrained OLS as SDP optimization problem

$\newcommand\norm[1]{\left\lVert#1\right\rVert}$

The optimization problem to find the constrained base solution $x_{base}$ is
\begin{equation*}
\begin{aligned}
& \underset{(u,x_{base})}{\text{minimize}} & & u \\
& \text{subject to}
& & u \ge \norm{\tau-Y_{base}x_{base}}^2 \\
\end{aligned}
\end{equation*}

Using the Schur complement, the constraint can be written in SDP form:
\begin{equation*}
\begin{aligned}
& \norm{\tau-Y_{base}x_{base}}^2 \le u \\
\iff \; & u - (\tau-Y_{base}x_{base})^T(\tau-Y_{base}x_{base}) \ge 0 \\
\iff \; & U_{\tau}(u,x_{base}) \succeq 0
\end{aligned}
\end{equation*}

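with the following matrix, in which the lower-right block $1$ denotes the identity matrix of the same dimension as $\tau$: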
$$U_{\tau}(u,x_{base}) = \left [
  \begin{matrix}
    u & (\tau - Y_{base}x_{base})^T \\
    \tau - Y_{base}x_{base} & 1 \\
  \end{matrix} \right ]
$$
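The Schur complement equivalence can be checked numerically. The following is a small NumPy sketch with random stand-in data (the regressor, parameters and torques are not from a real robot, and the helper names `U_tau` and `is_psd` are hypothetical): the block matrix is positive semidefinite exactly when $u$ is at least the squared residual norm.

```python
import numpy as np

def U_tau(u, residual):
    """Build the block matrix [[u, r^T], [r, I]] for a residual vector r."""
    n = residual.size
    U = np.empty((n + 1, n + 1))
    U[0, 0] = u
    U[0, 1:] = residual
    U[1:, 0] = residual
    U[1:, 1:] = np.eye(n)
    return U

def is_psd(M, tol=1e-9):
    """Positive semidefiniteness test via the smallest eigenvalue."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

# Random stand-ins for Y_base, x_base and tau
rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 3))
x = rng.standard_normal(3)
tau = rng.standard_normal(8)

r = tau - Y @ x
sq_norm = float(r @ r)

above = is_psd(U_tau(sq_norm + 0.1, r))  # u larger than ||r||^2
below = is_psd(U_tau(sq_norm - 0.1, r))  # u smaller than ||r||^2
print(above, below)
```

Note that the matrix has $1 + \dim(\tau)$ rows, so its size grows with the number of stacked measurement samples.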

The SDP problem then is
\begin{equation*}
\begin{aligned}
& \underset{(u,x_{base})}{\text{minimize}} & & u \\
& \text{subject to}
& & U_{\tau}(u,x_{base}) \succeq 0 \\
\end{aligned}
\end{equation*}

This can already be solved as an SDP, but the matrix $U_{\tau}$ has as many rows as $\tau$ has entries (plus one), so the problem becomes large for many samples and higher numbers of DOF.
Using the QR decomposition, the problem can be simplified further.

Given the QR decomposition of the (base) regressor, $Y_{base} = QR = \left [
  \begin{matrix}Q_1 \, Q_2 \end{matrix} \right ]
  \left [  \begin{matrix}R_1 \\ 0 \end{matrix} \right ] = Q_1 R_1$,
and knowing that $Q$ is orthogonal, the error $\epsilon = \tau - Y_{base}x_{base}$ can be multiplied with $Q^T$ without changing its norm
$$ (Q^T \epsilon)^T (Q^T\epsilon) = \epsilon^TQQ^T\epsilon = \epsilon^T \epsilon = {\norm{\epsilon}}^2 $$

Written out, the optimization objective is
$$ {\norm{\epsilon}}^2 = {\norm{Q^T \tau -Q^T Y_{base} x_{base}}}^2 = {\norm{\left [
  \begin{matrix}Q_1^T \\ Q_2^T \end{matrix} \right ] \tau - \left [  \begin{matrix}R_1 \\ 0 \end{matrix} \right ] x_{base}}}^2 $$

Defining $\rho = \left [\begin{matrix}\rho_1 \\ \rho_2 \end{matrix} \right ] = \left [
  \begin{matrix}Q_1^T\tau \\ Q_2^T\tau \end{matrix} \right ]$

the error can be written as:

$$\norm{\epsilon}^2 = {\norm{\left [\begin{matrix}\rho_1 \\ \rho_2 \end{matrix} \right ] - \left [  \begin{matrix}R_1 \\ 0 \end{matrix} \right ] x_{base}}}^2 = \norm{\rho_2}^2 + \norm{\rho_1 - R_1 x_{base}}^2$$

Hence, since $\norm{\rho_2}^2$ does not depend on $x_{base}$, the constraint of the optimization problem becomes
$$ u - \norm{\rho_2}^2 \ge \norm{\rho_1 - R_1 x_{base}}^2 $$

An SDP can then be formulated that finds the globally optimal feasible solution
\begin{equation*}
\begin{aligned}
& \underset{(u, x_{base})}{\text{minimize}} & & u \\
& \text{subject to}
& & U_{\rho_1}(u, x_{base}) \succeq 0 \\
\end{aligned}
\end{equation*}

with 
$$ U_{\rho_1}(u,x_{base}) = \left [
  \begin{matrix}
    (u - \norm{\rho_2}^2) & (\rho_1 - R_1 x_{base})^T\\
    \rho_1 - R_1 x_{base} & 1 \\
  \end{matrix} \right ] $$
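The QR reduction can be verified with a short NumPy sketch. The data below is random and only stands in for $Y_{base}$ and $\tau$; the variable names are chosen for the illustration. It checks that the full squared error equals $\norm{\rho_2}^2 + \norm{\rho_1 - R_1 x_{base}}^2$ and that, without feasibility constraints, the minimum $u = \norm{\rho_2}^2$ is attained at $x_{base} = R_1^{-1}\rho_1$.

```python
import numpy as np

# Random stand-ins for the base regressor and the measured torques
rng = np.random.default_rng(1)
n_samples, n_base = 20, 4
Y = rng.standard_normal((n_samples, n_base))
tau = rng.standard_normal(n_samples)
x = rng.standard_normal(n_base)  # arbitrary candidate solution

# Full QR decomposition: Q is (n_samples x n_samples), R is (n_samples x n_base)
Q, R = np.linalg.qr(Y, mode="complete")
Q1, Q2 = Q[:, :n_base], Q[:, n_base:]
R1 = R[:n_base, :]
rho1, rho2 = Q1.T @ tau, Q2.T @ tau

# Squared error in the original and in the reduced form
full = float(np.sum((tau - Y @ x) ** 2))
reduced = float(rho2 @ rho2 + np.sum((rho1 - R1 @ x) ** 2))

# Unconstrained minimizer of the reduced problem
x_ols = np.linalg.solve(R1, rho1)
min_error = float(np.sum((tau - Y @ x_ols) ** 2))
print(full, reduced, min_error)
```

The reduced LMI $U_{\rho_1}$ only has $1 + n_{base}$ rows, independent of the number of samples.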


### SDP with contact forces

If the regressor includes the rows for the base link dynamics, then the above formulation will also work for floating base dynamics. However, the contact forces also have to be included.

Starting from the previous OLS optimization problem formulation, the contact forces are added in a straightforward way.

$$ U_{\tau}(u,x_{base}) = \left [
  \begin{matrix}
    u & (\tau - (Y_{base}x_{base} - \sum_{l \in L} J_{l}^T f_l^{ext}))^T \\
    \tau - (Y_{base}x_{base}-\sum_{l \in L} J_{l}^T f_l^{ext}) & 1 \\
  \end{matrix} \right ] $$

Then, doing the same QR reduction step as above and splitting the projected contact term into its $Q_1$ and $Q_2$ parts,

$$ \begin{align}
\norm{\epsilon}^2
& = \norm{Q^T \tau - Q^T (Y_{base}x_{base} - \sum_{l \in L} J_{l}^T f_l^{ext})}^2 \\
& = \norm{\left[
  \begin{matrix}Q_1^T \\ Q_2^T \end{matrix} \right] \tau - \left(\left[ \begin{matrix}R_1 \\ 0 \end{matrix} \right] x_{base} - Q^T\sum_{l \in L} J_{l}^T f_l^{ext} \right)}^2 \\
& = \norm{\left [\begin{matrix}\rho_1 \\ \rho_2 \end{matrix} \right ] - \left(\left[ \begin{matrix}R_1 \\ 0 \end{matrix} \right] x_{base} - \left[
  \begin{matrix}Q_1^T \\ Q_2^T \end{matrix} \right]\sum_{l \in L} J_{l}^T f_l^{ext} \right)}^2 \\
& = \norm{\rho_2 + Q_2^T\sum_{l \in L} J_{l}^T f_l^{ext}}^2 + \norm{\rho_1 - \left( R_1 x_{base} - Q_1^T\sum_{l \in L} J_{l}^T f_l^{ext} \right) }^2
\end{align}$$

we end up with

$$ U_{\rho_1}(u,x_{base}) = \left [
  \begin{matrix}
    (u - \norm{\rho_2 + Q_2^T\sum_{l \in L} J_{l}^T f_l^{ext}}^2) & (\rho_1 - \left( R_1 x_{base} - Q_1^T\sum_{l \in L} J_{l}^T f_l^{ext} \right))^T\\
    \rho_1 - \left( R_1 x_{base} - Q_1^T\sum_{l \in L} J_{l}^T f_l^{ext} \right) & 1 \\
  \end{matrix} \right ] $$

for the unconstrained OLS optimization problem.
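One way to check the reduction with contact forces numerically is to fold the known contact term into the measured torques before projecting with $Q^T$. The sketch below uses random data; the vector `w` is a hypothetical stand-in for the stacked term $\sum_{l \in L} J_{l}^T f_l^{ext}$, and the names `rho1_c`, `rho2_c` are chosen for the illustration.

```python
import numpy as np

# Random stand-ins; w plays the role of the stacked sum_l J_l^T f_l^ext
rng = np.random.default_rng(2)
n_samples, n_base = 18, 3
Y = rng.standard_normal((n_samples, n_base))
tau = rng.standard_normal(n_samples)
w = rng.standard_normal(n_samples)
x = rng.standard_normal(n_base)  # arbitrary candidate solution

Q, R = np.linalg.qr(Y, mode="complete")
R1 = R[:n_base, :]

# Project torques plus contact term, since tau - (Y x - w) = (tau + w) - Y x
rho_c = Q.T @ (tau + w)
rho1_c, rho2_c = rho_c[:n_base], rho_c[n_base:]

full = float(np.sum((tau - (Y @ x - w)) ** 2))
reduced = float(rho2_c @ rho2_c + np.sum((rho1_c - R1 @ x) ** 2))
print(full, reduced)
```

Because the contact term is measured and does not depend on $x_{base}$, it only shifts the constant vectors in the reduced problem and leaves $R_1$ unchanged.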

### SDP in Standard parameter space

Formulating the optimization problem directly in standard space yields physically consistent solutions that are optimal in terms of the residual prediction error. The solutions are, however, not unique because the standard space has redundant dimensions: there is an $n$-dimensional solution space in which all solutions produce the same torque prediction. If the used base solution is not physically consistent, this may not find an optimal solution or may not find a solution at all (for numerical reasons).
In that case, it is possible to first find a feasible base solution and then a corresponding standard solution in order to obtain any solution that is consistent. That solution will then have an increased residual error.

We use $B$ again as the projection matrix that combines the standard columns to the base columns.

\begin{equation*}
\begin{aligned}
& \underset{(u, x_{std})}{\text{minimize}} & & u \\
& \text{subject to}
& & U_{\tau}(u, x_{std}) \succeq 0 \\
\end{aligned}
\end{equation*}

with 
$$ U_{\tau}(u,x_{std}) = \left [
  \begin{matrix}
    u & (\tau - Y_{base} B x_{std})^T \\
    \tau - Y_{base} B x_{std} & 1 \\
  \end{matrix} \right ] $$

which, after the QR reduction and adding contact forces (with the projected contact term split into its $Q_1$ and $Q_2$ parts), becomes

$$ U_{\rho_1}(u, x_{std}) = \left [
  \begin{matrix}
    (u - \norm{\rho_2 + Q_2^T\sum_{l \in L} J_{l}^T f_l^{ext}}^2) & (\rho_1 - \left( R_1 Bx_{std} - Q_1^T\sum_{l \in L} J_{l}^T f_l^{ext} \right))^T\\
    \rho_1 - \left( R_1 Bx_{std} - Q_1^T\sum_{l \in L} J_{l}^T f_l^{ext} \right) & 1 \\
  \end{matrix} \right ] $$
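The non-uniqueness in standard space can be illustrated with a small NumPy sketch. The matrices are random stand-ins (a real $B$ comes from the base parameter computation), and the dimensions are assumptions for the example: any vector in the null space of $B$ can be added to $x_{std}$ without changing the predicted torques.

```python
import numpy as np

# Random stand-ins: B maps standard parameters to base parameters
rng = np.random.default_rng(3)
n_samples, n_base, n_std = 30, 5, 10
B = rng.standard_normal((n_base, n_std))
Y_base = rng.standard_normal((n_samples, n_base))
x_std = rng.standard_normal(n_std)

# Basis of the null space of B from the SVD (B has full row rank here)
_, _, Vt = np.linalg.svd(B)
null_basis = Vt[n_base:].T  # shape (n_std, n_std - n_base)

# Move along the null space: the standard parameters change, the torques do not
z = rng.standard_normal(n_std - n_base)
x_std_alt = x_std + null_basis @ z

tau_pred = Y_base @ B @ x_std
tau_pred_alt = Y_base @ B @ x_std_alt
print(np.linalg.norm(x_std_alt - x_std), np.linalg.norm(tau_pred_alt - tau_pred))
```

This is why additional criteria, such as the feasibility constraints or closeness to the CAD parameters, are needed to select one standard solution.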

## References

(Notation mostly from here http://www.scholarpedia.org/article/Robot_dynamics)

Swevers, Ganseman, et al., 1997: Optimal Robot Excitation and Identification

Ting, Mistry, Peters, et al., 2006: A Bayesian Approach to Nonlinear Parameter Identification for Rigid Body Dynamics

Ding, Wu, et al., 2015: Dynamic Model Identification for 6-DOF Industrial Robots, http://www.hindawi.com/journals/jr/2015/471478/

Khosla, 1985: Parameter Identification of Robot Dynamics

Siciliano, Sciavicco, et al., 2009: Robotics: Modelling, Planning and Control, Springer

Traversaro, Del Prete, et al., 2015: Inertial parameters identification and joint torques estimation with proximal force/torque sensing

Gautier, 2013: Identification of Consistent Standard Dynamic Parameters of Industrial Robots

Gautier, 1990: Numerical Calculation of the base Inertial Parameters of Robots

Sousa, Cortesão, 2014: Physical feasibility of robot base inertial parameter identification: A linear matrix inequality approach