## <span style="color:#a4d4a3">**Least Squares**</span>

The <span style="color:#ffa500">**Least Squares**</span> method is one of the cornerstones of modern estimation and optimization in robotics.  
It provides a simple yet powerful way to find the state that best fits a set of noisy measurements.

Least Squares tries to <span style="color:#ffa500">**minimize the difference**</span> between what we <span style="color:#ffa500">**measure**</span> and what we <span style="color:#ffa500">**expect to measure**</span>.

- The method itself is old (it goes back to Gauss and Legendre), but it was long considered too computationally expensive for large systems.  

- With the rise of efficient solvers (GTSAM, g2o, and iSAM2) and sparse linear algebra in the 2010s, it made a strong comeback in SLAM and computer vision.  

- Today, it is the foundation of most <span style="color:#ffa500">**graph-based SLAM**</span>, <span style="color:#ffa500">**bundle adjustment**</span>, and <span style="color:#ffa500">**trajectory optimization**</span> techniques.


---


## 🌐 <span style="color:#a4d4a3"> **Least Squares in General** </span>

Least Squares is an approach for computing a solution to an <span style="color:#ffa500">**overdetermined**</span> system (*“more equations than unknowns”*).  

<span style="color:#00703c">**Goal:**</span>

- Minimize the <span style="color:#ffa500">**sum of squared errors**</span> in the equations.  

Standard approach for a large set of problems.

<span style="color:#00703c"> **Example:** </span>

- In <span style="color:#ffa500">**regression models**</span>, Least Squares is used to find the line or curve that best fits a set of observed data.  
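As a minimal sketch of the regression case, the snippet below fits a line to five made-up data points with NumPy's `np.linalg.lstsq`. The data values and variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Made-up noisy observations that roughly follow y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Overdetermined system A @ [slope, intercept] = y (5 equations, 2 unknowns)
A = np.column_stack([x, np.ones_like(x)])

# lstsq returns the parameters that minimize the sum of squared residuals
params, residual_sum, rank, _ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = params
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

No line passes through all five points, so the solver returns the line with the smallest sum of squared vertical distances to them.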

### 🧩 <span style="color:#a4d4a3"> **Problem Definition** </span>

Given a system described by a set of $n$ observation functions $\{f_i(\mathbf{x})\}_{i=1:n}$

<span style="color:#00703c">**Let:**</span>

- $\mathbf{x} \;\;$ be the <span style="color:#ffa500">**state vector**</span>,

- $\mathbf{z}_i \;\;$ be a <span style="color:#ffa500">**measurement**</span> of the state $\mathbf{x}$,

- $\hat{\mathbf{z}}_i = f_i(\mathbf{x}) \;\;$ be a function which maps $\mathbf{x}$ to a <span style="color:#ffa500">**predicted measurement**</span> $\hat{\mathbf{z}}_i$.

<span style="color:#00703c">**Given:**</span>

- $n$ <span style="color:#ffa500">**noisy measurements**</span> $\mathbf{z}_{1:n}$ about the state $\mathbf{x}$.

<span style="color:#00703c">**Goal:**</span>

- Estimate the state $\mathbf{x}$ which <span style="color:#ffa500">**best explains the measurements**</span> $\mathbf{z}_{1:n}$.

---

### 🚨 <span style="color:#a4d4a3"> **Error Function** </span>

The <span style="color:#ffa500">**error**</span> $\mathbf{e}_i$ is typically the <span style="color:#ffa500">**difference**</span> between the <span style="color:#ffa500">**predicted**</span> and <span style="color:#ffa500">**actual**</span> measurement:

$$
\mathbf{e}_i(\mathbf{x}) = \mathbf{z}_i - f_i(\mathbf{x})
$$

- We assume the error has <span style="color:#ffa500">**zero mean**</span> and is <span style="color:#ffa500">**normally distributed**</span>.

- Its uncertainty is described by the <span style="color:#ffa500">**information matrix**</span> $\mathbf{\Omega}_i$ (the inverse of the measurement covariance).

- The <span style="color:#ffa500">**squared error**</span> of a measurement depends only on the state and is a scalar:

$$
e_i(\mathbf{x}) = \mathbf{e}_i(\mathbf{x})^{T}\,\mathbf{\Omega}_i\,\mathbf{e}_i(\mathbf{x})
$$
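To make the weighting concrete, here is a small sketch that evaluates the scalar squared error for one 2D measurement. The measurement values, the standard deviations, and the helper name `squared_error` are assumptions for illustration.

```python
import numpy as np

def squared_error(z, z_hat, omega):
    """Scalar e_i = e^T Omega e with error vector e = z - z_hat."""
    e = z - z_hat
    return float(e @ omega @ e)

z = np.array([2.0, 1.0])          # actual measurement (made up)
z_hat = np.array([1.5, 1.2])      # predicted measurement f_i(x) (made up)
sigma = np.array([0.5, 0.1])      # assumed per-axis standard deviations
omega = np.diag(1.0 / sigma**2)   # information matrix = inverse covariance

value = squared_error(z, z_hat, omega)
print(value)
```

The residual on the second axis is smaller (0.2 versus 0.5), yet it contributes more to the total because that axis is assumed to be measured more precisely.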


### 📉 <span style="color:#a4d4a3">**Find the Minimum**</span>

Find the state $\mathbf{x}^\star$ which  <span style="color:#ffa500">**minimizes the error**</span> given all measurements:

$$
\begin{aligned}
\mathbf{x}^* &= \arg\min_\mathbf{x} F(\mathbf{x}) \\
             &= \arg\min_\mathbf{x} \sum_i e_i(\mathbf{x}) \\
             &= \arg\min_\mathbf{x} \sum_i \mathbf{e}_i(\mathbf{x})^T \, \mathbf{\Omega}_i \, \mathbf{e}_i(\mathbf{x}).
\end{aligned}
$$

where:

- $\mathbf{\Omega}_i$ encodes our <span style="color:#ffa500">**uncertainty**</span> in the measurements: the more certain a measurement, the larger its weight.

- A general solution is to <span style="color:#ffa500">**differentiate the global error function**</span> and find the zeros of the derivative.

- In general this is complex with <span style="color:#ffa500">**no closed-form**</span> solution.

   **↳** Use <span style="color:#ffa500">**Numerical Approaches**</span>.

#### 🤔 <span style="color:#a4d4a3">Assumptions</span>

- A <span style="color:#ffa500">**good initial guess**</span> is available.  

- The error functions are <span style="color:#ffa500">***“smooth”***</span> in the neighborhood of the *(hopefully)* global minimum.

   **↳** Then we can solve by <span style="color:#ffa500">**iterative local linearization**</span>.


#### <span style="color:#a4d4a3">Solve via Iterative Local Linearizations</span>

1. <span style="color:#ffa500">**Linearize**</span> the error terms around the current solution (<span style="color:#ffa500">**initial guess**</span>).

2. Compute the <span style="color:#ffa500">**first derivative**</span> of the squared error function.

3. Set it to <span style="color:#ffa500">**zero**</span> and solve a <span style="color:#ffa500">**linear system**</span>.

4. Obtain the <span style="color:#ffa500">**new state**</span> (hopefully closer to the minimum).

5. <span style="color:#ffa500">**Iterate.**</span>

#### <span style="color:#a4d4a3">Linearize the Error Function</span>

Approximate the error functions <span style="color:#ffa500">**around an initial guess**</span> $\mathbf{x}$ via a Taylor expansion:

$$
\mathbf{e}_i(\mathbf{x} + \Delta \mathbf{x}) \approx \mathbf{e}_i + \mathbf{J}_i(\mathbf{x})\,\Delta \mathbf{x},
$$

where:

- $\mathbf{J}_i$ is the Jacobian of $\mathbf{e}_i$ w.r.t. $\mathbf{x}$.
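The linearization can be checked numerically. The sketch below assumes a hypothetical range measurement to a known landmark, $f(\mathbf{x}) = \lVert \mathbf{x} - \mathbf{l} \rVert$, which is not part of the text above; the landmark position, the measured range, and the function names are illustrative.

```python
import numpy as np

landmark = np.array([4.0, 3.0])   # hypothetical known landmark position
z = 5.2                           # hypothetical measured range

def error(x):
    """e(x) = z - f(x) with f(x) = distance from x to the landmark."""
    return np.array([z - np.linalg.norm(x - landmark)])

def jacobian(x):
    """Jacobian of e w.r.t. x, a 1x2 matrix."""
    d = x - landmark
    return (-d / np.linalg.norm(d)).reshape(1, 2)

x = np.array([0.0, 0.0])          # linearization point
dx = np.array([0.05, -0.02])      # small increment

exact = error(x + dx)
linear = error(x) + jacobian(x) @ dx
gap = abs(exact[0] - linear[0])
print(exact[0], linear[0], gap)
```

For a small increment the two values agree closely; the gap grows as $\Delta\mathbf{x}$ gets larger, which is why the method needs a good initial guess.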

#### <span style="color:#a4d4a3">Squared Error</span>

- With the previous linearization, we fix $\mathbf{x}$ and carry out the <span style="color:#ffa500">**minimization in the increments**</span> $\Delta \mathbf{x}$. 

- We <span style="color:#ffa500">**substitute**</span> the Taylor expansion into the squared error terms as follows:

Let:

$$
\mathbf{e}_i(\mathbf{x}) = \mathbf{z}_i - f_i(\mathbf{x}),\qquad
e_i(\mathbf{x}) = \mathbf{e}_i(\mathbf{x})^{T}\,\mathbf{\Omega}_i\, \mathbf{e}_i(\mathbf{x}),\qquad
F(\mathbf{x})=\sum_i e_i(\mathbf{x}).
$$

Linearize:
$$
\mathbf{e}_i(\mathbf{x}+\Delta \mathbf{x}) \approx \mathbf{e}_i(\mathbf{x}) + \mathbf{J}_i\,\Delta \mathbf{x}.
$$

Substitute into $e_i$:

$$
\begin{aligned}
e_i(\mathbf{x}+\Delta\mathbf{x}) &\approx \big(\mathbf{e}_i + \mathbf{J}_i\,\Delta \mathbf{x}\big)^{T}\,\mathbf{\Omega}_i\,\big(\mathbf{e}_i + \mathbf{J}_i\,\Delta \mathbf{x}\big) \\
&= \mathbf{e}_i^T\;\mathbf{\Omega}_i\;\mathbf{e}_i + \Delta \mathbf{x}^T\;\mathbf{J}_i^T\;\mathbf{\Omega}_i\;\mathbf{e}_i + \mathbf{e}_i^T\;\mathbf{\Omega}_i\;\mathbf{J}_i\;\Delta\mathbf{x} + \Delta\mathbf{x}^T \;\mathbf{J}_i^T\;\mathbf{\Omega}_i\;\mathbf{J}_i\;\Delta\mathbf{x}\\
&= \underbrace{\mathbf{e}_i^T\;\mathbf{\Omega}_i\;\mathbf{e}_i}_{={c}_i} + \Delta\mathbf{x}^T \underbrace{\;\mathbf{J}_i^T\;\mathbf{\Omega}_i\;\mathbf{J}_i}_{=\mathbf{H}_i}\;\Delta\mathbf{x} + 2\underbrace{\;\mathbf{e}_i^T\;\mathbf{\Omega}_i\;\mathbf{J}_i}_{=\mathbf{b}_i^T}\;\Delta\mathbf{x}
\end{aligned}
$$

The two middle terms of the second line are scalars and transposes of each other (because $\mathbf{\Omega}_i$ is symmetric), so they combine into $2\,\mathbf{b}_i^T\,\Delta\mathbf{x}$.

Hence

$$
e_i(\mathbf{x}+\Delta\mathbf{x}) \approx c_i + 2\,\mathbf{b}_i^T\;\Delta\mathbf{x} + \Delta\mathbf{x}^T \;\mathbf{H}_i\;\Delta\mathbf{x}
$$


#### <span style="color:#a4d4a3">Global Error</span>

- The global error is the <span style="color:#ffa500">**sum of the squared error**</span> terms corresponding to the individual measurements.

- Form a new expression which approximates the global error in the <span style="color:#ffa500">**neighborhood of the current solution**</span> $\mathbf{x}$:

From $\:\: F(\mathbf{x})=\sum_i e_i(\mathbf{x})$, $\;$ we get:

$$
\begin{aligned}
F(\mathbf{x} + \Delta\mathbf{x}) &\approx \sum_i \Big( c_i \;+\; 2\,\mathbf{b}_i^T\;\Delta\mathbf{x} \;+\; \Delta\mathbf{x}^T \;\mathbf{H}_i\;\Delta\mathbf{x}\Big) \\
& = \underbrace{\sum_i c_i}_{c} + 2\;(\underbrace{\sum_i \mathbf{b}_i}_{\mathbf{b}})^T\Delta\mathbf{x} + \Delta \mathbf{x}^T\Big(\underbrace{\sum_i \mathbf{H}_i}_{\mathbf{H}}\Big)\Delta\mathbf{x} \\
& = c + 2\mathbf{b}^T\Delta\mathbf{x} + \Delta\mathbf{x}^T\mathbf{H}\Delta\mathbf{x}
\end{aligned}
$$

with:

$$
\begin{aligned}
\mathbf{b}^T &= \sum_i \mathbf{e}_i^T\mathbf{\Omega}_i\mathbf{J}_i \\
\mathbf{H} &= \sum_i \mathbf{J}_i^T\mathbf{\Omega}_i\mathbf{J}_i
\end{aligned}
$$
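The accumulation of $c$, $\mathbf{b}$ and $\mathbf{H}$ can be sketched as a loop over the individual terms. The helper name `build_system` and the two example terms are made up for illustration; the final lines check that the quadratic form reproduces the sum of the linearized squared errors.

```python
import numpy as np

def build_system(terms):
    """Accumulate c, b, H from (e_i, J_i, Omega_i) triples."""
    n = terms[0][1].shape[1]              # state dimension
    c, b, H = 0.0, np.zeros(n), np.zeros((n, n))
    for e, J, omega in terms:
        c += float(e @ omega @ e)         # c_i = e_i^T Omega_i e_i
        b += J.T @ omega @ e              # b_i = J_i^T Omega_i e_i
        H += J.T @ omega @ J              # H_i = J_i^T Omega_i J_i
    return c, b, H

# Two made-up terms for a 2D state: one scalar and one 2D measurement
terms = [
    (np.array([0.2]), np.array([[0.8, 0.6]]), np.array([[4.0]])),
    (np.array([0.1, -0.3]), -np.eye(2), np.diag([1.0, 2.0])),
]
c, b, H = build_system(terms)

dx = np.array([0.05, -0.02])
quadratic = c + 2 * b @ dx + dx @ H @ dx
# Same quantity obtained by summing the linearized squared errors directly
direct = sum(float((e + J @ dx) @ om @ (e + J @ dx)) for e, J, om in terms)
print(quadratic, direct)
```

Each term only adds to the rows and columns of $\mathbf{H}$ that belong to the variables it involves, which is where the sparsity exploited by SLAM solvers comes from.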

---


### ✍️ <span style="color:#a4d4a3">**Quadratic Form Minimization**</span>

Based on the above, we can write the global error term as a <span style="color:#ffa500">**quadratic form**</span> in $\Delta\mathbf{x}$:

$$
F(\mathbf{x} + \Delta\mathbf{x}) \approx c + 2\mathbf{b}^T\Delta\mathbf{x} + \Delta\mathbf{x}^T\mathbf{H}\Delta\mathbf{x}
$$

The <span style="color:#ffa500">**derivative**</span> of this approximation of $F(\mathbf{x} + \Delta\mathbf{x})$ w.r.t. $\Delta\mathbf{x}$ is then:

$$
\frac{\partial F(\mathbf{x} + \Delta\mathbf{x})}{\partial \Delta\mathbf{x}} \approx 2\mathbf{b} + 2\mathbf{H}\,\Delta\mathbf{x}.
$$

Setting it <span style="color:#ffa500">**to zero**</span> (minimum condition):

$$
0 = 2\mathbf{b} + 2\mathbf{H}\Delta\mathbf{x}
$$

Which leads to the <span style="color:#ffa500">**linear system**</span>:

$$
\mathbf{H} \Delta\mathbf{x} = -\mathbf{b}  
$$

If $\mathbf{H}$ is invertible, the solution for the increment $\Delta\mathbf{x}^*$ is:

$$
\Delta\mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{b}
$$
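In code, the increment is usually obtained by solving the linear system rather than by forming $\mathbf{H}^{-1}$ explicitly. The sketch below uses a Cholesky factorization, which assumes $\mathbf{H}$ is symmetric positive definite; the values of `H` and `b` are made up.

```python
import numpy as np

H = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # made-up symmetric positive definite H
b = np.array([1.0, -2.0])       # made-up b

# Factor H = L L^T, then solve L y = -b followed by L^T dx = y
L = np.linalg.cholesky(H)
y = np.linalg.solve(L, -b)
dx = np.linalg.solve(L.T, y)

# The derivative 2b + 2H dx should vanish at the solution
gradient = 2 * b + 2 * H @ dx
print(dx, gradient)
```

`np.linalg.solve` is a general dense solver and is used here only to keep the example dependency-free; large SLAM problems typically rely on sparse factorizations instead.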

---



### 🎓 <span style="color:#a4d4a3"> Gauss-Newton Solution</span>

**Iterate the following steps:**

1. <span style="color:#ffa500">**Linearize**</span> around $\mathbf{x}$ and compute for each measurement:
   $$
   \mathbf{e}_i(\mathbf{x} + \Delta\mathbf{x}) \approx \mathbf{e}_i(\mathbf{x}) + \mathbf{J}_i \Delta\mathbf{x}.
   $$

2. <span style="color:#ffa500">**Compute the terms**</span> for the linear system:
   $$
   \mathbf{b}^T = \sum_i \mathbf{e}_i^T \mathbf{\Omega}_i\, \mathbf{J}_i, 
   \qquad
   \mathbf{H} = \sum_i \mathbf{J}_i^T \mathbf{\Omega}_i\,\mathbf{J}_i.
   $$

3. <span style="color:#ffa500">**Solve**</span> the linear system:
   $$
   \mathbf{H} \Delta \mathbf{x}^* = -\mathbf{b} \quad\Rightarrow\quad \Delta \mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{b}.
   $$

4. <span style="color:#ffa500">**Update state:**</span> 

$$
\mathbf{x} \leftarrow \mathbf{x} + \Delta \mathbf{x}^*
$$
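The four steps can be put together in a short sketch. It assumes a hypothetical problem that is not part of the text above: estimating a 2D position from range measurements to four known beacons. The beacon layout, the noise values, the information value `omega`, and the function name `gauss_newton` are all illustrative.

```python
import numpy as np

def gauss_newton(x0, beacons, ranges, omega, max_iterations=20, tol=1e-9):
    """Estimate a 2D position from range measurements to known beacons."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iterations):
        H = np.zeros((2, 2))
        b = np.zeros(2)
        for beacon, z in zip(beacons, ranges):
            d = x - beacon
            r = np.linalg.norm(d)            # predicted range f_i(x)
            e = z - r                        # scalar error e_i(x)
            J = -d / r                       # Jacobian of e_i (1D array)
            b += J * omega * e               # b_i = J_i^T Omega_i e_i
            H += omega * np.outer(J, J)      # H_i = J_i^T Omega_i J_i
        dx = np.linalg.solve(H, -b)          # solve H dx = -b
        x = x + dx                           # update the state
        if np.linalg.norm(dx) < tol:         # stop when the step is negligible
            break
    return x

beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_position = np.array([3.0, 4.0])
noise = np.array([0.05, -0.03, 0.02, -0.04])   # fixed, made-up noise
ranges = np.linalg.norm(beacons - true_position, axis=1) + noise
omega = 1.0 / 0.05**2                          # assumed range std of 0.05

estimate = gauss_newton([5.0, 5.0], beacons, ranges, omega)
print(estimate)
```

Starting from the center of the beacon square, the estimate should land close to the true position within a few iterations. The sketch does not guard against the state coinciding with a beacon (where `r` would be zero), and a poor initial guess may still lead to a local minimum.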


---


#### ✔️ <span style="color:#a4d4a3">Gauss-Newton Summary</span>

Method to <span style="color:#ffa500">**minimize a sum of squared errors**</span>.

- Start with an <span style="color:#ffa500">**initial guess**</span>.  

- <span style="color:#ffa500">**Linearize**</span> the individual error functions.  

- This leads to a <span style="color:#ffa500">**quadratic form**</span>.  

- Obtain a <span style="color:#ffa500">**linear system**</span> by setting its derivative to <span style="color:#ffa500">**zero**</span>.  

- Solving the linear system leads to a <span style="color:#ffa500">**state update**</span>.  

- <span style="color:#ffa500">**Iterate.**</span>

---


## 🖇️ <span style="color:#a4d4a3">**Relation to Probabilistic State Estimation**</span>

So far, we minimized an error function.  

*How does this relate to state estimation in the <span style="color:#ffa500">**probabilistic**</span> sense?*

### 🧮 <span style="color:#a4d4a3">**General State Estimation**</span>

Using Bayes’ rule, independence, and the Markov assumption, we can write:

$$
p(x_{0:t}\mid z_{1:t}, u_{1:t}) \propto p(x_0)\,\prod_{t}\,p(x_t\mid x_{t-1},u_t)\,p(z_t\mid x_t).
$$

Written as the <span style="color:#ffa500">**Log-Likelihood**</span>:

$$
\log p(x_{0:t}\mid z_{1:t}, u_{1:t}) = \text{const.} + \log p(x_0) + \sum_t \big[\log p(x_t\mid x_{t-1},u_t) + \log p(z_t\mid x_t)\big].
$$

Assuming <span style="color:#ffa500">**Gaussian distributions**</span>, the logarithm of each factor $\mathcal N(x;\mu,\Sigma)$ is:

$$
\begin{aligned}
\log \mathcal N(x;\mu,\Sigma) &= \text{const.} - \tfrac{1}{2}\,\underbrace{(x-\mu)^T}_{\mathbf{e}^T(x)}\,\underbrace{\Sigma^{-1}}_{\mathbf{\Omega}}\,\underbrace{(x-\mu)}_{\mathbf{e}(x)}\\
&= \text{const.} - \tfrac{1}{2}\,\underbrace{\mathbf{e}(x)^T\,\mathbf{\Omega}\,\mathbf{e}(x)}_{\text{quadratic error } e(x)},
\end{aligned}
$$

Thus, up to a constant and a factor of $-\tfrac{1}{2}$, each log-likelihood term is one of the <span style="color:#ffa500">**error functions**</span> used before.

Therefore,

$$
\log p(x_{0:t}\mid z_{1:t}, u_{1:t}) =\text{const.} - \tfrac{1}{2}\,e_p(x) - \tfrac{1}{2}\sum_t \big(e_{u_t}(x) + e_{z_t}(x)\big),
$$

where $e_p$ is the <span style="color:#ffa500">**prior term**</span>, $e_{u_t}$ the <span style="color:#ffa500">**motion**</span> (odometry) <span style="color:#ffa500">**error**</span>, and $e_{z_t}$ the <span style="color:#ffa500">**measurement error**</span>.

<span style="color:#ffa500">**Maximizing**</span> the log-likelihood leads to:

$$
\arg\max_x \log p(x_{0:t}\mid z_{1:t}, u_{1:t}) \equiv \arg\min_x \Big( e_p(x) + \sum_t \big[ e_{u_t}(x) + e_{z_t}(x) \big] \Big).
$$


<span style="color:#00703c">**Takeaway:**</span>

- <span style="color:#ffa500">**Least squares**</span> (with Gaussian assumptions) is <span style="color:#ffa500">**equivalent to Maximum A Posteriori**</span> (MAP) estimation. 

- <span style="color:#ffa500">**Minimizing the sum of weighted squared residuals**</span> corresponds to <span style="color:#ffa500">**maximizing the posterior probability**.</span>
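A scalar example illustrates the equivalence. The sketch below assumes a made-up Gaussian prior on a scalar state and one direct measurement of it, then compares the minimizer of the weighted squared errors with the maximizer of the posterior on a grid. All numbers and function names are illustrative.

```python
import numpy as np

mu0, sigma0 = 2.0, 1.0     # made-up Gaussian prior on a scalar state
z, sigma_z = 3.0, 0.5      # made-up direct measurement of that state

omega0, omega_z = 1.0 / sigma0**2, 1.0 / sigma_z**2

def gaussian(x, mu, sigma):
    """Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def sum_of_errors(x):
    """Prior term plus measurement term, each weighted by its information."""
    return omega0 * (x - mu0) ** 2 + omega_z * (z - x) ** 2

xs = np.linspace(0.0, 5.0, 5001)                   # grid with step 0.001
posterior = gaussian(xs, mu0, sigma0) * gaussian(z, xs, sigma_z)  # unnormalized

x_ls = xs[np.argmin(sum_of_errors(xs))]            # least squares estimate
x_map = xs[np.argmax(posterior)]                   # MAP estimate
x_closed = (omega0 * mu0 + omega_z * z) / (omega0 + omega_z)
print(x_ls, x_map, x_closed)
```

Both grid searches pick the same point, which also matches the information-weighted mean of the prior and the measurement.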

---

#### ✔️ <span style="color:#a4d4a3">Summary</span>

- Technique to <span style="color:#ffa500">**minimize squared error**</span> functions.

- Gauss-Newton is an <span style="color:#ffa500">**iterative approach**</span> for non-linear problems.

- Uses linearization (approximation).

- Equivalent to <span style="color:#ffa500">**maximizing the log likelihood**</span> of independent Gaussians.

- Popular method in many disciplines.