# ${\color{Blue} \text{ Bayesian Parameter Estimation}} $

### ${\color{Purple}1.} {\color{Blue} \text{On the Beneficial Bias of MMSE Estimation}} $

${\color{Blue} \text {Consider the Bayesian linear model } Y = H \theta+V \text{ with } \theta \sim N(0,C_{\theta\theta}) \text{ and } V \sim N(0,C_{VV} ) \text{ independent }} $
${\color{Blue} \text{ (we consider here } m_{\theta} = 0 \text{ for simplicity) } .} \qquad \qquad$

#### ${\color{Green}(a)}$ ${\color{Blue} \text{ The LMMSE estimator is } \hat{\theta}_{LMMSE} = C_{\theta Y} C_{YY}^{-1}Y =(C_{\theta\theta}^{-1}+ H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y }$.

${\color{Blue}  \text{ What are the unconstrained (non-linear) MMSE and the MAP estimators? }}$

The Linear Minimum Mean Squared Error (LMMSE) estimator is given by:

$ \hat{\theta}_{\text{LMMSE}} = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y $

Let's revisit the Unconstrained Minimum Mean Squared Error (MMSE) and the Maximum A Posteriori (MAP) estimators.

1. **Unconstrained (Non-linear) MMSE Estimator:**

The unconstrained MMSE estimator minimizes the mean squared error over all functions of $Y$, not only the linear ones. In the Bayesian framework it is the posterior mean, $\hat{\theta}_{\text{MMSE}} = E[\theta | Y]$. Here $\theta$ and $V$ are independent Gaussians, so $\theta$ and $Y$ are jointly Gaussian. The posterior $p(\theta | Y)$ is then Gaussian as well, and its mean is linear in $Y$:

$ \hat{\theta}_{\text{MMSE}} = (C_{\theta \theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y $

This is the same expression as the LMMSE estimator in this case.

2. **Maximum A Posteriori (MAP) Estimator:**

The MAP estimator seeks the most probable value of the parameter given the observed data and the prior information. It is obtained by maximizing the posterior distribution:

$ \hat{\theta}_{\text{MAP}} = \arg \max_{\theta} P(\theta | Y) $

Because both the prior and the noise are Gaussian, maximizing the posterior is equivalent to minimizing the negative log posterior, which (dropping constants) gives:

$ \hat{\theta}_{\text{MAP}} = \arg \min_{\theta} \left[ \frac{1}{2} (Y - H \theta)^T C_{VV}^{-1} (Y - H \theta) + \frac{1}{2} \theta^T C_{\theta \theta}^{-1} \theta \right] $

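The objective is quadratic in $\theta$, so setting its gradient to zero gives $(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)\,\hat{\theta}_{\text{MAP}} = H^T C_{VV}^{-1} Y$, which is again the LMMSE expression. Equivalently, the posterior is Gaussian, so its mode coincides with its mean. The solution is a compromise between fitting the observed data and staying close to the prior mean.

In summary, both the unconstrained MMSE and the MAP estimators coincide with the LMMSE estimator in this case, because $\theta$ and $Y$ are jointly Gaussian.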
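
As a quick sanity check, the equality of the MAP and LMMSE estimators can be verified exactly in the scalar case. The sketch below is only an illustration: the numeric values of `h`, `c_theta`, `c_v`, and `y` are hypothetical, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical scalar instance of Y = H*theta + V (all values are illustrative).
h = Fraction(2)        # scalar "H"
c_theta = Fraction(3)  # prior variance C_theta_theta
c_v = Fraction(5)      # noise variance C_VV
y = Fraction(7)        # one observed value of Y


def lmmse_covariance_form(y):
    """C_thetaY * C_YY^{-1} * y in the scalar case."""
    return c_theta * h / (h * c_theta * h + c_v) * y


def lmmse_information_form(y):
    """(C_theta^{-1} + H^T C_V^{-1} H)^{-1} H^T C_V^{-1} y in the scalar case."""
    return (h / c_v) / (1 / c_theta + h * h / c_v) * y


def map_gradient(theta, y):
    """Derivative of the negative log posterior with respect to theta."""
    return -h * (y - h * theta) / c_v + theta / c_theta


theta_hat = lmmse_information_form(y)
print(theta_hat)                   # 42/17
print(lmmse_covariance_form(y))    # 42/17
print(map_gradient(theta_hat, y))  # 0, so theta_hat is the MAP solution
```

Both closed forms of the LMMSE estimator agree, and the gradient of the MAP objective vanishes at that value.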

#### ${\color{Green}(b)}$ ${\color{Blue} \text{ What is the error covariance matrix ?} }$

$$
\begin{equation}
{\color{Blue}
R_{\tilde{\theta}\tilde{\theta}}^{LMMSE}= E_{\theta} E_{Y|\theta} \tilde{\theta}_{LMMSE}\tilde{\theta}^T_{LMMSE}
}
\end{equation}
$$

Let's derive the expression step by step. The estimation error is $\tilde{\theta}_{LMMSE} = \theta - \hat{\theta}_{LMMSE}$, so the error covariance matrix is:

$ R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} = E_{\theta} E_{Y|\theta} (\theta - \hat{\theta}_{LMMSE})(\theta - \hat{\theta}_{LMMSE})^T $

To keep the notation short, define $P = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$, so that $\hat{\theta}_{LMMSE} = P H^T C_{VV}^{-1} Y$. Now, substitute $Y = H\theta + V$:

$ \begin{split} \tilde{\theta}_{LMMSE} & = \theta - P H^T C_{VV}^{-1} (H\theta + V) \\
& = (I - P H^T C_{VV}^{-1} H)\,\theta - P H^T C_{VV}^{-1} V \\
& = P C_{\theta\theta}^{-1}\,\theta - P H^T C_{VV}^{-1} V \end{split} $

The last step uses $I - P H^T C_{VV}^{-1} H = P (P^{-1} - H^T C_{VV}^{-1} H) = P C_{\theta\theta}^{-1}$.

Since $\theta$ and $V$ are independent and zero mean, the cross terms vanish in the expectation. With $E[\theta\theta^T] = C_{\theta\theta}$ and $E[VV^T] = C_{VV}$:

$ \begin{split} R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} & = P C_{\theta\theta}^{-1} C_{\theta\theta} C_{\theta\theta}^{-1} P + P H^T C_{VV}^{-1} C_{VV} C_{VV}^{-1} H P \\
& = P (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H) P \\
& = P \end{split} $

Finally:

$ R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} $

By the matrix inversion lemma, this is equivalent to $C_{\theta\theta} - C_{\theta\theta} H^T (H C_{\theta\theta} H^T + C_{VV})^{-1} H C_{\theta\theta} = C_{\theta\theta} - C_{\theta Y} C_{YY}^{-1} C_{Y\theta}$. The inverse error covariance is the sum of the prior information $C_{\theta\theta}^{-1}$ and the information $H^T C_{VV}^{-1} H$ brought by the data.
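
The LMMSE error covariance $(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$ can be sanity-checked in the scalar case. The sketch below uses hypothetical values for `h`, `c_theta`, and `c_v`, and compares three routes to the error variance: directly from the error $\tilde{\theta} = (1 - fh)\theta - fV$ with LMMSE gain $f$, from the information form, and from the covariance form $C_{\theta\theta} - C_{\theta Y} C_{YY}^{-1} C_{Y\theta}$.

```python
from fractions import Fraction

# Hypothetical scalar instance of Y = H*theta + V (values chosen for illustration).
h, c_theta, c_v = Fraction(2), Fraction(3), Fraction(5)

# LMMSE gain f, so that theta_hat = f * y.
f = (h / c_v) / (1 / c_theta + h * h / c_v)

# Route 1: the error is (1 - f*h)*theta - f*v, with theta and v independent.
r_direct = (1 - f * h) ** 2 * c_theta + f ** 2 * c_v

# Route 2: information form (C_theta^{-1} + H^T C_V^{-1} H)^{-1}.
r_information = 1 / (1 / c_theta + h * h / c_v)

# Route 3: covariance form C_theta - C_thetaY C_YY^{-1} C_Ytheta.
r_covariance = c_theta - (c_theta * h) ** 2 / (h * c_theta * h + c_v)

print(r_direct, r_information, r_covariance)  # 15/17 15/17 15/17
```

All three agree, and the result is smaller than the prior variance `c_theta`, as expected once data are observed.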

#### ${\color{Green}(c)}$ ${\color{Blue} \text{ The conditional bias of an estimator } \hat{\theta} \text{ is } b_{\hat{\theta}}(\theta) = E_{Y | \theta} \hat{\theta}(Y) - \theta. }$
${\color{Blue} \text{ The BLUE estimator is the LMMSE estimator under the constraint of conditional unbiasedness. So } b_{BLUE}(\theta) = 0 }$.


${\color{Blue} \text{ What is } \hat{\theta}_{BLUE} \text{ in terms of the quantities appearing in the Bayesian linear model  considered here? } }$

In the Bayesian linear model considered here, the BLUE (Best Linear Unbiased Estimator) for the parameter vector $\theta$ is given by (assuming $H$ has full column rank, so that $H^T C_{VV}^{-1} H$ is invertible):

$ \hat{\theta}_{\text{BLUE}} = (H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y $

Let's see where this comes from. A linear estimator $\hat{\theta} = F Y$ has conditional mean $E_{Y|\theta} F Y = F H \theta$, so conditional unbiasedness for every $\theta$ requires $F H = I$. Under this constraint the error is $\tilde{\theta} = \theta - F(H\theta + V) = -F V$, with covariance $F C_{VV} F^T$. Minimizing this covariance subject to $F H = I$ gives $F = (H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1}$.

Let's break down the terms:

- $Y$: Observed data vector.
- $H$: Design matrix.
- $C_{VV}$: Covariance matrix of the observation noise $V$.
- $C_{\theta\theta}$: Covariance matrix of the prior distribution for $\theta$. It does not appear in $\hat{\theta}_{\text{BLUE}}$.

So, in terms of the quantities appearing in the Bayesian linear model:

1. $H^T C_{VV}^{-1} H$: This term is the information about $\theta$ provided by the observed data, weighted by the inverse covariance of the observation noise.

2. $H^T C_{VV}^{-1} Y$: This term projects the noise-weighted data onto the columns of $H$.

3. The prior covariance $C_{\theta\theta}$ drops out: once $F H = I$ is imposed, the error $-F V$ no longer depends on $\theta$, so the prior plays no role.

So, the BLUE estimator uses only the data and the noise statistics. It equals the LMMSE expression with the prior precision $C_{\theta\theta}^{-1}$ removed, i.e. the LMMSE estimator in the limit of a non-informative prior ($C_{\theta\theta}^{-1} \to 0$).

${\color{Blue} \text{ Is there another classical deterministic estimator that equals } \hat{\theta}_{BLUE} \text{ in this case? } }$
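
One candidate answer, sketched under the assumption that $H$ has full column rank, is the weighted least-squares (WLS) estimator with weighting matrix $C_{VV}^{-1}$, which treats $\theta$ as a deterministic unknown:

$$ \hat{\theta}_{WLS} = \arg\min_{\theta} \, (Y - H\theta)^T C_{VV}^{-1} (Y - H\theta) = (H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y $$

This is the BLUE expression $(H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y$. Because $V$ is Gaussian here, the WLS criterion is also the negative log-likelihood of $Y$ given $\theta$ (up to constants), so the maximum likelihood (ML) estimator yields the same expression.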

#### ${\color{Green}(d)}$ ${\color{Blue} \text{ What is the error covariance matrix ? }}$

$$
{\color{Blue}
\begin{equation} 
R_{\tilde{\theta}\tilde{\theta}}^{BLUE}= E_{\theta} E_{Y|\theta} \tilde{\theta}_{BLUE}\tilde{\theta}^T_{BLUE}
\end{equation}
}
$$
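
A sketch of the computation, writing $\hat{\theta}_{BLUE} = F Y$ with $F = (H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1}$ (so that $F H = I$) and assuming $H$ has full column rank:

$$ \begin{split} \tilde{\theta}_{BLUE} & = \theta - F (H\theta + V) = -F V \\ R_{\tilde{\theta}\tilde{\theta}}^{BLUE} & = F C_{VV} F^T = (H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} C_{VV} C_{VV}^{-1} H (H^T C_{VV}^{-1} H)^{-1} = (H^T C_{VV}^{-1} H)^{-1} \end{split} $$

The error does not depend on $\theta$, so the outer expectation $E_{\theta}$ has no effect and the prior covariance $C_{\theta\theta}$ does not appear.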

#### ${\color{Green}(e)}$ ${\color{Blue} \text{ Show that } R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} \leq R_{\tilde{\theta}\tilde{\theta}}^{BLUE} }$
${\color{Blue} \text{ Note that this is true in spite of } \hat{\theta}_{LMMSE} \text{  being (conditionally) biased and } \hat{\theta}_{BLUE} \text{  being unbiased. }}$
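
One way to see this, taking $R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$ and $R_{\tilde{\theta}\tilde{\theta}}^{BLUE} = (H^T C_{VV}^{-1} H)^{-1}$ and assuming $H$ has full column rank, is to compare the inverses:

$$ (R_{\tilde{\theta}\tilde{\theta}}^{LMMSE})^{-1} = C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H \;\geq\; H^T C_{VV}^{-1} H = (R_{\tilde{\theta}\tilde{\theta}}^{BLUE})^{-1} $$

The inequality holds because $C_{\theta\theta}^{-1}$ is positive definite. For positive definite matrices, $A \geq B > 0$ implies $A^{-1} \leq B^{-1}$, which gives $R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} \leq R_{\tilde{\theta}\tilde{\theta}}^{BLUE}$.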

#### ${\color{Green}(f)} {\color{Blue} \text{  Returning to } \hat{\theta}_{LMMSE} \text{ , what is the bias } b_{LMMSE}(\theta) ? }$
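
A sketch of the computation, with the shorthand $P = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$ (notation introduced here for brevity) and using $E_{Y|\theta} V = 0$:

$$ \begin{split} b_{LMMSE}(\theta) & = E_{Y|\theta} \, P H^T C_{VV}^{-1} (H\theta + V) - \theta \\ & = (P H^T C_{VV}^{-1} H - I)\,\theta \\ & = -P C_{\theta\theta}^{-1}\,\theta \end{split} $$

The last step uses $P H^T C_{VV}^{-1} H = P (P^{-1} - C_{\theta\theta}^{-1}) = I - P C_{\theta\theta}^{-1}$. The bias pulls the estimate toward the prior mean $m_{\theta} = 0$, and it vanishes as the prior becomes non-informative ($C_{\theta\theta}^{-1} \to 0$).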

#### ${\color{Green}(g)} {\color{Blue} \text{  Show that }}$

$$
{\color{Blue}
\begin{gather}
R_{\tilde{\theta}\tilde{\theta}} = \underbrace{E_{\theta} b_{\hat\theta}(\theta) b_{\hat\theta}^T(\theta)}_{\text{bias}^2} + \underbrace{E_{\theta}E_{Y | \theta} (\hat{\theta} - E_{Y|\theta}\hat{\theta})(\hat{\theta} - E_{Y|\theta}\hat{\theta})^T }_{\text{variance}}
\end{gather}
}
$$
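
A sketch of the argument: for a fixed $\theta$, split the estimation error into a deterministic part and a zero-mean part,

$$ \hat{\theta} - \theta = b_{\hat\theta}(\theta) + (\hat{\theta} - E_{Y|\theta}\hat{\theta}) $$

Taking the outer product and then $E_{Y|\theta}$, the cross terms vanish because $b_{\hat\theta}(\theta)$ is constant given $\theta$ and $E_{Y|\theta}(\hat{\theta} - E_{Y|\theta}\hat{\theta}) = 0$:

$$ E_{Y|\theta} (\hat{\theta} - \theta)(\hat{\theta} - \theta)^T = b_{\hat\theta}(\theta) b_{\hat\theta}^T(\theta) + E_{Y|\theta} (\hat{\theta} - E_{Y|\theta}\hat{\theta})(\hat{\theta} - E_{Y|\theta}\hat{\theta})^T $$

Applying $E_{\theta}$ to both sides gives the stated decomposition, since $\tilde{\theta}\tilde{\theta}^T = (\hat{\theta} - \theta)(\hat{\theta} - \theta)^T$ regardless of the sign convention for the error.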


#### ${\color{Green}(h)} {\color{Blue} \text{ Compute } E_\theta \, b_{LMMSE}(\theta) \, b_{LMMSE}^T (\theta) .}$
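
A sketch of the computation, taking the conditional bias $b_{LMMSE}(\theta) = -P C_{\theta\theta}^{-1}\theta$ with the shorthand $P = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$ (notation introduced here for brevity) and $E_{\theta}\,\theta\theta^T = C_{\theta\theta}$:

$$ E_{\theta} \, b_{LMMSE}(\theta) \, b_{LMMSE}^T(\theta) = P C_{\theta\theta}^{-1} \, C_{\theta\theta} \, C_{\theta\theta}^{-1} P = P C_{\theta\theta}^{-1} P $$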

#### ${\color{Green}(i)} {\color{Blue} \text{ Compute } E_{\theta} E_{Y|\theta} (\tilde{\theta}_{LMMSE} - E_{Y|\theta} \tilde{\theta}_{LMMSE} ) \, (\tilde{\theta}_{LMMSE} - E_{Y|\theta} \tilde{\theta}_{LMMSE})^T .}$
$
{\color{Blue} \text{ Note that the sum of the positive definite matrices in}} {\color{Green}(h)} {\color{Blue} \text{ and } } {\color{Green}(i)}  {\color{Blue} \text{ yields } R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} \text{ , for which }} {\color{Green}(e)} {\color{Blue} \text{ holds. }}
$
$
{\color{Blue} \text{ Hence, in spite of the fact that LMMSE introduces a (conditional) bias, it allows the variance to be reduced }}
$
$
{\color{Blue} \text{ so much that the sum of variance and squared bias gets lower than the variance in the unbiased case. }}
$
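
A sketch of the answer, with the shorthand $P = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$ (notation introduced here for brevity): the fluctuation of the LMMSE estimator around its conditional mean is $P H^T C_{VV}^{-1} V$, so the variance term is $P H^T C_{VV}^{-1} H P$. Adding the squared-bias term $P C_{\theta\theta}^{-1} P$ gives $P (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H) P = P$. The scalar example below, with hypothetical values for `h`, `c_theta`, and `c_v`, illustrates the trade-off against the BLUE variance $(H^T C_{VV}^{-1} H)^{-1}$.

```python
from fractions import Fraction

# Hypothetical scalar instance of Y = H*theta + V (values chosen for illustration).
h, c_theta, c_v = Fraction(2), Fraction(3), Fraction(5)

info_data = h * h / c_v           # H^T C_VV^{-1} H
info_prior = 1 / c_theta          # C_theta_theta^{-1}
p = 1 / (info_prior + info_data)  # P, the LMMSE error variance

bias_sq_lmmse = p * info_prior * p  # E_theta b b^T for LMMSE
var_lmmse = p * info_data * p       # variance term for LMMSE
r_blue = 1 / info_data              # BLUE has zero bias, so this is pure variance

print(bias_sq_lmmse, var_lmmse, bias_sq_lmmse + var_lmmse)  # 75/289 180/289 15/17
print(r_blue)                                               # 5/4
```

In this example the LMMSE variance term (180/289) is well below the BLUE variance (5/4), and it stays below even after the squared bias (75/289) is added.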
