# Bayesian Parameter Estimation

#### ${\color{Cyan} 1. }$ On the Beneficial Bias of MMSE Estimation


Consider the Bayesian linear model $Y = H\theta+V$ with $\theta \sim N(0,C_{\theta\theta})$ and $V \sim N(0,C_{VV})$ independent.

The Bayesian linear model is described by the equation:

$ Y = H \theta + V $

where:
- $ Y $ is the observed data vector.
- $ H $ is the design matrix.
- $ \theta $ is the parameter vector with a prior distribution $ \theta \sim \mathcal{N}(0, C_{\theta \theta}) $.
- $ V $ is the noise vector with $ V \sim \mathcal{N}(0, C_{V V}) $.

This model assumes that the parameter vector $\theta$ follows a normal distribution with mean zero and covariance matrix $C_{\theta \theta}$, and the noise vector $V$ follows a normal distribution with mean zero and covariance matrix $C_{V V}$. The random vectors $\theta$ and $V$ are assumed to be independent.

Bayesian inference here amounts to computing the posterior distribution of $\theta$ given the observed data $Y$, which follows from Bayes' theorem:

$ P(\theta | Y) \propto P(Y | \theta) P(\theta) $

where:
- $ P(\theta | Y) $ is the posterior distribution.
- $ P(Y | \theta) $ is the likelihood of the observed data given the parameters; here $ Y | \theta \sim \mathcal{N}(H\theta, C_{V V}) $.
- $ P(\theta) $ is the prior distribution of the parameters.

In this model both the prior and the likelihood are Gaussian, so the posterior is Gaussian as well and is fully described by its mean and its covariance matrix.
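As a worked step, the posterior for this model can be obtained by completing the square in the exponent of $P(Y|\theta)P(\theta)$. The sketch below assumes that $C_{\theta\theta}$ and $C_{VV}$ are invertible; $\Sigma$ and $\mu$ are shorthand introduced here for illustration, and "const" collects terms that do not depend on $\theta$.

$$
\begin{aligned}
-2\ln P(\theta|Y) &= (Y-H\theta)^T C_{VV}^{-1}(Y-H\theta) + \theta^T C_{\theta\theta}^{-1}\theta + \text{const} \\
&= \theta^T \Sigma^{-1}\theta - 2\,\theta^T H^T C_{VV}^{-1} Y + \text{const} \\
&= (\theta-\mu)^T \Sigma^{-1} (\theta-\mu) + \text{const},
\end{aligned}
$$

$$
\Sigma^{-1} = C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H, \qquad \mu = \Sigma\, H^T C_{VV}^{-1} Y .
$$

Under these assumptions $\theta | Y \sim \mathcal{N}(\mu, \Sigma)$.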


- [ ] We consider here $m_\theta = 0$ for simplicity.

With a prior mean $m_{\theta} = 0$, the prior $\theta \sim \mathcal{N}(0, C_{\theta \theta})$ is fully characterized by its covariance matrix $C_{\theta \theta}$. The model $Y = H \theta + V$, the likelihood $Y | \theta \sim \mathcal{N}(H\theta, C_{V V})$, and Bayes' theorem $P(\theta | Y) \propto P(Y | \theta) P(\theta)$ are unchanged. The simplification is that $Y$ is then zero mean as well, so the LMMSE estimator is linear in $Y$; for $m_{\theta} \neq 0$ it would be affine, $\hat{\theta}_{\text{LMMSE}} = m_{\theta} + C_{\theta Y} C_{YY}^{-1} (Y - H m_{\theta})$.

${\color{Cyan}(a)}$ The LMMSE estimator is $\hat{\theta}_{LMMSE} = C_{\theta Y}C_{YY}^{-1}Y =(C_{\theta\theta}^{-1}+H^T C_{VV}^{-1}H)^{-1} H^T C_{VV}^{-1}Y.$
What are the unconstrained (non-linear) MMSE and the MAP estimators?

The Linear Minimum Mean Squared Error (LMMSE) estimator is given by:

$ \hat{\theta}_{\text{LMMSE}} = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y $

Let's revisit the Unconstrained Minimum Mean Squared Error (MMSE) and the Maximum A Posteriori (MAP) estimators.

1. **Unconstrained (Non-linear) MMSE Estimator:**

The unconstrained MMSE estimator minimizes the mean squared error without restricting the estimator to be a linear function of $Y$. In the Bayesian framework it is the posterior mean, $\hat{\theta}_{\text{MMSE}} = E[\theta | Y]$. Here $\theta$ and $Y$ are jointly Gaussian, so the posterior is Gaussian and its mean is linear in $Y$:

$ \hat{\theta}_{\text{MMSE}} = (C_{\theta \theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y $

This is the same expression as the LMMSE estimator in this case.

2. **Maximum A Posteriori (MAP) Estimator:**

The MAP estimator seeks the most probable value of the parameter given the observed data and the prior information. It is obtained by maximizing the posterior distribution:

$ \hat{\theta}_{\text{MAP}} = \arg \max_{\theta} P(\theta | Y) $

With a Gaussian prior and Gaussian noise, this is equivalent to minimizing the negative log posterior:

$ \hat{\theta}_{\text{MAP}} = \arg \min_{\theta} \left[ \frac{1}{2} (Y - H \theta)^T C_{VV}^{-1} (Y - H \theta) + \frac{1}{2} \theta^T C_{\theta \theta}^{-1} \theta \right] $

The objective is a convex quadratic in $\theta$, so no iterative optimization is needed. Setting its gradient to zero gives $(C_{\theta \theta}^{-1} + H^T C_{VV}^{-1} H)\,\theta = H^T C_{VV}^{-1} Y$, hence $\hat{\theta}_{\text{MAP}} = (C_{\theta \theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y$. The solution is a compromise between fitting the observed data and staying close to the prior mean.

In summary, the unconstrained MMSE and the MAP estimators both coincide with the LMMSE estimator here because $\theta$ and $Y$ are jointly Gaussian: the posterior is Gaussian, so its mean and its mode are the same point, and that point is linear in $Y$.
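The equality of the two LMMSE forms, and the fact that the same value minimizes the MAP objective, can be checked numerically. The following is a minimal sketch for the scalar special case only (all quantities are numbers rather than matrices); the values of `h`, `c_theta`, `c_v`, and `y` are hypothetical and chosen purely for illustration, and the function names are not from the original text. Exact rational arithmetic is used so that the comparison does not depend on floating-point tolerance.

```python
from fractions import Fraction


def gain_covariance_form(h, c_theta, c_v):
    """Scalar C_thetaY / C_YY, with C_thetaY = c_theta*h and C_YY = h^2*c_theta + c_v."""
    return (c_theta * h) / (h * h * c_theta + c_v)


def gain_information_form(h, c_theta, c_v):
    """Scalar (1/c_theta + h^2/c_v)^(-1) * h / c_v."""
    return (h / c_v) / (1 / c_theta + h * h / c_v)


def map_objective_gradient(theta, y, h, c_theta, c_v):
    """Derivative of the scalar negative log posterior with respect to theta."""
    return -(h / c_v) * (y - h * theta) + theta / c_theta


# Hypothetical example values (assumptions, not taken from the text).
h, c_theta, c_v, y = Fraction(2), Fraction(3), Fraction(1, 2), Fraction(5)

g_cov = gain_covariance_form(h, c_theta, c_v)
g_info = gain_information_form(h, c_theta, c_v)
theta_hat = g_info * y

# The objective is strictly convex, so a zero gradient identifies its minimizer.
grad_at_estimate = map_objective_gradient(theta_hat, y, h, c_theta, c_v)

print("gain (covariance form): ", g_cov)
print("gain (information form):", g_info)
print("estimate:", theta_hat, "| MAP gradient at estimate:", grad_at_estimate)
```

In this scalar setting both gain expressions agree and the MAP gradient vanishes at the LMMSE estimate; the matrix case follows the same algebra via the matrix inversion lemma.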

${\color{Cyan}(b)}$ What is the error covariance matrix?
$R_{\tilde{\theta}\tilde{\theta}}^{LMMSE} = E_\theta E_{Y|\theta} \tilde{\theta}_{LMMSE} \tilde{\theta}_{LMMSE}^T$

The error covariance matrix, denoted as $R_{\tilde{\theta}\tilde{\theta}}^{\text{LMMSE}}$, represents the covariance matrix of the estimation error $\tilde{\theta}_{\text{LMMSE}}$, where:

$ \tilde{\theta}_{\text{LMMSE}} = \theta - \hat{\theta}_{\text{LMMSE}} $

Here, $\theta$ is the true parameter vector, and $\hat{\theta}_{\text{LMMSE}}$ is the LMMSE estimator.

To calculate $R_{\tilde{\theta}\tilde{\theta}}^{\text{LMMSE}}$, we take the expectation of the outer product of the estimation error in two steps. The inner expectation $E_{Y|\theta}$ averages over the data $Y$ (equivalently, over the noise $V$) for a fixed value of $\theta$. The outer expectation $E_\theta$ then averages over the prior distribution of $\theta$. Together they form the expectation over the joint distribution of $(\theta, Y)$.

The error covariance matrix is given by:

$ R_{\tilde{\theta}\tilde{\theta}}^{\text{LMMSE}} = E_\theta E_{Y|\theta} (\tilde{\theta}_{\text{LMMSE}} \tilde{\theta}_{\text{LMMSE}}^T) $

Given that $\tilde{\theta}_{\text{LMMSE}} = \theta - \hat{\theta}_{\text{LMMSE}}$, we can substitute this expression:

$ R_{\tilde{\theta}\tilde{\theta}}^{\text{LMMSE}} = E_\theta E_{Y|\theta} ((\theta - \hat{\theta}_{\text{LMMSE}}) (\theta - \hat{\theta}_{\text{LMMSE}})^T) $

For the linear Gaussian model considered here, both expectations can be evaluated in closed form, so no numerical integration or simulation is required. Using $C_{\theta Y} = C_{\theta\theta} H^T$ and $C_{YY} = H C_{\theta\theta} H^T + C_{VV}$:

$ R_{\tilde{\theta}\tilde{\theta}}^{\text{LMMSE}} = C_{\theta\theta} - C_{\theta Y} C_{YY}^{-1} C_{Y\theta} = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} $

This matrix equals the posterior covariance of $\theta$ given $Y$, and it does not depend on the observed value of $Y$.
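For the linear model stated at the top of this section, the intermediate steps can be sketched as follows. Here $G = C_{\theta Y} C_{YY}^{-1}$ is shorthand introduced for this sketch (so that $\hat{\theta}_{\text{LMMSE}} = GY$), and $C_{\theta\theta}$, $C_{VV}$ are assumed invertible.

$$
\begin{aligned}
R_{\tilde{\theta}\tilde{\theta}}^{\text{LMMSE}}
&= E\left[(\theta - GY)(\theta - GY)^T\right] \\
&= C_{\theta\theta} - G\,C_{Y\theta} - C_{\theta Y}\,G^T + G\,C_{YY}\,G^T \\
&= C_{\theta\theta} - C_{\theta Y} C_{YY}^{-1} C_{Y\theta} \\
&= C_{\theta\theta} - C_{\theta\theta} H^T \left(H C_{\theta\theta} H^T + C_{VV}\right)^{-1} H C_{\theta\theta} \\
&= \left(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H\right)^{-1}.
\end{aligned}
$$

The third line uses the fact that each of the three terms involving $G$ equals $C_{\theta Y} C_{YY}^{-1} C_{Y\theta}$; the last line is the matrix inversion lemma.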

${\color{Cyan}(c)}$ The conditional bias of an estimator $\hat{\theta}$ is $b_{\hat{\theta}}(\theta) = E_{Y|\theta} \hat{\theta} (Y) - \theta$.

The conditional bias of an estimator $\hat{\theta}$, denoted as $b_{\hat{\theta}}(\theta)$, is defined as the expected difference between the estimator and the true parameter value $\theta$, given the true parameter value $\theta$. It is calculated as:

$ b_{\hat{\theta}}(\theta) = E_{Y|\theta} [\hat{\theta}(Y)] - \theta $

Here,
- $E_{Y|\theta} [\hat{\theta}(Y)]$ is the conditional expectation of the estimator given the true parameter value $\theta$.
- $\theta$ is the true parameter value.

The conditional bias measures the systematic error of the estimator for a fixed true parameter value $\theta$, averaged over the noise. If the conditional bias is zero for all possible values of $\theta$, the estimator is said to be conditionally unbiased.

Note that the expectation $E_{Y|\theta} [\hat{\theta}(Y)]$ is taken over the distribution of the data $Y$ given the true parameter value $\theta$. The prior on $\theta$ plays no role in this expectation.

In the context of the Linear Minimum Mean Squared Error (LMMSE) estimator, the conditional bias $b_{\hat{\theta}_{\text{LMMSE}}}(\theta)$ is calculated as:

$ b_{\hat{\theta}_{\text{LMMSE}}}(\theta) = E_{Y|\theta} [\hat{\theta}_{\text{LMMSE}}(Y)] - \theta $

where $\hat{\theta}_{\text{LMMSE}}(Y)$ is the LMMSE estimator given the observed data $Y$, and the expectation is taken with respect to the conditional distribution of $Y$ given the true parameter value $\theta$.

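Because $\hat{\theta}_{\text{LMMSE}}$ is linear in $Y$ and $E_{Y|\theta}[Y] = H\theta$, this expectation has a closed form:

$ b_{\hat{\theta}_{\text{LMMSE}}}(\theta) = \left[(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} H - I\right]\theta = -(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} C_{\theta\theta}^{-1}\,\theta $

The bias is nonzero for $\theta \neq 0$: the LMMSE estimator shrinks the estimate toward the prior mean $m_\theta = 0$. Averaged over the prior it vanishes, $E_\theta\, b_{\hat{\theta}_{\text{LMMSE}}}(\theta) = 0$.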
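To make the conditional bias concrete, here is a minimal sketch for the scalar special case of the model, using $E_{Y|\theta}[Y] = h\theta$. The values of `h`, `c_theta`, and `c_v` are hypothetical, and the function names are introduced only for this illustration. It evaluates the conditional bias and conditional MSE of the scalar LMMSE estimator for a few fixed values of $\theta$, and then averages the conditional MSE over the prior.

```python
from fractions import Fraction

# Hypothetical scalar example values (assumptions, not taken from the text).
h = Fraction(2)          # scalar counterpart of the design matrix H
c_theta = Fraction(3)    # prior variance of theta
c_v = Fraction(1, 2)     # noise variance

# Scalar LMMSE gain, so that theta_hat = g * y.
g = (h / c_v) / (1 / c_theta + h * h / c_v)


def conditional_bias(theta):
    """E[theta_hat | theta] - theta, using E[Y | theta] = h * theta."""
    return (g * h - 1) * theta


def conditional_mse(theta):
    """E[(theta_hat - theta)^2 | theta] = squared bias + variance of g * V."""
    return conditional_bias(theta) ** 2 + g * g * c_v


# Averaging the conditional MSE over the zero-mean prior replaces theta^2 by c_theta.
bayes_mse = (g * h - 1) ** 2 * c_theta + g * g * c_v
posterior_variance = 1 / (1 / c_theta + h * h / c_v)

for theta in (Fraction(0), Fraction(1), Fraction(-4)):
    print(f"theta={theta}: bias={conditional_bias(theta)}, mse={conditional_mse(theta)}")
print("prior-averaged MSE:", bayes_mse, "| posterior variance:", posterior_variance)
```

In this sketch the bias is proportional to $\theta$ with a negative factor, which is the shrinkage toward zero, and the prior-averaged MSE coincides with the scalar posterior variance.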

${\color{Cyan} * }$ The BLUE estimator is the LMMSE estimator under the constraint of conditional
unbiasedness. So $b_{BLUE}(\theta) = 0$.

The Best Linear Unbiased Estimator (BLUE) is the linear estimator that minimizes the mean squared error subject to the constraint of conditional unbiasedness. This constraint requires that, for every possible true parameter value $\theta$, the conditional expectation of the estimator given $\theta$ equals $\theta$.

Mathematically, for the BLUE estimator $\hat{\theta}_{\text{BLUE}}$, the conditional bias is zero for all $\theta$:

$ b_{\hat{\theta}_{\text{BLUE}}}(\theta) = E_{Y|\theta} [\hat{\theta}_{\text{BLUE}}(Y)] - \theta = 0 $

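This condition of conditional unbiasedness makes the BLUE estimator appealing when no reliable prior is available, as it combines linearity (L) and unbiasedness (U) under the minimum mean squared error criterion.

For a linear estimator $\hat{\theta} = FY$, the constraint reads:

$ E_{Y|\theta} [FY] - \theta = (FH - I)\,\theta = 0 \quad \text{for all } \theta, \text{ i.e. } FH = I $

The unconstrained LMMSE estimator does not satisfy this constraint in general: its gain $G = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1}$ gives $GH = I - (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} C_{\theta\theta}^{-1} \neq I$. The BLUE estimator is therefore a different estimator, obtained by minimizing the mean squared error only over linear estimators with $FH = I$.

In summary, the BLUE estimator has zero conditional bias by construction, whereas the LMMSE estimator accepts a nonzero conditional bias in exchange for a lower mean squared error averaged over the prior.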

${\color{Cyan} * }$ What is $\hat{\theta}_{BLUE}$ in terms of the quantities appearing in the Bayesian linear model considered here?
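
A possible answer, sketched under the assumption that $H$ has full column rank and $C_{VV}$ is invertible (so that $H^T C_{VV}^{-1} H$ is invertible): minimizing the error covariance over linear estimators $\hat{\theta} = FY$ subject to $FH = I$ gives the weighted least-squares form. The symbol $F$ is shorthand introduced here.

$$
\hat{\theta}_{BLUE} = \left(H^T C_{VV}^{-1} H\right)^{-1} H^T C_{VV}^{-1}\, Y,
\qquad
R_{\tilde{\theta}\tilde{\theta}}^{BLUE} = \left(H^T C_{VV}^{-1} H\right)^{-1}.
$$

The BLUE estimator does not use the prior covariance $C_{\theta\theta}$; it is the LMMSE expression with $C_{\theta\theta}^{-1}$ set to zero. Since $C_{\theta\theta}^{-1}$ is positive definite,

$$
\left(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H\right)^{-1} \preceq \left(H^T C_{VV}^{-1} H\right)^{-1},
$$

so the conditionally biased LMMSE estimator has an error covariance no larger than that of the BLUE estimator when averaged over the prior.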