**🔖** **1)** On the Beneficial Bias of MMSE Estimation

Consider the Bayesian linear model $Y = H\theta + V$ with $\theta \sim N(0, C_{\theta\theta})$ and $V \sim N(0, C_{VV})$ independent (we take $m_{\theta} = 0$ here for simplicity).

In this model, $Y$ is the observed data, $H$ is the design matrix that relates the unknown parameter vector $\theta$ to the observations, and $V$ is the observation noise. The parameters $\theta$ and the noise $V$ follow Gaussian distributions:

- $\theta \sim N(0, C_{\theta\theta})$: $\theta$ is multivariate normal with mean vector $0$ (that is, $m_{\theta} = 0$ for simplicity) and covariance matrix $C_{\theta\theta}$.
- $V \sim N(0, C_{VV})$: the noise $V$ is also multivariate normal, with mean $0$ and covariance matrix $C_{VV}$. Furthermore, $\theta$ and $V$ are independent.

Given these assumptions, we can proceed to derive the posterior distribution of $\theta$ given the observed data $Y$, as well as expressions for the Bayesian Linear Minimum Mean Square Error (LMMSE) estimator for $\theta$.

### Posterior Distribution of $\theta$

The posterior distribution of $\theta$ given $Y$ can be derived using Bayes' theorem, considering the Gaussian nature of the prior and likelihood. The posterior is also Gaussian due to the conjugacy between the Gaussian prior and likelihood in linear models. The posterior distribution parameters (mean and covariance) can be computed as follows:

#### Posterior Mean

$$
\hat{\theta}_{\text{posterior}} = (H^T C_{VV}^{-1} H + C_{\theta\theta}^{-1})^{-1} H^T C_{VV}^{-1} Y
$$

#### Posterior Covariance

$$
C_{\theta|Y} = (H^T C_{VV}^{-1} H + C_{\theta\theta}^{-1})^{-1}
$$
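The two formulas above can be evaluated numerically. The following is a minimal NumPy sketch, not a reference implementation: the function name `gaussian_posterior` and all numerical values are illustrative assumptions, and the code assumes $C_{VV}$ and $C_{\theta\theta}$ are symmetric positive definite.

```python
import numpy as np

def gaussian_posterior(H, C_theta, C_v, y):
    """Posterior mean and covariance of theta for Y = H theta + V (zero prior mean)."""
    Cv_inv_H = np.linalg.solve(C_v, H)                    # C_VV^{-1} H without forming the inverse
    precision = H.T @ Cv_inv_H + np.linalg.inv(C_theta)   # H^T C_VV^{-1} H + C_theta^{-1}
    C_post = np.linalg.inv(precision)                     # posterior covariance
    mean_post = C_post @ (Cv_inv_H.T @ y)                 # Cv_inv_H.T = H^T C_VV^{-1} by symmetry
    return mean_post, C_post

# Illustrative (made-up) numbers: 3 observations, 2 parameters.
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
C_theta = np.diag([2.0, 0.5])
C_v = 0.1 * np.eye(3)
y = np.array([1.0, 0.5, -1.0])

mean_post, C_post = gaussian_posterior(H, C_theta, C_v, y)
print("posterior mean:", mean_post)
print("posterior covariance:\n", C_post)
```

Using `np.linalg.solve` for $C_{VV}^{-1} H$ avoids one explicit inverse; for large problems a Cholesky-based solve would likely be preferable.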

### Bayesian LMMSE Estimator

The Bayesian LMMSE estimator for $\theta$ minimizes the expected squared error among all estimators that are linear in $Y$, given the prior information. Because $\theta$ and $Y$ are jointly Gaussian here, the posterior mean is itself linear in $Y$, so the LMMSE estimator coincides with the posterior mean $\hat{\theta}_{\text{posterior}}$.
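As a quick sanity check, consider the scalar special case $H = 1$. The symbols $\sigma_\theta^2$ and $\sigma_V^2$ are notation introduced here for illustration only; they stand for $C_{\theta\theta}$ and $C_{VV}$ when everything is one-dimensional. Substituting into the posterior mean and covariance formulas gives:

$$
\hat{\theta} = \left(\frac{1}{\sigma_\theta^2} + \frac{1}{\sigma_V^2}\right)^{-1} \frac{Y}{\sigma_V^2} = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_V^2}\, Y,
\qquad
C_{\theta|Y} = \frac{\sigma_\theta^2\, \sigma_V^2}{\sigma_\theta^2 + \sigma_V^2}
$$

The gain $\sigma_\theta^2 / (\sigma_\theta^2 + \sigma_V^2)$ is always below 1, so the estimate is shrunk toward the prior mean $0$. For a fixed $\theta$, $E[\hat{\theta} \mid \theta] = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_V^2}\,\theta \neq \theta$, which appears to be the kind of bias the section title alludes to.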

### Key Properties

- **Optimality**: The Bayesian LMMSE estimator is optimal in the sense that it minimizes the mean squared error considering both the noise in the observations and the prior distribution of the parameters.
- **Incorporation of Prior Information**: The Bayesian approach allows for the incorporation of prior information about $\theta$ through $C_{\theta\theta}$, enhancing estimation accuracy, especially when the observed data $Y$ is limited or noisy.
- **Update with New Information**: As new data becomes available, the posterior distribution (and hence the Bayesian LMMSE estimate) can be updated, reflecting a Bayesian learning process.
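The update property in the last bullet can be illustrated with a small sketch. It assumes the noise in the two batches is independent (block-diagonal $C_{VV}$) and uses the standard extension of the posterior formulas to a nonzero prior mean, which the text above does not state explicitly. The helper `update` and all numbers are hypothetical.

```python
import numpy as np

def update(mean, cov, H, C_v, y):
    """One Gaussian update in information form; the prior may have a nonzero mean."""
    prior_precision = np.linalg.inv(cov)
    Cv_inv_H = np.linalg.solve(C_v, H)                    # C_VV^{-1} H
    cov_new = np.linalg.inv(prior_precision + H.T @ Cv_inv_H)
    mean_new = cov_new @ (prior_precision @ mean + Cv_inv_H.T @ y)
    return mean_new, cov_new

# Illustrative prior and two batches of observations (made-up values).
C_theta = np.diag([1.0, 4.0])
m0 = np.zeros(2)
H1 = np.array([[1.0, 0.0], [1.0, 1.0]])
C1 = 0.5 * np.eye(2)
y1 = np.array([0.8, 1.9])
H2 = np.array([[0.0, 1.0], [2.0, -1.0]])
C2 = 0.2 * np.eye(2)
y2 = np.array([1.2, 0.3])

# Sequential: the posterior after batch 1 becomes the prior for batch 2.
m1, P1 = update(m0, C_theta, H1, C1, y1)
m2, P2 = update(m1, P1, H2, C2, y2)

# Batch: process all observations at once with block-diagonal noise covariance.
H_all = np.vstack([H1, H2])
C_all = np.block([[C1, np.zeros((2, 2))],
                  [np.zeros((2, 2)), C2]])
y_all = np.concatenate([y1, y2])
m_batch, P_batch = update(m0, C_theta, H_all, C_all, y_all)

print(np.allclose(m2, m_batch), np.allclose(P2, P_batch))
```

Under the independence assumption, both routes accumulate the same information matrix $C_{\theta\theta}^{-1} + \sum_i H_i^T C_i^{-1} H_i$, which is why the sequential and batch posteriors agree.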

This model exemplifies how Bayesian inference is applied in linear models, leveraging the Gaussian assumptions to obtain analytical solutions for the posterior distribution and the LMMSE estimator.

🔖 (a) The LMMSE estimator is:

$$\hat{\theta}_{LMMSE} = C_{\theta Y} C_{YY}^{-1} Y = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y$$

This expression for the Linear Minimum Mean Square Error (LMMSE) estimator, $\hat{\theta}_{LMMSE}$, is a key result for the Bayesian linear model with Gaussian parameters and Gaussian observation noise. It shows how prior knowledge about the parameters (through the covariance matrix $C_{\theta\theta}$) and the noise characteristics of the observations (through the covariance matrix $C_{VV}$) combine to estimate $\theta$ from the observed data $Y$.

### Breaking Down the Formula

The LMMSE estimator is given by:

$$
\hat{\theta}_{LMMSE} = (C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1} H^T C_{VV}^{-1} Y
$$

- **$C_{\theta\theta}$**: The prior covariance matrix of the parameters $\theta$. It reflects our prior belief about the variance and covariance of the parameters before observing any data.
  
- **$H$**: The design matrix that relates the parameters $\theta$ to the observations $Y$. It models how each parameter contributes to the observed data.
  
- **$C_{VV}$**: The covariance matrix of the observation noise $V$. It characterizes the uncertainty or variability in the observations due to noise.
  
- **$Y$**: The vector of observed data.

### Interpretation

- The term $C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H$ combines information from both the prior distribution of $\theta$ and the likelihood of observing $Y$ given $\theta$. This reflects a fusion of prior knowledge and observed data, where $C_{VV}^{-1}$ weights the contribution of each observation according to its reliability (inversely proportional to its variance).
  
- The product $H^T C_{VV}^{-1} Y$ can be seen as a weighted version of the observed data, where observations with lower variance (higher reliability) are given more weight.
  
- The factor $(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$ is the posterior covariance $C_{\theta|Y}$. It rescales the weighted data $H^T C_{VV}^{-1} Y$ so that $\hat{\theta}_{LMMSE}$ reflects the combined effect of the prior and the data.
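The problem statement gives two forms of the estimator: the covariance form $C_{\theta Y} C_{YY}^{-1} Y$ and the information form broken down above. For this model, independence of $\theta$ and $V$ gives $C_{\theta Y} = C_{\theta\theta} H^T$ and $C_{YY} = H C_{\theta\theta} H^T + C_{VV}$. The sketch below checks numerically that both forms agree on randomly generated matrices; the dimensions, seed, and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_par = 5, 3

# Random design matrix and random symmetric positive definite covariances.
H = rng.standard_normal((n_obs, n_par))
A = rng.standard_normal((n_par, n_par))
B = rng.standard_normal((n_obs, n_obs))
C_theta = A @ A.T + np.eye(n_par)
C_v = B @ B.T + np.eye(n_obs)
y = rng.standard_normal(n_obs)

# Covariance form: C_thetaY C_YY^{-1} Y (solves an n_obs x n_obs system).
C_theta_y = C_theta @ H.T
C_yy = H @ C_theta @ H.T + C_v
theta_cov_form = C_theta_y @ np.linalg.solve(C_yy, y)

# Information form: (C_theta^{-1} + H^T C_v^{-1} H)^{-1} H^T C_v^{-1} Y
# (solves an n_par x n_par system once C_v^{-1} H is available).
Cv_inv_H = np.linalg.solve(C_v, H)
precision = np.linalg.inv(C_theta) + H.T @ Cv_inv_H
theta_info_form = np.linalg.solve(precision, Cv_inv_H.T @ y)

print(np.allclose(theta_cov_form, theta_info_form))
```

Which form is cheaper depends on the dimensions: the covariance form inverts a matrix the size of the observation vector, while the information form inverts one the size of the parameter vector.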

### Key Takeaways

- The LMMSE estimator not only minimizes the mean squared error between the estimated and true parameters but also optimally balances the influence of prior knowledge and observed data.
  
- It is particularly powerful in situations where the observed data $Y$ is noisy or sparse, as it leverages the structure of the problem (through $H$) and prior information (through $C_{\theta\theta}$) to improve estimation accuracy.
  
- This estimator is a cornerstone of Bayesian inference in linear models, illustrating how Bayesian methods update prior beliefs to posterior beliefs in light of new data.
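To make the claim about noisy data concrete, the sketch below compares the Bayesian mean squared error of the LMMSE estimator, $\operatorname{tr}\,(C_{\theta\theta}^{-1} + H^T C_{VV}^{-1} H)^{-1}$, with that of the unbiased weighted least-squares estimator, $\operatorname{tr}\,(H^T C_{VV}^{-1} H)^{-1}$. The weighted least-squares comparison is not part of the text above; it is brought in here as an assumed baseline, and it requires $H$ to have full column rank. All numbers are illustrative.

```python
import numpy as np

# Illustrative (made-up) design matrix with full column rank and a unit prior.
H = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [1.0, 1.0]])
C_theta = np.eye(2)

def bayesian_mse_lmmse(H, C_theta, C_v):
    """Trace of the posterior covariance, i.e. the total MSE of the LMMSE estimate."""
    info = H.T @ np.linalg.solve(C_v, H)          # H^T C_VV^{-1} H
    return np.trace(np.linalg.inv(np.linalg.inv(C_theta) + info))

def mse_wls(H, C_v):
    """Total MSE of the unbiased weighted least-squares estimate (no prior used)."""
    info = H.T @ np.linalg.solve(C_v, H)
    return np.trace(np.linalg.inv(info))

results = []
for noise_var in [0.01, 0.1, 1.0, 10.0]:
    C_v = noise_var * np.eye(3)
    results.append((noise_var, bayesian_mse_lmmse(H, C_theta, C_v), mse_wls(H, C_v)))

for noise_var, mse_lmmse, mse_ls in results:
    print(f"noise variance {noise_var:6.2f}: LMMSE {mse_lmmse:.4f}  WLS {mse_ls:.4f}")
```

Since adding the positive definite $C_{\theta\theta}^{-1}$ can only shrink the inverse, the LMMSE figure never exceeds the weighted least-squares one, and the gap widens as the noise variance grows.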

Consider the truncated exponential density:
$$
f(y|\lambda,\alpha,\beta) =
\begin{cases}
0, & y < \alpha \\
\gamma e^{-\lambda y}, & \alpha \leq y \leq \beta \\
0, & \beta < y
\end{cases}
= \gamma e^{-\lambda y} 1_{[\alpha,\beta]}(y)
$$

where $\gamma$ is a normalization constant and

$$
1_\mathcal{A}(y) =
\begin{cases}
1, & y \in \mathcal{A} \\
0, & y \notin \mathcal{A}
\end{cases}
$$

is the indicator function for the set $\mathcal{A}$.

To ensure the truncated exponential distribution $f(y|\lambda,\alpha,\beta)$ is properly normalized, we need to determine the value of the normalization constant $\gamma$. The probability density function (pdf) must integrate to 1 over the interval $[\alpha, \beta]$ to be a valid probability distribution.

Given:

$$
f(y|\lambda,\alpha,\beta) = \gamma e^{-\lambda y} 1_{[\alpha,\beta]}(y)
$$

To find $\gamma$, we solve:

$$
\int_{\alpha}^{\beta} \gamma e^{-\lambda y} dy = 1
$$

### Solution

Performing the integration:

$$
\begin{aligned}
\int_{\alpha}^{\beta} \gamma e^{-\lambda y} dy &= \left[ -\frac{\gamma}{\lambda} e^{-\lambda y} \right]_{\alpha}^{\beta} \\
&= -\frac{\gamma}{\lambda} \left( e^{-\lambda \beta} - e^{-\lambda \alpha} \right) \\
&= \frac{\gamma}{\lambda} \left( e^{-\lambda \alpha} - e^{-\lambda \beta} \right) = 1
\end{aligned}
$$

Solving for $\gamma$, we find:

$$
\gamma = \frac{\lambda}{e^{-\lambda \alpha} - e^{-\lambda \beta}}
$$

This $\gamma$, which requires $\lambda \neq 0$ and $\alpha < \beta$, ensures that the pdf of the truncated exponential distribution integrates to 1 over the interval $[\alpha, \beta]$, making it a valid probability distribution.

The presence of the indicator function $1_{[\alpha,\beta]}(y)$ in the definition of the pdf explicitly enforces that the distribution is zero outside the interval $[\alpha, \beta]$, ensuring the distribution is properly truncated.
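The normalization can be checked numerically. The sketch below uses only the Python standard library; the function names, the midpoint-rule integrator, and the parameter values are assumptions chosen for illustration.

```python
import math

def trunc_exp_gamma(lam, alpha, beta):
    """Normalization constant gamma = lam / (exp(-lam*alpha) - exp(-lam*beta))."""
    if lam == 0 or not alpha < beta:
        raise ValueError("requires lam != 0 and alpha < beta")
    return lam / (math.exp(-lam * alpha) - math.exp(-lam * beta))

def trunc_exp_pdf(y, lam, alpha, beta):
    """Truncated exponential pdf; the indicator makes it zero outside [alpha, beta]."""
    if y < alpha or y > beta:
        return 0.0
    return trunc_exp_gamma(lam, alpha, beta) * math.exp(-lam * y)

def midpoint_integral(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Illustrative (made-up) parameter values.
lam, alpha, beta = 1.5, 0.5, 3.0
total = midpoint_integral(lambda y: trunc_exp_pdf(y, lam, alpha, beta), alpha, beta)
print("gamma:", trunc_exp_gamma(lam, alpha, beta))
print("integral over [alpha, beta]:", total)
```

The printed integral should be very close to 1, and evaluating `trunc_exp_pdf` outside $[\alpha, \beta]$ returns exactly zero, mirroring the role of the indicator function.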