## Exercise

### Theory

**SCENARIO:**
>A company wants to optimize its marketing strategy by predicting the number of units sold, $Y$, from the amount $X$ (in millions) invested in marketing campaigns. The company has a budget of 10 million.

- If the company spends less than 3 million, the number of units sold follows a normal distribution $N(2,1)$.
- If it spends 3 million or more, the number of units sold follows $N(5,2)$, where the second parameter is the variance.

**SETTING:**

- $X \in [0,10]$
- $Y \in \mathbb{R}$

- $X \sim U[0,10]$
- $Y \mid X \sim \begin{cases} 
N(2, 1) & \text{if } X < 3 \\
N(5, 2) & \text{if } X \geq 3 
\end{cases}$

- loss function: squared loss, $l(y, \hat{y}) = (y - \hat{y})^2$


**BAYES ESTIMATOR:**


The Bayes predictor $f^*(X)$, also written $\hat{Y}(X)$, minimizes the expected squared loss. Under squared loss, this minimizer is the conditional expectation $\mathbb{E}[Y \mid X]$.

Since a variable $Y \sim N(\mu, \sigma^2)$ has mean $\mathbb{E}[Y] = \mu$, it follows that:

- For $X < 3$, $\mathbb{E}[Y \mid X < 3] = 2$, so $f^*(X) = 2$.
- For $X \geq 3$, $\mathbb{E}[Y \mid X \geq 3] = 5$, so $f^*(X) = 5$.
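The claim that the conditional mean minimizes the squared loss can be checked numerically. The sketch below scans constant predictions over a grid in each region and reports the one with the lowest empirical squared loss. The seed, sample size, grid, and the helper name `best_constant` are illustrative assumptions, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
n = 200_000

# Samples of Y in each region (N(5, 2) has variance 2, so std = sqrt(2))
y_low = rng.normal(2.0, 1.0, n)            # region X < 3
y_high = rng.normal(5.0, np.sqrt(2.0), n)  # region X >= 3


def best_constant(y, grid):
    """Return the grid value c minimizing the empirical mean of (y - c)^2."""
    # mean((y - c)^2) = mean(y^2) - 2*c*mean(y) + c^2, evaluated on the whole grid
    risks = np.mean(y**2) - 2.0 * grid * np.mean(y) + grid**2
    return grid[np.argmin(risks)]


grid = np.linspace(0.0, 8.0, 161)  # candidate predictions, step 0.05
c_low = best_constant(y_low, grid)
c_high = best_constant(y_high, grid)
print(c_low, c_high)
```

With enough samples, the two minimizers should land on the grid points closest to the conditional means 2 and 5.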


**CONDITIONAL RISK:**

- **For $ X < 3 $**:

  
  $$
  \begin{aligned}
    \mathbb{E}[l(Y, f^*(X)) \mid X < 3] &= \mathbb{E}[(Y - f^*(X))^2 \mid X < 3] \\
    &= \mathbb{E}[(Y - \mathbb{E}[Y \mid X])^2 \mid X < 3] \\
    &= \text{Var}(Y \mid X < 3) = 1
  \end{aligned}
  $$

  
  Here, $ Y \sim N(2, 1) $, so the variance is 1.
  

- **For $ X \geq 3 $**:\
  In the same manner:

  $$\mathbb{E}[l(Y, f^*(X)) \mid X \geq 3] = \text{Var}(Y \mid X \geq 3) = 2$$

  Here, $ Y \sim N(5, 2) $, so the variance is 2.
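The two conditional risks can also be estimated by simulation, averaging the squared loss of the Bayes predictor separately in each region. This is a minimal sketch; the seed and sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed: an assumption, for reproducibility
n = 400_000

X = rng.uniform(0.0, 10.0, n)
low = X < 3

# Draw Y given the region of X; N(5, 2) is parameterized by its variance,
# so the standard deviation passed to NumPy is sqrt(2)
Y = np.where(low, rng.normal(2.0, 1.0, n), rng.normal(5.0, np.sqrt(2.0), n))

f_star = np.where(low, 2.0, 5.0)  # Bayes predictor: the conditional mean
loss = (Y - f_star) ** 2

risk_low = loss[low].mean()    # estimates E[l(Y, f*(X)) | X < 3]
risk_high = loss[~low].mean()  # estimates E[l(Y, f*(X)) | X >= 3]
print(f"{risk_low:.3f} {risk_high:.3f}")
```

The estimates should be close to the conditional variances 1 and 2, up to sampling noise.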

**BAYES RISK:**\
Using the law of total expectation, with $P(X < 3) = \frac{3}{10}$ and $P(X \geq 3) = \frac{7}{10}$ because $X \sim U[0,10]$, we have:

$$ 
\begin{aligned}
    R^* &= \mathbb{E}_{X,Y}[l(Y, f^*(X))] \\
    &= \mathbb{E}[l(Y, f^*(X)) \mid X < 3] \, P(X < 3) + \mathbb{E}[l(Y, f^*(X)) \mid X \geq 3] \, P(X \geq 3) \\
    &= \mathbb{E}[(Y - f^*(X))^2 \mid X < 3] \, P(X < 3) + \mathbb{E}[(Y - f^*(X))^2 \mid X \geq 3] \, P(X \geq 3) \\
    &= 1 \cdot \frac{3}{10} + 2 \cdot \frac{7}{10} \\
    &= 1.7
\end{aligned}
$$
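The final weighted sum can be reproduced with exact rational arithmetic. The sketch below takes the conditional risks (1 and 2) and the region probabilities from the derivation; the variable names are illustrative.

```python
from fractions import Fraction

# Region probabilities for X ~ U[0, 10]
p_low = Fraction(3, 10)   # P(X < 3)
p_high = 1 - p_low        # P(X >= 3)

# Conditional risks of the Bayes predictor, equal to the conditional variances
risk_low, risk_high = 1, 2

# Law of total expectation: weight each conditional risk by its probability
bayes_risk = risk_low * p_low + risk_high * p_high
print(bayes_risk, float(bayes_risk))  # prints: 17/10 1.7
```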

**Conclusion:**

The Bayes risk for this setting is $R^* = 1.7$.

### Code

import numpy as np

# Use a large number of samples so that, by the law of large numbers,
# the empirical risk of the Bayes estimator approaches the Bayes risk.
num_samples = 10000

rng = np.random.default_rng()

# Draw the random variables X and Y according to the setting
X = rng.uniform(0, 10, num_samples)

# N(5, 2) has variance 2, so the standard deviation passed to NumPy is sqrt(2)
Y = np.where(
    X < 3,
    rng.normal(2, 1, num_samples),
    rng.normal(5, np.sqrt(2), num_samples),
)


# Under squared loss, the Bayes estimator is the conditional mean E[Y | X]:
# 2 when X < 3 and 5 otherwise (the means of the two normal distributions).
f_hat_bayes = np.where(X < 3, 2, 5)

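# Baseline for comparison: a value drawn from N(mean(Y), std(Y)).
# rng.normal is called without a size argument, so f_random is a single
# scalar that serves as the prediction for every sample.
f_random = rng.normal(Y.mean(), np.std(Y))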

# Squared loss of each estimator on every sample
squared_loss_bayes = (Y - f_hat_bayes) ** 2
squared_loss_random = (Y - f_random) ** 2

# Empirical risk = mean loss over the samples; for the Bayes estimator
# this approximates the Bayes risk
bayes_risk = np.mean(squared_loss_bayes)
random_risk = np.mean(squared_loss_random)

print(f"Bayes Risk: {bayes_risk:.2f}")
print(f"Random Estimator Risk: {random_risk:.2f}")


Bayes Risk: 1.68
Random Estimator Risk: 3.74
