# Exercise 1: Bayes Estimator and Bayes Risk

## Question 1 (M)

**Propose a supervised learning setting: input space X, output space Y, a random variable (X,Y) with a joint distribution, and a loss function l(x,y). Compute the Bayes predictor $f^*$ : X → Y and the Bayes risk associated with this setting.**

### 1. Supervised Learning Setting

*   **Input Space X:** The non-negative integers $\{0, 1, 2, \dots\}$. The input $X$ is the number of babies on an airline's planes over a 1-year period.
    *   Units: Number of babies (dimensionless).
    *   Distribution: We model $X$ as a Poisson random variable, $X \sim \mathcal{P}(\lambda)$, with $\lambda = 10^6$. The probability mass function is $P(X=x) = \frac{e^{-\lambda}\lambda^x}{x!}$ for $x = 0, 1, 2, \dots$

*   **Output Space Y:** The non-negative integers $\{0, 1, 2, \dots\}$. The output $Y$ is the number of annoyed passengers.
    *   Units: Number of annoyed passengers (dimensionless).
    *   Conditional Distribution: Given $X=x$ (the number of babies), we model $Y$ as a Binomial random variable, $Y|X=x \sim \mathcal{B}(n_x, p)$.
        *   $n_x = 10x$: The number of passengers considered "exposed" or "nearby" to babies is 10 times the number of babies.
        *   $p = 0.4$: The probability that an "exposed" passenger becomes annoyed.
    *   The probability mass function for $Y$ given $X=x$ is $P(Y=y|X=x) = \binom{n_x}{y} p^y (1-p)^{n_x-y} = \binom{10x}{y} (0.4)^y (0.6)^{10x-y}$ for $y = 0, 1, ..., 10x$.

*   **Joint Distribution P(X,Y):**
    *   $P(X=x, Y=y) = P(Y=y|X=x)P(X=x)$
    *   $P(X=x, Y=y) = \left(\binom{10x}{y} (0.4)^y (0.6)^{10x-y}\right) \cdot \left(\frac{e^{-\lambda}\lambda^x}{x!}\right)$
    *   with $\lambda = 10^6$.
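
The joint pmf above can be evaluated numerically. The sketch below works in log space, because with $\lambda = 10^6$ the factorials and powers overflow ordinary floating point. The function name `log_joint_pmf` and its parameters `lam`, `k`, `p` are illustrative names introduced here, and the small rate `lam=2.0` in the sanity check is a hypothetical value chosen only to keep the sum cheap.

```python
import math

def log_joint_pmf(x, y, lam=1e6, k=10, p=0.4):
    """Return log P(X=x, Y=y) for X ~ Poisson(lam), Y | X=x ~ Binomial(k*x, p)."""
    n = k * x
    if x < 0 or y < 0 or y > n:
        return -math.inf  # (x, y) lies outside the support
    # log of the Poisson pmf: -lam + x*log(lam) - log(x!)
    log_poisson = -lam + x * math.log(lam) - math.lgamma(x + 1)
    # log of the Binomial pmf: log C(n, y) + y*log(p) + (n - y)*log(1 - p)
    log_binomial = (
        math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
        + y * math.log(p) + (n - y) * math.log(1 - p)
    )
    return log_poisson + log_binomial

# Log-probability near the mode of the setting in the text (x = lam, y = 4x).
log_p_mode = log_joint_pmf(1_000_000, 4_000_000)

# Sanity check with a hypothetical small rate: the joint pmf should sum to about 1.
total_small = sum(
    math.exp(log_joint_pmf(x, y, lam=2.0))
    for x in range(61)
    for y in range(10 * x + 1)
)
print(log_p_mode, total_small)
```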

*   **Loss Function $l(y_{pred}, y_{actual})$:** We choose the squared loss function.
    *   $l(y_{pred}, y_{actual}) = (y_{pred} - y_{actual})^2$

### 2. Bayes Predictor $f^*(x)$

For the squared loss function, the Bayes predictor $f^*(x)$ is the conditional expectation of $Y$ given $X=x$:
$$f^*(x) = E[Y|X=x]$$
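
A short justification of this standard fact, as a sketch: fix $x$, let $c$ be any candidate prediction, and write $\mu(x) = E[Y|X=x]$ (the symbols $c$ and $\mu$ are notation introduced here). Expanding the square around $\mu(x)$, the cross term vanishes because $E[Y - \mu(x) | X=x] = 0$, so
$$E[(c - Y)^2 | X=x] = Var(Y|X=x) + (c - \mu(x))^2$$
The first term does not depend on $c$, and the second is minimized, with value $0$, at $c = \mu(x)$.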

Given $X=x$, $Y$ follows a Binomial distribution $\mathcal{B}(n_x, p)$ where $n_x = 10x$ and $p=0.4$.
The expectation of a Binomial distribution $\mathcal{B}(n,p)$ is $np$.
Therefore,
$$f^*(x) = n_x \cdot p = (10x) \cdot p$$
Substituting $p=0.4$:
$$f^*(x) = 10x \cdot 0.4 = 4x$$

The Bayes predictor is $f^*(x) = 4x$: among all predictors, predicting 4 annoyed passengers per observed baby minimizes the expected squared loss.
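
As a numerical check of this result, the sketch below computes the conditional expected squared loss exactly, by summing over the Binomial support, and searches a grid of candidate predictions. The value `x = 3`, the grid spacing of 0.1, and the helper names are hypothetical choices made for illustration.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

def expected_squared_loss(c, x, k=10, p=0.4):
    """E[(c - Y)^2 | X = x], by exact summation over y = 0, ..., k*x."""
    n = k * x
    return sum((c - y) ** 2 * binomial_pmf(y, n, p) for y in range(n + 1))

x = 3       # hypothetical observed number of babies
n = 10 * x  # number of exposed passengers
# Candidate predictions 0.0, 0.1, ..., n
candidates = [i / 10 for i in range(10 * n + 1)]
best = min(candidates, key=lambda c: expected_squared_loss(c, x))
print(best, 4 * x)
```

On this grid the minimizer should coincide with $4x$, and the loss at the minimizer is the conditional variance of $Y$ given $X=x$.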

### 3. Bayes Risk $R^*$

The Bayes risk $R^*$ is the expected value of the loss function when using the Bayes predictor $f^*(X)$. For the squared loss, this is the expectation of the conditional variance of $Y$ given $X$:
$$R^* = E_{(X,Y)}[l(f^*(X), Y)] = E_X[E_{Y|X}[(f^*(X) - Y)^2 | X]]$$
$$R^* = E_X[Var(Y|X)]$$

Given $X=x$, $Y \sim \mathcal{B}(n_x, p)$ with $n_x = 10x$ and $p=0.4$.
The variance of a Binomial distribution $\mathcal{B}(n,p)$ is $np(1-p)$.
So, $Var(Y|X=x) = n_x \cdot p \cdot (1-p) = (10x) \cdot p \cdot (1-p)$.
Substituting $p=0.4$ and $(1-p)=0.6$:
$$Var(Y|X=x) = 10x \cdot 0.4 \cdot 0.6 = 10x \cdot 0.24 = 2.4x$$
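
The closed forms $E[Y|X=x] = 4x$ and $Var(Y|X=x) = 2.4x$ can be checked against a brute-force computation from the conditional pmf. This is a minimal sketch; the function name `conditional_moments` and the test values of `x` are assumptions made for illustration.

```python
from math import comb

def conditional_moments(x, k=10, p=0.4):
    """Mean and variance of Y | X = x ~ Binomial(k*x, p), computed from the pmf."""
    n = k * x
    pmf = [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]
    mean = sum(y * q for y, q in enumerate(pmf))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))
    return mean, var

# Compare the brute-force moments with the closed forms 4x and 2.4x.
for x in (0, 1, 5, 20):
    mean, var = conditional_moments(x)
    print(x, round(mean, 6), 4 * x, round(var, 6), 2.4 * x)
```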

Now we compute the expectation of this conditional variance over the distribution of $X$:
$$R^* = E_X[Var(Y|X)] = E_X[2.4X]$$
By linearity of expectation, the constant $2.4$ factors out of the expectation over $X \sim \mathcal{P}(\lambda)$:
$$R^* = 2.4 \cdot E[X]$$
The expectation of a Poisson distribution $X \sim \mathcal{P}(\lambda)$ is $E[X] = \lambda$.
Given $\lambda = 10^6$:
$$R^* = 2.4 \cdot \lambda = 2.4 \cdot 10^6 = 2{,}400{,}000$$
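
The identity $R^* = 2.4\lambda$ can be checked without the variance formula by summing the squared loss of $f^*$ directly over the joint distribution. The sketch below does this for a few small, hypothetical rates, since a brute-force double sum at $\lambda = 10^6$ would be far too expensive. The identity itself does not depend on the value of $\lambda$. The truncation point `x_max` and the function names are assumptions introduced here.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def bayes_risk(lam, k=10, p=0.4, x_max=60):
    """Approximate E[(f*(X) - Y)^2], truncating the Poisson sum at x_max."""
    risk = 0.0
    for x in range(x_max + 1):
        n = k * x
        prediction = k * p * x  # Bayes predictor f*(x) = 4x when k=10, p=0.4
        # Conditional expected squared loss at this x, summed over y = 0, ..., n
        inner = sum(
            (prediction - y) ** 2 * comb(n, y) * p**y * (1 - p) ** (n - y)
            for y in range(n + 1)
        )
        risk += poisson_pmf(x, lam) * inner
    return risk

# Hypothetical small rates; each result should be close to 2.4 * lam.
for lam in (1.0, 5.0, 10.0):
    print(lam, bayes_risk(lam), 2.4 * lam)
```

The truncation at `x_max=60` is harmless for these rates because the Poisson tail beyond it is negligible; a larger rate would need a larger cutoff.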

**Conclusion for Question 1:**

*   **Input Space X:** Number of babies, $X \sim \mathcal{P}(\lambda=10^6)$.
*   **Output Space Y:** Number of annoyed passengers. Conditional on $X=x$, $Y|X=x \sim \mathcal{B}(n_x=10x, p=0.4)$.
*   **Loss Function:** Squared loss, $l(y_{pred}, y_{actual}) = (y_{pred} - y_{actual})^2$.
*   **Bayes Predictor $f^*(x)$:** $f^*(x) = 4x$.
*   **Bayes Risk $R^*$:** $R^* = 2.4\lambda = 2{,}400{,}000$.
