# Exercise 2: Bayes Risk with Absolute Loss

## Question 0 (M)

**Propose a function $f: \mathbb{R} \to \mathbb{R}$ that has a zero derivative at some real value $x_0$, but $f(x_0)$ is not a local extremum of the function.**

Let us consider the function:

$$
f(x) = \begin{cases}
    -x^2 & \text{if } x < 0 \\
    x^2 & \text{if } x \geq 0
\end{cases}
$$

Let us show that $f'(0) = 0$ but $f(0)$ is not a local extremum.

**Proof:**

The right-hand derivative at $x=0$ is:
$$
\lim_{x \to 0^+} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0^+} \frac{x^2 - 0}{x} = \lim_{x \to 0^+} x = 0
$$

The left-hand derivative at $x=0$ is:
$$
\lim_{x \to 0^-} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0^-} \frac{-(x^2) - 0}{x} = \lim_{x \to 0^-} -x = 0
$$

Therefore, $f'(0) = 0$.

However, $f(0) = 0$ is not a local extremum. Indeed, in any neighborhood of $0$, $f(x)$ takes both positive and negative values (for $x > 0$, $f(x) > 0$; for $x < 0$, $f(x) < 0$). Thus, there is no neighborhood of $0$ in which $f(0)$ is either a maximum or a minimum.
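
The argument above can be checked numerically. The following is a minimal sketch, not part of the original solution: the helper name `one_sided_quotients` and the step sizes in `steps` are illustrative choices. It evaluates the left and right difference quotients at $0$ for shrinking steps and confirms that $f$ changes sign across $0$.

```python
def f(x: float) -> float:
    """Piecewise function from the text: -x^2 for x < 0, x^2 for x >= 0."""
    return -(x ** 2) if x < 0 else x ** 2


def one_sided_quotients(func, x0: float, h: float):
    """Return the (left, right) difference quotients of func at x0 for a step h > 0."""
    left = (func(x0 - h) - func(x0)) / (-h)
    right = (func(x0 + h) - func(x0)) / h
    return left, right


steps = [1e-1, 1e-2, 1e-3, 1e-4]  # illustrative step sizes (assumption)

for h in steps:
    left, right = one_sided_quotients(f, 0.0, h)
    # Both quotients shrink with h, consistent with f'(0) = 0.
    print(f"h = {h:g}: left quotient = {left:.6f}, right quotient = {right:.6f}")
    # f changes sign across 0, so f(0) = 0 is neither a local max nor a local min.
    print(f"    f(-h) = {f(-h):.2e}, f(h) = {f(h):.2e}")
```

Both quotients equal $h$ up to rounding, so they vanish as $h \to 0$, while $f(-h) < 0 < f(h)$ for every step.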


## Question 1 (M + C)

Recall the definition of the median:

> The median $m$ of a real random variable $Y$ is any value such that $P(Y \leq m) \geq 0.5$ and $P(Y \geq m) \geq 0.5$.
> For a continuous distribution, the median $m$ satisfies $F_Y(m) = 0.5$ where $F_Y$ is the cumulative distribution function.

Let us show, with a concrete example, that the Bayes estimator for the squared loss (the conditional mean) is not, in general, optimal for the absolute loss.

**Example:**
Suppose $Y|X=x$ is a discrete random variable such that:
- $P(Y=0|X=x) = 0.1$
- $P(Y=1|X=x) = 0.9$

- The conditional mean is:
  $$
  f^*_{\text{squared}}(x) = \mathbb{E}[Y|X=x] = 0 \cdot 0.1 + 1 \cdot 0.9 = 0.9
  $$
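- The conditional median is $1$: it satisfies both conditions of the definition, since $P(Y \leq 1 \mid X=x) = 1 \geq 0.5$ and $P(Y \geq 1 \mid X=x) = 0.9 \geq 0.5$. No value $m < 1$ qualifies, because then $P(Y \leq m \mid X=x) \leq 0.1 < 0.5$.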

Now, compute the absolute risk for both estimators:

1. **Risk for $f^*_{\text{squared}}(x) = 0.9$:**
   $$
   \mathbb{E}[|Y - 0.9|] = 0.1 \cdot |0 - 0.9| + 0.9 \cdot |1 - 0.9| = 0.1 \cdot 0.9 + 0.9 \cdot 0.1 = 0.09 + 0.09 = 0.18
   $$

2. **Risk for $h(x) = 1$ (the median):**
   $$
   \mathbb{E}[|Y - 1|] = 0.1 \cdot |0 - 1| + 0.9 \cdot |1 - 1| = 0.1 \cdot 1 + 0.9 \cdot 0 = 0.1
   $$

Thus, the estimator $h(x) = 1$ (the median) has a strictly smaller risk for the absolute loss than the mean $0.9$.

**Conclusion:**
This example shows that the Bayes estimator for the squared loss is not, in general, the Bayes estimator for the absolute loss.
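
The two risks computed above are values of the same function $z \mapsto \mathbb{E}[|Y - z|] = 0.1\,|0 - z| + 0.9\,|1 - z|$. As a rough sketch (the grid `z_grid` and the helper names `absolute_risk` and `squared_risk` are assumptions made for illustration), one can evaluate this function exactly on a grid of candidate predictions and compare its minimizer with that of the squared risk.

```python
import numpy as np

# Two-point distribution from the example: P(Y=0) = 0.1, P(Y=1) = 0.9.
values = np.array([0.0, 1.0])
probs = np.array([0.1, 0.9])


def absolute_risk(z: float) -> float:
    """Exact expected absolute loss E|Y - z| under the example distribution."""
    return float(np.sum(probs * np.abs(values - z)))


def squared_risk(z: float) -> float:
    """Exact expected squared loss E(Y - z)^2 under the example distribution."""
    return float(np.sum(probs * (values - z) ** 2))


# Candidate predictions between the two support points (illustrative grid).
z_grid = np.linspace(0.0, 1.0, 101)
abs_risks = np.array([absolute_risk(z) for z in z_grid])
sq_risks = np.array([squared_risk(z) for z in z_grid])

best_abs = z_grid[np.argmin(abs_risks)]  # grid minimizer of the absolute risk
best_sq = z_grid[np.argmin(sq_risks)]    # grid minimizer of the squared risk

print(f"Grid minimizer of the absolute risk: z = {best_abs:.2f}")
print(f"Grid minimizer of the squared risk:  z = {best_sq:.2f}")
print(f"Absolute risk at the mean 0.9: {absolute_risk(0.9):.4f}")
print(f"Absolute risk at the median 1: {absolute_risk(1.0):.4f}")
```

On $[0, 1]$ the absolute risk equals $0.9 - 0.8z$, which is decreasing, so the grid minimizer sits at the median $z = 1$, while the squared risk is minimized at the mean $z = 0.9$.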



In [1]:
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Simulation parameters
n_samples = 100000

print("=== Exercise 2, Question 1: Simulation ===")
print("Verifying that Bayes estimator of the squared loss (the expectance of the conditional distribution) is not optimal for absolute loss")

# Generate data according to our example: Y takes value 0 with prob 0.1, value 1 with prob 0.9
Y_samples = np.random.choice([0, 1], size=n_samples, p=[0.1, 0.9])

print(f"\nGenerated {n_samples:,} samples")
print(f"Empirical probabilities: P(Y=0) = {np.mean(Y_samples == 0):.3f}, P(Y=1) = {np.mean(Y_samples == 1):.3f}")

# Define estimators
mean_estimator = 0.9  # Conditional mean: the Bayes estimator for the squared loss
median_estimator = 1  # Conditional median: the estimator we compare against

# Compute absolute losses
loss_mean = np.abs(Y_samples - mean_estimator)
loss_median = np.abs(Y_samples - median_estimator)

# Compute empirical risks (average absolute loss)
risk_mean = np.mean(loss_mean)
risk_median = np.mean(loss_median)

print(f"\n=== Results ===")
print(f"Empirical risk for mean estimator (0.9): {risk_mean:.4f}")
print(f"Empirical risk for median estimator (1): {risk_median:.4f}")

# Theoretical risks from our calculation
theoretical_risk_mean = 0.18
theoretical_risk_median = 0.1

print(f"\nTheoretical risks:")
print(f"Theoretical risk for mean estimator: {theoretical_risk_mean:.4f}")
print(f"Theoretical risk for median estimator: {theoretical_risk_median:.4f}")

print(f"\nVerification:")
print(f"✓ Median has lower empirical risk: {risk_median < risk_mean}")
print(f"✓ Empirical risks match theoretical:")
print(f"  - Mean estimator: {abs(risk_mean - theoretical_risk_mean):.4f} difference")
print(f"  - Median estimator: {abs(risk_median - theoretical_risk_median):.4f} difference")

print(f"\n=== Summary ===")
print(f"The simulation confirms that for absolute loss:")
print(f"1. The median estimator has lower risk than the mean estimator")
print(f"2. Empirical risks closely match theoretical calculations")
print(f"3. The Bayes estimator for squared loss (mean) is NOT optimal for absolute loss")

=== Exercise 2, Question 1: Simulation ===
Verifying that Bayes estimator of the squared loss (the expectance of the conditional distribution) is not optimal for absolute loss

Generated 100,000 samples
Empirical probabilities: P(Y=0) = 0.100, P(Y=1) = 0.900

=== Results ===
Empirical risk for mean estimator (0.9): 0.1802
Empirical risk for median estimator (1): 0.1002

Theoretical risks:
Theoretical risk for mean estimator: 0.1800
Theoretical risk for median estimator: 0.1000

Verification:
✓ Median has lower empirical risk: True
✓ Empirical risks match theoretical:
  - Mean estimator: 0.0002 difference
  - Median estimator: 0.0002 difference

=== Summary ===
The simulation confirms that for absolute loss:
1. The median estimator has lower risk than the mean estimator
2. Empirical risks closely match theoretical calculations
3. The Bayes estimator for squared loss (mean) is NOT optimal for absolute loss


## Question 2 (M)

Let $Y|X=x$ have a continuous density $p_{Y|X=x}(y)$ and a finite first moment. Show that the Bayes predictor for the absolute loss,
$$
f^*_{\text{absolute}}(x) = \arg\min_{z \in \mathbb{R}} \mathbb{E}[|Y - z| \mid X = x],
$$
is a median of $Y|X=x$.

**Proof:**
Let $g(z) = \mathbb{E}[|Y - z| \mid X = x] = \int_{-\infty}^{\infty} |y - z| p_{Y|X=x}(y) dy$.

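To differentiate $g$, we split according to the sign of $y-z$:
- For $y < z$, $|y-z| = z-y$ and $\frac{\partial}{\partial z}|y-z| = 1$.
- For $y > z$, $|y-z| = y-z$ and $\frac{\partial}{\partial z}|y-z| = -1$.

At $y = z$ the absolute value is not differentiable, but this single point has probability zero under a continuous density. Moreover, $z \mapsto |y-z|$ is 1-Lipschitz, so its difference quotients are bounded by $1$ and dominated convergence allows us to differentiate under the integral sign.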

So, the derivative of $g(z)$ is:
$$
\frac{d}{dz} g(z) = \int_{-\infty}^{z} 1 \cdot p_{Y|X=x}(y) dy + \int_{z}^{\infty} (-1) \cdot p_{Y|X=x}(y) dy
$$

Recall that the cumulative distribution function (CDF) of $Y|X=x$ is defined as:
$$
F_{Y|X=x}(z) = \int_{-\infty}^z p_{Y|X=x}(y) dy
$$

So, the first integral is $F_{Y|X=x}(z)$, and the second integral is $-(1 - F_{Y|X=x}(z))$ (since the total probability is 1):
$$
\frac{d}{dz} g(z) = F_{Y|X=x}(z) - (1 - F_{Y|X=x}(z)) = 2F_{Y|X=x}(z) - 1
$$
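
The identity $g'(z) = 2F_{Y|X=x}(z) - 1$ can be sanity-checked numerically. The sketch below assumes, purely for illustration, that $Y|X=x$ is standard normal; the integration grid, the step `h`, and the helper names are likewise assumptions. It approximates $g$ by a Riemann sum and compares a central finite difference of $g$ with $2F(z) - 1$.

```python
import math

import numpy as np

# Assumption for illustration: Y | X = x follows a standard normal distribution.
y = np.linspace(-10.0, 10.0, 200_001)  # integration grid; tails beyond are negligible
dy = y[1] - y[0]
density = np.exp(-0.5 * y ** 2) / math.sqrt(2.0 * math.pi)


def g(z: float) -> float:
    """Riemann-sum approximation of g(z) = E|Y - z|."""
    return float(np.sum(np.abs(y - z) * density) * dy)


def normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def g_prime_numeric(z: float, h: float = 1e-3) -> float:
    """Central finite-difference approximation of g'(z)."""
    return (g(z + h) - g(z - h)) / (2.0 * h)


for z in (-1.0, 0.0, 0.5, 2.0):
    numeric = g_prime_numeric(z)
    formula = 2.0 * normal_cdf(z) - 1.0
    print(f"z = {z:+.1f}: finite difference = {numeric:+.5f}, 2F(z) - 1 = {formula:+.5f}")
```

The two columns should agree to several decimal places, and the derivative changes sign at $z = 0$, the median of the standard normal.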

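Setting the derivative to zero gives the critical points:
$$
2F_{Y|X=x}(z) - 1 = 0 \implies F_{Y|X=x}(z) = 0.5
$$
Such a $z$ exists because $F_{Y|X=x}$ is continuous with limits $0$ and $1$. A zero derivative alone does not guarantee a minimum (see Question 0), so we also check the sign of $g'$. Since $F_{Y|X=x}$ is nondecreasing, so is $g'(z) = 2F_{Y|X=x}(z) - 1$: we have $g'(z) \leq 0$ wherever $F_{Y|X=x}(z) \leq 0.5$ and $g'(z) \geq 0$ wherever $F_{Y|X=x}(z) \geq 0.5$. Hence $g$ is convex, nonincreasing before any solution of $F_{Y|X=x}(z) = 0.5$ and nondecreasing after it, so every such $z$ is a global minimizer.

Thus, the minimizer $z$ is a median of $Y|X=x$.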
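
As an empirical illustration of this result, the sketch below draws samples from a skewed continuous distribution and minimizes the average absolute loss over a grid of candidate predictions. The choice of an Exponential(1) distribution (median $\ln 2$, mean $1$), the sample size, the seed, and the grid are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption for illustration: Y | X = x ~ Exponential(rate 1), a skewed law
# whose median ln(2) differs from its mean 1.
samples = rng.exponential(scale=1.0, size=50_000)

# Empirical absolute risk for each candidate prediction z on a grid.
z_grid = np.linspace(0.0, 3.0, 301)
emp_risk = np.array([np.mean(np.abs(samples - z)) for z in z_grid])
z_best = z_grid[np.argmin(emp_risk)]

sample_median = float(np.median(samples))
sample_mean = float(np.mean(samples))
risk_at_best = float(np.mean(np.abs(samples - z_best)))
risk_at_mean = float(np.mean(np.abs(samples - sample_mean)))

print(f"Grid minimizer of the empirical absolute risk: {z_best:.3f}")
print(f"Sample median: {sample_median:.3f} (theoretical median ln 2 = {np.log(2):.3f})")
print(f"Sample mean:   {sample_mean:.3f}")
print(f"Empirical absolute risk at the grid minimizer: {risk_at_best:.4f}")
print(f"Empirical absolute risk at the sample mean:    {risk_at_mean:.4f}")
```

The grid minimizer should land within one grid step of the sample median and close to $\ln 2$, with a lower empirical absolute risk than the sample mean.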