# Exercise 2: Bayes risk with absolute loss

## Question 1

We use the exponential distribution, whose mean and median differ, to try to separate the two estimators for the l1 and l2 losses.

In [5]:
import numpy as np

# number of data points
n = 1_000_000

# X is uniformly distributed between 0 and 1
X = np.random.uniform(0, 1, n)

# Y | X = x follows an exponential distribution with rate parameter 1
# (scale = 1 / rate = 1) for every x, so here Y is independent of X
Y = np.random.exponential(1, n)

# l2 Bayes predictor (fl2): the conditional mean, constant in x here,
# estimated by the sample mean
fl2 = np.mean(Y)

# l1 Bayes predictor (fl1): the conditional median, constant in x here,
# estimated by the sample median
fl1 = np.median(Y)

# estimate the l1 risk E|Y - f(X)| of fl2
Rl1_fl2 = np.mean(np.abs(Y - fl2))

# estimate the l1 risk E|Y - f(X)| of fl1
Rl1_fl1 = np.mean(np.abs(Y - fl1))

print('Rl1(fl2):', Rl1_fl2)
print('Rl1(fl1):', Rl1_fl1)


Rl1(fl2): 0.7351313528651133
Rl1(fl1): 0.6924682246745301


Thus, for this setup, we have Rl1(fl1) < Rl1(fl2): the median achieves a lower l1 risk than the mean.
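
As a sanity check on the simulated values, the two risks can also be computed by numerical integration against the Exp(1) density $e^{-y}$. The sketch below assumes `scipy.integrate.quad` is available; the helper name `l1_risk` is ours and not part of the exercise. The integral is split at the constant $c$ because $|y - c|$ has a kink there.

```python
import numpy as np
from scipy.integrate import quad


def l1_risk(c):
    """E|Y - c| for Y ~ Exp(1), by numerical integration (hypothetical helper)."""
    integrand = lambda y: abs(y - c) * np.exp(-y)
    below, _ = quad(integrand, 0, c)        # region y < c
    above, _ = quad(integrand, c, np.inf)   # region y > c
    return below + above


risk_mean = l1_risk(1.0)           # constant predictor = mean of Exp(1)
risk_median = l1_risk(np.log(2))   # constant predictor = median of Exp(1)

print('Rl1 at the mean:  ', risk_mean)
print('Rl1 at the median:', risk_median)
```

Under these assumptions the integrals should come out close to $2/e \approx 0.7358$ and $\ln 2 \approx 0.6931$, in line with the Monte Carlo estimates.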

## Question 2

To find the Bayes predictor under the L1 loss, we want to minimize the expected absolute deviation of $Y$ from a constant $z$ given $X = x$. Mathematically, we want to minimize the following expression:

$$g(z) = \int_{-\infty}^{\infty} |y - z| \cdot p_{Y|X=x}(y) \, dy$$

where $p_{Y|X=x}(y)$ is the conditional density function of $Y$ given $X = x$.

To minimize $g(z)$, we differentiate it and set the derivative equal to zero:

$$\frac{d}{dz} \left(\int_{-\infty}^{\infty} |y - z| \cdot p_{Y|X=x}(y) \, dy\right) = 0$$

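The sign of $y - z$ depends on whether $y$ lies above or below $z$, so we split the integral at $z$:

$$g(z) = \int_{z}^{\infty} (y - z) \cdot p_{Y|X=x}(y) \, dy + \int_{-\infty}^{z} (z - y) \cdot p_{Y|X=x}(y) \, dy$$

Let's consider the two regions:

1. $y > z$: In this region, $|y - z| = y - z$, and differentiating with respect to $z$ contributes $-\int_{z}^{\infty} p_{Y|X=x}(y) \, dy$.

2. $y < z$: In this region, $|y - z| = z - y$, and differentiating with respect to $z$ contributes $\int_{-\infty}^{z} p_{Y|X=x}(y) \, dy$.

In both regions the integrand vanishes at the boundary $y = z$, so the Leibniz rule produces no boundary terms. Adding the two contributions and setting the derivative equal to zero, we get:

$$-\int_{z}^{\infty} p_{Y|X=x}(y) \, dy + \int_{-\infty}^{z} p_{Y|X=x}(y) \, dy = 0$$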

Simplifying the equation gives:

$$\int_{-\infty}^{z} p_{Y|X=x}(y) \, dy = \int_{z}^{\infty} p_{Y|X=x}(y) \, dy$$

This equation essentially states that the area under the density function curve from $-\infty$ to $z$ is equal to the area from $z$ to $\infty$. Geometrically, this means that $z$ is the value that divides the conditional distribution of $Y$ into two equal halves.

Therefore, provided this stationary point is a minimum, the Bayes predictor under the L1 loss is the median of the conditional distribution of $Y$ given $X = x$:

$$f^*_{l1}(x) = \text{median}(Y|X = x)$$
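
For a concrete distribution, the equal-area condition can be solved numerically. The sketch below assumes $Y | X = x$ follows Exp(1), as in Question 1, and relies on `scipy.stats.expon` and `scipy.optimize.brentq`; the function name `balance` is illustrative only.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq


def balance(z):
    """Mass below z minus mass above z for Exp(1); zero at the median."""
    below = stats.expon.cdf(z)
    return below - (1.0 - below)


# balance(0) = -1 and balance(10) > 0, so the root is bracketed in [0, 10]
z_star = brentq(balance, 0.0, 10.0)

print('root of the balance equation:', z_star)
print('median of Exp(1), ln 2:      ', np.log(2))
```

The root should agree with $\ln 2$, the median of Exp(1), up to the solver tolerance.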


To determine whether the obtained solution is a minimum, we examine the second derivative of the function $g(z)$.

The first derivative of $g(z)$ is:

$$g'(z) = \int_{-\infty}^{z} p_{Y|X=x}(y) \, dy - \int_{z}^{\infty} p_{Y|X=x}(y) \, dy$$

Since the density integrates to one, $\int_{z}^{\infty} p_{Y|X=x}(y) \, dy = 1 - \int_{-\infty}^{z} p_{Y|X=x}(y) \, dy$. Simplifying, we have:

$$g'(z) = 2\int_{-\infty}^{z} p_{Y|X=x}(y) \, dy - 1$$

Differentiating once more (assuming the density is continuous at $z$), the fundamental theorem of calculus gives:

$$g''(z) = 2 \, p_{Y|X=x}(z)$$

Since $p_{Y|X=x}(y)$ is a probability density function, it is non-negative for all $y$. Therefore $g''(z) \geq 0$ for all $z$, which means that $g$ is convex.

A stationary point of a convex function is a global minimum. This implies that the obtained solution, which corresponds to the median, is a minimum for the function $g(z)$.

Therefore, the Bayes predictor under the L1 loss, $f^*_{l1}(x)$, is indeed the minimizer of the expected absolute deviation and is given by the median of the conditional distribution of $Y$ given $X = x$.
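
The convexity argument can be illustrated empirically. The sketch below again assumes Exp(1) samples (the seed, sample size and grid are arbitrary choices) and evaluates the empirical version of $g$ on a grid of candidate constants: the second differences should be non-negative up to rounding, and the grid minimizer should sit within one grid step of the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)         # arbitrary seed, for reproducibility
Y = rng.exponential(1.0, 50_000)       # samples standing in for Y | X = x

grid = np.linspace(0.0, 2.0, 201)      # candidate constants z, step 0.01
step = grid[1] - grid[0]

# empirical expected absolute deviation: mean |Y - z| at each grid point z
g_hat = np.array([np.mean(np.abs(Y - z)) for z in grid])

z_best = grid[np.argmin(g_hat)]
# discrete analogue of g''(z); non-negative for a convex function
second_diff = g_hat[2:] - 2 * g_hat[1:-1] + g_hat[:-2]

print('grid minimizer:', z_best)
print('sample median: ', np.median(Y))
print('smallest second difference:', second_diff.min())
```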
