# Error propagation formula

We consider a bivariate normal variable $X = (X_1, X_2)$ and a univariate random variable $Y$ obtained from $X$, $Y = X_1 \, \exp(X_2)$. Given the mean and the covariance matrix of $X$, we compute the mean and variance of $Y$ with the error propagation formula and compare the results with simulations.
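For reference, the first-order error propagation formula for this function can be written out explicitly. The symbols $\mu_i$, $\sigma_i$ and $\rho$ (means, standard deviations and correlation of $X_1$ and $X_2$) are notation assumed here for illustration. The derivatives are evaluated at the mean of $X$.

$$
\mu_Y \approx f(\mu_1, \mu_2) = \mu_1 \, e^{\mu_2}
$$

$$
\sigma_Y^2 \approx \left(\frac{\partial f}{\partial x_1}\right)^2 \sigma_1^2 + \left(\frac{\partial f}{\partial x_2}\right)^2 \sigma_2^2 + 2 \, \frac{\partial f}{\partial x_1} \frac{\partial f}{\partial x_2} \, \rho \, \sigma_1 \sigma_2,
\qquad
\frac{\partial f}{\partial x_1} = e^{\mu_2}, \quad
\frac{\partial f}{\partial x_2} = \mu_1 \, e^{\mu_2}
$$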

Mean of $X$

In [None]:
import numpy as np
x1_mean = 1
x2_mean = -1
x_mean = np.array([x1_mean, x2_mean])
x_mean

Covariance matrix of $X$

In [None]:
x1_sigma = 0.056

We test the cases of small and large errors in $X_2$ separately.

In [None]:
# Select one of the two options
# Case of small errors in X₂
x2_sigma = 0.048
# Case of large errors in X₂
#x2_sigma = 0.48

In [None]:
import danatools
correlation =  -0.85
covariance_x = danatools.covariance_matrix_2d(x1_sigma, x2_sigma, correlation)
covariance_x
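For readers without `danatools` installed, the following is a minimal sketch of how a 2×2 covariance matrix can be built from two standard deviations and a correlation coefficient. The helper name `covariance_matrix_2d_sketch` is hypothetical, and it is only an assumption that `danatools.covariance_matrix_2d` follows this standard construction.

```python
import numpy as np

def covariance_matrix_2d_sketch(sigma1, sigma2, rho):
    """Hypothetical stand-in, not the danatools implementation."""
    cov12 = rho * sigma1 * sigma2  # off-diagonal term: rho * sigma1 * sigma2
    return np.array([[sigma1**2, cov12],
                     [cov12, sigma2**2]])

# Values taken from the small-error case of this notebook
cov = covariance_matrix_2d_sketch(0.056, 0.048, -0.85)
print(cov)
```

The diagonal holds the variances and the off-diagonal entries hold the covariance, so the matrix is symmetric and, for $|\rho| < 1$, positive definite.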

## Analytical variance of the variable $Y$

In [None]:
def function(x1, x2):
    return x1 * np.exp(x2)

Mean of $Y$

In [None]:
y_mean_ana = function(x1_mean, x2_mean)
y_mean_ana

Variance of $Y$

In [None]:
import math
x1_gradient = math.exp(x2_mean)
x1_gradient

In [None]:
x2_gradient = x1_mean * math.exp(x2_mean)
x2_gradient

In [None]:
# First-order propagation: two variance terms plus the covariance cross term
y_variance_ana = (
    x1_gradient**2 * covariance_x[0, 0]
    + x2_gradient**2 * covariance_x[1, 1]
    + 2 * x1_gradient * x2_gradient * covariance_x[0, 1]
)
y_variance_ana
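The same variance can be written compactly as the quadratic form $\nabla f^T \, V \, \nabla f$, which generalizes to any number of input variables. The sketch below is self-contained and rebuilds the inputs of the small-error case. The names `grad`, `cov`, `var_matrix` and `var_expanded` are chosen here for illustration and do not appear elsewhere in the notebook.

```python
import numpy as np

mu = np.array([1.0, -1.0])
sigma1, sigma2, rho = 0.056, 0.048, -0.85
cov12 = rho * sigma1 * sigma2
cov = np.array([[sigma1**2, cov12],
                [cov12, sigma2**2]])

# Gradient of f(x1, x2) = x1 * exp(x2), evaluated at the mean
grad = np.array([np.exp(mu[1]), mu[0] * np.exp(mu[1])])

# Quadratic form: grad^T V grad
var_matrix = grad @ cov @ grad

# Expanded form with the explicit cross term, for comparison
var_expanded = (grad[0]**2 * cov[0, 0]
                + grad[1]**2 * cov[1, 1]
                + 2 * grad[0] * grad[1] * cov[0, 1])

print(var_matrix, var_expanded)
```

Both expressions agree up to floating-point rounding. The matrix form avoids writing the cross terms by hand when there are more than two inputs.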

In [None]:
y_sigma_ana = math.sqrt(y_variance_ana)
y_sigma_ana

## Variance from Monte Carlo simulation

In [None]:
x_mean

In [None]:
from scipy.stats import multivariate_normal
population = 1000000
rng = np.random.default_rng(seed=6870)
x_data = multivariate_normal.rvs(x_mean, covariance_x, size=population, random_state=rng)
x_data

In [None]:
y_data = function(x_data[:,0], x_data[:,1])
y_data

In [None]:
y_mean_sim = y_data.mean()
y_mean_sim

In [None]:
y_sigma_sim = y_data.std(ddof=1)
y_sigma_sim

## Comparison of the analytical formula with simulations

In [None]:
delta_mean = y_mean_ana / y_mean_sim - 1
print(f"Relative difference of the mean: {delta_mean*100:.2f}%")

In [None]:
delta_sigma = y_sigma_ana / y_sigma_sim - 1
print(f"Relative difference of the standard deviation: {delta_sigma*100:.2f}%")
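To judge whether these relative differences are significant, it helps to know the statistical precision of the simulated values themselves. The sketch below is self-contained and, as an assumption to keep it quick, uses a smaller sample than the notebook and NumPy's own sampler. It estimates the standard error of the simulated mean, and that of the standard deviation using the Gaussian approximation $\sigma / \sqrt{2(n-1)}$.

```python
import numpy as np

rng = np.random.default_rng(seed=6870)
mu = np.array([1.0, -1.0])
sigma1, sigma2, rho = 0.056, 0.048, -0.85
cov12 = rho * sigma1 * sigma2
cov = np.array([[sigma1**2, cov12],
                [cov12, sigma2**2]])

n = 100_000  # smaller than the notebook's sample, for speed
x = rng.multivariate_normal(mu, cov, size=n)
y = x[:, 0] * np.exp(x[:, 1])

y_sigma = y.std(ddof=1)
# Standard error of the sample mean
mean_stderr = y_sigma / np.sqrt(n)
# Approximate standard error of the sample standard deviation
# (exact only if y were Gaussian)
sigma_stderr = y_sigma / np.sqrt(2 * (n - 1))

print(f"Relative statistical error of the mean: {mean_stderr / y.mean() * 100:.3f}%")
print(f"Relative statistical error of the standard deviation: {sigma_stderr / y_sigma * 100:.3f}%")
```

A difference between the analytical and simulated values that is much larger than these statistical errors points to a breakdown of the linear approximation rather than to sampling noise.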

## Plot of the joint PDF of $X$

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm
fig, ax = plt.subplots()
ax.set_xlabel("$X_1$")
ax.set_ylabel("$X_2$")

# Plot range: ±3σ around the mean on each axis
x1_limits = [x1_mean-3*x1_sigma, x1_mean+3*x1_sigma]
x2_limits = [x2_mean-3*x2_sigma, x2_mean+3*x2_sigma]
counts, xedges, yedges, im = ax.hist2d(x_data[:,0], x_data[:,1], range=[x1_limits, x2_limits], bins=100, density=True, cmap=cm.viridis)
clb = plt.colorbar(im)

ax.plot(*danatools.get_ellipse(x_mean, covariance_x, nsigma=1), color='tab:red', ls='--')
ax.plot(*danatools.get_ellipse(x_mean, covariance_x, nsigma=2), color='tab:red', ls='--')

## Plot of the PDF of $Y$

In [None]:
fig, ax = plt.subplots()
ax.set_xlabel("$Y$")
ax.set_ylabel("Probability density")

limits = [y_mean_ana-3*y_sigma_ana, y_mean_ana+3*y_sigma_ana] 
ax.hist(y_data, range=limits, bins=100, density=True, label="Simulation")

from scipy.stats import norm
x = np.linspace(*limits, 100)
y = norm.pdf(x, loc=y_mean_ana, scale=y_sigma_ana)
plt.plot(x, y, label="Analytical")

ax.legend()

## Validity of the approximation

We compare the linear expansion with $f(X_1, X_2)$ in the interval $X_2 \in [\mu_2 - \sigma_2, \mu_2 + \sigma_2]$. We do not consider $X_1$ because $f(X_1, X_2)$ is already linear in that variable.

In [None]:
fig, ax = plt.subplots()
ax.set_xlabel("$X_2$")
ax.set_ylabel("$Y$")

x2_min = -2
x2_max = 0
x2 = np.linspace(x2_min, x2_max)
fx2 = function(x1_mean, x2)
plt.plot(x2, fx2, label="$f(x_2)$")

fapprox = y_mean_ana + x2_gradient * (x2 - x2_mean)
plt.plot(x2, fapprox, label="Approx")

# Raw string so that \m is not treated as an (invalid) escape sequence
ax.axvline(x2_mean, ls='--', label=r"$\mu_2$")
ax.axvspan(x2_mean-x2_sigma, x2_mean+x2_sigma, alpha=0.25, label="1σ")

ax.legend()

The error propagation formula is valid when the linear approximation of the function $Y = f(X)$ is accurate in a neighborhood of $X = \mu_X$. In practice, this neighborhood can be taken as an interval of a few σ, since it contains most of the probability of $X$.
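As a rough numerical check of this criterion, the sketch below measures the largest relative deviation of the linear approximation from $f$ within $\pm n\sigma$ of $\mu_2$, for both values of $\sigma_2$ used in this notebook. The helper names `linear_approx` and `max_relative_error` are introduced here for illustration only, and $X_1$ is held at its mean as in the plot above.

```python
import numpy as np

x1_mean, x2_mean = 1.0, -1.0

def f(x1, x2):
    return x1 * np.exp(x2)

def linear_approx(x2):
    # First-order Taylor expansion in x2 around x2_mean, with x1 fixed at its mean.
    # For this function the derivative with respect to x2 at the mean equals f itself.
    y0 = f(x1_mean, x2_mean)
    return y0 + y0 * (x2 - x2_mean)

def max_relative_error(x2_sigma, nsigma=1):
    # Largest |approx / exact - 1| on a grid covering mu_2 ± nsigma * sigma_2
    x2 = np.linspace(x2_mean - nsigma * x2_sigma, x2_mean + nsigma * x2_sigma, 201)
    return np.max(np.abs(linear_approx(x2) / f(x1_mean, x2) - 1))

for x2_sigma in (0.048, 0.48):
    err = max_relative_error(x2_sigma)
    print(f"sigma_2 = {x2_sigma}: max relative error within 1 sigma = {err * 100:.2f}%")
```

A small deviation over the interval that holds most of the probability suggests the propagation formula can be trusted; a large one suggests relying on the simulation instead.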