# `cumulative.pyx`

This notebook tests the `cumulative.pyx` module.

This module provides two Python functions, `fast_cumulative` and `fast_log_cumulative`, which return the cumulative of a function and the logarithm of that cumulative, respectively. Since the module is meant to be used on probability distributions, the input function is assumed to be normalised, and both functions therefore set $\mathrm{cdf}(x_{max}) = 1$.

## `fast_cumulative`

We will compare the outcome of this method with the cdf of a distribution whose cdf has an analytical form:\
$
f(x) = 2x,\\F(x) = \int_0^x 2y dy = x^2,
$
for $x \in [0,1]$. 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from figaro.cumulative import fast_cumulative

def cdf_2x(x):
    return x**2

def pdf(x):
    return 2*x

x   = np.linspace(0,1,10000)
dx  = x[1]-x[0]
p   = pdf(x)
cdf = cdf_2x(x)
# The pdf is multiplied by dx, so the input is the probability in each bin
cdf_figaro = fast_cumulative(p*dx)

fig, (ax,res) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [3, 1]}, sharex = True, figsize = (10,5))

ax.plot(x, cdf, ls = '--', lw = 0.8, label = "$x^2$")
ax.plot(x, cdf_figaro, ls = '-.', lw = 0.8, label = r"$\mathrm{FIGARO}$")  # raw string: "\m" is not a valid escape
ax.set_ylabel('$cdf(x)$')
ax.grid(True,dashes=(1,3))
ax.legend(loc = 0, frameon= False)
res.plot(x, cdf_figaro-cdf, ls = '--', color = 'k', lw = 0.3)
res.set_ylabel('$cdf_{F} - cdf$')
res.set_xlabel('$x$')
res.grid(True,dashes=(1,3))

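# Exact equality is not expected here because of the integration error.
# np.alltrue was removed in NumPy 2.0; np.all is the equivalent call.
np.all(cdf==cdf_figaro)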

The difference between the analytical cdf and the one obtained with our module is due to numerical integration error.
In general, reading each term of the cumulative sum as a right-endpoint rectangle over $[x_i - \Delta x, x_i]$, the error accumulated after $n$ steps is\
$
e(n) = \sum_{i=1}^n \Big[f(x_i)\Delta x - \int_{x_i-\Delta x}^{x_i} f(y)dy\Big] = \sum_{i=1}^n \Big[f(x_i)\Delta x - \Big(F(x_i) - F(x_i - \Delta x)\Big)\Big]\,,
$\
where $F(x) = \int f(x)dx$. Expanding $F(x_i - \Delta x)$ around $x_i$ we get:\
$
e(n) = \sum_{i=1}^n \Big[f(x_i) \Delta x - F(x_i) + F(x_i) - f(x_i)\Delta x + \sum_{j=2}^{\infty}\frac{f^{(j-1)}(x_i)}{j!}(-\Delta x)^j\Big] = \sum_{i=1}^n \sum_{j=2}^{\infty}\frac{f^{(j-1)}(x_i)}{j!}(-\Delta x)^j
$\
For polynomial pdfs the Taylor series contains a finite number of terms, so we can correct for the integration error exactly. For $f(x) = 2x$ only the $j=2$ term survives:\
$
e(n) = \sum_{i=1}^n \Big(2x_i\Delta x - \int_{x_i-\Delta x}^{x_i} 2ydy\Big) = \sum_{i=1}^n\Delta x^2 = n\Delta x^2\,.
$\
Accounting for this correction:

In [None]:
# Accumulated error n*dx^2, with n = 0 at the first grid point
error = dx**2*np.arange(len(cdf))

plt.plot(x, cdf_figaro-cdf-error, ls = '--', color = 'k', lw = 0.3)
plt.ylabel('$cdf_{F} - cdf$')
plt.xlabel('$x$')
plt.grid(True,dashes=(1,3))

np.allclose(cdf+error, cdf_figaro, atol = 2e-15, rtol = 0)
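
The $n\Delta x^2$ term can also be checked without the compiled module. The following is a minimal sketch that assumes `np.cumsum` behaves like an inclusive running sum; it is a stand-in chosen for illustration and not necessarily how `fast_cumulative` is implemented.

```python
import numpy as np

# Same grid and pdf as in the cell above
x = np.linspace(0, 1, 10000)
dx = x[1] - x[0]

# Stand-in for the module: inclusive running sum of f(x_i)*dx with f(x) = 2x
running_sum = np.cumsum(2 * x * dx)

# Predicted accumulated error: n*dx^2, with n = 0 at the first grid point
expected_error = dx**2 * np.arange(len(x))

# What is left after removing the analytical cdf and the predicted error
residual = running_sum - x**2 - expected_error
max_residual = np.max(np.abs(residual))
print(max_residual)
```

If the stand-in is a fair model of the module, the printed residual should sit at the level of floating-point round-off.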

We repeat the same exercise with a standard Gaussian distribution, using the cdf provided by `scipy.stats.norm` as reference:

In [None]:
from scipy.stats import norm
x   = np.linspace(-10,10,10000)
dx  = x[1]-x[0]
p   = norm().pdf(x)
cdf = norm().cdf(x)
cdf_figaro = fast_cumulative(p*dx)

fig, (ax,res) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [3, 1]}, sharex = True, figsize = (10,5))

ax.plot(x, cdf, ls = '--', lw = 0.8, label = r"$\mathrm{scipy}$")
ax.plot(x, cdf_figaro, ls = '-.', lw = 0.8, label = r"$\mathrm{FIGARO}$")
ax.set_ylabel('$cdf(x)$')
ax.grid(True,dashes=(1,3))
ax.legend(loc = 0, frameon= False)
res.plot(x, cdf_figaro-cdf, ls = '--', color = 'k', lw = 0.3)
res.set_ylabel('$cdf_{F} - cdf$')
res.set_xlabel('$x$')
res.grid(True,dashes=(1,3))

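Here too the two curves differ because of the numerical integration error.\
For the standard Gaussian $f'(x) = -x f(x)$, so the leading term of the error is $\sum_i \frac{f'(x_i)}{2}\Delta x^2 = -\frac{\Delta x^2}{2}\sum_i x_i f(x_i)$. Applying this first-order correction: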

In [None]:
# Running sum of f'(x_i)/2 = -x_i*f(x_i)/2; the factor dx**2 is applied below
cdf_correction = fast_cumulative(-p*x/2.)

plt.plot(x, cdf_figaro-cdf-cdf_correction*dx**2, ls = '--', color = 'k', lw = 0.3)
plt.ylabel('$cdf_{F} - cdf$')
plt.xlabel('$x$')
plt.grid(True,dashes=(1,3))
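
To get a feeling for how much the first-order term removes, the comparison can be reproduced with NumPy alone. This is a sketch under two assumptions: `np.cumsum` is used as a stand-in for an inclusive running sum (not necessarily what `fast_cumulative` does internally), and the reference cdf is built from `math.erf` instead of `scipy`.

```python
import math
import numpy as np

x = np.linspace(-10, 10, 10000)
dx = x[1] - x[0]

# Standard normal pdf and cdf written out explicitly
pdf = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
cdf = 0.5 * (1 + np.array([math.erf(v / math.sqrt(2)) for v in x]))

# Stand-in for the module: inclusive running sum of f(x_i)*dx
running_sum = np.cumsum(pdf * dx)

# First-order correction: running sum of f'(x_i)/2 * dx^2, with f'(x) = -x f(x)
correction = np.cumsum(-x * pdf / 2) * dx**2

raw = np.max(np.abs(running_sum - cdf))
corrected = np.max(np.abs(running_sum - cdf - correction))
print(raw, corrected)
```

The residual that survives the correction comes from the higher-order terms of the expansion, which scale with higher powers of $\Delta x$.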

## `fast_log_cumulative`

`fast_log_cumulative` works like `fast_cumulative`, but takes the logarithm of a function as input and returns the logarithm of its cumulative.\
We start again with $f(x) = 2x$:

In [None]:
from figaro.cumulative import fast_log_cumulative

def logcdf_2x(x):
    return 2*np.log(x)

def logpdf(x):
    return np.log(2)+np.log(x)

x   = np.linspace(0,1,10000)[1:]  # drop x = 0, where log(x) diverges
dx  = x[1]-x[0]
p   = logpdf(x)
cdf = logcdf_2x(x)
cdf_figaro = fast_log_cumulative(p+np.log(dx))

fig, (ax,res) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [3, 1]}, sharex = True, figsize = (10,5))

ax.plot(x, cdf, ls = '--', lw = 0.8, label = "$x^2$")
ax.plot(x, cdf_figaro, ls = '-.', lw = 0.8, label = r"$\mathrm{FIGARO}$")
ax.set_ylabel(r'$\log(cdf(x))$')
ax.grid(True,dashes=(1,3))
ax.legend(loc = 0, frameon= False)
res.plot(x, np.exp(cdf_figaro) - np.exp(cdf), ls = '--', color = 'k', lw = 0.3)
res.set_ylabel('$cdf_{F} - cdf$')
res.set_xlabel('$x$')
res.grid(True,dashes=(1,3))

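# Exact equality is not expected here because of the integration error.
# np.alltrue was removed in NumPy 2.0; np.all is the equivalent call.
np.all(cdf==cdf_figaro)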
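
Working in log space means the running sum has to be accumulated as a log-sum-exp. A minimal NumPy sketch of that idea, assuming `np.logaddexp.accumulate` as a stand-in (chosen for illustration, not necessarily how `fast_log_cumulative` is implemented), and checking it against the linear running sum:

```python
import numpy as np

# Same grid as above; x = 0 is dropped because log(2x) diverges there
x = np.linspace(0, 1, 10000)[1:]
dx = x[1] - x[0]

# Log of the probability in each bin: log f(x_i) + log dx
log_terms = np.log(2 * x) + np.log(dx)

# Running log-sum-exp: log(sum_{k<=i} exp(log_terms[k]))
log_running_sum = np.logaddexp.accumulate(log_terms)

# Reference: the same running sum computed in linear space
linear_running_sum = np.cumsum(2 * x * dx)

max_diff = np.max(np.abs(np.exp(log_running_sum) - linear_running_sum))
print(max_diff)
```

Accumulating with `logaddexp` never exponentiates the individual terms, which is the reason for working with logarithms when the pdf values are very small.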

Correcting for the integration error:

In [None]:
# The grid now starts at x = dx, so the step count starts from 1
error = dx**2*np.arange(1,len(cdf)+1)

plt.plot(x, np.exp(cdf_figaro)-np.exp(cdf)-error, ls = '--', color = 'k', lw = 0.3)
plt.ylabel('$cdf_{F} - cdf$')
plt.xlabel('$x$')
plt.grid(True,dashes=(1,3))

The same comparison for the standard Gaussian distribution:

In [None]:
from figaro.cumulative import fast_log_cumulative

x   = np.linspace(-10,10,10000)
dx  = x[1]-x[0]
p   = norm().logpdf(x)
cdf = norm().logcdf(x)

cdf_figaro = fast_log_cumulative(p+np.log(dx))

fig, (ax,res) = plt.subplots(2, 1, gridspec_kw={'height_ratios': [3, 1]}, sharex = True, figsize = (10,5))

ax.plot(x, cdf, ls = '--', lw = 0.8, label = r"$\mathrm{scipy}$")
ax.plot(x, cdf_figaro, ls = '-.', lw = 0.8, label = r"$\mathrm{FIGARO}$")
ax.set_ylabel(r'$\log(cdf(x))$')
ax.grid(True,dashes=(1,3))
ax.legend(loc = 0, frameon = False)
res.plot(x, np.exp(cdf_figaro) - np.exp(cdf), ls = '--', color = 'k', lw = 0.3)
res.set_ylabel('$cdf_{F} - cdf$')
res.set_xlabel('$x$')
res.grid(True,dashes=(1,3))