<h1>Empirical estimation of $f$-divergences<span class="tocSkip"></span></h1>

Author: [Sylvain Combettes](https://github.com/sylvaincom).

Last update: Jan 29, 2020.

---
This notebook deals with the empirical estimation of $f$-divergences and complements my report on the _Comparison of empirical probability distributions_. Three $f$-divergences are covered: the Kullback-Leibler divergence, the Hellinger distance, and the variational distance. Whereas IPMs (integral probability metrics) work on samples drawn from the probability distributions, $f$-divergences work on the probability distributions themselves.
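
For reference, here is a sketch of the usual definition of an $f$-divergence between two discrete distributions $p$ and $q$, assuming $q_i > 0$ for all $i$. The symbols $p_i$, $q_i$ and $D_f$ are notation introduced here and may differ from the report:

$$D_f(p \,\|\, q) = \sum_i q_i \, f\!\left(\frac{p_i}{q_i}\right), \qquad f \text{ convex}, \quad f(1) = 0.$$

With this convention, $f(t) = t \log t$ gives the Kullback-Leibler divergence $\sum_i p_i \log(p_i / q_i)$, $f(t) = (\sqrt{t} - 1)^2$ gives $\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$, and $f(t) = |t - 1|$ gives $\sum_i |p_i - q_i|$. These are the sums computed by the three functions of this notebook; other references include an extra factor $\frac{1}{2}$ in the last two.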

<br/>

<div class="alert alert-info"><h4>README<span class="tocSkip"></span></h4><p>
The best way to open this Jupyter Notebook is to use the table of contents from the extensions called <code>nbextensions</code>. See <a href="https://towardsdatascience.com/4-awesome-tips-for-enhancing-jupyter-notebooks-4d8905f926c5">4 Awesome Tips for Enhancing Jupyter Notebooks</a> by George Seif.
    
The Python version is 3.7.3.
</p></div>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#KL-divergence" data-toc-modified-id="KL-divergence-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>KL divergence</a></span><ul class="toc-item"><li><span><a href="#Defining-our-generic-function" data-toc-modified-id="Defining-our-generic-function-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Defining our generic function</a></span></li><li><span><a href="#Running-several-simulations-to-interpret-$D_{\text{KL}}$" data-toc-modified-id="Running-several-simulations-to-interpret-$D_{\text{KL}}$-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Running several simulations to interpret $D_{\text{KL}}$</a></span><ul class="toc-item"><li><span><a href="#Comparing-two-normal-distributions" data-toc-modified-id="Comparing-two-normal-distributions-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Comparing two normal distributions</a></span></li></ul></li></ul></li><li><span><a href="#Hellinger-distance" data-toc-modified-id="Hellinger-distance-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Hellinger distance</a></span></li><li><span><a href="#Variational-distance" data-toc-modified-id="Variational-distance-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Variational distance</a></span></li></ul></div>

<h2> Imports<span class="tocSkip"></span></h2>

In [1]:
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()

We configure the size of the plots:

In [2]:
plt.rcParams["figure.figsize"] = (8,6)

# KL divergence

This section is inspired by [KL Divergence Python Example](https://towardsdatascience.com/kl-divergence-python-example-b87069e4b810).

## Defining our generic function

We define our `kl_divergence` function using functions from `numpy`. Because $\lim\limits_{x \rightarrow 0} \log(x) = -\infty$, the entries where $p = 0$ are handled separately and their contribution is set to $0$.
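
Written out for discrete distributions, and assuming the standard convention for zero entries of $p$ (the index notation $p_i$, $q_i$ is introduced here), the quantity being computed is:

$$D_{\text{KL}}(p \,\|\, q) = \sum_{i} p_i \log\frac{p_i}{q_i}, \qquad \text{with } 0 \cdot \log\frac{0}{q_i} = 0 \text{ because } \lim_{x \rightarrow 0^+} x \log(x) = 0.$$

The sum is finite only if $q_i > 0$ wherever $p_i > 0$.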

In [3]:
def kl_divergence(p, q):
    """
    Kullback-Leibler divergence of two (empirical) probability distributions.
    
    Parameters
    ----------
    p : numpy.ndarray
        Vector of the values of the first (discrete) probability distribution.
    q : numpy.ndarray
        Vector of the values of the second (discrete) probability distribution.
    
    Returns
    -------
    res : numpy.float64
        Result of the computation of the Kullback-Leibler divergence of p from q.
    """
    
    res = np.sum(np.where(p!=0, p*np.log(p/q), 0))
    
    return res
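
One caveat: `np.where` evaluates both of its branches, so `np.log(p/q)` is still computed where `p` is zero and NumPy emits `RuntimeWarning`s there, even though the final result is correct. A possible variant that avoids the warnings with a boolean mask is sketched below. The name `kl_divergence_masked` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np

def kl_divergence_masked(p, q):
    """KL divergence of p from q, skipping the entries where p is zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # entries with p == 0 contribute 0 by convention
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Small made-up example with a zero entry in p
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.25, 0.25, 0.5])
print(kl_divergence_masked(p, q))  # log(2), about 0.693
```

The masked version never takes the logarithm of zero, so it stays silent while returning the same value as the `np.where` version whenever `q` is positive on the support of `p`.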

## Running several simulations to interpret $D_{\text{KL}}$

### Comparing two normal distributions

Now, we are going to plot two normal distributions using [SciPy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) and compute their KL divergence for several values of the mean and the standard deviation. The goal is to check whether the behavior of the KL divergence matches our intuition.

We define a plotting function:

In [None]:
def plot_normal_divergence(m_p, sd_p, m_q, sd_q, f_divergence):
    """
    Plotting two normal distributions and computing their f-divergence.
    
    Parameters
    ----------
    m_p : float
        Mean of the normal distribution p.
    sd_p : float
        Standard deviation of the normal distribution p.
    m_q : float
        Mean of the normal distribution q.
    sd_q : float
        Standard deviation of the normal distribution q.
    f_divergence : {kl_divergence, hellinger_distance, variational_distance}
        Function that computes the f-divergence we choose.
        
    Plots
    -------
    Plots (on the same figure) the two normal distributions and their f-divergence in the title.
    """
    
    x = np.arange(-10, 10, 0.001) # x-axis of our plot
    p = norm.pdf(x, m_p, sd_p) # first normal distribution of mean m_p and standard deviation sd_p
    q = norm.pdf(x, m_q, sd_q) # second normal distribution of mean m_q and standard deviation sd_q
    
    plt.title('The %s of $p$ from $q$ is %1.3f \n (with $p$ and $q$ normal distributions)'
              % (f_divergence.__name__, f_divergence(p, q)))
    plt.plot(x, p)
    plt.plot(x, q, c='red')
    txt1 = '$m_p = %1.1f$ and $sd_p = %1.0f$' % (m_p, sd_p)
    txt2 = '$m_q = %1.1f$ and $sd_q = %1.0f$' % (m_q, sd_q)
    plt.legend([txt1, txt2])
    plt.show()

We call our plotting function:

In [None]:
plot_normal_divergence(0, 2, 1, 2, kl_divergence)
plot_normal_divergence(0, 2, 2, 2, kl_divergence)
plot_normal_divergence(0, 2, 4, 2, kl_divergence)
plot_normal_divergence(0, 2, 0, 1, kl_divergence)
plot_normal_divergence(0, 2, 0, 3, kl_divergence)
plot_normal_divergence(0, 2, 0, 4, kl_divergence)
plot_normal_divergence(0, 2, 4, 1, kl_divergence)
plot_normal_divergence(0, 2, 4, 2, kl_divergence)
plot_normal_divergence(0, 2, 4, 4, kl_divergence)
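
As a sanity check, the KL divergence between two normal distributions has a well-known closed form, which can be compared with a grid approximation. The sketch below assumes the same grid as the plotting function (step `0.001` on `[-10, 10)`); the helper name `kl_normal_closed_form` and the variable `dx` are hypothetical. Note that, as far as I can tell, the plotting function passes raw density values to `f_divergence` without multiplying by the grid step, so the value in the plot titles should be roughly the integral divided by `0.001`.

```python
import numpy as np
from scipy.stats import norm

def kl_normal_closed_form(m_p, sd_p, m_q, sd_q):
    """Closed-form KL divergence of N(m_p, sd_p^2) from N(m_q, sd_q^2)."""
    return (np.log(sd_q / sd_p)
            + (sd_p**2 + (m_p - m_q)**2) / (2 * sd_q**2)
            - 0.5)

dx = 0.001                   # grid step, same as in the plotting function
x = np.arange(-10, 10, dx)
p = norm.pdf(x, 0, 2)
q = norm.pdf(x, 1, 2)

# Riemann sum approximating the integral of p * log(p / q)
kl_grid = np.sum(p * np.log(p / q)) * dx
kl_exact = kl_normal_closed_form(0, 2, 1, 2)

print(kl_grid, kl_exact)  # both close to 0.125
```

The agreement is good here because both densities have almost all of their mass inside `[-10, 10]`; for wider or more shifted distributions the truncation of the grid would matter.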

It’s important to note that the KL divergence is not symmetrical. In other words, if we switch `p` for `q` and vice versa, we get a different result.

In [None]:
plot_normal_divergence(4, 4, 0, 2, kl_divergence) # p and q swapped relative to the call (0, 2, 4, 4)
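
The asymmetry can also be seen on a tiny discrete example, independent of the plots. This is a minimal sketch with made-up two-point distributions; the helper `kl` is hypothetical and assumes strictly positive entries.

```python
import numpy as np

def kl(p, q):
    """KL divergence of p from q for strictly positive discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl(p, q))  # about 0.368
print(kl(q, p))  # about 0.511
```

Swapping the arguments changes which distribution weights the log-ratio, which is why the two values differ.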

The lower the KL divergence, the closer the two distributions are to one another.

# Hellinger distance

In [None]:
def hellinger_distance(p, q):
    """Sum of the squared differences between the square roots of p and q."""
    return np.sum((np.sqrt(p)-np.sqrt(q))**2)
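
Conventions for the Hellinger distance vary. Many references take the square root of the sum above and divide by $\sqrt{2}$, so that the result lies in $[0, 1]$ for normalized distributions. A minimal sketch of that alternative convention follows; the name `hellinger_normalized` is hypothetical and this is not the function used in the plots.

```python
import numpy as np

def hellinger_normalized(p, q):
    """Hellinger distance with the 1/sqrt(2) convention (values in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

# Made-up discrete distributions that sum to 1
same = np.array([0.5, 0.5, 0.0])
other = np.array([0.0, 0.0, 1.0])

print(hellinger_normalized(same, same))   # 0.0
print(hellinger_normalized(same, other))  # close to 1.0 (disjoint supports)
```

Unlike the KL divergence, this quantity is symmetric in `p` and `q` and remains finite when the supports differ.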

In [None]:
plot_normal_divergence(0, 2, 1, 2, hellinger_distance)
plot_normal_divergence(0, 2, 2, 2, hellinger_distance)
plot_normal_divergence(0, 2, 4, 2, hellinger_distance)
plot_normal_divergence(0, 2, 0, 1, hellinger_distance)
plot_normal_divergence(0, 2, 0, 3, hellinger_distance)
plot_normal_divergence(0, 2, 0, 4, hellinger_distance)
plot_normal_divergence(0, 2, 4, 1, hellinger_distance)
plot_normal_divergence(0, 2, 4, 2, hellinger_distance)
plot_normal_divergence(0, 2, 4, 4, hellinger_distance)

# Variational distance

In [None]:
def variational_distance(p, q):
    """Sum of the absolute differences between p and q (L1 distance)."""
    return np.sum(np.abs(p-q))
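
For normalized discrete distributions, the sum above lies in $[0, 2]$. Half of it is commonly called the total variation distance and equals the largest difference in probability that the two distributions assign to any event. The sketch below illustrates this on made-up vectors; the name `total_variation` is hypothetical.

```python
import numpy as np

def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.sum(np.abs(p - q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

print(np.sum(np.abs(p - q)))  # L1 distance, about 0.6
print(total_variation(p, q))  # about 0.3
```

Here the value 0.3 is reached by the event containing only the first outcome, where the two distributions assign probabilities 0.5 and 0.2.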

In [None]:
plot_normal_divergence(0, 2, 1, 2, variational_distance)
plot_normal_divergence(0, 2, 2, 2, variational_distance)
plot_normal_divergence(0, 2, 4, 2, variational_distance)
plot_normal_divergence(0, 2, 0, 1, variational_distance)
plot_normal_divergence(0, 2, 0, 3, variational_distance)
plot_normal_divergence(0, 2, 0, 4, variational_distance)
plot_normal_divergence(0, 2, 4, 1, variational_distance)
plot_normal_divergence(0, 2, 4, 2, variational_distance)
plot_normal_divergence(0, 2, 4, 4, variational_distance)