# Mutual Information (MI)

Saleh Rezaeiravesh, saleh.rezaeiravesh@manchester.ac.uk
___

## Mutual Information (MI)

Following [Shannon 1948](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf), MI quantifies how much the information content of two systems/variables overlaps.  

### MI for Random Variables (RVs)
Consider two continuous **random variables** $X$ and $Y$, for which we have $n$ jointly observed samples $(x_i,y_i)$, $i=1,2,\cdots,n$. If the joint PDF of $X$ and $Y$ is $p(x,y)$, with marginal PDFs $p_x(x)$ and $p_y(y)$, then the mutual information between $X$ and $Y$ is, 

$$
I(X,Y) =\int\int p(x,y)\ln( \frac{p(x,y)}{p_x(x)p_y(y)}) dx dy
$$
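
As a quick sanity check of the definition above, the double integral can be evaluated numerically for a case with a known answer. The sketch below assumes a standard bivariate Gaussian with correlation $\rho$, for which the MI is $-\frac{1}{2}\ln(1-\rho^2)$ nats. The function names, grid size, and integration limits are illustrative choices, not part of the original notes.

```python
import numpy as np

def gaussian_mi_exact(rho):
    """Closed-form MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)

def gaussian_mi_numeric(rho, half_width=8.0, num=801):
    """Approximate the MI double integral by a Riemann sum on a uniform grid."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    X, Y = np.meshgrid(x, x, indexing="ij")
    # Log of the joint PDF p(x, y) for zero means and unit variances
    log_norm = -np.log(2.0 * np.pi * np.sqrt(1.0 - rho**2))
    log_pxy = log_norm - (X**2 - 2.0 * rho * X * Y + Y**2) / (2.0 * (1.0 - rho**2))
    # Log of the product of the two standard normal marginals
    log_pxpy = -np.log(2.0 * np.pi) - 0.5 * (X**2 + Y**2)
    # Working with logs avoids evaluating log(0/0) in the far tails
    integrand = np.exp(log_pxy) * (log_pxy - log_pxpy)
    return float(np.sum(integrand) * dx * dx)

rho = 0.6
print(gaussian_mi_exact(rho), gaussian_mi_numeric(rho))
```

The two printed values should agree closely, since the integrand is smooth and decays quickly inside the chosen grid.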

### MI for Time Series (TS)
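* For RVs $X$ and $Y$, the mutual information is a symmetric quantity, $I(X,Y)=I(Y,X)$. In contrast, for time series $x(t)$ and $y(t)$, the lagged MI can be **asymmetric** in the lag: in general, the MI between $x_k$ and $y_{k-\tau}$ differs from the MI between $x_k$ and $y_{k+\tau}$. 

* MI measures **nonlinear** dependency between time series or RVs.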

* According to [Schreiber, 2000](https://arxiv.org/pdf/nlin/0001042), the MI at lag $\tau$ for two time series $x_t$ and $y_t$ is defined as, 

$$
M_{xy}(\tau) = \sum_{k} p(x_k,y_{k-\tau}) \log\frac{p(x_k,y_{k-\tau})}{p(x_k)\,p(y_{k-\tau})}
$$


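**Note:** For a fixed lag $\tau$, mutual information (MI) is still symmetric in its two arguments: the MI between $x_k$ and $y_{k-\tau}$ equals the MI between $y_{k-\tau}$ and $x_k$, i.e. $M_{xy}(\tau)=M_{yx}(-\tau)$. The asymmetry is with respect to the sign of the lag. 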
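
The lag dependence can be illustrated with a minimal sketch that pairs $x_k$ with $y_{k-\tau}$ and applies a simple histogram (plug-in) MI estimate. The helper names `binned_mi` and `lagged_mi`, the number of bins, and the synthetic signals are assumptions made for illustration only.

```python
import numpy as np

def binned_mi(a, b, bins=16):
    """Plug-in MI estimate (nats) from a 2D histogram of paired samples."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()             # joint probabilities
    p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    mask = p_ab > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def lagged_mi(x, y, tau, bins=16):
    """MI between x_k and y_{k-tau}; tau may be positive, zero, or negative."""
    x, y = np.asarray(x), np.asarray(y)
    if tau > 0:
        xs, ys = x[tau:], y[:-tau]
    elif tau < 0:
        xs, ys = x[:tau], y[-tau:]
    else:
        xs, ys = x, y
    return binned_mi(xs, ys, bins)

# Synthetic example: x follows y with a delay of 3 steps, plus noise
rng = np.random.default_rng(0)
y = rng.standard_normal(5000)
x = np.roll(y, 3) + 0.1 * rng.standard_normal(5000)
print(lagged_mi(x, y, 3), lagged_mi(x, y, -3))
```

In this example the estimate at $\tau=3$ should be much larger than at $\tau=-3$, while `lagged_mi(x, y, 3)` and `lagged_mi(y, x, -3)` use the same sample pairs and therefore coincide.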

## Methods for Estimating MI

### KDE/Binning methods
Our aim is to estimate the MI, $\hat{I}(X,Y)$, from a finite set of observed samples $z_i=(x_i,y_i)$, $i=1,2,\cdots,n$.

In many textbooks, e.g. [this one](http://staff.ustc.edu.cn/~cgong821/Wiley.Interscience.Elements.of.Information.Theory.Jul.2006.eBook-DDU.pdf), the **mutual information (MI)** is written as, 

$$
I(X,Y) = H(X) + H(Y) - H(X,Y)
$$


where, 
* $H(X)$ and $H(Y)$ are the **marginal entropies**, and
* $H(X,Y)$ is the **joint entropy**

The three entropies above can be estimated using KDE or binning methods, for both RVs and TS. 

For time series, the above expression reads as, 

$$
I(x_k,y_{k-\tau}) = H(x_k) + H(y_{k-\tau}) - H(x_k,y_{k-\tau})
$$

Alternatively, the KDE/binning methods can be used to compute $I(X,Y)$ directly from its integral definition:
1. Estimate the marginal PDFs of $X$ and $Y$
2. Estimate the joint PDF of $X$ and $Y$
3. Numerically evaluate the double integral in the definition of $I(X,Y)$
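
The three steps above can be sketched with a Gaussian KDE. This is a minimal illustration, assuming `scipy.stats.gaussian_kde` with its default bandwidth and a plain Riemann sum on a uniform grid. The function name `kde_mi` and the parameters `num` and `pad` are hypothetical choices, not the implementation used in these notes.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_mi(x, y, num=120, pad=1.0):
    """KDE-based MI estimate (nats) for two scalar variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: marginal PDFs
    kde_x, kde_y = gaussian_kde(x), gaussian_kde(y)
    # Step 2: joint PDF (gaussian_kde expects shape (n_dims, n_samples))
    kde_xy = gaussian_kde(np.vstack([x, y]))
    # Integration grid, extended by `pad` standard deviations beyond the data
    gx = np.linspace(x.min() - pad * x.std(), x.max() + pad * x.std(), num)
    gy = np.linspace(y.min() - pad * y.std(), y.max() + pad * y.std(), num)
    dx, dy = gx[1] - gx[0], gy[1] - gy[0]
    GX, GY = np.meshgrid(gx, gy, indexing="ij")
    p_xy = kde_xy(np.vstack([GX.ravel(), GY.ravel()])).reshape(num, num)
    p_prod = kde_x(gx)[:, None] * kde_y(gy)[None, :]
    # Step 3: Riemann sum of p_xy * ln(p_xy / (p_x p_y)), skipping zero cells
    mask = (p_xy > 0) & (p_prod > 0)
    integrand = np.zeros_like(p_xy)
    integrand[mask] = p_xy[mask] * np.log(p_xy[mask] / p_prod[mask])
    return float(integrand.sum() * dx * dy)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.8 * x + 0.6 * rng.standard_normal(1000)   # correlation of about 0.8
print(kde_mi(x, y))
```

The result depends on the bandwidth and the grid resolution, so it is worth checking the sensitivity to both before trusting a value.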

### KSG (KL/KNN-based) method


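As detailed in [Kraskov-Stogbauer-Grassberger (KSG), 2004](https://arxiv.org/pdf/cond-mat/0305641), when each entropy $H$ is estimated separately with the KL/KNN method, the estimation errors do not cancel each other and propagate into the estimated $I(X,Y)$:

$$
\hat{H}(X) = -\psi(k)+\psi(n)+\ln c_{d_x} +\frac{d_x}{n}\sum_{i=1}^n \ln \epsilon_i
$$

$$
\hat{H}(Y) = -\psi(k)+\psi(n)+\ln c_{d_y} +\frac{d_y}{n}\sum_{i=1}^n \ln \epsilon_i
$$

$$
\hat{H}(X,Y) = -\psi(k)+\psi(n)+\ln (c_{d_x}c_{d_y}) +\frac{d_x+d_y}{n}\sum_{i=1}^n \ln \epsilon_i
$$

where $\psi$ is the digamma function, $k$ is the number of nearest neighbors, $d_x$ and $d_y$ are the dimensions of $X$ and $Y$, $c_d$ is the norm-dependent volume constant of the $d$-dimensional unit ball ($c_d=1$ for the maximum norm), and $\epsilon_i$ is twice the distance from sample $i$ to its $k$-th nearest neighbor in the corresponding space.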
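
A minimal sketch of the KL (Kozachenko-Leonenko) entropy estimator in the form above is given below. It assumes the maximum norm, so that $\ln c_d = 0$, and samples without exact duplicates (a zero neighbor distance would give $\ln 0$). The function name `kl_entropy` is illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(samples, k=3):
    """KL/KNN entropy estimate (nats) using the maximum norm (c_d = 1)."""
    z = np.asarray(samples, dtype=float)
    if z.ndim == 1:
        z = z[:, None]                      # treat 1D input as n samples of dimension 1
    n, d = z.shape
    tree = cKDTree(z)
    # Query k+1 neighbors because each point is its own nearest neighbor
    dist, _ = tree.query(z, k=k + 1, p=np.inf)
    eps = 2.0 * dist[:, -1]                 # twice the distance to the k-th neighbor
    return float(-digamma(k) + digamma(n) + d * np.mean(np.log(eps)))

# Check against the exact entropy of a standard normal, 0.5 * ln(2 * pi * e)
rng = np.random.default_rng(0)
print(kl_entropy(rng.standard_normal(5000)), 0.5 * np.log(2 * np.pi * np.e))
```

Applying this function separately to $X$, $Y$, and $(X,Y)$ and combining the results is the approach whose errors do not cancel, which is the motivation for the KSG estimator.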

To avoid this, the same reference proposes an alternative approach based on the KL/KNN estimator, referred to here as the **KSG estimator**. Two variants are proposed, of which we use the first. 

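$$
I^{(1)}(X,Y) = \psi(k) + \psi(n) - \langle \psi(n_x+1)+\psi(n_y+1)\rangle
$$

$$
I^{(2)}(X,Y) = \psi(k) + \psi(n) - 1/k - \langle \psi(n_x)+\psi(n_y)\rangle
$$

where $\langle\cdot\rangle$ denotes the average over all samples $i$, and $n_x(i)$ and $n_y(i)$ are the numbers of neighbors of sample $i$ in the marginal spaces of $X$ and $Y$, counted within the distance set by its $k$-th nearest neighbor in the joint space.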

The above expressions can be evaluated by the KNN method, for both RVs and time series. 
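
A minimal sketch of the first KSG estimator, as given in the KSG reference with $\psi(n_x+1)$ and $\psi(n_y+1)$, could look as follows. It assumes the maximum norm in the joint space, strict inequality when counting marginal neighbors, and samples without exact duplicates. The function name `ksg_mi` is illustrative and this is not necessarily the implementation used in these notes.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """First KSG estimator of I(X,Y) in nats, using the maximum norm."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]
    z = np.hstack([x, y])
    # Distance to the k-th neighbor in the joint space (k+1: the point itself is included)
    dist, _ = cKDTree(z).query(z, k=k + 1, p=np.inf)
    # Shrink the radius by one floating-point step to emulate a strict inequality
    radius = np.nextafter(dist[:, -1], 0.0)
    # Count marginal neighbors inside the radius, excluding the point itself
    n_x = cKDTree(x).query_ball_point(x, radius, p=np.inf, return_length=True) - 1
    n_y = cKDTree(y).query_ball_point(y, radius, p=np.inf, return_length=True) - 1
    return float(digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1)))

# Bivariate Gaussian with correlation 0.8: exact MI is -0.5 * ln(1 - 0.8**2)
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = 0.8 * x + 0.6 * rng.standard_normal(2000)
print(ksg_mi(x, y), -0.5 * np.log(1 - 0.8**2))
```

For time series, the same function can be applied to the lagged pairs $(x_k, y_{k-\tau})$. Note that the estimate can be slightly negative for independent data, since it is not constrained to be non-negative.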

## Validation of MI estimators for RVs

## Validation of MI estimators for time series