# Information Theory Tutorial

# background

## KNN mutual information estimator method
All estimators are computed using Algorithm 1 from [Kraskov et al.](https://arxiv.org/abs/cond-mat/0305641). In this method, local densities are estimated from nearest-neighbor distances. For each point, the distance to its $k$-th nearest neighbor in the joint space is found, and the number of points that fall within that distance in each marginal subspace is counted. These counts yield estimates of the joint and marginal entropies, and from them the mutual information. In this package, we can compute the mutual information between the following pairs of random variables:

* $Z = (X,Y)$ where $X$ and $Y$ are one-dimensional scalar random variables
* $Z = (\theta_1,\theta_2)$ where $\theta_1$ and $\theta_2$ are one-dimensional angular random variables
* $Z = (X,\theta)$ where $X$ is a one-dimensional scalar random variable and $\theta$ is a one-dimensional angular random variable
* $Z = ((X_1,Y_1),(X_2,Y_2))$ where $(X_1, Y_1)$ and $(X_2,Y_2)$ are two-dimensional (non-angular) real-valued random variables
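A minimal sketch of Kraskov et al.'s Algorithm 1 for a pair of one-dimensional variables follows. It uses brute-force pairwise distances so that it depends only on NumPy, and it is not the package's implementation. It uses the max-norm in the joint space and the estimate $\hat I = \psi(k) + \psi(N) - \langle \psi(n_x+1) + \psi(n_y+1) \rangle$, where $n_x$ and $n_y$ count the marginal neighbors strictly closer than each point's $k$-th joint neighbor. As an assumption of this sketch, angular variables (in radians) are handled by measuring the shortest arc on the circle. The two-dimensional case would replace each marginal distance with the max-norm over that variable's components. `digamma_int`, `marginal_distances`, and `ksg_mi` are hypothetical helpers. `digamma_int` evaluates the digamma function at positive integers via harmonic numbers, which gives the same values as `scipy.special.digamma`.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return -EULER_GAMMA + harmonic[n - 1]


def marginal_distances(v, periodic=False):
    """All pairwise distances |v_i - v_j| of a 1-D sample (shortest arc if angular)."""
    d = np.abs(v[:, None] - v[None, :])
    if periodic:
        d = np.mod(d, 2.0 * np.pi)
        d = np.minimum(d, 2.0 * np.pi - d)  # shortest way around the circle
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return d


def ksg_mi(x, y, k=2, periodic_x=False, periodic_y=False):
    """KSG Algorithm 1 mutual information estimate (nats) for 1-D samples x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = marginal_distances(x, periodic_x)
    dy = marginal_distances(y, periodic_y)
    dz = np.maximum(dx, dy)                 # max-norm distance in the joint space
    eps = np.sort(dz, axis=1)[:, k - 1]     # distance to the k-th joint neighbor
    nx = np.sum(dx < eps[:, None], axis=1)  # marginal neighbors strictly inside eps
    ny = np.sum(dy < eps[:, None], axis=1)
    return (digamma_int(k) + digamma_int(n)
            - np.mean(digamma_int(nx + 1) + digamma_int(ny + 1)))


rng = np.random.default_rng(0)

# Two scalar variables: correlated Gaussian (true MI = -0.5*log(1 - 0.9**2), about 0.830)
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
print(ksg_mi(xy[:, 0], xy[:, 1], k=2))

# Two angular variables: theta2 is a noisy, wrapped copy of theta1
theta1 = rng.uniform(0.0, 2.0 * np.pi, size=1000)
theta2 = np.mod(theta1 + 0.3 * rng.normal(size=1000), 2.0 * np.pi)
print(ksg_mi(theta1, theta2, periodic_x=True, periodic_y=True))
```

The brute-force distance matrices need $O(N^2)$ memory, which is fine for a few thousand points. A tree-based neighbor search is the usual choice for larger samples.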

## time-lagged mutual information
Time-lagged mutual information is an intermediate statistic between pure mutual information (time-independent) and transfer entropy (time-dependent).
To compute the time-lagged mutual information between two time-dependent random variables (random processes), or between a random process and itself, we align the two series in time (or the series and a copy of itself). We sample the first variable and record the timestamps, then sample the second variable at the same timestamps shifted by the time lag $\tau$. We then compute the mutual information between these two samples and vary $\tau$ to identify significant timescales.
* $Z = (X_t, X_{t+\tau})$ or $Z = (X_t, Y_{t+\tau})$ for a time lag of length $\tau$
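A minimal sketch of the lag-and-pair procedure follows, assuming evenly sampled series stored as NumPy arrays. To stay short and self-contained, it uses the closed-form Gaussian estimate $-\tfrac12\ln(1-r^2)$ from the sample correlation $r$ as a stand-in for the package's KNN estimator. This stand-in is valid only for jointly Gaussian data. The helper names `lagged_pairs`, `gaussian_mi`, and `time_lagged_mi` are hypothetical. The example process is an AR(1) series compared with itself. Its lag-$\tau$ correlation is $a^\tau$, so the mutual information decays as the lag grows.

```python
import numpy as np


def lagged_pairs(x, y, tau):
    """Pair x at times t with y at times t + tau (tau >= 0, same sampling grid)."""
    n = min(len(x), len(y))
    return x[: n - tau], y[tau:n]


def gaussian_mi(a, b):
    """Stand-in MI estimate (nats), exact only for jointly Gaussian data."""
    r = np.corrcoef(a, b)[0, 1]
    return -0.5 * np.log(1.0 - r**2)


def time_lagged_mi(x, y, lags):
    """Mutual information between x_t and y_{t+tau} for each lag tau."""
    return np.array([gaussian_mi(*lagged_pairs(x, y, tau)) for tau in lags])


# Example: AR(1) process x_t = a * x_{t-1} + noise, compared with itself
rng = np.random.default_rng(0)
a, n = 0.9, 20000
x = np.empty(n)
x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - a**2))  # start in the stationary distribution
for t in range(1, n):
    x[t] = a * x[t - 1] + rng.normal()

lags = [1, 5, 10, 50]
mi = time_lagged_mi(x, x, lags)
for tau, value in zip(lags, mi):
    print(f"tau={tau:3d}  MI={value:.3f}")
```

A lag of zero is left out because a series paired with itself has $r = 1$, and the Gaussian formula diverges.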

```python
import info_theory as it
import numpy as np
from scipy.special import digamma
```

## compare the true mutual information of a bivariate Gaussian with the KNN estimate (zero mean, unit variances, correlation 0.9)

```python
reps = 10  # number of repetitions
sample_size = 100  # samples drawn per repetition

mu = [0, 0]  # mean vector
sigma = [[1, 0.9], [0.9, 1]]  # covariance matrix: unit variances, correlation 0.9

MI_Gauss = []
for _ in range(reps):
    # draw a fresh bivariate Gaussian sample and estimate its MI with K=2 nearest neighbors
    Gauss = np.random.multivariate_normal(mu, sigma, sample_size)
    MI_Gauss.append(it.compute_MI_scalar(data=Gauss, K=2))
MI_Gauss = np.array(MI_Gauss)
```

```python
MI_Gauss
```

```
array([0.85564702, 0.98002917, 0.9707217 , 0.96236096, 1.07428578,
       0.96342016, 0.76238563, 0.73054673, 1.1193387 , 0.91194191])
```

## compute true mutual information

In [10]:
def true_MI_Gauss(sigma):
    return(-0.5*np.log(1-sigma**2))
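The closed form in `true_MI_Gauss` follows from the differential entropy of a Gaussian. With variances $\sigma_X^2, \sigma_Y^2$, correlation $\rho$, and covariance matrix $\Sigma$, the determinant is $\det\Sigma = \sigma_X^2\sigma_Y^2(1-\rho^2)$, so the variances cancel:

$$
\begin{aligned}
I(X;Y) &= H(X) + H(Y) - H(X,Y) \\
&= \tfrac12\ln\!\left(2\pi e\,\sigma_X^2\right) + \tfrac12\ln\!\left(2\pi e\,\sigma_Y^2\right) - \tfrac12\ln\!\left((2\pi e)^2 \det\Sigma\right) \\
&= -\tfrac12\ln\!\left(1-\rho^2\right).
\end{aligned}
$$

For $\rho = 0.9$ this gives $-\tfrac12\ln(0.19) \approx 0.830$ nats.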


```python
true_MI = true_MI_Gauss(0.9)  # correlation used in the covariance matrix above
```

```python
error = np.abs(true_MI - MI_Gauss)  # absolute error of each repetition
error
```

```
array([0.02528142, 0.14966357, 0.1403561 , 0.13199535, 0.24392018,
       0.13305456, 0.06797998, 0.09981887, 0.2889731 , 0.0815763 ])
```
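The absolute error hides the direction of each miss. Splitting it into a bias (mean signed error) and a spread (standard deviation across repetitions) shows whether the estimates are systematically off. Of the ten estimates printed above, eight lie above the true value, and their mean is about 0.933, roughly 0.10 above 0.830. The sketch below recomputes these summaries from the printed values. It is self-contained, so it redefines `true_MI` inline rather than calling `true_MI_Gauss`.

```python
import numpy as np

# The ten KNN estimates printed above (sample size 100, K=2)
MI_Gauss = np.array([0.85564702, 0.98002917, 0.9707217, 0.96236096, 1.07428578,
                     0.96342016, 0.76238563, 0.73054673, 1.1193387, 0.91194191])
true_MI = -0.5 * np.log(1 - 0.9**2)

signed_error = MI_Gauss - true_MI         # positive means overestimate
bias = signed_error.mean()                # systematic offset across repetitions
spread = MI_Gauss.std(ddof=1)             # repetition-to-repetition scatter
rmse = np.sqrt(np.mean(signed_error**2))  # combines bias and scatter
print(f"bias={bias:.3f}  spread={spread:.3f}  rmse={rmse:.3f}")
```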

## notes
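You will find that the estimate converges rapidly toward the true value as the sample size increases. Running more repetitions at each sample size gives a more reliable picture of the estimator's bias and spread.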
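One way to check the convergence claim is to repeat the experiment at several sample sizes and track the mean absolute error. The sketch below does this with a compact, NumPy-only version of Kraskov et al.'s Algorithm 1 for scalar data. It stands in for `it.compute_MI_scalar`, whose internals are not shown here. `digamma_int` and `ksg_mi` are hypothetical helpers, and the sample sizes are illustrative choices.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers via harmonic numbers."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return -EULER_GAMMA + harmonic[n - 1]


def ksg_mi(x, y, k=2):
    """Compact KSG Algorithm 1 estimate (nats) for 1-D scalar samples."""
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    np.fill_diagonal(dx, np.inf)  # exclude each point from its own neighborhood
    np.fill_diagonal(dy, np.inf)
    eps = np.sort(np.maximum(dx, dy), axis=1)[:, k - 1]  # k-th neighbor, max-norm
    nx = np.sum(dx < eps[:, None], axis=1)
    ny = np.sum(dy < eps[:, None], axis=1)
    return (digamma_int(k) + digamma_int(len(x))
            - np.mean(digamma_int(nx + 1) + digamma_int(ny + 1)))


rho, reps = 0.9, 10
true_MI = -0.5 * np.log(1 - rho**2)
cov = [[1.0, rho], [rho, 1.0]]
rng = np.random.default_rng(1)

sample_sizes = [50, 200, 800]
mean_abs_error = []
for n in sample_sizes:
    errors = []
    for _ in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        errors.append(abs(ksg_mi(xy[:, 0], xy[:, 1]) - true_MI))
    mean_abs_error.append(np.mean(errors))
    print(f"N={n:4d}  mean |error| = {mean_abs_error[-1]:.3f}")
```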