In [1]:
import ipywidgets as widgets
from ipywidgets import HBox, interact, interactive, IntSlider, Label 
from IPython.display import display
import it_functions
import it_functions_examples as ife
import math
import matplotlib.pyplot as plt
import numpy as np
# The second magic overrides the first, so figures render with the inline backend
%matplotlib widget
%matplotlib inline
plt.rcParams['figure.figsize'] = [12, 7]

# Application of information theory to synthetic data

***

This notebook is a demonstration of the calculation and visualization of mutual information and transfer entropy using synthetic data with known time lags of interaction. 

## Definitions

>*mutual information* - the amount of information obtained about one variable by observing another. Here, mutual information between source (x) and sink (y) is normalized by the Shannon entropy of the sink variable, so the output can be read as the fraction of uncertainty in the sink variable that is explained by the source variable. Mutual information is a symmetric metric ($MI_{xy} = MI_{yx}$) and is similar to a non-linear correlation. __[More info](https://github.com/pdirmeyer/l-a-cheat-sheets/blob/main/Coupling_metrics_V30_MI.pdf)__
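
As a minimal sketch of the normalization described above, mutual information can be estimated from a 2-D histogram and divided by the Shannon entropy of the sink. The histogram estimator, the bin count, and the names `shannon_entropy` and `normalized_mi` are assumptions for illustration; `it_functions` may compute the metric differently.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy in bits of a histogram given as an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def normalized_mi(x, y, bins=10):
    """Estimate I(X;Y) / H(Y) from a 2-D histogram (bins is an assumed choice)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = shannon_entropy(joint.sum(axis=1))   # marginal entropy of the source
    h_y = shannon_entropy(joint.sum(axis=0))   # marginal entropy of the sink
    h_xy = shannon_entropy(joint.ravel())      # joint entropy
    return (h_x + h_y - h_xy) / h_y

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
y_dep = 4 * x * (1 - x)          # sink fully determined by the source
y_ind = rng.uniform(size=2000)   # sink independent of the source

mi_dep = normalized_mi(x, y_dep)
mi_ind = normalized_mi(x, y_ind)
print(f"dependent: {mi_dep:.3f}, independent: {mi_ind:.3f}")
```

The dependent pair should score well above the independent pair, whose small positive value reflects the finite-sample bias of histogram estimators.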

>*transfer entropy* - the reduction of uncertainty in the sink variable (y) due to knowledge of the source variable (x), independent of the reduction in uncertainty due to knowledge of the sink variable's own past. The output can be read as the fraction of entropy in the sink variable that is explained by the source variable at a certain time lag, in excess of the sink variable's own history. Transfer entropy is an asymmetric metric ($TE_{x \to y} \neq TE_{y \to x}$). __[More info](https://github.com/pdirmeyer/l-a-cheat-sheets/blob/main/Coupling_metrics_V30_TE.pdf)__
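
The definition above can be sketched as a conditional mutual information, $I(y_t; x_{t-lag} \mid y_{t-1})$, divided by the entropy of the sink. This sketch assumes a histogram estimator, a sink history of one time step, and an independent uniform source; `entropy_bits` and `normalized_te` are hypothetical names, not the `it_functions` API.

```python
import numpy as np

def entropy_bits(counts):
    """Shannon entropy in bits of a histogram given as an array of counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def normalized_te(source, sink, lag, bins=5):
    """TE(source -> sink) at `lag` (lag >= 0), conditioned on one step of
    sink history and divided by H(sink_t). Assumed estimator, not the notebook's."""
    start = max(lag, 1)
    y_t = sink[start:]                               # sink at time t
    y_past = sink[start - 1:-1]                      # sink at time t-1
    x_lag = source[start - lag:len(source) - lag]    # source at time t-lag
    joint, _ = np.histogramdd(np.column_stack([y_t, y_past, x_lag]), bins=bins)
    h_yt_yp = entropy_bits(joint.sum(axis=2).ravel())   # H(y_t, y_past)
    h_yp_x = entropy_bits(joint.sum(axis=0).ravel())    # H(y_past, x_lag)
    h_yp = entropy_bits(joint.sum(axis=(0, 2)))         # H(y_past)
    h_all = entropy_bits(joint.ravel())                 # H(y_t, y_past, x_lag)
    h_yt = entropy_bits(joint.sum(axis=(1, 2)))         # H(y_t)
    return (h_yt_yp + h_yp_x - h_yp - h_all) / h_yt

rng = np.random.default_rng(1)
x = rng.uniform(size=2000)
y = np.empty_like(x)
y[:3] = rng.uniform(size=3)
y[3:] = 4 * x[:-3] * (1 - x[:-3]) + 0.05 * rng.normal(size=x.size - 3)

te_xy = normalized_te(x, y, lag=3)   # source -> sink at the true lag
te_yx = normalized_te(y, x, lag=3)   # sink -> source, expected near zero
print(f"TE x->y: {te_xy:.3f}, TE y->x: {te_yx:.3f}")
```

Swapping the arguments gives the reverse direction, which illustrates the asymmetry of the metric.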

>For both metrics, significance is determined by shuffling the time series to destroy temporal relationships and recalculating mutual information and transfer entropy on the shuffled series. This is repeated 1000 times, and the 99th percentile of the shuffled values serves as the significance threshold for each metric.
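
The shuffle test described above might look like the sketch below. It assumes that only the source series is permuted and uses a simple histogram mutual information as the metric; `mi_bits` and `shuffle_threshold` are hypothetical helpers, and the notebook's own implementation may differ.

```python
import numpy as np

def mi_bits(x, y, bins=10):
    """Histogram estimate of mutual information in bits (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # shape (1, bins)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz]))

def shuffle_threshold(metric, x, y, n_shuffles=1000, percentile=99, seed=0):
    """Percentile of the metric over shuffled copies of x (the null distribution)."""
    rng = np.random.default_rng(seed)
    null = [metric(rng.permutation(x), y) for _ in range(n_shuffles)]
    return np.percentile(null, percentile)

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
y = 4 * x * (1 - x) + 0.2 * rng.normal(size=1000)

observed = mi_bits(x, y)
threshold = shuffle_threshold(mi_bits, x, y)
print(f"observed: {observed:.3f} bits, 99th percentile of shuffles: {threshold:.3f} bits")
```

A metric value counts as significant when it exceeds the threshold; because the threshold is the 99th percentile, unrelated series are still expected to exceed it about 1% of the time.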

## Logistic mapping with known time lag

> Using a chaotic logistic mapping with a known time lag, we can see how effective mutual information and transfer entropy are at identifying that time lag. The logistic map is defined as:

> $y_t = ax_{(t-lag)} [1-x_{(t-lag)}]+z_te$

> with lag = 5, growth rate a = 4, z drawn from a random Gaussian distribution, and noise factor e = 0.2
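
A minimal sketch of generating such a pair of series follows. The text does not say how x itself is produced, so drawing it uniformly from [0, 1] is an assumption, and `lagged_logistic` is a hypothetical helper rather than a function from `it_functions_examples`.

```python
import numpy as np

def lagged_logistic(n, lag=5, a=4.0, e=0.2, seed=0):
    """Return (x, y) of length n with y_t = a * x_{t-lag} * (1 - x_{t-lag}) + z_t * e.

    The source x is drawn uniformly from [0, 1] (an assumption).
    """
    rng = np.random.default_rng(seed)
    x_full = rng.uniform(size=n + lag)   # extra samples cover the first `lag` steps
    z = rng.normal(size=n)               # Gaussian noise term z_t
    y = a * x_full[:n] * (1 - x_full[:n]) + z * e
    return x_full[lag:], y               # aligned so y[t] depends on x[t - lag]

x, y = lagged_logistic(500)

# With the noise switched off, the lagged relationship holds exactly.
x0, y0 = lagged_logistic(500, e=0.0)
print(np.allclose(y0[5:], 4 * x0[:-5] * (1 - x0[:-5])))
```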

### Mutual information

In [7]:
def f(ndata, e):
    ife.gen_plot_logistic_it_mi(ndata, e)
interactive_plot = interactive(f,
                                ndata = widgets.IntText(value=500,description='Number of data points:',disabled=False),
                                e = widgets.FloatText(value = 0.2, description = 'e:'))
                               
interactive_plot

interactive(children=(IntText(value=500, description='Number of data points:'), FloatText(value=0.2, descripti…

> The plot above shows the mutual information (blue), critical mutual information (blue dashed), and the Pearson correlation coefficient (red), all calculated across a range of time lags. A couple of things to note:
> - Mutual information is significant only at the correct time lag (5); at all other time lags it falls below the significance threshold
> - The Pearson correlation coefficient also shows a peak there, but from the correlation alone it is difficult to identify this as the most important time lag
> - By adjusting the two interactive inputs at the top of the plot (number of data points and the noise factor e), you can see how they affect the mutual information calculations. Fewer data points and more noise make the signal less clear
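
Evaluating a metric across a range of time lags can be sketched as shifting the sink against the source and recomputing the metric at each shift. The example uses a linear lag-3 relationship (an assumption chosen so the Pearson peak is unambiguous, not the logistic map above); `scan_lags` and `pearson` are hypothetical helpers.

```python
import numpy as np

def scan_lags(metric, source, sink, max_lag):
    """Evaluate metric(source_{t-lag}, sink_t) for lag = 0..max_lag."""
    n = len(source)
    return np.array([metric(source[:n - lag], sink[lag:])
                     for lag in range(max_lag + 1)])

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = np.roll(x, 3) + 0.5 * rng.normal(size=1000)   # y_t = x_{t-3} + noise

r_by_lag = scan_lags(pearson, x, y, max_lag=10)
best_lag = int(np.argmax(np.abs(r_by_lag)))
print(best_lag)
```

Any two-argument metric, such as a mutual information estimator, can be passed in place of `pearson`.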

### Transfer entropy

In [6]:
def f(ndata, e):
    ife.gen_plot_logistic_it_te(ndata, e)
interactive_plot = interactive(f,
                                ndata = widgets.IntText(value=500,description='Number of data points:',disabled=False),
                                e = widgets.FloatText(value = 0.2, description = 'Random noise:'))
                               
interactive_plot

interactive(children=(IntText(value=500, description='Number of data points:'), FloatText(value=0.2, descripti…

> The plot above shows the transfer entropy from source to sink (blue) and from sink to source (orange), as well as the respective critical values for transfer entropy in both directions (dotted lines). A couple of things to note:
> - With 500 data points and random noise = 0.2, the known critical time lag (5) should be the only significant transfer entropy value
> - No value for $TE_{y \to x}$ should rise above the critical value, unless the number of data points is very low or the noise is very high
> - As with mutual information, by adjusting the two interactive inputs at the top of the plot (number of data points and random noise), you can see how they affect the transfer entropy calculations. Fewer data points and more noise make the signal less clear

## Periodic signal with variable coupling coefficients

> Here we use a different relationship between x and y, with coupling at several time lags. The coupling also takes a different functional form at each lag:

> $y_t = cc_1 e^{x_{t-1}} + cc_2 x_{t-2}^2 + cc_3 x_{t-3} + cc_4 \cos(x_{t-4}) + z_t e$

> where $cc_{1,2,3,4}$ are coupling coefficients, z is drawn from a random Gaussian distribution, and e = 0.2 is a noise factor
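
The relationship above can be sketched as follows, reading the first term as an exponential of $x_{t-1}$. The form of x is not given in the text, so a noisy sinusoid with an arbitrary period is assumed here; `coupled_signal` and its `period` parameter are hypothetical, and the default coefficients mirror the widget defaults in this notebook.

```python
import numpy as np

def coupled_signal(n, cc=(2.0, 7.0, 0.5, 5.0), e=0.2, period=50, seed=0):
    """Return (x, y) with y coupled to x at lags 1 to 4.

    x is a noisy sinusoid (an assumption); the first four values of y are NaN
    because they would need x values from before the start of the series.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    x = np.sin(2 * np.pi * t / period) + 0.1 * rng.normal(size=n)
    z = rng.normal(size=n)
    y = np.full(n, np.nan)
    y[4:] = (cc[0] * np.exp(x[3:-1])     # exponential coupling at lag 1
             + cc[1] * x[2:-2] ** 2      # quadratic coupling at lag 2
             + cc[2] * x[1:-3]           # linear coupling at lag 3
             + cc[3] * np.cos(x[:-4])    # cosine coupling at lag 4
             + e * z[4:])                # Gaussian noise scaled by e
    return x, y

x, y = coupled_signal(500)

# Check one time step by hand with the noise switched off.
x0, y0 = coupled_signal(500, e=0.0)
expected = 2.0 * np.exp(x0[9]) + 7.0 * x0[8] ** 2 + 0.5 * x0[7] + 5.0 * np.cos(x0[6])
print(np.isclose(y0[10], expected))
```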

### Mutual information

In [5]:
def f(ndata, cc1, cc2, cc3, cc4, e):
    ife.gen_plot_periodic_it_mi(ndata, cc1, cc2, cc3, cc4, e)
interactive_plot = interactive(f,
                                ndata = widgets.IntText(value=500,description='Number of data points:',disabled=False),
                               cc1 = widgets.FloatText(value = 2, description = '$cc_1$: '),
                               cc2 = widgets.FloatText(value = 7, description = "$cc_2$ :"),
                               cc3 = widgets.FloatText(value = 0.5, description = "$cc_3$ :"),
                               cc4 = widgets.FloatText(value = 5, description = "$cc_4$ :"),                               
                               e = widgets.FloatText(value = 0.2, description = 'e :'))
                               
interactive_plot

interactive(children=(IntText(value=1000, description='Number of data points:'), FloatText(value=2.0, descript…

> The plot above shows the mutual information (blue), critical mutual information (blue dashed), and the Pearson correlation coefficient (red), all calculated across a range of time lags. A couple of things to note:
> - With default values, mutual information is significant only at time lags of 1, 2, and 4, but not 3
> - The Pearson correlation coefficient shows high values at lags 1, 2, and 3
> - By adjusting the six interactive inputs at the top of the plot (number of data points, coupling coefficients, and random noise), you can see how they affect the mutual information calculations. Fewer data points, weaker coupling, and more noise make the signal less clear

### Transfer entropy

In [8]:
def f(ndata, cc1, cc2, cc3, cc4, e):
    ife.gen_plot_periodic_it_te(ndata, cc1, cc2, cc3, cc4, e)
interactive_plot = interactive(f,
                                ndata = widgets.IntText(value=500,description='Number of data points:',disabled=False),
                               cc1 = widgets.FloatText(value = 2, description = '$cc_1$: '),
                               cc2 = widgets.FloatText(value = 7, description = "$cc_2$ :"),
                               cc3 = widgets.FloatText(value = 0.5, description = "$cc_3$ :"),
                               cc4 = widgets.FloatText(value = 5, description = "$cc_4$ :"),                               
                               e = widgets.FloatText(value = 0.2, description = 'e :'))
                               
interactive_plot

interactive(children=(IntText(value=500, description='Number of data points:'), FloatText(value=2.0, descripti…

> The plot above shows the transfer entropy from source to sink (blue) and from sink to source (orange), as well as the respective critical values for transfer entropy in both directions (dotted lines). A couple of things to note:
> - With 500 data points, random noise = 0.2, and the coefficients at their default values, a time lag of 2 should be the only significant transfer entropy value
> - No value for $TE_{y \to x}$ should rise above the critical value, unless the number of data points is very low or the noise is very high
> - As with mutual information, by adjusting the six interactive inputs at the top of the plot (number of data points, coupling coefficients, and random noise), you can see how they affect the transfer entropy calculations. Fewer data points, weaker coupling, and more noise make the signal less clear