# Dispersion Entropy

## Rationale
- Sample Entropy, though powerful and widely used in signal- and image-processing applications, is not fast enough,
especially for long signals.
- Permutation Entropy, which is based on the order relations among the values of a signal, is conceptually simple and
computationally fast. However, it considers only the order of the amplitude values, so
some information about the amplitudes is discarded (such as the mean amplitude and the differences between
amplitude values).

Dispersion Entropy (DE) addresses these limitations:
- Unlike Permutation Entropy (PE), it can detect the noise bandwidth as well as simultaneous frequency and amplitude changes.
- It considerably outperforms PE in discriminating between the different groups of a dataset.
- Its computation time is significantly lower than that of Sample Entropy (SE) and PE.

To appreciate the relevance and possible usefulness of DE in signal analysis, it is important to understand
how the technique behaves with respect to classical signal concepts such as:
- Amplitude
- Frequency
- Noise power
- Signal bandwidth

## Formula
Given a univariate signal $x = \{x_1, x_2, ..., x_N\}$ of length $N$:
1. First, $x_j (j = 1,2,...,N)$ are mapped to $c$ classes, labelled from 1 to $c$. To do so, we employ a normal cumulative
distribution function (NCDF), using the mean and standard deviation of $x$, to map $x$ into $y = \{y_1, y_2, ..., y_N\}$
with values from 0 to 1. Next, we use a linear algorithm to assign each $y_j$ to an integer from 1 to $c$:
$z^c_j = \text{round}(c \cdot y_j + 0.5)$, where $z^c_j$ is the $j$-th member of the classified signal.
2. Each embedding vector $z^{m,c}_i$ with embedding dimension $m$ and time delay $d$ is created according to
$z^{m,c}_i = \{z^c_i, z^c_{i+d}, ..., z^c_{i+(m-1)d}\}$, $i = 1,2,...,N-(m-1)d$. Each vector $z^{m,c}_i$ is mapped to
a dispersion pattern $\pi_{v_0 v_1 ... v_{m-1}}$, where $z^c_i = v_0$, $z^c_{i+d} = v_1$, ..., $z^c_{i+(m-1)d} = v_{m-1}$.
The number of possible dispersion patterns that can be assigned to each vector $z^{m,c}_i$ is equal to $c^m$, since the
vector has $m$ members and each member can be one of the integers from 1 to $c$.
3. For each of the $c^m$ potential dispersion patterns, the relative frequency is obtained as follows:
$$p(\pi_{v_0 v_1 ... v_{m-1}}) = \frac{\text{Number}\{i \mid i \leq N - (m-1)d,\ z^{m,c}_i \text{ has type } \pi_{v_0 v_1 ... v_{m-1}}\}}{N-(m-1)d}$$

In other words, $p(\pi_{v_0 v_1 ... v_{m-1}})$ is the number of embedding vectors $z^{m,c}_i$ that are assigned to the
dispersion pattern $\pi_{v_0 v_1 ... v_{m-1}}$, divided by the total number of embedding vectors with embedding dimension $m$.
4. Finally, based on the definition of Shannon entropy, DE with embedding dimension $m$, time delay $d$ and number of classes $c$ is:
$$DE(x, m, c, d) = - \sum^{c^m}_{\pi=1} p(\pi_{v_0 v_1 ... v_{m-1}}) \cdot \ln(p(\pi_{v_0 v_1 ... v_{m-1}}))$$
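
The four steps above can be sketched as a single function. This is a minimal illustration rather than a reference implementation: the function name `dispersion_entropy`, the default parameter values, the `0.001` fallback for a zero standard deviation and the `np.clip` safeguard on the class labels are all assumptions made for this sketch.

```python
from collections import Counter

import numpy as np
from scipy.stats import norm


def dispersion_entropy(x, m=2, c=3, d=1):
    """Sketch of DE(x, m, c, d) following steps 1 to 4 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)

    # Step 1: NCDF mapping to (0, 1), then linear mapping to classes 1..c
    std = np.std(x)
    if std == 0:
        std = 0.001  # assumption: small fallback scale for constant signals
    y = norm.cdf(x, loc=np.mean(x), scale=std)
    # assumption: clip guards against labels 0 or c + 1 when y_j is exactly 0 or 1
    z = np.clip(np.round(c * y + 0.5), 1, c).astype(int)

    # Step 2: embedding vectors {z_i, z_{i+d}, ..., z_{i+(m-1)d}}
    n_vectors = len(z) - (m - 1) * d
    patterns = [tuple(z[i:i + m * d:d]) for i in range(n_vectors)]

    # Step 3: relative frequency of each dispersion pattern that occurs
    counts = Counter(patterns)
    p = np.array(list(counts.values())) / n_vectors

    # Step 4: Shannon entropy (patterns that never occur contribute nothing)
    return float(-np.sum(p * np.log(p)))
```

Counting only the patterns that actually occur avoids evaluating $\ln(0)$, so no explicit filtering of zero probabilities is needed in this variant.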

Define the signal and the parameters $c$ (number of classes), $m$ (embedding dimension) and $d$ (time delay):

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt

signal = [9, 8, 1, 12, 5, -3, 1.5, 8.01, 2.99, 4, -1, 10]
delay = 1
embed = 2
classes = 3

Define `ncdf_mapping` to map the signal into $y$, with values from 0 to 1:

In [2]:
def ncdf_mapping(signal):
    """Map each sample to (0, 1) using a normal CDF fitted to the signal."""
    signal = np.asarray(signal, dtype=float)
    mean = np.mean(signal)
    std = np.std(signal)
    if std == 0:
        std = 0.001  # avoid a zero scale for constant signals
    # norm.cdf is vectorised, so no explicit loop over the samples is needed
    return norm.cdf(signal, loc=mean, scale=std)

ncdf_mapping(signal)

array([0.82623478, 0.76305643, 0.19867004, 0.94619779, 0.51854591,
       0.0409939 , 0.23123962, 0.76374496, 0.34377923, 0.42986535,
       0.09803594, 0.87750645])
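
For reference, the NCDF used here can be written in closed form with the error function, $y_j = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x_j - \mu}{\sigma\sqrt{2}}\right)\right]$, where $\mu$ and $\sigma$ are the mean and (population) standard deviation of the signal. The sketch below recomputes the mapping with the standard library only; the helper name `normal_cdf` is made up for this illustration.

```python
import math

signal = [9, 8, 1, 12, 5, -3, 1.5, 8.01, 2.99, 4, -1, 10]

# Mean and population standard deviation (the same convention as np.std's default)
mu = sum(signal) / len(signal)
sigma = math.sqrt(sum((v - mu) ** 2 for v in signal) / len(signal))


def normal_cdf(v, mu, sigma):
    """Normal CDF expressed through the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


y = [normal_cdf(v, mu, sigma) for v in signal]
print([round(v, 8) for v in y])
```

The printed values should agree with the array returned by `ncdf_mapping(signal)` up to rounding.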

Map each $y_j$ to an integer from 1 to $c$ using the linear function
$$z^c_j = \text{round}(c \cdot y_j + 0.5)$$

In [3]:
length = len(signal)
mapped_signal = ncdf_mapping(signal)
z_signal = np.round(classes * mapped_signal + 0.5)
z_signal

array([3., 3., 1., 3., 2., 1., 1., 3., 2., 2., 1., 3.])
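
One edge case is worth keeping in mind. `np.round` rounds halves to the nearest even value, so when $y_j$ is exactly 0 or 1 (possible in floating point for extreme outliers) the formula can produce a label outside 1 to $c$. The sketch below uses hypothetical extreme values of $y$ to show the effect, and `np.clip` as one possible safeguard; clipping is an assumption of this example, not part of the formula.

```python
import numpy as np

classes = 3
y_edge = np.array([0.0, 0.5, 1.0])  # hypothetical extreme NCDF outputs

# c * y + 0.5 gives [0.5, 2.0, 3.5]; halves are rounded to the nearest even value
z_raw = np.round(classes * y_edge + 0.5)
# Force every label back into the valid range 1..c
z_safe = np.clip(z_raw, 1, classes)

print(z_raw)   # labels 0 and c + 1 appear at the extremes
print(z_safe)  # all labels lie between 1 and c
```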

We have $c^m = 3^2 = 9$ possible dispersion patterns
$(\pi_{11},\pi_{12},\pi_{13},\pi_{21},\pi_{22},\pi_{23},\pi_{31},\pi_{32},\pi_{33})$
and $N - (m-1) \cdot d = 12 - (2-1) \cdot 1 = 11$ embedding vectors of length 2, each with an associated dispersion pattern. First, initialise a counter for every pattern:


In [4]:
dispersions = np.zeros(classes ** embed)
dispersions

array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
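
Each slot of the `dispersions` array stands for one dispersion pattern. A convenient convention, sketched below, is to read a pattern as a base-$c$ number whose digits are $v_k - 1$, so that $\pi_{11}$ maps to index 0 and $\pi_{33}$ to index 8. The helper `pattern_to_index` is a hypothetical name used only for this illustration.

```python
from itertools import product

classes, embed = 3, 2  # same c and m as in the example

# All c^m dispersion patterns, in the order pi_11, pi_12, ..., pi_33
patterns = list(product(range(1, classes + 1), repeat=embed))


def pattern_to_index(pattern, classes):
    """Read the pattern as a base-`classes` number with digits v - 1."""
    index = 0
    for v in pattern:
        index = index * classes + (v - 1)
    return index


for p in patterns:
    print(p, "->", pattern_to_index(p, classes))
```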

Build each embedding vector $z^{m,c}_i = \{z^c_i, z^c_{i+d}, ..., z^c_{i+(m-1)d}\}$ for $i = 1,2,...,N-(m-1)d$ and count the dispersion pattern it maps to:

<img src="images/pic11.jpg" width="600">

In [5]:
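for i in range(length - (embed - 1) * delay):
    # Take `embed` samples starting at i, spaced `delay` apart
    tmp_pattern = z_signal[i:i + embed * delay:delay]
    pattern_index = 0
    print(f"Embedding vector {i + 1}: ", tmp_pattern)
    # Read the pattern as a base-`classes` number (last element is the least significant digit)
    for idx, label in enumerate(reversed(tmp_pattern)):
        # Rounding can yield classes + 1 when y_j is 1; fold that label back to `classes`
        label = classes if label == (classes + 1) else label
        pattern_index += (label - 1) * (classes ** idx)
    print("Index: ", pattern_index)

    dispersions[int(pattern_index)] += 1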

print("\nFrequency of each dispersion pattern:")
dispersions


Embedding vector 1:  [3. 3.]
Index:  8.0
Embedding vector 2:  [3. 1.]
Index:  6.0
Embedding vector 3:  [1. 3.]
Index:  2.0
Embedding vector 4:  [3. 2.]
Index:  7.0
Embedding vector 5:  [2. 1.]
Index:  3.0
Embedding vector 6:  [1. 1.]
Index:  0.0
Embedding vector 7:  [1. 3.]
Index:  2.0
Embedding vector 8:  [3. 2.]
Index:  7.0
Embedding vector 9:  [2. 2.]
Index:  4.0
Embedding vector 10:  [2. 1.]
Index:  3.0
Embedding vector 11:  [1. 3.]
Index:  2.0

Frequency of each dispersion pattern:


array([1., 0., 3., 2., 1., 0., 1., 2., 1.])
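
For long signals the Python loop can become slow. The same counts can be obtained without an explicit loop; the sketch below is one possible vectorised variant, starting from the class labels computed above. The variable names (`idx`, `weights`, `counts`) are illustrative choices.

```python
import numpy as np

z = np.array([3, 3, 1, 3, 2, 1, 1, 3, 2, 2, 1, 3])  # class labels from the example
m, c, d = 2, 3, 1

n_vectors = len(z) - (m - 1) * d
# Row i holds the positions i, i + d, ..., i + (m - 1) * d
idx = np.arange(n_vectors)[:, None] + d * np.arange(m)[None, :]
vectors = z[idx]

# Base-c weights, most significant digit first: [c^(m-1), ..., c, 1]
weights = c ** np.arange(m - 1, -1, -1)
pattern_index = (vectors - 1) @ weights

# Frequency of every one of the c^m patterns, including those that never occur
counts = np.bincount(pattern_index, minlength=c ** m)
print(counts)
```

The result should match the frequency array produced by the loop.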

- Calculate the probability of each dispersion pattern:
<img src="images/pic12.jpg" width="600">

In [6]:
probs = dispersions / sum(dispersions)
probs

array([0.09090909, 0.        , 0.27272727, 0.18181818, 0.09090909,
       0.        , 0.09090909, 0.18181818, 0.09090909])

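- Calculate the dispersion entropy. Patterns with zero probability are filtered out first, since they contribute nothing to the sum and $\ln(0)$ is undefined:
$$DE(x, m, c, d) = - \sum^{c^m}_{\pi=1} p(\pi_{v_0 v_1 ... v_{m-1}}) \cdot \ln(p(\pi_{v_0 v_1 ... v_{m-1}}))$$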

In [7]:
probs = list(filter(lambda p: p != 0., probs))
print("Filter out 0 prob: \n", probs)
de = -1 * np.sum(probs * np.log(probs))
print("Dispersion entropy: ", de)


Filter out 0 prob: 
 [0.09090909090909091, 0.2727272727272727, 0.18181818181818182, 0.09090909090909091, 0.09090909090909091, 0.18181818181818182, 0.09090909090909091]
Dispersion entropy:  1.8462202193216335
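
As a check, the sum can be written out by hand. Among the 11 embedding vectors, four patterns occur once, two occur twice and one occurs three times, so:

$$DE = -\left(4 \cdot \frac{1}{11}\ln\frac{1}{11} + 2 \cdot \frac{2}{11}\ln\frac{2}{11} + \frac{3}{11}\ln\frac{3}{11}\right) \approx 0.8720 + 0.6199 + 0.3543 \approx 1.8462$$

which agrees with the value printed above.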
