#### Rationale

ApEn has a limitation: for any process, even white noise, a short data sequence yields entropy-rate estimates of zero once the pattern length $m$ is large relative to the total amount of data. In that regime the regularity of the time series cannot be quantified.

In other words, embedding the dynamics in a phase space of too high a dimension produces entropy-rate estimates equal to 0. This forced Pincus et al. (1993) to fix the pattern length (i.e. the embedding dimension $m$) at very small, arbitrary values in order to obtain a reliable entropy-rate estimate.

- Aim: propose a new method for quantifying the regularity of a process from short data sequences, based on defining a new function, the corrected conditional entropy (CCE), and searching for its minimum with respect to the pattern length.

- Attempt to **distinguish** the **decrease of entropy rate** related to the **presence of recurrences** from that related to the **shortness of the data sequence**.

#### Formula

##### 1. Conditional Entropy

To take advantage of stationarity, normalize the series $\{X(i)\}$ to a process with zero mean and unit variance: $$x(i) = \frac{X(i) - \mathrm{av}[X]}{\mathrm{std}[X]}$$
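
A minimal sketch of this normalization step, assuming NumPy and the sample standard deviation (`ddof=1`), which is the convention that reproduces the normalized values printed in the notebook output further down. `normalize` is a hypothetical helper name, not a function taken from the `CCE` module.

```python
import numpy as np

def normalize(series):
    """Shift a series to zero mean and scale it to unit variance."""
    series = np.asarray(series, dtype=float)
    std = series.std(ddof=1)  # assumption: sample standard deviation
    if std == 0:
        raise ValueError("a constant series cannot be normalized")
    return (series - series.mean()) / std

x = normalize([1, 2, 3, 1, 2, 3])
print(x)
```

With the population standard deviation (`ddof=0`) the values would be scaled slightly differently, so the choice should be stated when results are compared.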

From the normalized series, an $L$-dimensional phase space with a reconstruction delay of 1 is obtained by considering the $N-L+1$ vectors $x_L(i) = (x(i), x(i-1), \ldots, x(i-L+1))$. Each vector $x_L(i)$ represents a pattern of $L$ consecutive samples.
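
The reconstruction can be sketched as follows, assuming NumPy. `embed` is a hypothetical helper; it returns one pattern per row, with the newest sample first so that each row matches the ordering $(x(i), x(i-1), \ldots, x(i-L+1))$.

```python
import numpy as np

def embed(x, L):
    """Return the N-L+1 patterns x_L(i) as rows, newest sample first."""
    x = np.asarray(x)
    N = len(x)
    if not 1 <= L <= N:
        raise ValueError("L must be between 1 and the series length")
    # Row k holds (x(i), x(i-1), ..., x(i-L+1)) for i = k + L - 1 (0-based).
    return np.array([x[k:k + L][::-1] for k in range(N - L + 1)])

patterns = embed([0, 2, 4, 0, 2, 4], 3)
print(patterns.shape)  # (4, 3): N - L + 1 = 4 patterns of length 3
```

Reversing the sample order inside each row does not change which patterns coincide, so it has no effect on the entropy estimates.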

The conditional entropy (CE) is defined as: $$E(L/L-1) = - \sum_{L-1} p_{L-1} \sum_{L/L-1} p_{L/L-1} \log p_{L/L-1}$$

where $p_{L-1}$ denotes the joint probability of the pattern $x_{L-1}(i)$ and $p_{L/L-1}$ denotes the conditional probability of the $L$th sample of the pattern $x_L(i)$ given the previous $L-1$ samples.

CE can be derived directly from the definition of the Shannon entropy (SE) of $x_L(i)$: $$E(L) = - \sum_L p_L \log p_L$$

CE is then obtained as the variation of the SE with respect to $L$: $$E(L/L-1) = E(L) - E(L - 1)$$

- SE represents the amount of information needed to specify the point $x_L(i)$ in an $L$-dimensional phase space.

- CE quantifies the variation of information necessary to specify a new state when the dimension of the phase space is incremented by one. Small CE values are obtained when a length-$L$ pattern can be almost completely predicted by a length-$(L-1)$ pattern.

##### 2. Conditional entropy estimate

$$\hat{E}(L/L-1) = \hat{E}(L) - \hat{E}(L-1)$$ where $\hat{E}(L)$ and $\hat{E}(L-1)$ are the estimates of SE in the $L$-dimensional and $(L-1)$-dimensional phase spaces. $\hat{E}(L)$ can be estimated by approximating the probabilities in $E(L) = - \sum_L p_L \log p_L$ with the sample frequencies.

To compute the sample frequencies, the series $\{x(i)\}$ is spread over $\xi$ quantization levels, each of amplitude $\epsilon = (x_{max} - x_{min})/\xi$, where $(x_{max} - x_{min})$ is the full range of the process dynamics.
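
A minimal sketch of this uniform quantization, assuming NumPy. `quantize` is a hypothetical helper, and assigning the maximum value to the top level $\xi - 1$ is an assumption about how the boundary is handled.

```python
import numpy as np

def quantize(x, xi):
    """Uniformly quantize x into integer levels 0 .. xi-1."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        return np.zeros(len(x), dtype=int)   # constant series: one level
    eps = (x_max - x_min) / xi                # amplitude of each level
    levels = np.floor((x - x_min) / eps).astype(int)
    # Assumption: x_max falls on the upper edge, so clip it into level xi-1.
    return np.clip(levels, 0, xi - 1)

x = np.array([-1.11803399, 0.0, 1.11803399, -1.11803399, 0.0, 1.11803399])
q = quantize(x, 5)
print(q)
```

For this normalized series and $\xi = 5$ the levels agree with the "quantizations after" line of the notebook output further down.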

Quantization defines a partition of the $L$-dimensional phase space into $M$ hypercubes ($M = \xi ^ L$) of side length $\epsilon$. Points inside the same hypercube differ by at most $\epsilon$ in each coordinate. Since each point represents a length-$L$ pattern, several points in the same hypercube mean identical patterns within a precision of $\epsilon$. In contrast, when a length-$L$ pattern appears only once, the relevant point is single in its hypercube. Since **the estimate of the sample frequencies depends on both the series length $N$ and the number of quantization levels $\xi$**, $\hat{E}(L/L-1)$ is a function of $L, N, \xi$.
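
The estimate can be sketched as follows, assuming NumPy and an already quantized series of integer levels. The function names are hypothetical and are not the ones exported by the `CCE` module; natural logarithms are assumed.

```python
import numpy as np

def pattern_counts(q, L):
    """Count how often each distinct length-L pattern (hypercube) occurs."""
    q = np.asarray(q)
    patterns = np.array([q[i:i + L] for i in range(len(q) - L + 1)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    return counts

def shannon_entropy_estimate(q, L):
    """Estimate E(L) by replacing probabilities with sample frequencies."""
    counts = pattern_counts(q, L)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def conditional_entropy_estimate(q, L):
    """Estimate E(L/L-1) as the difference of two SE estimates."""
    if L == 1:
        return shannon_entropy_estimate(q, 1)
    return shannon_entropy_estimate(q, L) - shannon_entropy_estimate(q, L - 1)

q = [0, 2, 4, 0, 2, 4]
print(shannon_entropy_estimate(q, 3), conditional_entropy_estimate(q, 3))
```

Each distinct row of the pattern matrix corresponds to one occupied hypercube, so counting rows is equivalent to counting points per hypercube without ever allocating all $\xi^L$ cells.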

##### 3. Effect of limited number of samples on $E_{hat}(L/L-1)$

A limited number of samples introduces a negative bias (an underestimate) in $\hat{E}(L/L-1)$. A length-$(L-1)$ pattern may be found only once in the data sequence (the relevant point is single in an $(L-1)$-dimensional hypercube). In this case the length-$L$ pattern, derived by extending the length-$(L-1)$ pattern with one more value, will also be detected only once (the relevant point will be single in an $L$-dimensional hypercube).

Therefore, the unique appearance of the length-$L$ pattern is completely predicted by the length-$(L-1)$ pattern (the estimated conditional probability $p_{L/L-1}$ equals 1), so single points in the $(L-1)$-dimensional phase space give a null contribution to $\hat{E}(L/L-1)$.

Also, if the series exhibits random components, the number of patterns found only once grows with $L$, and the number of points contributing to $\hat{E}(L/L-1)$ decreases further. Hence, for a completely stochastic series such as Gaussian white noise, $\hat{E}(L/L-1)$ tends to zero, which gives a false impression of determinism (lower entropy -> more regularity).
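
This effect can be reproduced with a small experiment. The sketch below assumes NumPy, a seeded generator, and hypothetical values $N = 300$ and $\xi = 6$; the helper names are illustrative only.

```python
import numpy as np

def quantize(x, xi):
    """Uniformly quantize x into integer levels 0 .. xi-1."""
    x = np.asarray(x, dtype=float)
    eps = (x.max() - x.min()) / xi
    return np.clip(np.floor((x - x.min()) / eps).astype(int), 0, xi - 1)

def entropy_estimate(q, L):
    """Shannon entropy of the length-L patterns, from sample frequencies."""
    q = np.asarray(q)
    patterns = np.array([q[i:i + L] for i in range(len(q) - L + 1)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)        # fixed seed, for reproducibility
noise = rng.standard_normal(300)      # short Gaussian white-noise series
q = quantize(noise, 6)                # xi = 6 quantization levels

# CE estimate for increasing pattern length L
ce = {L: entropy_estimate(q, L) - entropy_estimate(q, L - 1)
      for L in range(2, 9)}
for L, value in ce.items():
    print(L, round(value, 3))
```

Although the series is pure noise, the estimate collapses toward zero as $L$ grows, because almost every pattern becomes a single point.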

More formally, the bias of $\hat{E}(L/L-1)$ is clear when considering that $\hat{E}(L) = \hat{E}_{single}(L) + \hat{E}_{not\ single}(L)$, where $\hat{E}_{single}(L)$ and $\hat{E}_{not\ single}(L)$ are the contributions to $\hat{E}(L)$ of the single and non-single points (in the hypercubes). Each single point has sample frequency $1/(N-L+1)$, so
$$\hat{E}_{single}(L) = perc(L) \cdot \log(N-L+1)$$ where $perc(L)$ denotes the percentage of single points in the $L$-dimensional phase space, expressed as a fraction of the $N-L+1$ points.
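
The decomposition can be checked numerically. This is a sketch assuming NumPy and an already quantized series; `entropy_split` is a hypothetical helper and $perc(L)$ is treated as a fraction between 0 and 1.

```python
import numpy as np

def entropy_split(q, L):
    """Return (perc, E_single, E_not_single) for length-L patterns."""
    q = np.asarray(q)
    n = len(q) - L + 1                       # number of patterns, N - L + 1
    patterns = np.array([q[i:i + L] for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    perc = np.sum(counts == 1) / n           # fraction of single points
    e_single = perc * np.log(n)              # perc(L) * log(N - L + 1)
    p = counts[counts > 1] / n               # frequencies of repeated patterns
    e_not_single = -np.sum(p * np.log(p))
    return float(perc), float(e_single), float(e_not_single)

perc, e_single, e_not_single = entropy_split([0, 2, 4, 0, 2, 4], 3)
print(perc, e_single + e_not_single)
```

The two contributions add up to the full SE estimate for the same series and pattern length.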

##### 4. Effect of pattern length on $\hat{E}(L/L-1)$

To limit the presence of single points, the series length $N$ should be $\geq \xi^{L+1}$, so that even for randomly distributed noise there is on average at least one point per hypercube in every phase space up to dimension $L$. In contrast, when only short data sequences are available, both $L$ and $\xi$ are fixed to small arbitrary values ($L=2$, $\xi$ from 4 to 10, as in Pincus et al. 1993).

However, when $N$ is small, some problems arise even after limiting $L$ and $\xi$. Because the number of hypercubes increases exponentially ($M=\xi^L$), the number of single points grows fast, and different CE values can be obtained even for small $L$.
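
A quick arithmetic check of how fast the hypercubes outnumber the data. The series length $N = 300$ is a hypothetical value chosen for illustration, and `mean_points_per_hypercube` is an illustrative helper.

```python
def mean_points_per_hypercube(N, xi, L):
    """Average number of length-L patterns per hypercube (M = xi ** L)."""
    return (N - L + 1) / xi ** L

N = 300  # hypothetical short series length
for xi in (4, 6, 10):
    for L in range(1, 6):
        print(xi, L, round(mean_points_per_hypercube(N, xi, L), 3))
```

Once the average drops below one point per hypercube, most occupied hypercubes hold a single point and the CE estimate is dominated by the bias described above.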

##### 5. Corrected conditional entropy

Rationale:
1. Overcome the problem of limited samples (i.e. the growing percentage of single points as $L$ increases)
2. Avoid a priori selection of the embedding dimension

Formula:
Find the minimum of this function:
$$CCE(L) = \hat{E}(L/L-1) + E_c(L)$$

CCE is the sum of the CE estimate and a **corrective term**:
$$E_c(L) = perc(L) \cdot \hat{E}(1)$$
where $perc(L)$ is the percentage of single points in the $L$-dimensional phase space and $\hat{E}(1)$ is the estimated value of SE for $L=1$.

The scale factor $\hat{E}(1)$ is chosen because it represents the theoretical CE value of white noise with the same probability distribution as the series under consideration.

$perc(L)$ is chosen empirically: single $(L-1)$-patterns give a null contribution to CE, and single $L$-patterns derived from $(L-1)$-patterns that occur only a few times also give no robust contribution to $\hat{E}(L/L-1)$.

Since CCE is the sum of two terms, the first decreasing and the second increasing with $L$, it exhibits a minimum. The CCE minimum is taken as the best estimate of CE given the limited data.
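
Putting the pieces together, the search for the minimum can be sketched as below. This assumes NumPy, an already quantized series, natural logarithms, $perc(L)$ as a fraction, and a scan over $L = 2, \ldots, L_{max}$. The function names are hypothetical and this is not the implementation in the `CCE` module.

```python
import numpy as np

def pattern_counts(q, L):
    """Count how often each distinct length-L pattern occurs."""
    q = np.asarray(q)
    patterns = np.array([q[i:i + L] for i in range(len(q) - L + 1)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    return counts

def entropy_from_counts(counts):
    """Shannon entropy from pattern counts (sample frequencies)."""
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def corrected_conditional_entropy_sketch(q, L_max):
    """Return ({L: CCE(L)} for L = 2..L_max, L at the minimum)."""
    e1 = entropy_from_counts(pattern_counts(q, 1))   # scale factor E_hat(1)
    previous = e1
    cce = {}
    for L in range(2, L_max + 1):
        counts = pattern_counts(q, L)
        current = entropy_from_counts(counts)
        ce = current - previous                      # E_hat(L/L-1)
        perc = np.sum(counts == 1) / counts.sum()    # fraction of single points
        cce[L] = float(ce + perc * e1)               # CE + corrective term
        previous = current
    best_L = min(cce, key=cce.get)
    return cce, best_L

cce, best_L = corrected_conditional_entropy_sketch([0, 2, 4, 0, 2, 4], 3)
print(cce, best_L)
```

For the quantized series `[0, 2, 4, 0, 2, 4]` the minimum falls at $L = 2$, in line with the values printed by the notebook run further down.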

#### Advantages
1. Possibility to improve reliability of entropy rate estimation
- The proposed correction implicitly states that length-$L$ patterns which appear only once cannot be used to fix prediction rules and reduce the entropy rate. In contrast, when repetitive patterns are detected in the data sequence, no correction is made and regularity is recognised. So the CCE estimate is a maximum-entropy estimate according to the available data. When new data are considered by enlarging the temporal frame, the CCE estimate either remains stable if different patterns are found, or decreases toward 0 if patterns previously found only once turn out to be repetitive, so that regularity is recognised.

- Minimisation of the CCE function avoids a priori selection of the embedding dimension. The dimension $L$ at the CCE minimum is the best embedding dimension given so few points. It is a compromise between low $L$, which does not permit resolution of complex periodic structures, and large $L$, which rapidly leads to statistical insignificance because the series is short. So the same process can have different optimal embedding dimensions as the data set length varies.

Experimental results show that the CCE minimum measures regularity. However, CCE values depend on the series length and the number of quantization levels:
- Small number of quantization levels: less dependence on the length of the series
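
The dependence on series length can be illustrated with a toy example: the same periodic pattern observed over a short and a longer time frame. The sketch assumes NumPy and already quantized data; `entropy_and_perc` and `min_cce` are hypothetical helpers.

```python
import numpy as np

def entropy_and_perc(q, L):
    """Return (E_hat(L), perc(L)) for a quantized series q."""
    q = np.asarray(q)
    n = len(q) - L + 1
    patterns = np.array([q[i:i + L] for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / n
    return float(-np.sum(p * np.log(p))), float(np.sum(counts == 1) / n)

def min_cce(q, L_max):
    """Minimum over L = 2..L_max of CE estimate plus perc(L) * E_hat(1)."""
    e1, _ = entropy_and_perc(q, 1)
    previous = e1
    values = []
    for L in range(2, L_max + 1):
        current, perc = entropy_and_perc(q, L)
        values.append(current - previous + perc * e1)
        previous = current
    return min(values)

short_series = np.tile([0, 2, 4], 2)    # 6 samples
long_series = np.tile([0, 2, 4], 20)    # 60 samples, same periodic pattern
print(min_cce(short_series, 3), min_cce(long_series, 3))
```

With more repetitions, patterns that were single in the short series are recognised as recurrent, the corrective term vanishes, and the minimum moves toward 0.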

In [1]:
import numpy as np
from CCE import *

In [2]:
shannon_entropy_with_comments(np.array([1,2,3,1,2,3]), 3, 5)

Original series: [1 2 3 1 2 3]
Normalized series: [-1.11803399  0.          1.11803399 -1.11803399  0.          1.11803399]
epsilon: 0.447213595499958
partition: [-1.11803399 -0.67082039 -0.2236068   0.2236068   0.67082039  1.11803399]
codebook: [-1  0  1  2  3  4  5]
Uniform quantification of the time series:
quantizations before: [-1  2  4 -1  2  4]
quantizations after: [0 2 4 0 2 4]
Compose patterns of length 'L':
X:
 [[0. 2. 4. 0. 2. 4.]
 [2. 4. 0. 2. 4. 0.]
 [4. 0. 2. 4. 0. 0.]]
Eliminate last 'L-1' columns of 'X' since they are not real patterns
X after
: [[0. 2. 4. 0.]
 [2. 4. 0. 2.]
 [4. 0. 2. 4.]]
Get the number of repetitions of each pattern by going through columns of 'X':
col j of X: [0. 2. 4.]
col i (j+1 onwards) of X: [2. 4. 0.]
col j of X: [0. 2. 4.]
col i (j+1 onwards) of X: [4. 0. 2.]
col j of X: [0. 2. 4.]
col i (j+1 onwards) of X: [0. 2. 4.]
2 columns are equal, set col i to nan
col j of X: [2. 4. 0.]
col i (j+1 onwards) of X: [4. 0. 2.]
col j of X: [2. 4. 0.]
col i 

(array(1.03972077), 2.0)

In [3]:
conditional_entropy_with_comments(np.array([1,2,3,1,2,3]), 3, 5)

Calculate Shanon entropy for series with embedding dimension 'L': 1.0397207708399179 , number of unique patterns: 2.0
Calculate Shanon entropy for series with embedding dimension 'L-1': 1.0549201679861442 , number of unique patterns: 2.0
Conditional entropy: -0.015199397146226312


(-0.015199397146226312, 2.0)

In [4]:
corrected_conditional_entropy_with_comments(np.array([1,2,3,1,2,3]), 3, 5)


Calculate Ê(1) with 'L=1' which will be used to calculate the corrective term: 1.0986122886681096
CCE will be a vector that will contain several CCE values computed:
We loop for different value of embedding dimensions 'L':

L: 2 

First, compute CE for the current embedding dimension:
CE: [        nan -0.04369212         nan]
uniques: [nan  1. nan]
Second, compute the percentage of patterns which are not repeated
perc_L: 0.2
Ê(1): 1.0986122886681096
CCE is CE + corrective term:
correct_term: [       nan 0.21972246        nan]
CCE: [100.           0.17603034          nan]

L: 3 

First, compute CE for the current embedding dimension:
CE: [        nan -0.04369212 -0.0151994 ]
uniques: [nan  1.  2.]
Second, compute the percentage of patterns which are not repeated
perc_L: 0.5
Ê(1): 1.0986122886681096
CCE is CE + corrective term:
correct_term: [       nan 0.21972246 0.54930614]
CCE: [100.           0.17603034   0.53410675]

Final CCE: [100.           0.17603034   0.53410675]
Get min CCE va

0.17603033705165655

In [5]:
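import random

# Irregular series: the pattern (1, 2, r) repeated three times, where r is
# a fresh random integer in [0, 10] on every repetition.
irregular_array = []
for _ in range(3):
    irregular_array.extend([1, 2, random.randint(0, 10)])
irregular_array = np.array(irregular_array)

# Regular series: the pattern (1, 2, 3) repeated three times.
regular_array = np.tile(np.array([1, 2, 3]), 3)
print("Third element randomized:", irregular_array)
print("More regular array:", regular_array)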

Third element randomized: [1 2 9 1 2 2 1 2 0]
More regular array: [1 2 3 1 2 3 1 2 3]


In [6]:
print("Shannon entropy for more regular array:", shannon_entropy(regular_array, 5, 10)[0])
print("Conditional entropy for more regular array:", conditional_entropy(regular_array, 5, 10)[0])
print("Corrected conditional entropy for more regular array:", corrected_conditional_entropy(regular_array, 5, 10))

print("\n")

print("Shannon entropy:", shannon_entropy(irregular_array, 5, 10)[0])
print("Conditional entropy:", conditional_entropy(irregular_array, 5, 10)[0])
print("Corrected conditional entropy:", corrected_conditional_entropy(irregular_array, 5, 10))

Shannon entropy for more regular array: 1.0549201679861442
Conditional entropy for more regular array: -0.043692120681965374
Corrected conditional entropy for more regular array: -0.01641675862934222


Shannon entropy: 1.6094379124341005
Conditional entropy: -0.18232155679395423
Corrected conditional entropy: 1.0325680971551663
