In [None]:
%matplotlib inline
%config IPCompleter.greedy=True

import copy
import os
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import math
import scipy.stats as st
import auxiliary_functions as aux
import dfa_functions as dfaf
from scipy.optimize import curve_fit


pd.options.display.max_rows = 10

# Load the data (the series and their segmentation), discarding every segment
# with an indeterminate stage or whose max frequency is below 85%.
from load_data import load_data, redo_classification
data, seg_res = load_data()
# Reclassify the segments using the 4-stage system.
seg_res = redo_classification(seg_res, data)


## Detrended Fluctuation Analysis
***

### Description

For any nonstationary time series, we can ask what causes its fluctuations: do they arise from internal nonlinear dynamics, or are they driven by external conditions? If we separate the fluctuations by cause (internal or external), we can expect the internal ones to show particular correlations that reflect the dynamics of the system. HRV time series, in particular, show long-term power-law correlations (correlations that can be described by $y = a\cdot x^k$, where $a$ and $k$ are constants) in healthy individuals, so changes in the fractal property (self-similarity under changes of scale) of those correlations can serve as a way of detecting disturbances.

Detrended Fluctuation Analysis (DFA) can be used to detect those long-term correlations, distinguishing them from external perturbations, which are treated as trends.

Given an HRV time series of size $N$, we first integrate it: $y(k) = \sum_{i=1}^k [B(i) - B_m]$, where $B(i)$ is the $i$-th interval between heartbeats, $B_m$ is the average interval, and $k$ ranges from $1$ to $N$. The resulting series is divided into boxes of equal size $n$ (the scale), and in each box a polynomial curve (the local trend) is fitted by least squares. We denote the $y$ coordinate of the fitted polynomial by $y_n(k)$. The series $y(k)$ is then detrended by subtracting $y_n(k)$ in each box. The value $F(n)$ given by $$F(n) = \sqrt{\dfrac{1}{N}\cdot\sum_{k=1}^N [y(k) - y_n(k)]^2}$$ is the root-mean-square fluctuation of the detrended series as a function of the scale $n$.

To find the relation between $F(n)$ and $n$, $F(n)$ is computed for a range of $n$ values. A linear relation on a logarithmic scale ($\log_{10} F(n)$ versus $\log_{10} n$) indicates a power law, $F(n) \propto n^\alpha$, characterized by the scaling exponent $\alpha$, which is interpreted differently depending on its range of values. For uncorrelated data, the scaling exponent is $\alpha = 0.5$. For short-range correlated data, $\alpha > 0.5$ at small scales. Power-law behavior with $\alpha > 0.5$ at large scales indicates long-range correlations in the data.
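The steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation in `dfa_functions`: the names `dfa_fluctuation` and `dfa_alpha`, the default polynomial order, and the choice of scales are all assumptions made for this example. It also assumes that samples left over when $N$ is not a multiple of $n$ are simply dropped, so the mean is taken over the samples that fall inside complete boxes.

```python
import numpy as np


def dfa_fluctuation(intervals, n, order=1):
    """RMS fluctuation F(n) of the detrended, integrated series (hypothetical helper)."""
    x = np.asarray(intervals, dtype=float)
    y = np.cumsum(x - x.mean())            # integrated series y(k)
    n_boxes = len(y) // n
    # Assumption: trailing samples that do not fill a whole box are discarded.
    boxes = y[: n_boxes * n].reshape(n_boxes, n).T   # shape (n, n_boxes)
    k = np.arange(n)
    # Least-squares polynomial fit in every box at once (one column per box).
    coeffs = np.polyfit(k, boxes, order)             # shape (order + 1, n_boxes)
    trend = np.vander(k, order + 1) @ coeffs         # local trend y_n(k)
    return np.sqrt(np.mean((boxes - trend) ** 2))


def dfa_alpha(intervals, scales, order=1):
    """Slope of log10 F(n) against log10 n, plus the F(n) values (hypothetical helper)."""
    f = np.array([dfa_fluctuation(intervals, n, order) for n in scales])
    alpha, _ = np.polyfit(np.log10(scales), np.log10(f), 1)
    return alpha, f


# Synthetic uncorrelated data standing in for an HRV series (not real data).
rng = np.random.default_rng(0)
white = rng.normal(size=10_000)
scales = np.unique(np.logspace(np.log10(16), np.log10(512), 12).astype(int))
alpha_white, f_white = dfa_alpha(white, scales)
print(f"alpha (white noise): {alpha_white:.2f}")
```

For the uncorrelated synthetic series the estimated exponent should land near $\alpha = 0.5$, in line with the interpretation given above. Fitting all boxes in a single `np.polyfit` call keeps the sketch short; a per-box loop would be equivalent.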