# 📈 Structure Function

This is an example of how to use the structure function calculation and how to plot the result.

## Basic description

The structure function is calculated from real input data and the corresponding timestamps. The input can be a NumPy array, a pandas DataFrame or a pandas Series, with each case briefly demonstrated below.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.optimize
from matplotlib.ticker import ScalarFormatter

In [None]:
import parmesan
from parmesan.analysis import structure, structure_function

In [None]:
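def fit_powerlaw(x, y, exponent=1):
    """Fit y = a * x**exponent with a fixed exponent and return the fitted curve."""

    def powerlaw(x, a):
        return a * x**exponent

    # only the amplitude a is a free parameter of the fit
    popt, pcov = scipy.optimize.curve_fit(powerlaw, x, y)
    return powerlaw(x, popt[0])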
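
Because the exponent is held fixed, the model is linear in the amplitude `a`, so the least-squares amplitude also has a closed form: the sum of `y * x**p` divided by the sum of `x**(2p)`. The sketch below uses only NumPy to illustrate this. The helper name `powerlaw_amplitude` is hypothetical and is not part of this notebook or of PARMESAN; for well-behaved data it should agree with the amplitude that `curve_fit` finds in `fit_powerlaw`.

```python
import numpy as np


def powerlaw_amplitude(x, y, exponent=1):
    """Closed-form least-squares amplitude a for y ~ a * x**exponent.

    Hypothetical helper for illustration; the exponent is not fitted.
    """
    basis = np.asarray(x, dtype=float) ** exponent
    y = np.asarray(y, dtype=float)
    # minimizing sum((y - a * basis)**2) over a gives this ratio
    return float(np.sum(y * basis) / np.sum(basis * basis))


# noise-free test data with known amplitude 3 and exponent 2
x = np.linspace(0.01, 1.0, 50)
y = 3.0 * x**2
a = powerlaw_amplitude(x, y, exponent=2)
print(a)
```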

The following settings adjust the plot appearance (a matter of personal taste):

In [None]:
plt.rcParams["figure.figsize"] = (10, 6)
plt.rcParams["axes.grid"] = True
plt.rcParams["axes.axisbelow"] = True
plt.rcParams["font.size"] = 15
plt.rcParams["legend.fontsize"] = "medium"
plt.rcParams["font.family"] = "monospace"

## Dummy signal generation

### Step

The first step is to create a "fake" dataset to test the function. In a real-life scenario, one would import the data from a .txt, .csv or similar file.

First, the number of samples is defined:

In [None]:
ns = 7000

Then, let's create a datetime index with ```ns``` evenly spaced samples between two arbitrary datetimes (in this case, spanning a period of 40 seconds in total):

In [None]:
t_arr = pd.date_range(
    start="2021-04-29 10:00:00", periods=ns, end="2021-04-29 10:00:40"
)  # a datetime array using ns number of steps, between two specific arbitrary dates

Now, it is converted to a NumPy array and then to epoch time (seconds since 1970-01-01):

In [None]:
t = np.linspace(0, 15, ns)

t1 = np.array(t_arr)
epoch = t1.astype(np.int64) / 10**9
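
As a cross-check on the conversion above, the sketch below rebuilds the same 40-second time axis with NumPy alone and confirms the total span and the sampling interval. It is an illustration only, not a replacement for `pd.date_range`; the names `offsets` and `dt` are introduced here.

```python
import numpy as np

ns = 7000  # number of samples, as above

# evenly spaced nanosecond offsets covering 40 seconds, endpoints included
start = np.datetime64("2021-04-29T10:00:00", "ns")
offsets = np.linspace(0, 40 * 10**9, ns).astype("int64").astype("timedelta64[ns]")
t_arr = start + offsets

# datetime64[ns] -> integer nanoseconds -> float seconds since the epoch
epoch = t_arr.astype("int64") / 10**9

dt = np.diff(epoch)  # spacing between consecutive samples in seconds
print("total span [s]:", epoch[-1] - epoch[0])
print("mean sampling interval [s]:", dt.mean())
```

Note that with both endpoints included, the sampling interval is 40 s divided by `ns - 1`, not by `ns`.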

### Signal: Sine wave

From the procedure above, we now have the timestamps, both as a datetime index and as an epoch-time array. The timestamps are one of the two main arguments needed by the structure function. The other one is of course the signal; for starters, let's try a simple sine function:

In [None]:
wave = np.sin(2 * np.pi * (epoch - epoch[0]))

One would expect the structure function to oscillate between anti-correlation (value of 2) and correlation (value of 0) with the period of the wave. Let's call the function and look at the plot:

In [None]:
sh_arr, D_arr = structure(wave, t_arr)  # the time shift and structure function

fig = plt.figure()
plt.plot(sh_arr, D_arr, lw=2, color="darkgray")
plt.axvline(0.5, color="red", lw=2, label="Anti-correlated")
plt.axvline(0.75, color="green", lw=2, label="Uncorrelated")
plt.axvline(1, color="navy", lw=2, label="Correlated")
plt.ylabel("Structure Function D", fontsize=20, fontweight="bold")
plt.legend(loc=4);

The structure function's outcome is pretty elegant: considering one full period T (here 1 s), the signal shows full anti-correlation after T/2 (same amplitude but opposite sign, i.e. the red line), passes through an uncorrelated point at 3T/4 (the green line), and returns to full correlation after another T/2, i.e. at T (the navy line). It is exactly the result you would expect when having the structure function's formula in mind.
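
The values quoted above can be checked with a naive estimator. The sketch below assumes that the normed second-order structure function is the mean squared increment divided by twice the signal variance. This normalization is an assumption chosen to reproduce the 0-to-2 range described here; it is not taken from PARMESAN's source. The helper `naive_structure` and the sampling rate `fs` are hypothetical.

```python
import numpy as np


def naive_structure(x, lag):
    """Mean squared increment at an integer lag, divided by twice the variance.

    Hypothetical helper; the normalization is an assumption, see the text.
    """
    x = np.asarray(x, dtype=float)
    increments = x[lag:] - x[:-lag]
    return np.mean(increments**2) / (2.0 * np.var(x))


fs = 200  # samples per second (illustrative choice)
t = np.arange(40 * fs) / fs  # 40 seconds of data
wave = np.sin(2 * np.pi * t)  # period T = 1 s

d_half = naive_structure(wave, fs // 2)  # lag T/2: anti-correlated
d_three_quarter = naive_structure(wave, 3 * fs // 4)  # lag 3T/4: uncorrelated
d_full = naive_structure(wave, fs)  # lag T: correlated

print(d_half, d_three_quarter, d_full)
```

Under this assumption the three lags give values close to 2, 1 and 0 respectively, matching the red, green and navy lines.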

In [None]:
t2 = np.linspace(0.01, 0.10, 100)  # values to create a parabola
f = 30 * (t2) ** 2  # simple parabola example

The same plot on logarithmic axes:

In [None]:
fig = plt.figure()
plt.loglog(sh_arr, D_arr, lw=2, color="black")
plt.loglog(t2, f, lw=3, label="Slope = 2")
plt.grid(True, which="both", ls="-")
plt.legend(loc=4);

The maximum slope of the structure function on a log-log plot is 2. For better illustration, the blue line in the plot is a simple parabola, which has a logarithmic slope of 2 and therefore runs parallel to the left part of our plot.
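
For the sine wave, the slope of 2 follows from a Taylor expansion. Assuming, as a simplification, that the normed structure function of the unit-period sine is D(τ) = 1 − cos(2πτ), small lags give D(τ) ≈ 2π²τ², a pure power law with exponent 2. The sketch below fits the log-log slope numerically; the formula and the variable names are illustrative assumptions, not output of `structure`.

```python
import numpy as np

# lags well below the period of 1 s
tau = np.logspace(-3, -2, 50)

# assumed analytic form of the normed structure function of sin(2*pi*t)
D = 1.0 - np.cos(2.0 * np.pi * tau)

# straight-line fit in log-log space: the first coefficient is the slope
slope, intercept = np.polyfit(np.log10(tau), np.log10(D), 1)
print(f"log-log slope at small lags: {slope:.3f}")
```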

### Signal: Random dataset

Now let's try a dataset that could resemble a real-life measurement period. A random dataset, generated with the Voss-McCartney algorithm for pink noise, is imported:

In [None]:
wave = pd.read_csv(
    "yourdata.csv.xz"
)  # replace the name of the file with your own dataset!
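
For readers without a data file at hand, the sketch below generates approximate pink noise in the spirit of the Voss-McCartney algorithm: several random sources are summed, and each source is refreshed half as often as the previous one, so the slow sources contribute the low-frequency power. This is a minimal illustration with hypothetical names (`voss_mccartney`, `n_sources`), not the code that produced the dataset used in this notebook.

```python
import numpy as np


def voss_mccartney(n_samples, n_sources=16, seed=None):
    """Approximate pink noise as a sum of sources updated at halving rates.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    sources = rng.standard_normal(n_sources)
    out = np.empty(n_samples)
    for i in range(n_samples):
        counter = i + 1
        # number of trailing zero bits of the counter selects the source
        k = (counter & -counter).bit_length() - 1
        if k < n_sources:
            sources[k] = rng.standard_normal()
        out[i] = sources.sum()
    return out


noise = voss_mccartney(7000, seed=0)
print(noise[:5])
```

Such an array could be passed to `structure` together with the timestamps in place of the imported data.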

What the random data looks like:

In [None]:
plt.figure()
plt.plot(wave);

Calling the structure function again:

In [None]:
sh_arr, D_arr = structure(wave.values, t_arr)

In [None]:
fig = plt.figure()
plt.loglog(sh_arr, D_arr, lw=2)
plt.ylabel("Structure Function D", fontsize=20, fontweight="bold")
plt.axvline(0.03, color="brown", lw=2)
plt.plot(
    sh_arr[region := (sh_arr < 0.03)],
    (fit := fit_powerlaw(sh_arr[region], D_arr[region], exponent=2))
    * np.nanmax(D_arr[region] / fit),
    label="power law 2",
)
plt.axvline(0.5, color="black", lw=2)
plt.plot(
    sh_arr[region := (0.03 < sh_arr) & (sh_arr < 0.5)],
    (fit := fit_powerlaw(sh_arr[region], D_arr[region], exponent=2 / 3))
    * np.max(D_arr[region] / fit),
    label="power law 2/3",
)
plt.plot(
    sh_arr[region := (0.5 < sh_arr)],
    (fit := fit_powerlaw(sh_arr[region], D_arr[region], exponent=0))
    * np.max(D_arr[region] / fit),
    label="constant",
)

plt.grid(True, which="both", ls="-")
plt.legend()

Logarithmic axes are used to resemble a possible real-life structure function. In that case, one could split the above plot into three regions: a slope of 2 to the left of the brown line (correlation), a slope of 2/3 between the brown and black lines, and a zeroth slope to the right of the black line, where the variable in question is uncorrelated.

**Attention**: the plot is the result of generated noise, so the slopes are not actually equal to these values; the plot is only <ins>similar</ins> to what one would encounter when calculating structure functions in reality. However, no slope in the plot exceeds 2, which is the mathematical limit, so the structure function here still adheres to it.

If a non-normed structure function is required, passing the argument ```normed=False``` does the job.
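
One way to locate such regions is to look at the local logarithmic slope, d log D / d log τ, instead of judging it by eye. The sketch below applies `np.gradient` to a synthetic power law. The helper `local_loglog_slope` is hypothetical, and real structure functions are noisy enough that some smoothing would likely be needed first.

```python
import numpy as np


def local_loglog_slope(lags, D):
    """Local slope d(log D) / d(log lag); lags and D must be positive.

    Hypothetical helper for illustration only.
    """
    lags = np.asarray(lags, dtype=float)
    D = np.asarray(D, dtype=float)
    # np.gradient accepts the sample coordinates as a second argument
    return np.gradient(np.log(D), np.log(lags))


# synthetic structure function following a 2/3 power law
lags = np.logspace(-2, 0, 40)
D = 0.7 * lags ** (2 / 3)
slopes = local_loglog_slope(lags, D)
print(slopes.min(), slopes.max())
```

The zero lag has to be excluded before taking logarithms.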

In [None]:
sh_arr_2, D_arr_2 = structure(wave.values, t_arr, normed=False)

fig = plt.figure()
plt.loglog(sh_arr, D_arr, label="Normed")
plt.loglog(sh_arr_2, D_arr_2, label="Non-normed")
plt.ylabel("Structure Function D", fontsize=20, fontweight="bold")
plt.legend(loc=4)
plt.axvline(0.5, color="black", lw=2)
plt.axvline(0.03, color="brown", lw=2)
plt.grid(True, which="both", ls="-")

The default order of the structure function in Parmesan is 2. The user can calculate a different structure function order by adding the argument ```order=n```, where n is the desired order:

In [None]:
sh_arr_3, D_arr_3 = structure(wave.values, t_arr, order=3, normed=False)
sh_arr_4, D_arr_4 = structure(wave.values, t_arr, order=4, normed=False)
sh_arr_5, D_arr_5 = structure(wave.values, t_arr, order=5, normed=False)

fig = plt.figure()
plt.loglog(sh_arr_2, D_arr_2, label="n = 2")
plt.loglog(sh_arr_3, D_arr_3, label="n = 3")
plt.loglog(sh_arr_4, D_arr_4, label="n = 4")
plt.loglog(sh_arr_5, D_arr_5, label="n = 5")
plt.ylabel("Structure Function D", fontsize=20, fontweight="bold")
plt.legend(loc=4)
plt.axvline(0.5, color="black", lw=2)
plt.axvline(0.03, color="brown", lw=2)
plt.grid(True, which="both", ls="-")
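
For orientation, the order enters the definition as the exponent applied to the increments: the structure function of order n is the mean of |x(t+τ) − x(t)|^n. The sketch below implements this non-normed form naively. Whether PARMESAN takes the absolute value of the increments is an assumption here, and `naive_structure_order` is a hypothetical helper.

```python
import numpy as np


def naive_structure_order(x, lag, order=2):
    """Mean of |x(t + lag) - x(t)|**order for an integer lag.

    Hypothetical helper; the absolute value is an assumption, see the text.
    """
    x = np.asarray(x, dtype=float)
    increments = x[lag:] - x[:-lag]
    return np.mean(np.abs(increments) ** order)


# a linear ramp: every increment at lag 3 equals 3
ramp = np.arange(100.0)
values = {n: naive_structure_order(ramp, lag=3, order=n) for n in (2, 3, 4, 5)}
print(values)
```

For the ramp, the result at lag 3 is simply 3 raised to the chosen order, which makes the role of `order` easy to verify.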

## Input case 2: DataFrame

In the cases above, the structure function is called with two arrays as input (note the ```wave.values``` lines). The user can also call the function directly with a DataFrame that holds the timestamps in its index and the signal in a column.

Create a DataFrame that has the selected timestamps as index and one column with the signal values. In this case, the column is named "wave":

In [None]:
w_df = pd.DataFrame(wave.values, columns=["wave"], index=t_arr)

Now, the structure function can be called with ```w_df``` as its only argument:

In [None]:
D_df = structure(w_df)

The plot:

In [None]:
fig = plt.figure()
plt.loglog(D_df.index, D_df["wave"])
plt.ylabel("Structure Function D", fontsize=20, fontweight="bold")
plt.axvline(0.5, color="black", lw=2)
plt.axvline(0.03, color="brown", lw=2)
plt.grid(True, which="both", ls="-")
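
The DataFrame layout used in this section carries both inputs at once: the index holds the timestamps and the column holds the signal. The sketch below, on a small synthetic frame, shows how the two arrays of the first input case can be recovered from it. It only illustrates the data layout and makes no claim about how `structure` unpacks the frame internally.

```python
import numpy as np
import pandas as pd

# small synthetic frame: datetime index, one signal column
index = pd.date_range(
    start="2021-04-29 10:00:00", end="2021-04-29 10:00:40", periods=5
)
df = pd.DataFrame({"wave": np.arange(5.0)}, index=index)

# elapsed time in seconds since the first sample
seconds = (df.index - df.index[0]).total_seconds().to_numpy()
# signal values as a plain NumPy array
signal = df["wave"].to_numpy()

print(seconds)
print(signal)
```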

## Input case 3: Series

For the sake of the example, we convert the DataFrame from before to a Series by using the ```df.squeeze()``` method, which turns a single-column DataFrame into a Series. Then, the structure function is called directly on the Series through its ```parmesan``` accessor:

In [None]:
D_s = w_df.squeeze().parmesan.structure()

The plot:

In [None]:
fig = plt.figure()
plt.loglog(D_s.index, D_s)
plt.ylabel("Structure Function D", fontsize=20, fontweight="bold")
plt.axvline(0.5, color="black", lw=2)
plt.axvline(0.03, color="brown", lw=2)
plt.grid(True, which="both", ls="-")
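
The behavior of `squeeze()` relied on above can be verified on a tiny frame: a single-column DataFrame becomes a Series that keeps the index and takes the column name, while a DataFrame with a single cell collapses all the way to a scalar. Only standard pandas calls are used; the variable names are illustrative.

```python
import pandas as pd

# one column, several rows: squeeze() returns a Series
one_column = pd.DataFrame({"wave": [0.1, 0.2, 0.3]})
series = one_column.squeeze()

# one column, one row: squeeze() returns a scalar
one_cell = pd.DataFrame({"wave": [0.1]})
scalar = one_cell.squeeze()

print(type(series).__name__, series.name)
print(type(scalar).__name__, scalar)
```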