# Array measures

atoti is designed to handle array data efficiently.

## Loading arrays from CSV

atoti can load arrays from CSV files.
Pass the separator used between array elements to the `read_csv` method through the `array_sep` argument; it must differ from the separator used between the CSV columns.
All the arrays in a column must have the same length.
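
To make the two-separator rule concrete, here is a minimal sketch of what such a file could look like. It uses only Python's standard `csv` module rather than atoti, and the column names and values are hypothetical: the actual contents of `data/arrays.csv` are not shown in this notebook.

```python
import csv
import io

# Hypothetical file contents: columns are separated by ",",
# array elements by ";" (the value that would be passed as array_sep).
raw = """Trade ID,Continent,PnL array
T1,Europe,1.0;-2.5;3.0
T2,Asia,0.5;4.0;-1.0
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Split each array cell on the array separator and convert to floats.
arrays = {
    row["Trade ID"]: [float(x) for x in row["PnL array"].split(";")]
    for row in rows
}

# All the arrays in a column must have the same length.
lengths = {len(a) for a in arrays.values()}
assert len(lengths) == 1

print(arrays)
```

Because the column separator and the array separator differ, each array stays inside a single CSV cell and can be split afterwards.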

In [None]:
import atoti as tt

session = tt.create_session()

In [None]:
store = session.read_csv(
    "data/arrays.csv", keys=["Trade ID"], store_name="Store with arrays", array_sep=";"
)
store.head()

In [None]:
cube = session.create_cube(store, "Cube")

## Default array aggregations

As with scalar measures, atoti provides default `SUM` and `MEAN` aggregations for array measures.
They are applied element by element across the arrays.
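
As an illustration of what "element by element" means, the following pure-Python sketch aggregates three hypothetical arrays position by position. It only mirrors the semantics described above and is not how atoti computes the aggregates internally.

```python
# Hypothetical PnL arrays for three trades (same length, as required).
pnl_arrays = [
    [1.0, -2.0, 3.0],
    [4.0, 5.0, -6.0],
    [1.0, 0.0, 3.0],
]

# zip(*...) groups the elements that share the same position.
elementwise_sum = [sum(column) for column in zip(*pnl_arrays)]
elementwise_mean = [sum(column) / len(column) for column in zip(*pnl_arrays)]

print(elementwise_sum)   # [6.0, 3.0, 0.0]
print(elementwise_mean)  # [2.0, 1.0, 0.0]
```

The result of each aggregation is itself an array with the same length as the inputs.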

In [None]:
lvl = cube.levels
m = cube.measures

In [None]:
cube.query(m["PnL array.SUM"], m["PnL array.MEAN"], levels=lvl["Continent"])

## Additional array functions

### Arithmetic operations

In [None]:
m["PnL +10 array"] = m["PnL array.SUM"] + 10.0
m["PnL -10 array"] = m["PnL array.SUM"] - 10.0
m["PnL x10 array"] = m["PnL array.SUM"] * 10.0
m["PnL /10 array"] = m["PnL array.SUM"] / 10.0
cube.query(
    m["PnL +10 array"], m["PnL -10 array"], m["PnL x10 array"], m["PnL /10 array"]
)

### Sum, mean, min or max of all the array elements

In [None]:
m["PnL sum"] = tt.array.sum(m["PnL array.SUM"])
m["PnL mean"] = tt.array.mean(m["PnL array.SUM"])
m["min PnL"] = tt.array.min(m["PnL array.SUM"])
m["max PnL"] = tt.array.max(m["PnL array.SUM"])
cube.query(
    m["PnL sum"], m["PnL mean"], m["min PnL"], m["max PnL"], levels=lvl["Continent"]
)

### Length

In [None]:
m["PnL array length"] = tt.array.len(m["PnL array.SUM"])
cube.query(m["PnL array length"])

### Variance and Standard Deviation

In [None]:
m["PnL array variance"] = tt.array.var(m["PnL array.SUM"])
m["PnL array standard deviation"] = tt.array.std(m["PnL array.SUM"])
cube.query(
    m["PnL array variance"], m["PnL array standard deviation"], levels=lvl["Continent"]
)

### Sort

In [None]:
m["Sorted PnL array"] = tt.array.sort(m["PnL array.SUM"])
cube.query(m["Sorted PnL array"], levels=lvl["Continent"])

### Quantile

In [None]:
m["95 quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="simple")
m["95 exc quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="exc")
m["95 inc quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="inc")
m["95 centered quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="centered")
cube.query(
    m["95 quantile"],
    m["95 exc quantile"],
    m["95 inc quantile"],
    m["95 centered quantile"],
    levels=[lvl["Continent"], lvl["Country"]],
)

In [None]:
m["95 linear quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="linear"
)
m["95 lower quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="lower"
)
m["95 higher quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="higher"
)
m["95 nearest quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="nearest"
)
m["95 midpoint quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="midpoint"
)
cube.query(
    m["95 linear quantile"],
    m["95 lower quantile"],
    m["95 higher quantile"],
    m["95 nearest quantile"],
    m["95 midpoint quantile"],
)
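
The interpolation options only matter when the quantile falls between two elements of the sorted array. The pure-Python sketch below illustrates the idea. It assumes that the `inc` mode places the quantile at the 0-based position `(n - 1) * q` in the sorted array, as Excel's `PERCENTILE.INC` does; atoti's exact definition may differ, so check the API reference. The helper `quantile_inc` is hypothetical and not part of atoti.

```python
import math


def quantile_inc(values, q, interpolation="linear"):
    """Illustrative quantile; assumes a 0-based position of (n - 1) * q."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    i, j = math.floor(pos), math.ceil(pos)
    lower, higher = xs[i], xs[j]
    if interpolation == "linear":
        return lower + (higher - lower) * (pos - i)
    if interpolation == "lower":
        return lower
    if interpolation == "higher":
        return higher
    if interpolation == "nearest":
        # Ties go to the higher element in this sketch.
        return lower if pos - i < j - pos else higher
    if interpolation == "midpoint":
        return (lower + higher) / 2
    raise ValueError(f"unknown interpolation: {interpolation}")


data = [5.0, 1.0, 4.0, 2.0, 3.0]
for name in ("linear", "lower", "higher", "nearest", "midpoint"):
    print(name, quantile_inc(data, 0.95, name))
```

With five elements and `q = 0.95`, the position is 3.8, so the quantile lies between the fourth and fifth sorted values and each interpolation picks a different point in that interval.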

### n greatest / n lowest

Returns an array containing the n greatest or n lowest values of another array.

In [None]:
m["Top 3 PnL array"] = tt.array.n_greatest(m["PnL array.SUM"], 3)
m["Bottom 2 PnL array"] = tt.array.n_lowest(m["PnL array.SUM"], 2)
cube.query(m["Top 3 PnL array"], m["Bottom 2 PnL array"])

### nth greatest value / nth lowest value

Returns the nth greatest or nth lowest value of an array.

In [None]:
m["3rd greatest PnL"] = tt.array.nth_greatest(m["PnL array.SUM"], 3)
m["2nd lowest PnL"] = tt.array.nth_lowest(m["PnL array.SUM"], 2)
cube.query(m["3rd greatest PnL"], m["2nd lowest PnL"])
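
The difference between the "n greatest/lowest" functions (which return an array) and their "nth" counterparts (which return a single value) can be sketched with Python's standard `heapq` module. This is only an illustration of the semantics on sample values: the ordering of the elements in the arrays returned by atoti is not specified in this notebook, and the 1-based meaning of `n` is inferred from the measure names above.

```python
import heapq

pnl = [3.2, 1.0, 8.0, 9.0, 4.5, 7.0, 6.0, 18.0]

top_3 = heapq.nlargest(3, pnl)      # the three greatest values
bottom_2 = heapq.nsmallest(2, pnl)  # the two lowest values

# n is 1-based here: the 3rd greatest value is the last of the top 3.
third_greatest = heapq.nlargest(3, pnl)[-1]
second_lowest = heapq.nsmallest(2, pnl)[-1]

print(top_3, bottom_2)
print(third_greatest, second_lowest)
```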

### Element at index

Extract the element at a given index:

In [None]:
m["First element"] = m["PnL array.SUM"][0]
cube.query(m["First element"], m["PnL array.SUM"])

With the `create_static_parameter_hierarchy` function, it is possible to create a hierarchy corresponding to the indices of the array.
This hierarchy can then be used to "slice" this array and create a measure which depends on the selected index.

In [None]:
cube.create_static_parameter_hierarchy("Index", list(range(0, 10)))
m["PnL at index"] = m["PnL array.SUM"][lvl["Index"]]
cube.query(m["PnL at index"], levels=lvl["Index"])

You can also build non-integer hierarchies and map each member to its index in the hierarchy using the `index_measure` argument:

In [None]:
from datetime import date, timedelta

cube.create_static_parameter_hierarchy(
    "Dates",
    [date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
    index_measure="Date index",
)
m["PnL at date"] = m["PnL array.SUM"][m["Date index"]]
cube.query(m["Date index"], m["PnL at date"], levels=lvl["Dates"])

If the indices need to follow an arbitrary order or range, you can also provide them manually as a list.

In [None]:
cube.create_static_parameter_hierarchy(
    "Custom dates",
    [date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
    indices=[9, 8, 7, 6, 5, 0, 1, 2, 3, 4],
    index_measure="Custom date index",
)
m["PnL at custom date"] = m["PnL array.SUM"][m["Custom date index"]]
cube.query(m["Custom date index"], m["PnL at custom date"], levels=lvl["Custom dates"])
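
The mapping built by the `indices` argument can be pictured as a lookup table from hierarchy members to array positions. The sketch below reproduces that idea in plain Python. It assumes that members and indices are paired by position in their respective lists, and the 10-element array is hypothetical.

```python
from datetime import date, timedelta

members = [date(2020, 1, 1) + timedelta(days=x) for x in range(10)]
indices = [9, 8, 7, 6, 5, 0, 1, 2, 3, 4]

# Each hierarchy member is paired with the array index at the same position.
index_of = dict(zip(members, indices))

# Hypothetical aggregated array of 10 elements.
pnl_array = [float(10 * i) for i in range(10)]

# Value "at" each member: the array element at the member's mapped index.
pnl_at = {member: pnl_array[index_of[member]] for member in members}

print(index_of[date(2020, 1, 1)])  # 9
print(pnl_at[date(2020, 1, 1)])    # 90.0
```

Under this assumption, the first date reads the last element of the array, and the sixth date (2020-01-06) reads the first one.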

### Array slices

Extract a slice of the array:

In [None]:
m["First 2 elements"] = m["PnL array.SUM"][0:2]
cube.query(m["First 2 elements"], m["PnL array.SUM"])

## Load DataFrame with lists

atoti can load a pandas DataFrame whose columns contain NumPy arrays or Python lists.

In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Index": [0, 1, 2],
        "NumPy array": [
            np.array([3.2, 1.0, 8, 9, 4.5, 7, 6, 18]),
            np.array([4.2, 4.0, 4, 9, 4.5, 8, 7, 8]),
            np.array([12, 1.0, 8, 9, 4.5, 7, 6, 18]),
        ],
        "Python list": [
            [3.2, 1.0, 8, 9, 4.5, 7, 6, 18],
            [4.2, 4.0, 4, 9, 4.5, 8, 7, 8],
            [12, 1.0, 8, 9, 4.5, 7, 6, 18],
        ],
    }
)
df

In [None]:
pd_store = session.read_pandas(df, "pandas")
pd_store

In [None]:
pd_store.head()