# 1. Compositional Data
Compiled by [Morgan Williams](mailto:morgan.williams@csiro.au) for C3DIS 2018 

Geochemical data is compositional in nature, meaning that values are relative and subject to closure (i.e. the components sum to a constant, such as 100%). This leads to spurious correlation (e.g. for a two-component composition $X = C(x_1, x_2)$, $x_2 = 1-x_1$ by definition), and restricts each component to the interval $x_i \in [0,1]$.
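
The effect of closure on correlation can be illustrated with a minimal sketch using only NumPy. The synthetic "raw abundances" below are hypothetical and drawn independently, so any correlation that appears after closure comes from the constant-sum constraint alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three independent, strictly positive variables (hypothetical data).
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(10_000, 3))

# Closure: divide each row by its total so the parts sum to 1.
closed = raw / raw.sum(axis=1, keepdims=True)

# Correlation matrices with variables in columns.
raw_corr = np.corrcoef(raw, rowvar=False)
closed_corr = np.corrcoef(closed, rowvar=False)

print(np.round(raw_corr, 2))     # off-diagonal entries close to zero
print(np.round(closed_corr, 2))  # off-diagonal entries clearly negative
```

Because the closed parts must sum to one, an increase in one part forces a decrease in the others, which is why the closed data show negative correlations that the raw data do not.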

Compositional random variables are commonly modelled as log-normally distributed. Log-transformations of relative compositional components (log-ratios) allow the use of standard statistical techniques, with values previously constrained to $[0,1]$ now spread over $(-\infty,\infty) \equiv \mathbb{R}$.
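
For the two-component case, the change of range can be sketched directly in NumPy. This is an illustration only and does not use the `geochem` transforms; the example values are arbitrary.

```python
import numpy as np

# A two-part composition: x2 is fixed by closure once x1 is known.
x1 = np.array([0.001, 0.1, 0.5, 0.9, 0.999])
x2 = 1.0 - x1

# Log-ratio of the two parts (the logit of x1). Values near 0 or 1
# map to large negative or positive numbers; x1 = 0.5 maps to 0.
y = np.log(x1 / x2)

# Inverse: recover x1 from the log-ratio.
x1_back = 1.0 / (1.0 + np.exp(-y))

print(y)
print(np.allclose(x1, x1_back))
```

The log-ratio is undefined when either part is exactly 0, so zeros in real data need separate handling before transformation.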

In [14]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from geochem import *

In [15]:
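# Draw five random values; each lies in (0, 1) unless a draw falls
# more than three standard deviations below the mean.
xs = 1/(np.random.randn(5)+4)
# Pair each value with its complement so the two parts sum to 1.
X = np.array([xs, 1-xs])
# Apply the closure operation.
X = close(X)

# Each transform should round-trip: inverting the transformed data
# must recover the original composition.
for t in [LinearTransform, ALRTransform, CLRTransform, ILRTransform]:
    T = t()
    Y = T.transform(X)
    new_X = T.inverse_transform(Y)
    assert np.allclose(X, new_X)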
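
The round-trip check above treats the transforms as black boxes. As a rough guide to what two of them compute, here is a minimal NumPy sketch of the centred log-ratio (CLR) and additive log-ratio (ALR) with their inverses. These functions are stand-ins written for illustration, with compositions assumed to be in rows; they are not the `geochem` implementations, and the names `clr`, `inverse_clr`, `alr`, `inverse_alr` and the `ref` parameter are hypothetical.

```python
import numpy as np


def close(X):
    """Rescale each row to sum to 1 (stand-in for a closure operation)."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def clr(X):
    """Centred log-ratio: log of each part over the row's geometric mean."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)


def inverse_clr(Y):
    """Exponentiate and re-close to recover the composition."""
    return close(np.exp(Y))


def alr(X, ref=-1):
    """Additive log-ratio: log of each part over a reference part.

    The reference column (all zeros after the ratio) is dropped.
    """
    logX = np.log(X)
    Y = logX - logX[:, [ref]]
    return np.delete(Y, ref, axis=1)


def inverse_alr(Y, ref=-1):
    """Re-insert the reference column as zeros, exponentiate, re-close."""
    n_parts = Y.shape[1] + 1
    idx = ref % n_parts
    full = np.insert(Y, idx, 0.0, axis=1)
    return close(np.exp(full))


rng = np.random.default_rng(1)
X_demo = close(rng.uniform(0.1, 1.0, size=(5, 4)))

assert np.allclose(inverse_clr(clr(X_demo)), X_demo)
assert np.allclose(inverse_alr(alr(X_demo)), X_demo)
```

CLR keeps the number of columns but its rows sum to zero, while ALR drops one column and depends on the choice of reference part; both require strictly positive inputs.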

## References