# Statistics in Lie groups

> **_Tip:_** Launch live version of this tutorial: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/morphomatics/morphomatics.github.io/HEAD?filepath=docs%2Ftutorials%2Ftutorial_biinvariant_statistics.ipynb)

## Bi-invariant similarity measures
Given two distributions of samples, it is often necessary to quantify the difference between them.
Two popular indices for data from Euclidean spaces are the Hotelling $T^2$ statistic and the Bhattacharyya
distance. While the former quantifies differences between the means only, the latter also takes differing covariance
structures into account.
Remarkably, both can be generalized to data from Lie groups in a way that respects
the group's fundamental properties: the _bi-invariant_ Hotelling $T^2$ statistic and Bhattacharyya distance are
invariant under both left and right translations of the data. In shape analysis,
one consequence is that the results do not depend on
the choice of reference.
For the definitions, see **[Bi-invariant Two-Sample Tests in Lie Groups for Shape Analysis.](https://arxiv.org/abs/2008.12195)**
The paper also shows how a bi-invariant two-sample test for differences in mean shape can
be constructed from these notions.
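
For orientation, the Euclidean versions of the two indices can be sketched in a few lines of NumPy. This is only an illustration of the classical formulas (two-sample Hotelling $T^2$ with pooled covariance, Bhattacharyya distance between Gaussians fitted to the samples); the function names are hypothetical, normalization conventions vary between references, and this is not the _Morphomatics_ implementation.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 statistic; samples are stored row-wise."""
    n, m = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # pooled covariance built from the two unbiased sample covariances
    pooled = ((n - 1) * np.cov(x, rowvar=False)
              + (m - 1) * np.cov(y, rowvar=False)) / (n + m - 2)
    return n * m / (n + m) * diff @ np.linalg.solve(pooled, diff)


def bhattacharyya(x, y):
    """Bhattacharyya distance between Gaussians fitted to x and y."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    cx, cy = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    c = (cx + cy) / 2
    # term driven by the difference of the means
    mean_term = diff @ np.linalg.solve(c, diff) / 8
    # term driven by the difference of the covariances (log-determinants)
    logdet = lambda a: np.linalg.slogdet(a)[1]
    cov_term = 0.5 * (logdet(c) - 0.5 * (logdet(cx) + logdet(cy)))
    return mean_term + cov_term


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = 3.0 * rng.normal(size=(200, 3))  # larger spread than x
x -= x.mean(axis=0)                  # force identical sample means
y -= y.mean(axis=0)

t2_xy = hotelling_t2(x, y)   # only sees the means
d_xy = bhattacharyya(x, y)   # also sees the covariances
print(t2_xy, d_xy)
```

With equal sample means, `t2_xy` vanishes up to round-off, while `d_xy` is strictly positive because the covariances differ.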

## Example
In the following, we show how to use bi-invariant similarity measures in _Morphomatics_. For this, we choose the Lie group
$\text{GL}^+(3)$ of 3-by-3 matrices with positive determinant. For bi-invariant statistics in $\text{GL}^+(3)$
we need a non-metric (i.e., non-Riemannian) structure. With this structure, a geodesic $\gamma$ through a matrix $A \in \text{GL}^+(3)$
in the direction $X \in \mathbb{R}^{3 \times 3}$ (so that $\gamma'(0) = AX$) is given by the matrix exponential:

$$
\gamma(t) = Ae^{tX}.
$$

Since the matrix exponential is a fundamental _group_ property of $\text{GL}^+(3)$, this gives the desired
connection between group and geometric properties. (More generally, the same construction works in any finite-dimensional Lie group.)
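
The geodesic formula can be tried out directly with SciPy's matrix exponential. The following is a minimal sketch that is independent of _Morphomatics_; the helper `geodesic` and the matrices `A` and `X` are made up for illustration. It also checks that the curve never leaves $\text{GL}^+(3)$, since $\det\gamma(t) = \det(A)\, e^{t\,\mathrm{tr}(X)} > 0$.

```python
import numpy as np
from scipy.linalg import expm


def geodesic(A, X, t):
    """Evaluate gamma(t) = A exp(tX)."""
    return A @ expm(t * X)


A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])   # det(A) = 1 > 0
X = np.array([[0.1, -0.3, 0.0],
              [0.3, 0.1, 0.2],
              [0.0, 0.0, -0.4]])  # trace(X) = -0.2

# sample the geodesic at five parameter values in [0, 1]
points = [geodesic(A, X, t) for t in np.linspace(0.0, 1.0, 5)]
dets = [np.linalg.det(P) for P in points]

# the determinant stays positive along the whole curve
print(all(d > 0 for d in dets))
```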

Thus, we can create two sample sets $E, F \subset \text{GL}^+(3)$, both with the identity matrix $I \in \text{GL}^+(3)$ as mean, as follows
(`G.group.exp` is the matrix exponential).

In [1]:
import numpy as np

from morphomatics.stats import BiinvariantStatistics
from morphomatics.manifold import GLp3

# initialize Lie group of 3-by-3 matrices with positive determinant
G = GLp3()
# initialize module for bi-invariant statistics
bistat = BiinvariantStatistics(G)
# identity matrix
I = G.group.identity

# sample 2 data sets around I
E = []
F = []
for i in range(3):
    for j in range(3):
        # create tangent vector
        e = np.zeros((1, 3, 3))
        e[0, i, j] = 1
        # shoot geodesic along tangent vector
        E.append(G.group.exp(e))
        E.append(G.group.exp(-e))
        F.append(G.group.exp(.2 * e))
        F.append(G.group.exp(-.2 * e))

We now compute both indices for these data sets, comparing $E$ with itself and with $F$.

In [2]:
T_EE = bistat.hotellingT2(E, E)
T_EF = bistat.hotellingT2(E, F)
D_EE = bistat.bhattacharyya(E, E)
D_EF = bistat.bhattacharyya(E, F)

print(f'''
Comparing E to E
Hotelling T² stat.: {T_EE}
Bhattacharyya dist.: {D_EE}

Comparing E to F
Hotelling T² stat.: {T_EF}
Bhattacharyya dist.: {D_EF}
''')


Comparing E to E
Hotelling T² stat.: 0.0
Bhattacharyya dist.: 0.0

Comparing E to F
Hotelling T² stat.: 0.0
Bhattacharyya dist.: 4.2998015026234615



As expected, the difference from a data set to itself is zero for both indices.
Furthermore, since $E$ and $F$ both have mean $I$, their bi-invariant Hotelling $T^2$ statistic is still (numerically) zero.
On the other hand, the Bhattacharyya distance between them is positive, since their covariance structures differ.
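
That both sample sets have mean $I$ can be checked independently of _Morphomatics_. The sketch below assumes the usual characterization of the bi-invariant (group) mean $m$ of samples $g_1, \dots, g_n$ from the literature, namely $\sum_k \log(m^{-1} g_k) = 0$, and rebuilds $E$ and $F$ with SciPy; the helper `mean_residual` is hypothetical.

```python
import numpy as np
from scipy.linalg import expm, logm

# rebuild the two sample sets with plain 3-by-3 arrays
E, F = [], []
for i in range(3):
    for j in range(3):
        e = np.zeros((3, 3))
        e[i, j] = 1.0
        E += [expm(e), expm(-e)]
        F += [expm(0.2 * e), expm(-0.2 * e)]


def mean_residual(samples, m):
    """Frobenius norm of sum_k log(m^{-1} g_k) for a candidate mean m."""
    total = sum(logm(np.linalg.solve(m, g)) for g in samples)
    return np.linalg.norm(total)


I = np.eye(3)
res_E = mean_residual(E, I)
res_F = mean_residual(F, I)
print(res_E, res_F)
```

Both residuals vanish up to round-off because the samples come in pairs $e^{X}, e^{-X}$ whose logarithms cancel.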

We can also test the invariance of both indices under translations. For $\text{GL}^+(3)$, left and right
translations (`lefttrans` and `righttrans` in _Morphomatics_) by an element $B \in \text{GL}^+(3)$ are simply multiplications with $B$ from the left and right,
respectively.

In [3]:
# random element
B = G.rand()
BE = []
BF = []
# left translate all elements of E and F by B
for e in E:
    BE.append(G.group.lefttrans(e, B))
for f in F:
    BF.append(G.group.lefttrans(f, B))
EB = []
FB = []
# right translate all elements of E and F by B
for e in E:
    EB.append(G.group.righttrans(e, B))
for f in F:
    FB.append(G.group.righttrans(f, B))

Now we can compute both indices on the left- and right-translated data sets and compare the results with the original ones.

In [4]:
T_BEBF = bistat.hotellingT2(BE, BF)
T_EBFB = bistat.hotellingT2(EB, FB)
D_BEBF = bistat.bhattacharyya(BE, BF)
D_EBFB = bistat.bhattacharyya(EB, FB)

print(f'''
Difference under left-translation
Hotelling T² stat.: {np.abs(T_BEBF - T_EF)}
Bhattacharyya dist.: {np.abs(D_BEBF - D_EF)}

Difference under right-translation
Hotelling T² stat.: {np.abs(T_EBFB - T_EF)}
Bhattacharyya dist.: {np.abs(D_EBFB - D_EF)}
''')


Difference under left-translation
Hotelling T² stat.: 1.2806693387888818e-29
Bhattacharyya dist.: 2.6645352591003757e-15

Difference under right-translation
Hotelling T² stat.: 9.376341172313876e-29
Bhattacharyya dist.: 3.552713678800501e-15



Because of the bi-invariance property, both indices give the same values for the translated data, up to floating-point round-off.
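
One way to see why both kinds of translation are compatible with the structure used here is that they map geodesics to geodesics: $B\,A e^{tX} = (BA)\, e^{tX}$ and $A e^{tX} B = (AB)\, e^{t B^{-1} X B}$. The following sketch verifies these two identities numerically with NumPy and SciPy only; all matrices are randomly generated for illustration.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
# matrix exponentials of real matrices always have positive determinant
A = expm(0.3 * rng.normal(size=(3, 3)))
B = expm(0.3 * rng.normal(size=(3, 3)))
X = rng.normal(size=(3, 3))
t = 0.7

gamma_t = A @ expm(t * X)  # point on the geodesic through A in direction X

# left translation: geodesic through BA in the same direction X
left = B @ gamma_t
left_geodesic = (B @ A) @ expm(t * X)

# right translation: geodesic through AB in the conjugated direction B^{-1} X B
X_conj = np.linalg.solve(B, X @ B)
right = gamma_t @ B
right_geodesic = (A @ B) @ expm(t * X_conj)

print(np.allclose(left, left_geodesic), np.allclose(right, right_geodesic))
```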