# Flag estimator

Lead author: Dimbihery Rabenoro.

In this example, we show how to test whether a given orthogonal matrix is a matrix of eigenvectors of the population covariance matrix. If the test passes, we compute the associated flag.

In [1]:
from scipy.stats import chi2
from matplotlib import pyplot as plt

import geomstats.backend as gs

from geomstats.geometry.special_orthogonal import SpecialOrthogonal
from geomstats.geometry.matrices import Matrices

from geomstats.learning.flag_estimator import (
    FlagEstimator,
    test_H0,
)

INFO: Using numpy backend


## Dataset generation

We start by generating controlled data.

First, we define the multiplicities for the population covariance matrix (notice that in the general case, this matrix is unknown, but for demonstrative purposes we generate our data from a known distribution).

Then, we randomly draw the distinct eigenvalues and sort them in decreasing order.

In [2]:
multiplicities = [2, 3, 1]
d = sum(multiplicities)
n_eig = len(multiplicities)

unique_eig = gs.flip(gs.sort(gs.random.rand(n_eig)))

unique_eig

array([0.96744676, 0.89342961, 0.62453925])

In [3]:
indices = []
for i, mult in enumerate(multiplicities):
    indices.extend([i]*mult)

eig = gs.take(unique_eig, indices, axis=0)
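The expansion of the distinct eigenvalues according to their multiplicities can also be sketched in plain NumPy. This is only an illustration: the eigenvalues below are made-up values, not the ones drawn above, and `np.repeat` stands in for the index-based `gs.take` call.

```python
import numpy as np

multiplicities = [2, 3, 1]
# Illustrative distinct eigenvalues (assumed values), sorted in decreasing order.
unique_eig = np.array([0.9, 0.5, 0.1])

# np.repeat copies each eigenvalue as many times as its multiplicity.
eig = np.repeat(unique_eig, multiplicities)
print(eig)
```

The resulting vector has length `sum(multiplicities)`, one entry per dimension.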

Having the (repeated) eigenvalues, we can now create a diagonal matrix whose diagonal elements are these eigenvalues.

In [4]:
D = gs.array_from_sparse(
    data=eig,
    indices=[(i, i) for i in range(d)],
    target_shape=(d, d)
)

We sample an orthogonal matrix and build the corresponding covariance matrix.

In [5]:
so = SpecialOrthogonal(d)

P = so.random_point()

mean = gs.zeros(d)
Sigma = Matrices.mul(P, D, gs.transpose(P))
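As a sanity check on this construction, the following NumPy sketch verifies that the columns of the orthogonal matrix are eigenvectors of `P D P^T` and that its spectrum is the chosen one. It is a sketch under assumptions: the eigenvalues are illustrative, and the orthogonal matrix comes from a QR factorization rather than `so.random_point()`, so its determinant may be -1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative eigenvalues with multiplicities [2, 3, 1] (assumed values).
eig = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.1])
d = eig.shape[0]
D = np.diag(eig)

# The Q factor of a Gaussian matrix is orthogonal (stand-in for so.random_point()).
P, _ = np.linalg.qr(rng.standard_normal((d, d)))

Sigma = P @ D @ P.T

# Each column of P is an eigenvector: Sigma P = P D.
eigvec_ok = np.allclose(Sigma @ P, P @ D)

# The spectrum of Sigma, in decreasing order, matches the chosen eigenvalues.
recovered = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
print(eigvec_ok, np.allclose(recovered, eig))
```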

Finally, we generate the dataset.

In [6]:
n_samples = 1000
X = gs.random.multivariate_normal(mean=mean, cov=Sigma, 
                                  size=n_samples)
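With 1000 samples, the sample covariance should already be close to the population covariance. The sketch below checks this in plain NumPy, with an assumed spectrum and a QR-based orthogonal matrix standing in for the objects built above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population covariance: illustrative spectrum, random orthogonal basis.
eig = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.1])
d = eig.shape[0]
P, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = P @ np.diag(eig) @ P.T

n_samples = 1000
X = rng.multivariate_normal(mean=np.zeros(d), cov=Sigma, size=n_samples)

# Sample covariance, with rows of X as observations.
S = np.cov(X, rowvar=False)

# Largest entrywise deviation from the population covariance.
max_abs_error = np.abs(S - Sigma).max()
print(X.shape, S.shape)
```

The deviation shrinks at the usual `1 / sqrt(n_samples)` rate, so a larger sample gives a tighter match.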

## Statistical test

Now that we have some data, we can instantiate the estimator by passing the multiplicities.

In [7]:
estimator = FlagEstimator(multiplicities)

The statistical test requires a candidate orthogonal matrix `Q_0`. Since our data is generated, we know the true matrix of eigenvectors (`P`), so we expect the test to pass about 95% of the time (the rejection rate is controlled by `alpha`).

**Note**: for a given dataset generated from an unknown distribution, `Q_0` needs to be provided by the user (e.g. it can be the output of a learning method).

In [8]:
Q_0 = P

We can perform the statistical test by using `test_H0`. If it returns `True`, the hypothesis that `Q_0` is a matrix of eigenvectors of the population covariance matrix is not rejected at level `alpha`.

In [9]:
res, (norm, quantile) = test_H0(estimator, X, Q_0, alpha=0.05)

print(f"Test result: {res}")

Test result: True
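To build intuition for what such a test detects, note that if `Q_0` is a matrix of eigenvectors, then `Q_0^T S Q_0` is nearly diagonal, where `S` is the sample covariance. The NumPy sketch below compares the true eigenvector matrix with a deliberately wrong one. It is not the statistic computed by `test_H0`; the spectrum, the helper `off_diagonal_norm`, and the matrix `Q_wrong` are hypothetical and introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: illustrative spectrum and a QR-based orthogonal matrix.
eig = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.1])
d = eig.shape[0]
P, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = P @ np.diag(eig) @ P.T

X = rng.multivariate_normal(mean=np.zeros(d), cov=Sigma, size=1000)
S = np.cov(X, rowvar=False)


def off_diagonal_norm(Q, S):
    """Frobenius norm of the off-diagonal part of Q^T S Q (hypothetical helper)."""
    rotated = Q.T @ S @ Q
    return np.linalg.norm(rotated - np.diag(np.diag(rotated)))


# Wrong candidate: mix the first and last eigenvectors by a 45-degree rotation.
theta = np.pi / 4
R = np.eye(d)
R[0, 0], R[0, -1] = np.cos(theta), -np.sin(theta)
R[-1, 0], R[-1, -1] = np.sin(theta), np.cos(theta)
Q_wrong = P @ R

off_true = off_diagonal_norm(P, S)
off_wrong = off_diagonal_norm(Q_wrong, S)
print(off_true < off_wrong)
```

The wrong candidate mixes two eigenspaces with different eigenvalues, which leaves a large off-diagonal term that sampling noise alone does not explain.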


If the test passes, then we can provide the associated flag.

In [10]:
F = estimator.get_flag(Q_0)
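One common way to describe a flag is as a nested sequence of subspaces built from the eigenspaces. The NumPy sketch below makes this concrete by splitting the columns of an orthogonal matrix into blocks given by the multiplicities and forming the corresponding projectors. This is an assumption about the representation, made for illustration; the object returned by `get_flag` may be structured differently.

```python
import numpy as np

multiplicities = [2, 3, 1]
d = sum(multiplicities)

rng = np.random.default_rng(0)
# Stand-in for Q_0: any orthogonal matrix works for this illustration.
Q_0, _ = np.linalg.qr(rng.standard_normal((d, d)))

# Split the columns of Q_0 into consecutive blocks, one per distinct eigenvalue.
boundaries = np.cumsum(multiplicities)[:-1]
blocks = np.split(Q_0, boundaries, axis=1)

# Orthogonal projector onto each eigenspace.
projectors = [B @ B.T for B in blocks]

# Projectors onto the nested subspaces V_1, V_1 + V_2, V_1 + V_2 + V_3.
nested = np.cumsum(projectors, axis=0)

# The trace of an orthogonal projector equals the dimension of its range.
dims = [int(round(np.trace(N))) for N in nested]
print(dims)
```

The dimensions of the nested subspaces are the cumulative sums of the multiplicities, and the last subspace is the whole space.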