In [1]:
import paragami

import autograd
from autograd import numpy as np
from autograd import scipy as sp

We will begin by considering a simple symmetric positive-definite matrix.  In general, we can write:

$$
A = \left[
\begin{matrix}
a_{11} & a_{12} & a_{13}  \\
a_{21} & a_{22} & a_{23}  \\
a_{31} & a_{32} & a_{33}  \\
\end{matrix}
\right]
$$

...although, of course, symmetry and positive-definiteness impose constraints on the entries $a_{ij}$ of $A$ which we will discuss below.

Suppose we are interested in optimizing some function of $A$, say, a sample covariance with a regularization penalty on the log determinant (denoted $\log |A|$).

Let the data be $X = \left(x_1, ..., x_N\right)$, where $x_n \in \mathbb{R}^3$, and write a loss function as

$$
\ell\left(X, A\right) = \frac{1}{2} \sum_{n=1}^N x_n^T A x_n - \lambda \log |A|.
$$
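As a small illustration, the quadratic term $\sum_n x_n^T A x_n$ can be computed in several equivalent ways. The sketch below uses plain `numpy` and made-up values (`x_demo` and `a_demo` are hypothetical names, not part of the tutorial's data) to check that an explicit loop, the `np.einsum` form, and a trace identity all agree.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(5, 3))   # five hypothetical observations in R^3
a_demo = np.array([[2.0, 0.5, 0.1],
                   [0.5, 3.0, 0.2],
                   [0.1, 0.2, 4.0]])

# Explicit sum over observations of x_n^T A x_n.
quad_loop = sum(x_n @ a_demo @ x_n for x_n in x_demo)

# The same sum written as a single einsum over all three indices.
quad_einsum = np.einsum('ni,ij,nj', x_demo, a_demo, x_demo)

# The same sum via the identity sum_n x_n^T A x_n = trace(A X^T X).
quad_trace = np.trace(a_demo @ x_demo.T @ x_demo)

print(quad_loop, quad_einsum, quad_trace)
```

The `einsum` form avoids the Python-level loop, which matters once $N$ is large.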

Let's simulate some data that is appropriate for this loss function.

In [2]:
N = 1000

# True value of A
true_a = np.diag(np.array([1, 2, 3])) + np.random.random((3, 3)) * 0.1
true_a = 0.5 * (true_a + true_a.T)

# Data
x = np.random.multivariate_normal(mean=np.zeros(3), cov=true_a, size=(N, ))
print(x.shape)

# Regularization constant
lam = 1.0

def loss(x, lam, a):
    a_det_sign, a_log_det = np.linalg.slogdet(a)
    assert a_det_sign > 0
    return 0.5 * np.einsum('ni,ij,nj', x, a, x) - lam * a_log_det

print(loss(x, lam, true_a))

(1000, 3)
7063.652727102428


We would like to minimize the function `loss` using tools like `scipy.optimize.minimize`, but standard optimization functions take vectors, not matrices, as input, and typically assume that every vector of the right length is a valid input.

Let us first consider how to represent $A$ as a vector, and then as an unconstrained vector.

Because $A$ is symmetric, $a_{21} = a_{12}$, $a_{31} = a_{13}$, and $a_{23} = a_{32}$.  This means that to reproduce $A$ we only need to store the bold values:

$$
A = \left[
\begin{matrix}
\mathbf{a_{11}} & \mathbf{a_{12}} & \mathbf{a_{13}}  \\
a_{21} & \mathbf{a_{22}} & \mathbf{a_{23}}  \\
a_{31} & a_{32} & \mathbf{a_{33}}  \\
\end{matrix}
\right]
$$

If we wanted to represent $A$ as a flat array, we could write

$$
A_{flat} = \left[
\begin{matrix}
\mathbf{a_{11}} \\
\mathbf{a_{12}} \\
\mathbf{a_{22}} \\
\mathbf{a_{13}} \\
\mathbf{a_{23}} \\
\mathbf{a_{33}} \\
\end{matrix}
\right]
$$

and fully recover $A$ from $A_{flat}$.
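To make the re-stacking concrete, here is a minimal sketch in plain `numpy` that packs the upper triangle column by column, in the order shown above, and then rebuilds the full matrix. The helper names `flatten_sym` and `fold_sym` are hypothetical and are not part of `paragami`; they only illustrate the idea.

```python
import numpy as np

def flatten_sym(a):
    """Stack the upper triangle of a symmetric matrix column by column."""
    n = a.shape[0]
    return np.array([a[i, j] for j in range(n) for i in range(j + 1)])

def fold_sym(a_flat, n):
    """Rebuild the full symmetric n x n matrix from its stacked upper triangle."""
    a = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            a[i, j] = a_flat[k]
            a[j, i] = a_flat[k]  # mirror into the lower triangle
            k += 1
    return a

a_example = np.array([[2.0, 0.5, 0.1],
                      [0.5, 3.0, 0.2],
                      [0.1, 0.2, 4.0]])
a_example_flat = flatten_sym(a_example)   # order: a11, a12, a22, a13, a23, a33
a_example_fold = fold_sym(a_example_flat, 3)
print(a_example_flat)
```

A symmetric $3 \times 3$ matrix therefore needs only $3 \cdot 4 / 2 = 6$ stored values.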

Converting between $A$ and $A_{flat}$ can be done with the `flatten` and `fold` methods of a `paragami.PSDMatrixPattern` pattern.

For the moment, because we are simply re-stacking the values of $A$ into a vector, not transforming into an unconstrained space, we use the option `free=False`.  We will discuss the `free=True` option shortly.

In [3]:
# A sample positive definite matrix.
a = np.eye(3) + np.random.random((3, 3))
a = 0.5 * (a + a.T)

# Define a pattern and flatten.
a_pattern = paragami.PSDMatrixPattern(size=3)
a_flat = a_pattern.flatten(a, free=False)

print('Now, a_flat contains the elements of a exactly as shown in the formula above.\n')
print('a:\n{}\n'.format(a))
print('a_flat:\n{}\n'.format(a_flat))

Now, a_flat contains the elements of a exactly as shown in the formula above.

a:
[[1.15853698 0.88840844 0.49051765]
 [0.88840844 1.34573747 0.34190524]
 [0.49051765 0.34190524 1.83957984]]

a_flat:
[1.15853698 0.88840844 1.34573747 0.49051765 0.34190524 1.83957984]



We can also convert from $A_{flat}$ back to $A$ by 'folding'.

In [4]:
print('Folding the flattened value recovers the original matrix.\n')
a_fold = a_pattern.fold(a_flat, free=False)
print('a:\n{}\n'.format(a))
print('a_fold:\n{}\n'.format(a_fold))

Folding the flattened value recovers the original matrix.

a:
[[1.15853698 0.88840844 0.49051765]
 [0.88840844 1.34573747 0.34190524]
 [0.49051765 0.34190524 1.83957984]]

a_fold:
[[1.15853698 0.88840844 0.49051765]
 [0.88840844 1.34573747 0.34190524]
 [0.49051765 0.34190524 1.83957984]]



Not every length-six vector represents a valid positive definite matrix.  If the `validate` attribute of a `paragami.PSDMatrixPattern` is `True`, then folding automatically checks for validity.
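Note that a positive diagonal is necessary but not sufficient for positive definiteness. As an illustration only, the sketch below tests a folded matrix directly with plain `numpy`. The helper `is_positive_definite` and its tolerance `tol` are hypothetical, and this is not a description of how `paragami` validates its input.

```python
import numpy as np

def is_positive_definite(mat, tol=1e-10):
    """Return True if mat is square, symmetric, and has all eigenvalues > tol."""
    mat = np.asarray(mat, dtype=float)
    if mat.ndim != 2 or mat.shape[0] != mat.shape[1]:
        return False
    if not np.allclose(mat, mat.T):
        return False
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues.
    return bool(np.min(np.linalg.eigvalsh(mat)) > tol)

# Positive diagonal, but the eigenvalues are -1 and 3, so not positive definite.
not_pd = np.array([[1.0, 2.0],
                   [2.0, 1.0]])
print(is_positive_definite(not_pd))
print(is_positive_definite(np.eye(3)))
```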

In [27]:
print('The diagonal of a positive definite matrix must be greater',
      'than 0, and folding checks this when validate=True, which it is by default:\n')
a_flat_bad = np.array([-1, 0, 0, 0, 0, 0])
print('A bad folded value: {}'.format(a_flat_bad))
try:
    a_fold_bad = a_pattern.fold(a_flat_bad, free=False)
except ValueError as err:
    print('Folding with a_pattern raised the following ValueError:\n{}'.format(err))

print('\nIf validate is false, folding will produce an invalid matrix without an error:\n')
a_pattern.validate = False
a_fold_bad = a_pattern.fold(a_flat_bad, free=False)
print('Folding a non-pd matrix with validate=False:\n{}'.format(a_fold_bad))

print('\nHowever, it will not produce a matrix of the wrong shape even when validate is False:\n')
a_flat_very_bad = np.array([1, 0, 0])
print('A very bad folded value: {}.'.format(a_flat_very_bad))
try:
    a_fold_very_bad = a_pattern.fold(a_flat_very_bad, free=False)
except ValueError as err:
    print('Folding with a_pattern raised the following ValueError:\n{}'.format(err))

# Let's set validate back to true.
a_pattern.validate = True

The diagonal of a positive definite matrix must be greater than 0, and folding checks this when validate=True, which it is by default:

A bad folded value: [-1  0  0  0  0  0]
Folding with a_pattern raised the following ValueError:
Diagonal is less than the lower bound 0.0.

If validate is false, folding will produce an invalid matrix without an error:

Folding a non-pd matrix with validate=False:
[[-1.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]

However, it will not produce a matrix of the wrong shape even when validate is False:

A very bad folded value: [1 0 0].
Folding with a_pattern raised the following ValueError:
Wrong length for PDMatrix flat value.


Sometimes (e.g. when performing optimization) it is convenient to have a flattened value that can legally take any value of the correct length.  For a positive definite matrix, this can be achieved by a Cholesky decomposition.  The details aren't important here, as `paragami` takes care of the transformation behind the scenes with the option `free=True`. 
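For readers who do want a feel for the idea, here is a minimal sketch of one common Cholesky-based construction in plain `numpy`: take the lower-triangular Cholesky factor, replace its (positive) diagonal with its logarithm, and stack the result. The helpers `free_flatten` and `free_fold` are hypothetical names, and the element ordering and other details are assumptions that may differ from what `paragami` does internally.

```python
import numpy as np

def free_flatten(a):
    """Map a positive definite matrix to an unconstrained vector."""
    n = a.shape[0]
    chol = np.linalg.cholesky(a)        # lower triangular, positive diagonal
    rows, cols = np.tril_indices(n)
    vals = chol[rows, cols]
    diag = rows == cols
    vals[diag] = np.log(vals[diag])     # the log removes the positivity constraint
    return vals

def free_fold(free, n):
    """Map any vector of length n * (n + 1) / 2 to a positive definite matrix."""
    rows, cols = np.tril_indices(n)
    vals = np.array(free, dtype=float)
    diag = rows == cols
    vals[diag] = np.exp(vals[diag])     # the exp guarantees a positive diagonal
    chol = np.zeros((n, n))
    chol[rows, cols] = vals
    return chol @ chol.T

a_example = np.array([[2.0, 0.5, 0.1],
                      [0.5, 3.0, 0.2],
                      [0.1, 0.2, 4.0]])
a_example_free = free_flatten(a_example)
a_example_back = free_fold(a_example_free, 3)

# An arbitrary vector also folds to a symmetric positive definite matrix.
a_arbitrary = free_fold([0.5, -1.0, 0.0, 2.0, 0.3, -0.5], 3)
```

Because the exponential is always positive, every real vector of the right length maps to a valid Cholesky factor, which is what makes the representation unconstrained.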

In [43]:
print('The free flat value a_freeflat is not immediately recognizable as a.\n')
a_freeflat = a_pattern.flatten(a, free=True)
print('a:\n{}\n'.format(a))
print('a_freeflat:\n{}\n'.format(a_freeflat))

print('However, it transforms correctly back to a when folded.\n')
a_freefold = a_pattern.fold(a_freeflat, free=True)
print('a_fold:\n{}\n'.format(a_freefold))

print('Any length-six vector will free fold back to a valid positive ',
      'semi-definite matrix up to floating point error.\n')
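# Draw random free vectors and confirm that they are positive semi-definite.
def assert_is_pd(mat):
    # eigvalsh exploits symmetry and returns real eigenvalues.  The tolerance
    # is relative to the largest eigenvalue because entries can be very large.
    eigvals = np.linalg.eigvalsh(mat)
    assert np.min(eigvals) >= -1e-8 * np.max(np.abs(eigvals))
for _ in range(100):
    a_rand_freeflat = np.random.normal(scale=2, size=(6, ))
    a_rand_fold = a_pattern.fold(a_rand_freeflat, free=True)
    assert_is_pd(a_rand_fold)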


print('Note that you will get incorrect values or errors if you mix',
      'free and non-free folding and flattening!\n')
try:
    a_fold_bad = a_pattern.fold(a_freeflat, free=False)
except ValueError as err:
    print('Folding with a_pattern raised the following ValueError:\n{}'.format(err))



The free flat value a_freeflat is not immediately recognizable as a.

a:
[[1.15853698 0.88840844 0.49051765]
 [0.88840844 1.34573747 0.34190524]
 [0.49051765 0.34190524 1.83957984]]

a_freeflat:
[ 0.07357899  0.82538719 -0.20438018  0.45572168 -0.04200638  0.24433082]

However, it transforms correctly back to a when folded.

a_fold:
[[1.15853698 0.88840844 0.49051765]
 [0.88840844 1.34573747 0.34190524]
 [0.49051765 0.34190524 1.83957984]]

Any length-six vector will free fold back to a valid positive  semi-definite matrix up to floating point error.


