In [1]:
import paragami

import autograd
from autograd import numpy as np
from autograd import scipy as sp

We will begin by considering a $3 \times 3$ symmetric positive-definite matrix. In general, we can write:

$$
A = \left[
\begin{matrix}
a_{11} & a_{12} & a_{13}  \\
a_{21} & a_{22} & a_{23}  \\
a_{31} & a_{32} & a_{33}  \\
\end{matrix}
\right]
$$

...although, of course, symmetry and positive-definiteness impose constraints on the entries $a_{ij}$ of $A$, which we discuss below.
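To make these two constraints concrete, here is a small NumPy check. The helper name `is_symmetric_pd` is hypothetical and not part of paragami. A matrix passes if it equals its transpose and a Cholesky factorization succeeds. For a symmetric matrix, that is equivalent to positive-definiteness.

```python
import numpy as np

def is_symmetric_pd(a, atol=1e-10):
    """Return True if `a` is symmetric and positive-definite (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    # Symmetry: a_ij == a_ji for all i, j.
    if not np.allclose(a, a.T, atol=atol):
        return False
    # For a symmetric matrix, a Cholesky factor exists iff it is positive-definite.
    try:
        np.linalg.cholesky(a)
    except np.linalg.LinAlgError:
        return False
    return True

print(is_symmetric_pd(np.diag([1.0, 2.0, 3.0])))  # True
print(is_symmetric_pd([[1.0, 2.0], [0.0, 1.0]]))  # False: not symmetric
print(is_symmetric_pd([[1.0, 2.0], [2.0, 1.0]]))  # False: eigenvalues are 3 and -1
```

The symmetry check comes first because `np.linalg.cholesky` only reads one triangle of its input. On its own, it would not notice an asymmetric matrix.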

Suppose we are interested in optimizing some function of $A$. For example, take a quadratic form in the data plus a regularization penalty on the log determinant of $A$ (denoted $\log |A|$).

Let the data be $X = \left(x_1, \ldots, x_N\right)$, where $x_n \in \mathbb{R}^3$, and write the loss function as

$$
\ell\left(X, A\right) = \frac{1}{2}\sum_{n=1}^N x_n^T A x_n - \lambda \log |A|.
$$
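The quadratic part of this loss, $\sum_n x_n^T A x_n$, is what the single `np.einsum('ni,ij,nj', ...)` call in `loss` below computes. The quick NumPy check below compares it against an explicit loop and against the trace form $\mathrm{tr}(A X^T X)$. The random data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3
x = rng.normal(size=(N, D))            # row n is the observation x_n
a = np.eye(D) + 0.1 * np.ones((D, D))  # an arbitrary symmetric PD matrix

# Explicit sum of quadratic forms x_n^T A x_n.
quad_loop = sum(x[n] @ a @ x[n] for n in range(N))

# The same quantity as one contraction: sum over n, i, j of x_ni * a_ij * x_nj.
quad_einsum = np.einsum('ni,ij,nj', x, a, x)

# Equivalent trace form: tr(A X^T X).
quad_trace = np.trace(a @ x.T @ x)

print(np.isclose(quad_loop, quad_einsum), np.isclose(quad_loop, quad_trace))  # True True
```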

Let's simulate some data that is appropriate for this loss function.

In [18]:
N = 1000

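# True value of A: a fixed diagonal matrix plus small random noise, symmetrized.
# The noise entries are below 0.1, so the result stays positive-definite.
true_a = np.diag(np.array([1.0, 2.0, 3.0])) + np.random.random((3, 3)) * 0.1
true_a = 0.5 * (true_a + true_a.T)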

# Data
x = np.random.multivariate_normal(mean=np.zeros(3), cov=true_a, size=(N, ))
print(x.shape)

# Regularization constant
lam = 1.0

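def loss(x, lam, a):
    # slogdet returns the sign and the log of the absolute determinant;
    # a positive-definite `a` must have a positive determinant.
    a_det_sign, a_log_det = np.linalg.slogdet(a)
    assert a_det_sign > 0
    # 0.5 * sum_n x_n^T a x_n - lam * log|a|
    return 0.5 * np.einsum('ni,ij,nj', x, a, x) - lam * a_log_det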

print(loss(x, lam, true_a))

(1000, 3)
7460.3271614308505


We would like to minimize `loss` using tools like `scipy.optimize.minimize`. However, standard optimization routines take vectors, not matrices, as input. They also typically assume that every real-valued vector is a valid input, that is, that the problem is unconstrained.
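To see why the domain matters, consider an optimizer that freely adjusts the entries of a symmetric matrix. It can easily step to a point where the matrix is no longer positive-definite. For such a matrix, `np.linalg.slogdet` reports a non-positive sign, so the `assert` in `loss` would fail. A minimal NumPy illustration with a $2 \times 2$ example:

```python
import numpy as np

# A symmetric matrix with eigenvalues 3 and -1, so it is not positive-definite.
a_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

sign, log_abs_det = np.linalg.slogdet(a_bad)
print(sign)         # -1.0: the determinant (-3) is negative
print(log_abs_det)  # log(3), the log of the *absolute* determinant

# Nothing in a plain vector of entries tells the optimizer this point is invalid;
# the constraint has to be built into the parameterization or enforced separately.
```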

Let us first consider how to represent $A$ as a vector, and then as an unconstrained vector.

Because $A$ is symmetric, $a_{21} = a_{12}$, $a_{31} = a_{13}$, and $a_{23} = a_{32}$.  This means that to reproduce $A$ we only need to store the bold values:

$$
A = \left[
\begin{matrix}
\mathbf{a_{11}} & \mathbf{a_{12}} & \mathbf{a_{13}}  \\
a_{21} & \mathbf{a_{22}} & \mathbf{a_{23}}  \\
a_{31} & a_{32} & \mathbf{a_{33}}  \\
\end{matrix}
\right]
$$
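In general, a symmetric $D \times D$ matrix is determined by its $D(D+1)/2$ upper-triangular entries, which is 6 when $D = 3$. NumPy's `np.triu_indices` picks out exactly the bold positions above. Its indices are zero-based and listed row by row.

```python
import numpy as np

D = 3
rows, cols = np.triu_indices(D)  # positions (i, j) with i <= j, row by row
print(list(zip(rows.tolist(), cols.tolist())))
# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
print(len(rows), D * (D + 1) // 2)  # 6 6
```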

If we wanted to represent $A$ as a flat array, we could write

$$
A_{flat} = \left[
\begin{matrix}
\mathbf{a_{11}} \\
\mathbf{a_{12}} \\
\mathbf{a_{22}} \\
\mathbf{a_{13}} \\
\mathbf{a_{23}} \\
\mathbf{a_{33}} \\
\end{matrix}
\right]
$$

and fully recover $A$ from $A_{flat}$.
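The ordering in $A_{flat}$ walks down the upper triangle column by column. Because $A$ is symmetric, this is the same as reading the lower triangle row by row, which is what `np.tril_indices` does.

The helpers below show that the round trip recovers $A$. `flatten_sym` and `fold_sym` are hypothetical names, and this is a plain-NumPy sketch rather than paragami's implementation.

```python
import numpy as np

def flatten_sym(a):
    """Stack the lower triangle of symmetric `a` row by row:
    a11, a21, a22, a31, a32, a33 (= a11, a12, a22, a13, a23, a33 by symmetry)."""
    rows, cols = np.tril_indices(a.shape[0])
    return a[rows, cols]

def fold_sym(a_flat, size):
    """Inverse of flatten_sym: rebuild the full symmetric matrix."""
    rows, cols = np.tril_indices(size)
    a = np.zeros((size, size))
    a[rows, cols] = a_flat
    a[cols, rows] = a_flat  # mirror into the upper triangle
    return a

a = np.array([[1.0, 0.2, 0.3],
              [0.2, 2.0, 0.4],
              [0.3, 0.4, 3.0]])
a_flat = flatten_sym(a)
print(a_flat.tolist())                        # [1.0, 0.2, 2.0, 0.3, 0.4, 3.0]
print(np.array_equal(fold_sym(a_flat, 3), a))  # True
```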

Converting between $A$ and $A_{flat}$ can be done with a `paragami.PDMatrixPattern`. We are simply re-stacking the values of $A$ into a vector, not transforming them into an unconstrained space, so we use `free=False`.
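For contrast, an unconstrained representation is one in which every real vector is valid. One common way to get this for positive-definite matrices is to write $A = L L^T$ with $L$ lower-triangular. We then store the logarithm of $L$'s diagonal (so the diagonal stays positive) alongside its off-diagonal entries.

This is shown here as an assumption. It is not necessarily what `PDMatrixPattern` does internally with `free=True`, and the helper names are hypothetical. Under this map, any vector in $\mathbb{R}^6$ yields a positive-definite $A$:

```python
import numpy as np

def unconstrained_to_pd(theta, size):
    """Map any real vector of length size*(size+1)/2 to a symmetric PD matrix
    (hypothetical helper; Cholesky factor with a log-parameterized diagonal)."""
    rows, cols = np.tril_indices(size)
    L = np.zeros((size, size))
    L[rows, cols] = theta
    # Exponentiate the diagonal so L has a strictly positive diagonal.
    diag = np.diag_indices(size)
    L[diag] = np.exp(L[diag])
    return L @ L.T

def pd_to_unconstrained(a):
    """Inverse map: Cholesky factor of `a`, then log of its diagonal."""
    L = np.linalg.cholesky(a)
    diag = np.diag_indices(a.shape[0])
    L[diag] = np.log(L[diag])
    rows, cols = np.tril_indices(a.shape[0])
    return L[rows, cols]

theta = np.random.default_rng(1).normal(size=6)  # an arbitrary point in R^6
a = unconstrained_to_pd(theta, 3)
print(np.all(np.linalg.eigvalsh(a) > 0))           # True
print(np.allclose(pd_to_unconstrained(a), theta))  # True
```

The round trip works because a positive-definite matrix has a unique Cholesky factor with a positive diagonal.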

Note that `a_flat` contains the elements in precisely the same order as $A_{flat}$.

In [23]:
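# A random symmetric matrix. Note: eye(3) plus symmetrized uniform noise is not
# guaranteed to be positive-definite for every draw.
a = np.eye(3) + np.random.random((3, 3))
a = 0.5 * (a + a.T)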

a_pattern = paragami.PDMatrixPattern(size=3)
a_flat = a_pattern.flatten(a, free=False)
print('a: {}'.format(a))
print('a_flat: {}'.format(a_flat))

a: [[1.4473274  0.65013957 0.0527136 ]
 [0.65013957 1.12575218 0.650802  ]
 [0.0527136  0.650802   1.81343529]]
a_flat: [1.4473274  0.65013957 1.12575218 0.0527136  0.650802   1.81343529]


If we want to modify our function `loss` to take `a_flat` as an input rather than `a`, we can use `paragami.FlattenedFunction`.

In [None]:
loss_flat = paragami.FlattenedFunction(original_fun=loss, patterns=a_pattern, free=False, argnums=2)
print(loss_flat(x, lam, a_flat))
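Conceptually, a flattened function just folds the flat argument back into a matrix before calling the original function. Below is a plain-NumPy sketch of that idea. It uses a hypothetical `flatten_function_input` wrapper and a hypothetical `fold_sym` fold. This is not paragami's code, which also handles unconstrained (`free=True`) representations, multiple patterned arguments, and validation.

```python
import numpy as np

def fold_sym(a_flat, size):
    """Rebuild a symmetric matrix from its lower triangle, stored row by row."""
    rows, cols = np.tril_indices(size)
    a = np.zeros((size, size))
    a[rows, cols] = a_flat
    a[cols, rows] = a_flat
    return a

def flatten_function_input(original_fun, fold, argnum):
    """Return a version of `original_fun` whose argument `argnum` is flat."""
    def flat_fun(*args):
        args = list(args)
        args[argnum] = fold(args[argnum])  # flat vector -> folded matrix
        return original_fun(*args)
    return flat_fun

def loss(x, lam, a):
    sign, log_det = np.linalg.slogdet(a)
    assert sign > 0
    return 0.5 * np.einsum('ni,ij,nj', x, a, x) - lam * log_det

loss_flat = flatten_function_input(loss, lambda v: fold_sym(v, 3), argnum=2)

x = np.random.default_rng(0).normal(size=(10, 3))
a = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.5, 0.2],
              [0.0, 0.2, 1.0]])
a_flat = a[np.tril_indices(3)]
print(np.isclose(loss_flat(x, 1.0, a_flat), loss(x, 1.0, a)))  # True
```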