# MSc DS: Dealing with full covariance Gaussians with TensorFlow

In [1]:
import tensorflow as tf
import numpy as np
import scipy.stats
import scipy.io
import scipy.sparse
from scipy.io import loadmat
import pandas as pd
import tensorflow_probability as tfp
tfd = tfp.distributions
tfk = tf.keras
tfkl = tf.keras.layers
from PIL import Image
import matplotlib.pyplot as plt

We load the Iris data set.

In [2]:
from sklearn.datasets import load_iris
data = load_iris(return_X_y=True)[0]  # keep only the feature matrix X

We now standardise the data:

In [3]:
xfull = ((data - np.mean(data,0))/np.std(data,0)).astype(np.float32)
n = xfull.shape[0] # number of observations
p = xfull.shape[1] # number of features

We want to learn a Gaussian distribution:
$$p(x) = \mathcal{N}(x|\mu,\Sigma), $$
where $\Sigma$ is not a diagonal matrix, using maximum likelihood.
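
For reference, the quantity being maximised is the average log-density over the $n$ observations, written here as $x_1,\dots,x_n$ (the index $i$ is notation introduced for this note):
$$\frac{1}{n}\sum_{i=1}^{n} \log \mathcal{N}(x_i|\mu,\Sigma), \qquad \log \mathcal{N}(x|\mu,\Sigma) = -\frac{p}{2}\log(2\pi) - \frac{1}{2}\log\det\Sigma - \frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu).$$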

We want to use stochastic gradient techniques to learn $\mu$ and $\Sigma$. The issue is that SGD works best on unconstrained Euclidean spaces like $\mathbb{R}^K$, and $\Sigma$ lives in a constrained space (the space of positive definite matrices). We need to **reparametrise the model with unconstrained parameters.** 

A solution is provided by the **[Cholesky decomposition.](https://en.wikipedia.org/wiki/Cholesky_decomposition)** Any symmetric positive-definite matrix $\Sigma$ can be uniquely written as a product
$$\Sigma = L L^T,$$
where $L$ is a **lower-triangular matrix with strictly positive diagonal entries**.
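
As a quick numerical check of this factorisation, the sketch below uses NumPy's `np.linalg.cholesky` on a small hand-picked matrix. The matrix `Sigma` is an illustrative assumption, not something computed from the Iris data.

```python
import numpy as np

# Illustrative symmetric positive-definite matrix (an assumption for this sketch).
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 2.0, 0.5],
                  [0.6, 0.5, 1.0]])

# np.linalg.cholesky returns the lower-triangular factor L with Sigma = L @ L.T.
L = np.linalg.cholesky(Sigma)

is_lower = np.allclose(L, np.tril(L))          # nothing above the diagonal
diag_positive = bool(np.all(np.diag(L) > 0))   # strictly positive diagonal
reconstructs = np.allclose(L @ L.T, Sigma)     # L L^T gives back Sigma
print(is_lower, diag_positive, reconstructs)
```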

The issue is that there is still a constraint on $L$. Namely, the diagonal has to be strictly positive. A simple way to enforce that is to define another matrix $C$ such that
$$\forall i \neq j, \; c_{ij} = l_{ij}$$
$$\forall i , \; c_{ii} = \log(l_{ii}),$$
which is just applying a log to the diagonal of $L$.


Since the log function sends $]0,\infty[$ to the whole real line, **$C$ will be an unconstrained lower-triangular matrix.** So, we will use $C$ as a parameter to model the covariance, rather than $\Sigma$.
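
The map between $C$ and $\Sigma$ can be sketched in NumPy as follows. The helper names `L_to_C` and `C_to_Sigma` are hypothetical, chosen for this illustration only. The point is that any lower-triangular $C$, whatever its entries, yields a positive-definite $\Sigma$, and that the original $C$ can be recovered from $\Sigma$.

```python
import numpy as np

def L_to_C(L):
    """Replace the diagonal of a Cholesky factor L by its log (hypothetical helper)."""
    C = np.tril(L).copy()
    np.fill_diagonal(C, np.log(np.diag(L)))
    return C

def C_to_Sigma(C):
    """Map a lower-triangular C to a positive-definite Sigma (hypothetical helper)."""
    L = np.tril(C).copy()
    np.fill_diagonal(L, np.exp(np.diag(C)))  # undo the log: the diagonal becomes > 0
    return L @ L.T

rng = np.random.default_rng(0)
C = 0.5 * np.tril(rng.normal(size=(4, 4)))   # arbitrary unconstrained entries
Sigma = C_to_Sigma(C)
min_eig = np.linalg.eigvalsh(Sigma).min()    # positive for a positive-definite matrix

# Round trip: Sigma -> L (Cholesky) -> C recovers the parameter we started from.
C_back = L_to_C(np.linalg.cholesky(Sigma))
print(min_eig > 0, np.allclose(C_back, C))
```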

Triangular matrices in TF can be created this way:

In [4]:
dim_triangular_matrix = int(p*(p+1)/2)
tfp.math.fill_triangular(tf.random.normal([dim_triangular_matrix]))

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[-0.99635327,  0.        ,  0.        ,  0.        ],
       [-0.7582339 , -0.6961709 ,  0.        ,  0.        ],
       [ 0.15100579, -1.0348179 ,  0.94574684,  0.        ],
       [-1.1293083 ,  1.5236211 , -0.64511794, -0.24113938]],
      dtype=float32)>
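
A $p \times p$ lower-triangular matrix has $p(p+1)/2$ free entries, which is why the input vector above has length 10 when $p = 4$. The NumPy sketch below illustrates the same idea with `np.tril_indices`. Note that it fills the triangle row by row, which is not the ordering used by `tfp.math.fill_triangular`; it is only meant to show the parameter count and the resulting shape.

```python
import numpy as np

p = 4
dim_triangular_matrix = p * (p + 1) // 2   # 10 free entries when p = 4

v = np.arange(1, dim_triangular_matrix + 1, dtype=np.float32)
T = np.zeros((p, p), dtype=np.float32)
T[np.tril_indices(p)] = v   # row-by-row fill (differs from TFP's ordering)
print(T)
```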

We can use that to finally create our variables:

In [5]:
mu = tf.Variable(tf.random.normal([p]), dtype=tf.float32)
C = tf.Variable(tfp.math.fill_triangular(tf.random.normal([dim_triangular_matrix])), dtype=tf.float32) # unconstrained lower-triangular parameter (log of the Cholesky diagonal on its diagonal)

Now, how do we compute $\Sigma$ using $C$?
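
One way to answer this: exponentiate the diagonal of $C$ to obtain $L$, then set $\Sigma = LL^T$. The log-density then needs neither $\Sigma^{-1}$ nor $\det\Sigma$ explicitly, since $\log\det\Sigma = 2\sum_i c_{ii}$ and $(x-\mu)^T\Sigma^{-1}(x-\mu) = \|L^{-1}(x-\mu)\|^2$. The NumPy sketch below checks these identities against the textbook formula; the helper name `gaussian_logpdf_from_C` and the random test values are assumptions made for illustration.

```python
import numpy as np

def gaussian_logpdf_from_C(x, mu, C):
    """log N(x | mu, L L^T), where L is C with its diagonal exponentiated."""
    p = mu.shape[0]
    L = np.tril(C).copy()
    np.fill_diagonal(L, np.exp(np.diag(C)))
    z = np.linalg.solve(L, x - mu)       # z = L^{-1} (x - mu)
    log_det_Sigma = 2.0 * np.trace(C)    # log det(L L^T) = 2 * sum_i c_ii
    return -0.5 * (p * np.log(2 * np.pi) + log_det_Sigma + z @ z)

rng = np.random.default_rng(1)
p = 4
mu = rng.normal(size=p)
C = 0.5 * np.tril(rng.normal(size=(p, p)))
x = rng.normal(size=p)

# Reference value from the textbook formula with an explicit Sigma.
L = np.tril(C).copy()
np.fill_diagonal(L, np.exp(np.diag(C)))
Sigma = L @ L.T
diff = x - mu
reference = -0.5 * (p * np.log(2 * np.pi)
                    + np.linalg.slogdet(Sigma)[1]
                    + diff @ np.linalg.inv(Sigma) @ diff)
value = gaussian_logpdf_from_C(x, mu, C)
print(np.isclose(value, reference))
```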

We first define a function that computes the log-likelihood of a complete data point.

In [None]:
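@tf.function
def log_likelihood(x):
  # One possible completion (the original cell body was left blank):
  # exponentiate the diagonal of C to get the Cholesky factor L of Sigma.
  L = tf.linalg.set_diag(C, tf.exp(tf.linalg.diag_part(C)))
  return tfd.MultivariateNormalTriL(loc=mu, scale_tril=L).log_prob(x)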

Now we perform SGD!

In [None]:
params = [mu, C]

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

In [None]:
def train_step(data):
  with tf.GradientTape() as tape: # the gradient tape records the operations needed for automatic differentiation
    loss = -tf.reduce_mean(log_likelihood(data))  # the loss is the average negative log-likelihood over the batch
  gradients = tape.gradient(loss, params)  # here, the gradient is automatically computed
  optimizer.apply_gradients(zip(gradients, params))  # Adam iteration

In [None]:
train_data_complete = tf.data.Dataset.from_tensor_slices(xfull).shuffle(n).batch(1)  # buffer of size n shuffles the whole data set

In [None]:
EPOCHS = 601

for epoch in range(1,EPOCHS+1):
  for data in train_data_complete:
    train_step(data) # Adam iteration
  if (epoch % 100) == 1:
    ll_train = tf.reduce_mean(log_likelihood(xfull))
    print('Epoch  %g' %epoch)
    print('Training log-likelihood %g' %ll_train.numpy())
    print('-----------')

Epoch  1
Training log-likelihood -11.4537
-----------
Epoch  101
Training log-likelihood -3.29557
-----------
Epoch  201
Training log-likelihood -3.29002
-----------
Epoch  301
Training log-likelihood -3.28909
-----------
Epoch  401
Training log-likelihood -3.29002
-----------
Epoch  501
Training log-likelihood -3.28925
-----------
Epoch  601
Training log-likelihood -3.29218
-----------


And now on incomplete data.

In [None]:
# Recover the learned covariance Sigma = L L^T from C
L = tf.linalg.set_diag(C, tf.exp(tf.linalg.diag_part(C)))
Sigma = tf.matmul(L, L, transpose_b=True)
print(Sigma)

tf.Tensor(
[[ 1.0228177  -0.11411762  0.8826632   0.8174147 ]
 [-0.11411762  0.9998517  -0.423512   -0.3605057 ]
 [ 0.8826632  -0.423512    1.0056474   0.9578774 ]
 [ 0.8174147  -0.3605057   0.9578774   1.0072082 ]], shape=(4, 4), dtype=float32)


In [None]:
np.cov(xfull, rowvar=False)

array([[ 1.00671141, -0.11835884,  0.87760447,  0.82343068],
       [-0.11835884,  1.0067114 , -0.43131554, -0.36858316],
       [ 0.87760447, -0.43131554,  1.0067114 ,  0.96932763],
       [ 0.82343068, -0.36858316,  0.96932763,  1.00671144]])
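
Part of the gap between the two matrices is expected. `np.cov` divides by $n-1$ by default, whereas the maximum-likelihood estimate divides by $n$. For data standardised with `np.std`, the empirical diagonal is therefore $n/(n-1) = 150/149 \approx 1.0067$ rather than 1. The sketch below illustrates this on synthetic data, used here as a stand-in for the Iris measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 150, 4
raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated synthetic data
x = (raw - raw.mean(0)) / raw.std(0)                      # same standardisation as above

cov_unbiased = np.cov(x, rowvar=False)         # divides by n - 1 (NumPy default)
cov_mle = np.cov(x, rowvar=False, bias=True)   # divides by n: the Gaussian MLE

print(np.diag(cov_unbiased))   # each entry equals n / (n - 1)
print(np.diag(cov_mle))        # each entry equals 1 up to rounding
```

Passing `bias=True` gives the matrix that the learned $\Sigma$ should be compared with when the model is fitted by maximum likelihood.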