### Robust Gaussian Covariance Matrix Estimation

Given $m$ observations $a_1, \ldots, a_m$ of a zero-mean Gaussian random $n$-vector $A = (A_1, \ldots, A_n)$, we would like to find the maximum likelihood estimate of the inverse covariance matrix $\Sigma^{-1}$. Taking the negative log-likelihood of the data, and dropping additive constants and a factor of $1/2$, we get:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{i = 1}^m \left( -\log \det \Sigma^{-1} + \text{tr}(\Sigma^{-1} a_i a_i^T) \right) \\
  \end{aligned}
\end{equation*}

We make a slight modification to this problem: instead of summing over all of the data, we would like to minimize the sum of the $k$ largest values of the per-observation terms

\begin{equation}
-\log \det \Sigma^{-1} + \text{tr}(\Sigma^{-1} a_i a_i^T), \quad i = 1, \ldots, m.
\end{equation}

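This corresponds to selecting the $k$ "worst-case" observations $a_i$, those with the largest negative log-likelihood under the current estimate, and minimizing their total. Let $R = \Sigma^{-1}$, and let $z \in \mathbb{R}^m$ be the vector with entries: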

\begin{equation}
z_i  = -\log \det R + \text{tr}(R a_i a_i^T)
\end{equation}

We get the problem:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{j = 1}^k z_{[j]} \\
  \end{aligned}
\end{equation*}

where $z_{[1]}$ is the largest element of $z$, $z_{[2]}$ is the second largest, and so on. This is a convex optimization problem with variable $R$ and constants $a_i$.
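To make the objective concrete, here is a minimal NumPy sketch that evaluates $z$ for a fixed candidate $R$ and sums its $k$ largest entries. The sizes, the random data, and the particular $R$ are illustrative assumptions, not the values used in the notebook. The sketch also checks that the sum of the $k$ largest entries equals the maximum subset sum over all index subsets of size $k$. Since each $z_i$ is convex in $R$ ($-\log \det R$ is convex and the trace term is linear), this maximum of sums of convex functions is convex, which is one way to see why the problem above is convex.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 3, 2                        # illustrative sizes (assumed)
A = rng.standard_normal((m, n))          # row i plays the role of observation a_i
R = np.eye(n) + 0.1 * np.ones((n, n))    # an arbitrary positive definite candidate

# z_i = -log det R + tr(R a_i a_i^T), using tr(R a a^T) = a^T R a
sign, logdet = np.linalg.slogdet(R)
z = -logdet + np.einsum("ij,jk,ik->i", A, R, A)

# Sum of the k largest entries: sort in decreasing order and add the first k.
sum_k_largest = np.sort(z)[::-1][:k].sum()

# Equivalent characterization: the largest sum over all index subsets of size k.
max_over_subsets = max(
    z[list(idx)].sum() for idx in itertools.combinations(range(m), k)
)

print(sum_k_largest, max_over_subsets)
```

The two printed numbers should agree up to floating-point rounding. The subset enumeration is only for illustration, since its cost grows combinatorially with $m$.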

It is equivalently formulated in epigraph form as:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{j = 1}^k \left(t + t_{det}\mathbb{1}\right)_{[j]} \\
    &\text{subject to} &&  -\log \det R \leq t_{det} \\
                   &   &&  t_i = \text{tr}\, (R a_i a_i^T), && i = 1, \ldots, m \\
  \end{aligned}
\end{equation*}

where $R \in \mathbb{R}^{n \times n}$, $t \in \mathbb{R}^m$, and $t_{det} \in \mathbb{R}$ are the variables and $a_i$ are the constants.
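The constraint $t_i = \text{tr}\,(R a_i a_i^T)$ is linear in $R$, so all $m$ traces can be computed with a single matrix-vector product: stack the flattened outer products $a_i a_i^T$ as the rows of a matrix `K` and multiply by the flattened $R$. The NumPy sketch below checks this identity on small random data. The sizes and the test matrix are assumptions made for illustration. Because both $R$ and $a_i a_i^T$ are symmetric, row-major and column-major flattening give the same result.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3                              # illustrative sizes (assumed)
A = rng.standard_normal((m, n))          # row i plays the role of observation a_i
M = rng.standard_normal((n, n))
R = M @ M.T + np.eye(n)                  # symmetric positive definite test matrix

# Row i of K is the flattened outer product a_i a_i^T.
K = np.array([np.outer(a, a).ravel() for a in A])

# All traces at once, as a product with the flattened R ...
t_vectorized = K @ R.ravel()
# ... versus computing each trace directly.
t_direct = np.array([np.trace(R @ np.outer(a, a)) for a in A])

print(np.allclose(t_vectorized, t_direct))
```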




In [6]:
import cvxpy as cp
import numpy as np

# Problem data
np.random.seed(0)
m = 11 # Number of observations of each random variable
n = 5 # Number of random variables
k = 3 # Needs to be less than m.
A = np.random.rand(m, n)
A -= np.mean(A, axis=0) # Center each column so the data is zero-mean
# Row i of K is the flattened outer product a_i a_i^T,
# so K @ vec(R) gives tr(R a_i a_i^T) for every i at once.
K = np.array([np.outer(A[i], A[i]).flatten() for i in range(m)])


# Problem construction
problems = []
opt_vals = []

# Problem 1 (Epigraph formulation)
sigma_inv1 = cp.Variable((n, n), symmetric=True) # Inverse covariance matrix
t = cp.Variable(m)
tdet = cp.Variable()

f = cp.sum_largest(t + tdet, k)
z = K @ cp.vec(sigma_inv1)
C = [-cp.log_det(sigma_inv1) <= tdet, t == z]
problems.append(cp.Problem(cp.Minimize(f), C))
opt_vals.append(None)

# Problem 2 (Equivalent unconstrained formulation)
sigma_inv2 = cp.Variable((n, n), symmetric=True) # Inverse covariance matrix
obs = cp.hstack([-cp.log_det(sigma_inv2) + cp.trace(np.outer(A[i], A[i]) @ sigma_inv2) for i in range(m)])
f2 = cp.sum_largest(obs, k)
problems.append(cp.Problem(cp.Minimize(f2)))
opt_vals.append(None)

# For debugging individual problems:
if __name__ == "__main__":
    for prob, opt_val in zip(problems, opt_vals):
        prob.solve(solver="SCS", eps=1e-5)
        print("status: {}".format(prob.status))
        print("optimal value: {}".format(prob.value))
        print("true optimal value: {}".format(opt_val))



status: optimal_inaccurate
optimal value: -24.8849355499
true optimal value: None
status: optimal_inaccurate
optimal value: -24.8035951728
true optimal value: None
