# Performance analysis
This notebook compares the performance of different algorithms for graphical inference, namely graphical lasso (GL or GLASSO), latent variable graphical lasso (LVGLASSO) and time-varying graphical lasso (TVGL), with our method, latent variable time-varying graphical lasso (LTGL).

## Before proceeding
The other methods are not all implemented in Python, so the steps below list what is needed to install each of them.

### Install instructions for
* GL (scikit-learn implementation)
Nothing is required here: `sklearn` should already be installed on your system, since it is a dependency of `regain`. Otherwise, install it first with
```
conda install scikit-learn
```
or 
```
pip install scikit-learn
```

* GLASSO (R implementation)
This is the R implementation of graphical lasso, called GLASSO. It is available as an R package, so R must be installed on your system. Then, in an R console (simply call `R` from a command line):
```R
install.packages("glasso")
```
Refer to [GLASSO documentation](https://cran.r-project.org/web/packages/glasso/glasso.pdf) for further information.

* LVGLASSO (Matlab implementation)
This requires either `oct2py`, or [Matlab installed](https://it.mathworks.com/help/install/ug/install-mathworks-software.html) (version 2016b or higher) together with the [Matlab engine for Python](https://it.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html).
Then, [download the code](https://www.math.ucdavis.edu/~sqma/ADMM-LVGLasso) and unpack the folder.
Make sure the folder contains a file called `ADMM_B.m`. Take note of the location of the code, as it is needed to call the `ADMM_B.m` script.

**NOTE 1:** There is a so-called [LVGLASSO in the R packages](https://www.rdocumentation.org/packages/lvnet/versions/0.3.2/topics/lvglasso). It is NOT the method compared here: it is implemented in a different way and requires the number of latent variables to be specified a priori. See the link above for further details.

**NOTE 2:** Our `regain` package has a Python wrapper that eases calling these Matlab functions. Conversions of numpy arrays to Matlab matrices are therefore done under the hood from the script `regain/wrappers/lvglasso/LVGLASSO.m`.
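
Before calling the wrapper, it can help to check from Python that the unpacked folder really contains `ADMM_B.m`. The following is a minimal sketch using only the standard library; the helper name `find_admm_script` and the example path are hypothetical and not part of `regain`.

```python
import os


def find_admm_script(code_dir, script_name="ADMM_B.m"):
    """Return the path of `script_name` inside `code_dir`, or None.

    Hypothetical helper for illustration; it is not part of `regain`.
    """
    code_dir = os.path.abspath(os.path.expanduser(code_dir))
    # Walk the unpacked folder, since the archive may nest the script.
    for root, _dirs, files in os.walk(code_dir):
        if script_name in files:
            return os.path.join(root, script_name)
    return None


# Example location (an assumption): adjust to where you unpacked the code.
lvglasso_dir = "~/src/ADMM-LVGLasso"
script = find_admm_script(lvglasso_dir)
if script is None:
    print("ADMM_B.m not found under", lvglasso_dir)
else:
    print("ADMM_B.m found at", script)
```

The returned path (or its parent folder) is the location to keep for the later calls.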

* TVGL (Python implementation)
Since it is a Python implementation, this does not require additional software (besides having `git` installed). However, a small modification of the source code is needed in order to obtain additional results, such as the number of iterations and the estimated covariance matrices.

1. Clone the repo (https://github.com/davidhallac/TVGL) in a folder, with
```bash
git clone https://github.com/davidhallac/TVGL.git
```
It requires [`cvxpy`](http://www.cvxpy.org/en/latest/install/index.html) and [`snap`](https://snap.stanford.edu/snappy/) to be installed.

2. Replace line 76 of ./TVGL/TVGL.py (i.e., `return thetaSet`) with
```
return thetaSet, empCovSet, gvx.status, gvx
```
3. Add after line 454 of ./TVGL/inferGraphL2.py (and in the files for the other norms, if required)
```
self.n_iter_ = num_iterations
```
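
Since the two edits above are made by hand, a quick check that they are in place can save a failed run later. This is a sketch under stated assumptions: the helper `tvgl_is_patched` is hypothetical, it only looks for the two snippets given in steps 2 and 3 as plain text, and the example path matches the `TVGL_path` used later in this notebook.

```python
import os


def tvgl_is_patched(tvgl_dir):
    """Report, per file, whether the expected modified line is present.

    Hypothetical helper for illustration only; it does a plain text search.
    """
    tvgl_dir = os.path.expanduser(tvgl_dir)
    expected = {
        "TVGL.py": "return thetaSet, empCovSet, gvx.status, gvx",
        "inferGraphL2.py": "self.n_iter_ = num_iterations",
    }
    status = {}
    for filename, snippet in expected.items():
        path = os.path.join(tvgl_dir, filename)
        try:
            with open(path) as f:
                status[filename] = snippet in f.read()
        except IOError:  # file missing or unreadable
            status[filename] = False
    return status


# Example location (an assumption): the folder where the repo was cloned.
print(tvgl_is_patched("~/src/TVGL"))
```

A `False` entry means the corresponding file is missing or still unmodified.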

* REGAIN
Of course, first of all you should download and install the `regain` package (our method). If you haven't done it yet, do it now!

```
conda install -c fdtomasi regain
```
or
```
pip install regain
```

Or, if you have the source code, `cd` to the `regain` folder, then
```
python setup.py install
```
or
```
pip install -e .
```

and that's it. Now you are good to go!
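
To confirm that the Python-side dependencies can be located before running the cells below, a small check such as the following may be useful. It is only a sketch: `check_packages` is a hypothetical helper, it requires Python 3, and it only tests whether each package can be found, not whether its version is compatible.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages used in this notebook; `cvxpy` and `snap` are only needed for TVGL.
print(check_packages(["regain", "sklearn", "numpy", "pandas", "cvxpy", "snap"]))
```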

In [None]:
from __future__ import print_function, division

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from itertools import product
from sklearn.utils.extmath import squared_norm
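try:  # recent scikit-learn renamed GraphLasso to GraphicalLasso
    from sklearn.covariance import GraphicalLasso as GraphLasso
except ImportError:  # older scikit-learn
    from sklearn.covariance import GraphLasso
from sklearn.covariance import empirical_covariance
try:  # Bunch is exposed by sklearn.utils in recent versions
    from sklearn.utils import Bunch
except ImportError:
    from sklearn.datasets.base import Bunch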
from sklearn.model_selection import GridSearchCV, ShuffleSplit

from regain import datasets
from regain import prox
from regain import utils
import time

# Performance of the different algorithms

In [None]:
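import os

# expand '~' explicitly, since Python does not do it when opening paths
TVGL_path = os.path.expanduser('~/src/TVGL')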
from performance_utils import *

In [None]:
# setting 1
alpha = 0.45  #0.0025
tau = 3
beta = 50  # 1000
eta = 10

n_samples = 100
n_dim_lat = 20
T = 10
n_dim_obs = 100

k = (n_dim_obs, T)

np.random.seed(20)

# mode = 'norm'
data = {
    (dim, T): datasets.make_dataset(
        #     mode=mode,
        update_theta='l2',
        update_ell='l2',
        normalize_starting_matrices=True,
        n_samples=n_samples,
        n_dim_lat=n_dim_lat,
        n_dim_obs=dim,
        T=T,
        epsilon=1e-1,
        proportional=True,
        degree=2,
        keep_sparsity=True)
    for dim in [n_dim_obs]
}

In [None]:
# setting 2
alpha = .43
tau = 1.9
beta = 1
eta = 2

n_samples = 100  # 500
n_dim_lat = 5
T = 100
n_dim_obs = 50

k = (n_dim_obs, T)

np.random.seed(20)
# data = {(dim, T) : datasets.generate_dataset(
#     mode='fixed', n_samples=n_samples, n_dim_lat=n_dim_lat, n_dim_obs=dim,  T=T, epsilon=1e-3, degree=3)
#     for dim in n_dims}
data = {
    (dim, T): datasets.make_dataset(
        update_ell='l1', update_theta='l1',
        n_samples=n_samples, n_dim_lat=n_dim_lat, n_dim_obs=dim,
        T=T, epsilon=1e-1, proportional=False, degree=2, keep_sparsity=True)
    for dim in [n_dim_obs]
}

In [None]:
K = data[k].thetas

print([(i != 0).sum() for i in K])

In [None]:
(data[k].thetas == 0).sum() / (n_dim_obs**2 * T)
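
The two cells above count the non-zero entries of each precision matrix and the overall fraction of zero entries. As a toy illustration of the same computation, here is a sketch on a hand-made stack of matrices; the array `toy_thetas` is made up for this example and is not generated by `regain`.

```python
import numpy as np

# Two 3x3 "precision matrices" stacked along the time axis (T = 2).
toy_thetas = np.array([
    [[2.0, 0.5, 0.0],
     [0.5, 2.0, 0.0],
     [0.0, 0.0, 1.0]],
    [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.3],
     [0.0, 0.3, 1.0]],
])
T_toy, d_toy, _ = toy_thetas.shape

# Non-zero entries per time point (diagonal included).
nonzeros = [int((theta != 0).sum()) for theta in toy_thetas]

# Fraction of zero entries over the whole stack.
sparsity = float((toy_thetas == 0).sum()) / (d_toy ** 2 * T_toy)

print(nonzeros, sparsity)
```

Note that the diagonal is always non-zero for a precision matrix, so the fraction of zeros can never reach 1.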

In [None]:
print(
    [
        np.linalg.norm(data[k].thetas[i] - data[k].thetas[i + 1])
        for i in range(T - 1)
    ])
print(
    [
        np.linalg.norm(data[k].ells[i] - data[k].ells[i + 1])
        for i in range(T - 1)
    ])

In [None]:
# prepare dataframe for results
n_dims = [n_dim_obs]
n_times = [T]
methods = [
    'LTGL ($\ell_2^2$)', 'LTGL ($\ell_1$)', 'GL', 'LVGLASSO',
    'TVGL ($\ell_2^2$)', 'TVGL ($\ell_1$)'
]
scores = sorted(
    [
        "MSE_precision", "MSE_observed", "MSE_latent", 'estimator',
        "mean_rank_error", 'time', 'iterations', 'precision', 'recall',
        'accuracy', 'balanced_accuracy', 'f1', 'npv', 'prevalence',
        'miss_rate', 'likelihood', 'specificity', 'plr', 'nlr'
    ])

cols = pd.MultiIndex.from_product([scores, n_dims], names=('score', 'dim'))
rows = pd.MultiIndex.from_product([methods, n_times], names=('method', 'time'))

dff = pd.DataFrame(columns=cols, index=rows)
idx = pd.IndexSlice

In [None]:
# setting 1
alpha = 0.361  #289 #0.0025
tau = 1.12
beta = 5e2
eta = 5
alpha_chandri_setting_1 = 0.29

# # setting 2
# alpha = .43 #0.0025
# tau = 1.99
# beta = 2
# eta = 20
# alpha_gl_setting_2 = .35

for i, (k, res) in enumerate(sorted(data.items())[:5]):
    dim = k[0]
    print("Start with: dim=%d, T=%d (it %d)" % (k[0], k[1], i))
    data_list = res.data
    K = res.thetas
    K_obs = res.thetas_observed
    ells = res.ells
    data_grid = np.array(data_list).transpose(
        1, 2, 0)  # to use it later for grid search

    print("starting LTGL l1...\r", end='')
    res_l = ltgl_results(
        res.X, res.y, K, K_obs, ells, alpha=alpha, beta=beta, verbose=0,
        max_iter=1000, tau=tau, eta=eta, psi='l1', phi='laplacian', tol=1e-5,
        rtol=1e-5)
    dff.loc[idx['LTGL ($\ell_1$)', k[1]], idx[:, k[0]]] = [
        res_l[x] for x in scores
    ]

    print("starting LTGL l2...\r", end='')
    res_l = ltgl_results(
        res.X, res.y, K, K_obs, ells, alpha=alpha, beta=beta, tau=tau, eta=eta,
        psi='laplacian', phi='laplacian', tol=1e-5, rtol=1e-5)
    dff.loc[idx['LTGL ($\ell_2^2$)', k[1]], idx[:, k[0]]] = [
        res_l[x] for x in scores
    ]

    print("starting GL ...\r", end='')
    try:
        res = glasso_results(data_grid, K, K_obs, ells, alpha=alpha)
        #         res = glasso_results(data_grid, K, K_obs, ells, alpha=alpha_gl_setting_2)

        # res = friedman_results(data_grid, K, K_obs, ells, alpha=alpha)
        dff.loc[idx['GL', k[1]], idx[:, k[0]]] = [res[x] for x in scores]
    except Exception as e:
        print(e)
    print("starting LVGLASSO...\r", end='')
    #     res_c = chandresekeran_results(data_grid, K, K_obs, ells, tau=tau, alpha=alpha)
    res_c = chandresekeran_results(
        data_grid, K, K_obs, ells, tau=tau, alpha=alpha_chandri_setting_1)
    dff.loc[idx['LVGLASSO', k[1]], idx[:, k[0]]] = [res_c[x] for x in scores]

    print("starting TVGL L1...\r", end='')
    res = hallac_results(
        data_grid, K, K_obs, ells, beta=beta, alpha=alpha, penalty=1, tvgl_path=TVGL_path)
    dff.loc[idx['TVGL ($\ell_1$)', k[1]], idx[:, k[0]]] = [
        res[x] for x in scores
    ]

    print("starting TVGL L22...\r", end='')
    res = hallac_results(
        data_grid, K, K_obs, ells, beta=beta, alpha=alpha, penalty=3, tvgl_path=TVGL_path)
    dff.loc[idx['TVGL ($\ell_2^2$)', k[1]], idx[:, k[0]]] = [
        res[x] for x in scores
    ]

In [None]:
mm = dff.xs(n_dim_obs, level='dim', axis=1).xs(T, level='time')
mm

In [None]:
from decimal import Decimal
' & '.join(['%.3f' % Decimal(i) for i in mm['MSE_precision']])

In [None]:
dff[[s for s in scores if s != 'estimator']].to_pickle("dff_setting_1.pkl")

In [None]:
l1 = (
    [
        np.linalg.matrix_rank(r)
        for r in mm.estimator['LTGL ($\ell_2^2$)'].latent_
    ])
l2 = (
    [
        np.linalg.matrix_rank(r)
        for r in mm.estimator['LTGL ($\ell_1$)'].latent_
    ])
l3 = ([np.linalg.matrix_rank(r) for r in mm.estimator['LVGLASSO'].L])

l4 = (
    [
        np.linalg.matrix_rank(r)
        for r in mm.estimator['LTGL ($\ell_2^2$)'].latent_
    ])
l5 = (
    [
        np.linalg.matrix_rank(r)
        for r in mm.estimator['LTGL ($\ell_1$)'].latent_
    ])
l6 = ([np.linalg.matrix_rank(r) for r in mm.estimator['LVGLASSO'].L])
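
The lists above hold one rank per time point, computed with `np.linalg.matrix_rank`. As a minimal sketch of what is being measured, the toy example below builds low-rank positive semi-definite matrices of known rank and recovers that rank, then turns the ranks into relative frequencies as done for the bar plots below. All names (`toy_latent`, `true_rank`, ...) are made up for this example.

```python
import collections
import numpy as np

rng = np.random.RandomState(0)
n_dim, true_rank, n_times = 10, 3, 4

# Build one PSD matrix of rank `true_rank` per time point: L = A A^T.
toy_latent = []
for _ in range(n_times):
    A = rng.randn(n_dim, true_rank)
    toy_latent.append(A.dot(A.T))

# matrix_rank counts the singular values above a numerical tolerance.
ranks = [int(np.linalg.matrix_rank(L)) for L in toy_latent]

# Relative frequency of each observed rank.
counter = collections.Counter(ranks)
frequencies = {
    rank: count / float(len(ranks)) for rank, count in counter.items()
}
print(ranks, frequencies)
```

For estimated matrices, very small singular values may still count towards the rank; `np.linalg.matrix_rank` accepts a `tol` argument if a stricter threshold is needed.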

In [None]:
l1, l2, l3, l4, l5, l6 = utils.load_pickle(filename="ells.pkl")

In [None]:
import collections
import matplotlib.pyplot as plt

f, (ax1, ax2) = plt.subplots(2, 1, sharey=False, figsize=(10, 5), dpi=600)

colors = ['white', 'lightblue', 'C7']
alpha = 0.95

# list(...) is needed on Python 3, where dict keys/values are views
counter = collections.Counter(l1)
ax1.bar(
    list(counter.keys()),
    np.array(list(counter.values())) / len(l1), alpha=alpha, width=0.24,
    label=r'LTGL ($\ell_2^2$)', color=colors[0], edgecolor='k')
counter = collections.Counter(l2)
ax1.bar(
    np.array(list(counter.keys())) + 0.25,
    np.array(list(counter.values())) / len(l1), alpha=alpha, width=0.24,
    label=r'LTGL ($\ell_1$)', color=colors[1], edgecolor='k')
counter = collections.Counter(l3)
ax1.bar(
    np.array(list(counter.keys())) - 0.25,
    np.array(list(counter.values())) / len(l1), alpha=alpha, width=0.24,
    label='LVGLASSO', color=colors[2], edgecolor='k')

ax1.set_xticks(range(0, 30, 2))
#ax1.set_ylim(0,5)
ax1.axvline(20, c='r', ls='--')
ax1.set_xlabel(r'ranks of L obtained with ($p_2$)')
ax1.set_ylabel('frequency')
# ax1.set_xscale("log")
# ax1.set_xlim([10, 100])
ax1.xaxis.label.set_size(15)
ax1.yaxis.label.set_size(15)

#ax1.legend()
# ax0.legend(prop={'size': 10})
# ax0.set_title('bars with legend')

# list(...) is needed on Python 3, where dict keys/values are views
counter = collections.Counter(l4)
ax2.bar(
    list(counter.keys()),
    np.array(list(counter.values())) / len(l4), alpha=alpha, width=0.24,
    label=r'LTGL ($\ell_2^2$)', color=colors[0], edgecolor='k')
counter = collections.Counter(l5)
ax2.bar(
    np.array(list(counter.keys())) + 0.25,
    np.array(list(counter.values())) / len(l4), alpha=alpha, width=0.24,
    label=r'LTGL ($\ell_1$)', color=colors[1], edgecolor='k')
counter = collections.Counter(l6)
ax2.bar(
    np.array(list(counter.keys())) - 0.25,
    np.array(list(counter.values())) / len(l4), alpha=alpha, width=0.24,
    label='LVGLASSO', color=colors[2], edgecolor='k')

ax2.set_xticks(range(0, 30, 2))
# ax2.set_xlim(2.5,6.7)
ax2.set_xlabel(r'ranks of L obtained with ($p_1$)')
ax2.set_ylabel('frequency')
ax2.xaxis.label.set_size(15)
ax2.yaxis.label.set_size(15)
ax2.axvline(5, c='r', ls='--')
ax1.legend(loc='upper left', fontsize='x-large')
plt.tight_layout()
plt.show()

In [None]:
f.savefig(
    "ranks_distribution_vertical.pdf", dpi=600, transparent=True,
    bbox_inches='tight')

In [None]:
import collections
import matplotlib.pyplot as plt

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10, 2.6), dpi=600)

colors = ['white', 'lightblue', 'C7']
alpha = 0.5

# rank of the estimated latent matrices at each time point
ax1.plot(
    range(len(l1)), l1, alpha=alpha, label=r'LTGL ($\ell_2^2$)',
    color=colors[0])
ax1.plot(
    np.arange(len(l1)) + .2, l2, alpha=alpha, label=r'LTGL ($\ell_1$)',
    color=colors[1])
ax1.plot(
    np.arange(len(l1)) + .4, l3, alpha=alpha, label='LVGLASSO',
    color=colors[2])

# ax1.set_xticks(range(15,25, 1))
#ax1.set_ylim(0,5)
ax1.axhline(20, c='r', ls='--')
# x is the time index and y the rank, so label the axes accordingly
ax1.set_xlabel(r'time index ($p_2$)')
ax1.set_ylabel('rank of L')
ax1.xaxis.label.set_size(15)
ax1.yaxis.label.set_size(15)

#ax1.legend()
# ax0.legend(prop={'size': 10})
# ax0.set_title('bars with legend')

# rank of the estimated latent matrices at each time point
ax2.plot(
    range(len(l4)), l4, alpha=alpha, label=r'LTGL ($\ell_2^2$)',
    color=colors[0])
ax2.plot(
    np.arange(len(l4)) + .2, l5, alpha=alpha, label=r'LTGL ($\ell_1$)',
    color=colors[1])
ax2.plot(
    np.arange(len(l4)) + .4, l6, alpha=alpha, label='LVGLASSO',
    color=colors[2])

# ax2.set_xticks(range(10))
# ax2.set_xlim(2.5,6.7)
# x is the time index here (the y axis, shared with ax1, is the rank)
ax2.set_xlabel(r'time index ($p_1$)')
ax2.xaxis.label.set_size(15)
ax2.axhline(5, c='r', ls='--')
ax1.legend(loc='best', fontsize='large')
plt.tight_layout()
plt.show()