
Model Selection via BIC and EBIC #13

Closed
mnarayan opened this issue Jul 11, 2016 · 6 comments
@mnarayan (Member)

Given sparse regularized inverse covariance estimates over a grid of regularization parameters, a popular criterion for choosing the optimal penalty (and the corresponding estimate) is the Bayesian (or Schwarz) Information Criterion (BIC), or the Extended BIC (EBIC) in high-dimensional regimes.

The BIC criterion is defined as
BIC(lam) = -2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam))

The EBIC criterion is defined as
EBIC(lam) = - 2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam)) + 4 * (# of non-zeros in Theta(lam)) * (log p) * gam

Here

  • n is n_samples and p is n_features
  • Sigma is the sample covariance/correlation in self.covariance_ and Theta(lam) comes from self.precision_
  • gam is an additional parameter for EBIC that takes values in [0,1]. I recommend setting gam = 0.1; setting gam = 0 recovers the traditional BIC.
  • lam is an element in the grid of lambda penalty parameters.

The goal is to implement model selection using the above criteria as an alternative to cross-validation.
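
For concreteness, a minimal sketch of how the criterion might be evaluated for one estimate on the path, assuming the Gaussian log-likelihood (n/2) * (log det Theta - trace(Sigma Theta)) up to constants; the helper name and the convention of counting all non-zero entries of Theta(lam) are illustrative, not the final API:

    import numpy as np

    def ebic(covariance, precision, n_samples, n_features, gam=0.1):
        """Hypothetical helper: EBIC score for one (Sigma, Theta(lam)) pair.

        covariance -- sample covariance/correlation (self.covariance_)
        precision  -- estimated sparse precision Theta(lam) (self.precision_)
        gam = 0 recovers the ordinary BIC.
        """
        # Gaussian log-likelihood, up to an additive constant:
        # l(Theta) = (n / 2) * (log det Theta - trace(Sigma @ Theta))
        log_likelihood = (n_samples / 2.0) * (
            np.linalg.slogdet(precision)[1] - np.trace(covariance @ precision)
        )
        # number of non-zeros in Theta(lam), following the definition above
        precision_nnz = np.count_nonzero(precision)
        return (-2.0 * log_likelihood
                + np.log(n_samples) * precision_nnz
                + 4.0 * gam * np.log(n_features) * precision_nnz)

    # choose the penalty with the smallest criterion over the grid, e.g.
    # best_index = np.argmin([ebic(S, theta, n, p) for theta in precision_path])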

References:
BIC in sparse inverse covariance estimation: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.184.6058&rep=rep1&type=pdf
EBIC in sparse inverse covariance estimation: https://papers.nips.cc/paper/4087-extended-bayesian-information-criteria-for-gaussian-graphical-models

mnarayan added this to the model-selection milestone Jul 11, 2016
@mnarayan (Member, Author)

There is another sample-splitting-based model selection approach that is an alternative to cross-validation, which I will write up as well. Together, these BIC approaches, StARS, and cross-validation cover essentially all model selection approaches that try to pick a single penalty parameter out of a grid of penalty parameters.

There is another school of model averaging, where you don't try to pick a single optimal lambda at all, but instead try to combine multiple models. We will cover this in the weighted graphical lasso issue (#8).

@jasonlaska (Member)

OK, will see if I can finish the other branch and start on this this week.

@jasonlaska (Member)

Wow this is awesome, so simple.

@jasonlaska (Member)

So when I threshold the values in the precision

    import numpy as np

    # copy the estimate and zero out entries whose magnitude falls below a tolerance,
    # so that negative partial correlations are not discarded
    precision_t = np.empty(precision.shape)
    precision_t[:] = precision
    precision_t[np.abs(precision_t) < 1e-1] = 0
    precision_nnz = np.count_nonzero(precision_t)

I start to get reasonable results with EBIC, but without some thresholding I find that all values in the resulting matrix are nonzero (clearly).

@mnarayan (Member, Author)

Are all the matrices in the path dense?

What is the largest lambda value and largest covariance value?
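
A quick sanity check, sketched with illustrative toy data in place of the real penalty grid and self.covariance_: if every lambda is well below the largest off-diagonal entry of the sample covariance, the whole path can stay dense.

    import numpy as np

    # Illustrative only: substitute the actual penalty grid and self.covariance_
    rng = np.random.RandomState(0)
    X = rng.randn(50, 10)
    sample_covariance = np.corrcoef(X, rowvar=False)
    lams = np.logspace(0, -2, num=25)

    # compare the penalty grid against the off-diagonal covariance magnitudes
    off_diag = sample_covariance - np.diag(np.diag(sample_covariance))
    print("largest lambda:        ", lams.max())
    print("largest |off-diag cov|:", np.abs(off_diag).max())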

@mnarayan (Member, Author)

Ohh, are you making sure the lambda path values go from largest to smallest? Path mode doesn't work otherwise.
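
For example, one way to build a decreasing path (the values are illustrative, not the package defaults):

    import numpy as np

    # penalty grid sorted largest -> smallest, as the warm-started path mode expects
    lam_max = 1.0   # e.g. the largest off-diagonal |covariance| entry
    lams = np.logspace(np.log10(lam_max), np.log10(0.01 * lam_max), num=25)
    assert np.all(np.diff(lams) < 0)  # strictly decreasing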
