
Model Selection via BIC and EBIC #13

Closed
mnarayan opened this issue Jul 11, 2016 · 6 comments
@mnarayan (Member)

Given sparse regularized inverse covariance estimates over a grid of regularization parameters, a popular criterion for choosing the optimal penalty (and the corresponding estimate) is the Bayesian (or Schwarz) Information Criterion (BIC), or the Extended BIC (EBIC) in high-dimensional regimes.

The BIC criterion is defined as
BIC(lam) = -2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam))

The EBIC criterion is defined as
EBIC(lam) = - 2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam)) + 4 * (# of non-zeros in Theta(lam)) * (log p) * gam

Here

  • n is n_samples and p is n_features
  • Sigma is the sample covariance/correlation in self.covariance_ and Theta(lam) comes from self.precision_
  • gam is an additional parameter for EBIC that takes values in [0,1]. I recommend setting gam = 0.1; setting gam = 0 recovers the traditional BIC.
  • lam is an element in the grid of lambda penalty parameters.

The goal is to implement model selection using the above criteria as an alternative to cross-validation.
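
For concreteness, a minimal sketch of how the criterion might be evaluated for one estimate on the path, assuming the Gaussian log-likelihood (n/2) * (log det Theta - trace(Sigma Theta)) up to constants; the helper name and the convention of counting all non-zero entries of Theta(lam) are illustrative, not the final API:

    import numpy as np

    def ebic(covariance, precision, n_samples, n_features, gam=0.1):
        """Hypothetical helper: EBIC score for one (Sigma, Theta(lam)) pair.

        covariance -- sample covariance/correlation (self.covariance_)
        precision  -- estimated sparse precision Theta(lam) (self.precision_)
        gam = 0 recovers the ordinary BIC.
        """
        # Gaussian log-likelihood, up to an additive constant:
        # l(Theta) = (n / 2) * (log det Theta - trace(Sigma @ Theta))
        log_likelihood = (n_samples / 2.0) * (
            np.linalg.slogdet(precision)[1] - np.trace(covariance @ precision)
        )
        # number of non-zeros in Theta(lam), following the definition above
        precision_nnz = np.count_nonzero(precision)
        return (-2.0 * log_likelihood
                + np.log(n_samples) * precision_nnz
                + 4.0 * gam * np.log(n_features) * precision_nnz)

    # choose the penalty with the smallest criterion over the grid, e.g.
    # best_index = np.argmin([ebic(S, theta, n, p) for theta in precision_path])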

References:
BIC in sparse inverse covariance estimation: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.184.6058&rep=rep1&type=pdf
EBIC in sparse inverse covariance estimation: https://papers.nips.cc/paper/4087-extended-bayesian-information-criteria-for-gaussian-graphical-models

mnarayan added this to the model-selection milestone Jul 11, 2016
@mnarayan (Member, Author)

There is another sample-splitting-based model selection approach that is an alternative to cross-validation, which I will write up as well. Together, these BIC approaches, StARS, and cross-validation cover essentially all model selection approaches that try to pick a single penalty parameter out of a grid of penalty parameters.

There is another school of model averaging, where you don't try to pick a single optimal lambda at all, but instead try to combine multiple models. We will cover this in the weighted graphical lasso issue (#8).

@jasonlaska (Member)

OK, will see if I can finish the other branch and start on this this week.

@jasonlaska (Member)

Wow this is awesome, so simple.

@jasonlaska (Member)

So when I threshold the values in the precision

    import numpy as np

    # copy the estimate and zero out entries whose magnitude falls below a tolerance,
    # so that negative partial correlations are not discarded
    precision_t = np.empty(precision.shape)
    precision_t[:] = precision
    precision_t[np.abs(precision_t) < 1e-1] = 0
    precision_nnz = np.count_nonzero(precision_t)

I start to get reasonable results with EBIC, but without some thresholding I find that all values in the resulting matrix are nonzero (clearly).

@mnarayan (Member, Author)

Are all the matrices in the path dense?

What is the largest lambda value and largest covariance value?
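
A quick sanity check, sketched with illustrative toy data in place of the real penalty grid and self.covariance_: if every lambda is well below the largest off-diagonal entry of the sample covariance, the whole path can stay dense.

    import numpy as np

    # Illustrative only: substitute the actual penalty grid and self.covariance_
    rng = np.random.RandomState(0)
    X = rng.randn(50, 10)
    sample_covariance = np.corrcoef(X, rowvar=False)
    lams = np.logspace(0, -2, num=25)

    # compare the penalty grid against the off-diagonal covariance magnitudes
    off_diag = sample_covariance - np.diag(np.diag(sample_covariance))
    print("largest lambda:        ", lams.max())
    print("largest |off-diag cov|:", np.abs(off_diag).max())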

@mnarayan (Member, Author)

Ohh, are you making sure the lambda path values go from largest to smallest? Path mode doesn't work otherwise.
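
For example, one way to build a decreasing path (the values are illustrative, not the package defaults):

    import numpy as np

    # penalty grid sorted largest -> smallest, as the warm-started path mode expects
    lam_max = 1.0   # e.g. the largest off-diagonal |covariance| entry
    lams = np.logspace(np.log10(lam_max), np.log10(0.01 * lam_max), num=25)
    assert np.all(np.diff(lams) < 0)  # strictly decreasing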
