WIP: enet strong rules (2) #1168

Closed

wants to merge 15 commits

Conversation

Contributor

ibayer commented Sep 19, 2012

continuing work from PR #992

This PR aims to extend the current coordinate_descent.enet_path to use an active set of features and strong-rule filtering ([0], p. 20).

[0] Tibshirani, R., J. Bien, J. Friedman, T. Hastie, N. Simon, J. Taylor, and R.J. Tibshirani. “Strong Rules for Discarding Predictors in Lasso-type Problems.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2011).
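
For context, a minimal sketch of the basic strong rule from [0], under the sklearn-style scaling of the elastic-net objective; the scaling and the helper name are my assumptions, not code from this PR:

import numpy as np

def basic_strong_set(X, y, alpha, alpha_max, rho):
    """Basic strong rule of [0], applied to the l1 part of the elastic net.

    Assumes the objective (1 / (2 * n)) * ||y - X w||^2
    + alpha * rho * ||w||_1 + 0.5 * alpha * (1 - rho) * ||w||^2_2.
    Feature j is kept unless |x_j' y| / n < rho * (2 * alpha - alpha_max).
    The rule is a heuristic: discarded features must be checked against
    the KKT conditions after the fit and added back if they violate them.
    """
    n_samples = X.shape[0]
    correlations = np.abs(np.dot(X.T, y)) / n_samples
    return np.where(correlations >= rho * (2 * alpha - alpha_max))[0]

Note that the filter only bites when alpha is close to alpha_max: for alpha below alpha_max / 2 the threshold is negative and nothing is discarded, which matches the behaviour discussed later in this thread.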

@ibayer ibayer referenced this pull request Sep 19, 2012

Closed

WIP: enet strong rules #992

@GaelVaroquaux GaelVaroquaux commented on an outdated diff Sep 19, 2012

sklearn/linear_model/cd_fast.pyx
@@ -103,11 +107,56 @@ def sparse_std(unsigned int n_samples,
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
+def elastic_net_kkt_violating_features(np.ndarray[DOUBLE, ndim=1] coef,
+ double l1_reg, double l2_reg,
+ np.ndarray[DOUBLE, ndim=2] X,
+ np.ndarray[DOUBLE, ndim=1] y,
+ np.ndarray[DOUBLE, ndim=1] R=None,
+ subset=None, double tol=0.09):
+ """ Function returns features that don't satisfy the elastic-net KKT.
+ The goal is to distinguish between coefficients that pass the
+ KKT only up to a certain tolerance and features that are inactive
+ (value of zero) but should be active.
+ """
+ kkt_violating_features = set()
+
+ if subset is None:
+ features_to_check = xrange(X.shape[1])
@GaelVaroquaux

GaelVaroquaux Sep 19, 2012

Owner

Have you checked (by looking at the generated C code) that it was not useful to type the input of xrange? I don't know if Cython is clever enough to guess that X.shape[1] is an int.
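
For reference, a sketch of the kind of typing being suggested (my illustration, not code from this PR): declaring the loop bound and index as C integers lets Cython compile the loop into a plain C for-loop instead of iterating over a Python (x)range object, which can be verified in the annotated C output:

import numpy as np
cimport numpy as np

def count_nonzero_coefs(np.ndarray[np.float64_t, ndim=1] coef):
    # Both the bound and the index are typed, so Cython emits a C
    # for-loop here; with an untyped bound it may fall back to a
    # Python range object (check the generated C code to be sure).
    cdef Py_ssize_t n_features = coef.shape[0]
    cdef Py_ssize_t j
    cdef int n_nonzero = 0
    for j in range(n_features):
        if coef[j] != 0.0:
            n_nonzero += 1
    return n_nonzero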

@GaelVaroquaux GaelVaroquaux and 2 others commented on an outdated diff Sep 23, 2012

sklearn/linear_model/cd_fast.pyx
def enet_coordinate_descent(np.ndarray[DOUBLE, ndim=1] w,
double alpha, double beta,
np.ndarray[DOUBLE, ndim=2] X,
np.ndarray[DOUBLE, ndim=1] y,
- int max_iter, double tol, bool positive=False):
+ int max_iter, double tol,
+ bint positive=False, bint calc_dual_gap=True,
+ np.ndarray[INTEGER, ndim=1] iter_set=None,
+ np.ndarray[DOUBLE, ndim=1] R=None):
@GaelVaroquaux

GaelVaroquaux Sep 23, 2012

Owner

'R' is a single-letter, non-descriptive name. It would be much better to find a name that tells the reader what the variable is. In addition, variables starting with a capital letter are supposed to be 2D arrays.

@amueller

amueller Sep 23, 2012

Owner

On 09/23/2012 08:00 PM, Gael Varoquaux wrote:

'R' is a single-letter non descriptive name. It would be much better to find a better name

This should go on twitter ;)

@GaelVaroquaux

GaelVaroquaux Sep 23, 2012

Owner

On 09/23/2012 08:00 PM, Gael Varoquaux wrote:

'R' is a single-letter non descriptive name. It would be much better to find a better name

This should go on twitter ;)

:).

Actually, it should be named 'residues' here :P

@ogrisel

ogrisel Sep 23, 2012

Owner

'R' is a single-letter non descriptive name. It would be much better to find a better name

This should go on twitter ;)

Hahaha, done: https://twitter.com/ogrisel/status/249984920519311360

Owner

GaelVaroquaux commented Sep 23, 2012

@ibayer: I sent you a small PR on this PR to fix 64-bit issues.

That said, after fixing this issue, I get the assert_array_almost_equal failing in 'bench_strong_rules.py'.

@GaelVaroquaux GaelVaroquaux commented on an outdated diff Sep 23, 2012

benchmarks/bench_strong_rules.py
+"""
+import gc
+from time import time
+import numpy as np
+
+from sklearn.datasets.samples_generator import make_regression
+from numpy.testing import assert_array_almost_equal
+from sklearn.linear_model.base import center_data
+
+
+def compute_bench(alpha, rho, n_samples, n_features, precompute):
+
+ enet_results = []
+ enet_strong_rules_results = []
+
+ n_test_samples = 0
@GaelVaroquaux

GaelVaroquaux Sep 23, 2012

Owner

Unused variable. You should use a pyflakes plugin for your editor to detect these.

@GaelVaroquaux GaelVaroquaux commented on the diff Sep 23, 2012

sklearn/linear_model/coordinate_descent.py
@@ -113,7 +113,8 @@ class ElasticNet(LinearModel, RegressorMixin):
"""
def __init__(self, alpha=1.0, rho=0.5, fit_intercept=True,
normalize=False, precompute='auto', max_iter=1000,
- copy_X=True, tol=1e-4, warm_start=False, positive=False):
+ copy_X=True, tol=1e-4, warm_start=False, positive=False,
+ use_strong_rule=False):
@GaelVaroquaux

GaelVaroquaux Sep 23, 2012

Owner

Is there a reason why this is not True by default?

Contributor

ibayer commented Sep 23, 2012

@GaelVaroquaux
Thanks for all the input. I'm sorry this is going so slowly, but I'll have some time tomorrow for your PR etc.
The assert_array_almost_equal failure is likely due to different convergence speeds between enet with and without strong rules. I'm already working on this issue.

Contributor

ibayer commented Sep 24, 2012

@GaelVaroquaux
Thanks for the PR.
I don't understand the math behind line 47 (bench_strong_rules.py), especially why alpha_max is independent of alpha and rho:
alpha_max = np.max(np.abs(Xy)) / len(X)
In enet_path we use alpha_max = np.abs(Xy).max() / (n_samples * rho), but that derivation isn't completely clear to me either. Can you give me a hint? (I guess np.max(np.abs(Xy)) gives me the most highly correlated feature, but then...)

Owner

GaelVaroquaux commented Sep 24, 2012

Hey @ibayer: could you merge once more from my branch? I just sent a cosmit.

Owner

GaelVaroquaux commented Sep 24, 2012

Regarding alpha_max: first, I made a mistake, and it should be the definition used in enet_path. Second, the reason it is independent of alpha is that it is a property of the data: it is the largest alpha for which there are non-zero features, i.e. the smallest alpha for which the solution is entirely zero.

Finally, the derivation comes from expressing the KKT conditions at zero for each individual coefficient.
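
Spelling that derivation out (my sketch, assuming the sklearn scaling of the objective): at w = 0 the residual equals y, the gradient of the l2 term vanishes, and the KKT conditions reduce to |x_j' y| / n_samples <= alpha * rho for every feature j. The smallest alpha for which all of these hold is the alpha_max used in enet_path:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=500, random_state=0)
n_samples = X.shape[0]
rho = 0.9

# KKT at w = 0: |x_j' y| / n_samples <= alpha * rho for every j,
# so the solution is entirely zero exactly when alpha >= alpha_max.
Xy = np.dot(X.T, y)
alpha_max = np.abs(Xy).max() / (n_samples * rho)

# Sanity check: fitting just above alpha_max gives an all-zero model.
# (l1_ratio is the current name of the rho parameter; fit_intercept is
# disabled so that no centering changes X.T y.)
enet = ElasticNet(alpha=1.01 * alpha_max, l1_ratio=rho,
                  fit_intercept=False).fit(X, y)
assert np.all(enet.coef_ == 0.0)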

Owner

GaelVaroquaux commented Sep 24, 2012

I have been benchmarking this PR a bit on a very sparse system, for which I was expecting performance gains from using the strong rules: https://gist.github.com/3775870

I see no performance gains. Thus I have been digging a bit into the code, and I have a few questions/remarks.

First, @ibayer: could you add docstrings to the different functions that explain in layman's terms what they do? Right now it is hard to follow the code, so a short definition of what each function returns (including what the 'strong_set' is) would help.

Second, I find a few things that surprise me:

  • In _fit_enet_with_strong_rule, around line 361, self.coef_ is used a few times. In practice, in the code that I run, it is None, as it has not been initialized. I wonder whether this code does what it is expected to do. For instance, testing that self.coef_ == 0, or computing the KKT violations on self.coef_ while it is None, looks like a bug to me.
  • In my example, the final solution is very sparse (only 12 non-zero coefs). However, the code in _fit_enet_with_strong_rule is not able to define a sparse active set, and ends up calling the coordinate descent algorithm with a full active set. It seems to me that the code could work better. One of the problems is that the strong_set is computed only once, at the beginning, when 'w' was not known at all. As a result, it gives a really bad guess of what can be pruned out. In my bench, I have included a line showing that computing the strong_set with a good estimate of 'w' gives good pruning.

I don't fully understand the design of this function, but the way I would have seen it, I would have expected 'w' to be progressively refined, with progressive pruning and recomputation of the active_set. For this reason, I would have expected 'w' to be used in the code more than 'self.coef_'.

Contributor

ibayer commented Sep 24, 2012

I have not yet included everything from the first PR: there are no strong rules in enet_path yet, norm_cols_X and the stopping criterion need to be changed, and there is other stuff I discussed with Alex.

I'll add docstrings etc., but please be aware that this PR is not yet "complete".

In _fit_enet_with_strong_rule, around line 361, self.coef_ is used a few times. In practice, in the code that I run, it is None, as it has not been initialized. I wonder whether this code does what it is expected to do. For instance, testing that self.coef_ == 0, or computing the KKT violations on self.coef_ while it is None, looks like a bug to me.

You are right, the code is wrong; it has not yet recovered from the last rebase (multi-target support is the issue here).

In my example, the final solution is very sparse (only 12 non-zero coefs). However, the code in _fit_enet_with_strong_rule is not able to define a sparse active set, and ends up calling the coordinate descent algorithm with a full active set.

There is no sparse support yet; it's on my list too.

It seems to me that the code could work better. One of the problems is that the strong_set is computed only once, at the beginning, when 'w' was not known at all. As a result, it gives a really bad guess of what can be pruned out.

The calculation of the strong_set is independent of 'w' if the basic strong rule is used, which is the case here.
I think that in your example a continuation approach (computing the regularization path) could indeed be faster than a single fit (I think that's the default in glmnet), and it would make it possible to calculate a smaller strong_set using the sequential strong rule, as sketched below. This is interesting, but nothing I would address now, considering the current state of the PR.
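
A minimal sketch of that sequential strong rule from [0], under the same scaling assumptions (and hypothetical helper name) as the basic-rule sketch above: the converged residual at the previous, larger alpha on the path replaces y in the correlation test:

import numpy as np

def sequential_strong_set(X, residual, alpha_k, alpha_prev, rho):
    """Sequential strong rule of [0]: keep feature j only if
    |x_j' r| / n_samples >= rho * (2 * alpha_k - alpha_prev), where
    r = y - X w(alpha_prev) is the residual of the converged fit at
    the previous, larger alpha_prev. As with the basic rule, discarded
    features must be re-checked against the KKT conditions afterwards.
    """
    n_samples = X.shape[0]
    correlations = np.abs(np.dot(X.T, residual)) / n_samples
    return np.where(correlations >= rho * (2 * alpha_k - alpha_prev))[0]

On the usual fine grids, where consecutive alphas differ only by a small factor, 2 * alpha_k - alpha_prev stays positive along the whole path, so the sequential rule keeps pruning where the basic rule stops.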

In my bench, I have included a line showing that computing the strong_set with a good estimate of 'w' gives good pruning.

I'll have a look.

I don't fully understand the design of this function, but the way I would have seen it, I would have expected 'w' to be progressively refined, with progressive pruning and recomputation of the active_set. For this reason, I would have expected 'w' to be used in the code more than 'self.coef_'.

The current implementation does not shrink the active_set in any way. In the beginning the active_set equals the strong_set, and then missing features are added. I observed that the basic strong rule does a very poor job if alpha is not close to alpha_max. I would rather start with a high alpha (small active_set) and refit with decreasing alpha (a growing active_set) than try to find a way to shrink the active_set after starting with a relatively low alpha (nearly full active_set).

Hope this makes some sense.

Owner

GaelVaroquaux commented Sep 24, 2012

Hi @ibayer,

I discussed this on the phone with @agramfort, and I understand things a bit better. I'll add a few comments and maybe suggest new names based on my understanding.

One thing that I had not understood was that the incremental strong rules were usable only if I had a converged estimate at a higher alpha. Based on this information, the design difficulties make much more sense.

It thus turns out that the strong rules are really mostly useful in path settings. Given that, I feel that a slightly different design would result in code that is easier to read. I would avoid adding the strong-rules logic to the methods of the ElasticNet object. Instead, I would add it to the enet_path function. The logic being that, inside the object, it is very difficult to keep track of what the user has previously been doing with the object; it forces us to add many flags and options.

However, the current implementation of enet_path is not a very good base on which to add the strong rules, because it relies a lot on the ElasticNet object. To make your life easier, I propose to refactor it to try and use lower-level functions directly in enet_path. It should then be easier to fit in the strong-rules logic.

How does that sound? I just wonder where I am going to find the time to do that refactor...

Contributor

ibayer commented Sep 24, 2012

Hi @GaelVaroquaux

One thing that I had not understood was that the incremental strong rules were usable only if I had a converged estimate at a higher alpha. Based on this information, the design difficulties make much more sense.

Indeed, one open challenge is to find the right convergence threshold (it doesn't need to be too small) and a cheap convergence criterion.

It thus turns out that the strong rules are really mostly useful in path settings. Given that, I feel that a slightly different design would result in code that is easier to read. I would avoid adding the strong-rules logic to the methods of the ElasticNet object. Instead, I would add it to the enet_path function. The logic being that, inside the object, it is very difficult to keep track of what the user has previously been doing with the object; it forces us to add many flags and options.

That's my feeling too. I think one of @agramfort's arguments for keeping the strong rules in the ElasticNet object was that, for a single-alpha fit (without a path), the strong rules can provide a significant speedup (though only for alpha close to alpha_max). This option is lost, or gets more complicated, if the strong rules are added entirely to enet_path.

However, the current implementation of enet_path is not a very good base on which to add the strong rules, because it relies a lot on the ElasticNet object. To make your life easier, I propose to refactor it to try and use lower-level functions directly in enet_path. It should then be easier to fit in the strong-rules logic.

+1

How does that sound? I just wonder where I am going to find the time to do that refactor...

Sounds great. This way I can start improving the strong-rules logic now, and it can be moved into enet_path when the refactoring is done (it's going to be a snail race, I'm short on time too). I was totally distracted by those design issues.

Owner

GaelVaroquaux commented Oct 10, 2012

On Wed, Oct 10, 2012 at 01:25:13AM -0700, ibayer wrote:

can anyone explain to me where all these commits that hijack this PR come from (and how to fix it)?

Probably a rebase.

I suspect it has something to do with the new bugfix release?

Maybe, but in this case, I am lost...

Owner

MechCoder commented Sep 15, 2014

@ibayer @agramfort I assume we can close this too :)

Owner

agramfort commented Sep 15, 2014

yes

@agramfort agramfort closed this Sep 15, 2014
