[ENH] Ridge Path and general refactoring #3373

Open
eickenberg opened this Issue Jul 13, 2014 · 5 comments

Comments

Projects
None yet
4 participants
Contributor

eickenberg commented Jul 13, 2014

Hi everybody,

during the sprint (July 14th 2014 to 20th), I am hoping among other things to iron out some inconsistencies in ridge.py and add a nice-to-have ridge_path (as proposed in #582) which conveniently provides coefficients or predictions necessary for the CV parameter selection algorithms.

The RidgeClassifierCV classifier also needs some care, since its inner cross-validation loop is invariably leave-one-out MSE. Other inner loop scoring should be possible (should be easy with ridge_path) or the classifier renamed (see #3302).

My idea is to first write ridge_path, and then gently move all functionality that could use it over.

A gist for ridge_path is here https://gist.github.com/eickenberg/e9eb106f12e40d017fd0

All of the functions are multi-penalty, multi-target. I would like to systematically broadcast penalties according to rules presented in #2068

I am still not completely sure where to place certain decision logic (i.e. solvers, etc). ridge_path seem like something that could be in the public API, and should thus probably contain decision logic and a solver= kwarg.

Also, I hear talk of a potential KernelMixin. I don't know what it should do, but I coded a kernel function as one taking two matrices X, and Y, and, treating lines as samples, returns a similarity matrix of all the samples.

Take this as a roadmap and request for comments -- very happy about any input!

@mblondel @ogrisel @fabianp @agramfort @GaelVaroquaux @bthirion

Owner

fabianp commented Jul 14, 2014

I think all the objectives you detail are worthy. My only advice would be to submit a pull request as soon as you complete the first objective instead of submitting a big chunk of code once you finish everything. It makes code easier to review and maximizes your chances of getting the code in.

Owner

GaelVaroquaux commented Jul 14, 2014

I think all the objectives you detail are worthy. My only advice would
be to submit a pull request as soon as you complete the first objective
instead of submitting a big chunk of code once you finish everything.
It makes code easier to review and maximizes your chances of getting
the code in.

Yes!

Owner

mblondel commented Jul 14, 2014

Also we need to stop wrapping GridSearchCV within RidgeCV and instead introduce a solver constructor option, as suggested in my comment.

Contributor

eickenberg commented Jul 14, 2014

Cool, yes that is on my list: I am planning to replace it with a ridge path that outputs predictions or scores for the whole grid. In the n_samples < n_features case this is even more effective. In the other case, scores still need to be calculated.

Right now my gist functions output predictions if you give it a X_test and coefs otherwise. This seems natural, except in the LOO case, where X_test is implicitly given by X_train. So I am facing some API consistency issues here.

I will start by coding a ridge_path function that contains decision logic and adding into it one of the implementations (one type of solver, e.g. eigen) that I provide in the gist. Then I will make the ridge regression objects depend upon this functionality. In the next step I will add more solvers to ridge_path and make the ridge regression objects depend on them, etc.

That way I hope to comply with the reviewability criteria @fabianp mentions while keeping the mess at a minimum.

Contributor

eickenberg commented Jul 14, 2014

Corresponding PR #3383 - but it should have already referenced this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment