Dictionary learning #221

Merged
190 commits merged on Sep 19, 2011

Conversation

7 participants

vene commented Jun 25, 2011

Pull request contains: (UPDATED)

BaseDictionaryLearning object implementing transform methods
DictionaryLearning and OnlineDictionaryLearning implementing fit in different ways

Dictionary learning example
Image denoising example
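The fit/transform split above can be illustrated with a minimal numpy-only sketch of the alternating optimization behind dictionary learning. This is not the PR's implementation (which solves the coding step with lasso_lars or lasso_cd); the ISTA/soft-thresholding coding step and the helper names are simplifications for illustration.

```python
import numpy as np

def soft_threshold(z, alpha):
    """Elementwise soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def dict_learning_sketch(X, n_atoms, alpha=0.1, n_iter=100, seed=0):
    """Alternate a sparse coding step and a dictionary update step.

    X is (n_samples, n_features); returns code (n_samples, n_atoms) and
    dictionary (n_atoms, n_features) with atoms rescaled to unit norm.
    """
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    D = rng.randn(n_atoms, n_features)
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    code = np.zeros((n_samples, n_atoms))
    for _ in range(n_iter):
        # Sparse coding: one ISTA step, a cheap stand-in for a full
        # lasso solve (lasso_lars / lasso_cd in the real code).
        L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant
        grad = (code @ D - X) @ D.T
        code = soft_threshold(code - grad / L, alpha / L)
        # Dictionary update: least squares given the code, then rescale
        # atoms back to unit norm to remove the scale ambiguity.
        D = np.linalg.lstsq(code, X, rcond=None)[0]
        norms = np.linalg.norm(D, axis=1, keepdims=True)
        D /= np.maximum(norms, 1e-12)
    return code, D
```

Alternating these two steps is the core of both variants; the online version performs the same updates on mini-batches of samples instead of the full data matrix.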

agramfort Jun 15, 2011

Avoid lambdas, they won't pickle.

agramfort Jun 15, 2011

I would rename D to X (standard in the linear_model module).

agramfort Jun 15, 2011

OMP ref is [Mallat 93].

agramfort Jun 15, 2011

y should be (n_samples, n_targets) so y[i] is still sample i.

agramfort Jun 15, 2011

you don't need the \

agramfort Jun 15, 2011

Doc formatting problem: it should have Parameters and Returns sections. A reference to [Mallat 93] should be added too.

agramfort Jun 15, 2011

The Returns section should be there; also, the coefs should be named w for consistency.

vene and others added some commits Sep 16, 2011

DOC: larger lena size in denoising example
Larger sizes work better because they give a better training set to the
dictionary_learning algorithm. This is a tradeoff between computation
time and quality of the example.
return new_code
def sparse_encode_parallel(X, Y, gram=None, cov=None, algorithm='lasso_lars',


GaelVaroquaux Sep 17, 2011

I believe that this should be moved to linear_models and renamed something like 'multivariate_lasso'. I am worried that in the current situation, people starting from the lasso solver will not find it.



ogrisel Sep 17, 2011

I don't think we should call it "multivariate lasso" as this is not restricted to lasso but also works for OMP and simple thresholding too. I find the current function name much more explicit.



vene Sep 17, 2011

We could add See alsos?



agramfort Sep 19, 2011

More multitask_lasso than multivariate_lasso, but I feel it's too much jargon.
+1 for see also, and maybe later refactor lasso_lars and lasso_cd to support multiple inputs.


sklearn/decomposition/dict_learning.py
tol: float,
Tolerance for the stopping condition.
method: {'lasso_lars', 'lasso_cd'}


GaelVaroquaux Sep 17, 2011

I guess this should be renamed to 'lars' or 'cd'.



agramfort Sep 19, 2011

+1 for 'lars' and 'cd' as only lasso makes sense here.



GaelVaroquaux Sep 17, 2011

I made a couple of comments in the github diff. In addition I made a pull request. Once these are done, I am +1 for merge.



vene Sep 17, 2011

I think I addressed everything. I left the function name dictionary_learning_online; it could also be renamed to minibatch_dictionary_learning or mini_batch_dictionary_learning. Do you think that should be done?



ogrisel Sep 18, 2011

+0 for minibatch_dictionary_learning or online_dictionary_learning. Running the tests / doc right now.



ogrisel Sep 18, 2011

It seems that the new figure in the MiniBatchDictionaryLearning section of the doc is pointing to the wrong image (MiniBatchSparsePCA), and the alignment is weird: the lena patches are centered while the faces decomposition figure below is on the left. I would rather move the faces decomposition figure up, right after the mathematical formulation explanation and before the paragraph on sparse coding and the image denoising application.


alpha: float, 1. by default
If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the
penalty applied to the L1 norm.
If `algorithm='threhold'`, `alpha` is the absolute value of the


agramfort Sep 19, 2011

s/threhold/threshold


threshold below which coefficients will be squashed to zero.
If `algorithm='omp'`, `alpha` is the tolerance parameter: the value of
the reconstruction error targeted. In this case, it overrides
`n_nonzero_coefs`.
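The alpha semantics for the 'threshold' algorithm described in this docstring can be sketched in a few lines of numpy (an illustrative stand-in with an assumed helper name, not the PR's code): correlate each sample with the dictionary atoms, then squash coefficients whose magnitude falls below alpha.

```python
import numpy as np

def threshold_encode(X, dictionary, alpha):
    """Sketch of 'threshold' coding: project the samples onto the atoms,
    then soft-threshold, squashing coefficients below alpha to zero."""
    cov = X @ dictionary.T  # atom correlations, shape (n_samples, n_atoms)
    return np.sign(cov) * np.maximum(np.abs(cov) - alpha, 0.0)

# Toy usage: with an identity dictionary the code is just the
# thresholded input itself.
D = np.eye(3)
X = np.array([[0.2, -1.5, 0.05]])
code = threshold_encode(X, D, alpha=0.1)  # → [[0.1, -1.4, 0.0]]
```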


agramfort Sep 19, 2011

That makes me think: should we add a constraint on the l2 reconstruction error via an omp_rec_error parameter? @ogrisel, @GaelVaroquaux thoughts?

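The stopping rule under discussion, running OMP until the l2 reconstruction error drops below a target rather than fixing the number of atoms, can be sketched as follows. This is a toy implementation with an assumed name (omp_tol), not the PR's solver.

```python
import numpy as np

def omp_tol(x, D, tol, max_atoms=None):
    """Toy Orthogonal Matching Pursuit stopping on reconstruction error.

    D: (n_atoms, n_features) with unit-norm rows; x: (n_features,).
    Stops once ||x - code @ D||^2 <= tol, overriding any atom budget.
    """
    n_atoms = D.shape[0]
    if max_atoms is None:
        max_atoms = n_atoms
    support = []
    code = np.zeros(n_atoms)
    residual = x.copy()
    while residual @ residual > tol and len(support) < max_atoms:
        # Greedy step: pick the atom most correlated with the residual.
        correlations = D @ residual
        best = int(np.argmax(np.abs(correlations)))
        if best in support:
            break  # no further progress possible
        support.append(best)
        # Orthogonal step: jointly re-fit all selected coefficients.
        coef, *_ = np.linalg.lstsq(D[support].T, x, rcond=None)
        code[:] = 0.0
        code[support] = coef
        residual = x - code @ D
    return code
```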

sklearn/decomposition/dict_learning.py
lasso_cd: uses the coordinate descent method to compute the
Lasso solution (linear_model.Lasso). Lars will be faster if
the estimated components are sparse.


agramfort Sep 19, 2011

lars and cd here also


sklearn/decomposition/sparse_pca.py
lars: uses the least angle regression method (linear_model.lars_path)
cd: uses the coordinate descent method to compute the
method: {'lasso_lars', 'lasso_cd'}
lasso_lars: uses the least angle regression method


agramfort Sep 19, 2011

here also lars or cd



vene Sep 19, 2011

How about also algorithm instead of method? or even fit_algorithm for consistency with dictionary learning classes?



ogrisel Sep 19, 2011

Yes, we already had this discussion in the preceding comments. I am still in favor of using "algorithm" pervasively, but that would require updating lars_path and LocallyLinearEmbedding and maybe others.

What do people think? If we do so one should not forget to update the API section of the whats_new.rst doc.



vene Sep 19, 2011

OTOH SparsePCA and MiniBatchSparsePCA can be safely changed now, as they weren't featured in releases, right? Or would you like to wait and change everything at the same time?



GaelVaroquaux Sep 19, 2011

On Mon, Sep 19, 2011 at 02:22:03AM -0700, Olivier Grisel wrote:

What do people think? If we do so one should not forget to update the API section of the whats_new.rst doc.

If we break the API, I'd like a backward-compatible mode for one release (e.g. using **kwargs).

G

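The backward-compatible mode Gael asks for is typically a one-release shim: keep accepting the old keyword through **kwargs, emit a DeprecationWarning, and map it onto the new name. A sketch, with an illustrative function body standing in for the real encoding work:

```python
import warnings

def sparse_encode(X, dictionary, algorithm='lasso_lars', **kwargs):
    """One-release compatibility shim for a renamed keyword.

    The old `method` keyword is still accepted, but it triggers a
    DeprecationWarning and is mapped onto the new `algorithm` name.
    """
    if 'method' in kwargs:
        warnings.warn("'method' was renamed to 'algorithm' and will be "
                      "removed in the next release", DeprecationWarning)
        algorithm = kwargs.pop('method')
    if kwargs:
        # Reject anything else so genuine typos still fail loudly.
        raise TypeError("unexpected keyword arguments: %r" % sorted(kwargs))
    return algorithm  # stand-in for the real encoding work
```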


agramfort Sep 19, 2011

Besides my comments, I would rename plot_img_denoising.py to plot_image_denoising.py.

Also, I'd love to see omp_rec_error added for a better result in plot_img_denoising.py, but I don't want to block the merge.


sklearn/decomposition/dict_learning.py
tol: float,
tolerance for numerical error
fit_algorithm: {'lasso_lars', 'lasso_cd'}


vene Sep 19, 2011

And here too, lars and cd, right? and in the rest of the objects.



ogrisel Sep 19, 2011

Yes, but only where the lasso_ part is mandatory, e.g. not for the transform_algorithm.



ogrisel Sep 19, 2011

@agramfort: I am not familiar with what omp_rec_error is all about, so I agree we should defer that discussion to another pull request.



ogrisel Sep 19, 2011

About the doc, this part of my comment was not addressed: "I would rather move the faces decomposition figure up, right after the mathematical formulation explanation and before the paragraph on sparse coding and image denoising application."

Also I think it should be compared to the PCA output as done for all other methods in this chapter so as to keep the chapter consistent.



ogrisel Sep 19, 2011

Also, when I run the decomp example I often get 2 or 4 atoms that are not white noise, and the non-noisy components look almost duplicated, or one is a near negative of another. Maybe the L1 reg is too strong or the algorithm is not stable on this data for such small dictionaries (6 atoms only; this is far from overcomplete in this regime...).



vene Sep 19, 2011

Addressed now. Sorry about that. You're right this is much better.



vene Sep 19, 2011

The decomp example is also fixed, and the random state has been fixed. Looks more informative now, but also a lot creepier. *shivers*



GaelVaroquaux Sep 19, 2011

So I agree we should defer that discussion to another pull request.

+1



ogrisel Sep 19, 2011

OK, this looks good to me. +1 for merge.



GaelVaroquaux Sep 19, 2011

+1 me too


@ogrisel ogrisel merged commit d56281e into scikit-learn:master Sep 19, 2011


ogrisel Sep 19, 2011

Merged. Thanks again for your work.

