Dictionary learning #221

Merged
merged 190 commits

7 participants

@vene
Owner

Pull request contains: (UPDATED)

BaseDictionaryLearning object implementing transform methods
DictionaryLearning and OnlineDictionaryLearning implementing fit in different ways

Dictionary learning example
Image denoising example
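For reviewers skimming this, a rough usage sketch of the intended API (import path and parameter names follow the current state of the branch and may still change during review):

    import numpy as np
    from scikits.learn.decomposition import DictionaryLearning

    # toy data: 500 samples of 64 features (e.g. flattened 8x8 image patches)
    X = np.random.RandomState(0).randn(500, 64)

    # learn a dictionary of 100 atoms, then sparse-code the same data
    dico = DictionaryLearning(n_atoms=100, alpha=1)
    dico.fit(X)
    code = dico.transform(X)                    # (n_samples, n_atoms)
    X_approx = np.dot(code, dico.components_)   # approximate reconstruction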

@agramfort

Avoid lambdas; they won't pickle.

@agramfort

I would rename D to X (standard in the linear_model module).

@agramfort

The OMP reference is [Mallat 93].

@agramfort

y should be (n_samples, n_targets) so that y[i] is still sample i.

@agramfort

you don't need the \

@agramfort

Doc formatting problem: it should have Parameters and Returns sections. A reference to [Mallat 93] should be added too.

@agramfort

The Returns section should be there too, and the coefficients should be named w for consistency.

vene and others added some commits
@vene vene Renaming, some transposing 0cb2e3c
@vene vene Tests and the refactoring they induce 12fcc75
@vene vene PEP8 7b4cdb0
@vene vene Added signal recovery test 09e17cf
@vene vene rigurous pep8 69ccac9
@vene vene Added the example b3e6f81
@vene vene Cosmetized the example 0264056
@vene vene Added Olivier's patch extractor with enhancements 873256b
@vene vene cleanup 6ee443a
@vene vene Tests for various cases 06067f6
@vene vene PEP8, renaming, removed image size from params f1df50f
@vene vene Merge branch 'sparsepca' into sc 898777a
@vene vene Merge branch 'sparsepca' into sc 1de74af
@vene vene Revert "FIX: update_V without warm restart"
This reverts commit 38235cf.
e1ef751
@vene vene Merge branch 'sparsepca' into sc 2cef229
@ogrisel ogrisel FIX: make the dataset doctest fixture modular 00c976e
@ogrisel ogrisel typo 2c425fa
@larsmans larsmans document placement new in SVMlight reader 7ee358f
@mblondel mblondel Documentation fixes. 7158ee3
@vene vene Initial integration of Orthogonal MP 23704df
@vene vene Renaming, some transposing 170cc2f
@vene vene Tests and the refactoring they induce aa3bd39
@vene vene PEP8 487a8e1
@vene vene Added signal recovery test 6c74b15
@vene vene rigurous pep8 9f5f0c3
@vene vene Added the example 26fad8d
@vene vene Cosmetized the example 37dc0c5
@vene vene Added Olivier's patch extractor with enhancements b8ed9cd
@vene vene cleanup afd860a
@vene vene Tests for various cases dde98c0
@vene vene PEP8, renaming, removed image size from params 057b7f3
@vene vene FIX: weird branching accident 77b6612
@vene vene Revert "FIX: update_V without warm restart"
This reverts commit 38235cf.
c02d81e
@vene vene Revert "Revert "FIX: update_V without warm restart""
This reverts commit a557817.
7aa4926
@vene vene Merge branch 'sc' of github.com:vene/scikit-learn into sc cfd5b34
@vene vene FIX: update_V without warm restart 1598c40
@vene vene Added dictionary learning example 34c6585
@agramfort agramfort ENH : prettify dict learn example on image patches 43acd7f
@agramfort agramfort pep8 ad5b4ca
@vene vene Merge pull request #3 from agramfort/sc
Sc
2a256ff
@vene vene renaming for consistency, tests for PatchExtractor 849ff59
@vene vene Initial shape of dictionary learning object 7af69d7
@vene vene Added DictionaryLearning to __init__.py bf613b0
@vene vene FIX: silly bugs so that the example runs 8550f10
@vene vene ENH: Tweaked the example a bit 620b907
@vene vene PEP8 0c51fba
@agramfort agramfort FIX : using product form utils.fixes for python 2.5 2fd33fd
@agramfort agramfort pep8 badddc8
@agramfort agramfort MISC : fix docstring, cosmit in image.py a9ea545
@agramfort agramfort FIX; missing import in dict_learning.py (OMP in transform in not tested 98f592d
@GaelVaroquaux

I'd like the instantiation of the object to be done in a separate step from the fit, as some people might be confused.

@GaelVaroquaux

seed should be called random_state. That remark applies all throughout the package.

@GaelVaroquaux

There are unused imports in your code. Please run pyflakes on all the files (I use a shell for loop to do this).

@GaelVaroquaux

As discussed by mail, sparse_pca should be turned to a dict_learning function, and work on the transposed problem. Only objects should be exposed as SparsePCA. This will mean that you will need to rename the logging messages.

EDIT: Indeed, after rereading the codebase, there is a clean-up required to make sure that the vocabulary is consistent and the import paths make sense.

@GaelVaroquaux

I have been running this example, and noticing that the results are much more consistent with my expectations when using higher values for max_patches. Once we have the online version working, we will need to get this parameter back up.
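(For anyone re-running the example, the knob in question is the max_patches argument of the patch extraction step; the snippet below is only illustrative and assumes the PatchExtractor API added in this branch, whose exact argument names may still differ.)

    import numpy as np
    from scikits.learn.feature_extraction.image import PatchExtractor

    images = np.random.RandomState(0).rand(10, 64, 64)   # stand-in for real images

    # extract more patches per image than the example currently does
    extractor = PatchExtractor(patch_size=(8, 8), max_patches=500, random_state=0)
    patches = extractor.transform(images)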

@fabianp
Owner

Man! You just created a branch sc on origin; maybe you pushed to the wrong repo?

Owner

haha. yes, happens to me all the time.

I think the fix is something like git push origin :sc (which of course makes a lot of sense ...)

scikits/learn/linear_model/omp.py
((160 lines not shown))
+ Whether to perform precomputations. Improves performance when n_targets
+ or n_samples is very large.
+
+ Returns:
+ --------
+ coef: array of shape: n_features or (n_features, n_targets)
+ Coefficients of the OMP solution
+ """
+ X = np.asanyarray(X)
+ y = np.asanyarray(y)
+ if y.ndim == 1:
+ y = y[:, np.newaxis]
+ if n_atoms == None and eps == None:
+ raise ValueError('OMP needs either a target number of atoms (n_atoms) \
+ or a target residual error (eps)')
+ if eps != None and eps < 0:
@larsmans Owner

If the value must be positive, shouldn't the check be eps <= 0 then? Same question for n_atoms, below.

@vene Owner
vene added a note

About n_atoms you are right, but I think we should accept eps = 0 for when you want perfect reconstruction... do you agree?

@larsmans Owner

I'm not even remotely familiar with dictionary learning, so don't ask for comments on the actual algorithm :)
It was only the exception message that I was concerned about. Looks alright now.
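To make the outcome of this thread concrete, here is a minimal sketch of the agreed-upon checks, using the parameter names from the omp.py diff above (the helper name is made up for illustration):

    def _check_omp_params(n_atoms, eps):
        # n_atoms must be strictly positive, but eps == 0 is allowed:
        # it simply requests perfect reconstruction.
        if n_atoms is None and eps is None:
            raise ValueError('OMP needs either a target number of atoms '
                             '(n_atoms) or a target residual error (eps)')
        if eps is not None and eps < 0:
            raise ValueError('Epsilon cannot be negative')
        if eps is None and n_atoms <= 0:
            raise ValueError('The number of atoms must be positive')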

@GaelVaroquaux Owner
larsmans and others added some commits
@larsmans larsmans Copyedit SparsePCA docs c7365e2
@vene vene Merge pull request #5 from agramfort/sc
Sc
24c3a68
@vene vene Merge branch 'master' into sc 9847371
@vene vene Merge branch 'sparsepca' into sc 1b76e96
@vene vene Merge branch 'sc' of git://github.com/larsmans/scikit-learn into sc c01ea0e
@vene vene Renamed online dict_learning appropriately f68b50a
@vene vene Merge branch 'sparsepca' into sc 61ee071
@vene vene Renaming part three b579582
@vene vene Fixed dico learning example d014aad
@vene vene Merge branch 'sparsepca' into sc b99397a
@vene vene FIX: checks in orthogonal_mp df89fab
@vene vene Cleanup orthogonal_mp docstrings 5c3bafd
@vene vene OMP docs, a little broken for now 484d9bd
@vene vene DOC: omp documentation improved c1e234e
@vene vene DOC: omp documentation fixes a48b00e
@vene vene DOC: dict_learning docs 8fb39dd
@vene vene dictionary learning tests 786ce12
@vene vene Fixed overcomplete case and updated dl example 598aee0
@vene vene online dictionary learning object 2f741ac
@vene vene factored base dico object ff671f5
@vene vene Merge branch 'sparsepca' into sc
Conflicts:
	scikits/learn/decomposition/sparse_pca.py
e3ef711
@vene vene pep8 16ac0ec
@vene vene more transform methods, split_sign db08066
@vene vene OMP dictionary must have normalized columns. 7826f94
@vene vene Merge branch 'master' into sc 8a553f5
@vene vene DOC: improved dict learning docs d08d58b
@vene vene Tweaked the dico example f65dec3
@vene vene exposed dict learning online in init 75773ff
@vene vene working on partial fit b06f121
@vene vene denoising example bad0431
@vene vene Annotate the example 5ae3c91
@vene vene partial fit iteration tracking, test still fails 7b42c64
@vene vene FIX: typo, s/treshold/threshold 3bdc425
@vene vene Tweak denoise example spacing 1fd8277
@vene vene pep8 examples ef41e46
@vene vene pep8 b584bab
@vene vene Merge branch 'sparsepca' into sc c154810
@vene vene Merge branch 'sparsepca' into sc 80b9854
@vene vene random state control 8065cf7
@vene vene Merge branch 'sparsepca' into sc e95d364
@vene vene clarify lasso method param in sc too 78cf3bf
@vene vene s/seed/random_state in patch extractor 61cb912
@vene vene DOC: fixed patch extraction comments cb9dbd0
@vene vene ENH: PatchExtractor transform 767de18
@vene vene d:/progs/Git/s/seed/random_state in dico learning example 346330b
@vene vene d:/progs/Git/s/seed/random_state in denoising example a67b287
@vene vene Merge branch 'sparsepca' into sc 7ec38ba
@vene vene Merge branch 'sparsepca' into sc
Conflicts:
	scikits/learn/feature_extraction/image.py
	scikits/learn/feature_extraction/tests/test_image.py
99a64df
@ogrisel
Owner

@vene can you please merge master into this to update the github diff view with the current state?

@vene
Owner
@vene vene Merge branch 'master' into sc
Conflicts:
	doc/modules/decomposition.rst
	scikits/learn/decomposition/__init__.py
	scikits/learn/decomposition/tests/test_sparse_pca.py
a7fc975
@vene
Owner

How about I make a separate pull request with just the dict_learning_online function and an OnlineSparsePCA or MiniBatchSparsePCA using it, so that it can be merged faster and then used to speed up the face decomposition example?

@ogrisel
Owner

+1 for smaller pull requests as long as they are self-contained (code + test + doc + update of the impacted examples).

@GaelVaroquaux
@ogrisel
Owner

@vene Have you received my pull request for this?

scikits/learn/decomposition/dict_learning.py
((94 lines not shown))
+
+ max_iter: int,
+ maximum number of iterations to perform
+
+ tol: float,
+ tolerance for numerical error
+
+ transform_method: 'lasso_lars' | 'lasso_cd' | 'omp' | 'threshold' |
+ 'triangle'
+ method to use for transforming the data after the dictionary has been
+ learned
+
+ coding_method: 'lars' | 'cd'
+ lars: uses the least angle regression method (linear_model.lars_path)
+ cd: uses the stochastic gradient descent method to compute the
+ lasso solution (linear_model.Lasso)
@mblondel Owner
mblondel added a note

Ok I may be talking nonsense here since I don't know dictionary learning that well but to me, transform_method and coding_method sound like they are doing the same thing (using an existing dictionary to transform data is encoding the data for me). So I would suggest encoding_method and dictionary_learning_method or just learning_method for short. What do you think @agramfort, @vene?

@ogrisel Owner
ogrisel added a note

I would vote dictionary_method used in fit to estimate the dictionary components and coding_method used in transform to find the loadings to encode the input signal according to a given dictionary.

@mblondel Owner
mblondel added a note

Ok +1. My main concern was that transform_method and coding_method really sound like the same thing.

@vene Owner
vene added a note

+1. I don't know if I will update this particular pull request because the git tree might have been messed up from it being fragmented and updated separately. I hope it can be fixed.

@agramfort Owner

coding_method is the algorithm used for the fit, i.e. it should probably be named fit_method, but I am fine with coding_method too. Note that CD is coordinate_descent, which is the same as stochastic gradient descent.

@mblondel Owner
mblondel added a note
@ogrisel Owner
ogrisel added a note

Just coordinate descent. The scikit-learn implementation of CD is purely deterministic.

@vene Owner
vene added a note

How about I simply name them fit_method and transform_method? Wouldn't it be clearer?

@larsmans Owner

"Method" is confusing in an OO setting. I'd prefer fit_algorithm and transform_algorithm.

@vene Owner
vene added a note

This is an excellent point. However, "method" is consistent with lars_path, and possibly other places in the codebase. Of course, using my suggestion, it's confusing because the object actually has fit and transform methods. What should we do?

@larsmans Owner

Various other estimators take algorithm parameters, so something is bound to be inconsistent ;)

@vene Owner
vene added a note

In this case, I am +1 for fit_algorithm and transform_algorithm.

@ogrisel Owner
ogrisel added a note

+1 for this suggestion, and +1 for consistently using the 'algorithm' convention over 'method' throughout the scikit.
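If the rename is adopted, the constructor would look roughly like this (a sketch only; the defaults shown are illustrative, not necessarily what the branch currently uses):

    class DictionaryLearning(object):  # BaseDictionaryLearning in the PR
        def __init__(self, n_atoms, alpha=1, max_iter=1000, tol=1e-8,
                     fit_algorithm='lars',          # was: coding_method
                     transform_algorithm='omp'):    # was: transform_method
            self.n_atoms = n_atoms
            self.alpha = alpha
            self.max_iter = max_iter
            self.tol = tol
            self.fit_algorithm = fit_algorithm
            self.transform_algorithm = transform_algorithm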

scikits/learn/decomposition/dict_learning.py
((196 lines not shown))
+ Parameters
+ ----------
+ X: array-like, shape (n_samples, n_features)
+ Training vector, where n_samples in the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ self: object
+ Returns the object itself
+ """
+ self.fit_transform(X, y, **params)
+ return self
+
+
+class DictionaryLearningOnline(BaseDictionaryLearning):
@mblondel Owner
mblondel added a note

I wonder if we couldn't just implement partial_fit directly in DictionaryLearning. Sure DictionaryLearningOnline contains a fit method for convenience but if data fits in memory, people could just as well use the batch dictionary learning. If you really want to keep the convenience fit method (which repeatedly calls partial_fit) you can also add a constructor parameter to DictionaryLearning to let the user choose. As a bonus point, you'll also get grid searchability.

@ogrisel Owner
ogrisel added a note

I think I would still prefer to have two distinct estimators: the fit method of DictionaryLearning would rely on the batch implementation provided by the dictionary_learning function, while the DictionaryLearningOnline.fit method would be based on the dictionary_learning_online function, which is probably faster but converges to slightly lower-quality minima in general (AFAIK).

@vene Owner
vene added a note

DictionaryLearningOnline, like MiniBatchSparsePCA (actually it's the same algorithm) is significantly faster (we don't have a stopping condition at the moment but with a capped limit of iterations results are comparable, at much higher speed).

The main difference is that MiniBatchSparsePCA cannot implement partial_fit because it's online in the features sense. DictionaryLearningOnline should be able to implement partial_fit, but some kinks need to be ironed out.

Grid searchability would be of no use at the moment since there is no useful way a classifier could be used after the dictionary learning. One of the discussions we will need to have is how to structure a collection of image patches so as not to lose the information about which image each patch belongs to, so that we can do image recognition.
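A rough sketch of the partial_fit workflow under discussion (class name and parameters follow the current branch and may still be renamed):

    import numpy as np
    from scikits.learn.decomposition import DictionaryLearningOnline

    rng = np.random.RandomState(0)
    dico = DictionaryLearningOnline(n_atoms=50, alpha=1)

    # stream the data in chunks; each call updates the dictionary by
    # iterating once over the given mini-batch
    for _ in range(10):
        batch = rng.randn(200, 64)   # in practice: the next chunk of patches
        dico.partial_fit(batch)

    code = dico.transform(batch)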

doc/modules/decomposition.rst
((36 lines not shown))
+
+* Orthogonal matching pursuit (:ref:`OMP`)
+
+* Lasso using least-angle regression (:ref:`least_angle_regression`)
+
+* Lasso using coordinate descent (:ref:`lasso`)
+
+* Thresholding
+
+* Triangle k-means
+
+
+The dictionary learning objects offer, via the `split_code` parameter, the
+possibility to separate the positive and negative values in the results of
+sparse coding. This is useful when dictionary learning is used for
+classification or regression tasks, because it allows the learning algorithm to assign different weights to negative loadings of a particular atom, than to the
@agramfort Owner

pep8
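For context, the split-code trick described in this section can be illustrated in a few lines of numpy (a sketch of the idea, not the actual implementation):

    import numpy as np

    code = np.array([[0.5, -1.2, 0.0],
                     [-0.3, 0.8, 2.0]])

    # stack the positive and negative parts as separate features so that a
    # downstream linear model can weight them independently
    split_code = np.hstack([np.maximum(code, 0), np.maximum(-code, 0)])
    # shape goes from (n_samples, n_atoms) to (n_samples, 2 * n_atoms)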

@agramfort
Owner

@vene : could you prepare a clean PR now that OMP is merged?

@ogrisel
Owner

@vene : also what is the status of the online sparse pca branch with the decomposition of the faces in the documentation? I cannot find the PR any more but the current master does not include this work.

@vene
Owner

@agramfort, I'm working on merging master into this; lots of conflicts. I'm very unsatisfied with how I managed this code. I clearly need to learn a better git workflow.

@ogrisel: it's here #270, you were probably looking for "online" in the title :)

MiniBatchSparsePCA is the first priority for reviewing, followed by the new image patch extraction pull request that I will put together as soon as I can manage. Then all that will remain out of this particular pull request is the documentation, the examples and the dictionary learning estimator framework. After this we shall proceed to KMeansCoder. Sorry for all the confusion everyone; I let a branch grow too large and messed up the lineage somewhere along the way, so I can't easily cherry-pick stuff out.

@vene vene Merge branch 'master' into sc
Conflicts:
	doc/modules/linear_model.rst
	examples/linear_model/plot_omp.py
	scikits/learn/linear_model/__init__.py
	scikits/learn/linear_model/omp.py
	scikits/learn/linear_model/tests/test_omp.py
d2cbaaa
@ogrisel
Owner

Indeed: @GaelVaroquaux and @agramfort can you please voice your opinion on:

#270 (comment)

I think that once this is cleared the MiniBatchSparsePCA PR can be merged. Also, for later, you should work on versions of NMF and SparsePCA / Dictionary Learning that work on scipy.sparse inputs. I think you can work on this before or after we review the KMeansCoder stuff, as you wish, but before you start the matrix completion work we discussed on the ML.

NMF on text data sounds like a very promising way to find "sparse" and "additive" topics for semantically describing text documents:

https://www.quora.com/Has-anyone-tried-Spectral-Clustering-on-high-dimensional-sparse-data-such-as-TF-IDF-text-features

@vene
Owner
@vene
Owner

Diff looks ok, though the commit history is ugly. As soon as MiniBatchSparsePCA and PatchExtractor are merged, this pull request will be of reviewable size. I'll try to clean up its history when we get there.

@GaelVaroquaux
vene added some commits
@vene vene Merge branch 'master' into sc
Conflicts:
	scikits/learn/feature_extraction/image.py
	scikits/learn/feature_extraction/tests/test_image.py
5d1429e
@vene vene Accidentally removed a line in a test 4e1406a
@vene vene pep8 in doc 126e09f
@vene vene rename coding_method, transform_method to fit/transform_algorithm 8ec8583
@vene vene fix broken test b8c330d
@vene vene Merge branch 'master' into sc
Conflicts:
	scikits/learn/decomposition/__init__.py
	scikits/learn/decomposition/sparse_pca.py
	scikits/learn/decomposition/tests/test_sparse_pca.py
d68ebfd
@vene vene duplicated import ef7fb1c
@vene vene FIX: denoise example was broken 01e3b5a
@vene
Owner

This pull request is next on the list. Do you think that this pull request is manageable? Should I try an interactive rebase to clean up its messed up history?

After establishing this, I will shortly present what is to be reviewed about the current code here.

@GaelVaroquaux
@vene
Owner
@GaelVaroquaux
@vene
Owner

Number 2 in the comment above is now fixed! All that remains is to discuss the API. Basically this means the transform function implemented by BaseDictionaryLearning.

@GaelVaroquaux

I know I keep coming up with new comments :$, but you could simply set vmin=-.5, vmax=.5 in the imshow of the difference: -1, 1 is never achieved, and it kills the visual contrast.

In addition, I wonder if giving the norm of the difference in the title would be useful: it would enable comparison on a numerical basis.

Finally, I think that you should explain a bit more what you are doing in the docstring, and comment on the results. For instance, the fact that lars (l1-penalized regression) induces a bias in the coefficients can be seen in the difference image, which is reminiscent of the local intensity values.

@vene
Owner

I think this is getting close to merge-quality. Would anybody care to take a look?

@GaelVaroquaux
@agramfort
Owner

What if you used a stopping condition on the norm of the residual rather than hard-coding the number of atoms when using OMP? Since you know the noise level, it should be easy to set.

Otherwise the code looks cleaner and the API is now simple.

I would reference the JMLR paper for the online Dict learning.

Pyflakes report :

scikits/learn/decomposition/dict_learning.py:22
'euclidean_distances' imported but unused
scikits/learn/decomposition/dict_learning.py:92
local variable 'overwrite_gram' is assigned to but never used
scikits/learn/decomposition/dict_learning.py:95
local variable 'overwrite_cov' is assigned to but never used

scikits/learn/decomposition/tests/test_dict_learning.py:4
'SkipTest' imported but unused
scikits/learn/decomposition/tests/test_dict_learning.py:6
'make_sparse_coded_signal' imported but unused

we're almost there ! :)

@ogrisel
Owner

+1 for a stopping criterion on the scaled residuals rather than hard coding the number of iterations.

@ogrisel
Owner

Actually, using the scaled norm of the change of the dictionary weights is probably more stable than the residuals (as we do in coordinate descent).

@agramfort
Owner
@ogrisel
Owner

The issue is that you don't have access to the complete dataset in the online / minibatch setting. Hence the use of the change on the dictionary weights as a measure of convergence.

@vene
Owner
scikits/learn/decomposition/dict_learning.py
((899 lines not shown))
+ Pseudo number generator state used for random sampling.
+
+ Attributes
+ ----------
+ components_: array, [n_atoms, n_features]
+ components extracted from the data
+
+ References
+ ----------
+ J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning
+ for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
+
+
+ See also
+ --------
+ SparsePCA
@ogrisel Owner
ogrisel added a note

Explain here the relationship between dictionary learning and sparse PCA (i.e. that one is solving the transposed problem of the other).

@vene Owner
vene added a note
@ogrisel Owner
ogrisel added a note

I don't know, just grep on the existing doc and copy one that looks good :)

@vene
Owner

I have merged the denoising enhancements from Alex. I've been toying with an exponentially weighted average tracking the amount of change in the dictionary, and it seems to get pretty close to zero for real data (digits), but for a randn array I found it oscillating around 50. We could have a max_iter param and warn that convergence was not reached (toying with the value of alpha can improve results, etc.). Do you think we should put this in now or after merging?
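A minimal sketch of the kind of tracking described above (assuming access to the dictionary before and after each update; the function name and smoothing factor are purely illustrative):

    import numpy as np

    def dictionary_change_ewma(dict_old, dict_new, ewma=None, smoothing=0.1):
        # exponentially weighted average of the relative dictionary change;
        # a small value suggests the dictionary has stopped moving, while on
        # structure-free (random) data it may plateau at a large value
        change = np.linalg.norm(dict_new - dict_old) / np.linalg.norm(dict_old)
        if ewma is None:
            return change
        return (1 - smoothing) * ewma + smoothing * change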

@agramfort
Owner

A stopping criterion is hard for general problems, especially non-convex ones like this. Also, it's frequent to see methods converge faster on real data where structure is present. Let's add this after merging. What I'd really like to see added is the constraint on the residual for the denoising: as you know the noise variance, you should do better than fixing the number of atoms, which could then be data dependent. This should improve the denoising result.
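To make the residual-based suggestion concrete: instead of a fixed number of atoms, the denoising step could target a residual derived from the known noise level, roughly as below. This is a sketch; it assumes orthogonal_mp is exposed in scikits.learn.linear_model with the n_atoms/eps signature from this branch, and the exact eps scaling (norm versus squared norm) should be checked against the final docstring.

    import numpy as np
    from scikits.learn.linear_model import orthogonal_mp

    rng = np.random.RandomState(0)
    n_pixels, n_atoms, n_patches = 64, 100, 500
    dictionary = rng.randn(n_pixels, n_atoms)
    dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))   # normalized atoms
    noisy_patches = rng.randn(n_pixels, n_patches)          # columns are patches
    sigma = 0.1                                             # known noise std

    # target a reconstruction error of the order of the noise energy per patch
    code = orthogonal_mp(dictionary, noisy_patches, eps=n_pixels * sigma ** 2)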

@ogrisel ogrisel commented on the diff
doc/modules/decomposition.rst
@@ -347,3 +347,105 @@ of the data.
matrix factorization"
<http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf>`_
C. Boutsidis, E. Gallopoulos, 2008
+
+
+
+.. _DictionaryLearning:
+
+Dictionary Learning
+===================
+
+Generic dictionary learning
+-------------------------
@ogrisel Owner
ogrisel added a note

missing 2 -

@ogrisel ogrisel commented on the diff
doc/modules/decomposition.rst
((90 lines not shown))
+--------------------------
+
+:class:`DictionaryLearningOnline` implements a faster, but less accurate
+version of the dictionary learning algorithm that is better suited for large
+datasets.
+
+By default, :class:`DictionaryLearningOnline` divides the data into
+mini-batches and optimizes in an online manner by cycling over the mini-batches
+for the specified number of iterations. However, at the moment it does not
+implement a stopping condition.
+
+The estimator also implements `partial_fit`, which updates the dictionary by
+iterating only once over a mini-batch. This can be used for online learning
+when the data is not readily available from the start, or for when the data
+does not fit into the memory.
+
@ogrisel Owner
ogrisel added a note

Could you please update the face decomposition example to include OnlineDictionaryLearning and insert the matching plots as figure here?

@ogrisel Owner
ogrisel added a note

It would also be a good way to tell that {Online}SparsePCA and {Online}DictionaryLearning are using the same underlying implementation but one is putting the sparse penalty on the dictionary atoms while the other is putting it on the dictionary loadings.

@ogrisel
Owner

Ok for putting the stopping criterion after merging but we should not forget about it.

@vene
Owner

OK is this ready for merging?

@ogrisel
Owner

The documentation on dictionary learning should have the math formula of the objective function (as for sparse PCA). Also I would move the DL section right after the section on SparsePCA to make it clear that this is the same algo, but in one case we put the sparsity penalty on the dictionary components while in the other case on the dictionary loadings (a.k.a. the sparse code).

Maybe you should also include the figure obtained from the decomposition of the faces using DL (but they are arguably less interesting than the Sparse PCA ones since they look like the originals: maybe alpha is too strong?).
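For reference, the objective in question, using the same notation as the SparsePCA docs (U the sparse code / loadings, V the dictionary), would be something along these lines:

    .. math::
       (U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} \frac{1}{2}
                    \|X - U V\|_2^2 + \alpha \|U\|_1 \\
                    \text{subject to\,} \|V_k\|_2 = 1
                    \text{ for all } 0 \leq k < n_{atoms}

Sparse PCA solves the transposed problem, i.e. the l1 penalty is put on V (the components) instead of on U.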

@ogrisel
Owner

Also maybe the class DictionaryLearningOnline should be renamed to MiniBatchDictionaryLearning for consistency with MiniBatchKMeans and MiniBatchSparsePCA. WDYT?

@vene
Owner
@ogrisel
Owner

+1 for renaming everything pseudo-online to MiniBatch[Class] even if in the Sparse PCA case it's not in the usual n_samples axis and does not have a partial_fit method.

vene and others added some commits
@vene vene Merge branch 'master' into sc ca92354
@vene vene Merge branch 'vene-sc' of git://github.com/ogrisel/scikit-learn into sc 5841984
@GaelVaroquaux GaelVaroquaux DOC: larger lena size in denoising example
Large size work better because they give a better training set to the
dictionary_learning algorithm. This is a tradeoff between computation
time and quality of example
f0b0cfe
@GaelVaroquaux GaelVaroquaux commented on the diff
sklearn/decomposition/dict_learning.py
((149 lines not shown))
+ new_code = np.sign(cov) * np.maximum(np.abs(cov) - alpha, 0)
+
+ elif algorithm == 'omp':
+ if n_nonzero_coefs is None and alpha is None:
+ n_nonzero_coefs = n_features / 10
+ norms_squared = np.sum((Y ** 2), axis=0)
+ new_code = orthogonal_mp_gram(gram, cov, n_nonzero_coefs, alpha,
+ norms_squared, overwrite_Xy=overwrite_cov
+ )
+ else:
+ raise NotImplemented('Sparse coding method %s not implemented' %
+ algorithm)
+ return new_code
+
+
+def sparse_encode_parallel(X, Y, gram=None, cov=None, algorithm='lasso_lars',
@GaelVaroquaux Owner

I believe that this should be moved to linear_models and renamed something like 'multivariate_lasso'. I am worried that in the current situation, people starting from the lasso solver will not find it.

@ogrisel Owner
ogrisel added a note

I don't think we should call it "multivariate lasso" as this is not restricted to lasso but also works for OMP and simple thresholding too. I find the current function name much more explicit.

@vene Owner
vene added a note

We could add See alsos?

@agramfort Owner

More multitask_lasso than multivariate_lasso, but I feel it's too much jargon.
+1 for see also, and maybe later refactor lasso_lars and lasso_cd to support multiple inputs.

sklearn/decomposition/dict_learning.py
((342 lines not shown))
+ X: array of shape (n_samples, n_features)
+ Data matrix.
+
+ n_atoms: int,
+ Number of dictionary atoms to extract.
+
+ alpha: int,
+ Sparsity controlling parameter.
+
+ max_iter: int,
+ Maximum number of iterations to perform.
+
+ tol: float,
+ Tolerance for the stopping condition.
+
+ method: {'lasso_lars', 'lasso_cd'}
@GaelVaroquaux Owner

I guess this should be renamed to 'lars' or 'cd'.

@agramfort Owner

+1 for 'lars' and 'cd' as only lasso makes sense here.

@GaelVaroquaux

I made a couple of comments in the github diff. In addition I made a pull request. Once these are done, I am +1 for merge.

@vene
Owner

I think I addressed everything. I left the function name dictionary_learning_online; it could also be renamed minibatch_dictionary_learning or mini_batch_dictionary_learning. Do you think that should be done?

@ogrisel
Owner

+0 for minibatch_dictionary_learning or online_dictionary_learning. Running the tests / doc right now.

@ogrisel
Owner

It seems that the new figure in the MiniBatchDictionaryLearning section of the doc is pointing to the wrong image (MiniBatchSparsePCA), and the alignment is weird: the lena patches are centered and the figure for the faces decomposition below is on the left. I would rather move the faces decomposition figure up, right after the mathematical formulation explanation and before the paragraph on sparse coding and the image denoising application.

@agramfort agramfort commented on the diff
sklearn/decomposition/dict_learning.py
((48 lines not shown))
+ lasso_cd: uses the coordinate descent method to compute the
+ Lasso solution (linear_model.Lasso). lasso_lars will be faster if
+ the estimated components are sparse.
+ omp: uses orthogonal matching pursuit to estimate the sparse solution
+ threshold: squashes to zero all coefficients less than alpha from
+ the projection X.T * Y
+
+ n_nonzero_coefs: int, 0.1 * n_features by default
+ Number of nonzero coefficients to target in each column of the
+ solution. This is only used by `algorithm='lars'` and `algorithm='omp'`
+ and is overridden by `alpha` in the `omp` case.
+
+ alpha: float, 1. by default
+ If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the
+ penalty applied to the L1 norm.
+ If `algorithm='threhold'`, `alpha` is the absolute value of the
@agramfort Owner

s/threhold/threshold

@agramfort agramfort commented on the diff
sklearn/decomposition/dict_learning.py
((52 lines not shown))
+ threshold: squashes to zero all coefficients less than alpha from
+ the projection X.T * Y
+
+ n_nonzero_coefs: int, 0.1 * n_features by default
+ Number of nonzero coefficients to target in each column of the
+ solution. This is only used by `algorithm='lars'` and `algorithm='omp'`
+ and is overridden by `alpha` in the `omp` case.
+
+ alpha: float, 1. by default
+ If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the
+ penalty applied to the L1 norm.
+ If `algorithm='threhold'`, `alpha` is the absolute value of the
+ threshold below which coefficients will be squashed to zero.
+ If `algorithm='omp'`, `alpha` is the tolerance parameter: the value of
+ the reconstruction error targeted. In this case, it overrides
+ `n_nonzero_coefs`.
@agramfort Owner

That makes me think that we could add a constraint on the l2 reconstruction error, e.g. an omp_rec_error parameter? @ogrisel, @GaelVaroquaux, thoughts?

sklearn/decomposition/dict_learning.py
((518 lines not shown))
+ verbose:
+ degree of output the procedure will print
+
+ shuffle: boolean,
+ whether to shuffle the data before splitting it in batches
+
+ n_jobs: int,
+ number of parallel jobs to run, or -1 to autodetect.
+
+ method: {'lasso_lars', 'lasso_cd'}
+ lasso_lars: uses the least angle regression method
+ (linear_model.lars_path)
+ lasso_cd: uses the coordinate descent method to compute the
+ Lasso solution (linear_model.Lasso). Lars will be faster if
+ the estimated components are sparse.
+
@agramfort Owner

lars and cd here also

sklearn/decomposition/sparse_pca.py
@@ -553,9 +36,10 @@ class SparsePCA(BaseEstimator, TransformerMixin):
tol: float,
Tolerance for the stopping condition.
- method: {'lars', 'cd'}
- lars: uses the least angle regression method (linear_model.lars_path)
- cd: uses the coordinate descent method to compute the
+ method: {'lasso_lars', 'lasso_cd'}
+ lasso_lars: uses the least angle regression method
@agramfort Owner

here also lars or cd

@vene Owner
vene added a note

How about also using algorithm instead of method? Or even fit_algorithm, for consistency with the dictionary learning classes?

@ogrisel Owner
ogrisel added a note

Yes, we already had this discussion in the previous comments. I am still in favor of using "algorithm" pervasively, but that would require updating lars_path and LocallyLinearEmbedding and maybe others.

What do people think? If we do so one should not forget to update the API section of the whats_new.rst doc.

@vene Owner
vene added a note
@GaelVaroquaux Owner
@agramfort
Owner

Besides my comments, I would rename plot_img_denoising.py to plot_image_denoising.py.

Also, I'd love to see omp_rec_error added for a better result in plot_img_denoising.py, but I don't want to block the merge.

sklearn/decomposition/dict_learning.py
((703 lines not shown))
+
+ Parameters
+ ----------
+ n_atoms: int,
+ number of dictionary elements to extract
+
+ alpha: int,
+ sparsity controlling parameter
+
+ max_iter: int,
+ maximum number of iterations to perform
+
+ tol: float,
+ tolerance for numerical error
+
+ fit_algorithm: {'lasso_lars', 'lasso_cd'}
@vene Owner
vene added a note

And here too, lars and cd, right? and in the rest of the objects.

@ogrisel Owner
ogrisel added a note

Yes, except where the lasso_ part is mandatory, e.g. not for transform_algorithm.

@ogrisel
Owner

@agramfort: I am not familiar with what omp_rec_error is all about, so I agree we should defer that discussion to another pull request.

@ogrisel
Owner

About the doc, this part of my comment was not addressed: "I would rather move the faces decomposition figure up, right after the mathematical formulation explanation and before the paragraph on sparse coding and image denoising application."

Also I think it should be compared to the PCA output as done for all other methods in this chapter so as to keep the chapter consistent.

@ogrisel
Owner

Also, when I run the decomposition example I often have 2 or 4 atoms that are not white noise, and the non-noisy components look almost duplicated, or one is a near negative of another. Maybe the L1 regularization is too strong, or the algorithm is not stable on this data for such small dictionaries (only 6 atoms; this is far from overcomplete in this regime...).

@vene
Owner
@vene
Owner
@GaelVaroquaux
@ogrisel
Owner

Ok this looks good for me. +1 for merge.

@GaelVaroquaux
@ogrisel ogrisel merged commit d56281e into scikit-learn:master
@ogrisel
Owner

Merged. Thanks again for your work.
