
Dictionary learning #221

Merged
merged 190 commits over 2 years ago

7 participants

Vlad Niculae Gael Varoquaux Olivier Grisel Alexandre Gramfort Lars Buitinck Mathieu Blondel Fabian Pedregosa
Vlad Niculae
Owner
vene commented June 25, 2011

This pull request contains (UPDATED):

BaseDictionaryLearning object implementing transform methods
DictionaryLearning and OnlineDictionaryLearning implementing fit in different ways

Dictionary learning example
Image denoising example

Alexandre Gramfort

Avoid lambdas; they won't pickle.
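
For reference, a minimal sketch of the issue (pickle serializes module-level functions by reference to their name, which lambdas don't have):

    import pickle

    square = lambda x: x ** 2

    def square_fn(x):
        return x ** 2

    pickle.dumps(square_fn)   # fine: a named module-level function pickles by name
    try:
        pickle.dumps(square)  # fails: a lambda has no importable name
    except pickle.PicklingError as exc:
        print('cannot pickle a lambda: %s' % exc)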

Alexandre Gramfort

I would rename D to X (standard in the linear_model module).

Alexandre Gramfort

The OMP reference is [Mallat 93].

Alexandre Gramfort

y should be (n_samples, n_targets) so that y[i] is still sample i.

Alexandre Gramfort

You don't need the \ here.

Alexandre Gramfort

Docstring formatting problem: it should have Parameters and Returns sections. A reference to [Mallat 93] should be added too.

Alexandre Gramfort

The Returns section should be there too, and the coefficients should be named w for consistency.

and others added some commits June 15, 2011
Vlad Niculae Renaming, some transposing 0cb2e3c
Vlad Niculae Tests and the refactoring they induce 12fcc75
Vlad Niculae PEP8 7b4cdb0
Vlad Niculae Added signal recovery test 09e17cf
Vlad Niculae rigurous pep8 69ccac9
Vlad Niculae Added the example b3e6f81
Vlad Niculae Cosmetized the example 0264056
Vlad Niculae Added Olivier's patch extractor with enhancements 873256b
Vlad Niculae cleanup 6ee443a
Vlad Niculae Tests for various cases 06067f6
Vlad Niculae PEP8, renaming, removed image size from params f1df50f
Vlad Niculae Merge branch 'sparsepca' into sc 898777a
Vlad Niculae Merge branch 'sparsepca' into sc 1de74af
Vlad Niculae Revert "FIX: update_V without warm restart"
This reverts commit 38235cf.
e1ef751
Vlad Niculae Merge branch 'sparsepca' into sc 2cef229
Olivier Grisel FIX: make the dataset doctest fixture modular 00c976e
Olivier Grisel typo 2c425fa
Lars Buitinck document placement new in SVMlight reader 7ee358f
Mathieu Blondel Documentation fixes. 7158ee3
Vlad Niculae Initial integration of Orthogonal MP 23704df
Vlad Niculae Renaming, some transposing 170cc2f
Vlad Niculae Tests and the refactoring they induce aa3bd39
Vlad Niculae PEP8 487a8e1
Vlad Niculae Added signal recovery test 6c74b15
Vlad Niculae rigurous pep8 9f5f0c3
Vlad Niculae Added the example 26fad8d
Vlad Niculae Cosmetized the example 37dc0c5
Vlad Niculae Added Olivier's patch extractor with enhancements b8ed9cd
Vlad Niculae cleanup afd860a
Vlad Niculae Tests for various cases dde98c0
Vlad Niculae PEP8, renaming, removed image size from params 057b7f3
Vlad Niculae FIX: weird branching accident 77b6612
Vlad Niculae Revert "FIX: update_V without warm restart"
This reverts commit 38235cf.
c02d81e
Vlad Niculae Revert "Revert "FIX: update_V without warm restart""
This reverts commit a557817.
7aa4926
Vlad Niculae Merge branch 'sc' of github.com:vene/scikit-learn into sc cfd5b34
Vlad Niculae FIX: update_V without warm restart 1598c40
Vlad Niculae Added dictionary learning example 34c6585
Alexandre Gramfort ENH : prettify dict learn example on image patches 43acd7f
Alexandre Gramfort pep8 ad5b4ca
Vlad Niculae Merge pull request #3 from agramfort/sc
Sc
2a256ff
Vlad Niculae renaming for consistency, tests for PatchExtractor 849ff59
Vlad Niculae Initial shape of dictionary learning object 7af69d7
Vlad Niculae Added DictionaryLearning to __init__.py bf613b0
Vlad Niculae FIX: silly bugs so that the example runs 8550f10
Vlad Niculae ENH: Tweaked the example a bit 620b907
Vlad Niculae PEP8 0c51fba
Alexandre Gramfort FIX : using product form utils.fixes for python 2.5 2fd33fd
Alexandre Gramfort pep8 badddc8
Alexandre Gramfort MISC : fix docstring, cosmit in image.py a9ea545
Alexandre Gramfort FIX; missing import in dict_learning.py (OMP in transform in not tested 98f592d
Gael Varoquaux

I'd like the instantiation of the object to be done in a separate step from the fit, as some people might be confused.

Gael Varoquaux

seed should be called random_state. That remark applies throughout the package.

Gael Varoquaux
Owner

There are unused imports in your code. Please run pyflakes on all the files (I use a shell for loop to do this).

Gael Varoquaux
Owner

As discussed by mail, sparse_pca should be turned into a dict_learning function that works on the transposed problem, and only the SparsePCA object should be exposed. This will mean that you will need to rename the logging messages.

EDIT: Indeed, after rereading the codebase, a clean-up is required to make sure that the vocabulary is consistent and the import paths make sense.

Gael Varoquaux

I have been running this example, and noticed that the results are much more consistent with my expectations when using higher values for max_patches. Once we have the online version working, we will need to bring this parameter back up.

Fabian Pedregosa
Owner

Man! You just created a branch sc on origin; maybe you pushed to the wrong repo?

Owner

Haha, yes, happens to me all the time.

I think the fix is something like git push origin :sc (which of course makes a lot of sense ...)

scikits/learn/linear_model/omp.py
((160 lines not shown))
+        Whether to perform precomputations. Improves performance when n_targets
+        or n_samples is very large.
+
+    Returns:
+    --------
+    coef: array of shape: n_features or (n_features, n_targets)
+        Coefficients of the OMP solution
+    """
+    X = np.asanyarray(X)
+    y = np.asanyarray(y)
+    if y.ndim == 1:
+        y = y[:, np.newaxis]
+    if n_atoms == None and eps == None:
+        raise ValueError('OMP needs either a target number of atoms (n_atoms) \
+                         or a target residual error (eps)')
+    if eps != None and eps < 0:
Lars Buitinck Owner
larsmans added a note June 26, 2011

If the value must be positive, shouldn't the check be eps <= 0 then? Same question for n_atoms, below.

Vlad Niculae Owner
vene added a note June 28, 2011

About n_atoms you are right, but I think we should accept eps = 0 for when you want perfect reconstruction... do you agree?

Lars Buitinck Owner
larsmans added a note June 28, 2011

I'm not even remotely familiar with dictionary learning, so don't ask for comments on the actual algorithm :)
It was only the exception message that I was concerned about. Looks alright now.
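
For reference, a hedged reconstruction of the checks as agreed in this exchange (`_check_omp_params` is an illustrative helper, not code from the diff; note `is None` rather than `== None`, and implicit string concatenation instead of the `\` continuation flagged above):

    def _check_omp_params(n_atoms, eps):
        if n_atoms is None and eps is None:
            raise ValueError('OMP needs either a target number of atoms '
                             '(n_atoms) or a target residual error (eps)')
        if eps is not None and eps < 0:
            # eps == 0 is deliberately allowed: it requests perfect reconstruction
            raise ValueError('Epsilon cannot be negative')
        if eps is None and n_atoms <= 0:
            raise ValueError('The number of atoms must be positive')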

and others added some commits June 26, 2011
Lars Buitinck Copyedit SparsePCA docs c7365e2
Vlad Niculae Merge pull request #5 from agramfort/sc
Sc
24c3a68
Vlad Niculae Merge branch 'master' into sc 9847371
Vlad Niculae Merge branch 'sparsepca' into sc 1b76e96
Vlad Niculae Merge branch 'sc' of git://github.com/larsmans/scikit-learn into sc c01ea0e
Vlad Niculae Renamed online dict_learning appropriately f68b50a
Vlad Niculae Merge branch 'sparsepca' into sc 61ee071
Vlad Niculae Renaming part three b579582
Vlad Niculae Fixed dico learning example d014aad
Vlad Niculae Merge branch 'sparsepca' into sc b99397a
Vlad Niculae FIX: checks in orthogonal_mp df89fab
Vlad Niculae Cleanup orthogonal_mp docstrings 5c3bafd
Vlad Niculae OMP docs, a little broken for now 484d9bd
Vlad Niculae DOC: omp documentation improved c1e234e
Vlad Niculae DOC: omp documentation fixes a48b00e
Vlad Niculae DOC: dict_learning docs 8fb39dd
Vlad Niculae dictionary learning tests 786ce12
Vlad Niculae Fixed overcomplete case and updated dl example 598aee0
Vlad Niculae online dictionary learning object 2f741ac
Vlad Niculae factored base dico object ff671f5
Vlad Niculae Merge branch 'sparsepca' into sc
Conflicts:
	scikits/learn/decomposition/sparse_pca.py
e3ef711
Vlad Niculae pep8 16ac0ec
Vlad Niculae more transform methods, split_sign db08066
Vlad Niculae OMP dictionary must have normalized columns. 7826f94
Vlad Niculae Merge branch 'master' into sc 8a553f5
Vlad Niculae DOC: improved dict learning docs d08d58b
Vlad Niculae Tweaked the dico example f65dec3
Vlad Niculae exposed dict learning online in init 75773ff
Vlad Niculae working on partial fit b06f121
Vlad Niculae denoising example bad0431
Vlad Niculae Annotate the example 5ae3c91
Vlad Niculae partial fit iteration tracking, test still fails 7b42c64
Vlad Niculae FIX: typo, s/treshold/threshold 3bdc425
Vlad Niculae Tweak denoise example spacing 1fd8277
Vlad Niculae pep8 examples ef41e46
Vlad Niculae pep8 b584bab
Vlad Niculae Merge branch 'sparsepca' into sc c154810
Vlad Niculae Merge branch 'sparsepca' into sc 80b9854
Vlad Niculae random state control 8065cf7
Vlad Niculae Merge branch 'sparsepca' into sc e95d364
Vlad Niculae clarify lasso method param in sc too 78cf3bf
Vlad Niculae s/seed/random_state in patch extractor 61cb912
Vlad Niculae DOC: fixed patch extraction comments cb9dbd0
Vlad Niculae ENH: PatchExtractor transform 767de18
Vlad Niculae d:/progs/Git/s/seed/random_state in dico learning example 346330b
Vlad Niculae d:/progs/Git/s/seed/random_state in denoising example a67b287
Vlad Niculae Merge branch 'sparsepca' into sc 7ec38ba
Vlad Niculae Merge branch 'sparsepca' into sc
Conflicts:
	scikits/learn/feature_extraction/image.py
	scikits/learn/feature_extraction/tests/test_image.py
99a64df
Olivier Grisel
Owner

@vene can you please merge master into this to update the GitHub diff view with the current state?

Vlad Niculae
Owner
vene commented July 20, 2011
Vlad Niculae Merge branch 'master' into sc
Conflicts:
	doc/modules/decomposition.rst
	scikits/learn/decomposition/__init__.py
	scikits/learn/decomposition/tests/test_sparse_pca.py
a7fc975
Vlad Niculae
Owner
vene commented July 21, 2011

How about I make a separate pull request with just the dict_learning_online function and an OnlineSparsePCA or MiniBatchSparsePCA using it, so that it can be merged faster and then used to speed up the face decomposition example?

Olivier Grisel
Owner

+1 for smaller pull requests as long as they are self-contained (code + test + doc + update of the impacted examples).

Olivier Grisel
Owner

@vene Have you received my pull request for this?

scikits/learn/decomposition/dict_learning.py
((94 lines not shown))
+
+    max_iter: int,
+        maximum number of iterations to perform
+
+    tol: float,
+        tolerance for numerical error
+
+    transform_method: 'lasso_lars' | 'lasso_cd' | 'omp' | 'threshold' |
+                      'triangle'
+        method to use for transforming the data after the dictionary has been
+        learned
+
+    coding_method: 'lars' | 'cd'
+        lars: uses the least angle regression method (linear_model.lars_path)
+        cd: uses the stochastic gradient descent method to compute the
+            lasso solution (linear_model.Lasso)
Mathieu Blondel Owner
mblondel added a note August 06, 2011

Ok, I may be talking nonsense here since I don't know dictionary learning that well, but to me transform_method and coding_method sound like they are doing the same thing (using an existing dictionary to transform data is encoding the data, to me). So I would suggest encoding_method and dictionary_learning_method, or just learning_method for short. What do you think, @agramfort, @vene?

Olivier Grisel Owner
ogrisel added a note August 06, 2011

I would vote for dictionary_method, used in fit to estimate the dictionary components, and coding_method, used in transform to find the loadings that encode the input signal according to a given dictionary.

Mathieu Blondel Owner
mblondel added a note August 06, 2011

Ok +1. My main concern was that transform_method and coding_method really sound like the same thing.

Vlad Niculae Owner
vene added a note August 07, 2011

+1. I don't know if I will update this particular pull request because the git tree might have been messed up from it being fragmented and updated separately. I hope it can be fixed.

Alexandre Gramfort Owner
agramfort added a note August 07, 2011

coding_method is the algorithm used for the fit, i.e. it should probably be named fit_method, but I am fine with coding_method also. Note that CD is coordinate descent, which is not the same as stochastic gradient descent.

Olivier Grisel Owner
ogrisel added a note August 08, 2011

Just coordinate descent. The scikit-learn implementation of CD is purely deterministic.

Vlad Niculae Owner
vene added a note August 10, 2011

How about I simply name them fit_method and transform_method? Wouldn't it be clearer?

Lars Buitinck Owner
larsmans added a note August 10, 2011

"Method" is confusing in an OO setting. I'd prefer fit_algorithm and transform_algorithm.

Vlad Niculae Owner
vene added a note August 10, 2011

This is an excellent point. However, "method" is consistent with lars_path, and possibly other places in the codebase. Of course, with my suggestion it's confusing because the object actually has fit and transform methods. What should we do?

Lars Buitinck Owner
larsmans added a note August 10, 2011

Various other estimators take algorithm parameters, so something is bound to be inconsistent ;)

Vlad Niculae Owner
vene added a note August 10, 2011

In this case, I am +1 for fit_algorithm and transform_algorithm.

Olivier Grisel Owner
ogrisel added a note August 11, 2011

+1 for this suggestion, and +1 for consistently using the 'algorithm' convention over 'method' in the scikit.
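
To make the outcome concrete, a hypothetical skeleton with the agreed names (the defaults are placeholders, not the merged API):

    class DictionaryLearning(object):
        """Hypothetical signature reflecting the agreed naming."""

        def __init__(self, n_atoms, fit_algorithm='lars',
                     transform_algorithm='omp'):
            # fit_algorithm: solver used while learning the dictionary (fit)
            # transform_algorithm: sparse coder applied to new data (transform)
            self.n_atoms = n_atoms
            self.fit_algorithm = fit_algorithm
            self.transform_algorithm = transform_algorithm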

scikits/learn/decomposition/dict_learning.py
((196 lines not shown))
  196
+        Parameters
  197
+        ----------
  198
+        X: array-like, shape (n_samples, n_features)
  199
+            Training vector, where n_samples in the number of samples
  200
+            and n_features is the number of features.
  201
+
  202
+        Returns
  203
+        -------
  204
+        self: object
  205
+            Returns the object itself
  206
+        """
  207
+        self.fit_transform(X, y, **params)
  208
+        return self
  209
+
  210
+
  211
+class DictionaryLearningOnline(BaseDictionaryLearning):
3
Mathieu Blondel Owner
mblondel added a note August 06, 2011

I wonder if we couldn't just implement partial_fit directly in DictionaryLearning. Sure, DictionaryLearningOnline contains a fit method for convenience, but if the data fits in memory people could just as well use batch dictionary learning. If you really want to keep the convenience fit method (which repeatedly calls partial_fit), you could also add a constructor parameter to DictionaryLearning to let the user choose. As a bonus point, you would also get grid searchability.

Olivier Grisel Owner
ogrisel added a note August 07, 2011

I think I would still prefer to have two distinct estimators: the fit method of DictionaryLearning would rely on the batch implementation provided by the dictionary_learning function, while DictionaryLearningOnline.fit would be based on the dictionary_learning_online function, which is probably faster but converges to slightly lower quality minima in general (AFAIK).

Vlad Niculae Owner
vene added a note August 07, 2011

DictionaryLearningOnline, like MiniBatchSparsePCA (actually it's the same algorithm), is significantly faster (we don't have a stopping condition at the moment, but with a capped number of iterations the results are comparable, at much higher speed).

The main difference is that MiniBatchSparsePCA cannot implement partial_fit because it is online in the features sense. DictionaryLearningOnline should be able to implement partial_fit (sketched below), but some kinks need to be ironed out.

Grid searchability would be of no use at the moment since there is no useful way a classifier could be used after the dictionary learning. One of the discussions we will need to have is how to structure a collection of image patches so as not to lose the information about which image each patch belongs to, so that we can do image recognition.
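
A hypothetical usage sketch of that streaming scenario (the import path follows this branch's layout; n_atoms and the batch shapes are illustrative):

    import numpy as np
    from scikits.learn.decomposition import DictionaryLearningOnline

    rng = np.random.RandomState(0)
    dico = DictionaryLearningOnline(n_atoms=100)
    for _ in range(10):
        batch = rng.randn(50, 64)    # one mini-batch of flattened patches
        dico.partial_fit(batch)      # a single pass over this mini-batch
    code = dico.transform(rng.randn(5, 64))  # sparse loadings for new data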

doc/modules/decomposition.rst
((36 lines not shown))
+
+* Orthogonal matching pursuit (:ref:`OMP`)
+
+* Lasso using least-angle regression (:ref:`least_angle_regression`)
+
+* Lasso using coordinate descent (:ref:`lasso`)
+
+* Thresholding
+
+* Triangle k-means
+
+
+The dictionary learning objects offer, via the `split_code` parameter, the
+possibility to separate the positive and negative values in the results of
+sparse coding. This is useful when dictionary learning is used for
+classification or regression tasks, because it allows the learning algorithm to assign different weights to negative loadings of a particular atom, than to the
Alexandre Gramfort Owner
agramfort added a note August 07, 2011

pep8
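
To illustrate what splitting the code does, a minimal numpy sketch (the array values are made up):

    import numpy as np

    code = np.array([0.7, -1.2, 0.0, 0.3])        # made-up sparse code
    split = np.hstack([np.maximum(code, 0),        # positive part
                       np.maximum(-code, 0)])      # negative part, made positive
    # split is [0.7, 0., 0., 0.3, 0., 1.2, 0., 0.]: each atom now contributes
    # two features, so a downstream learner can weight positive and negative
    # loadings of the same atom independently.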

Alexandre Gramfort
Owner

@vene : could you prepare a clean PR now that OMP is merged?

Olivier Grisel
Owner

@vene: also, what is the status of the online sparse PCA branch with the decomposition of the faces in the documentation? I cannot find the PR any more, but the current master does not include this work.

Vlad Niculae
Owner
vene commented August 08, 2011

@agramfort, I'm working on merging master into this; lots of conflicts. I'm very unsatisfied with how I managed this code. I clearly need to learn a better git workflow.

@ogrisel: it's here #270, you were probably looking for "online" in the title :)

MiniBatchSparsePCA is the first priority for reviewing, followed by the new image patch extraction pull request that I will put together as soon as I can. Then all that will remain of this particular pull request is the documentation, the examples and the dictionary learning estimator framework. After this we shall proceed to KMeansCoder. Sorry for all the confusion, everyone; I let a branch grow too large and messed up the lineage somewhere along the way, so I can't easily cherry-pick stuff out.

Vlad Niculae Merge branch 'master' into sc
Conflicts:
	doc/modules/linear_model.rst
	examples/linear_model/plot_omp.py
	scikits/learn/linear_model/__init__.py
	scikits/learn/linear_model/omp.py
	scikits/learn/linear_model/tests/test_omp.py
d2cbaaa
Olivier Grisel
Owner

Indeed: @GaelVaroquaux and @agramfort can you please voice your opinion on:

#270 (comment)

I think that once this is cleared, the MiniBatchSparsePCA PR can be merged. Also, for later, you should work on versions of NMF and SparsePCA / Dictionary Learning that work on scipy.sparse inputs. You can work on this before or after we review the KMeansCoder stuff, as you wish, but before you start the matrix completion work we discussed on the ML.

NMF on text data sounds like a very promising way to find "sparse" and "additive" topics for semantically describing text documents:

https://www.quora.com/Has-anyone-tried-Spectral-Clustering-on-high-dimensional-sparse-data-such-as-TF-IDF-text-features

Vlad Niculae
Owner
vene commented August 08, 2011

Diff looks ok, though the commit history is ugly. As soon as MiniBatchSparsePCA and PatchExtractor are merged, this pull request will be of reviewable size. I'll try to clean up its history when we get there.

and others added some commits August 09, 2011
Vlad Niculae Merge branch 'master' into sc
Conflicts:
	scikits/learn/feature_extraction/image.py
	scikits/learn/feature_extraction/tests/test_image.py
5d1429e
Vlad Niculae Accidentally removed a line in a test 4e1406a
Vlad Niculae pep8 in doc 126e09f
Vlad Niculae rename coding_method, transform_method to fit/transform_algorithm 8ec8583
Vlad Niculae fix broken test b8c330d
Vlad Niculae Merge branch 'master' into sc
Conflicts:
	scikits/learn/decomposition/__init__.py
	scikits/learn/decomposition/sparse_pca.py
	scikits/learn/decomposition/tests/test_sparse_pca.py
d68ebfd
Vlad Niculae duplicated import ef7fb1c
Vlad Niculae FIX: denoise example was broken 01e3b5a
Vlad Niculae
Owner
vene commented August 12, 2011

This pull request is next on the list. Do you think it is manageable? Should I try an interactive rebase to clean up its messed-up history?

After establishing this, I will shortly present what is to be reviewed about the current code here.

Vlad Niculae
Owner
vene commented August 12, 2011

Number 2 in the comment above is now fixed! All that remains is to discuss the API. Basically this means the transform function implemented by BaseDictionaryLearning.

Gael Varoquaux
Owner

I know I keep coming up with new comments :$, but you could simply set vmin=-.5, vmax=.5 in the imshow of the difference: -1, 1 is never achieved, and it kills the visual contrast.

In addition, I wonder if giving the norm of the difference in the title would be useful: it would enable comparison on a numerical basis.

Finally, I think that you should explain a bit more what you are doing in the docstring, and comment on the results. For instance the fact that lars (l1 penalized regression) induces a bias in the coefficients can be seen in the difference that is reminiscent of the local intensity value.
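
A sketch of both suggestions (the images here are random stand-ins for the example's actual data):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.RandomState(0)
    original = rng.rand(64, 64)                     # stand-in images; the real
    denoised = original + 0.1 * rng.randn(64, 64)   # example uses lena patches
    difference = denoised - original

    plt.imshow(difference, vmin=-0.5, vmax=0.5, cmap=plt.cm.gray,
               interpolation='nearest')
    plt.title('Difference (norm: %.2f)' % np.sqrt(np.sum(difference ** 2)))
    plt.show()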

Vlad Niculae
Owner
vene commented August 24, 2011

I think this is getting close to merge-quality. Would anybody care to take a look?

Alexandre Gramfort
Owner

What if you used a stopping condition on the norm of the residual rather than hard-coding the number of atoms when using OMP? Since you know the noise level, it should be easy to set.

Otherwise the code looks cleaner and the API is now simple.

I would reference the JMLR paper for the online Dict learning.

Pyflakes report:

scikits/learn/decomposition/dict_learning.py:22
'euclidean_distances' imported but unused
scikits/learn/decomposition/dict_learning.py:92
local variable 'overwrite_gram' is assigned to but never used
scikits/learn/decomposition/dict_learning.py:95
local variable 'overwrite_cov' is assigned to but never used

scikits/learn/decomposition/tests/test_dict_learning.py:4
'SkipTest' imported but unused
scikits/learn/decomposition/tests/test_dict_learning.py:6
'make_sparse_coded_signal' imported but unused

we're almost there! :)

Olivier Grisel
Owner

+1 for a stopping criterion on the scaled residuals rather than hard-coding the number of iterations.

Olivier Grisel
Owner

Actually, using the scaled norm of the change of the dictionary weights is probably more stable than the residuals (as we do in coordinate descent).

Olivier Grisel
Owner

The issue is that you don't have access to the complete dataset in the online / minibatch setting. Hence the use of the change in the dictionary weights as a measure of convergence.

Vlad Niculae
Owner
vene commented August 28, 2011
scikits/learn/decomposition/dict_learning.py
((899 lines not shown))
+        Pseudo number generator state used for random sampling.
+
+    Attributes
+    ----------
+    components_: array, [n_atoms, n_features]
+        components extracted from the data
+
+    References
+    ----------
+    J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning
+    for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
+
+
+    See also
+    --------
+    SparsePCA
Olivier Grisel Owner

Explain here the relationship between dictionary learning and sparse PCA (i.e. that one is solving the transposed problem of the other).

Olivier Grisel Owner

I don't know, just grep on the existing doc and copy one that looks good :)

Vlad Niculae
Owner

I have merged the denoising enhancements from Alex. I've been toying with an exponentially weighted average tracking the amount of change in the dictionary, and it seems to get pretty close to zero for real data (digits), but for a randn array I found it oscillating around 50. We could have a max_iter param and warn when convergence was not reached (toying with the value of alpha can improve results, etc.). Do you think we should put this in now or after merging?
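
For concreteness, a sketch of that kind of tracker (the smoothing factor is an illustrative choice; a max_iter cap and a convergence warning would sit on top of it):

    import numpy as np

    def ewa_dictionary_change(updates, beta=0.9):
        """Yield an exponentially weighted average of the relative change
        between successive dictionary states."""
        previous, ewa = None, None
        for dictionary in updates:
            if previous is not None:
                change = (np.linalg.norm(dictionary - previous)
                          / np.linalg.norm(previous))
                ewa = change if ewa is None else beta * ewa + (1 - beta) * change
                yield ewa
            previous = dictionary.copy()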

Alexandre Gramfort
Owner

Stopping criteria are hard for general problems, especially non-convex ones like this. Also, it's frequent to see methods converge faster on real data, where structure is present. Let's add this after merging. What I'd really like to see added is the constraint on the residual for the denoising: since you know the noise variance, you should be able to do better than fixing the number of atoms, which could then be data dependent. This should improve the denoising result.
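
To make that concrete, a sketch under the stated assumption of a known noise level (the sigma value, the dico name and the eps transform parameter are illustrative, not part of this branch):

    # With flattened 8x8 patches and a known noise standard deviation,
    # target the expected squared residual instead of a fixed atom count.
    sigma = 0.1                    # assumed known noise level
    n_features = 8 * 8             # pixels per (flattened) patch
    eps = n_features * sigma ** 2  # expected squared residual per patch
    # hypothetical call, instead of transform(..., n_nonzero_coefs=k):
    # code = dico.transform(noisy_patches, eps=eps)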

Olivier Grisel ogrisel commented on the diff September 12, 2011
doc/modules/decomposition.rst
@@ -347,3 +347,105 @@ of the data.
      matrix factorization"
      <http://www.cs.rpi.edu/~boutsc/files/nndsvd.pdf>`_
      C. Boutsidis, E. Gallopoulos, 2008
+
+
+
+.. _DictionaryLearning:
+
+Dictionary Learning
+===================
+
+Generic dictionary learning
+-------------------------
Olivier Grisel Owner

The underline is missing 2 '-' characters.

Olivier Grisel ogrisel commented on the diff September 12, 2011
doc/modules/decomposition.rst
((90 lines not shown))
+--------------------------
+
+:class:`DictionaryLearningOnline` implements a faster, but less accurate
+version of the dictionary learning algorithm that is better suited for large
+datasets.
+
+By default, :class:`DictionaryLearningOnline` divides the data into
+mini-batches and optimizes in an online manner by cycling over the mini-batches
+for the specified number of iterations. However, at the moment it does not
+implement a stopping condition.
+
+The estimator also implements `partial_fit`, which updates the dictionary by
+iterating only once over a mini-batch. This can be used for online learning
+when the data is not readily available from the start, or for when the data
+does not fit into the memory.
+
Olivier Grisel Owner

Could you please update the face decomposition example to include OnlineDictionaryLearning and insert the matching plots as figures here?

Olivier Grisel Owner

It would also be a good way to explain that {Online}SparsePCA and {Online}DictionaryLearning use the same underlying implementation, but one puts the sparsity penalty on the dictionary atoms while the other puts it on the dictionary loadings.
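
A comment-level sketch of that relationship (notation informal, numpy only):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(100, 64)   # (n_samples, n_features)

    # Shared model: X ~ code @ components, with code (100, n_atoms) and
    # components (n_atoms, 64). Dictionary learning puts the L1 penalty on
    # `code`; sparse PCA puts it on `components`. Since
    #     X.T ~ components.T @ code.T
    # the same solver covers both by working on X or on X.T.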

Olivier Grisel
Owner

Ok for putting the stopping criterion in after merging, but we should not forget about it.

Vlad Niculae
Owner

OK is this ready for merging?

Olivier Grisel
Owner

The documentation on dictionary learning should have the math formula of the objective function (as for sparse PCA). Also, I would move the DL section right after the section on SparsePCA to make it clear that this is the same algorithm, but in one case we put the sparsity penalty on the dictionary components while in the other case we put it on the dictionary loadings (a.k.a. the sparse code).

Maybe you should also include the figure obtained from the decomposition of the faces using DL (though they are arguably less interesting than the sparse PCA ones since they look like the original: maybe alpha is too strong?).

Olivier Grisel
Owner

Also maybe the class DictionaryLearningOnline should be renamed to MiniBatchDictionaryLearning for consistency with MiniBatchKMeans and MiniBatchSparsePCA. WDYT?

Olivier Grisel
Owner

+1 for renaming everything pseudo-online to MiniBatch[Class], even if in the sparse PCA case it's not online along the usual n_samples axis and does not have a partial_fit method.

and others added some commits September 17, 2011
Vlad Niculae Merge branch 'master' into sc ca92354
Vlad Niculae Merge branch 'vene-sc' of git://github.com/ogrisel/scikit-learn into sc 5841984
Gael Varoquaux DOC: larger lena size in denoising example
Large size work better because they give a better training set to the
dictionary_learning algorithm. This is a tradeoff between computation
time and quality of example
f0b0cfe
Gael Varoquaux GaelVaroquaux commented on the diff September 16, 2011
sklearn/decomposition/dict_learning.py
((149 lines not shown))
+        new_code = np.sign(cov) * np.maximum(np.abs(cov) - alpha, 0)
+
+    elif algorithm == 'omp':
+        if n_nonzero_coefs is None and alpha is None:
+            n_nonzero_coefs = n_features / 10
+        norms_squared = np.sum((Y ** 2), axis=0)
+        new_code = orthogonal_mp_gram(gram, cov, n_nonzero_coefs, alpha,
+                                      norms_squared, overwrite_Xy=overwrite_cov
+                                      )
+    else:
+        raise NotImplemented('Sparse coding method %s not implemented' %
+                             algorithm)
+    return new_code
+
+
+def sparse_encode_parallel(X, Y, gram=None, cov=None, algorithm='lasso_lars',
Gael Varoquaux Owner

I believe that this should be moved to linear_models and renamed something like 'multivariate_lasso'. I am worried that in the current situation, people starting from the lasso solver will not find it.

Olivier Grisel Owner

I don't think we should call it "multivariate lasso" as this is not restricted to lasso but also works for OMP and simple thresholding too. I find the current function name much more explicit.

Vlad Niculae Owner
vene added a note September 17, 2011

We could add "See also" entries?

Alexandre Gramfort Owner

More multitask_lasso than multivariate_lasso, but I feel it's too much jargon.
+1 for the "see also", and maybe later refactor lasso_lars and lasso_cd to support multiple inputs.

sklearn/decomposition/dict_learning.py
((342 lines not shown))
+    X: array of shape (n_samples, n_features)
+        Data matrix.
+
+    n_atoms: int,
+        Number of dictionary atoms to extract.
+
+    alpha: int,
+        Sparsity controlling parameter.
+
+    max_iter: int,
+        Maximum number of iterations to perform.
+
+    tol: float,
+        Tolerance for the stopping condition.
+
+    method: {'lasso_lars', 'lasso_cd'}
Gael Varoquaux Owner

I guess this should be renamed to 'lars' or 'cd'.

Alexandre Gramfort Owner

+1 for 'lars' and 'cd' as only lasso makes sense here.

Gael Varoquaux

I made a couple of comments in the GitHub diff. In addition I made a pull request. Once these are done, I am +1 for merge.

Vlad Niculae
Owner

I think I addressed everything. I left the function name dictionary_learning_online; it could also be renamed minibatch_dictionary_learning or mini_batch_dictionary_learning. Do you think that should be done?

Olivier Grisel
Owner

+0 for minibatch_dictionary_learning or online_dictionary_learning. Running the tests / doc right now.

Olivier Grisel
Owner

It seems that the new figure in the MiniBatchDictionaryLearning section of the doc points to the wrong image (MiniBatchSparsePCA), and the alignment is weird: the lena patches are centered while the faces decomposition figure below is on the left. I would rather move the faces decomposition figure up, right after the mathematical formulation explanation and before the paragraph on sparse coding and the image denoising application.

Alexandre Gramfort agramfort commented on the diff September 18, 2011
sklearn/decomposition/dict_learning.py
((48 lines not shown))
+        lasso_cd: uses the coordinate descent method to compute the
+        Lasso solution (linear_model.Lasso). lasso_lars will be faster if
+        the estimated components are sparse.
+        omp: uses orthogonal matching pursuit to estimate the sparse solution
+        threshold: squashes to zero all coefficients less than alpha from
+        the projection X.T * Y
+
+    n_nonzero_coefs: int, 0.1 * n_features by default
+        Number of nonzero coefficients to target in each column of the
+        solution. This is only used by `algorithm='lars'` and `algorithm='omp'`
+        and is overridden by `alpha` in the `omp` case.
+
+    alpha: float, 1. by default
+        If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the
+        penalty applied to the L1 norm.
+        If `algorithm='threhold'`, `alpha` is the absolute value of the
Alexandre Gramfort Owner

s/threhold/threshold

Alexandre Gramfort agramfort commented on the diff September 18, 2011
sklearn/decomposition/dict_learning.py
((52 lines not shown))
+        threshold: squashes to zero all coefficients less than alpha from
+        the projection X.T * Y
+
+    n_nonzero_coefs: int, 0.1 * n_features by default
+        Number of nonzero coefficients to target in each column of the
+        solution. This is only used by `algorithm='lars'` and `algorithm='omp'`
+        and is overridden by `alpha` in the `omp` case.
+
+    alpha: float, 1. by default
+        If `algorithm='lasso_lars'` or `algorithm='lasso_cd'`, `alpha` is the
+        penalty applied to the L1 norm.
+        If `algorithm='threhold'`, `alpha` is the absolute value of the
+        threshold below which coefficients will be squashed to zero.
+        If `algorithm='omp'`, `alpha` is the tolerance parameter: the value of
+        the reconstruction error targeted. In this case, it overrides
+        `n_nonzero_coefs`.
Alexandre Gramfort Owner

That makes me think we should add the constraint on the L2 reconstruction error (omp_rec_error?). @ogrisel, @GaelVaroquaux, thoughts?

sklearn/decomposition/dict_learning.py
((518 lines not shown))
+    verbose:
+        degree of output the procedure will print
+
+    shuffle: boolean,
+        whether to shuffle the data before splitting it in batches
+
+    n_jobs: int,
+        number of parallel jobs to run, or -1 to autodetect.
+
+    method: {'lasso_lars', 'lasso_cd'}
+        lasso_lars: uses the least angle regression method
+        (linear_model.lars_path)
+        lasso_cd: uses the coordinate descent method to compute the
+        Lasso solution (linear_model.Lasso). Lars will be faster if
+        the estimated components are sparse.
+
Alexandre Gramfort Owner

lars and cd here also

sklearn/decomposition/sparse_pca.py
@@ -553,9 +36,10 @@ class SparsePCA(BaseEstimator, TransformerMixin):
     tol: float,
         Tolerance for the stopping condition.

-    method: {'lars', 'cd'}
-        lars: uses the least angle regression method (linear_model.lars_path)
-        cd: uses the coordinate descent method to compute the
+    method: {'lasso_lars', 'lasso_cd'}
+        lasso_lars: uses the least angle regression method
Alexandre Gramfort Owner

here also lars or cd

Vlad Niculae Owner
vene added a note September 19, 2011

How about algorithm instead of method? Or even fit_algorithm, for consistency with the dictionary learning classes?

Olivier Grisel Owner

Yes, we already had this discussion in the earlier comments. I am still in favor of using "algorithm" pervasively, but that would require updating lars_path and LocallyLinearEmbedding and maybe others.

What do people think? If we do so, one should not forget to update the API section of the whats_new.rst doc.

Alexandre Gramfort
Owner

Besides my comments, I would rename plot_img_denoising.py to plot_image_denoising.py.

I'd also love to see the omp_rec_error added for a better result in plot_img_denoising.py, but I don't want to block the merge.

sklearn/decomposition/dict_learning.py
((703 lines not shown))
+
+    Parameters
+    ----------
+    n_atoms: int,
+        number of dictionary elements to extract
+
+    alpha: int,
+        sparsity controlling parameter
+
+    max_iter: int,
+        maximum number of iterations to perform
+
+    tol: float,
+        tolerance for numerical error
+
+    fit_algorithm: {'lasso_lars', 'lasso_cd'}
Vlad Niculae Owner
vene added a note September 19, 2011

And here too, lars and cd, right? and in the rest of the objects.

Olivier Grisel Owner

Yes, but only where the lasso_ part is mandatory, e.g. not for the transform_algorithm.

Olivier Grisel
Owner

@agramfort: I am not familiar with what omp_rec_error is all about. So I agree we should defer that discussion to another pull request.

Olivier Grisel
Owner

About the doc, this part of my comment was not addressed: "I would rather move the faces decomposition figure up, right after the mathematical formulation explanation and before the paragraph on sparse coding and image denoising application."

Also I think it should be compared to the PCA output as done for all other methods in this chapter so as to keep the chapter consistent.

Olivier Grisel
Owner

Also, when I run the decomposition example I often get 2 or 4 atoms that are not white noise, and the non-noisy components look almost duplicated, or one is a near negative of another. Maybe the L1 regularization is too strong, or the algorithm is not stable on this data for such small dictionaries (6 atoms only; this is far from overcomplete in this regime...).

Olivier Grisel
Owner

Ok this looks good for me. +1 for merge.

Olivier Grisel ogrisel merged commit d56281e on September 19, 2011
Olivier Grisel ogrisel closed this September 19, 2011
Olivier Grisel
Owner

Merged. Thanks again for your work.
