[WIP] Adding Sparse Autoencoder to the neural_network folder #2099

Closed
wants to merge 13 commits into scikit-learn:master from IssamLaradji:SparseAutoencoder

@IssamLaradji

Sparse Autoencoder

[Figure: autoencoder network diagram]

A sparse autoencoder tries to learn a function f(x) = x. In other words, it tries to learn an approximation to the identity function, so that the output is similar to the input. Its main use is to extract discriminating features (mainly tested on image-recognition problems), which have been observed to improve classification performance.

Properties:

- For batch processing, scipy's optimizer fmin_l_bfgs_b is used to minimize the cost, which is very efficient and powerful (a rough sketch of such a cost and its minimization follows this list).
- The gradient was verified by comparing the numerical and analytical computations (run test_sae.py).
- Everything is vectorized, except the minibatch loop.
- The algorithm was developed based on the tutorials/references by Prof. Andrew Ng:
  - [1] http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm
  - [2] http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization
  - [3] http://ufldl.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder
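For illustration only (a generic sketch, not the code in this PR), the objective is a reconstruction error plus a KL-divergence sparsity penalty on the mean hidden activations, which fmin_l_bfgs_b can minimize directly:

```python
# Illustrative sketch only -- not the code in this PR. It shows a
# single-hidden-layer autoencoder cost with a KL sparsity penalty,
# minimized with scipy's fmin_l_bfgs_b as mentioned above.
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ae_cost(theta, X, n_visible, n_hidden, beta=3.0, sparsity=0.01):
    """Reconstruction error + KL sparsity penalty for one hidden layer."""
    n = n_visible * n_hidden
    W1 = theta[:n].reshape(n_hidden, n_visible)
    W2 = theta[n:2 * n].reshape(n_visible, n_hidden)
    b1 = theta[2 * n:2 * n + n_hidden]
    b2 = theta[2 * n + n_hidden:]
    H = sigmoid(X.dot(W1.T) + b1)          # hidden activations
    X_hat = sigmoid(H.dot(W2.T) + b2)      # reconstruction of the input
    rho_hat = H.mean(axis=0)               # mean activation of each hidden unit
    kl = np.sum(sparsity * np.log(sparsity / rho_hat)
                + (1 - sparsity) * np.log((1 - sparsity) / (1 - rho_hat)))
    return 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1)) + beta * kl


# fmin_l_bfgs_b can approximate the gradient numerically when no
# analytic gradient is supplied (approx_grad=True); tiny toy data here.
rng = np.random.RandomState(0)
X = rng.rand(50, 8)
n_visible, n_hidden = X.shape[1], 4
theta0 = rng.normal(scale=0.1,
                    size=2 * n_visible * n_hidden + n_hidden + n_visible)
theta_opt, cost, info = fmin_l_bfgs_b(
    ae_cost, theta0, args=(X, n_visible, n_hidden),
    approx_grad=True, maxiter=50)
```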

TODO:

- Fix the test file
- Remove the comments in the file (they are there for debugging purposes only)
- Beat the Travis build
@vene vene commented on an outdated diff Jun 28, 2013
sklearn/neural_network/sae.py
+ units and n_hidden is the number of hidden units.
+ intercept_hidden_ : array-like, shape (n_hidden,), optional
+ Biases of the hidden units
+ intercept_visible_ : array-like, shape (n_visible,), optional
+ Biases of the visible units
+
+ Examples
+ --------
+
+ >>> import numpy as np
+ >>> from sklearn.neural_network import SAE
+ >>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
+ >>> model = SAE(n_hidden=10)
+ >>> model.fit(X)
+ SAE(batch_size=10000, beta=3, lr=0.0001, n_hidden=10, n_iter=20,
+ random_state=None, sparsityParam=0.01, verbose=False)
@vene
vene Jun 28, 2013 scikit-learn member

Careful with the indentation here, it causes an error.

@vene vene commented on an outdated diff Jun 28, 2013
sklearn/neural_network/test/test_sae.py
@@ -0,0 +1,47 @@
+import numpy as np
+from sklearn.neural_network import SAE
+from numpy.testing import assert_almost_equal
+from sklearn.datasets import load_digits
+X = load_digits().data
+X=X.T
+sae = SAE()
+n_samples=X.shape[1]
+n_visible=X.shape[0]
+initial_theta=sae._initParams(n_visible)
+
+#This is Unofficial Testing (basic testing): this is run to test if the gradients computed numerically are close enough to those computed analytically
+
+def test_SAEGradient(theta, X,n_visible, n_hidden, lr,sparsityParam, beta, n_samples,n_slice):
@vene
vene Jun 28, 2013 scikit-learn member

Test cases shouldn't have parameters, and test files shouldn't print anything, but I assume this file is part of the things that still need to be changed to Python style. Anyway, I think it's interesting to use gradient checking in tests; I wonder whether the test can be made small enough not to be slow.

On the other hand maybe the gradient function is small and simple enough to not need individual testing but just be tested as part of the greater thing. I don't know at this point.
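For illustration, a small, parameterless gradient-check test along those lines might look like the following (a generic sketch using a simple regularized logistic loss as a stand-in, not the actual test_sae.py):

```python
# Generic sketch of a small, parameterless gradient-check test -- not
# the PR's actual test_sae.py. A regularized logistic loss stands in
# for the autoencoder cost.
import numpy as np
from numpy.testing import assert_almost_equal


def _cost_and_grad(w, X, y, alpha=0.1):
    p = 1.0 / (1.0 + np.exp(-X.dot(w)))
    cost = (-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
            + 0.5 * alpha * w.dot(w))
    grad = X.T.dot(p - y) / X.shape[0] + alpha * w
    return cost, grad


def test_gradient_matches_finite_differences():
    rng = np.random.RandomState(42)
    X = rng.rand(20, 5)                       # tiny data keeps the test fast
    y = (rng.rand(20) > 0.5).astype(float)
    w = rng.normal(size=5)
    _, analytic = _cost_and_grad(w, X, y)
    eps = 1e-5
    numeric = np.empty_like(w)
    for i in range(w.size):                   # central finite differences
        e = np.zeros_like(w)
        e[i] = eps
        numeric[i] = (_cost_and_grad(w + e, X, y)[0]
                      - _cost_and_grad(w - e, X, y)[0]) / (2 * eps)
    assert_almost_equal(numeric, analytic, decimal=6)
```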

@vene
Member
vene commented Jun 28, 2013

Thanks for working on this, can't wait to get to play with it!

I recommend using PEP8 and pyflakes as part of your development process, it's more effective than leaving it for later. Find a nice checker like flake8 to plug into your editor.

@IssamLaradji

@vene thanks for the great suggestions! I'll make sure to have them covered soon enough.

@haarts
haarts commented Jul 1, 2013

I'm curious how this relates to PR #1954.

@IssamLaradji

@haarts, they are both feature extractors, but they follow different algorithms. These references might provide insight:
Sparse autoencoders: http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity
RBM: http://deeplearning.net/tutorial/rbm.html

@haarts
haarts commented Jul 1, 2013

@IssamLaradji thanks! I find the autoencoders easier to understand by some margin.

@haarts haarts and 3 others commented on an outdated diff Jul 1, 2013
sklearn/neural_network/sae.py
+class SAE(BaseEstimator, TransformerMixin):
+ """
+ Sparse Autoencoder (SAE)
+
+ A Sparse Autoencoder with one hidden layer. Parameters are trained using
+ Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)
+ Parameters
+ ----------
+ n_hidden : int
+ Number of hidden neurons
+ lr : float, optional
+ Learning rate to use during learning. It is *highly* recommended
+ to tune this hyper-parameter. Possible values are 10**[0., -3.].
+ beta : float, optional
+ Weight of sparsity penalty term
+ sparsityParam : float, optional
@haarts
haarts Jul 1, 2013

nitpick: sparsity_param.

@IssamLaradji
IssamLaradji Jul 1, 2013

Thanks @haarts, I will fix all this unconventional coding style by following the flake8 rules. I will hopefully have that done by today!

@GaelVaroquaux
GaelVaroquaux Jul 4, 2013 scikit-learn member

In sklearn/neural_network/sae.py:

+class SAE(BaseEstimator, TransformerMixin):

Neither the class name, nor the file, should be called 'SAE'. Acronyms
are hard to follow. I can suggest 'SparseAutoEnc' and sparse_auto_enc.py.

@vene
vene Jul 4, 2013 scikit-learn member

How about just autoencoder or auto_encoder? Except for the buzzword value of the word "sparse", does it really add much more? The only non-sparse autoencoder I can think of is the bottleneck, semantic hashing autoencoder, which is specific enough to be called that if we ever implement it. WDYT?


@GaelVaroquaux
GaelVaroquaux Jul 5, 2013 scikit-learn member

how about just autoencoder or auto_encoder?

+1 for auto_encoder and AutoEncoder

IssamLaradji added some commits Jul 2, 2013
@IssamLaradji IssamLaradji Substantial Improvements Made
1) Performance increase
2) Conformant to the flake8 convention, except for some violations of the capital-letter and long-line rules
3) Added the option of naive backpropagation (no fancy optimizer) to support partial fit
4) Added the option to choose any scipy optimizer for the optimization, including lbfgs, cg, newton-cg, and so on (a rough sketch of how such an option might be wired follows the commit list)
442f0b9
@IssamLaradji IssamLaradji Example added, and performance boosted f38bfe1
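The "any scipy optimizer" option mentioned above could plausibly be wired through scipy.optimize.minimize; the helper name and parameters below are hypothetical, not this PR's actual API:

```python
# Illustrative only: a hypothetical helper showing how a configurable
# scipy optimizer could be plugged in; not this PR's actual code.
from scipy.optimize import minimize


def fit_params(cost_and_grad, theta0, X, method="L-BFGS-B", max_iter=200):
    """Minimize a cost with any scipy.optimize method ('L-BFGS-B', 'CG',
    'Newton-CG', ...); cost_and_grad(theta, X) must return (cost, grad)."""
    result = minimize(cost_and_grad, theta0, args=(X,), jac=True,
                      method=method, options={"maxiter": max_iter})
    return result.x
```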
@IssamLaradji

Hi! I improved the code by removing unnecessary calculations and taking shortcuts in the matrix multiplications.

I also added an example file, autoencoder_mnist.py, that applies the SAE to a small subset of the MNIST dataset and got satisfying results. Using SGD as the classifier, I benchmarked the extracted features (a rough sketch of the pipeline follows the scores below):

  • The score of applying SGD on the raw pixels is 0.943
  • The score of applying SGD on the SAE extracted features is 0.986
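Roughly, the benchmark corresponds to a pipeline like this sketch (assuming the SAE class from this branch with the usual fit/transform interface; n_hidden=100 is an arbitrary value, and load_digits stands in for the small MNIST subset, so the exact scores will not reproduce):

```python
# Sketch of the feature-extraction benchmark described above -- not the
# actual autoencoder_mnist.py example. load_digits stands in for the
# small MNIST subset, so the 0.943 / 0.986 scores will not reproduce.
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.cross_validation import train_test_split  # model_selection in newer releases
from sklearn.neural_network import SAE                 # transformer added by this PR

digits = load_digits()
X = digits.data / 16.0                                  # scale pixels to [0, 1]
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: linear SGD classifier on the raw pixels.
raw_score = SGDClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test)

# Extract SAE features, then train the same classifier on them.
sae = SAE(n_hidden=100)                                 # n_hidden chosen arbitrarily here
H_train = sae.fit_transform(X_train)
H_test = sae.transform(X_test)
sae_score = SGDClassifier(random_state=0).fit(H_train, y_train).score(H_test, y_test)
```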

Thanks.

@IssamLaradji

@GaelVaroquaux, @vene, from the names that were suggested, I went with autoencoder, for its daintiness :)
And yes, I believe "sparse" is redundant nowadays; I have never encountered a non-sparse autoencoder.

Thanks for the suggestions

@mhlr
mhlr commented Sep 26, 2013

What is the status of this? Thanks

@IssamLaradji

Hi @mhlr thanks for asking.

I believe it's working as intended; at least I got nice results with various datasets. An example file, autoencoder_mnist.py, shows exactly that. Test cases and proper documentation, however, are not yet complete. Therefore, you might face unanticipated bugs :(.

@ogrisel
Member
ogrisel commented Sep 26, 2013

As highlighted by the failing common tests, the fit method is not compatible with the sklearn API conventions.
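For context, the main conventions the common tests enforce are that __init__ only stores hyper-parameters, fit accepts (X, y=None) even for unsupervised estimators, learned attributes get a trailing underscore, and fit returns self. A minimal skeleton (illustrative only, not this PR's code):

```python
# Minimal skeleton of the estimator API the common tests expect from an
# unsupervised transformer; illustrative only, not this PR's code.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class AutoEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, n_hidden=100):
        # __init__ only stores hyper-parameters; no work or validation here.
        self.n_hidden = n_hidden

    def fit(self, X, y=None):
        # Accept an ignored y so the estimator works in Pipelines and
        # passes the common tests; learned attributes end with "_".
        X = np.asarray(X, dtype=float)
        # ... the real optimization would learn the weights here ...
        self.components_ = np.zeros((self.n_hidden, X.shape[1]))
        return self  # fit must return self

    def transform(self, X):
        return np.asarray(X, dtype=float).dot(self.components_.T)
```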

@rolisz
rolisz commented Jan 8, 2014

Issam, do you need a hand with this? At least with fixing the common tests?

@IssamLaradji

Hi @rolisz, I will be glad to get your help! I will soon change the code here to inherit from the Multi-layer Perceptron (the PR I haven't completed yet), as they have many methods in common. Anyhow, please help if possible. I am not familiar with the process of more than one person working on the same PR, so please tell me what I should do to add you as a developer on the PR. Thanks!

@jnothman
Member
jnothman commented Jan 8, 2014

Basically, other people can offer PRs to your branch by going to your GitHub fork of scikit-learn. And it may be more sensible for MLP and autoencoders to inherit from a common base class than to build one straight on top of the other.
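A hedged sketch of what such a shared base might look like (all names here are hypothetical, not actual scikit-learn classes):

```python
# Hypothetical sketch of a common base for MLP and autoencoder classes;
# none of these names are actual scikit-learn classes.
import numpy as np
from sklearn.base import BaseEstimator


class BaseFeedForwardNetwork(BaseEstimator):
    """Holds what both models share: activations, the forward pass,
    parameter packing, and the optimizer loop."""

    def _forward(self, X, W, b):
        # one dense layer with a logistic activation
        return 1.0 / (1.0 + np.exp(-(X.dot(W.T) + b)))

    def _cost_grad(self, theta, X, y=None):
        # each subclass supplies its own objective and gradient
        raise NotImplementedError


class MultilayerPerceptron(BaseFeedForwardNetwork):
    def _cost_grad(self, theta, X, y=None):
        raise NotImplementedError  # supervised loss against the targets y


class AutoEncoder(BaseFeedForwardNetwork):
    def _cost_grad(self, theta, X, y=None):
        raise NotImplementedError  # reconstruction loss against X itself
```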


@IssamLaradji

Right. Thanks! :)

@majidaldo

So why put effort into implementing a sparse AE in scikit-learn when pylearn2 has a better foundation for deep learning?
Later on, there will probably be an interface between scikit-learn and pylearn2.

@coveralls


Coverage decreased (-7.03%) when pulling eab10f0 on IssamLaradji:SparseAutoencoder into 4b82379 on scikit-learn:master.

@mrgloom
mrgloom commented Sep 12, 2015

Can this autoencoder produce binary codes?
Something like this paper about semantic hashing
http://www.cs.utoronto.ca/~rsalakhu/papers/semantic_final.pdf

@amueller
Member

@mrgloom that paper uses an RBM, which is already in master.
They use a Poisson activation, though, which is not implemented in our RBM.

@kastnerkyle
Member

So are sparse autoencoders, Gaussian-binary RBMs, etc. still desired? Might be good for someone to look at once we merge MLP and the random classifier - we should be able to reuse all the existing tools.

@amueller
Member

I'm "meh" on these. Is anyone using them?

@kastnerkyle
Member

Sparse and denoising autoencoders somewhat - RBMs rarely. I don't see a huge need, but it would be easy pickings once MLP is merged - same toolkit. Variational autoencoders could be interesting since they can get useful results in ~5 layers (2 in, 2 out, linear in the middle), but there are not enough citations yet.


@amueller
Member

Variational autoencoder without automatic differentiation? Are you sure? ;)

@kastnerkyle
Member

Yeah, it should be easy enough. Though maybe I will live to regret that.


@amueller
Member

I think we kind of decided against including autoencoders. Anyone mind if I close?

@amueller amueller closed this Oct 25, 2016