
improve speed of SparseAutoencoder and make it more flexible #451

Merged
merged 63 commits into mlpack:master on Jan 30, 2016

Conversation

stereomatchingkiss
Contributor

This commit intends to accomplish two things:

1 : improve the speed of SparseAutoencoder
I cache the computation result in a data member so that the algorithm does not need to compute it twice. Since the member function is const, I declared the cached data member as mutable. Does anyone think making the member function non-const would be better? (A rough sketch of the caching pattern is shown after item 2.)

2 : make the SparseAutoencoder more versatile
I added two template parameters as follows:

template<typename HiddenLayer, typename OutputLayer>
class SparseAutoencoderFunction;

This allows users to use different layers from ann to compute the features.
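For illustration, here is a minimal sketch of both changes (the member names, the layer interface, and the objective below are assumptions for the sketch, not the actual implementation):

#include <mlpack/core.hpp>

// Minimal sketch: the function class is parameterized on the hidden and
// output layer types, and the forward-pass results computed inside the const
// Evaluate() are cached in mutable members so that Gradient() does not have
// to repeat the forward pass.
template<typename HiddenLayer, typename OutputLayer>
class SparseAutoencoderFunctionSketch
{
 public:
  explicit SparseAutoencoderFunctionSketch(const arma::mat& data) :
      data(data) { }

  double Evaluate(const arma::mat& parameters) const
  {
    // Forward pass; HiddenLayer and OutputLayer are assumed to expose a
    // static fn(input, output), like the activation functions in ann.
    HiddenLayer::fn(parameters * data, hiddenActivations);
    OutputLayer::fn(parameters.t() * hiddenActivations, outputActivations);

    // Squared reconstruction error as a stand-in for the real objective.
    return arma::accu(arma::square(outputActivations - data)) / data.n_cols;
  }

  void Gradient(const arma::mat& parameters, arma::mat& gradient) const
  {
    // The backward pass would start from the cached hiddenActivations and
    // outputActivations here, instead of recomputing the forward pass.
    gradient.zeros(arma::size(parameters));
  }

 private:
  const arma::mat& data;

  // mutable so that the const member functions above can update the cache.
  mutable arma::mat hiddenActivations;
  mutable arma::mat outputActivations;
};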

@rcurtin
Member

rcurtin commented Sep 23, 2015

So you mentioned in IRC that this implementation is faster than the implementation in src/mlpack/methods/sparse_autoencoder; can I get a little more information on how you calculated that? I don't doubt your results or anything, I just want to run some tests of my own and see if I can replicate the speedup on other interesting architectures and systems. :)

// Sparse autoencoder function (greedy).
using SAEFG = ann::SparseAutoencoderFunction<FSigmoidLayer, FSigmoidLayer, std::true_type>;

BOOST_AUTO_TEST_SUITE(SparseAutoencoderTest2);
Member

Just a note -- when this is merged, we can remove the old sparse_autoencoder_test.cpp and revert this to SparseAutoencoderTest instead of SparseAutoencoderTest2.

@stereomatchingkiss
Contributor Author

1 : can I get a little more information on how you calculated that?
Of course; the file sparse_autoencoder_function already provides an example (I wrote it as part of the class comments). I hope this is a correct way to measure the performance. (A minimal timing sketch is also included at the end of this comment.)

2 : This file looks like it's heavily based on the code in mlpack/methods/sparse_autoencoder/sparse_autoencoder_function.hpp
Yes, I put it in ann mainly because this class breaks the old API, and SparseAutoencoder looks like a member of the neural network family (I am not an expert on neural networks, please correct me if I am wrong). For backward compatibility I did not change the original version.

3 : But maybe it would be a better idea to make this class work with the Trainer class in src/mlpack/methods/ann/
I do not know how to make it work yet, but if it can, it should provide a unified interface. I will wait for zoq's comments.
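The timing sketch mentioned above could look something like this (just an illustration, not the exact example from the class comments; Evaluate() is assumed to be the function being benchmarked):

#include <armadillo>

// Time a single objective evaluation with arma::wall_clock; comparing this
// number for the old and new implementations on the same data gives the
// speedup figure.
template<typename FunctionType>
double TimeEvaluate(FunctionType& f, const arma::mat& parameters)
{
  arma::wall_clock timer;
  timer.tic();
  f.Evaluate(parameters);  // the call being benchmarked
  return timer.toc();      // elapsed wall-clock time in seconds
}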

a : add Train functions
b : add a new constructor
c : add Serialize

2 : move it from ann to methods/sparse_autoencoder
(because submat of arma is a proxy object)
2 : make the code meet the requirements of the style guide
// Logistic (sigmoid) activation, applied elementwise, without overflow checks.
template<typename InputVecType, typename OutputVecType>
static void fn(const InputVecType& x, OutputVecType& y)
{
  y = (1.0 / (1 + arma::exp(-x)));
}
Member

Originally my intention with the overflow checks in the LogisticFunction class was to avoid strange issues during the training process. But I intend to remove the checks from the LogisticFunction class; if you agree, we can remove the LazyLogisticFunction class and just use the LogisticFunction class.
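For context, the kind of overflow guard being discussed looks roughly like this (an illustrative sketch, not necessarily the exact mlpack code):

#include <cmath>
#include <armadillo>

// Guarded logistic function: for very large |x| the saturated 0/1 limit is
// returned directly so that std::exp() cannot overflow.  The unguarded
// version in the snippet above simply evaluates 1 / (1 + exp(-x)).
static double GuardedLogistic(const double x)
{
  if (x < arma::Datum<double>::log_max)
  {
    if (x > -arma::Datum<double>::log_max)
      return 1.0 / (1.0 + std::exp(-x));

    return 0.0;  // x is very negative: exp(-x) would overflow, so saturate at 0.
  }

  return 1.0;    // x is very large: the logistic saturates at 1.
}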

Contributor Author

I am totally fine with that.

2 : add static functions to initialize weights
3 : fix bug: parameters were not initialized correctly
@stereomatchingkiss
Contributor Author

After some experiments, I found that not all of the ann layers are suitable for SparseAutoencoder; some need modification (e.g. you cannot just call the dropout function, you need to call an activation function like sigmoid or ReLU after calling Forward). I would like to open two new folders

1 : SparseAutoencoder/layers
2 : SparseAutoencoder/activation_functions

to collect the usable layers and activation functions; the implementation will be based on the layers and activation functions of ann.

Besides, I think the function "GetNewFeatures" of SparseAutoencoder should call the activation function of the hidden layer rather than the sigmoid.
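A minimal sketch of what that could look like (the parameter names and the ActivationFunction typedef on the hidden layer are assumptions, not the actual code):

// Map the input through the hidden layer's own activation instead of a
// hard-coded sigmoid; the layer is assumed to expose its activation as
// HiddenLayer::ActivationFunction.
template<typename HiddenLayer>
void GetNewFeatures(const arma::mat& w1,    // hidden-layer weight matrix
                    const arma::mat& b1,    // hidden-layer bias column
                    const arma::mat& data,  // one column per data point
                    arma::mat& features)
{
  // Affine transform of the input, then the hidden layer's activation.
  HiddenLayer::ActivationFunction::fn(
      w1 * data + arma::repmat(b1, 1, data.n_cols), features);
}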

I will remove LazyLogisticFunction and remove the checks in LogisticFunction, as zoq mentioned.

Any suggestions?

Edit :
After these changes, the FineTuneFunction (another pull request) has to accept the SparseAutoencoder functions rather than the input data, since the GetNewFeatures function depends on the policy of the HiddenLayer.

@stereomatchingkiss
Contributor Author

Sorry for the slow response

No worries; thanks for the review. I know it takes time to do these things.

I guess since the forward and backward functions provide the same output as the old code, the test should run without any problems. If that's the case I'll be happy to merge the changes.

If you think so, then I will write a simple test case to check the output of Forward and Backward later on. After that is done, we can move on to the next problem.

@zoq
Member

zoq commented Jan 23, 2016

No need to write another test for the forward and backward function, if the existing sparse autoencoder test works with the new code it's absolutely fine. I guess sparse_autoencoder_test_2.cpp is the modified test and we can remove the sparse_autoencoder_test.cpp file and rename the other test.

@stereomatchingkiss
Contributor Author

I guess sparse_autoencoder_test_2.cpp is the modified test and we can remove the sparse_autoencoder_test.cpp file and rename the other test.

I would like to do that too, but the problem is that the old SparseAutoencoder and the new one do not have the same API, so the tests need some changes.

@stereomatchingkiss
Contributor Author

I uploaded the test file; the name is "sparse_autoencoder_test_3.cpp".
If you think it is OK, you could rename it to "sparse_autoencoder_test".

The API of FFN is different from the original SparseAutoencoder, so
we need to make some changes to the original code.

@stereomatchingkiss
Contributor Author

Finally, all the work is done. I made some changes to FFN (it no longer takes a reference to the outputLayer, but instead exposes a public API to users; this makes serialization easier).

I use FeedForward to get the Evaluate value (the Evaluate API does not give the value).
I call FeedForward and FeedBackward to get the gradient value.
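Roughly, the flow looks like this (just a sketch; the member function names and signatures below are assumptions based on the description above, not the exact FFN API):

// Evaluate the objective and the gradient through the network; the names
// FeedForward, FeedBackward, and Gradient used here are hypothetical.
template<typename NetworkType>
double EvaluateSketch(NetworkType& network,
                      const arma::mat& input,
                      arma::mat& gradient)
{
  arma::mat error;

  // Forward pass: for an autoencoder the target is the input itself, and the
  // returned objective serves as the Evaluate() value.
  const double objective = network.FeedForward(input, input, error);

  // Backward pass, then collect the gradient for the current parameters.
  network.FeedBackward(input, error);
  network.Gradient(gradient);

  return objective;
}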

Please check the test cases; they test with the same data (except for the gradient part) but with a different API. I hope my implementation meets the semantic requirements.

If you think everything is in order, you can merge it now. Thanks for taking the time to review.

Edit :
After this is finished, let us finish the serialization part. Besides, does the convolution layer provide padding options?

@rcurtin
Member

rcurtin commented Jan 25, 2016

When the merge is done, I'll go ahead and add a reverse-compatibility layer for the 2.x.x releases and then remove the old SparseAutoencoder code.

zoq added a commit that referenced this pull request Jan 30, 2016

Improve speed of SparseAutoencoder and make it more flexible.
@zoq zoq merged commit 33082f0 into mlpack:master Jan 30, 2016
@zoq
Member

zoq commented Feb 1, 2016

Thanks for the contribution. I made a couple of changes:

  • moved the main sparse autoencoder into a separate folder in 4ad39f8
  • minor formatting and comment fixes in 896937d03c5eb32a4e980
  • modified the sparse autoencoder test in 443ecdc, so that it uses the SparseAutoencoder class you already implemented. Since we already have a gradient test for each activation function, I removed the gradient sparse autoencoder test.

Let me know if I messed anything up.

Since we have this nice SparseAutoencoder class, it should be easy to provide a reverse-compatibility layer for the 2.x.x releases. I'll go and write the necessary code if nobody else really wants to do it.

We should also think about a test case that tests the code in combination with an optimizer; I ran into a couple of problems when I tested the code with the existing trainer class, and I solved the issues in f34ae33. Another test could also check the ability to work with additional layers. We only test the standard sparse autoencoder model structure (input layer, hidden layer, output layer), which is fine since the former code uses this static model structure, but since we build the sparse autoencoder using the ann modules, we have the ability to add a bunch of interesting layers, e.g. a Dropout layer. (A rough sketch of an optimizer-based test follows below.)
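Such a test might look roughly like this (the class name, namespaces, and constructor arguments below are assumptions rather than the merged API):

// Optimize a small sparse autoencoder with L-BFGS and check that the
// objective actually decreases; the names below are assumed, not the merged API.
BOOST_AUTO_TEST_CASE(SparseAutoencoderOptimizerSketchTest)
{
  // Small random dataset: 10-dimensional points, 5 hidden units.
  arma::mat data = arma::randu<arma::mat>(10, 200);
  mlpack::nn::SparseAutoencoderFunction saf(data, 10, 5);

  mlpack::optimization::L_BFGS<mlpack::nn::SparseAutoencoderFunction>
      optimizer(saf);

  arma::mat parameters = saf.GetInitialPoint();
  const double initialObjective = saf.Evaluate(parameters);
  const double finalObjective = optimizer.Optimize(parameters);

  BOOST_REQUIRE_LT(finalObjective, initialObjective);
}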

@zoq
Member

zoq commented Feb 1, 2016

Also, I think a command-line program would be a neat feature.

@stereomatchingkiss
Contributor Author

Thanks for the fix.

I will provide a command-line program with different activations (ReLU, tanh, sigmoid) and dropout (if it works) after serialization of the FFN is done.
