Autoencoder model #155
Conversation
It would make much more sense to me to implement an autoencoder as a layer, rather than a model: a layer that takes an input of arbitrary size and has a single weight matrix as its parameter. It would take the dot product with the weight matrix (projection), then the dot product with the transposed matrix (reconstruction), and return that. For one, this solves the problem of weight symmetry/reuse. It also makes it easy to stack autoencoders. I haven't added an Autoencoder layer so far because I've never had to use autoencoders. If you're interested, feel free to add one.
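The single-weight-matrix idea above can be sketched in a few lines of plain NumPy (the names `encode` and `reconstruct` are illustrative, not part of any Keras API):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 4)) * 0.1  # the single parameter: 10 inputs -> 4 hidden units

def encode(x):
    return x @ W        # projection onto the hidden space

def reconstruct(h):
    return h @ W.T      # reconstruction reuses the transposed matrix (tied weights)

x = rng.standard_normal((3, 10))
x_hat = reconstruct(encode(x))
print(x_hat.shape)  # (3, 10): same shape as the input
```

Because the decoder is just the transpose of the encoder's matrix, weight symmetry comes for free rather than needing to be enforced between two separate parameter sets.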
The way I understand it, stacking autoencoders works by training a first autoencoder, then training a second autoencoder using the hidden units of the first as inputs. This is one reason I think separating the encoder and the decoder makes sense. [1] The other is that this allows for richer autoencoders: a purely linear autoencoder like the one you described will just learn the principal components, but there are more options, such as denoising autoencoders, as implemented in the test. Really, the encoder and decoder can be any combination of layers you like; convolutional autoencoders can also be implemented in this framework. Another reason I think this should be a model in its own right is that it requires its own compilation and fitting stages before being reused in a bigger network. [1] http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
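The greedy layer-wise stacking described above can be sketched in NumPy. Training is reduced to plain gradient descent on a tied-weight linear autoencoder so the data flow between the two stages is visible; all names and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_ae(x, hidden, steps=200, lr=0.01):
    """Fit a tied-weight linear autoencoder x -> xW -> xWW^T by gradient descent."""
    W = rng.standard_normal((x.shape[1], hidden)) * 0.1
    for _ in range(steps):
        h = x @ W
        err = h @ W.T - x                      # reconstruction error
        grad = x.T @ err @ W + err.T @ x @ W   # gradient of ||xWW^T - x||^2 w.r.t. the tied W
        W -= lr * grad / len(x)
    return W

x = rng.standard_normal((64, 20))
W1 = train_linear_ae(x, hidden=8)
h1 = x @ W1                            # hidden units of the first autoencoder...
W2 = train_linear_ae(h1, hidden=4)     # ...become the training inputs of the second
```

Keeping the encoder separate is exactly what makes the `h1 = x @ W1` step possible: the trained projection can be applied on its own, without dragging the decoder along.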
Any more thoughts on this? I could write documentation, but I could also change things in the code if desired. For what it's worth, autoencoders in Torch are built in a very similar way, but the autoencoder model lives in a separate package for unsupervised methods.
A better solution, based on the structure of Keras, is to implement a base Autoencoder class that inherits from Layer. DAE and the like can then inherit from it, and we can add a get_hidden(train) call to get the hidden layer of the autoencoder. I am almost done writing something up.
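The class hierarchy proposed above might look roughly like this. This is a plain-Python illustration of the inheritance and the `get_hidden` accessor, not the actual Keras class hierarchy:

```python
import numpy as np

class Autoencoder:
    """Base class: a tied-weight autoencoder exposing its hidden representation."""
    def __init__(self, W):
        self.W = W

    def get_hidden(self, x):
        return x @ self.W            # the hidden layer, reusable as a feature extractor

    def __call__(self, x):
        return self.get_hidden(x) @ self.W.T   # full encode-decode pass

class DenoisingAutoencoder(Autoencoder):
    """DAE variant: corrupt the input, then reconstruct with the base machinery."""
    def __init__(self, W, noise=0.1, seed=0):
        super().__init__(W)
        self.noise = noise
        self.rng = np.random.default_rng(seed)

    def __call__(self, x):
        corrupted = x + self.noise * self.rng.standard_normal(x.shape)
        return super().__call__(corrupted)
```

Subclasses only override the forward pass; `get_hidden` stays shared, so any variant can be stacked or reused the same way.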
See #180 |
One question: I don't understand your use of Merge.
In order to connect the encoder and the decoder, I had to construct a Sequential out of the two. But because both of them are models here, I needed to wrap them as Layers; that is what I used Merge for. Another option would be to have a separate Layer that only wraps a model, i.e. a container layer.
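The container-layer alternative mentioned above amounts to a thin wrapper that makes any model-like object chainable as if it were a layer. A hypothetical sketch (all names invented for illustration):

```python
class Container:
    """Wrap a model-like callable so it composes like an ordinary layer."""
    def __init__(self, model):
        self.model = model

    def __call__(self, x):
        return self.model(x)

# Stand-ins for a trained encoder model and decoder model:
encoder = Container(lambda x: [v * 2 for v in x])
decoder = Container(lambda x: [v / 2 for v in x])

def pipeline(x):
    return decoder(encoder(x))   # the two wrapped models chain like layers

print(pipeline([1.0, 2.0]))  # [1.0, 2.0]
```

Unlike overloading Merge, a dedicated wrapper like this carries no assumptions about combining multiple inputs, which is the duplication-vs-clarity trade-off being weighed here.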
I added a relatively generic autoencoder model. It allows the encoding stage to be reused as a layer in subsequent networks, and it can be frozen using a trick I got from @kenterao in #56. In order to keep this concise, I had to modify the Merge layer to allow 'merging' a single model, basically turning it into a container layer. I'm also happy to do this in a separate Container layer, but I think that would involve some duplication.
One thing that is typically done in autoencoders, but for now has to be done by hand, is tying of weights. I couldn't come up with a neat way of doing this while allowing general encoders and decoders, so I have left it up to the end user for now (see the test for how it is done).
Let me know what you think, and I am happy to write documentation for this.
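The reuse-and-freeze pattern described in this PR can be sketched as follows: after pretraining, the encoder weights are held fixed and the encoding feeds a new supervised head. This is a NumPy illustration of the idea, not the actual Keras mechanism from #56; all names are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
enc_W = rng.standard_normal((20, 8)) * 0.1   # pretend these were pretrained by the autoencoder
clf_W = rng.standard_normal((8, 3)) * 0.1    # trainable classifier head

def forward(x):
    h = np.maximum(x @ enc_W, 0.0)  # frozen encoder: during training, no update is applied to enc_W
    return h @ clf_W                # only clf_W would receive gradient updates

logits = forward(rng.standard_normal((4, 20)))
print(logits.shape)  # (4, 3)
```

Freezing matters because fine-tuning the encoder with a randomly initialized head can destroy the pretrained features before the head has learned anything useful.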