
Roadmap #22

Open
2 of 4 tasks
pluskid opened this issue Dec 19, 2014 · 33 comments

Comments

@pluskid
Owner

pluskid commented Dec 19, 2014

Discussions and/or suggestions are welcome!

@jfsantos

Even though restricted Boltzmann machines (and DBMs/DBNs) and autoencoders (DAE, CAE, stacked autoencoders) follow a different principle, as they are unsupervised, having an implementation that follows the Mocha architecture could be useful. We started discussing this for DBNs here, as we have a simple implementation of RBMs and DBNs and would like to make it compatible with Mocha.

@pluskid
Owner Author

pluskid commented Dec 21, 2014

@jfsantos Thanks! Autoencoders, although unsupervised, are still trained with SGD; we just specify the label to be the same as the input data, so in principle we could already do this in Mocha. We might need to add some special layers to support variants of autoencoders. But I might be wrong, as I haven't worked on autoencoders at all. Do you know the details?
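
A minimal sketch of that idea in generic NumPy (not Mocha's API; sizes, learning rate, and epoch count are all illustrative): train with plain SGD against a target that is the input itself.

```python
# Autoencoder trained with plain SGD where the "label" is the input itself.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))             # 100 samples, 8 features

W1 = rng.normal(scale=0.1, size=(8, 4))   # encoder weights
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 8))   # decoder weights
b2 = np.zeros(8)
lr = 0.01

losses = []
for epoch in range(50):
    total = 0.0
    for x in X:
        h = np.tanh(x @ W1 + b1)          # encode
        x_hat = h @ W2 + b2               # decode (linear output)
        err = x_hat - x                   # the target IS the input
        total += (err ** 2).mean()
        # plain backprop/SGD through both layers
        dW2 = np.outer(h, err)
        dh = (err @ W2.T) * (1 - h ** 2)
        dW1 = np.outer(x, dh)
        W2 -= lr * dW2; b2 -= lr * err
        W1 -= lr * dW1; b1 -= lr * dh
    losses.append(total / len(X))
```

The reconstruction loss falls over training, exactly as it would for a supervised net whose labels happen to equal its inputs.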

As for Boltzmann machines, yes, I agree they are a very different paradigm. And since we already have a package (dfdx/Boltzmann.jl#3) for that, I think it is better to keep them in two different packages. But making them compatible should definitely be a goal, and maybe some collaboration too.

For using DBNs/DBMs to initialize the weights of DNNs, I think this might already be quite easy. If you could export the weights to an HDF5 file with compatible naming, Mocha should be able to load them, just like it loads Caffe's exported models, and then start supervised training from there. We could make Mocha's loading interface richer by, for example, allowing the user to control in fine detail which layer should load from which file and from a dataset with which name, etc. We could also discuss a common data format that suits both needs.
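
A sketch of what such an export could look like with h5py: one named dataset per parameter, so a loader on the other side can look parameters up by the same names. The "ip1___weight" style names here are hypothetical, not Mocha's actual convention.

```python
# Export pretrained weights to HDF5 with layer-keyed dataset names.
import os
import tempfile

import h5py
import numpy as np

weights = {
    "ip1___weight": np.random.default_rng(0).normal(size=(4, 8)),
    "ip1___bias": np.zeros(4),
}

path = os.path.join(tempfile.mkdtemp(), "pretrained.hdf5")
with h5py.File(path, "w") as f:
    for name, arr in weights.items():
        f.create_dataset(name, data=arr)      # one dataset per parameter

# The consuming framework just looks each parameter up by name.
with h5py.File(path, "r") as f:
    loaded = {name: f[name][...] for name in f}
```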

@jfsantos

You are right about autoencoders being trained with SGD, like MLPs. There are some "special" things, though:

  1. Specific regularizers/cost functions (e.g., for contractive and sparse autoencoders).
  2. Tied weights: the decoder's weight matrix is simply the transpose of the encoder's weight matrix, so you only update the encoder weight matrix (however, each layer has its own biases).
  3. A "corruption layer" is needed for adding noise to or zeroing elements of the input data in the case of denoising autoencoders.
  4. If we want to support stacked autoencoders, they're a bit of a different animal (more like DBNs, in the sense that you have to train them iteratively, layer by layer).
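Points 2 and 3 are small enough to sketch in generic NumPy (names and sizes illustrative): tied weights just reuse the encoder matrix transposed, and denoising corruption zeroes a random fraction of the input.

```python
# Tied-weight autoencoder forward pass plus a denoising corruption step.
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, level=0.3):
    """Denoising-autoencoder corruption: zero out a random fraction of x."""
    mask = rng.random(x.shape) >= level
    return x * mask

def forward(x, W, b_enc, b_dec):
    """Tied weights: the decoder matrix is W transposed; biases are separate."""
    h = np.tanh(x @ W + b_enc)
    return h @ W.T + b_dec

x = rng.normal(size=8)
W = rng.normal(scale=0.1, size=(8, 4))
x_hat = forward(corrupt(x), W, np.zeros(4), np.zeros(8))
```

Only `W` (and the two bias vectors) would receive gradient updates, which is what makes the tied case different from a plain two-layer MLP.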

I'll work on a draft implementation for initializing a DNN with a DBN from Boltzmann.jl and let you know as soon as I have something (hopefully, by submitting a pull request!).

@pluskid
Owner Author

pluskid commented Dec 22, 2014

@jfsantos Thanks for the details! I see, it is doable but not trivial. I need to think about this further.

@philtomson

Just wondering what the ETA for recurrence support might be?

@pluskid
Owner Author

pluskid commented Dec 23, 2014

@philtomson That is definitely a plan/goal, but probably after the autoencoders. The reason is that I do not know RNNs well enough to start implementing them right away. But I think many of the building blocks are already there. In particular, for a simple explicit unfolding of a fixed-length history, I think one could already build such a model by making use of the shared-parameter mechanism in Mocha. For variable-length RNN support, I need to think more, especially about how the interface should be organized.
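
For concreteness, the explicit unrolling with shared parameters mentioned above amounts to the following forward pass (generic NumPy; names and sizes are illustrative, not Mocha's API).

```python
# Unrolled RNN over a fixed-length history: one "copy" of the layer per
# time step, but the same W_xh / W_hh / b_h parameters shared by all steps.
import numpy as np

def unrolled_rnn(xs, W_xh, W_hh, b_h):
    """Forward pass over a fixed-length sequence xs of input vectors."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:                                  # one unrolled step each
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)    # shared weights
        states.append(h)
    return states

rng = np.random.default_rng(0)
seq = [rng.normal(size=3) for _ in range(5)]      # length-5 history
states = unrolled_rnn(seq, rng.normal(size=(3, 4)),
                      rng.normal(size=(4, 4)), np.zeros(4))
```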

That being said, suggestions are very welcome from people who already know RNNs. For example, what is the simplest, most representative, and reproducible example for RNNs (like MNIST for CNNs)? Are there any nice existing RNN libraries (whose way of organizing the user interface we could possibly learn from)? Etc.

@zhongwen

@pluskid Maybe the following are helpful:
Andrej Karpathy's Neuraltalk: https://github.com/karpathy/neuraltalk
Alex Graves's RNNLIB http://sourceforge.net/projects/rnnl/

@pluskid
Owner Author

pluskid commented Dec 23, 2014

@zhongwen Thanks for the links!

@the-moliver
Contributor

I'm planning to add time-delay neural networks. I have a working implementation ( https://github.com/the-moliver/NeuralNets.jl ) that I want to port to Mocha.

@philtomson

It would be nice to have a Caffe file -> Mocha converter. Maybe I'll work on something like that. Should be doable, right? Or are there Caffe features that are not yet in Mocha?

@pluskid
Owner Author

pluskid commented Jul 6, 2015

We already have the ability to load Caffe models, but you still need to manually translate the model definition. Automatic translation of the architecture is theoretically possible, but I guess it might be quite tedious to implement. (I'm thinking maybe some universal DNN architecture specification language will come out soon.) Most of the core functionality in Caffe has a correspondence in Mocha. But Caffe also has many unofficial forks that implement specific layers; those are more difficult to convert.

@nikolaypavlov

@pluskid
Owner Author

pluskid commented Aug 18, 2015

@nikolaypavlov Thanks for the suggestions:

  • Based on my understanding, maxout is simply max pooling over some units. We can achieve this with the existing PoolingLayer or ChannelPoolingLayer. Let me know if you are talking about something else.
  • Max-norm regularization is actually implemented; see for example filter_cons for ConvolutionLayer.
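
A rough NumPy sketch of both points (group size and norm bound are illustrative, and this is not Mocha's PoolingLayer API): maxout is a max over consecutive groups of linear units, which a reshape plus max implements, and a max-norm constraint rescales any weight column whose L2 norm exceeds a bound.

```python
import numpy as np

def maxout(z, group_size):
    """z: 1-D pre-activations; take the max over consecutive groups of units."""
    return z.reshape(-1, group_size).max(axis=1)

def max_norm(W, bound=2.0):
    """Project each column of W back inside the L2 ball of radius `bound`."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, bound / np.maximum(norms, 1e-12))
```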

@nikolaypavlov

@pluskid Great, I'll try to play with PoolingLayer.

@outlace

outlace commented Aug 31, 2015

Is this project meant to be the Theano/Torch of Julia?

Is there ever going to be OpenCL support?

@pluskid
Owner Author

pluskid commented Sep 6, 2015

@outlace, in that sense this is more like Torch than Theano. There is no planned OpenCL support unless Julia gets better native support for GPU targets.

@nstiurca
Contributor

nstiurca commented Oct 9, 2015

I would be very interested in OpenCL support as well. In fact, I have half a mind to take a stab at it myself. If I can leverage an OpenCL BLAS library (say, CLBLAS.jl), then I basically just have to write im2col.cl and a couple of pooling and neuron kernels, and structure everything else similarly to the CUDA backend.
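
The im2col transform mentioned above is small enough to sketch in NumPy (stride 1, no padding, single channel, all illustrative): unroll every convolution window into a column so that convolution becomes a single matrix multiply, which is why an OpenCL BLAS library covers most of a convolution layer.

```python
import numpy as np

def im2col(img, k):
    """Unroll all k-by-k windows of a 2-D image into columns."""
    h, w = img.shape
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            cols.append(img[i:i + k, j:j + k].ravel())
    return np.array(cols).T            # shape (k*k, n_windows)

img = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))
out = kernel.ravel() @ im2col(img, 3)  # convolution as one matvec (BLAS)
```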

If I did this, in the interest of clarity would you be OK with renaming GPUBackend -> CUDABackend (adding @deprecated typealias GPUBackend CUDABackend or similar for compatibility), and naming the new backend OCLBackend?

@pluskid
Owner Author

pluskid commented Oct 9, 2015

@nstiurca Thanks! This could be cool! Yes, I'm OK with the renaming if we have a working OpenCL backend!

@nstiurca
Contributor

nstiurca commented Oct 9, 2015

OK, I will get started this weekend. Should we open an issue for the sake of tracking? Development-wise, it will be simplest for me to create an opencl branch on the fork of your project that I already have. Do you prefer to have such a branch in your repo as well until OpenCL support is stable (assuming we get there...)? It might be good to do that for the sake of anyone else that wants to help develop OpenCL support.

@pluskid
Owner Author

pluskid commented Oct 9, 2015

I would suggest doing it in your branch, but opening a pull request here with "[WIP]" in the title and a description of the goal and current progress in the text (which you could update periodically). I will not merge the PR until you have something reasonably stable, but people will see the PR and could probably jump in to help.

@nstiurca
Contributor

nstiurca commented Oct 9, 2015

That works for me. Look for it later today.

@outlace

outlace commented Oct 9, 2015

I think this is great. I currently have to use Torch because it's the only mature package that has an OpenCL backend. Being able to run models on my Macbook is fantastic. Really looking forward to this getting OpenCL support.

@nstiurca
Contributor

@outlace Caffe also has a fork with OpenCL support, but unfortunately for me I haven't been able to get either Torch or Caffe to work on a 32-bit ARM processor, even though it has a fully compliant OpenCL 1.1 implementation.

Thus, I am going to start rolling my own. See PR #155.

@lqh20

lqh20 commented Oct 24, 2015

Any plans to implement batch normalization (http://jmlr.org/proceedings/papers/v37/ioffe15.pdf)? Looks like it's a great step forward in terms of training time!
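
For reference, the forward pass of batch normalization from the linked paper can be sketched in a few lines of NumPy: normalize each feature over the mini-batch, then apply a learnable scale and shift (sizes here are illustrative).

```python
import numpy as np

def batch_norm(X, gamma, beta, eps=1e-5):
    """Batch-norm forward pass: X is (batch, features)."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    X_hat = (X - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * X_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(64, 5))
Y = batch_norm(X, gamma=np.ones(5), beta=np.zeros(5))
```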

@pluskid
Owner Author

pluskid commented Oct 25, 2015

@lqh20 I recently joined a new project, MXNet. We are building a Julia interface called MXNet.jl. It is still at a relatively early stage, but some features are already working. For example, batch normalization and multi-GPU training in the cifar-10 example are already working quite nicely.

@philtomson

Is MXNet.jl complementary to Mocha.jl or meant to replace it?

@pluskid
Owner Author

pluskid commented Oct 25, 2015

@philtomson It depends. Mocha.jl still has its advantages of simplicity and portability. But in terms of computational efficiency or feature richness, I think MXNet.jl should replace Mocha.jl, because it is built on top of libmxnet, a language-agnostic general deep learning library designed to have, for example, multi-GPU support. Moreover, the core components of libmxnet are being actively developed by a team, so in terms of features it is much better than Mocha.jl, which is currently developed primarily by me in my very little free time. libmxnet itself is actually a joint effort of authors from several different deep learning libraries.

@philtomson

I wonder if mxnet could be an alternate backend for Mocha.jl? It seems like that would preserve the advantages of Mocha.jl (simplicity, portability, good documentation) while also allowing users to drop directly to your MXNet.jl bindings if needed.


@pluskid
Owner Author

pluskid commented Oct 26, 2015

@philtomson That could be one possible option. I will wait and see if that is feasible, as using MXNet.jl introduces an external dependency on libmxnet. If that dependency itself is not a problem, then using MXNet.jl directly might be a more viable option, though some things still need to be improved, especially the documentation.

@philtomson


Right. The scenario I was thinking of was being able to keep the kind of simple, declarative style of Mocha.jl while also being able to take advantage of the performance of MXNet.jl. Sure, libmxnet has the advantage of being "language independent"; however, that can also be a weakness. It could mean that you can't readily take advantage of powerful language features specific to Julia, like macros (or at least it might be more difficult to do so). I suspect there's a lot of boilerplate code required when you use libmxnet that could be eliminated at a higher level of abstraction.

BTW: is the GPU backend of Mocha.jl not as performant as libmxnet? From what I understand of the docs, mxnet allows for multiple GPUs whereas Mocha.jl only allows one. If you compare performance between Mocha.jl with one GPU and libmxnet with one GPU, are they pretty close?

I can see where training with multiple GPUs can be an advantage, but some users might be running pre-trained models on a laptop with only a single GPU (or some of us don't even have that, as we only have an Intel integrated GPU which doesn't do CUDA), and the current setup of Mocha.jl is actually quite sufficient for this (people with that kind of setup wouldn't notice any appreciable difference from using libmxnet, perhaps).

Also: does the mxnet project have any plans for supporting OpenCL?

As for "something still needs to be improved, esp. documents": Mocha.jl's documentation is actually pretty good at this point, so this would be a problem for someone who tries moving from Mocha.jl to MXNet.jl. Using Mocha.jl as a sort of wrapper around MXNet.jl would mean you could probably keep most of the documentation as is.

I suppose another idea would be to adapt the CPP backend of Mocha.jl to produce C++ code that makes calls directly to libmxnet (or at least parameterize it so that you could use OpenMP (as now) or libmxnet in the CPP backend).



@philtomson

I just got around to installing MXNet.jl and playing with it some. So far it doesn't seem much more difficult to use than Mocha.


@pluskid
Owner Author

pluskid commented Oct 27, 2015

@philtomson Glad to hear that it works out nicely for you.

The single-GPU performance of Mocha.jl might be similar to MXNet.jl's. MXNet.jl has a more flexible symbolic API for defining network architectures, and internally optimizations are applied to avoid unnecessary memory allocation and computation. But multi-GPU is definitely a win on the MXNet.jl side.

I agree that many users with small-scale applications do not use GPUs. In this case, the default CPU-only libmxnet.so should still be quite straightforward to compile (at least on Linux and OS X). And since libmxnet is actually a relatively low-level backend, much of the logic will still be built in Julia, and the interface is actually flexible and convenient enough to use.

One of the main goals of the joint force under dmlc/libmxnet is to avoid duplicated labor, especially in the computationally heavy backend. A layer implemented once is automatically available in the Python, Julia, and R frontends.

Currently I will be maintaining both Mocha.jl and MXNet.jl. In the future, when MXNet.jl becomes more mature, I will try to advocate MXNet.jl as the successor of Mocha.jl.

@pluskid
Owner Author

pluskid commented Nov 13, 2015

For those who are interested in RNNs/LSTMs in Julia: there is a char-rnn LSTM implementation in MXNet.jl now. It uses explicit unrolling, so everything fits in the current FeedForward model and multi-GPU training can be used directly. For more general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.
