Revamp ann modules #831
Conversation
Select<arma::mat, arma::mat>*,
Sequential<arma::mat, arma::mat>*,
VRClassReward<arma::mat, arma::mat>*
>;
I was quite surprised, when I reviewed the visitor paradigm and then the boost::variant sources, to discover that boost::variant and static_visitor are completely implemented at compile time and there is no runtime overhead. It took a while to read the boost::variant code to verify (and I don't completely understand it, only "sort of"), and then taking a look at mlpack programs:
$ nm -g bin/mlpack_knn | c++filt | grep MonoSearchVisitor
$
I'm quite impressed! I didn't think it could be done entirely at compile-time.
Anyway, one drawback of this approach is that it doesn't make it easy for users to add their own layers. One way you could do that is this:
using LayerTypes = boost::variant<
Add<arma::mat, arma::mat>*,
...,
#ifdef EXTRA_LAYER_TYPES
EXTRA_LAYER_TYPES,
#endif
VRClassReward<arma::mat, arma::mat>*
>;
And then a user can add their own layer by adding that to EXTRA_LAYER_TYPES
before including mlpack.
Another possible issue that I have not thought of any solution to is this: if you know the structure of your network at compile time you could specify a layer as, e.g., Linear<arma::mat::fixed<10, 10>, arma::mat::fixed<10, 10>>, but that's not part of the boost::variant object, so you couldn't use it. Maybe in situations like that we can use dynamic dispatch of some sort? I'll keep thinking but I don't know if I will be able to come up with something straightforward---boost::any wouldn't be a good solution because then we lose the static visitor and any call to any layer will have to be a lookup somehow. Either way I think even if we don't have a solution for now, it would be great to use this as-is and try to improve later, because this whole idea is so much nicer for users than before too.
(I haven't reviewed the rest of the code yet---I've primarily focused only on this point so far. I'll add comments about the rest of the code hopefully in the next few days.)
Thanks for taking a closer look at the PR. I think the easiest solution for using your own layer is to inherit from one of the existing layers, e.g. Linear<>:
class OwnLayer : public Linear<>
{
 public:
  OwnLayer();
  void Forward(...);
  void Backward(...);
  void Gradient(...);
};
Afterwards, use the rest of the code as always. It's not ideal, but it works without rebuilding the library. I guess another solution would be to extend boost::variant with something like:
template <typename LocalTypes>
class FFN
{
typename boost::make_variant_over<
typename boost::mpl::joint_view<
LayerTypes,
LocalTypes
>::type
>::type extendedTypes;
...
};
FFN<OwnLayer<> > model;
model.Add<OwnLayer<> >();
The problem I see is that you have to specify the type of the new layer.
Wouldn't you have to make Forward(), Backward(), and Gradient() virtual for inheriting from Layer<> to work?
Anyway, the LocalTypes solution is nicer than what I proposed.
I don't see a way using boost::variant to collect the full list of types used by Add() at compile time. Anything I can think of is pretty ugly. But I don't think it's a huge problem to ask users to use a template parameter if they want to use their own custom layer type---we can just try to provide enough types that it's not a problem. And maybe if we think of something better later, we can refactor. What do you think?
This looks great to me. I think that if we have good documentation and examples for this, then publicize it, we will get a lot of interest---especially if we can show that it outperforms some other toolkits. I hope the comments I've made are useful. For the most part I've just assumed that all the layers are implemented correctly; there are so many, I think it would take me a very long time to read them all. :) I focused primarily on the structure of the code at a high-level and specifically on the FFN and RNN classes, plus the abstractions of the layers themselves.
I have a few more general comments and questions:
- Is the CNN class meant to be replaced by FFN with convolutional layers? I think that's the case, just want to be sure I understand.
- The FFN class mostly holds arma::mat objects internally, but the layers are general enough that they can support other types of input, like arma::Mat<int> or whatever else. Should the types held by FFN instead be a function of the element types held by the layers? It might be nice for that to be the case, especially because it could also solve Tham's question in "Could we change the response type of ann from arma::mat to arma::Row<size_t>?" #541 from many months ago. It would also allow training reduced-precision neural networks, like in https://arxiv.org/pdf/1502.02551.pdf for instance.
- It seems like a lot of the parameters of the FFN and RNN classes are there specifically for use by the optimizer during training. That's fine with me, but I thought that another option might be to create some kind of "optimization helper" class that held the network itself and all of the training parameters. I think that would be a lot of refactoring work though since it would apply to the layers too; maybe it's only something to think about (and maybe it's not a good idea in the long run).
- Since this is such a nice refactoring and has so much potential, I think we can make a big splash if we do some planning on how to release this with maximum impact to the machine learning and data science community. If you're up for that, we can plot out some simple roadmap of things to do before releasing this as part of mlpack 3.0.0.
Overall this is great! I am very excited. Sorry that my review took so long. :(
 *
 * @param layer The Layer to be added to the model.
 */
void Add(LayerTypes layer) { network.push_back(layer); }
It looks like there are two Add() overloads here---one constructs a new layer and adds it to the model, and the other takes a pointer to a layer that is actually owned at a higher scope and puts it in the network without copying it (it just takes the pointer). It seems to me like you could end up in a weird situation if you did something like this...
Linear<> l(10, 10);
// modify the parameters of l somehow
// ...
{
FFN<> f;
f.Add(&l);
} // destructs l
// Try to re-use l in another network.
FFN<> f2;
f2.Add(&l);
// But since l was already destructed, it's not the same!
Do you think it would be better to instead provide a "copy" and "move" version of Add()? Kind of like:
// Copy a layer and add it to the network.
template<typename LayerType>
void Add(const LayerType& l, /* SFINAE to check that LayerType is in LayerTypes */);
// Take ownership of a layer and add it to the network.
template<typename LayerType>
void Add(LayerType&& l, /* SFINAE to check that LayerType is in LayerTypes */);
I don't know if that's the best way to accomplish it. It looks like for most of the layers, we can use the default copy and move constructors/operators (I think).
Good point, using the copy and move constructors/operators is a good idea.
for (size_t i = 0; i < network.size(); ++i)
{
  offset += boost::apply_visitor(WeightSetVisitor(std::move(parameter),
      offset), network[i]);
If I am understanding right, this clever design means that all parameters in the network are stored in contiguous memory. That is nice! It'll provide nice cache benefits and should also accelerate serialization.
Right, the drawback is that it's not easy to change the weight vector size during training, but I couldn't think of any situation where that would be necessary.
    OutputLayerType>::value,
    "The type of outputLayer must be OutputLayerType.");
this->predictors = std::move(predictors);
this->responses = std::move(responses);
I didn't know until I looked it up that calling std::move() on a const reference actually just calls the copy constructor. If I understand correctly, we must hold predictors and responses as members, since the optimizer will be calling Evaluate() and Gradient() repeatedly. Another option is simply to hold a pointer to predictors and responses, then in Train(), just take the address of the given data, and when training is done, set the pointers to NULL.
{
  this->deterministic = deterministic;
  ResetDeterministic();
}
It seems like the two checks above are for safety, since the user might for some reason call Evaluate() or Gradient(). Typically though the optimizer will be calling those two functions, and in those cases I think you have already called ResetParameters() and ResetDeterministic() in Train() before calling the optimizer. I think it would be fine, if you wanted, to remove those two checks from here, and then provide a "user" overload of Evaluate() and Gradient() that doesn't need the parameters parameter (since it's unused anyway), and the documentation could direct users to use that one if they needed. I don't know if it would give much speedup, but if Evaluate() is called for every point during training, it might be noticeable---might be worth testing.
The problem is that some layers act differently in training and testing mode. For example, the Dropout layer randomly sets some values to zero in training mode and uses all values while testing. Since the optimizer evaluates the network after each batch/iteration, we have to set the deterministic parameter accordingly and switch it back for the next iteration.
// Evaluate the gradient for this iteration.
if (shuffle)
function.Gradient(iterate, visitationOrder[currentFunction], gradient);
else
function.Gradient(iterate, currentFunction, gradient);
For the Gradient function, we have to set the deterministic parameter to training.
// Now add that to the overall objective function.
if (shuffle)
overallObjective += function.Evaluate(iterate,
visitationOrder[currentFunction]);
else
overallObjective += function.Evaluate(iterate, currentFunction);
For the Evaluate function, we have to set the deterministic parameter to testing.
Ah, right, then there is no way around it.
Forward(arma::mat(predictors.colptr(i), predictors.n_rows, 1, false, true),
    network);
currentInput = std::move(arma::mat(predictors.colptr(i),
    predictors.n_rows, 1, false, true));
I think you can also get away with currentInput = predictors.unsafe_col(i) here and a few lines later for currentTarget.
You are right; maybe it's sufficient to just save the index. I'll have to check if that works.
 * LoadOutputParameterVisitor restores the output parameter using the given
 * parameter set.
 */
class LoadOutputParameterVisitor : public boost::static_visitor<void>
This file is very long; if you wanted to, I think it would be reasonable to split each visitor into its own file since there are so many of them. Your call, I think it's fine either way.
Let's do this; it would definitely increase readability.
 * mlpack is free software; you may redistribute it and/or modify it under the
 * terms of the 3-clause BSD license. You should have received a copy of the
 * 3-clause BSD license along with mlpack. If not, see
 * http://www.opensource.org/licenses/BSD-3-Clause for more information.
 * Tests for various convolution strategies.
Ah, can we put this comment back up above the license? If we ever need to change the license text, this'll probably mess up the awk scripts I try to make. :)
Sure, let's make it easier for awk.
  module.Backward(std::move(input), std::move(output), std::move(delta));
  BOOST_REQUIRE_EQUAL(arma::accu(output), arma::accu(delta));
  }
}
It seems like a lot of layer types aren't tested; should we add some tests for those too?
Yeah, implementing more tests is definitely on my list; I think every layer is used in at least one test. For example, there is no test for the base layer in ann_layer_test, since each activation function is tested in activation_functions_test, and the mean pooling layer is tested in pooling_test.
      offset), network[i]);

  boost::apply_visitor(resetVisitor, network[i]);
}
Do we also need to clear all of the locally-held training-time parameters like currentInput, currentTarget, etc., here?
Right, in case somebody calls the Gradient function right after loading the model, before calling the Evaluate or Train function.
@@ -47,6 +47,8 @@ class FFTConvolution
 * @param input Input used to perform the convolution.
 * @param filter Filter used to perform the convolution.
 * @param output Output data that contains the results of the convolution.
 * @param dW Stride of filter application in the x direction.
 * @param dH Stride of filter application in the y direction.
Seems like dW and dH aren't parameters to this method; maybe this comment should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
Ah, I meant to add, I think it is fine to merge now too before the merge gets hard(er), and possibly open some issues for any of the relevant comments or anything that needs to be addressed later.
Thanks for taking the time to look over the PR.
I agree, it looks like it's already not a straightforward merge, and this would open the possibility to create some easy issues.
Hmmm, now there is something interesting to think about. I see two ways to go---the first is to modify the optimizer abstraction so it supports a separate test set. The other is to make a "wrapper" optimizer that calls Evaluate() with the test set and Gradient() with the training set. I can see difficulties with each of those ideas. Definitely something to figure out how to do nicely later. :) I don't have any more comments, I think it's ready for merge whenever you address the comments that you wanted to now and we can open more issues to be fixed later. I don't know if anyone else had any comments on the PR.
Thanks, I think it's ready.
Merged, I'll open another issue for the things discussed here.
This pull request revamps the ann modules. It should be a lot easier to create networks especially more complex networks that depend on the concat layer or the recurrent layer.
Changes:
The way parameters of different types are dealt with is greatly simplified: arma::Mat is used for all input and output parameters instead of arma::Cube to represent 3rd-order tensors. This makes it possible to e.g. use the linear layer for convolutional and standard feedforward networks.
The introduced Recurrent module/layer is able to transform each non-recurrent layer/module into a recurrent layer.
In addition to the feedforward, recurrent and convolution network test, each layer is now tested by approximating the Jacobian matrix.
The introduction of the Concat module/layer allows us to split the network into subnetworks, which makes the implementation of e.g. the inception network a lot easier. (I'll open another PR for the inception layer once the PR is merged).
The way the network is created is greatly simplified. Instead of using:
you can create the same network with: