Revamp ann modules #831

Closed
wants to merge 88 commits into from

Conversation

zoq (Member) commented Dec 15, 2016

This pull request revamps the ann modules. It should now be a lot easier to create networks, especially more complex networks that depend on the concat layer or the recurrent layer.

Changes:

  • The way parameters of different types are handled is greatly simplified: arma::Mat is used for all input and output parameters instead of arma::Cube for representing 3rd-order tensors. This makes it possible to, e.g., use the linear layer for both convolutional and standard feed-forward networks.

  • The introduced Recurrent module/layer can transform any non-recurrent layer/module into a recurrent layer.

  • In addition to the feed-forward, recurrent, and convolutional network tests, each layer is now tested by approximating the Jacobian matrix.

  • The introduction of the Concat module/layer allows us to split the network into subnetworks, which makes the implementation of e.g. the inception network a lot easier. (I'll open another PR for the inception layer once this PR is merged.)

  • The way the network is created is greatly simplified. Instead of using:

LinearLayer<> inputLayer(trainData.n_rows, hiddenLayerSize);
BiasLayer<> inputBiasLayer(hiddenLayerSize);
BaseLayer<PerformanceFunction> inputBaseLayer;
LinearLayer<> hiddenLayer1(hiddenLayerSize, trainLabels.n_rows);
BiasLayer<> hiddenBiasLayer1(trainLabels.n_rows);
BaseLayer<LogSoftMaxLayer<>> outputLayer;
OutputLayerType classOutputLayer;
auto modules = std::tie(inputLayer, inputBiasLayer, inputBaseLayer,
                        hiddenLayer1, hiddenBiasLayer1, outputLayer);

FFN<decltype(modules), decltype(classOutputLayer), RandomInitialization,
    PerformanceFunctionType> net(modules, classOutputLayer);

you can create the same network with:

FFN<NegativeLogLikelihood<> > model;
model.Add<Linear<> >(trainData.n_rows, hiddenLayerSize);
model.Add<SigmoidLayer<> >();
model.Add<Linear<> >(hiddenLayerSize, outputSize);
model.Add<LogSoftMax<> >();
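
A rough sketch of how the model could then be trained and used (the exact Train()/Predict() signatures are assumed here, following the usual mlpack interface):

arma::mat testData, predictions;
model.Train(trainData, trainLabels);   // optimize all network parameters
model.Predict(testData, predictions);  // one output column per test point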

zoq and others added 30 commits June 2, 2016 20:50
Select<arma::mat, arma::mat>*,
Sequential<arma::mat, arma::mat>*,
VRClassReward<arma::mat, arma::mat>*
>;
Member:

I was quite surprised, when I reviewed the visitor paradigm and then the boost::variant sources, to discover that boost::variant and static_visitor are completely implemented at compile time and there is no runtime overhead. It took a while to read the boost::variant code to verify this (and I don't completely understand it, only "sort of"), and then I took a look at the compiled mlpack programs:

$ nm -g bin/mlpack_knn | c++filt | grep MonoSearchVisitor
$

I'm quite impressed! I didn't think it could be done entirely at compile-time.

Anyway, one drawback of this approach is that it doesn't make it easy for users to add their own layers. One way you could do that is this:

using LayerTypes = boost::variant<
    Add<arma::mat, arma::mat>*,
    ...,
#ifdef EXTRA_LAYER_TYPES
    EXTRA_LAYER_TYPES,
#endif
    VRClassReward<arma::mat, arma::mat>*
>;

And then a user can add their own layer by adding that to EXTRA_LAYER_TYPES before including mlpack.
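
Usage on the user's side could then look roughly like this (MyLayer and the exact include are hypothetical):

// Hypothetical user code: MyLayer is a user-defined layer type.
#define EXTRA_LAYER_TYPES MyLayer<arma::mat, arma::mat>*
#include <mlpack/methods/ann/ffn.hpp>

FFN<NegativeLogLikelihood<> > model;
model.Add<MyLayer<> >();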

Another possible issue that I have not thought of any solution to is this: if you know the structure of your network at compile time you could specify a layer as, e.g., Linear<arma::mat::fixed<10, 10>, arma::mat::fixed<10, 10>>, but that's not part of the boost::variant object, so you couldn't use it. Maybe in situations like that we can use dynamic dispatch of some sort? I'll keep thinking but I don't know if I will be able to come up with something straightforward---boost::any wouldn't be a good solution because then we lose the static visitor and any call to any layer will have to be a lookup somehow. Either way I think even if we don't have a solution for now, it would be great to use this as-is and try to improve later, because this whole idea is so much nicer for users than before too.

(I haven't reviewed the rest of the code yet---I've primarily focused only on this point so far. I'll add comments about the rest of the code hopefully in the next few days.)

Member Author:

Thanks for taking a closer look at the PR. I think the easiest way to use a custom layer is to inherit from one of the existing layers, e.g. Linear<>.

class OwnLayer : public Linear<>
{
 public:
  OwnLayer();
  void Forward(...);
  void Backward(...);
  void Gradient(...);
};

Afterwards, the rest of the code can be used as always. It's not ideal, but it works without rebuilding the library. I guess another solution would be to extend the boost::variant, something like:

template<typename LocalTypes>
class FFN
{
  typename boost::make_variant_over<
      typename boost::mpl::joint_view<
          LayerTypes,
          LocalTypes
      >::type
  >::type extendedTypes;

  ...
};


FFN<OwnLayer<> > model;
model.Add<OwnLayer<> >();

The problem I see is that you have to specify the type of the new layer.

Member:

Wouldn't you have to make Forward(), Backward(), and Gradient() virtual for inheriting from Layer<> to work?

Anyway the LocalTypes solution is nicer than what I proposed.

I don't see a way using boost::variant to collect the full list of types used by Add() at compile time. Anything I can think of is pretty ugly. But I don't think it's a huge problem to ask users to use a template parameter if they want to use their own custom layer type---we can just try to provide enough types that it's not a problem. And maybe if we think of something better later, we can refactor. What do you think?

rcurtin (Member) left a comment:

This looks great to me. I think that if we have good documentation and examples for this, then publicize it, we will get a lot of interest---especially if we can show that it outperforms some other toolkits. I hope the comments I've made are useful. For the most part I've just assumed that all the layers are implemented correctly; there are so many, I think it would take me a very long time to read them all. :) I focused primarily on the structure of the code at a high-level and specifically on the FFN and RNN classes, plus the abstractions of the layers themselves.

I have a few more general comments and questions:

  • Is the CNN class meant to be replaced by FFN with convolutional layers? I think that's the case, just want to be sure I understand.

  • The FFN class holds arma::mats internally mostly, but the layers are general enough that they can support other types of input, like arma::Mat<int> or whatever else. Should the types held by FFN instead be a function of the element types held by the layers? It might be nice for that to be the case, especially because it could also solve Tham's question in Could we change the response type of ann from arma::mat to arma::Row<size_t>? #541 from many months ago. It would also allow training reduced-precision neural networks, like in https://arxiv.org/pdf/1502.02551.pdf for instance.

  • It seems like a lot of the parameters of the FFN and RNN classes are there specifically for use by the optimizer during training. That's fine with me, but I thought that another option might be to create some kind of "optimization helper" class that held the network itself and all of the training parameters. I think that would be a lot of refactoring work though since it would apply to the layers too; maybe it's only something to think about (and maybe it's not a good idea in the long run).

  • Since this is such a nice refactoring and has so much potential, I think we can make a big splash if we do some planning on how to release this with maximum impact to the machine learning and data science community. If you're up for that, we can plot out some simple roadmap of things to do before releasing this as part of mlpack 3.0.0.

Overall this is great! I am very excited. Sorry that my review took so long. :(

*
* @param layer The Layer to be added to the model.
*/
void Add(LayerTypes layer) { network.push_back(layer); }
Member:

It looks like there are two Add() overloads here---one constructs a new layer and adds it to the model, and the other takes a pointer to a layer that is actually owned at a higher scope and puts it in the network without copying it (it just takes the pointer). It seems to me like you could end up in a weird situation if you did something like this...

Linear<> l(10, 10);

// modify the parameters of l somehow
// ...

{
  FFN<> f;
  f.Add(&l);
} // destructs l

// Try to re-use l in another network.
FFN<> f2;
f2.Add(&l);
// But since l was already destructed, it's not the same!

Do you think it would be better to instead provide a "copy" and "move" version of Add()? Kind of like:

// Copy a layer and add it to the network.
template<typename LayerType>
void Add(const LayerType& l, /* SFINAE to check that LayerType is in LayerTypes */);
// Take ownership of a layer and add it to the network.
template<typename LayerType>
void Add(LayerType&& l, /* SFINAE to check that LayerType is in LayerTypes */);

I don't know if that's the best way to accomplish it. It looks like for most of the layers, we can use the default copy and move constructors/operators (I think).
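
To make that concrete, a very rough sketch of how a single forwarding overload could cover both cases (IsLayerType is a hypothetical trait checking that the decayed type appears in LayerTypes; this is not a concrete proposal):

// Rough sketch: copy-construct from lvalues, move-construct from rvalues.
template<typename LayerType,
         typename = typename std::enable_if<
             IsLayerType<typename std::decay<LayerType>::type>::value>::type>
void Add(LayerType&& layer)
{
  using DecayedType = typename std::decay<LayerType>::type;
  network.push_back(new DecayedType(std::forward<LayerType>(layer)));
}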

Member Author:

Good point, using the copy and move constructors/operators is a good idea.

for (size_t i = 0; i < network.size(); ++i)
{
  offset += boost::apply_visitor(WeightSetVisitor(std::move(parameter),
      offset), network[i]);
Member:

If I am understanding right, this clever design means that all parameters in the network are stored in contiguous memory. That is nice! It'll provide nice cache benefits and should also accelerate serialization.
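
The underlying trick could be sketched like this (variable names here are made up; this is not the actual WeightSetVisitor code):

// All network weights live in one contiguous matrix.
arma::mat parameter(totalWeightCount, 1);

// Each layer's weight matrix is a non-owning alias into that memory.
size_t offset = 0;
arma::mat layerWeights(parameter.memptr() + offset, rows, cols,
    false /* don't copy the memory */, false /* non-strict */);
offset += rows * cols;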

Member Author:

Right, the drawback is that it's not that easy to change the weight vector size during training, but I can't think of any situation where that's necessary.

OutputLayerType>::value,
"The type of outputLayer must be OutputLayerType.");
this->predictors = std::move(predictors);
this->responses = std::move(responses);
Member:

I didn't know until I looked it up that calling std::move() on a const reference actually just calls the copy constructor. If I understand correctly, we must hold predictors and responses as members, since the optimizer will be calling Evaluate() and Gradient() repeatedly. Another option is simply to hold a pointer to predictors and responses, then in Train(), just take the address of the given data, and when training is done, set the pointers to NULL.
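
For example, something along these lines (member and parameter names here are just illustrative):

// Sketch of the pointer-based alternative, as members of the network class.
const arma::mat* predictorsPtr = nullptr;
const arma::mat* responsesPtr = nullptr;

void Train(const arma::mat& predictors, const arma::mat& responses)
{
  predictorsPtr = &predictors;
  responsesPtr = &responses;
  // ... run the optimizer; Evaluate() and Gradient() read through the pointers ...
  predictorsPtr = nullptr;  // training is done, drop the references
  responsesPtr = nullptr;
}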

{
this->deterministic = deterministic;
ResetDeterministic();
}
Member:

It seems like the two checks above are for safety, since the user might for some reason call Evaluate() or Gradient(). Typically though the optimizer will be calling those two functions, and in those cases I think you have already called ResetParameters() and ResetDeterministic() in Train() before calling the optimizer. I think it would be fine, if you wanted, to remove those two checks from here, and then provide a "user" overload of Evaluate() and Gradient() that don't need the parameters parameter (since it's unused anyway), and the documentation could direct users to use that one if they needed. I don't know if it would give much speedup, but if Evaluate() is called for every point during training, it might be noticeable---might be worth testing.

Member Author:

The problem is that some layers act differently in training and testing mode. For example, the Dropout layer randomly sets some values to zero in training mode and uses all values while testing. Since the optimizer evaluates the network after each batch/iteration, we have to set the deterministic parameter accordingly and switch it back for the next iteration.

// Evaluate the gradient for this iteration.
if (shuffle)
  function.Gradient(iterate, visitationOrder[currentFunction], gradient);
else
  function.Gradient(iterate, currentFunction, gradient);

For the Gradient function, we have to set the deterministic parameter to training mode.

// Now add that to the overall objective function.
if (shuffle)
  overallObjective += function.Evaluate(iterate,
      visitationOrder[currentFunction]);
else
  overallObjective += function.Evaluate(iterate, currentFunction);

For the Evaluate function, we have to set the deterministic parameter to testing mode.
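
In code, the idea is roughly this (a simplified sketch, not the actual FFN implementation; SetMode is just an illustrative name):

// Simplified sketch of the per-call mode switch.
void SetMode(const bool newDeterministic)
{
  if (deterministic != newDeterministic)
  {
    deterministic = newDeterministic;
    ResetDeterministic();  // propagate the flag to layers such as Dropout
  }
}

// Gradient() would call SetMode(false) before computing the gradient,
// and Evaluate() would call SetMode(true) before the forward pass.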

Member:

Ah, right, then there is no way around it.

Forward(arma::mat(predictors.colptr(i), predictors.n_rows, 1, false, true),
    network);
currentInput = std::move(arma::mat(predictors.colptr(i),
    predictors.n_rows, 1, false, true));
Member:

I think you can also get away with currentInput = predictors.unsafe_col(i) here and a few lines later for currentTarget.

Member Author:

You are right; maybe it's sufficient to just save the index. I'll have to check if that works.

* LoadOutputParameterVisitor restores the output parameter using the given
* parameter set.
*/
class LoadOutputParameterVisitor : public boost::static_visitor<void>
Member:

This file is very long; if you wanted to, I think it would be reasonable to split each visitor into its own file since there are so many of them. Your call, I think it's fine either way.

Member Author:

Let's do this; it would definitely increase the readability.

* mlpack is free software; you may redistribute it and/or modify it under the
* terms of the 3-clause BSD license. You should have received a copy of the
* 3-clause BSD license along with mlpack. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
* Tests for various convolution strategies.
Member:

Ah, can we put this comment back up above the license? If we ever need to change the license text, this'll probably mess up the awk scripts I try to make. :)

Member Author:

Sure, let's make it easier for awk.

module.Backward(std::move(input), std::move(output), std::move(delta));
BOOST_REQUIRE_EQUAL(arma::accu(output), arma::accu(delta));
}
}
Member:

It seems like a lot of layer types aren't tested; should we add some tests for those too?

Member Author:

Yeah, implementing more tests is definitely on my list; I think every layer is used in at least one test. For example, there is no test for the base layer in ann_layer_test, since each activation function is tested in activation_functions_test, and the mean pooling layer is tested in pooling_test.

offset), network[i]);

boost::apply_visitor(resetVisitor, network[i]);
}
Member:

Do we also need to clear all of the locally-held training-time parameters like currentInput, currentTarget, etc., here?

Member Author:

Right, only if somebody calls the Gradient function right after loading the model, before calling the Evaluate or Train function.

@@ -47,6 +47,8 @@ class FFTConvolution
* @param input Input used to perform the convolution.
* @param filter Filter used to perform the convolution.
* @param output Output data that contains the results of the convolution.
* @param dW Stride of filter application in the x direction.
* @param dH Stride of filter application in the y direction.
Member:

Seems like dW and dH aren't parameters to this method; maybe this comment should be removed?

Member Author:

Nice catch!

rcurtin (Member) commented Jan 17, 2017

Ah, I meant to add, I think it is fine to merge now too before the merge gets hard(er), and possibly open some issues for any of the relevant comments or anything that needs to be addressed later.

zoq (Member Author) commented Jan 18, 2017

This looks great to me. I think that if we have good documentation and examples for this, then publicize it, we will get a lot of interest---especially if we can show that it outperforms some other toolkits. I hope the comments I've made are useful. For the most part I've just assumed that all the layers are implemented correctly; there are so many, I think it would take me a very long time to read them all. :) I focused primarily on the structure of the code at a high-level and specifically on the FFN and RNN classes, plus the abstractions of the layers themselves.

I have a few more general comments and questions:

Is the CNN class meant to be replaced by FFN with convolutional layers? I think that's the case, just want to be sure I understand.

The FFN class holds arma::mats internally mostly, but the layers are general enough that they can support other types of input, like arma::Mat or whatever else. Should the types held by FFN instead be a function of the element types held by the layers? It might be nice for that to be the case, especially because it could also solve Tham's question in #541 from many months ago. It would also allow training reduced-precision neural networks, like in https://arxiv.org/pdf/1502.02551.pdf for instance.

It seems like a lot of the parameters of the FFN and RNN classes are there specifically for use by the optimizer during training. That's fine with me, but I thought that another option might be to create some kind of "optimization helper" class that held the network itself and all of the training parameters. I think that would be a lot of refactoring work though since it would apply to the layers too; maybe it's only something to think about (and maybe it's not a good idea in the long run).

Since this is such a nice refactoring and has so much potential, I think we can make a big splash if we do some planning on how to release this with maximum impact to the machine learning and data science community. If you're up for that, we can plot out some simple roadmap of things to do before releasing this as part of mlpack 3.0.0.

Overall this is great! I am very excited. Sorry that my review took so long. :(

Thanks for taking the time to look over the PR.

  • Yes, the CNN class is meant to be replaced by FFN; a convolutional neural network is also a feed-forward network, so it makes sense to use the same class. The previous design used the arma::cube type to handle 3rd-order tensors, which could easily be replaced by arma::mat, as you pointed out in the split PR, so there wasn't any reason to implement another class just to handle arma::cube.

  • Absolutely, should be easy to do.

  • I agree, I guess. There is another thing regarding the optimizer that I haven't brought up that is kind of essential when it comes to the training process: none of the optimizers offers an interface to use a test or validation set for the evaluation/test process; instead, they reuse the training set.

  • Absolutely, I already have some things in mind that I'd like to see before we say it's ready to be included in a release.

zoq (Member Author) commented Jan 18, 2017

Ah, I meant to add, I think it is fine to merge now too before the merge gets hard(er), and possibly open some issues for any of the relevant comments or anything that needs to be addressed later.

I agree; it looks like it's already not a straightforward merge, and this would open the possibility to create some easy issues.

rcurtin (Member) commented Jan 18, 2017

I agree, I guess. There is another thing regarding the optimizer that I haven't brought up that is kind of essential when it comes to the training process: none of the optimizers offers an interface to use a test or validation set for the evaluation/test process; instead, they reuse the training set.

Hmmm, now there is something interesting to think about. I see two ways to go---the first is to modify the optimizer abstraction so it supports a separate test set. The other is to make a "wrapper" optimizer that calls Evaluate() with the test set and Gradient() with the training set. I can see difficulties with each of those ideas. Definitely something to figure out how to do nicely later. :)

I don't have any more comments, I think it's ready for merge whenever you address the comments that you wanted to now and we can open more issues to be fixed later. I don't know if anyone else had any comments on the PR.

zoq (Member Author) commented Jan 19, 2017

Thanks, I think it's ready.

zoq (Member Author) commented Jan 26, 2017

Merged, I'll open another issue for the things discussed here.
