
Experiment to see the advantage of boost::visitor for the ANN code #2647

Closed
rcurtin opened this issue Oct 2, 2020 · 17 comments

@rcurtin (Member) commented Oct 2, 2020

One of the things that we are finding in #2458 is that the ANN code takes a lot of time and a huge amount of memory to compile. This is probably because we use boost::visitor throughout that code to avoid the runtime overhead that may be associated with virtual inheritance. But, the problem is that the use of boost::visitor (and other boost libraries) is really hard on the compiler, and so we have a situation where a lot of people can't compile mlpack in a reasonable amount of time (if at all!). We're also removing other Boost dependencies (see #2440).
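For context, the dispatch pattern being replaced looks roughly like the following. This is just a generic illustration with stand-in layer types, not mlpack's actual visitor classes:

#include <boost/variant.hpp>
#include <armadillo>

// Stand-in layer types; the real ANN code has dozens of these.
struct LinearStub { void Forward(const arma::mat& in, arma::mat& out) { out = 2 * in; } };
struct TanhStub   { void Forward(const arma::mat& in, arma::mat& out) { out = arma::tanh(in); } };

using LayerTypes = boost::variant<LinearStub*, TanhStub*>;

// A visitor like this gets instantiated for every type in the variant at
// every call site, which is where much of the compile-time cost comes from.
struct ForwardVisitor : public boost::static_visitor<void>
{
  ForwardVisitor(const arma::mat& input, arma::mat& output) :
      input(input), output(output) { }

  template<typename LayerType>
  void operator()(LayerType* layer) const { layer->Forward(input, output); }

  const arma::mat& input;
  arma::mat& output;
};

// Dispatch on a LayerTypes object `layer`:
// boost::apply_visitor(ForwardVisitor(input, output), layer);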

One possible way around this is to use virtual inheritance... so, the structure of the Linear<> class in src/mlpack/methods/ann/layer/linear.hpp would become something like this:

template<
    typename InputDataType = arma::mat,
    typename OutputDataType = arma::mat,
    typename RegularizerType = NoRegularizer
>
class Linear : public Layer
{
 public:
  Linear(const size_t inSize, const size_t outSize, RegularizerType regularizer = RegularizerType());

  virtual void Reset();

  template<typename eT>
  virtual void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output);

  template<typename eT>
  virtual void Backward(const arma::Mat<eT>& input, const arma::Mat<eT>& gy, arma::Mat<eT>& g);

  template<typename eT>
  virtual void Gradient(const arma::Mat<eT>& input, const arma::Mat<eT>& error, arma::Mat<eT>& gradient);

  ...
};
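One wrinkle worth noting up front: a virtual member function can't also be a function template, so the Layer base class would either need to fix the matrix type or be a class template itself. A minimal, purely hypothetical sketch of the first option:

#include <armadillo>

// Hypothetical Layer base class that Linear would derive from, with the
// matrix type fixed to arma::mat so the member functions can be virtual.
class Layer
{
 public:
  virtual ~Layer() { }

  virtual void Reset() { }

  virtual void Forward(const arma::mat& input, arma::mat& output) = 0;
  virtual void Backward(const arma::mat& input, const arma::mat& gy, arma::mat& g) = 0;
  virtual void Gradient(const arma::mat& input, const arma::mat& error, arma::mat& gradient) = 0;
};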

So, the question is, what kind of runtime effects would this have? (It would almost certainly reduce compilation time and memory usage significantly.)

We should run an experiment to see what the effect is. I would suggest setting up a situation where we call very many virtual functions, in order to incur the maximum overhead. Then we can compare this with the original code and see if there is any significant runtime difference.

So, here's a way to do that:

  1. Change the Linear class in something like the way suggested above. You'll need to create a Layer base class, of course. You can probably ignore some functions like serialization, but that's up to you. All we need the network to do in the end is train and predict. Ignore all the other layers (unless you want to try other layers too).

  2. Change the FFN class to use std::vector<Layer*> throughout the code, and replace uses of visitors with direct calls on the Layer* objects (a rough sketch follows this list). Note that we only need to do training and then evaluation, so you can probably leave some members of FFN commented out. Don't worry about RNN or WGAN or any other classes.

  3. Write a simple test program that creates an FFN with many small linear layers (maybe try 5, 10, 25, 100?). When training the network, use a reasonably small low-dimensional dataset, and a batch size of 1 for the optimizer. This will hopefully cause a lot of virtual function calls.

  4. Write that same network for the existing mlpack implementation.

  5. Run the programs from (3) and (4) and see how they compare in terms of runtime! You'll want to make sure the initialization and number of iterations is exactly the same, of course.
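For step (2), a hedged sketch of what the dispatch inside FFN might look like once visitors are replaced with direct virtual calls; this assumes the hypothetical Layer base class sketched earlier, and the member names here are placeholders rather than the real FFN internals:

#include <vector>
#include <utility>
#include <armadillo>

class FFN
{
 public:
  void Add(Layer* layer) { network.push_back(layer); }

  void Forward(const arma::mat& input, arma::mat& output)
  {
    arma::mat current = input;
    for (Layer* layer : network)
    {
      arma::mat next;
      layer->Forward(current, next);  // One virtual call per layer per pass.
      current = std::move(next);
    }
    output = std::move(current);
  }

  // The backward and gradient passes would walk `network` in reverse in the
  // same way, replacing the backward and gradient visitor calls.

 private:
  std::vector<Layer*> network;
};

With many small layers and a batch size of 1, as in step (3), this loop maximizes the number of virtual calls relative to the arithmetic being done, which is exactly the overhead the experiment is trying to expose.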

This is just an experiment, so you can leave a lot of stuff commented out for the sake of the experiment. Actually refactoring the ANN codebase will be a much larger effort (and we could split it up so lots of people could work on it, too), and all we want to see here is whether or not it would be a good idea to embark on this refactoring.

Anyway, I have been meaning to do this for some time, but increasingly I'm realizing that I'm not going to find the time anytime soon, so @shrit suggested that I write this up as an issue today. So here it is! 👍

Let me know if I can clarify anything, and if you're going to give it a shot, get ready for lots of compiler errors. 😄

@kartikdutt18 (Member)

Hey @rcurtin, @shrit, thanks for opening this issue. Maybe we could have a couple of people working on the experiment: one person could make the changes in the Linear layer while another works on the FFN class at the same time. That would split the workload and make it easier to test.

@geekypathak21 (Member)

@kartikdutt18 I think it is better if only one person tries this, since we have to ignore many things here and it is more like a hack. I would like to do it, but I can't promise anything right now 😞 since I'm busy with other stuff.

@Aakash-kaushik (Contributor)

Hey @kartikdutt18, @himanshupathak21061998, is it okay if I work on this experiment in the way that Himanshu suggested?

@rcurtin (Member, Author) commented Oct 6, 2020

@Aakash-kaushik I think you should go for it---I do agree that it should be done kind of like a hack. All we need to verify is whether the approach has any runtime drawbacks, and then we can decide what to do. 👍

@Aakash-kaushik (Contributor)

@rcurtin I will start working on it. 🚀

@Aakash-kaushik (Contributor) commented Oct 8, 2020

I needed some help discussing how the base class should be implemented. I can't figure out how to implement the base class as an abstract class with virtual functions that take template parameters. One thing I tried, in order to make the base class abstract, was to move the function template parameters onto the class itself, but then I ran into some declaration problems.
The second thing is that once we make the base class members virtual and derive the Linear class from it, we can't really use templates for the overridden members there either.

Also, one addition: I made the virtual functions pure virtual, because I don't think we will need an empty Layer object anywhere.
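For reference, virtual member functions can't also be function templates in C++, which is why that combination fails to compile. The workaround of moving the template parameter onto the class itself might look roughly like this (only an illustrative sketch, not the actual code from the experiment):

#include <armadillo>

// Illustrative sketch: the template parameter is lifted from the member
// functions onto the class, so the functions themselves can be pure virtual.
template<typename MatType = arma::mat>
class Layer
{
 public:
  virtual ~Layer() { }

  virtual void Forward(const MatType& input, MatType& output) = 0;
  virtual void Backward(const MatType& input, const MatType& gy, MatType& g) = 0;
  virtual void Gradient(const MatType& input, const MatType& error, MatType& gradient) = 0;
};

The trade-off is that Layer<arma::mat> and Layer<arma::fmat> become unrelated base types, so a container like std::vector<Layer*> has to commit to a single MatType.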

@shrit (Member) commented Oct 8, 2020

@Aakash-kaushik Would it be possible to open a pull request with the modifications you have made? Even if it is far from complete, this will let us better visualize the work you have done, and we can provide better help and support 👍

@Aakash-kaushik (Contributor)

@shrit Yup, I will do that, thanks.

@mlpack-bot (bot) commented Nov 8, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

mlpack-bot added the "s: stale" label on Nov 8, 2020
@Aakash-kaushik (Contributor)

Please keep this open. I was working on this but got extremely busy; I will start working on it again.

mlpack-bot removed the "s: stale" label on Nov 8, 2020
@Aakash-kaushik (Contributor)

@zoq can you keep this open?

zoq added this to the mlpack 4.0.0 milestone on Dec 13, 2020
@zoq (Member) commented Dec 13, 2020

Ideally we can sort this out and make it part of mlpack 4.0. @Aakash-kaushik, do you think you could open a PR with the code you already have? Happy to help you solve the remaining issues.

@Aakash-kaushik (Contributor)

The thing is that right now I am going through my university exams, and to create a PR with the code I have, I would need to sort out a lot more bugs. Feel free to use the code from my repo; if we can wait until the start of January, I will be able to complete this experiment, and if we don't see a performance drop (or fail any of the other criteria we were considering), I can help with porting the other methods too.

@zoq (Member) commented Dec 16, 2020

@Aakash-kaushik no problem, I can pick up what you did and continue, I'll keep you updated. Best of luck with your exams.

@Aakash-kaushik (Contributor)

Thank you so much.

@Aakash-kaushik (Contributor)

Hey @zoq, surprisingly I have some gaps between my university exams, so I will keep updating that repo.

@rcurtin (Member, Author) commented May 29, 2022

Since #2777 is merged, this one is completed now. 👍

rcurtin closed this as completed on May 29, 2022