
Implementation of Alias layer #1091

Open
wants to merge 4 commits into master

4 participants
@ShangtongZhang
Member

ShangtongZhang commented Aug 11, 2017

I realized I need to modify all the visitors. So I refactored one visitor first; if you @rcurtin @zoq are both happy with this kind of change, I will apply it to all the visitors. I just want to make sure this change is exactly what we want, because once I have refactored all the visitors, any small change will mean a huge amount of rework.

@ShangtongZhang

Member

ShangtongZhang commented Aug 11, 2017

Some concerns from my side:

  • Should I use std::shared_ptr everywhere explicitly? Or could I have an alias like using Wrapper = std::shared_ptr and use Wrapper instead, in case we want to switch to another kind of smart pointer in the future?

  • In function parameters, should I pass by reference?
    Right now it's size_t operator()(std::shared_ptr<LayerType> layer); should I switch to size_t operator()(std::shared_ptr<LayerType>& layer)? A value parameter means an extra copy, although I don't think copying a shared_ptr adds much overhead. However, a non-const reference parameter cannot bind to a const object, and I'm not sure whether that case will come up. (A sketch of both points follows below.)
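
A minimal sketch of what those two options could look like together; Wrapper, ExampleLayer, and ExampleVisitor are placeholder names for illustration, not existing mlpack types:

#include <cstddef>
#include <iostream>
#include <memory>

// Alias template, so the smart-pointer type can be swapped in one place later.
template<typename T>
using Wrapper = std::shared_ptr<T>;

struct ExampleLayer { };

struct ExampleVisitor
{
  // Passing by const reference avoids the reference-count bump that a by-value
  // shared_ptr parameter would trigger, and it still binds to const objects.
  template<typename LayerType>
  std::size_t operator()(const Wrapper<LayerType>& layer) const
  {
    return layer ? 1 : 0;  // placeholder body
  }
};

int main()
{
  Wrapper<ExampleLayer> layer = std::make_shared<ExampleLayer>();
  std::cout << ExampleVisitor()(layer) << std::endl;  // prints 1
  return 0;
}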

@rcurtin

Member

rcurtin commented Aug 11, 2017

I thought about it a little bit; the original problem is that you want two layers in different FFNs to share weights, but when the networks are destroyed the shared layer gets deleted twice. I think a way to solve this could be to manually make each parameter matrix in one FFN an alias of the other FFN's parameters.

So, I think we could do this, if we are using only simple layer types (or one layer type):

Linear layer1(100, 50);
Linear layer2(1, 1); // Size doesn't matter for the constructor.
layer2.Parameters() = arma::mat(layer1.Parameters().memptr(),
    layer1.Parameters().n_rows,
    layer1.Parameters().n_cols,
    false /* don't copy memory */, true /* size cannot be changed */);

Then layer2 and layer1 will share parameters. For the Linear layer you might also need to do the same trick with the other matrices, like InputParameter(), OutputParameter(), Delta(), and Gradient() (not sure).
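
As a quick standalone check of that aliasing trick (plain Armadillo, no mlpack types):

#include <armadillo>
#include <iostream>

int main()
{
  arma::mat a(100, 50, arma::fill::randu);
  // Alias that shares a's memory: no copy, and the size cannot be changed.
  arma::mat b(a.memptr(), a.n_rows, a.n_cols, false, true);

  a(0, 0) = 42.0;
  std::cout << b(0, 0) << std::endl;  // prints 42: b sees a's update
  return 0;
}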

But that solution doesn't scale if you need to be able to share arbitrary layers across networks. In that case I would suggest the addition of a class:

// Holds another layer and shares weights with the given layer, but does not "own" it.
// (This documentation line is rough; the code below probably demonstrates the idea better.)
class AliasLayer
{
 public:
  AliasLayer(LayerTypes& layer) : layer(&layer) { }

  ~AliasLayer()
  {
    // Intentionally do nothing: we are not responsible for deleting the layer pointer.
    // I am not sure but maybe this explicit empty destructor isn't necessary.
    // (I am doing PR review on planes again so it is a little hard to check documentation.  It's amazing
    // just that I have internet at all somewhere out in the Pacific Ocean.  Looks like I am close to
    // somewhere over Tonga now?)
  }

  template<typename eT>
  void Forward(const arma::Mat<eT>&& input, arma::Mat<eT>&& output)
  {
    layer->Forward(std::move(input), std::move(output));
  }

  template<typename eT>
  void Backward(const arma::Mat<eT>&& input, arma::Mat<eT>&& gy, arma::Mat<eT>&& g)
  {
    layer->Backward(std::move(input), std::move(gy), std::move(g));
  }

  template<typename eT>
  void Gradient(const arma::Mat<eT>&& input, arma::Mat<eT>&& error, arma::Mat<eT>&& gradient)
  {
    layer->Gradient(std::move(input), std::move(error), std::move(gradient));
  }

 private:
  // Maybe this could be a reference, but it seems like then it would be hard to keep the object
  // from being destructed.
  LayerTypes* layer;
};

I think apply_visitor() is necessary in the implementations above, but you get the idea: the AliasLayer would just be a layer that wraps around an existing layer, allowing multiple networks to share weights. I'm a little hesitant to apply shared_ptr everywhere in the code: it would make it harder for users to add their own layer types to LayerTypes (since that would then also need to be a shared_ptr), and there is also slight overhead (probably negligible here) from the reference counting that shared_ptrs require.

What do you think, could either of the alternate approaches I proposed work?

@ShangtongZhang

Member

ShangtongZhang commented Aug 11, 2017

I prefer AliasLayer; it looks safer than the first approach. And by

I think apply_visitor() is necessary

do you mean we should add a template specialization of the default boost::apply_visitor to make it work with the existing visitors?

@rcurtin

Member

rcurtin commented Aug 11, 2017

Sounds good. About apply_visitor(), I just meant that since we have a LayerTypes variant, we can't call Forward(), Backward(), Gradient(), etc. on it directly; instead we have to use visitors to do that.
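
A minimal illustration of that point, with toy layer and visitor types rather than mlpack's own:

#include <boost/variant.hpp>
#include <iostream>

struct Linear { void Forward() { std::cout << "Linear::Forward" << std::endl; } };
struct ReLU   { void Forward() { std::cout << "ReLU::Forward" << std::endl; } };

using LayerTypes = boost::variant<Linear*, ReLU*>;

// The variant itself has no Forward(); a visitor dispatches to the held type.
struct CallForwardVisitor : public boost::static_visitor<void>
{
  template<typename LayerType>
  void operator()(LayerType* layer) const { layer->Forward(); }
};

int main()
{
  Linear linear;
  LayerTypes layer = &linear;
  // layer.Forward();  // would not compile: the variant has no Forward() member
  boost::apply_visitor(CallForwardVisitor(), layer);  // prints "Linear::Forward"
  return 0;
}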

@ShangtongZhang

Member

ShangtongZhang commented Aug 11, 2017

My concern is the template member-function checkers like HasParameters(). I think with a specialization of boost::apply_visitor we may be able to make it work with our new alias layer.
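
For reference, a member-function checker of the kind being discussed can be sketched with the classic detection idiom; this is an illustration, not mlpack's exact checker machinery:

#include <type_traits>

// HasAdd<T, Sig>::value is true only if T has a member Add matching signature Sig.
template<typename T, typename Sig>
struct HasAdd
{
  template<typename U, U> struct Check;
  template<typename U> static std::true_type Test(Check<Sig, &U::Add>*);
  template<typename U> static std::false_type Test(...);
  static const bool value = decltype(Test<T>(nullptr))::value;
};

struct Sequential { void Add(int) { } };
struct Linear { };

static_assert(HasAdd<Sequential, void (Sequential::*)(int)>::value, "Sequential has Add");
static_assert(!HasAdd<Linear, void (Linear::*)(int)>::value, "Linear has no Add");

int main() { return 0; }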

@ShangtongZhang

Member

ShangtongZhang commented Aug 13, 2017

I still can't figure out a way to make AliasLayer compatible with the member-function checkers.
I thought I could overload or specialize boost::apply_visitor, but I realized I can't do that, since its operand is a boost::variant.
So it looks like I need to update all the visitors.
Say we have this:

class Alias
{
public:
  Alias(LayerTypes& layer) : layer(&layer) {}
  LayerTypes& Layer() { return *layer; }
private:
  LayerTypes* layer;
};

And we have the add visitor:

template<typename T>
inline AddVisitor::AddVisitor(T newLayer) :
    newLayer(std::move(newLayer))
{
  /* Nothing to do here. */
}

template<typename LayerType>
inline void AddVisitor::operator()(LayerType* layer) const
{
  LayerAdd<LayerType>(layer);
}

template<typename T>
inline typename std::enable_if<
    HasAddCheck<T, void(T::*)(LayerTypes)>::value, void>::type
AddVisitor::LayerAdd(T* layer) const
{
  layer->Add(newLayer);
}

template<typename T>
inline typename std::enable_if<
    !HasAddCheck<T, void(T::*)(LayerTypes)>::value, void>::type
AddVisitor::LayerAdd(T* /* layer */) const
{
  /* Nothing to do here. */
}

Then what I want to do is overload operator() for the alias layer and pass the call through to its internal layer, i.e.

inline void AddVisitor::operator()(AliasLayer* layer) const
{
  LayerAdd<???>(layer->Layer());
}

??? should be the internal type of the alias layer, but since the alias layer only holds a boost::variant object, we can't get the true layer type.
Making Alias a template class seems to be the only solution, i.e.

template<typename TrueLayer>
class Alias;

But that means we need to add every possible alias to the definition of LayerTypes, i.e.

using LayerTypes = boost::variant<Linear, Alias<Linear>, ReLU, Alias<ReLU>, ... >

which makes adding a new layer type even more complicated for the user than the shared_ptr solution.

@zoq

Member

zoq commented Aug 13, 2017

I think the compiler should be able to deduce the type, so you should be able to use: LayerAdd(layer); instead of LayerAdd<LayerType>(layer);. Let me know if that works for you.

@ShangtongZhang

Member

ShangtongZhang commented Aug 13, 2017

I don't think that will work. In that case the compiler will deduce LayerType as boost::variant<...>. If that worked, then boost::apply_visitor wouldn't make any sense; we could call the visitor directly rather than through apply_visitor. I think what apply_visitor essentially does first is convert the variant to its true underlying type. But I think

inline void AddVisitor::operator()(AliasLayer* layer) const
{
  boost::apply_visitor(*this, layer->Layer());
}

should work. Another problem is that it doesn't scale well. For two arguments, we need three extra overloads to correctly delegate all possible combinations:

template <typename T>
void operator()(T*, AliasLayer*);

template <typename T>
void operator()(AliasLayer*, T*);

void operator()(AliasLayer*, AliasLayer*);

For three arguments we would need seven, since each argument can independently be an Alias or not, so n arguments need 2^n − 1 extra overloads.
If we don't have a three-argument case, I'm fine doing this. I want to add another base class for all visitors to do this delegation. What do you think about this?

@rcurtin

Member

rcurtin commented Aug 13, 2017

Hang on, I am testing something now (currently debugging it). If it works I'll send a gist and we can use that.

@rcurtin

Member

rcurtin commented Aug 13, 2017

Ok, here you go, this is a patch that gets across the idea:

https://gist.github.com/rcurtin/bdc867d28ed60e28b8174da9fe9faeb5

It's not quite done yet, but it compiles and you can run the test FeedForwardNetworkTest/AliasLayerTest (although it fails for now).

Basically, in each case, we are "forwarding" the function from AliasLayer to whatever it is holding by creating a visitor class on the spot and calling boost::apply_visitor(). There was some strange trickiness with forward declarations, etc., which means we can't actually hold a LayerTypes* in AliasLayer but instead have to hold a void*.

However I think we can't do a "perfect" alias, because the optimizer depends on the memory of the parameters being contiguous. So right now, the test fails, because the optimizer will not update the weights in the aliased layer. If that is ok for your situation, then I guess we can leave it as-is and note that the AliasLayer is read-only (I guess maybe we can call it a ReadOnlyAliasLayer), although I think it is better to go the extra mile and make sure training works too.

To make training work, I think that we will have to reflect the size of the aliased layer's parameters matrix, so that the network (FFN or RNN) does allocate space for the layer in the parameters matrix. However, each time the parameters are updated, we will have to propagate the changes to the aliased layer's weights. That will be a little bit slow, but I don't see any way to avoid that.

Let me know what you think or if I overlooked anything, but I think this should get us like 90% of the way there.

@ShangtongZhang

Member

ShangtongZhang commented Aug 14, 2017

I still prefer shared_ptr.

  • In general our Alias layer should be able to alias all kinds of layers, which means it must implement every member function the member-function checkers look for. E.g., we have HasRho and HasModelCheck, so we need to add Rho() and Model() to the Alias layer, which makes those checkers always return true; then we forward the call to the aliased layer with a new visitor. That is my understanding of how your Alias layer deals with the member-function checkers. So if a user wants to add a new layer type, they need to know they may also have to add new functions to the Alias class, besides adding the necessary new visitors, which I think is more inconvenient than the shared_ptr solution. With shared_ptr, they only need to write using LayerTypes = boost::variant<shared_ptr<NewLayerType>...>, which I don't think is very inconvenient.

  • Unfortunately A3C does need the alias layer to be transparent to the optimizer, i.e. the optimizer should be able to update weights through the alias layer. But as you mentioned, that will be a little slow, so it's not guaranteed to still be faster than shared_ptr.

  • This Alias layer looks full of tricks. I really don't like void*; it looks like C code from 10 years ago, or like we are writing an OS kernel :< I have a feeling (maybe it's wrong) that this Alias layer may bring us more trouble in the future than shared_ptr, and that it may constrain us from adding new things to our network framework some day.

@rcurtin

Member

rcurtin commented Aug 14, 2017

Yes, I think you are right that there are some drawbacks to the approach that I proposed (but I still had fun writing it). It is true that we would need to add all possible functions to the AliasLayer; however, I don't imagine that this is a huge problem, since adding any new function to any layer type also involves modifying ffn_impl.hpp or rnn_impl.hpp and then also creating a visitor for it.

In the past I have tried to be pretty resistant to allowing previously-unused features into the mlpack codebase, because it expands the set of C++ features that a developer must be familiar with in order to maintain the library. If we can keep the mlpack C++ subset small, then we increase the probability that someone is able to contribute without being scared of the code. (A lot of people who have contributed to mlpack that I have talked to have expressed fear over parts of the codebase that they don't know much about and that look complex---the tree code is a great example.) That's my first-level resistance to something like shared_ptr.

The second thing, which is more relevant here, is that you have to be really careful with shared_ptr to preserve performance. If you pass shared pointers by value---which this PR does right now---then the reference counter comes into play, so really you must always pass const std::shared_ptr<T>& to avoid that. That can be a big performance penalty. There is some more discussion here (although this author clearly does not like shared pointers at all and doesn't mention any of their benefits, of which there certainly are some):

http://seanmiddleditch.com/dangers-of-stdshared_ptr/

It's easier to catch bugs that cause the program to fail entirely (like an invalid use of a void* that causes a segfault) than subtle bugs that cause performance penalties (like std::shared_ptr<T> vs const std::shared_ptr<T>& in a function call). The subtle bugs would be likely to make it past a code review and only get found way later. A segfault bug would be caught by Jenkins pretty quickly (and that's why we have unit tests anyway; but we can't unit test speed).
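
A tiny standalone demonstration of that difference (nothing mlpack-specific):

#include <iostream>
#include <memory>

// By value: the copy bumps the atomic reference count for the duration of the call.
void byValue(std::shared_ptr<int> p) { std::cout << p.use_count() << std::endl; }  // prints 2

// By const reference: no copy, no reference-count traffic, and const objects bind fine.
void byConstRef(const std::shared_ptr<int>& p) { std::cout << p.use_count() << std::endl; }  // prints 1

int main()
{
  auto p = std::make_shared<int>(5);
  byValue(p);
  byConstRef(p);
  return 0;
}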

The bigger issue though, which we encounter regardless of what we do, is this:

Unfortunately A3C does need the alias layer to be transparent to the optimizer, i.e. the optimizer should be able to update weights through the alias layer. But as you mentioned, that will be a little slow, so it's not guaranteed to still be faster than shared_ptr.

The mlpack optimizers expect the parameters for all layers to be in a contiguous matrix. So the ANN code, when the network is reset with ResetParameters(), calls NetworkInit::Initialize() which causes each layer to store a contiguous subset of the memory allocated for the entire network.

This is a really nice design because it means the whole network is a contiguous block of memory; however, a drawback is that the design simply cannot support memory that is shared between multiple networks.

Therefore, even if we did use shared_ptr, we would still encounter the same problem that the optimization would not properly be shared between multiple networks, and we would need some way to make sure that a layer's weights are properly copied or updated between networks.

The last thing I want to point out about shared_ptr is that mlpack's ownership model is pretty clear right now: you either give a reference to a model that it shouldn't modify, or you tell the model it should take ownership with std::move(). By using shared_ptr, suddenly we have a situation where it is different:

Linear<> layer(...);
// Normally I would expect this to copy the layer, but with shared_ptr it would make an alias!
// So modifications to 'network' will cause modifications to 'layer', which is not what I would expect
// and isn't how the rest of mlpack behaves.
network.Add(layer);

This Alias layer looks full of tricks. I really don't like void*; it looks like C code from 10 years ago, or like we are writing an OS kernel :<

Unfortunately it's what's required here with the way I implemented it because I can't forward-declare a using declaration, but I agree, I don't really like void* either. If you don't like that then don't look at the bindings code; I had to do a lot of strange things there to make it all work. And unfortunately there was no other choice than void* in that case. :)

I thought about other ways that we could accomplish an alias, but even if we avoid shared_ptr and instead simply have FFN hold a std::vector<bool> of which layers it owns and which it doesn't, we still have the problem of contiguous memory for optimizers. I think that contiguous memory problem can only be solved with an intermediate type like AliasLayer.

Let me know if I misunderstood anything. I know that my response is very long, but I think it's important to discuss in complete detail to get to the right solution. :)

@ShangtongZhang

Member

ShangtongZhang commented Aug 15, 2017

Thanks for the detailed clarification! I had overlooked the contiguous memory requirement. You are right that the Alias layer is more helpful than shared_ptr here. I will work from your patch first to make the Alias layer complete.

@rcurtin

Member

rcurtin commented Aug 15, 2017

Sounds good. Sorry that the solutions available to us are not perfect. :( If you think of a way to improve on the design of AliasLayer (or, if you like, you can name it Alias; I think maybe that is better) that mitigates some of the issues you pointed out, please feel free.

I thought about how we could refactor to optimize non-contiguous matrices, but that would be a massive amount of work. It almost seems easier to create a new Armadillo type that is a matrix that's non-contiguous in memory (which... honestly... would be pretty horrible work...).

@ShangtongZhang changed the title from "Use shared_ptr for all the layers" to "Implementation of Alias layer" on Aug 16, 2017

@zoq

Member

zoq commented Aug 18, 2017

Looks good so far, do you need any help fixing the merge conflict or something else?

@ShangtongZhang

Member

ShangtongZhang commented Aug 18, 2017

Not really. I'm looking into a very strange bug (a bad memory access). I spent two hours on it yesterday but didn't make any progress. If I still can't figure it out today I will ask for help.

@zoq

Member

zoq commented Aug 18, 2017

Okay, sounds good, here to help.

ShangtongZhang added some commits Aug 19, 2017

@ShangtongZhang

Member

ShangtongZhang commented Aug 19, 2017

Could anyone help me with this bad memory access? Originally it happens at line 51 of linear_impl.hpp, when I try to access weights. So I put in two checkpoints (line 40 in alias.hpp and line 84 in ffn_impl.hpp). It passes the first checkpoint in the constructor of Alias, but fails at the second with a bad memory access, and I really don't understand what happens between them. By checkpoint I mean boost::apply_visitor(ParametersVisitor(std::move(tmp)), network.front());. BTW, I'm considering postponing this PR and completing my A3C implementation with non-shared actor and critic first, then coming back to this PR.

@zoq

Member

zoq commented Aug 20, 2017

I'll take a look at the issue and let you know what I find out; also, continuing with the A3C implementation first sounds like a good idea.

@ShangtongZhang

Member

ShangtongZhang commented Aug 22, 2017

Hi @zoq, I created a summary for my project here. It would be much appreciated if you could give some feedback on it. Also, A3C with non-shared networks is almost ready; the only problem is that it's not working :< I have a feeling I made some mistake in the backprop of the KL loss (entropy): say p = softmax(x), I need to compute d(p log p)/dx for the KL loss. Once the async-SARSA PR is merged I can open the A3C PR; could you help me double-check it then? It's in layers/policy_impl.hpp.
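
For reference, working that gradient out by hand (standard softmax algebra, not taken from the PR code): with $p = \mathrm{softmax}(x)$ and $S = \sum_i p_i \log p_i$,

$$\frac{\partial p_i}{\partial x_j} = p_i(\delta_{ij} - p_j)
\quad\Longrightarrow\quad
\frac{\partial S}{\partial x_j} = \sum_i (\log p_i + 1)\, p_i(\delta_{ij} - p_j)
= p_j\Big(\log p_j - \sum_i p_i \log p_i\Big) = p_j(\log p_j - S).$$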

@zoq

Member

zoq commented Aug 22, 2017

Sure, I'll take a look at each and let you know what I find out. Also, can you fix the header of the blog post (each tag data on a separate line).

@ShangtongZhang

Member

ShangtongZhang commented Aug 22, 2017

Sure.

@ShangtongZhang

Member

ShangtongZhang commented Aug 22, 2017

Fixed; not sure if my understanding of "each tag data on a separate line" is correct :> Looks like it works.

@zoq

Member

zoq commented Aug 22, 2017

Thanks, now I can see the post on: http://www.mlpack.org/gsocblog/index.html

@zoq

Member

zoq commented Aug 23, 2017

Okay, I looked over the summary; it's really well written and includes everything I'd like to see. Just made a minor comment here: mlpack/blog@f7eb8ef#diff-5e91a91da0ad83e4a82023bbb53280e5R18

@rcurtin

Member

rcurtin commented Aug 24, 2017

I took a look into this, and I'm not done looking yet, but I noticed a clue. The test creates a linear layer (named layer) and then wraps an alias layer around it:

Linear<> layer(10, 10);
network.Add<Alias>(layer);

So I took a look and found that the memory location of layer and the alias's internal layer are different:

// This is from the stack frame where the test is being run.
(gdb) inspect &layer
$10 = (mlpack::ann::Linear<arma::Mat<double>, arma::Mat<double> > *) 0x7fffffffbc00
// This is from the stack frame where ParametersVisitor::LayerParameters(T* layer, P& /* output */) is called.
(gdb) inspect layer
$11 = (mlpack::ann::Linear<arma::Mat<double>, arma::Mat<double> > *) 0x7fffffffcdc0

So when the boost::variant is being created for the Alias layer, is a copy of the Linear layer happening? If so, maybe this is the cause of the issue.

@ShangtongZhang

Member

ShangtongZhang commented Aug 26, 2017

Thanks for your clue, I think I found it.
See the following code:

#include "iostream"
#include "vector"
#include "boost/variant.hpp"

template<typename T = double>
class Linear {
public:
	Linear() {
		std::cout << "Linear::CTOR" << std::endl;
	}
	~Linear() {
		std::cout << "Linear::DTOR" << std::endl;
	}
};

class Alias;

using LayerTypes = boost::variant<Linear<double>*, Alias*>;

class Alias {
public:
	template<typename LayerType>
  	Alias(LayerType& layer) : layer(&layer) {}
	LayerTypes layer;
};

class FFN {
public:
	template <class LayerType, class... Args>
	void Add(Args... args) { network.push_back(new LayerType(args...)); }

	template <class LayerType, class... Args>
	void Add2(Args&... args) { network.push_back(new LayerType(args...)); }

	template <class LayerType>
	void AddAlias(LayerType& layer) { network.push_back(new Alias(layer)); }
	std::vector<LayerTypes> network;
};

int main() {
	Linear<> l;
	FFN model;
	// model.Add<Alias>(l);
	// model.AddAlias(l);
	model.Add2<Alias>(l);
	return 0;
}

model.Add<Alias>(l) will print

Linear::CTOR
Linear::DTOR
Linear::DTOR

while model.AddAlias(l) and model.Add2<Alias>(l) work well:

Linear::CTOR
Linear::DTOR

A copy happens when the parameter is passed to Add.
I have two solutions:

  • Change Add to Add2.
  • Make the constructor of Alias accept only a pointer, so we would need to write model.Add<Alias>(&l).

I prefer the first one. What do you think about this?

@rcurtin

Member

rcurtin commented Aug 28, 2017

What about making Add() take a universal reference?

template<class LayerType, class... Args>
void Add(Args&&... args) { network.push_back(new LayerType(std::forward<Args>(args)...)); }

That should allow Add() to pass the reference to the layer without creating a copy.
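
A standalone illustration of why the perfect-forwarding version avoids the copy; Probe and Holder are made-up stand-ins for the layer and the alias:

#include <iostream>
#include <utility>
#include <vector>

struct Probe
{
  Probe() = default;
  Probe(const Probe&) { std::cout << "copy!" << std::endl; }  // fires if Add copies its argument
};

struct Holder
{
  Holder(Probe& probe) : probe(&probe) { }  // stores a pointer, like Alias does
  Probe* probe;
};

std::vector<Holder*> network;

// Universal reference + std::forward: the lvalue Probe reaches Holder's
// constructor as a reference, so no copy is made along the way.
template<class LayerType, class... Args>
void Add(Args&&... args) { network.push_back(new LayerType(std::forward<Args>(args)...)); }

int main()
{
  Probe probe;
  Add<Holder>(probe);  // prints nothing: no copy
  for (Holder* h : network)
    delete h;
  return 0;
}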

@ShangtongZhang

Member

ShangtongZhang commented Aug 28, 2017

It looks good, I'll do that.

@ShangtongZhang

Member

ShangtongZhang commented Aug 29, 2017

  void Update(arma::mat& iterate,
              const double stepSize,
              const arma::mat& gradient)
  {
    meanSquaredGradient *= alpha;
    meanSquaredGradient += (1 - alpha) * (gradient % gradient);
    iterate -= stepSize * gradient / (arma::sqrt(meanSquaredGradient) +
        epsilon);
  }

This is typically how an optimizer applies gradients. iterate holds all the parameters of the model in one contiguous matrix, so the fake weights of the Alias layer are included, but the true weights of the aliased layer are not. So we need to propagate the fake weights to the true weights, but I cannot find a proper time to do this. Any good ideas, or did I miss something?
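
A toy version of that situation (plain Armadillo; the names and sizes are made up) showing why an explicit sync step is needed somewhere:

#include <armadillo>
#include <iostream>

int main()
{
  arma::mat trueWeights(2, 2, arma::fill::ones);  // owned by the aliased layer
  arma::mat iterate(4, 1, arma::fill::ones);      // contiguous parameters seen by the optimizer,
                                                  // including the alias's "fake" copy

  // One optimizer step touches only iterate.
  arma::mat gradient(4, 1, arma::fill::ones);
  iterate -= 0.1 * gradient;

  std::cout << "before sync: " << trueWeights(0, 0) << std::endl;  // 1.0 (stale)

  // The missing sync step in question, here done by hand:
  trueWeights = arma::reshape(iterate, 2, 2);
  std::cout << "after sync:  " << trueWeights(0, 0) << std::endl;  // 0.9
  return 0;
}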

@zoq

Member

zoq commented Aug 29, 2017

Right now, the only solution I can think of is to propagate the weights before forwarding the Forward() call.

@ShangtongZhang

Member

ShangtongZhang commented Aug 29, 2017

But sometimes we may run only one pass and want to get the weights immediately. I doubt whether there is a better mechanism.

@rcurtin

Member

rcurtin commented Aug 29, 2017

Maybe you could add a boolean parameter to the Alias layer to control how often it syncs. If the parameter is true, then the sync is done every time after Forward() is called. If false, then the sync is only done by a special visitor at the end of a call to Train(). What do you think?

@ShangtongZhang

Member

ShangtongZhang commented Aug 31, 2017

It looks like syncing after Forward is my only choice, although it's not perfect. But I don't want to use a boolean flag to control the sync frequency, because the aim of the alias layer is to be transparent; if the user wants to sync two layers at some particular frequency, they should do it themselves. For the Alias layer, I just want to sync every pass. BTW, as the new term starts I'm getting a little busy, so I may not be able to reply as quickly as during the summer, but I'll keep working on my two open PRs until they are merged.

@rcurtin

Member

rcurtin commented Sep 1, 2017

Ok, syncing every pass sounds fine to me. I guess there is no good alternative here. Don't worry about being busy, there is no huge hurry. :) At the same time, feel free to ask for help if there is not time to get something done. I am still working with lots of GSoC PRs and other backlogged things, but I can probably help with this when those things are done.

@zoq

Member

zoq commented Sep 1, 2017

Sounds fine to me too; as Ryan already said, let us know if you need help or anything else.

@mirraaj

Contributor

mirraaj commented Apr 12, 2018

Dear @ShangtongZhang, @rcurtin, @zoq,

I have been looking into this lately for the implementation of A3C in mlpack. I understand that using std::shared_ptr might introduce ownership issues for the object. However, I see it as a convenient method, since it makes the parameter update easy, and I do not see where it would fail.

I also like the idea that implementing an Alias layer might be a good way to proceed with #1091 (comment). However, I do not see it implemented in mlpack, and I do not understand why it would fail. It would be really helpful if @ShangtongZhang could provide some input.

@ShangtongZhang, @rcurtin, @zoq, if you have any suggestions please let me know. I am trying to solve this issue in the meantime.

@zoq, I would like to know the reason behind referencing #1187 here. Thanks.

@mirraaj

Contributor

mirraaj commented Apr 12, 2018

@ShangtongZhang, I would like to know the meaning of "fake weights" in #1091 (comment). Thanks.

@ShangtongZhang

Member

ShangtongZhang commented Apr 12, 2018

@luffy1996 If we have an alias layer for some other layer, intuitively they should share the same weights in the same memory, and they can be used in two separate models. However, for some reason (e.g., the memory of a model's layers has to be contiguous, though I'm not 100% sure now, as it has been about a year since this PR), we need to store a weight matrix in the alias layer; this weight has the same shape as the weight in the aliased layer and is what I called the fake weight. We really do not want this, but technically we have to add it. So the problem is how to sync the fake weight with the real weight.

@mirraaj

Contributor

mirraaj commented Apr 12, 2018

@ShangtongZhang, I would also like to know where exactly your implementation of AliasLayer failed. I understand that it might be hard to recall, as a year has passed, but even a small lead would be very helpful. Thanks.

@ShangtongZhang

Member

ShangtongZhang commented Apr 12, 2018

@luffy1996 I guess I hadn't completed the aforementioned sync part.
Regarding the implementation of A3C, I had a PR for A3C before; unfortunately there is some bug I cannot figure out (I believe something is wrong in the parallelization). So if you want to implement A3C, I think it's better to look into that PR (#934) rather than this one; after all, shared layers are not necessary for now (toy tasks). Furthermore, I don't recommend working on A3C. If you read recent deep RL papers, I think there is a clear trend that A3C is being replaced by A2C. So I think it is better to implement A2C, which also has fewer parallelization issues than A3C.

@mirraaj

Contributor

mirraaj commented Apr 12, 2018

Hi @ShangtongZhang, I am grateful for your input. I am planning to go with A2C and other actor-critic methods in the GSoC proposal I have submitted. I had the sense that A2C has fewer issues than A3C; I was just looking into your code to get an idea of how to proceed. Thanks.
