
Implement a framework of DQN and asynchronous learning methods #934

Closed
wants to merge 5 commits into base: master

Conversation

4 participants
ShangtongZhang (Member) commented Mar 13, 2017

DQN with experience replay and a target network.
Tested on the Mountain Car task.

ShangtongZhang (Member) commented Mar 14, 2017

I use the latest version of Armadillo, but it seems some of the newer functions don't exist in older versions (based on the Jenkins error log). Which version of Armadillo is recommended?
Thanks,

rcurtin (Member) commented Mar 15, 2017

Hi Shangtong,

The minimum required version of Armadillo for mlpack is currently Armadillo 4.200.0. I think that is the version that Travis builds with. You can take a look at this page to get an idea of what has changed over different versions: http://arma.sourceforge.net/docs.html#api_additions

ShangtongZhang (Member) commented Mar 15, 2017

Hi Ryan,
Thanks for your response. I think I should then install Armadillo 4.2 and try to get it to compile in Travis CI.

mlpack-jenkins commented Mar 26, 2017

Can one of the admins verify this patch?

zoq (Member) commented Mar 26, 2017

Hello @ShangtongZhang, I just noticed that you implemented the CartPole task; do you think we could use the code Bang wrote last year? https://github.com/BangLiu/mlpack/blob/ne/src/mlpack/methods/ne/tasks.hpp If you think that's a good idea, I can merge it in so that we can test it. Let me know what you think.

ShangtongZhang (Member) commented Mar 26, 2017

Hi @zoq, maybe not. Although the dynamics are the same, the interfaces are quite different. And his implementation seems to be coupled with NE. Also, I think the code style and readability of that implementation are poor...

ShangtongZhang (Member) commented Mar 26, 2017

Hi @zoq, would you like to apply this commit first? Currently, training an FFN with SGD leads to a compile error because SGD has two template arguments.

zoq (Member) commented Mar 26, 2017

Yeah, I was going to fix this; can you open a separate PR for the change, or should I cherry-pick the commit?

ShangtongZhang (Member) commented Mar 26, 2017

I think cherry-pick is OK.

ShangtongZhang (Member) commented Mar 26, 2017

Oh... my fix works for gcc and clang but fails with MSVC, so I just pushed a new fix.
https://godbolt.org/g/cqIuhD
http://webcompiler.cloudapp.net/ (you need to copy the code manually)

ShangtongZhang (Member) commented Mar 26, 2017

(Double) DQN works well in my local environment with MountainCar and CartPole. I don't know why VanillaNetworkTest always fails in Travis CI -- it also succeeds in my local environment.

zoq (Member) commented Mar 27, 2017

Maybe Travis found an efficient method to intercept the test process, just to run each test with a bad seed selected from a project-dependent pool of bad seeds. I guess in this case they'll have to update the pool, since I changed the code: ae93d74

@ShangtongZhang ShangtongZhang changed the title from Implement a framework of Q-Learning to Implement a framework of DQN and asynchronous learning methods Apr 2, 2017

rcurtin (Member) commented

I did a quick review, not too in-depth. It seems like there are not very many comments throughout the code; could you add comments describing what each file, class, method, and part of each method does? Many people who read the code may not be familiar with the techniques being implemented, so thorough comments help them follow it with only a basic knowledge of machine learning. I can do a more in-depth review after the comments are added, but in the end @zoq will have the best insight about this PR.

rcurtin (Member) commented Apr 22, 2017

Sure, no hurry! Good luck on your finals. :)

ShangtongZhang (Member) commented Apr 24, 2017

I just added more detailed comments to all the code; looking forward to any feedback.

zoq (Member) commented Apr 24, 2017

Hello @ShangtongZhang, do you mind splitting this PR into multiple parts? One PR for the environments and another for the DQN? I think it would be easier to catch potential issues, and it could be beneficial for others if, e.g., they could work with the environments. Let me know what you think and then we can go from there.

ShangtongZhang (Member) commented Apr 24, 2017

Hi @zoq, I think it's a good idea. I can split this PR into Environment, DQN, and A3C. As for the environment part, do you think I should still put it in the reinforcement learning directory, or create a new one? I prefer creating a new directory directly under methods.

zoq (Member) commented Apr 24, 2017

I don't see the environments as an independent algorithm/method, more as a tool, so I would create the directory under the reinforcement learning directory, at least for now.

ShangtongZhang (Member) commented Apr 24, 2017

Yeah, it looks weird to treat it as a method. But in the future, neuroevolution methods will also need this kind of environment. So maybe it's better to have an independent directory.

zoq (Member) commented Apr 24, 2017

You are absolutely right, but I think there is no problem including the files from the rl directory; in fact, we already do this, e.g., we use different SVD methods inside the PCA class and different kernels inside kernel PCA, etc.

ShangtongZhang (Member) commented May 9, 2017

This PR will now only serve as a record of my progress. I will break it down into small incremental PRs to make it easy to review. BTW, does anyone know how to make Log::Info work in test cases? My current CMake config is -D DEBUG=ON -D PROFILE=OFF -D FORCE_CXX11=ON -D TEST_VERBOSE=ON and my environment is CXX=/usr/local/bin/clang++-omp;CC=/usr/local/bin/clang-omp;CXXFLAGS=-std=c++11. But it doesn't output anything to the console. Did I miss something?

zoq (Member) commented May 9, 2017

Thanks for breaking it down into smaller parts. Regarding Log::Info, you can set mlpack::Log::Info.ignoreInput = false; in the test case, or Log::Debug should work if you build with DEBUG=ON.
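
For illustration, a minimal sketch of the ignoreInput suggestion inside a test case, assuming the Boost.Test setup mlpack's test suite uses (the suite and test names here are hypothetical):

#include <mlpack/core.hpp>
#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_SUITE(LoggingExampleTest);

BOOST_AUTO_TEST_CASE(InfoOutputTest)
{
  // Log::Info is silenced unless the test binary runs with --verbose;
  // flipping ignoreInput makes it print to the console unconditionally.
  mlpack::Log::Info.ignoreInput = false;
  mlpack::Log::Info << "This line now appears in the test output." << std::endl;
}

BOOST_AUTO_TEST_SUITE_END();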

ShangtongZhang (Member) commented May 17, 2017

Hi @zoq, would you like to look into this? Currently, if you add std::cout << gaussianModel.Parameters(); in init_rules_test.cpp before line 284, you will find that gaussianModel isn't initialized properly -- only the last 12 parameters are initialized; all the others are zero. And it happens only for Gaussian init (i.e., random init and oivs init work well). I did a tentative fix in the above commit. It works, but I don't know why it works. I don't understand how an assignment operator could affect the memory -- I assume an assignment operator should only copy values.

zoq (Member) commented May 17, 2017

Nice catch, that's a really strange issue: not only does it look like it's allocating new memory, on top of that it works for the last layer. I have to take a deeper look into the issue. But for now, would you like to open a PR for the fix, or should I fix it?

mlpack-jenkins commented May 24, 2017

Can one of the admins verify this patch?

ShangtongZhang (Member) commented Jul 10, 2017

I'm back ^>^
This week I'll work on implementing async one-step Q-learning.
I have two questions now:

  • Can PR #1034 be merged? My new work will depend on that PR; if it's not merged, I cannot make a new PR.

  • Can I use inheritance in my implementation (without virtual functions)?

zoq (Member) commented Jul 10, 2017

Welcome back, how was it?

ShangtongZhang (Member) commented Jul 10, 2017

@zoq Thanks for merging that PR, and the summer school was pretty good!
I have several classes that share the same member variables and the same constructor; the only difference is a member function called Episode. Maybe I can use tag dispatch:

class WorkerTypeA {};
class WorkerTypeB {};

template <typename WorkerType, typename T1, typename T2>
class Worker
{
 public:
  int data;
  Worker(int i) { data = i; }

  // Dispatch at compile time to the overload matching WorkerType.
  void foo() { foo(WorkerType()); }

  void foo(WorkerTypeA tag) {}
  void foo(WorkerTypeB tag) {}
};

int main()
{
  Worker<WorkerTypeA, int, int> worker(1);
  worker.foo();
  return 0;
}
zoq (Member) commented Jul 10, 2017

I guess depending on the use case, yes, we could use the tag dispatch design pattern, or maybe we could use the policy design pattern: http://www.drdobbs.com/policy-based-design-in-the-real-world/184401861

That way, we could add more classes without modifying the main code. What do you think?

zoq (Member) commented Jul 11, 2017

I thought of something like:

class OneStepQLearningWorker
{
  static double Episode(...)
  {
  }
};

template<typename WorkerType>
class AsyncLearning
{
  void Episode(...)
  {
    WorkerType::Episode(...);
  }
};

In this case you would have to implement the logic inside the worker, or at least part of it.
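
To make the dispatch concrete, here is a self-contained toy version of that static-policy idea (the worker names and the int parameter are illustrative only, not mlpack API):

#include <iostream>

// Each worker policy supplies a static Episode(); no virtual calls involved.
struct OneStepQLearningWorker
{
  static double Episode(const int steps) { return steps * 1.0; }
};

struct NStepQLearningWorker
{
  static double Episode(const int steps) { return steps * 0.5; }
};

template<typename WorkerType>
class AsyncLearning
{
 public:
  // Forwards to whichever worker policy the class was instantiated with.
  double Episode(const int steps) { return WorkerType::Episode(steps); }
};

int main()
{
  AsyncLearning<OneStepQLearningWorker> oneStep;
  std::cout << oneStep.Episode(10) << std::endl;  // Prints 10.
  return 0;
}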

ShangtongZhang (Member) commented Jul 11, 2017

The problem is that WorkerType::Episode needs access to many member variables of AsyncLearning. If I pass them as function parameters, the parameter list will be very long (about 10 or more). If I instead pass those parameters to the constructor of the worker rather than the constructor of AsyncLearning, then the workers will share a common constructor, which will also be very long.

zoq (Member) commented Jul 11, 2017

I see. I guess another solution would be to write a utility class, e.g. AsyncLearningInfo, that holds the parameters; then you could pass an instantiation of AsyncLearningInfo. I was also thinking about template specialization, but since the class also takes a bunch of other template parameters, that's not really an option. What do you think?

ShangtongZhang (Member) commented Jul 11, 2017

A config class looks good; I think I can do that.
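
A rough sketch of what such a config class could look like; AsyncLearningConfig and its fields are hypothetical names for illustration, not the final API:

#include <cstddef>

// Bundles the hyperparameters that would otherwise bloat every worker's
// parameter list or constructor into a single object.
struct AsyncLearningConfig
{
  size_t numWorkers = 16;           // Number of asynchronous workers.
  size_t updateInterval = 5;        // Steps between gradient updates.
  size_t targetSyncInterval = 100;  // Steps between target-network syncs.
  double discount = 0.99;           // Reward discount factor.
  size_t stepLimit = 200;           // Maximum steps per episode.
};

// Each worker then needs only one constructor argument, e.g.:
//   OneStepQLearningWorker(const AsyncLearningConfig& config);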

ShangtongZhang (Member) commented Jul 12, 2017

I think the compiler error (lack of arma::clamp) can be solved by #1050. BTW, how do you keep your IRC always online? Is there a free solution?

zoq (Member) commented Jul 12, 2017

Ah, right, arma::clamp was introduced in 4.400.

I'm not aware of a free solution; I use irssi (https://irssi.org/) + tmux on a remote machine, which is always online.
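
For reference, a minimal use of arma::clamp (standard Armadillo API from 4.400 on):

#include <armadillo>

int main()
{
  arma::vec v("-2.0 0.5 3.0");

  // Element-wise clamp of each value to the range [-1, 1];
  // this call requires Armadillo >= 4.400.
  arma::vec clamped = arma::clamp(v, -1.0, 1.0);

  clamped.print("clamped:");  // -1.0  0.5  1.0
  return 0;
}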

ShangtongZhang (Member) commented Jul 16, 2017

I ran the test with RandomSeed(std::time(NULL)); on my Mac 20 times, and it passed every time. The problem is that the algorithm is multi-threaded (18 threads in total) and isn't synchronized, so the performance of the physical machine heavily influences the test result. I guess that's why it times out in Travis CI. Does anyone have any idea how to deal with this?

zoq (Member) commented Jul 16, 2017

We should add the test to the list of long-running parallel tests in https://github.com/mlpack/mlpack/blob/master/src/mlpack/tests/CMakeLists.txt

That should speed things up a little bit, but I'm not sure it's enough; can we decrease the number of workers?

ShangtongZhang (Member) commented Jul 16, 2017

I can give it a try and see what the minimum number of workers is.

zoq (Member) commented Jul 17, 2017

Sounds good, let me know if you need any help.

ShangtongZhang (Member) commented Jul 18, 2017

I think once #1056 is merged, this PR will be ready for review -- it's difficult to break this PR down any further. The failed AppVeyor build is due to a failure downloading Boost 1.60.

zoq (Member) commented Jul 18, 2017

About the AppVeyor build: https://appveyor.statuspage.io/incidents/m2vdvw39kdk8, I'll restart the build once the issue is solved.

zoq (Member) commented Jul 18, 2017

Also, thanks for tuning the AsyncLearningTest test. I'm not sure it's feasible to run the test on a regular basis; it looks like it takes about 10 minutes, which is a bummer, since I can't see a good way to test the code without it. And if it takes about 10 minutes in release mode, it probably takes a lot more time in debug mode.

ShangtongZhang (Member) commented Jul 18, 2017

Oops, I didn't realize it takes so much time on the server. On my Mac, it only takes 2 seconds in debug mode, so it may be hard to tune it further. And I believe in the future we will have at least 3 more tests like this; it would be absolutely unacceptable for them to take 40 minutes.

ShangtongZhang (Member) commented Aug 10, 2017

I've hit a problem with sharing layers between two FFN objects. Say we have two FFN instances, actor and critic, and I want to share some layers, so I have the following code:

    for (auto& layer : sharedLayers)
    {
      actor.Add(layer);
      critic.Add(layer);
    }
    actor.Add<Linear<>>(featureSize, actionSize);
    actor.Add<LogSoftMax<>>();
    critic.Add<Linear<>>(featureSize, 1);

The problem happens when actor and critic are destructed:

    FFN<OutputLayerType, InitializationRuleType>::~FFN()
    {
      std::for_each(network.begin(), network.end(),
          boost::apply_visitor(deleteVisitor));
    }

The shared layers will be deleted twice.
The solution I can come up with is to use shared_ptr everywhere, i.e.

    using LayerTypes = boost::variant<std::shared_ptr<Add<arma::mat, arma::mat>>, ... >

With this approach we can also remove the destructor of FFN.
But I assume it will be a huge change. Should I do this? Or did I just miss a better solution?
Looking forward to any feedback @rcurtin @zoq
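
The hazard is easy to reproduce outside mlpack; a minimal sketch of why shared_ptr fixes it (Layer here is a stand-in type, not the mlpack class):

#include <memory>
#include <vector>

struct Layer { /* weights, biases, etc. */ };

int main()
{
  // With raw pointers, actor and critic would each believe they own the
  // shared layer, and each destructor would delete it: a double free.

  // With shared_ptr, ownership is reference-counted: the layer is freed
  // exactly once, when the second of the two networks releases it.
  std::shared_ptr<Layer> shared = std::make_shared<Layer>();
  std::vector<std::shared_ptr<Layer>> actor = { shared };
  std::vector<std::shared_ptr<Layer>> critic = { shared };

  return 0;  // Both vectors go out of scope; Layer is deleted once.
}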

zoq (Member) commented Aug 10, 2017

I see; the only solution I can think of right now is to use shared_ptr, as you suggested.

ShangtongZhang (Member) commented Aug 10, 2017

OK, I think I can make a separate PR to apply shared_ptr.

zoq (Member) commented Aug 10, 2017

Great, thanks!

ShangtongZhang added some commits Aug 22, 2017

A3C
zoq (Member) commented Mar 3, 2018

"This PR will now only serve as a record of my progress. I will break it down into small incremental PRs to make it easy to review."

@ShangtongZhang should we close this for now to avoid any confusion?

ShangtongZhang (Member) commented Mar 3, 2018

Sure, I think so.

@sshkhr referenced this pull request Jun 14, 2018: policy gradients #1432 (Open, 0 of 4 tasks complete)