Add random replay for DQN #1001

Merged: 6 commits merged into mlpack:master on May 25, 2017

Conversation

3 participants
@ShangtongZhang (Member) commented May 18, 2017

Random replay with a simple unit test

@zoq

Thanks for the great PR; I just pointed out some minor issues. Also, thanks for breaking the code into smaller parts: it makes the review much easier.

@@ -0,0 +1,142 @@
/**
* @file random_replay.hpp

@zoq (Member) commented May 19, 2017

I would put this file in the rl folder rather than in a separate folder. Unless you plan to implement other replay techniques in the near future, I think it is cleaner to put the files into the same root folder; e.g., the header inclusion is probably simpler. Let me know what you think.

@ShangtongZhang (Member) commented May 19, 2017

Yeah, there is also prioritized replay, but I'm not going to implement it in the near future. Still, I think a separate folder is better: once we add a new replay method, the separate folder provides backward compatibility automatically. Otherwise, users may have to modify their #include directives.

@zoq (Member) commented May 20, 2017

I guess if we used the root folder, we would also put other replay methods in the root folder. But if you prefer the extra folder, that's fine.
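For reference, the two layouts under discussion imply slightly different include paths. The replay/ path below matches this PR's file location; the root-folder alternative is the hypothetical layout being discussed:

```cpp
// Layout chosen in this PR: a dedicated replay/ subfolder.
#include <mlpack/methods/reinforcement_learning/replay/random_replay.hpp>

// Hypothetical alternative discussed above: header in the rl root folder.
// #include <mlpack/methods/reinforcement_learning/random_replay.hpp>
```

With the subfolder, adding a future replay method (e.g. prioritized replay) would not change existing users' include paths.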

/**
* Implementation of random experience replay.
*

@zoq (Member) commented May 19, 2017

I think it would be a good idea to say a little bit more about the class, and maybe to cite the paper (Long-Ji Lin, "Reinforcement learning for robots using neural networks," technical report, DTIC Document, 1993) using BibTeX, as done in https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/block_krylov_svd/randomized_block_krylov_svd.hpp.
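A sketch of what such a citation block could look like, following the doxygen @code / @endcode BibTeX convention used in the linked file (the exact BibTeX key and fields here are assumptions):

```cpp
/**
 * Implementation of random experience replay.
 *
 * For more information, see the following.
 *
 * @code
 * @techreport{lin1993reinforcement,
 *   author      = {Long-Ji Lin},
 *   title       = {Reinforcement learning for robots using neural
 *                  networks},
 *   institution = {DTIC Document},
 *   year        = {1993}
 * }
 * @endcode
 */
```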

(Several outdated review threads on src/mlpack/methods/reinforcement_learning/replay/random_replay.hpp and src/mlpack/tests/rl_components_test.cpp are hidden.)

ShangtongZhang added some commits May 21, 2017

@ShangtongZhang (Member) commented May 21, 2017

I ended up with preallocated memory. Today, when I was playing an Atari game with DQN in PyTorch, I realized that even a very large memory buffer still fills up easily. So there is no need for dynamic allocation.

@zoq (Member) commented May 21, 2017

Great that you tested the approach. I think even if someone overestimates the capacity parameter, the performance is still comparable with the dynamic approach.

nextStates.col(position) = nextState.Encode();
isTerminal(position) = isEnd;
position++;
if (position == capacity)

@zoq (Member) commented May 23, 2017

Shouldn't we use if ((position % capacity) == 0) here? We only reset position once, so we could run into an index-out-of-bounds error on the second run. Maybe I missed something?

if (position == capacity)
{
full = true;
position = 0;

@ShangtongZhang (Member) commented May 23, 2017

Every time position reaches capacity, it will be reset to 0.

@zoq (Member) commented May 23, 2017

You are absolutely right; somehow I thought you checked against full. Thanks for the clarification.

@mlpack-jenkins commented May 24, 2017

Can one of the admins verify this patch?

@zoq zoq merged commit 167e43c into mlpack:master May 25, 2017

3 checks passed:

Style Checks: Build finished.
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@zoq (Member) commented May 25, 2017

Thanks for another great contribution that brings us one step closer.

@zoq zoq added the T: enhancement label May 25, 2017

@ShangtongZhang ShangtongZhang deleted the ShangtongZhang:rl-replay branch May 26, 2017
