Async n-step Q-learning and one-step SARSA #1084
Conversation
I see what you meant by "share many common code snippets", and I think it's fine to duplicate some code here to improve readability.
@@ -137,6 +139,22 @@ template <
>
class OneStepQLearningWorker;

template <
  typename EnvironmentType,
Can you comment on the template parameter?
Will there be more comments on this PR?
* @tparam EnvironmentType The type of the reinforcement learning task.
* @tparam NetworkType The type of the network model.
* @tparam UpdaterType The type of the optimizer.
* @tparam PolicyType The type of the behavior policy. *
Can you remove the * at the end?
using ActionType = typename EnvironmentType::Action;
using TransitionType = std::tuple<StateType, ActionType, double, StateType>;

/**
Do you mind adding a method description here? Something like this should do:
Construct N-step Q-Learning worker with the given parameters and environment.
network.Backward(actionValue, gradients);

// Accumulate gradients.
totalGradients += gradients;
We should initialize totalGradients with zero.
* @tparam EnvironmentType The type of the reinforcement learning task.
* @tparam NetworkType The type of the network model.
* @tparam UpdaterType The type of the optimizer.
* @tparam PolicyType The type of the behavior policy. *
Can you remove the * at the end?
ActionType>;

/**
 * @param updater The optimizer.
Can you add a method description here?
if (terminal || pendingIndex >= config.UpdateInterval())
{
  // Initialize the gradient storage.
  arma::mat totalGradients(learningNetwork.Parameters().n_rows,
We should initialize totalGradients with zeros here.
double targetActionValue = actionValue[std::get<4>(transition)];
if (terminal && i == pending.size() - 1)
  targetActionValue = 0;
targetActionValue = std::get<2>(transition) +
Should we use an else case here? That way we can slightly simplify the expression when (terminal && i == pending.size() - 1) is true.
config(config),
deterministic(deterministic),
pending(config.UpdateInterval())
{ reset(); }
Can you use upper camel casing (e.g. Reset() instead of reset()) for all method names?
Thanks for your feedback. Hope it's ready to merge now.
I think this is ready to go; let's go ahead and wait two more days before merging it in, in case anyone else has comments.
Thanks for another great contribution.