Async n-step Q-learning and one-step SARSA #1084
Conversation
I see what you meant by "share many common code snippets", and I think it's fine to duplicate some code here to improve readability.
@@ -137,6 +139,22 @@ template <
>
class OneStepQLearningWorker;

template <
  typename EnvironmentType,
Can you comment on the template parameter?
Will there be more comments on this PR?
* @tparam EnvironmentType The type of the reinforcement learning task.
* @tparam NetworkType The type of the network model.
* @tparam UpdaterType The type of the optimizer.
* @tparam PolicyType The type of the behavior policy. *
Can you remove the * at the end?
using ActionType = typename EnvironmentType::Action;
using TransitionType = std::tuple<StateType, ActionType, double, StateType>;

/**
Do you mind adding a method description here? Something like this should do:
Construct N-step Q-Learning worker with the given parameters and environment.
network.Backward(actionValue, gradients);

// Accumulate gradients.
totalGradients += gradients;
We should initialize totalGradients with zero.
* @tparam EnvironmentType The type of the reinforcement learning task.
* @tparam NetworkType The type of the network model.
* @tparam UpdaterType The type of the optimizer.
* @tparam PolicyType The type of the behavior policy. *
Can you remove the * at the end?
ActionType>;

/**
 * @param updater The optimizer.
Can you add a method description here?
if (terminal || pendingIndex >= config.UpdateInterval())
{
  // Initialize the gradient storage.
  arma::mat totalGradients(learningNetwork.Parameters().n_rows,
We should initialize totalGradients with zeros here.
double targetActionValue = actionValue[std::get<4>(transition)];
if (terminal && i == pending.size() - 1)
  targetActionValue = 0;
targetActionValue = std::get<2>(transition) +
Should we use an else case here? That way we can slightly simplify the expression when (terminal && i == pending.size() - 1) is true.
config(config),
deterministic(deterministic),
pending(config.UpdateInterval())
{ reset(); }
Can you use upper camel casing (e.g. Reset() instead of reset()) for all method names?
Thanks for your feedback. Hope it's ready to merge now.
I think this is ready to go; let's go ahead and wait two more days before merging it in, in case anyone else has comments.
Thanks for another great contribution.