
Addition of q_networks #2317

Merged: 12 commits merged into mlpack:master on Apr 7, 2020

Conversation

@nishantkr18 (Member) commented on Mar 19, 2020:

Hi! I would like to suggest a change to the structure of the DQN code. Instead of passing the FFN model directly into the agent, we can pass it to a QNetwork, which in turn is passed to the agent. This would have two advantages:

  1. It will be easier to add dueling DQN: we would only need to add two extra streams for the Value and Advantage functions and implement the Predict, Forward, Backward and Update functions (a rough sketch of the aggregation step appears at the end of this comment). We could then reuse the same q_learning implementation for vanilla DQN, double DQN and dueling DQN.
  2. We may pass a default FFN as the model in the agent for DQN.
    The tests would then change from
  FFN<MeanSquaredError<>, GaussianInitialization> model(MeanSquaredError<>(),
      GaussianInitialization(0, 0.001));
  model.Add<Linear<>>(4, 128);
  model.Add<ReLULayer<>>();
  model.Add<Linear<>>(128, 128);
  model.Add<ReLULayer<>>();
  model.Add<Linear<>>(128, 2);

  // Set up the policy and replay method.
  GreedyPolicy<CartPole> policy(1.0, 1000, 0.1, 0.99);
  RandomReplay<CartPole> replayMethod(10, 10000);

  TrainingConfig config;
  config.StepSize() = 0.01;
  config.Discount() = 0.9;
  config.TargetNetworkSyncInterval() = 100;
  config.ExplorationSteps() = 100;
  config.DoubleQLearning() = false;
  config.StepLimit() = 200;

  // Set up DQN agent.
  QLearning<CartPole, decltype(model), AdamUpdate, decltype(policy)>
      agent(std::move(config), std::move(model), std::move(policy),
      std::move(replayMethod));

to

    // Set up the policy and replay method.
    GreedyPolicy<CartPole> policy(1.0, 1000, 0.1, 0.99);
    RandomReplay<CartPole> replayMethod(10, 10000);
    QNetwork<> network;

    TrainingConfig config;
    config.StepSize() = 0.01;
    config.Discount() = 0.9;
    config.TargetNetworkSyncInterval() = 100;
    config.ExplorationSteps() = 100;
    config.DoubleQLearning() = false;
    config.StepLimit() = 200;

    // Set up DQN agent.
    QLearning<CartPole, decltype(network), AdamUpdate, decltype(policy)>
        agent(std::move(config), std::move(network), std::move(policy),
              std::move(replayMethod));

I have added the QNetwork file without full documentation or completion as of now, just to get some reviews and suggestions.
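
As referenced in point 1 above, here is a rough sketch of the aggregation step a dueling network would need. This is a hypothetical helper, not code from this PR; it assumes the Armadillo headers used by mlpack, a Value stream producing a 1 x batchSize matrix and an Advantage stream producing a numActions x batchSize matrix.

    // Hypothetical sketch (not part of this PR): combine the Value and
    // Advantage streams into Q-values, as a dueling network would.
    // value:     1 x batchSize matrix, V(s) for each state in the batch.
    // advantage: numActions x batchSize matrix, A(s, a) for each action.
    // Returns Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    arma::mat CombineStreams(const arma::mat& value, const arma::mat& advantage)
    {
      return arma::repmat(value, advantage.n_rows, 1) + advantage -
          arma::repmat(arma::mean(advantage, 0), advantage.n_rows, 1);
    }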

@bisakhmondal (Contributor) left a comment:

Hi @nishantkr18, just some minor changes. Nice work 👍

Review comment on src/mlpack/methods/reinforcement_learning/q_network.hpp (outdated, resolved).
@zoq removed the "s: unanswered" label on Mar 20, 2020
@nishantkr18 changed the title from "Addition of q_network" to "[WIP] Addition of q_network" on Mar 21, 2020
@kartikdutt18 (Member) commented:

Hey @nishantkr18, the macOS failure seems unrelated. I think if you rebase, it should be fixed. Thanks a lot. 👍

@nishantkr18 requested review from zoq and birm on March 27, 2020
@nishantkr18 changed the title from "[WIP] Addition of q_network" to "Addition of q_networks" on Mar 29, 2020
@nishantkr18 (Member, Author) commented on Mar 29, 2020:

I have created a new folder, q_networks, which currently contains the simple DQN under the name VanillaDQN. I'll add the other types of Q-networks in a separate PR. Would that be fine?

Please have a look.

@zoq (Member) left a comment:

This looks nice!

Diff context under review:

    model.Add<Linear<>>(64, 32);
    model.Add<ReLULayer<>>();
    model.Add<Linear<>>(32, 3);
    VanillaDQN<> model(4, 64, 32, 3);
@zoq (Member):
Do you think we should show in a single test that it's also possible to manually specify the network, completely without using VanillaDQN?

@nishantkr18 (Member, Author):

Hmm. I think you mean this:

    FFN<MeanSquaredError<>, GaussianInitialization>
        network(MeanSquaredError<>(),
                GaussianInitialization(0, 0.001));
    network.Add<Linear<>>(6, 256);
    network.Add<ReLULayer<>>();
    network.Add<Linear<>>(256, 3);

    // Create custom network type
    VanillaDQN<decltype(network)> model(std::move(network));

This is taken from the DoublePoleCartWithDQN test, showing how we can manually specify the network. But I could add a separate test for CartPole with DQN as well, showing how to manually specify it?

I'm afraid we won't be able to pass the network directly to QLearning (i.e. without using VanillaDQN), as I've used the method ResetParametersIfEmpty() in QLearning, which is not present in FFN. Would that be fine?

@zoq (Member):

We could rename ResetParametersIfEmpty to ResetParameters and revert the change; that way we can still pass a vanilla network. Maybe I missed something? There is already a test that uses the copy constructor, but it would be great if we could provide backward compatibility.
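
A minimal sketch of the idea, assuming the wrapper simply forwards to the wrapped FFN (hypothetical code, not from this PR): if the wrapper exposes ResetParameters() under the same name FFN already uses, QLearning can call it on either type, so a plain FFN stays a valid network argument.

    // Hypothetical wrapper forwarding ResetParameters() to the wrapped FFN,
    // so QLearning can call network.ResetParameters() whether it holds a
    // plain FFN or this wrapper.
    template<typename NetworkType>
    class SimpleDQN
    {
     public:
      SimpleDQN(NetworkType network) : network(std::move(network)) { }

      // Same name as FFN::ResetParameters(), so both types look alike
      // from QLearning's point of view.
      void ResetParameters() { network.ResetParameters(); }

      const arma::mat& Parameters() const { return network.Parameters(); }

     private:
      NetworkType network;
    };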

@nishantkr18 (Member, Author):
Yeah, you are right. Actually, I didn't notice that Parameters() had already been added to SimpleDQN.
Anyway, I've now made the necessary changes; kindly have a look.

@birm (Member) left a comment:

Great work incorporating the changes!

@sriramsk1999 (Contributor) commented:

Hi @nishantkr18, I was having a look at the code and I had a few questions. Wouldn't structuring the network this way be too restrictive? As in, a SimpleDQN would always be a two-layer network with Linear and ReLU?

What would the process be if I wanted to use a DQN with a different architecture, say a three-layer convolutional network with LeakyReLU? I hope my question is clear :)

@nishantkr18 (Member, Author) commented:

Hi @sriramsk1999! One can easily pass custom network architectures directly into QLearning or via SimpleDQN, as is done here.
The purpose of q_networks is that when other extensions of DQN are added, whose network structures differ from each other, one can use the preexisting networks from q_networks instead of building the entire network structure in the tests themselves. But again, custom network support is still available (see the sketch below).
I hope that makes it clear. Let me know if there are any other questions :)
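
A minimal sketch of the two styles, assuming the same headers and using-declarations as the test snippets above; the layer sizes are illustrative, and the SimpleDQN constructor signature is assumed from the four-argument form shown earlier in the review, not taken from the PR itself.

    // Style 1 (hypothetical sizes): build an arbitrary architecture by hand
    // and wrap it, as in the DoublePoleCartWithDQN snippet above.
    FFN<MeanSquaredError<>, GaussianInitialization> custom(
        MeanSquaredError<>(), GaussianInitialization(0, 0.001));
    custom.Add<Linear<>>(4, 64);
    custom.Add<LeakyReLU<>>();
    custom.Add<Linear<>>(64, 2);
    SimpleDQN<decltype(custom)> model(std::move(custom));

    // Style 2 (assumed constructor): use the ready-made network from
    // q_networks for the common Linear + ReLU case.
    SimpleDQN<> model2(4, 128, 128, 2);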

@sriramsk1999 (Contributor) commented:

Ah okay, I think I understand now. I thought that q_networks was supposed to replace the existing way to add a network to the QLearning agent.

If I understood you correctly, it is supplementary to the existing method and acts as a shortcut for commonly used architectures. Thanks for the clarification, and nice work. :)

@zoq (Member) left a comment:

Thanks for putting this together, no more comments from my side.

@birm merged commit 3382b72 into mlpack:master on Apr 7, 2020
@nishantkr18 deleted the duelingDQN branch on April 7, 2020