HER : new functionality, enables demo based training #474

Merged

merged 1 commit into openai:master on Oct 23, 2018

Conversation

jangirrishabh
Contributor

  • Add, initialize, normalize and sample from a demo buffer -
    Create a demo buffer alongside the primary replay buffer, initialize it from the demonstration file, normalize its data, and sample from it during training under the appropriate parameter configuration (a rough sketch follows this list).

  • Modify losses and add cloning loss -
    When training with demonstrations, use behavior cloning loss as an auxiliary loss as described in the paper.

  • Add demo file parameter to train.py -
    Additional demo_file parameter for inputting the demonstration data file path.

  • Introduce new params in config.py for demo based training -
    Define new parameters to enable demo based training when selected.

  • Change logger.warning to logger.warn in rollout.py (bug fix) -
    Related to pull request #464 (fixed HER nan warning).

  • Add data generation file for Fetch environments -
    A script to generate episodic data for Fetch Pick and Place task.

  • Update README file -
    A description of how to use the new functionality, which parameters to change, and some results.
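
As a rough illustration of the first bullet, here is a minimal sketch of the demo-buffer mechanics, assuming a .npz demo file with 'obs' and 'acs' arrays; the class and function names here are hypothetical, not the PR's actual ones:

```python
import numpy as np

class SimpleDemoBuffer:
    """Toy stand-in for the PR's demo buffer (the real one mirrors
    baselines.her.replay_buffer and is normalized with the main o/g stats)."""

    def __init__(self, demo_file, num_demo):
        data = np.load(demo_file, allow_pickle=True)
        # keep only the first num_demo demonstration transitions
        self.obs = data['obs'][:num_demo]
        self.acs = data['acs'][:num_demo]

    def sample(self, n):
        idx = np.random.randint(0, len(self.obs), size=n)
        return {'o': self.obs[idx], 'u': self.acs[idx]}

def sample_mixed_batch(main_buffer, demo_buffer, batch_size, demo_batch_size):
    # batch_size transitions in total; the last demo_batch_size come from demos
    batch = main_buffer.sample(batch_size - demo_batch_size)
    demo = demo_buffer.sample(demo_batch_size)
    return {key: np.concatenate([batch[key], demo[key]]) for key in batch}
```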

* Add, initialize, normalize and sample from a demo buffer

* Modify losses and add cloning loss

* Add demo file parameter to train.py

* Introduce new params in config.py for demo based training

* Change logger.warning to logger.warn in rollout.py;bug

* Add data generation file for Fetch environments

* Update README file
@jangirrishabh
Contributor Author

Hello @matthiasplappert, I have implemented the paper "Overcoming Exploration in Reinforcement Learning with Demonstrations" (Nair et al.) on top of the HER baselines and obtained improved results and faster convergence on the toughest of the Fetch tasks, Pick and Place, by training the agent with demonstration data.

The changes do not affect the original execution of the code in any manner. The only changes needed to run with demonstration data are a few parameters in the config.py file, where I have introduced additional parameters with proper descriptions alongside; a fuller description is provided in the README file.
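
For reference, the demo-related knobs look roughly like this (parameter names follow the PR; the values shown are the demo-training settings discussed in this thread and in the README, so treat them as illustrative):

```python
# config.py (excerpt, illustrative)
DEFAULT_PARAMS = {
    # ... existing HER/DDPG parameters ...
    'bc_loss': 1,             # whether to use the behavior cloning loss
    'q_filter': 1,            # whether to filter cloning updates with the Q-filter
    'num_demo': 100,          # number of demonstration episodes to load
    'demo_batch_size': 128,   # demo transitions mixed into each training batch
    'prm_loss_weight': 0.001,   # weight on the primary (DDPG) actor loss
    'aux_loss_weight': 0.0078,  # weight on the auxiliary cloning loss
}
```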

Functionalities implemented from the paper -

  • Demonstration Buffer : A separate buffer is created and initialized with demonstration data.
  • Behavior cloning loss : The loss functions are modified with the addition of behavior cloning loss as an auxiliary loss on the Actor's outputs.
  • Q-filter : The cloning loss is applied only to those demonstration samples whose actions the critic rates at least as highly as the actor's own actions (see the sketch below).
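
Here is a minimal, hedged sketch of how these three pieces combine in the actor loss. Tensor and argument names are illustrative, not the exact ones in her/ddpg.py (the real code also has details like the action L2 penalty and mask handling that are omitted here):

```python
import tensorflow as tf  # baselines' HER uses TF 1.x graphs

def actor_loss(pi_tf, actions_tf, q_pi_tf, q_tf, demo_mask,
               prm_loss_weight=0.001, aux_loss_weight=0.0078):
    """Illustrative combined actor loss.

    pi_tf      : [batch, dim_u] actor output for the batch observations
    actions_tf : [batch, dim_u] actions stored in the batch (demo actions
                 for the demo part of the batch)
    q_pi_tf    : [batch] critic value of the actor's action
    q_tf       : [batch] critic value of the stored action
    demo_mask  : [batch] bool, True where the sample came from the demo buffer
    """
    # Primary DDPG term: push the actor toward actions the critic values.
    pi_loss = -prm_loss_weight * tf.reduce_mean(q_pi_tf)
    # Q-filter: keep a demo sample only if its demonstrated action is rated
    # at least as good as what the actor currently proposes.
    keep = tf.logical_and(demo_mask, q_tf >= q_pi_tf)
    # Behavior cloning term on the surviving demo samples (in practice one
    # guards against the mask being empty).
    bc_loss = tf.reduce_mean(tf.square(
        tf.boolean_mask(pi_tf, keep) - tf.boolean_mask(actions_tf, keep)))
    return pi_loss + aux_loss_weight * bc_loss
```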

Execution - The results are reported with the configuration described in the README file and are trained with --num_cpu=1. Reproducing the results is simple; instructions are given in the README. The original configuration is kept intact, so there are no problems when training with vanilla HER.

Results - Training with demonstrations helps overcome the exploration problem and achieves faster and better convergence. The graph presented in the README contrasts the performance of training with and without demonstrations.

Call for Feedback - Kindly try training the agent after generating demonstrations (with the provided script) and give your valuable feedback.

@pzhokhov
Collaborator

@machinaut @matthiasplappert if no comments from your side, I am inclined to merge. We are planning a refactor of HER to conform to the new policy/runner API of baselines, would be good to have all the cool features in before that. Incidentally, @jangirrishabh, if you feel like taking a stab at refactoring, don't let me stop you :)

@pzhokhov pzhokhov merged commit 8513d73 into openai:master Oct 23, 2018
@tabula-rosa
Contributor

If I'm not mistaken, the changes to the actor loss made in _create_network when training HER+DDPG with demonstrations do not take effect, due to leftover lines immediately after the if branching (https://github.com/openai/baselines/pull/474/files#diff-b51ab5fd189c1a23eae02efcc928126fR370). I made a pull request (#740) to fix this issue. Let me know if you have any concerns! Thanks.
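
For readers who have not opened the diff: the failure mode being described is the familiar pattern of a conditional assignment being clobbered by a leftover unconditional one, roughly like this (stand-in names and values, not the exact PR code):

```python
bc_loss, q_filter = True, True
ddpg_term, cloning_term = -1.0, 0.5              # stand-in scalar losses
prm_loss_weight, aux_loss_weight = 0.001, 0.0078

if bc_loss and q_filter:
    # demo-aware actor loss: primary DDPG term plus cloning term
    pi_loss = prm_loss_weight * ddpg_term + aux_loss_weight * cloning_term
else:
    pi_loss = ddpg_term

pi_loss = ddpg_term  # leftover line: silently discards the branch above
```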

@jangirrishabh
Contributor Author

Yes indeed, thank you @Timeous for pointing it out; I must have made that error while porting the code from my private repo to make it Baselines-ready.
@pzhokhov Please merge pull request #740, I have checked it (Y)

huiwenn pushed a commit to huiwenn/baselines that referenced this pull request Mar 20, 2019
@Jxic

Jxic commented May 21, 2019

@jangirrishabh Hi, may I ask why 'prm_loss_weight' and 'aux_loss_weight' are set to 0.001 and 0.0078 respectively? Wouldn't that slow training down a lot, since they don't add up to 1?

@jangirrishabh
Contributor Author

Hi @Jxic, these values are taken from the original paper and I regarded them as hyperparameter details. That said, scaling the loss does scale the gradients, but the gradient direction remains unchanged, so I suspect this has no effect on training speed, which depends on the learning rate instead. A quick check on the internet, however, suggests this is not as straightforward as that and depends on the type of optimizer being used.

Feel free to experiment with them and get your own intuition, it always helps.
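
A tiny numeric check of the direction-vs-magnitude point above (this is the plain SGD view; adaptive optimizers like Adam rescale by gradient magnitude, which is where the "not so straightforward" caveat comes in):

```python
import numpy as np

def grad(loss_scale, w, x, y):
    # gradient of loss_scale * 0.5 * (w @ x - y)**2 with respect to w
    return loss_scale * (w @ x - y) * x

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
g_full = grad(1.0, w, x, y=1.0)
g_tiny = grad(0.001, w, x, y=1.0)

# Same direction, magnitude scaled by the loss weight: with plain SGD this is
# equivalent to shrinking the learning rate by the same factor.
assert np.allclose(g_tiny, 0.001 * g_full)
```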

kkonen pushed a commit to kkonen/baselines-1 that referenced this pull request Sep 26, 2019