HER : new functionality, enables demo based training #474

Merged

merged 1 commit into openai:master on Oct 23, 2018

Conversation

jangirrishabh
Contributor

  • Add, initialize, normalize and sample from a demo buffer -
    Create a demo buffer alongside the primary replay buffer, initialize it from the demonstration file, normalize its data, and sample from it during training under the appropriate parameter configuration (a rough sketch follows this list).

  • Modify losses and add cloning loss -
    When training with demonstrations, use behavior cloning loss as an auxiliary loss as described in the paper.

  • Add demo file parameter to train.py -
    Additional demo_file parameter for inputting the demonstration data file path.

  • Introduce new params in config.py for demo based training -
    Define new parameters to enable demo based training when selected.

  • Change logger.warning to logger.warn in rollout.py (bug fix) -
    Related to pull request #464 (fixed HER nan warning).

  • Add data generation file for Fetch environments -
    A script to generate episodic data for Fetch Pick and Place task.

  • Update README file -
    A description of how to use the new functionality, which parameters to change, and some results.
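
As a rough illustration of the first bullet, here is a minimal sketch of the demo-buffer mechanics, assuming a .npz demo file with 'obs' and 'acs' arrays; the class and function names here are hypothetical, not the PR's actual ones:

```python
import numpy as np

class SimpleDemoBuffer:
    """Toy stand-in for the PR's demo buffer (the real one mirrors
    baselines.her.replay_buffer and is normalized with the main o/g stats)."""

    def __init__(self, demo_file, num_demo):
        data = np.load(demo_file, allow_pickle=True)
        # keep only the first num_demo demonstration transitions
        self.obs = data['obs'][:num_demo]
        self.acs = data['acs'][:num_demo]

    def sample(self, n):
        idx = np.random.randint(0, len(self.obs), size=n)
        return {'o': self.obs[idx], 'u': self.acs[idx]}

def sample_mixed_batch(main_buffer, demo_buffer, batch_size, demo_batch_size):
    # batch_size transitions in total; the last demo_batch_size come from demos
    batch = main_buffer.sample(batch_size - demo_batch_size)
    demo = demo_buffer.sample(demo_batch_size)
    return {key: np.concatenate([batch[key], demo[key]]) for key in batch}
```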

* Add, initialize, normalize and sample from a demo buffer

* Modify losses and add cloning loss

* Add demo file parameter to train.py

* Introduce new params in config.py for demo based training

* Change logger.warning to logger.warn in rollout.py;bug

* Add data generation file for Fetch environments

* Update README file
@jangirrishabh
Contributor Author

Hello @matthiasplappert, I have implemented the paper "Overcoming Exploration in Reinforcement Learning with Demonstrations" (Nair et al.) on top of the HER baselines and obtained improved results and faster convergence on the toughest of the Fetch tasks, Pick and Place, by training the agent with demonstration data.

The changes do not affect the original execution of the code in any manner. The only changes needed to run with demonstration data are a few parameters in the config.py file, where I have introduced additional parameters with proper descriptions alongside; a fuller description is provided in the README file.
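
For reference, the demo-related knobs look roughly like this (parameter names follow the PR; the values shown are the demo-training settings discussed in this thread and in the README, so treat them as illustrative):

```python
# config.py (excerpt, illustrative)
DEFAULT_PARAMS = {
    # ... existing HER/DDPG parameters ...
    'bc_loss': 1,             # whether to use the behavior cloning loss
    'q_filter': 1,            # whether to filter cloning updates with the Q-filter
    'num_demo': 100,          # number of demonstration episodes to load
    'demo_batch_size': 128,   # demo transitions mixed into each training batch
    'prm_loss_weight': 0.001,   # weight on the primary (DDPG) actor loss
    'aux_loss_weight': 0.0078,  # weight on the auxiliary cloning loss
}
```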

Functionalities implemented from the paper -

  • Demonstration Buffer : A separate buffer is created and initialized with demonstration data.
  • Behavior cloning loss : The loss functions are modified with the addition of behavior cloning loss as an auxiliary loss on the Actor's outputs.
  • Q-filter : The cloning loss is applied only to those demonstration samples whose actions the critic rates at least as highly as the actor's own actions (see the sketch below).
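
Here is a minimal, hedged sketch of how these three pieces combine in the actor loss. Tensor and argument names are illustrative, not the exact ones in her/ddpg.py (the real code also has details like the action L2 penalty and mask handling that are omitted here):

```python
import tensorflow as tf  # baselines' HER uses TF 1.x graphs

def actor_loss(pi_tf, actions_tf, q_pi_tf, q_tf, demo_mask,
               prm_loss_weight=0.001, aux_loss_weight=0.0078):
    """Illustrative combined actor loss.

    pi_tf      : [batch, dim_u] actor output for the batch observations
    actions_tf : [batch, dim_u] actions stored in the batch (demo actions
                 for the demo part of the batch)
    q_pi_tf    : [batch] critic value of the actor's action
    q_tf       : [batch] critic value of the stored action
    demo_mask  : [batch] bool, True where the sample came from the demo buffer
    """
    # Primary DDPG term: push the actor toward actions the critic values.
    pi_loss = -prm_loss_weight * tf.reduce_mean(q_pi_tf)
    # Q-filter: keep a demo sample only if its demonstrated action is rated
    # at least as good as what the actor currently proposes.
    keep = tf.logical_and(demo_mask, q_tf >= q_pi_tf)
    # Behavior cloning term on the surviving demo samples (in practice one
    # guards against the mask being empty).
    bc_loss = tf.reduce_mean(tf.square(
        tf.boolean_mask(pi_tf, keep) - tf.boolean_mask(actions_tf, keep)))
    return pi_loss + aux_loss_weight * bc_loss
```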

Execution - The results are reported with the configuration described in the README file and are trained with --num_cpu=1. Reproducing the results is simple; instructions are given in the README. The original configuration is kept intact, so there are no problems when training with vanilla HER.

Results - Training with demonstrations helps overcome the exploration problem and achieves faster and better convergence. The graph presented in the README contrasts the performance of training with and without demonstrations.

Call for Feedback - Kindly try training the agent after generating demonstrations (with the provided script) and give your valuable feedback.

@pzhokhov
Collaborator

@machinaut @matthiasplappert if no comments from your side, I am inclined to merge. We are planning a refactor of HER to conform to the new policy/runner API of baselines, would be good to have all the cool features in before that. Incidentally, @jangirrishabh, if you feel like taking a stab at refactoring, don't let me stop you :)

@pzhokhov pzhokhov merged commit 8513d73 into openai:master Oct 23, 2018
@tabula-rosa
Contributor

If I'm not mistaken, the changes to the actor loss made in _create_network when training HER+DDPG with demonstrations do not take effect, due to leftover lines immediately after the if branching (https://github.com/openai/baselines/pull/474/files#diff-b51ab5fd189c1a23eae02efcc928126fR370). I made a pull request (#740) to fix this issue. Let me know if you have any concerns! Thanks.
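
For readers who have not opened the diff: the failure mode being described is the familiar pattern of a conditional assignment being clobbered by a leftover unconditional one, roughly like this (stand-in names and values, not the exact PR code):

```python
bc_loss, q_filter = True, True
ddpg_term, cloning_term = -1.0, 0.5              # stand-in scalar losses
prm_loss_weight, aux_loss_weight = 0.001, 0.0078

if bc_loss and q_filter:
    # demo-aware actor loss: primary DDPG term plus cloning term
    pi_loss = prm_loss_weight * ddpg_term + aux_loss_weight * cloning_term
else:
    pi_loss = ddpg_term

pi_loss = ddpg_term  # leftover line: silently discards the branch above
```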

@jangirrishabh
Contributor Author

Yes indeed, thank you @Timeous for pointing it out; I must have made that error while porting the code from my private repo to make it Baselines-ready.
@pzhokhov Please merge pull request #740, I have checked it (Y)

huiwenn pushed a commit to huiwenn/baselines that referenced this pull request Mar 20, 2019
@Jxic

Jxic commented May 21, 2019

@jangirrishabh Hi, may I ask why 'prm_loss_weight' and 'aux_loss_weight' are set to 0.001 and 0.0078 respectively? Wouldn't that slow training down a lot, since they don't add up to 1?

@jangirrishabh
Contributor Author

Hi @Jxic, these values are taken from the original paper and I regarded them as hyperparameter details. That said, scaling the loss does scale the gradients, but the gradient direction remains unchanged, so I suspect this has no effect on training speed, which depends on the learning rate instead. A quick check on the internet, however, suggests this is not as straightforward as that and depends on the type of optimizer being used.

Feel free to experiment with them and get your own intuition, it always helps.
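
A tiny numeric check of the direction-vs-magnitude point above (this is the plain SGD view; adaptive optimizers like Adam rescale by gradient magnitude, which is where the "not so straightforward" caveat comes in):

```python
import numpy as np

def grad(loss_scale, w, x, y):
    # gradient of loss_scale * 0.5 * (w @ x - y)**2 with respect to w
    return loss_scale * (w @ x - y) * x

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
g_full = grad(1.0, w, x, y=1.0)
g_tiny = grad(0.001, w, x, y=1.0)

# Same direction, magnitude scaled by the loss weight: with plain SGD this is
# equivalent to shrinking the learning rate by the same factor.
assert np.allclose(g_tiny, 0.001 * g_full)
```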

kkonen pushed a commit to kkonen/baselines-1 that referenced this pull request Sep 26, 2019