
Decentralized Distributed PPO #245

Open · erikwijmans wants to merge 13 commits into base: master

Conversation

erikwijmans (Contributor) commented Nov 4, 2019

Motivation and Context

Implementation of Decentralized Distributed PPO, a simple, synchronous method for distributed RL.

Paper: https://arxiv.org/abs/1911.00357
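
At its core, DD-PPO is synchronous, decentralized data-parallel RL: each worker collects experience in its own simulator, computes PPO gradients locally, and averages gradients across all workers before every update, with no parameter server. Below is a minimal sketch of that gradient-averaging step, assuming torch.distributed has already been initialized; the function and argument names are illustrative, not this PR's actual trainer code (which lives under habitat_baselines/rl/ddppo/algo/, per the Codecov file list below):

```python
import torch
import torch.distributed as dist


def synchronous_update(model: torch.nn.Module, loss: torch.Tensor, optimizer) -> None:
    # Each worker backprops through the PPO loss computed on its own rollouts.
    optimizer.zero_grad()
    loss.backward()

    # Average gradients across workers so every replica applies the
    # identical update -- synchronous and decentralized.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad.div_(world_size)

    optimizer.step()
```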

How Has This Been Tested

Locally -- adding tests via pytest is awkward because not all builds of PyTorch support distributed mode.
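
One way around that (a sketch, not part of this PR -- the skipif guard and test name are assumptions) is to skip distributed tests on PyTorch builds compiled without distributed support:

```python
import pytest
import torch


@pytest.mark.skipif(
    not torch.distributed.is_available(),
    reason="requires a PyTorch build with distributed support",
)
def test_ddppo_smoke():
    # Placeholder body: a real test would spawn multiple workers and
    # run a short DD-PPO training loop end to end.
    assert torch.distributed.is_available()
```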

Types of changes

  • New feature (non-breaking change which adds functionality)
erikwijmans added 4 commits Nov 4, 2019
* Refactor baselines to allow for extension

* Fix eval

* Make clip an after-backward

* Put ddppo on top of refactor

* Making things work as intended again

* Update README

* Update readme more

* Fix table

* Update

* Add task

* readme

* before_step not after_backward

* Fixes

* Add val SPL to weights table

* Add running mean and var code

* Loop over rollout step instead of function for full rollout

* Fix merge

* DD-PPO v2

* Make it work again!

* Little changes

* Fixes

* Make it work!

* Rebased!

* Update authors

* Whoops

* Move EXIT into the inner most loop

* Add contextlib.suppress()

* Hmmmm

* Format

* Fix bug

* Update readme

* Fix env_utils bug

* Update README

* Eval on val and add an option to not take the checkpoints config

* black

* @mathfac comments

* Episode iterator upgrades

* Make default slightly lower

* Fix env.py

* fix is true

* Update docstring

* Fix formatting

* Turn off shuffle!

* Added pre-commit, added isort-seed, changed CI black and isort, added pyproject config

* Make things interruptible and support pretrained

* Precommit and isort

* Preemption seems to work

* Loading pretrained weights works!

* Should be good to go!

* Make config a little better looking

* Add visual encoder caching

* Fix shuffling

* Little refactor
erikwijmans added 2 commits Nov 4, 2019
Fix

Akarshit left a comment

Just a small typo. Great paper!

Review comments on habitat_baselines/config/default.py and habitat_baselines/config/pointnav/ddppo_pointnav.yaml (outdated, resolved)
mathfac (Contributor) commented Nov 13, 2019

I am getting run.py: error: unrecognized arguments: --local_rank=0 when I try to run single_node.sh.

erikwijmans (Contributor, Author) commented Nov 13, 2019

Looks like I hadn't updated that since PyTorch changed --use_env.
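
For context (a sketch of the behavior change, not this PR's code -- the helper name is hypothetical): when torch.distributed.launch is invoked with --use_env, it exports the LOCAL_RANK environment variable instead of appending a --local_rank argument, so a script whose argparse setup lacks that flag will fail with "unrecognized arguments: --local_rank=0" unless it reads the rank from the environment:

```python
import os

import torch


def get_local_rank() -> int:
    # With `python -m torch.distributed.launch --use_env run.py`, the
    # launcher sets LOCAL_RANK in the environment rather than passing
    # a --local_rank command-line argument.
    return int(os.environ["LOCAL_RANK"])


if __name__ == "__main__":
    local_rank = get_local_rank()
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
```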

erikwijmans added 2 commits Nov 19, 2019
…api into ddppo-single-commit
… ddppo-single-commit
codecov-io commented Dec 4, 2019

Codecov Report

Merging #245 into master will decrease coverage by 4.35%.
The diff coverage is 31.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #245      +/-   ##
==========================================
- Coverage    79.3%   74.95%   -4.36%     
==========================================
  Files          86       95       +9     
  Lines        5896     6456     +560     
==========================================
+ Hits         4676     4839     +163     
- Misses       1220     1617     +397
Impacted Files Coverage Δ
habitat_baselines/rl/ppo/__init__.py 100% <100%> (ø) ⬆️
habitat_baselines/rl/ddppo/__init__.py 100% <100%> (ø)
habitat_baselines/rl/ddppo/policy/__init__.py 100% <100%> (ø)
habitat_baselines/__init__.py 100% <100%> (ø) ⬆️
habitat_baselines/rl/ppo/ppo.py 95.65% <100%> (ø) ⬆️
habitat_baselines/config/default.py 98.95% <100%> (+0.14%) ⬆️
habitat_baselines/rl/ddppo/algo/__init__.py 100% <100%> (ø)
habitat_baselines/rl/ddppo/algo/ddppo_trainer.py 17.36% <17.36%> (ø)
..._baselines/rl/ddppo/policy/running_mean_and_var.py 22.58% <22.58%> (ø)
habitat_baselines/rl/ddppo/policy/resnet_policy.py 22.77% <22.77%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b80fc4...42944f9.
