[RLlib] Activate DreamerV3 weekly release test (on Pong-v5 with the 100k setup). #45654
Conversation
LGTM: Very excited about the tests.
)
.env_runners(
    num_env_runners=(args.num_env_runners or 0),
Why is it that we use only a single env runner to collect new samples?
The training ratio for this setup (Atari 100k) is 1024, which is huge anyway.
For every single env step sampled, you update all models on one batch of 1024 timesteps (B=16 x T=64) drawn from the buffer.
So parallelizing env collection doesn't make sense; you don't gain any performance from it. DreamerV3 is all about parallelizing on the learner side.
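A quick back-of-the-envelope sketch of the cadence described above (the variable names mirror the B=16 x T=64 batch and the 1024 training ratio from the comment; they are illustrative, not the exact RLlib config fields):

```python
# Back-of-the-envelope check of the Atari-100k DreamerV3 cadence
# described above (values taken from the comment, names illustrative).
batch_size_B = 16      # parallel sequences per training batch
batch_length_T = 64    # timesteps per sequence
training_ratio = 1024  # replayed timesteps per sampled env timestep

# One training batch already contains 16 * 64 = 1024 replayed timesteps.
replayed_per_update = batch_size_B * batch_length_T
print(replayed_per_update)  # 1024

# At a training ratio of 1024, a single sampled env step "pays for"
# one full update, so one env runner keeps the learner saturated and
# additional runners would add no throughput.
env_steps_per_update = replayed_per_update / training_ratio
print(env_steps_per_update)  # 1.0
```

This is why the learner side, not sampling, is the bottleneck in this setup.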
Interesting. I would have thought that rolling out the new policy more broadly would teach you more about the environment, which in turn would improve the dynamics model faster and let it dream better.
Thanks for the clarification @sven1977
runtime_env:
  - RLLIB_TEST_NO_JAX_IMPORT=1
  - LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ray/.mujoco/mujoco210/bin
cluster_compute: 1gpu_4cpus.yaml
This is using p2.xlarge,
which is about $1/hour; from a cost perspective, I think it's absolutely fine to use this node for 12 hours per week.
From the weekly-release perspective, we are thinking about reducing many 24-hour tests to 8-hour tests, but there's no concrete plan yet, so this should not be a blocker.
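A rough cost check of the figures quoted above (the $1/hour rate is the approximate number from the comment, not a quoted AWS price):

```python
# Rough weekly cost of running the release test on a p2.xlarge,
# using the approximate $1/hour figure from the comment above.
hourly_rate_usd = 1.0
hours_per_week = 12
weekly_cost_usd = hourly_rate_usd * hours_per_week
print(weekly_cost_usd)  # 12.0 (USD per week)
```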
Signed-off-by: sven1977 <svenmika1977@gmail.com>
(leaving for @can-anyscale to review the release test)
…mer_v3_add_release_test Signed-off-by: sven1977 <svenmika1977@gmail.com> # Conflicts: # rllib/tuned_examples/dreamerv3/atari_100k.py
…mer_v3_add_release_test
Signed-off-by: sven1977 <svenmika1977@gmail.com>
…00k setup). (ray-project#45654) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…00k setup). (ray-project#45654) Signed-off-by: Richard Liu <ricliu@google.com>
Activate DreamerV3 weekly release test (on Pong-v5 with the 100k setup).
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I have added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.