[RLlib] Create a set of performance benchmark tests to run nightly. #19945
Conversation
Force-pushed the branch from 355721b to 16aa3ad.
actor_learning_rate: 0.0003
critic_learning_rate: 0.0003
entropy_learning_rate: 0.0003
num_workers: 0
Why do we not use any workers for SAC HalfCheetah?
Hi Sven, I'd love your opinion about the set of tests I want to run nightly.
Force-pushed the branch from 0c0ca1c to b51e5fa.
@@ -0,0 +1,131 @@
apex-breakoutnoframeskip-v4:
Could we sort these alphabetically?
desired_throughput = None
# TODO(Jun): Stop checking throughput for now.
# desired_throughput = checks[experiment]["min_throughput"]
desired_throughput = checks[experiment]["min_throughput"]
Maybe for a later PR: Should we measure env-throughput and learner-throughput separately?
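A minimal sketch of what that split could look like, assuming RLlib's common result-dict fields (`time_total_s`, `info/num_steps_sampled`, `info/num_steps_trained`); the helper name is hypothetical:

# Hypothetical sketch: deriving env- and learner-throughput separately
# from a single RLlib result dict (field names assumed, helper invented).
def split_throughputs(result):
    total_time_s = result["time_total_s"]
    env_throughput = result["info"]["num_steps_sampled"] / total_time_s
    learner_throughput = result["info"]["num_steps_trained"] / total_time_s
    return env_throughput, learner_throughput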
rllib/utils/test_utils.py
else:
    keys.append(re.sub("^(\\w+)-", "\\1-torch-", k))
experiments[keys[0]] = e
# Generate `checks` dict for all experiments (tf, tf2 and/or torch).
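For illustration (the experiment name here is made up), the re.sub above inserts the framework tag right after the algorithm prefix:

# Example of the key rewrite above (experiment name invented):
import re
re.sub("^(\\w+)-", "\\1-torch-", "sac-halfcheetah-v3")
# -> "sac-torch-halfcheetah-v3"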
Just a question: this means we'll include tf2 in our weekly learning tests as well as our nightly multi-GPU tests, correct? I think the multi-GPU one would fail, since RLlib + tf2 does not support multi-GPU yet. Could you check this? We should probably have a way to specify which frameworks to test in the yaml files.
Like:

...
config:
    frameworks: [tf, torch]  # instead of "framework": expands `experiments_to_run`, then removes the "frameworks" key from the struct
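A rough sketch of how that expansion could work; the function name and the default framework list are assumptions, not part of this PR:

# Hypothetical sketch: create one experiment entry per framework listed
# under `frameworks`, removing the key itself during expansion.
import copy

def expand_frameworks(experiments):
    expanded = {}
    for name, exp in experiments.items():
        frameworks = exp["config"].pop("frameworks", ["tf"])
        for fw in frameworks:
            e = copy.deepcopy(exp)
            e["config"]["framework"] = fw
            expanded["{}-{}".format(name, fw)] = e
    return expanded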
Just some questions about things that may break with this change w.r.t. the nightly multi-GPU tests and weekly release tests.
Force-pushed the branch from 0675d20 to c0aeb70.
All done. Thanks for the thoughtful review.
Thanks for this cool PR. Loving it! :)
Why are these changes needed?
Track the performance of the most important algorithms nightly.
These tests are set up to run for a fixed amount of time every day, without any pass/fail criteria. We will then record and track the average reward achieved and the throughput over time.
Run all of the RLlib nightly and weekly tests with the tf2 framework as well.
Write performance metrics to the output JSON file.
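As a rough illustration only (the file path, experiment name, and field names are assumptions, and the values are placeholders), the recorded metrics could be written like this:

# Hypothetical sketch of writing per-experiment metrics to the results file.
import json

results = {
    "sac-halfcheetah-v3-torch": {
        "episode_reward_mean": 9500.0,  # placeholder value
        "throughput": 1800.0,           # env steps per second, placeholder
    },
}
with open("/tmp/rllib_nightly_results.json", "w") as f:
    json.dump(results, f, indent=2)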
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.