Benchmark regression tests #258
Pull Request Test Coverage Report for Build 1912
config["eval_prob"] = 0.05
config["noise_stdev"] = grid_search([0.01, 0.02])
config["stepsize"] = grid_search([0.01, 0.02])

config["noise_stdev"] = 0.02
we should set the hyperparameters to the best-performing values for each benchmark (see paper for details)
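One way to do this is a per-benchmark lookup table applied on top of the shared config. This is a minimal sketch only: the benchmark names (`benchmark_a`, `benchmark_b`), the helper `apply_best_hp`, and the values in the table are placeholders drawn from the grid in this diff, not the tuned values from the paper.

```python
from typing import Any, Dict

# Hypothetical table: names and values are placeholders, NOT the paper's tuned values.
DEFAULT_ES_HP: Dict[str, float] = {"noise_stdev": 0.02, "stepsize": 0.02}
BEST_ES_HP: Dict[str, Dict[str, float]] = {
    "benchmark_a": {"noise_stdev": 0.01, "stepsize": 0.02},
    "benchmark_b": {"noise_stdev": 0.02, "stepsize": 0.01},
}


def apply_best_hp(config: Dict[str, Any], benchmark_name: str) -> Dict[str, Any]:
    """Return a copy of config with the per-benchmark hyperparameters filled in."""
    updated = dict(config)
    # Fall back to the defaults when a benchmark has no tuned entry.
    updated.update(BEST_ES_HP.get(benchmark_name, DEFAULT_ES_HP))
    return updated


es_config = apply_best_hp({"eval_prob": 0.05}, "benchmark_a")
```

Keeping the table in one place would let the regression run use fixed values while leaving the grid search available for re-tuning.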
flow/benchmarks/rllib/es_runner.py
Outdated
@@ -60,6 +110,6 @@
"max_failures": 999,
"stop": {"training_iteration": 500},
"num_samples": 1,
"upload_dir": "s3://<BUCKET NAME>"
"upload_dir": "s3://" + upload_dir
what if you're not running experiments on AWS? this should default to no uploading in that case
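A minimal sketch of that default, assuming the runner receives the bucket name as an optional argument. The helper `build_run_spec` and the bucket name `my-bucket` are hypothetical; the other keys mirror the hunk above.

```python
from typing import Any, Dict, Optional


def build_run_spec(upload_dir: Optional[str] = None) -> Dict[str, Any]:
    """Build the experiment spec, adding "upload_dir" only when a bucket is given."""
    spec: Dict[str, Any] = {
        "max_failures": 999,
        "stop": {"training_iteration": 500},
        "num_samples": 1,
    }
    # No bucket supplied (e.g. not running on AWS): leave uploading off.
    if upload_dir:
        spec["upload_dir"] = "s3://" + upload_dir
    return spec


local_spec = build_run_spec()
s3_spec = build_run_spec("my-bucket")  # placeholder bucket name
```

With this shape, a local run never sees an `upload_dir` key, and the same approach would apply to the PPO runner.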
flow/benchmarks/rllib/ppo_runner.py
Outdated
@@ -68,6 +112,6 @@
"training_iteration": 500
},
"num_samples": 3,
"upload_dir": "s3://<BUCKET NAME>"
"upload_dir": "s3://" + upload_dir
again: this should default to no uploading when not running on AWS
flow/benchmarks/rllib/ppo_runner.py
Outdated
config["lambda"] = grid_search([0.97, 1.0])
config["lr"] = grid_search([5e-4, 5e-5])
config["lambda"] = 0.97
config["lr"] = 5e-4
again: these should be set to the best-performing values for each benchmark
@@ -0,0 +1,40 @@
#!/bin/bash
nice! can we document this somewhere though? (either at the start of this script or in the benchmarks' README.md)
peep the README.md
scripts/benchmark_autoscale.yaml
Outdated
MarketType: spot
# Additional options can be found in the boto docs, e.g.
# SpotOptions:
# MaxPrice: "1.0"
nit: should probably keep this on
flow/benchmarks/rllib/es_runner.py
Outdated
config["episodes_per_batch"] = N_ROLLOUTS
config["num_workers"] = N_ROLLOUTS
config["num_workers"] = min(num_cpus, num_rollouts)
config["episodes_per_batch"] = num_rollouts
config["eval_prob"] = 0.05
config["noise_stdev"] = grid_search([0.01, 0.02])
specifically, can we remove these duplicate lines?
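For reference, a sketch of the worker settings from this hunk with each key set once. The helper name `es_worker_settings` is hypothetical; the two keys and the `min(num_cpus, num_rollouts)` cap are taken from the diff.

```python
from typing import Dict


def es_worker_settings(num_cpus: int, num_rollouts: int) -> Dict[str, int]:
    """Set each key once: cap workers at the CPU count, one episode per rollout."""
    return {
        # Never request more workers than there are CPUs or rollouts.
        "num_workers": min(num_cpus, num_rollouts),
        "episodes_per_batch": num_rollouts,
    }


settings = es_worker_settings(num_cpus=4, num_rollouts=20)
```

The cap means a machine with fewer CPUs than rollouts still collects the full batch, just with fewer parallel workers.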
@kanaadp can we move this to the scripts folder and add some documentation to a README?
@kanaadp tests are failing here so we can't merge it
…flow into benchmark_regression_tests
@kanaadp all comments seem addressed; can we merge?
LGTM. should we merge it?
Modified benchmark run scripts to be more general, added script to run all benchmarks. We still need a script to pull & evaluate trained benchmarks.