
Benchmark regression tests #258

Merged: 11 commits into master on Dec 12, 2018

Conversation

@kanaadp (Collaborator) commented Nov 18, 2018

Modified the benchmark run scripts to be more general and added a script to run all benchmarks. We still need a script to pull and evaluate trained benchmarks.
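For context, a "run all benchmarks" driver is essentially a loop over the benchmark configs that launches the trainer for each one. A minimal sketch, assuming a runner script invoked as `python es_runner.py <benchmark_name>` (the script name and benchmark names here are illustrative, not the PR's exact interface):

```python
# Hypothetical driver: loop over benchmarks and launch a training run for each.
# "es_runner.py" and the benchmark names below are assumptions for illustration.
import subprocess

BENCHMARKS = ["grid0", "grid1", "bottleneck0", "figureeight0"]

for name in BENCHMARKS:
    # check=True aborts the sweep if any single benchmark run fails
    subprocess.run(["python", "es_runner.py", name], check=True)
```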

@coveralls commented Nov 19, 2018

Pull Request Test Coverage Report for Build 1912

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 345 unchanged lines in 19 files lost coverage.
  • Overall coverage increased (+3.9%) to 84.67%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| flow/scenarios/bottleneck.py | 1 | 94.12% |
| flow/envs/test.py | 1 | 93.75% |
| flow/utils/rllib.py | 2 | 97.22% |
| tests/slow_tests/test_benchmarks.py | 2 | 97.26% |
| tests/fast_tests/test_scenario_base_class.py | 2 | 99.27% |
| tests/fast_tests/test_environment_base_class.py | 2 | 99.02% |
| flow/envs/loop/loop_accel.py | 2 | 96.08% |
| flow/controllers/routing_controllers.py | 3 | 92.68% |
| flow/envs/bay_bridge/base.py | 3 | 64.96% |
| flow/envs/loop/lane_changing.py | 4 | 90.48% |

Totals:
  • Change from base Build 1382: +3.9%
  • Covered Lines: 6755
  • Relevant Lines: 7978

💛 - Coveralls

config["eval_prob"] = 0.05
config["noise_stdev"] = grid_search([0.01, 0.02])
config["stepsize"] = grid_search([0.01, 0.02])

config["noise_stdev"] = 0.02
Member commented:

we should set the hyperparameters based on the best-performing values for each benchmark (see the paper for details)
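A minimal sketch of that suggestion, assuming the runner receives a `benchmark_name` and looks up fixed values instead of calling `grid_search` (the dictionary and its values below are illustrative placeholders, not the paper's actual numbers):

```python
# Illustrative per-benchmark ES hyperparameters; the real values would be
# taken from the best-performing runs reported in the paper.
BEST_ES_PARAMS = {
    "grid0": {"noise_stdev": 0.02, "stepsize": 0.02},
    "bottleneck0": {"noise_stdev": 0.01, "stepsize": 0.01},
}

benchmark_name = "grid0"  # would normally come from the CLI
config = {}

hp = BEST_ES_PARAMS[benchmark_name]
config["noise_stdev"] = hp["noise_stdev"]
config["stepsize"] = hp["stepsize"]
```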

```diff
@@ -60,6 +110,6 @@
     "max_failures": 999,
     "stop": {"training_iteration": 500},
     "num_samples": 1,
-    "upload_dir": "s3://<BUCKET NAME>"
+    "upload_dir": "s3://" + upload_dir
```
Member commented:

what if you're not running experiments on AWS? this should default to no uploading
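One way to implement that default, sketched with a hypothetical argparse flag (the PR's actual CLI may differ): leave `upload_dir` unset unless a bucket is explicitly provided.

```python
import argparse

parser = argparse.ArgumentParser()
# hypothetical flag name, for illustration only
parser.add_argument("--upload_dir", type=str, default=None,
                    help="S3 bucket to upload results to (optional)")
args = parser.parse_args()

exp_config = {
    "max_failures": 999,
    "stop": {"training_iteration": 500},
    "num_samples": 1,
}
# only upload when a bucket was explicitly requested
if args.upload_dir:
    exp_config["upload_dir"] = "s3://" + args.upload_dir
```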

```diff
@@ -68,6 +112,6 @@
         "training_iteration": 500
     },
     "num_samples": 3,
-    "upload_dir": "s3://<BUCKET NAME>"
+    "upload_dir": "s3://" + upload_dir
```
Member commented:

again (same point about defaulting to no upload)

config["lambda"] = grid_search([0.97, 1.0])
config["lr"] = grid_search([5e-4, 5e-5])
config["lambda"] = 0.97
config["lr"] = 5e-4
Member commented:

again (same point about using the best-performing hyperparameters)

```diff
@@ -0,0 +1,40 @@
+#!/bin/bash
```
Member commented:

nice! can we document this somewhere though? (either at the start of this script or in the benchmarks' README.md)

@kanaadp (Collaborator, Author) commented:

peep the README.md

```yaml
MarketType: spot
# Additional options can be found in the boto docs, e.g.
# SpotOptions:
#   MaxPrice: "1.0"
```
Member commented:

nit: should probably keep this on

config["episodes_per_batch"] = N_ROLLOUTS
config["num_workers"] = N_ROLLOUTS
config["num_workers"] = min(num_cpus, num_rollouts)
config["episodes_per_batch"] = num_rollouts
config["eval_prob"] = 0.05
config["noise_stdev"] = grid_search([0.01, 0.02])
Member commented:

specifically, we can remove these duplicate lines?
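For reference, the deduplicated block might look like the sketch below. The placeholders for `num_cpus` and `num_rollouts` stand in for values the script would normally detect or take from the CLI, and the fixed `stepsize` value is assumed, not taken from the PR:

```python
config = {}
num_cpus = 8       # placeholder; normally detected or passed in
num_rollouts = 50  # placeholder

# single, deduplicated ES configuration block (sketch)
config["num_workers"] = min(num_cpus, num_rollouts)
config["episodes_per_batch"] = num_rollouts
config["eval_prob"] = 0.05
config["noise_stdev"] = 0.02  # fixed value, per the hyperparameter comment above
config["stepsize"] = 0.02     # assumed; use the benchmark's best value
```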

@eugenevinitsky (Member) commented:

@kanaadp can we move this to the scripts folder and add some documentation to a README?

@eugenevinitsky (Member) commented:

@kanaadp tests are failing here, so we can't merge it

@eugenevinitsky (Member) commented:

@kanaadp all comments seem addressed; can we merge?

@AboudyKreidieh (Member) left a comment:

LGTM. should we merge it?

@eugenevinitsky merged commit 69b6693 into master on Dec 12, 2018
@eugenevinitsky deleted the benchmark_regression_tests branch on Dec 12, 2018 at 05:10