Benchmark regression tests #258

kanaadp · 2018-11-18T23:13:07Z

Modified the benchmark run scripts to be more general and added a script to run all benchmarks. We still need a script to pull and evaluate trained benchmarks.


coveralls · 2018-11-19T00:11:31Z

Pull Request Test Coverage Report for Build 1912

0 of 0 changed or added relevant lines in 0 files are covered.
345 unchanged lines in 19 files lost coverage.
Overall coverage increased (+3.9%) to 84.67%

Files with Coverage Reduction	New Missed Lines	%
flow/scenarios/bottleneck.py	1	94.12%
flow/envs/test.py	1	93.75%
flow/utils/rllib.py	2	97.22%
tests/slow_tests/test_benchmarks.py	2	97.26%
tests/fast_tests/test_scenario_base_class.py	2	99.27%
tests/fast_tests/test_environment_base_class.py	2	99.02%
flow/envs/loop/loop_accel.py	2	96.08%
flow/controllers/routing_controllers.py	3	92.68%
flow/envs/bay_bridge/base.py	3	64.96%
flow/envs/loop/lane_changing.py	4	90.48%

Totals
Change from base Build 1382:	3.9%
Covered Lines:	6755
Relevant Lines:	7978

💛 - Coveralls

AboudyKreidieh · 2018-11-20T08:02:42Z

flow/benchmarks/rllib/es_runner.py

    config["eval_prob"] = 0.05
    config["noise_stdev"] = grid_search([0.01, 0.02])
    config["stepsize"] = grid_search([0.01, 0.02])
+
+    config["noise_stdev"] = 0.02


we should set the hyperparameters to the best-performing values for each benchmark (see paper for details)
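One way to read this suggestion is a small lookup table keyed by benchmark name, applied on top of the shared config. The sketch below is only an illustration: `DEFAULT_ES_HP`, `BEST_ES_HP`, and `apply_best_hp` are hypothetical names, and the numbers are placeholders taken from the grid values in this diff, not the tuned values reported in the paper.

```python
# Hypothetical fallback values; placeholders, NOT the paper's tuned values.
DEFAULT_ES_HP = {"noise_stdev": 0.02, "stepsize": 0.02}

# Hypothetical per-benchmark overrides. "grid0" is a benchmark name mentioned
# in this PR's commits; the values are placeholders for illustration only.
BEST_ES_HP = {
    "grid0": {"noise_stdev": 0.01, "stepsize": 0.01},
}


def apply_best_hp(config, benchmark_name):
    """Return a copy of config with defaults, then per-benchmark overrides."""
    updated = dict(config)  # leave the caller's dict untouched
    updated.update(DEFAULT_ES_HP)
    updated.update(BEST_ES_HP.get(benchmark_name, {}))
    return updated
```

With a table like this, the runner would no longer need `grid_search` for these keys; a benchmark missing from the table simply falls back to the defaults.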

AboudyKreidieh · 2018-11-20T08:02:47Z

flow/benchmarks/rllib/es_runner.py

@@ -60,6 +110,6 @@
            "max_failures": 999,
            "stop": {"training_iteration": 500},
            "num_samples": 1,
-            "upload_dir": "s3://<BUCKET NAME>"
+            "upload_dir": "s3://" + upload_dir


what if you're not running experiments on AWS? this should default to no uploading in that case
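A minimal sketch of the requested default, assuming the runner builds its experiment spec as a plain dict: the `upload_dir` key is added only when a bucket name is actually supplied. The helper name `build_run_spec` and its signature are hypothetical, not the code merged in this PR.

```python
from typing import Optional


def build_run_spec(upload_dir: Optional[str] = None) -> dict:
    """Build the experiment spec; upload to S3 only if a bucket is given."""
    spec = {
        "max_failures": 999,
        "stop": {"training_iteration": 500},
        "num_samples": 1,
    }
    # Off AWS, upload_dir stays None and the key is omitted entirely,
    # so nothing is uploaded by default.
    if upload_dir:
        spec["upload_dir"] = "s3://" + upload_dir
    return spec
```

Calling `build_run_spec()` yields a spec without any upload target, while `build_run_spec("my-bucket")` adds `"s3://my-bucket"`; the bucket name here is just an example.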

AboudyKreidieh · 2018-11-20T08:03:01Z

flow/benchmarks/rllib/ppo_runner.py

@@ -68,6 +112,6 @@
                "training_iteration": 500
            },
            "num_samples": 3,
-            "upload_dir": "s3://<BUCKET NAME>"
+            "upload_dir": "s3://" + upload_dir


AboudyKreidieh · 2018-11-20T08:03:07Z

flow/benchmarks/rllib/ppo_runner.py

-    config["lambda"] = grid_search([0.97, 1.0])
-    config["lr"] = grid_search([5e-4, 5e-5])
+    config["lambda"] = 0.97
+    config["lr"] = 5e-4


AboudyKreidieh · 2018-11-20T08:04:13Z

flow/benchmarks/run_all_benchmarks.sh

@@ -0,0 +1,40 @@
+#!/bin/bash


nice! can we document this somewhere though? (either at the start of this script or in the benchmarks' README.md)

kanaadp · Dec 1, 2018

peep the README.md

AboudyKreidieh · 2018-11-20T08:04:40Z

scripts/benchmark_autoscale.yaml

+        MarketType: spot
+         #Additional options can be found in the boto docs, e.g.
+#        SpotOptions:
+#            MaxPrice: "1.0"


nit: should prob keep this on

eugenevinitsky · 2018-11-20T09:01:03Z

flow/benchmarks/rllib/es_runner.py

-    config["episodes_per_batch"] = N_ROLLOUTS
-    config["num_workers"] = N_ROLLOUTS
+    config["num_workers"] = min(num_cpus, num_rollouts)
+    config["episodes_per_batch"] = num_rollouts
    config["eval_prob"] = 0.05
    config["noise_stdev"] = grid_search([0.01, 0.02])


specifically, can we remove these duplicate lines?
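The duplicate presumably refers to `noise_stdev` being assigned twice in the runner (once as a grid search, once as `0.02`). The sketch below shows why the first assignment is dead code: a later plain assignment to the same key replaces it. `grid_search` here is a local stand-in assumed to mirror how Ray Tune wraps its values, and `build_es_config` is a hypothetical helper, not the runner's actual function.

```python
def grid_search(values):
    # Local stand-in (assumption): wraps the candidates in a dict,
    # the way Ray Tune's grid_search is understood to.
    return {"grid_search": list(values)}


def build_es_config(num_cpus, num_rollouts):
    config = {}
    # Never request more workers than there are CPUs or rollouts.
    config["num_workers"] = min(num_cpus, num_rollouts)
    config["episodes_per_batch"] = num_rollouts
    config["eval_prob"] = 0.05
    config["noise_stdev"] = grid_search([0.01, 0.02])  # overwritten below
    config["noise_stdev"] = 0.02  # this later assignment is the one that wins
    return config
```

Since only the last assignment survives, dropping the `grid_search` line changes nothing at runtime and makes the intended fixed value obvious.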

eugenevinitsky · 2018-11-29T19:01:22Z

@kanaadp can we move this to the scripts folder and add some documentation to a README?

eugenevinitsky · 2018-11-30T19:46:05Z

@kanaadp tests are failing here so we can't merge it


eugenevinitsky · 2018-12-07T21:31:54Z

@kanaadp all comments seem addressed; can we merge?

AboudyKreidieh

LGTM. should we merge it?

kanaadp added 2 commits November 18, 2018 15:09

added regression test autoscale script, and modified rllib runners to be more general

41335c5

added run all benchmarks script

3c65b22

kanaadp requested review from AboudyKreidieh, cathywu and eugenevinitsky as code owners November 18, 2018 23:13

pep8

f249e60

added special case for grid0, grid1, clean up runscript

65151a9

AboudyKreidieh reviewed Nov 20, 2018


eugenevinitsky reviewed Nov 20, 2018


addressed PR comments

da659c8

kanaadp added 3 commits November 30, 2018 20:59

Merge branch 'benchmark_regression_tests' of github.com:flow-project/flow into benchmark_regression_tests

a1a75c4

spot price

e53e73d

flake8

724330c

eugenevinitsky added 3 commits December 7, 2018 13:33

Update README.md

9359e11

Update benchmark_autoscale.yaml

b76ba90

Update es_runner.py

cf31bc0

AboudyKreidieh approved these changes Dec 12, 2018


eugenevinitsky merged commit 69b6693 into master Dec 12, 2018

eugenevinitsky deleted the benchmark_regression_tests branch December 12, 2018 05:10
