Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tune] refactor search spaces (old) #10401

Closed
wants to merge 43 commits into from

Conversation

krfricke
Copy link
Contributor

@krfricke krfricke commented Aug 28, 2020

After a failed rebase the discussion has been moved here: #10444

Why are these changes needed?

This introduces a new search space representation that makes it possible to convert a Tune search space to other search algorithm definitions.

This also introduces new sampling methods, like quantized variants uniform and loguniform, called quniform and qloguniform, respectively.

With these abstractions we get a natural way to distinguish between allowed parameter values (called Domains) and the sampling methods (e.g. uniform, loguniform, normal). Theoretically users can introduce their own domains and custom samplers (like sampling from a Beta distribution or so). The underlying API is quite flexible, e.g. Float(1e-4, 1e-2).loguniform().quantized(5e-3). This API is currently hidden behind the tune sampler functions, like tune.qloguniform(1e-4, 1e-2, 5e-3).

Converting Tune search space definitions to search spaces for external search algorithms, like AxSearch, HyperOpt, BayesOpt, etc. ist straightforward. If a search algorithm doesn't support specific sampling methods, they can be dropped with a warning, or an error can be raised. For instance, BayesOpt doesn't support custom sampling methods, and is only interested in parameter bounds. If someone passes Float(1e-4, 1e-2).qloguniform(5e-3) to BayesOpt, it will be converted to the parameter bounds (1e-4, 1e-2) and a warning will be raised stating that the custom sampler has been dropped.

Generally, this refactoring will introduce flexibility in defining and converting search spaces, while keeping full backwards compatibility.

Example usage:

External API:

config = {
    "a": tune.choice([2, 3, 4]),
    "b": {
        "x": tune.qrandint(0, 5, 2),
        "y": 4,
        "z": tune.loguniform(1e-4, 1e-2)
    }
}
converted_config = HyperOptSearch.convert_search_space(config)

Lower-level API equivalent:

config = {
    "a": tune.sample.Categorical([2, 3, 4]).uniform(),
    "b": {
        "x": tune.sample.Integer(0, 5).quantized(2),
        "y": 4,
        "z": tune.sample.Float(1e-4, 1e-2).loguniform()
    }
}
converted_config = HyperOptSearch.convert_search_space(config)

Related issue number

Concerns #9969

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@krfricke krfricke added the tune Tune-related issues label Aug 28, 2020
edoakes and others added 19 commits August 28, 2020 13:51
* fix profiling for docker

* small fixes

* use name

* do not import pwd on windows
* Improve reporter module

* Add test_node_physical_stats to test_reporter.py

* Add test_class_method_route_table to test_dashboard.py

* Add stats_collector module for dashboard

* Subscribe actor table data

* Add log module for dashboard

* Only enable test module in some test cases

* CI run all dashboard tests

* Reduce test timeout to 10s

* Use fstring

* Remove unused code

* Remove blank line

* Fix dashboard tests

* Fix asyncio.create_task not available in py36; Fix lint

* Add format_web_url to ray.test_utils

* Update dashboard/modules/reporter/reporter_head.py

Co-authored-by: Max Fitton <mfitton@berkeley.edu>

* Add DictChangeItem type for Dict change

* Refine logger.exception

* Refine GET /api/launch_profiling

* Remove disable_test_module fixture

* Fix test_basic may fail

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <mfitton@berkeley.edu>
* update cloudpickle to 1.6.0

* fix CI timeout
…ay-project#10323)

* Patch error that occurred when there was an entry in the dashboard logs or errors internal data structures, and a worker was removed from the cluster. This would crash the cluster with a KeyError.

* lint

Co-authored-by: Max Fitton <max@semprehealth.com>
* Enable large tests.

* Lint.

* Fix issue.

* Skip all tests.
Copy link
Member

@sumanthratna sumanthratna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is amazing! I left a bunch of comments about list/dict comprehensions which might improve perf but aren't really necessary and might decrease code readability

also, I wonder if we should create a tune.set_seed function to set the seed for numpy and python, similar to pytorch_lightning.seed_everything

Comment on lines 123 to 130
class Iterative(Domain):
def __init__(self, iterator: Iterator):
self.iterator = iterator

def uniform(self):
new = copy(self)
new.set_sampler(Uniform())
return new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember reading this in the design doc—what's the use-case for Iterative? I think this warrants an explanation in the docs too

python/ray/tune/suggest/ax.py Outdated Show resolved Hide resolved
python/ray/tune/suggest/ax.py Show resolved Hide resolved
python/ray/tune/suggest/bayesopt.py Outdated Show resolved Hide resolved
python/ray/tune/suggest/bayesopt.py Outdated Show resolved Hide resolved
python/ray/tune/suggest/bayesopt.py Outdated Show resolved Hide resolved
@@ -91,6 +102,8 @@ def __init__(

self._space = space

self._config = config or {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not set the default of the config kwarg in the initialization function to {} instead of None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throws a PEP8 error for me (default argument is mutable)

python/ray/tune/suggest/optuna.py Outdated Show resolved Hide resolved
Comment on lines +239 to +241
item = out
for k in path[:-1]:
item = item[k]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't this be shortened to item = item[path[-2]]? I think I'm misunderstanding something here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path here is usually a tuple like (a, b, c). We would like to set dt[a][b][c]=x and do this by setting item=dt, then item = item[a], then item=item[b] and finally item[c]=x. Does that make sense?
See also assign_value() in variant_generator.py

* requesting new workers only when pipelines to existing ones are full

* linting

* added unit testing & linting

* finished refactoring to consolidate all the fields that belong to a SchedulingKey into a single hashmap

* linting

* fixed bugs introduced by rebasing from new upstream master

* changes as part of the PR review process

* Fix typo in src/ray/core_worker/transport/direct_task_transport.cc

Co-authored-by: fangfengbin <869218239a@zju.edu.cn>

* Fixed comment in src/ray/core_worker/transport/direct_task_transport.cc

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* second revision, with linting. all tests are passing locally

* Renamed SafeToDeleteEntry method in SchedulingKeyEntry

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* all new revisions but the memory leak check. performed linting.

* added checks to make sure scheduling_key_entries does not leak memory

* linting. all checks passing locally

* edited CheckNoSchedulingKeyEntries function

* linting

* fixed build error on mac

* created public version of CheckNoSchedulingKeyEntries to acquire the lock

* linting

Co-authored-by: fangfengbin <869218239a@zju.edu.cn>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
@krfricke
Copy link
Contributor Author

Thanks for the benchmarks and suggestions, I applied them.

…rch-space

# Conflicts:
#	python/ray/tune/sample.py
#	python/ray/tune/suggest/ax.py
#	python/ray/tune/suggest/bayesopt.py
#	python/ray/tune/suggest/optuna.py
#	python/ray/tune/tests/test_sample.py
@krfricke
Copy link
Contributor Author

Oh well, something went wrong with the latest merge. I'll try to fix that

@krfricke krfricke closed this Aug 31, 2020
@krfricke
Copy link
Contributor Author

Weird, when I create a PR from the exact same branch it doesn't blow. Let's continue the discussion here: #10444

@krfricke krfricke changed the title [tune] refactor search spaces [tune] refactor search spaces (old) Aug 31, 2020
@krfricke krfricke mentioned this pull request Aug 31, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tune Tune-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet