Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the tune driver into a remote task #13778

Merged
merged 22 commits into from
Feb 3, 2021
Merged

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Jan 29, 2021

Why are these changes needed?

This allows Tune to work efficiently (and work correct log dirs) when running in Ray client against a remote cluster.

TBD: whether this is on by default

@ericl ericl changed the title [WIP] Move the tune driver into a remote task Move the tune driver into a remote task Feb 1, 2021
Copy link
Contributor

@richardliaw richardliaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me; cc @krfricke and @amogkam

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 2, 2021
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Some tests are failing for initialization reasons, this is probably a one line fix

if _remote is None:
_remote = ray.util.client.ray.is_connected()
if _remote and not trial_executor:
_ray_auto_init()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved to the else block?

get_fn = ray.get
exec_impl = ray.remote(num_cpus=0)(_tune_run_impl).remote
else:
get_fn = lambda x: x # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw a warning/exception if remote=True and a trial executor have been passed explicitly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(feel free to ignore though since this is a private parameter for now)

@krfricke
Copy link
Contributor

krfricke commented Feb 2, 2021

Actually, instead of using the _impl_ functions, can we do something like this?

def run(trainable,
        config=None,
        trial_executor=None,
        # ...
        _remote: Optional[bool] = None
):
    if _remote is True and trial_executor:
        raise ValueError("cannot use custom trial executor")
    elif _remote is None:
        _remote = ray.util.client.ray.is_connected()

    _ray_auto_init()

    if _remote:
        return ray.get(ray.remote(num_cpus=0)(run).remote(
            trainable,
            config=config,
            _remote=False))

    # Rest of the current tune.run code

This way we keep the functionality in the run() and run_experiments() definition, avoid argument duplication in those methods, and it might even be more readable. What do you think?

@ericl
Copy link
Contributor Author

ericl commented Feb 2, 2021

Great suggestion; updated.

@ericl ericl merged commit d335ce2 into ray-project:master Feb 3, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants