[tune] move logger and syncer handling to callbacks #11699

Closed
wants to merge 24 commits into from

@krfricke krfricke commented Oct 29, 2020

Why are these changes needed?

To prepare for instantiating loggers (instead of passing logger classes), we need to refactor the logger interface. This PR introduces an ExperimentLogger class, which will replace the current (per-trial) Logger classes. An ExperimentLogger lives on the driver and handles logging for all trials. In this first PR, it serves as a per-trial wrapper around Logger objects; the default loggers will be refactored into their own ExperimentLogger classes in a follow-up PR.

Currently, log and checkpoint syncing lives in the UnifiedLogger class. This has to be refactored as well, since the UnifiedLogger will be deprecated. It also makes more sense to separate syncing from logging. Thus, a SyncerCallback is introduced that takes care of syncing.

By default, the default loggers and a default syncer are created.
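To make the wrapper idea concrete, here is a minimal sketch of a driver-side logger that owns one per-trial logger per trial. It is not the code in this PR: `TrialLogger`, `ExperimentLoggerSketch`, and the hook names are hypothetical stand-ins, and the real interfaces may differ.

```python
from typing import Callable, Dict, List


class TrialLogger:
    """Hypothetical stand-in for an existing per-trial Logger (not Ray's class)."""

    def __init__(self, trial_id: str):
        self.trial_id = trial_id
        self.results: List[dict] = []
        self.closed = False

    def on_result(self, result: dict) -> None:
        self.results.append(result)

    def close(self) -> None:
        self.closed = True


class ExperimentLoggerSketch:
    """Lives on the driver and forwards events to one wrapped logger per trial."""

    def __init__(self, logger_factory: Callable[[str], TrialLogger] = TrialLogger):
        self._factory = logger_factory
        self._loggers: Dict[str, TrialLogger] = {}

    def on_trial_start(self, trial_id: str) -> None:
        # Create the wrapped per-trial logger only once per trial.
        if trial_id not in self._loggers:
            self._loggers[trial_id] = self._factory(trial_id)

    def on_trial_result(self, trial_id: str, result: dict) -> None:
        self.on_trial_start(trial_id)
        self._loggers[trial_id].on_result(result)

    def on_trial_complete(self, trial_id: str) -> None:
        # Close and forget the logger so its resources are released.
        logger = self._loggers.pop(trial_id, None)
        if logger is not None:
            logger.close()
```

In this sketch the driver holds a single `ExperimentLoggerSketch` instance, while each trial still gets its own wrapped logger, which mirrors the "per-trial wrapper" role described for this first PR.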

This is a draft PR for testing and development.

Todos for this PR (non exhaustive):

  • Ensure that the SyncerCallback is always called last after any loggers
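One possible way to meet this todo, sketched under assumptions: sort the callback list with a stable sort so that any syncer ends up after all loggers. The `Callback` and `SyncerCallbackSketch` classes and the `order_callbacks` helper below are hypothetical and only illustrate the ordering idea, not the implementation in this PR.

```python
from typing import List


class Callback:
    """Hypothetical minimal callback that records when it was invoked."""

    def __init__(self, name: str, events: List[str]):
        self.name = name
        self.events = events

    def on_trial_result(self, result: dict) -> None:
        self.events.append(self.name)


class SyncerCallbackSketch(Callback):
    """Hypothetical syncer; it should run after all loggers have written."""


def order_callbacks(callbacks: List[Callback]) -> List[Callback]:
    # sorted() is stable: loggers keep their relative order, and syncers
    # (key True) sort after everything else (key False).
    return sorted(callbacks, key=lambda cb: isinstance(cb, SyncerCallbackSketch))


def dispatch_result(callbacks: List[Callback], result: dict) -> None:
    for cb in order_callbacks(callbacks):
        cb.on_trial_result(result)
```

Ordering at dispatch time (rather than at registration) means the guarantee holds regardless of the order in which users pass their callbacks.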

Todos for follow up PR:

  • Move existing Logger classes to new ExperimentLogger classes
  • Doc changes. Since the current PR preserves (almost all) existing behavior, include the doc changes only once we have introduced the new ExperimentLogger classes

Breaking changes:

  • The Experiment.loggers spec is now deprecated. Loggers can no longer be passed to Experiment objects, because they now live as callbacks in the trial runner. Per-trial and per-experiment loggers are therefore no longer supported.
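For illustration only, a deprecated argument of this kind could be handled with a warning so that existing scripts keep running. This is an assumption about one possible approach, not a description of what the PR does; `ExperimentSketch` is a hypothetical class, not Ray's `Experiment`.

```python
import warnings
from typing import Optional, Sequence


class ExperimentSketch:
    """Hypothetical stand-in for an experiment spec."""

    def __init__(self, name: str, loggers: Optional[Sequence[type]] = None):
        self.name = name
        if loggers is not None:
            # Assumption: warn instead of raising, to soften the breaking change.
            warnings.warn(
                "The `loggers` argument is deprecated; pass loggers as "
                "trial runner callbacks instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        # The value is deliberately not stored: in this sketch, loggers are
        # owned by the trial runner rather than by the experiment.
```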

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -594,6 +441,8 @@ def _create_logger(self, config, logger_creator=None):
            self._result_logger = logger_creator(config)
            self._logdir = self._result_logger.logdir
        else:
            from ray.tune.logger import UnifiedLogger

Contributor Author

Local import to avoid all kinds of circular import problems. Also, it seems like trainable-specific loggers are not really used? I couldn't find a case in the Ray code where it has not been instantiated with a no-op logger. Should we still keep this for backwards compatibility with custom trainable loggers? Or might this be used by RLlib trainables? cc @ericl

Contributor

ah, Trainable specific loggers are used in RLlib, so I think we need to keep this.

@@ -618,6 +436,8 @@ def add_trial(self, trial):
        self.trial_executor.try_checkpoint_metadata(trial)

    def debug_string(self, delim="\n"):
        from ray.tune.progress_reporter import trial_progress_str
Contributor Author

avoid circular imports. also makes sense here because debug_string is just here for, well, debugging
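The function-local import pattern mentioned in these comments can be demonstrated in isolation. The sketch below writes two throwaway modules (`sketch_runner` and `sketch_reporter`, both hypothetical names) that depend on each other; because the runner defers its import until `debug_string()` is called, both modules are fully loaded by then and the cycle is harmless.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical runner module: imports the reporter only inside the function.
RUNNER_SRC = textwrap.dedent('''
    def debug_string():
        # Deferred import: resolved at call time, after both modules exist.
        from sketch_reporter import progress_str
        return progress_str()

    RUNNER_NAME = "runner"
''')

# Hypothetical reporter module: imports back into the runner at module level.
REPORTER_SRC = textwrap.dedent('''
    from sketch_runner import RUNNER_NAME

    def progress_str():
        return "progress for " + RUNNER_NAME
''')


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sketch_runner.py").write_text(RUNNER_SRC)
        Path(tmp, "sketch_reporter.py").write_text(REPORTER_SRC)
        sys.path.insert(0, tmp)
        try:
            runner = importlib.import_module("sketch_runner")
            return runner.debug_string()
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("sketch_runner", None)
            sys.modules.pop("sketch_reporter", None)
```

The trade-off is that the import cost and any import error move from module load time to the first call, which is usually acceptable for a rarely used helper such as a debugging method.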

@krfricke krfricke marked this pull request as ready for review October 30, 2020 07:53

@richardliaw richardliaw left a comment

nice work! left a couple comments, mainly concerned with

  • backwards compat
  • proper functionality

Also, I'm not sure whether we are flushing logs properly - would be great to verify!

@@ -103,7 +103,7 @@ You can also obtain profiling information:

.. code-block:: python

>>> from ray.tune.logger import pretty_print
Contributor

(backwards compat) can we keep this for this PR?

@@ -799,7 +799,7 @@ def on_trial_result(self, trial_runner: "trial_runner.TrialRunner",
            if reset_successful:
                trial_executor.restore(trial, checkpoint, block=True)
            else:
                trial_executor.stop_trial(trial, stop_logger=False)
Contributor

Should we be keeping this here?

@@ -724,6 +544,7 @@ def _process_trial(self, trial):
        """
        try:
            result = self.trial_executor.fetch_result(trial)
            result.update(trial_id=trial.trial_id)
Contributor

this code is getting super complicated :) we should document this (in a separate pr..)

@richardliaw
Contributor

Actually, another major concern I have is that this PR has a lot of moving parts. Is it possible to split this PR up?

One strawman would be to do this in 3 parts:

  • Code refactor (move code around into new files)
  • Syncer refactor
  • New logger (Legacy Logger)

It's a bit hard to reason about all of these changes at once, especially the event loop and other tricky places (like the syncer, flushing, and closing/opening loggers). Splitting will reduce the chance of needing to revert or of breaking functionality.

krfricke commented Nov 2, 2020

That might be a good solution. I'll address your comments in this PR first and then create sub PRs.

krfricke commented Nov 2, 2020

I split the PR as proposed. Let's continue the review and merge process in the respective PRs:

#11746
#11748
#11749

@krfricke krfricke closed this Nov 2, 2020
@krfricke krfricke deleted the tune-logger-callback branch September 22, 2023 22:44