Introduce time-based batch metrics logging and change XGBoost to use it #3619

Merged

Conversation

@andrewnitu (Collaborator) commented Nov 2, 2020

What changes are proposed in this pull request?

  • Introduce a new utility for batch-logging metrics: it can be used like a normal metrics logger, but it keeps metrics in memory until a logging condition is met, at which point it logs the accumulated metrics in a single batch. The current condition triggers when the accumulated training time reaches 10x the accumulated batch-logging time, so logging adds roughly a 10% overhead to training. (A minimal sketch of the idea follows this list.)
  • Rewrite XGBoost metrics logging to use this new utility. This is intended as a reference implementation for other ML libraries.
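A minimal sketch of the batching condition described above, assuming training time is measured as the elapsed time between consecutive record_metrics calls; all names here are illustrative, and the PR's actual implementation may differ:

import time

TARGET_TRAINING_TO_LOGGING_TIME_RATIO = 10  # training time must reach 10x logging time


class SketchBatchMetricsLogger:
    def __init__(self, run_id):
        self.run_id = run_id
        self.data = []  # pending (key, value, timestamp_ms, step) records
        self.total_training_time = 0.0
        self.total_log_batch_time = 0.0
        self.previous_training_timestamp = None

    def record_metrics(self, metrics, step):
        current_timestamp = time.time()
        if self.previous_training_timestamp is not None:
            # elapsed time between consecutive calls approximates training time
            self.total_training_time += current_timestamp - self.previous_training_timestamp
        self.previous_training_timestamp = current_timestamp
        for key, value in metrics.items():
            self.data.append((key, value, int(current_timestamp * 1000), step))
        if self._should_purge():
            self._purge()

    def _should_purge(self):
        # batch-log once accumulated training time reaches 10x accumulated logging time
        return (
            self.total_training_time
            >= self.total_log_batch_time * TARGET_TRAINING_TO_LOGGING_TIME_RATIO
        )

    def _purge(self):
        start = time.time()
        # ... send self.data to the tracking server in one batch here ...
        self.data = []
        self.total_log_batch_time += time.time() - start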

How is this patch tested?

  • Unit tests
  • Manual tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

XGBoost autologging will now incrementally log metrics as iterations are trained, instead of logging them all at the end of training.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions bot added the labels area/tracking (Tracking service, tracking client APIs, autologging) and rn/none (List under Small Changes in Changelogs) on Nov 2, 2020
@andrewnitu changed the title from "Anitu/xgboost log on every iteration" to "Introduce time-based batch metrics logging and change XGBoost to use it" on Nov 2, 2020
@dbczumar (Collaborator) left a review:
@andrewnitu Awesome! Left a few comments



# we pass the batch_metrics_handler through, such that the callback can access it
def _timed_log_batch(batch_metrics_handler, run_id, metrics):
dbczumar: Can we move this into BatchMetricsHandler? Seems to make sense given that the method refers to an instance of batch_metrics_handler.

batch_metrics_handler.num_log_batch += 1


class BatchMetricsHandler: # BatchMetricsLogger maybe?
dbczumar: +1. Let's call this BatchMetricsLogger.

Comment on lines 273 to 276
if step in self.data:
    self.data[step].append([int(time_wrapper_for_timestamp() * 1000), metrics])
else:
    self.data[step] = [[int(time_wrapper_for_timestamp() * 1000), metrics]]
dbczumar: Can we construct Metric objects here and just append them to a list, rather than keeping track of things by step? It seems like we ultimately collapse everything into a list at purge time anyway. If we want to maintain a sorted order based on step, timestamp, etc., we can use the sorted function within the purge routine.

andrewnitu (author): Yeah, I don't see why not. It seems wasteful to group them by step and then ungroup them again, as I'm doing now.
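A hedged sketch of the suggested restructuring, using mlflow.entities.Metric; the wrapping class and method names are illustrative, not the PR's exact code:

import time

from mlflow.entities import Metric


class MetricRecorder:  # illustrative container, not the PR's class
    def __init__(self):
        self.data = []  # flat list of Metric objects instead of a dict keyed by step

    def record_metrics(self, metrics, step):
        timestamp = int(time.time() * 1000)  # integer milliseconds
        for key, value in metrics.items():
            self.data.append(Metric(key, value, timestamp, step))

    def pending_sorted(self):
        # sort only at purge time, as suggested above
        return sorted(self.data, key=lambda m: (m.step, m.timestamp))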

self.total_training_time += training_time

if step in self.data:
    self.data[step].append([int(time_wrapper_for_timestamp() * 1000), metrics])
dbczumar: Can time_wrapper_for_timestamp() and these other timing functions give us times in millis so we don't have to convert them?

dbczumar (Nov 3, 2020): On second thought, I'm not sure we need these timer wrappers. See #3619 (comment)

Comment on lines 213 to 216
metrics_slices = [
    metrics[i * MAX_METRICS_PER_BATCH : (i + 1) * MAX_METRICS_PER_BATCH]
    for i in range((len(metrics) + MAX_METRICS_PER_BATCH - 1) // MAX_METRICS_PER_BATCH)
]
dbczumar: I think using the step parameter for range() will simplify things here, e.g.:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

(Credits to https://stackoverflow.com/a/312464/11952869)
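For illustration, here is the slicing above rewritten with the step parameter; MAX_METRICS_PER_BATCH is given an assumed value, and the integers stand in for Metric objects:

MAX_METRICS_PER_BATCH = 1000  # assumed value for illustration

metrics = list(range(2500))  # stand-in for the list of accumulated metrics
metrics_slices = [
    metrics[i : i + MAX_METRICS_PER_BATCH]
    for i in range(0, len(metrics), MAX_METRICS_PER_BATCH)
]
assert [len(s) for s in metrics_slices] == [1000, 1000, 500]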

Comment on lines 198 to 206
def time_wrapper_for_log():
    return time.time()


def time_wrapper_for_current():
    return time.time()


def time_wrapper_for_timestamp():
dbczumar: Instead of wrapping time.time() for mocking, can we just manipulate the total_log_batch_time and total_training_time properties of BatchMetricsHandler in our test cases?

If we're concerned about the measurement of total_training_time and total_log_batch_time, we can always construct another test case that performs sleeps to simulate training / logging and then verifies that total_training_time / total_log_batch_time exceed expected thresholds.

andrewnitu (author): I tried to test this, but for some reason (at least in pytest) sleep doesn't advance the system clock, even though the test clearly takes longer (so the sleep is running).
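A hedged sketch of the suggested approach, setting the accumulated-time fields directly instead of mocking the clock; it assumes BatchMetricsLogger is importable from mlflow.utils.autologging_utils and uses the 10x ratio from the PR description:

from mlflow.utils.autologging_utils import BatchMetricsLogger


def test_should_purge_compares_training_time_to_logging_time():
    logger = BatchMetricsLogger(run_id="test-run")

    logger.total_log_batch_time = 1.0
    logger.total_training_time = 11.0  # more than 10x the logging time
    assert logger._should_purge()

    logger.total_training_time = 5.0  # less than 10x the logging time
    assert not logger._should_purge()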

@@ -340,6 +341,9 @@ def record_eval_results(eval_results):
"""

def callback(env):
    batch_metrics_handler.record_metrics(
dbczumar: Can we add batch_metrics_handler as an argument to record_eval_results and thread it through to this callback, to ensure that we're not accidentally referencing some state left over from a previous BatchMetricsHandler, for example?
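A hedged sketch of the suggested threading, based on the old-style XGBoost callback env object (env.evaluation_result_list is a list of (name, value) pairs and env.iteration is the current round); names here are illustrative:

def record_eval_results(eval_results, metrics_logger):
    """Create a callback that records per-iteration metrics via metrics_logger."""

    def callback(env):
        metrics = dict(env.evaluation_result_list)
        # the logger is threaded through explicitly, so no stale module-level
        # state from a previous logger can be referenced by accident
        metrics_logger.record_metrics(metrics, step=env.iteration)
        eval_results.append(metrics)

    return callback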



@contextlib.contextmanager
def with_batch_metrics_handler():
dbczumar: Can we add docs here?

Comment on lines 226 to 227
# data is an array of tuples of the form (timestamp, metrics at timestamp)
self.data = {}
dbczumar: Seems like a dictionary to me! (Though I think it should be a list of Metric objects - see comment below.)

Comment on lines 268 to 276
Context manager that yields a BatchMetricsLogger object, which metrics can be logged against.
The BatchMetricsLogger will keep metrics in a list until it decides they should be logged, at
which point the accumulated metrics will be batch logged. The BatchMetricsLogger will ensure
that logging imposes no more than a 10% overhead on the training, where the training is
measured by adding up the time elapsed between consecutive calls to record_metrics.

Once the context is closed, any metrics that have yet to be logged will be logged.

:param run_id: ID of the run that the metrics will be logged to.
dbczumar: Nit: instead of future tense, we should favor present tense where possible (e.g., "The BatchMetricsLogger keeps metrics in a list ..." instead of "The BatchMetricsLogger will keep metrics in a list ...").

# logging metrics on each iteration.
for idx, metrics in enumerate(eval_results):
    try_mlflow_log(mlflow.log_metrics, metrics, step=idx)
run_id = mlflow.tracking.fluent._get_or_start_run().info.run_id
dbczumar: Can we use mlflow.active_run().info.run_id instead? A run should already have been created on line 350 (either that, or an active run existed prior to autologging).
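The suggested replacement, for reference; mlflow.active_run() is the public fluent API, while _get_or_start_run() is a private helper:

import mlflow

# a run already exists at this point (created by autologging or by the user)
run_id = mlflow.active_run().info.run_id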

Comment on lines 223 to 224
if self.total_log_batch_time == 0:  # we don't yet have data on how long logging takes
    return True
dbczumar: Now I don't think we need this :)

@dbczumar (Collaborator) left a review:

Almost there! Found a few more small issues.



@contextlib.contextmanager
def with_batch_metrics_logger(run_id):
dbczumar: Nit: Let's drop the leading with, so that usages become with batch_metrics_logger(...) rather than with with_batch_metrics_logger(...):

Suggested change:
- def with_batch_metrics_logger(run_id):
+ def batch_metrics_logger(run_id):

Comment on lines 269 to 271
with_batch_metrics_logger = BatchMetricsLogger(run_id)
yield with_batch_metrics_logger
with_batch_metrics_logger._purge()
dbczumar: Suggested change:
- with_batch_metrics_logger = BatchMetricsLogger(run_id)
- yield with_batch_metrics_logger
- with_batch_metrics_logger._purge()
+ metrics_logger = BatchMetricsLogger(run_id)
+ yield metrics_logger
+ metrics_logger._purge()
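Combining the two suggestions gives roughly the following helper; a sketch, assuming _purge flushes whatever is still pending:

import contextlib


@contextlib.contextmanager
def batch_metrics_logger(run_id):
    metrics_logger = BatchMetricsLogger(run_id)
    yield metrics_logger
    metrics_logger._purge()  # log any metrics still pending when the context exits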

    self.total_log_batch_time += end - start

def _should_purge(self):
    log_batch_time_fudge_factor = 10
dbczumar: Fudge factor seems like the wrong term here. This is the desired ratio of training time to batch-logging time. Perhaps target_training_to_logging_time_ratio?


log_batch_mock.reset_mock() # resets the 'calls' of this mock

# the above 'training' took 1 second. So with fudge factor of 10x,
dbczumar: As above, I don't think that fudge factor is accurate terminology for this use case.

run_id = mlflow.tracking.fluent._get_or_start_run().info.run_id
with with_batch_metrics_logger(run_id) as batch_metrics_logger:
    batch_metrics_logger.record_metrics({"x": 1}, step=0)  # data doesn't matter
    # first metrics should be skipped to record a previous timestamp and batch log time
dbczumar: skipped seems to imply that we're dropping metrics or not logging them; I think we mean that we're logging them immediately (i.e., "skipping waiting").



def test_batch_metrics_logger_logs_all_metrics(start_run):  # pylint: disable=unused-argument
    with mock.patch.object(MlflowClient, "log_batch") as log_batch_mock:
dbczumar: Do we need to mock log_batch? Ideally, it would be nice to test that the metrics are actually logged, by leaving this unmocked and querying the run data after the iterative calls to record_metrics() complete.

andrewnitu (author): Do we want to do this for every test? Or is it sufficient to do it for one test, so we know it works end-to-end, and keep the mock for the rest of the tests to save time?

andrewnitu (author): Actually, it seems this really only applies to one of the tests; the rest care about the intermediate state as the BatchMetricsLogger is logging, not the final outcome, so it makes more sense to use mocks there.
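A hedged sketch of the one unmocked, end-to-end test: record metrics, exit the context so pending metrics are flushed, then query the run through the client. The import path and the start_run fixture are assumed to match the tests discussed above:

import mlflow
from mlflow.tracking import MlflowClient
from mlflow.utils.autologging_utils import batch_metrics_logger


def test_batch_metrics_logger_logs_all_metrics(start_run):  # pylint: disable=unused-argument
    run_id = mlflow.active_run().info.run_id
    with batch_metrics_logger(run_id) as metrics_logger:
        for step in range(100):
            metrics_logger.record_metrics({"x": step}, step=step)

    metric_history = MlflowClient().get_metric_history(run_id, "x")
    assert len(metric_history) == 100  # every recorded value reached the store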

self.total_training_time += training_time

for key, value in metrics.items():
    self.data.append(Metric(key, value, current_timestamp, step))
dbczumar: The metric timestamp must be an integer value with millisecond resolution, i.e. timestamp = int(time.time() * 1000), based on what we do in the fluent API:

Suggested change:
- self.data.append(Metric(key, value, current_timestamp, step))
+ self.data.append(Metric(key, value, int(current_timestamp * 1000), step))

I found this by running our XGBoost example, where I encountered warning logs from the file store about operating on timestamp content in float form, rather than integer form:

/Users/czumar/mlflow/mlflow/xgboost.py:412: DeprecationWarning: inspect.getargspec() is deprecated since Python 3.0, use inspect.signature() or inspect.getfullargspec()
  all_arg_names = inspect.getargspec(original)[0]  # pylint: disable=W1505
[0]     train-mlogloss:0.74723
[1]     train-mlogloss:0.54060
[2]     train-mlogloss:0.40276
[3]     train-mlogloss:0.30789
[4]     train-mlogloss:0.24052
[5]     train-mlogloss:0.19087
[6]     train-mlogloss:0.15471
[7]     train-mlogloss:0.12807
[8]     train-mlogloss:0.10722
[9]     train-mlogloss:0.09053
/Users/czumar/mlflow/mlflow/xgboost.py:387: UserWarning: Logging to MLflow failed: invalid literal for int() with base 10: '1604679208.8766131'
  try_mlflow_log(mlflow.log_artifact, filepath)
/Users/czumar/mlflow/mlflow/xgboost.py:465: UserWarning: Logging to MLflow failed: invalid literal for int() with base 10: '1604679208.8766131'
  try_mlflow_log(mlflow.log_artifact, filepath)
/Users/czumar/mlflow/mlflow/xgboost.py:501: UserWarning: Logging to MLflow failed: invalid literal for int() with base 10: '1604679208.8766131'
  input_example=input_example,

^ Interestingly, the error is only encountered when subsequent artifact logging operations are called, at which point the file store reads the logged metric files (not sure why it needs to do this, but it does)

dbczumar: Can we make sure to add a test case for this, ensuring that timestamps have millisecond resolution and are integer values?
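A hedged sketch of such a test: capture the metrics passed to log_batch and assert that every timestamp is an integer bounded by millisecond clock readings taken before and after. The import path and fixture are assumed to match the tests above:

import time
from unittest import mock

import mlflow
from mlflow.tracking import MlflowClient
from mlflow.utils.autologging_utils import batch_metrics_logger


def test_timestamps_are_integer_milliseconds(start_run):  # pylint: disable=unused-argument
    before_ms = int(time.time() * 1000)
    with mock.patch.object(MlflowClient, "log_batch") as log_batch_mock:
        run_id = mlflow.active_run().info.run_id
        with batch_metrics_logger(run_id) as metrics_logger:
            metrics_logger.record_metrics({"x": 1.0}, step=0)
    after_ms = int(time.time() * 1000)

    for call in log_batch_mock.call_args_list:
        _, kwargs = call
        for metric in kwargs["metrics"]:
            assert isinstance(metric.timestamp, int)
            assert before_ms <= metric.timestamp <= after_ms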

    for i in range(0, len(self.data), MAX_METRICS_PER_BATCH)
]
for metrics_slice in metrics_slices:
    MlflowClient().log_batch(run_id=self.run_id, metrics=metrics_slice)
dbczumar: Can we wrap this in a call to try_mlflow_log() to ensure that failures don't prevent future metrics from being logged?

We should document this behavior and, if possible, we should add a test case for it too.
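A hedged sketch of the requested wrapping; try_mlflow_log(fn, *args, **kwargs) is the helper already used elsewhere in this PR, and it emits a warning on failure instead of raising (the import path is assumed):

from mlflow.tracking import MlflowClient
from mlflow.utils.autologging_utils import try_mlflow_log


def log_slices(run_id, metrics_slices):
    # illustrative helper: a failed batch emits a warning but does not stop
    # later slices from being logged
    client = MlflowClient()
    for metrics_slice in metrics_slices:
        try_mlflow_log(client.log_batch, run_id=run_id, metrics=metrics_slice)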

@@ -261,6 +264,9 @@ def batch_metrics_logger(run_id):
that logging imposes no more than a 10% overhead on the training, where the training is
measured by adding up the time elapsed between consecutive calls to record_metrics.

If logging a batch fails, a log will be emitted and subsequent metrics will continue to
dbczumar: Suggested change:
- If logging a batch fails, a log will be emitted and subsequent metrics will continue to
+ If logging a batch fails, a warning will be emitted and subsequent metrics will continue to

@dbczumar (Collaborator) left a review:

LGTM with one tiny comment - thanks @andrewnitu!

@andrewnitu merged commit 8eed7c4 into mlflow:master on Nov 11, 2020
eedeleon pushed a commit to eedeleon/mlflow that referenced this pull request on Nov 13, 2020:

Introduce time-based batch metrics logging and change XGBoost to use it (mlflow#3619)

* xgboost log on every iteration with timing
* get avg time
* fix
* batch send all at the end of training
* stash
* rename promise to future
* remove batch_log_interval
* make should_purge have no side effects
* do not assume step anymore
* add test case
* stash
* autofmt
* linting
* some cleanup and gather batch log time on initial iteration
* more cleanup
* reimport time
* revert changes to xgboost example
* add chunking test and clean up tests
* refactor chunking test
* revert adding __eq__ method to metric entity
* remove commented-out code
* fix xgboost autolog tests
* remove unused import
* remove unused import
* code review
* fix line lenght
* change to total log batch time instead of average
* make test go through two cycles of batch logging
* code review
* some code review
* code review
* remove extra param from xgboost example
* nit fix

Signed-off-by: Andrew Nitu <andrewnitu@gmail.com>