Introduce time-based batch metrics logging and change XGBoost to use it #3619

Merged

Conversation

@andrewnitu (Collaborator) commented Nov 2, 2020

What changes are proposed in this pull request?

  • Introduce a new utility for batch-logging metrics: it can be used like a normal metrics logger, but it keeps metrics in memory until a logging condition is met, at which point it logs the accumulated metrics in a single batch. The current condition triggers when the accumulated training time reaches 10x the accumulated batch-logging time, so logging adds roughly a 10% overhead to training. (A minimal sketch of the idea follows this list.)
  • Rewrite XGBoost metrics logging to use this new utility. This is intended as a reference implementation for other ML libraries.
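A minimal sketch of the batching condition described above, assuming training time is measured as the elapsed time between consecutive record_metrics calls; all names here are illustrative, and the PR's actual implementation may differ:

import time

TARGET_TRAINING_TO_LOGGING_TIME_RATIO = 10  # training time must reach 10x logging time


class SketchBatchMetricsLogger:
    def __init__(self, run_id):
        self.run_id = run_id
        self.data = []  # pending (key, value, timestamp_ms, step) records
        self.total_training_time = 0.0
        self.total_log_batch_time = 0.0
        self.previous_training_timestamp = None

    def record_metrics(self, metrics, step):
        current_timestamp = time.time()
        if self.previous_training_timestamp is not None:
            # elapsed time between consecutive calls approximates training time
            self.total_training_time += current_timestamp - self.previous_training_timestamp
        self.previous_training_timestamp = current_timestamp
        for key, value in metrics.items():
            self.data.append((key, value, int(current_timestamp * 1000), step))
        if self._should_purge():
            self._purge()

    def _should_purge(self):
        # batch-log once accumulated training time reaches 10x accumulated logging time
        return (
            self.total_training_time
            >= self.total_log_batch_time * TARGET_TRAINING_TO_LOGGING_TIME_RATIO
        )

    def _purge(self):
        start = time.time()
        # ... send self.data to the tracking server in one batch here ...
        self.data = []
        self.total_log_batch_time += time.time() - start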

How is this patch tested?

  • Unit tests
  • Manual tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

XGBoost autologging will now incrementally log metrics as iterations are trained, instead of logging them all at the end of training.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions bot added the labels area/tracking (Tracking service, tracking client APIs, autologging) and rn/none (List under Small Changes in Changelogs) on Nov 2, 2020
@andrewnitu changed the title from "Anitu/xgboost log on every iteration" to "Introduce time-based batch metrics logging and change XGBoost to use it" on Nov 2, 2020
@dbczumar (Collaborator) left a review:
@andrewnitu Awesome! Left a few comments



# we pass the batch_metrics_handler through, such that the callback can access it
def _timed_log_batch(batch_metrics_handler, run_id, metrics):
dbczumar: Can we move this into BatchMetricsHandler? Seems to make sense given that the method refers to an instance of batch_metrics_handler.

batch_metrics_handler.num_log_batch += 1


class BatchMetricsHandler: # BatchMetricsLogger maybe?
dbczumar: +1. Let's call this BatchMetricsLogger.

Comment on lines 273 to 276
if step in self.data:
    self.data[step].append([int(time_wrapper_for_timestamp() * 1000), metrics])
else:
    self.data[step] = [[int(time_wrapper_for_timestamp() * 1000), metrics]]
dbczumar: Can we construct Metric objects here and just append them to a list, rather than keeping track of things by step? It seems like we ultimately collapse everything into a list at purge time anyway. If we want to maintain a sorted order based on step, timestamp, etc., we can use the sorted function within the purge routine.

andrewnitu (author): Yeah, I don't see why not. It seems wasteful to group them by step and then ungroup them again, as I'm doing now.
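A hedged sketch of the suggested restructuring, using mlflow.entities.Metric; the wrapping class and method names are illustrative, not the PR's exact code:

import time

from mlflow.entities import Metric


class MetricRecorder:  # illustrative container, not the PR's class
    def __init__(self):
        self.data = []  # flat list of Metric objects instead of a dict keyed by step

    def record_metrics(self, metrics, step):
        timestamp = int(time.time() * 1000)  # integer milliseconds
        for key, value in metrics.items():
            self.data.append(Metric(key, value, timestamp, step))

    def pending_sorted(self):
        # sort only at purge time, as suggested above
        return sorted(self.data, key=lambda m: (m.step, m.timestamp))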

self.total_training_time += training_time

if step in self.data:
    self.data[step].append([int(time_wrapper_for_timestamp() * 1000), metrics])
dbczumar: Can time_wrapper_for_timestamp() and these other timing functions give us times in millis so we don't have to convert them?

dbczumar (Nov 3, 2020): On second thought, I'm not sure we need these timer wrappers. See #3619 (comment)

Comment on lines 213 to 216
metrics_slices = [
    metrics[i * MAX_METRICS_PER_BATCH : (i + 1) * MAX_METRICS_PER_BATCH]
    for i in range((len(metrics) + MAX_METRICS_PER_BATCH - 1) // MAX_METRICS_PER_BATCH)
]
dbczumar: I think using the step parameter for range() will simplify things here, e.g.:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

(Credits to https://stackoverflow.com/a/312464/11952869)
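For illustration, here is the slicing above rewritten with the step parameter; MAX_METRICS_PER_BATCH is given an assumed value, and the integers stand in for Metric objects:

MAX_METRICS_PER_BATCH = 1000  # assumed value for illustration

metrics = list(range(2500))  # stand-in for the list of accumulated metrics
metrics_slices = [
    metrics[i : i + MAX_METRICS_PER_BATCH]
    for i in range(0, len(metrics), MAX_METRICS_PER_BATCH)
]
assert [len(s) for s in metrics_slices] == [1000, 1000, 500]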

Comment on lines 198 to 206
def time_wrapper_for_log():
    return time.time()


def time_wrapper_for_current():
    return time.time()


def time_wrapper_for_timestamp():
dbczumar: Instead of wrapping time.time() for mocking, can we just manipulate the total_log_batch_time and total_training_time properties of BatchMetricsHandler in our test cases?

If we're concerned about the measurement of total_training_time and total_log_batch_time, we can always construct another test case that performs sleeps to simulate training / logging and then verifies that total_training_time / total_log_batch_time exceed expected thresholds.

andrewnitu (author): I tried to test this, but for some reason (at least in pytest) sleep doesn't advance the system clock, even though the test clearly takes longer (so the sleep is running).
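A hedged sketch of the suggested approach, setting the accumulated-time fields directly instead of mocking the clock; it assumes BatchMetricsLogger is importable from mlflow.utils.autologging_utils and uses the 10x ratio from the PR description:

from mlflow.utils.autologging_utils import BatchMetricsLogger


def test_should_purge_compares_training_time_to_logging_time():
    logger = BatchMetricsLogger(run_id="test-run")

    logger.total_log_batch_time = 1.0
    logger.total_training_time = 11.0  # more than 10x the logging time
    assert logger._should_purge()

    logger.total_training_time = 5.0  # less than 10x the logging time
    assert not logger._should_purge()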

@@ -340,6 +341,9 @@ def record_eval_results(eval_results):
"""

def callback(env):
    batch_metrics_handler.record_metrics(
dbczumar: Can we add batch_metrics_handler as an argument to record_eval_results and thread it through to this callback, to ensure that we're not accidentally referencing some state left over from a previous BatchMetricsHandler, for example?
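A hedged sketch of the suggested threading, based on the old-style XGBoost callback env object (env.evaluation_result_list is a list of (name, value) pairs and env.iteration is the current round); names here are illustrative:

def record_eval_results(eval_results, metrics_logger):
    """Create a callback that records per-iteration metrics via metrics_logger."""

    def callback(env):
        metrics = dict(env.evaluation_result_list)
        # the logger is threaded through explicitly, so no stale module-level
        # state from a previous logger can be referenced by accident
        metrics_logger.record_metrics(metrics, step=env.iteration)
        eval_results.append(metrics)

    return callback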



@contextlib.contextmanager
def with_batch_metrics_handler():
dbczumar: Can we add docs here?

Comment on lines 226 to 227
# data is an array of tuples of the form (timestamp, metrics at timestamp)
self.data = {}
dbczumar: Seems like a dictionary to me! (Though I think it should be a list of Metric objects - see comment below.)

Comment on lines 268 to 276
Context manager that yields a BatchMetricsLogger object, which metrics can be logged against.
The BatchMetricsLogger will keep metrics in a list until it decides they should be logged, at
which point the accumulated metrics will be batch logged. The BatchMetricsLogger will ensure
that logging imposes no more than a 10% overhead on the training, where the training is
measured by adding up the time elapsed between consecutive calls to record_metrics.

Once the context is closed, any metrics that have yet to be logged will be logged.

:param run_id: ID of the run that the metrics will be logged to.
dbczumar: Nit: instead of future tense, we should favor present tense where possible (e.g., "The BatchMetricsLogger keeps metrics in a list ..." instead of "The BatchMetricsLogger will keep metrics in a list ...").

# logging metrics on each iteration.
for idx, metrics in enumerate(eval_results):
    try_mlflow_log(mlflow.log_metrics, metrics, step=idx)
run_id = mlflow.tracking.fluent._get_or_start_run().info.run_id
dbczumar: Can we use mlflow.active_run().info.run_id instead? A run should already have been created on line 350 (either that, or an active run existed prior to autologging).
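The suggested replacement, for reference; mlflow.active_run() is the public fluent API, while _get_or_start_run() is a private helper:

import mlflow

# a run already exists at this point (created by autologging or by the user)
run_id = mlflow.active_run().info.run_id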

Comment on lines 223 to 224
if self.total_log_batch_time == 0:  # we don't yet have data on how long logging takes
    return True
dbczumar: Now I don't think we need this :)

@dbczumar (Collaborator) left a review:

Almost there! Found a few more small issues.



@contextlib.contextmanager
def with_batch_metrics_logger(run_id):
dbczumar: Nit: Let's drop the leading with, so that usages become with batch_metrics_logger(...) rather than with with_batch_metrics_logger(...):

Suggested change:
- def with_batch_metrics_logger(run_id):
+ def batch_metrics_logger(run_id):

Comment on lines 269 to 271
with_batch_metrics_logger = BatchMetricsLogger(run_id)
yield with_batch_metrics_logger
with_batch_metrics_logger._purge()
dbczumar: Suggested change:
- with_batch_metrics_logger = BatchMetricsLogger(run_id)
- yield with_batch_metrics_logger
- with_batch_metrics_logger._purge()
+ metrics_logger = BatchMetricsLogger(run_id)
+ yield metrics_logger
+ metrics_logger._purge()
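Combining the two suggestions gives roughly the following helper; a sketch, assuming _purge flushes whatever is still pending:

import contextlib


@contextlib.contextmanager
def batch_metrics_logger(run_id):
    metrics_logger = BatchMetricsLogger(run_id)
    yield metrics_logger
    metrics_logger._purge()  # log any metrics still pending when the context exits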

    self.total_log_batch_time += end - start

def _should_purge(self):
    log_batch_time_fudge_factor = 10
dbczumar: Fudge factor seems like the wrong term here. This is the desired ratio of training time to batch-logging time. Perhaps target_training_to_logging_time_ratio?


log_batch_mock.reset_mock() # resets the 'calls' of this mock

# the above 'training' took 1 second. So with fudge factor of 10x,
dbczumar: As above, I don't think that fudge factor is accurate terminology for this use case.

run_id = mlflow.tracking.fluent._get_or_start_run().info.run_id
with with_batch_metrics_logger(run_id) as batch_metrics_logger:
    batch_metrics_logger.record_metrics({"x": 1}, step=0)  # data doesn't matter
    # first metrics should be skipped to record a previous timestamp and batch log time
dbczumar: skipped seems to imply that we're dropping metrics or not logging them; I think we mean that we're logging them immediately (i.e., "skipping waiting").



def test_batch_metrics_logger_logs_all_metrics(start_run):  # pylint: disable=unused-argument
    with mock.patch.object(MlflowClient, "log_batch") as log_batch_mock:
dbczumar: Do we need to mock log_batch? Ideally, it would be nice to test that the metrics are actually logged, by leaving this unmocked and querying the run data after the iterative calls to record_metrics() complete.

andrewnitu (author): Do we want to do this for every test? Or is it sufficient to do it for one test, so we know it works end-to-end, and keep the mock for the rest of the tests to save time?

andrewnitu (author): Actually, it seems this really only applies to one of the tests; the rest care about the intermediate state as the BatchMetricsLogger is logging, not the final outcome, so it makes more sense to use mocks there.
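A hedged sketch of the one unmocked, end-to-end test: record metrics, exit the context so pending metrics are flushed, then query the run through the client. The import path and the start_run fixture are assumed to match the tests discussed above:

import mlflow
from mlflow.tracking import MlflowClient
from mlflow.utils.autologging_utils import batch_metrics_logger


def test_batch_metrics_logger_logs_all_metrics(start_run):  # pylint: disable=unused-argument
    run_id = mlflow.active_run().info.run_id
    with batch_metrics_logger(run_id) as metrics_logger:
        for step in range(100):
            metrics_logger.record_metrics({"x": step}, step=step)

    metric_history = MlflowClient().get_metric_history(run_id, "x")
    assert len(metric_history) == 100  # every recorded value reached the store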

self.total_training_time += training_time

for key, value in metrics.items():
    self.data.append(Metric(key, value, current_timestamp, step))
dbczumar: The metric timestamp must be an integer value with millisecond resolution, i.e. timestamp = int(time.time() * 1000), based on what we do in the fluent API:

Suggested change:
- self.data.append(Metric(key, value, current_timestamp, step))
+ self.data.append(Metric(key, value, int(current_timestamp * 1000), step))

I found this by running our XGBoost example, where I encountered warning logs from the file store about operating on timestamp content in float form, rather than integer form:

/Users/czumar/mlflow/mlflow/xgboost.py:412: DeprecationWarning: inspect.getargspec() is deprecated since Python 3.0, use inspect.signature() or inspect.getfullargspec()
  all_arg_names = inspect.getargspec(original)[0]  # pylint: disable=W1505
[0]     train-mlogloss:0.74723
[1]     train-mlogloss:0.54060
[2]     train-mlogloss:0.40276
[3]     train-mlogloss:0.30789
[4]     train-mlogloss:0.24052
[5]     train-mlogloss:0.19087
[6]     train-mlogloss:0.15471
[7]     train-mlogloss:0.12807
[8]     train-mlogloss:0.10722
[9]     train-mlogloss:0.09053
/Users/czumar/mlflow/mlflow/xgboost.py:387: UserWarning: Logging to MLflow failed: invalid literal for int() with base 10: '1604679208.8766131'
  try_mlflow_log(mlflow.log_artifact, filepath)
/Users/czumar/mlflow/mlflow/xgboost.py:465: UserWarning: Logging to MLflow failed: invalid literal for int() with base 10: '1604679208.8766131'
  try_mlflow_log(mlflow.log_artifact, filepath)
/Users/czumar/mlflow/mlflow/xgboost.py:501: UserWarning: Logging to MLflow failed: invalid literal for int() with base 10: '1604679208.8766131'
  input_example=input_example,

^ Interestingly, the error is only encountered when subsequent artifact logging operations are called, at which point the file store reads the logged metric files (not sure why it needs to do this, but it does)

dbczumar: Can we make sure to add a test case for this, ensuring that timestamps have millisecond resolution and are integer values?
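A hedged sketch of such a test: capture the metrics passed to log_batch and assert that every timestamp is an integer bounded by millisecond clock readings taken before and after. The import path and fixture are assumed to match the tests above:

import time
from unittest import mock

import mlflow
from mlflow.tracking import MlflowClient
from mlflow.utils.autologging_utils import batch_metrics_logger


def test_timestamps_are_integer_milliseconds(start_run):  # pylint: disable=unused-argument
    before_ms = int(time.time() * 1000)
    with mock.patch.object(MlflowClient, "log_batch") as log_batch_mock:
        run_id = mlflow.active_run().info.run_id
        with batch_metrics_logger(run_id) as metrics_logger:
            metrics_logger.record_metrics({"x": 1.0}, step=0)
    after_ms = int(time.time() * 1000)

    for call in log_batch_mock.call_args_list:
        _, kwargs = call
        for metric in kwargs["metrics"]:
            assert isinstance(metric.timestamp, int)
            assert before_ms <= metric.timestamp <= after_ms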

    for i in range(0, len(self.data), MAX_METRICS_PER_BATCH)
]
for metrics_slice in metrics_slices:
    MlflowClient().log_batch(run_id=self.run_id, metrics=metrics_slice)
dbczumar: Can we wrap this in a call to try_mlflow_log() to ensure that failures don't prevent future metrics from being logged?

We should document this behavior and, if possible, we should add a test case for it too.
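A hedged sketch of the requested wrapping; try_mlflow_log(fn, *args, **kwargs) is the helper already used elsewhere in this PR, and it emits a warning on failure instead of raising (the import path is assumed):

from mlflow.tracking import MlflowClient
from mlflow.utils.autologging_utils import try_mlflow_log


def log_slices(run_id, metrics_slices):
    # illustrative helper: a failed batch emits a warning but does not stop
    # later slices from being logged
    client = MlflowClient()
    for metrics_slice in metrics_slices:
        try_mlflow_log(client.log_batch, run_id=run_id, metrics=metrics_slice)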

@@ -261,6 +264,9 @@ def batch_metrics_logger(run_id):
that logging imposes no more than a 10% overhead on the training, where the training is
measured by adding up the time elapsed between consecutive calls to record_metrics.

If logging a batch fails, a log will be emitted and subsequent metrics will continue to
dbczumar: Suggested change:
- If logging a batch fails, a log will be emitted and subsequent metrics will continue to
+ If logging a batch fails, a warning will be emitted and subsequent metrics will continue to

@dbczumar (Collaborator) left a review:

LGTM with one tiny comment - thanks @andrewnitu!

@andrewnitu merged commit 8eed7c4 into mlflow:master on Nov 11, 2020
eedeleon pushed a commit to eedeleon/mlflow that referenced this pull request on Nov 13, 2020:

Introduce time-based batch metrics logging and change XGBoost to use it (mlflow#3619)

* xgboost log on every iteration with timing
* get avg time
* fix
* batch send all at the end of training
* stash
* rename promise to future
* remove batch_log_interval
* make should_purge have no side effects
* do not assume step anymore
* add test case
* stash
* autofmt
* linting
* some cleanup and gather batch log time on initial iteration
* more cleanup
* reimport time
* revert changes to xgboost example
* add chunking test and clean up tests
* refactor chunking test
* revert adding __eq__ method to metric entity
* remove commented-out code
* fix xgboost autolog tests
* remove unused import
* remove unused import
* code review
* fix line lenght
* change to total log batch time instead of average
* make test go through two cycles of batch logging
* code review
* some code review
* code review
* remove extra param from xgboost example
* nit fix

Signed-off-by: Andrew Nitu <andrewnitu@gmail.com>