Added `dvclive.get_step` #142

daavoo · 2021-08-25T11:37:13Z

❗ I have followed the Contributing to DVCLive guide.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.

dvc.org PR: iterative/dvc.org#2765

Closes #113
Closes #128

This PR introduces a new public function, dvclive.get_step(), and removes step from dvclive.init().

The main use cases are driven by (but not limited to) using dvclive alongside dvc checkpoints and resuming training:

Custom control flow

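import dvclive

# train(), evaluate() and X are placeholders for user code.
while dvclive.get_step() < X:
    train()
    metrics = evaluate()  # renamed from eval() to avoid shadowing the builtin
    for m, v in metrics.items():
        dvclive.log(m, v)
    dvclive.next_step()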

ML Framework

model.fit(
    . . .
    epochs=params["epochs"],
    initial_epoch=dvclive.get_step(),
)
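
To illustrate why a queryable step helps when resuming, here is a minimal sketch built on a hypothetical StepCounter that persists its step to a JSON file. It is a stand-in for illustration only, not DVCLive's implementation; the class, the state file, and the run helper are all assumptions.

```python
import json
import tempfile
from pathlib import Path


class StepCounter:
    """Hypothetical stand-in for a logger that persists its step to disk."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Resume from the saved step if a previous run left one behind.
        self.step = json.loads(path.read_text())["step"] if path.exists() else 0

    def get_step(self) -> int:
        return self.step

    def next_step(self) -> None:
        self.step += 1
        self.path.write_text(json.dumps({"step": self.step}))


def run(counter: StepCounter, max_steps: int) -> int:
    """Advance the counter until max_steps; return how many steps this call ran."""
    executed = 0
    while counter.get_step() < max_steps:
        executed += 1  # a real loop would train and log metrics here
        counter.next_step()
    return executed


state_file = Path(tempfile.mkdtemp()) / "state.json"
first_run = run(StepCounter(state_file), max_steps=3)    # starts at step 0
resumed_run = run(StepCounter(state_file), max_steps=5)  # resumes at step 3
```

Because the loop condition reads the current step instead of counting from zero, the second call only runs the steps that are still missing.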

codecov-commenter · 2021-08-25T11:50:53Z

Codecov Report

Merging #142 (f42f778) into master (1ffce08) will increase coverage by 0.24%.
The diff coverage is 100.00%.

❗ Current head f42f778 differs from pull request most recent head bbab7c8. Consider uploading reports for the commit bbab7c8 to get more accurate results

@@            Coverage Diff             @@
##           master     #142      +/-   ##
==========================================
+ Coverage   90.88%   91.12%   +0.24%     
==========================================
  Files          14       14              
  Lines         340      338       -2     
==========================================
- Hits          309      308       -1     
+ Misses         31       30       -1

Impacted Files	Coverage Δ
dvclive/__init__.py	`100.00% <100.00%> (ø)`
dvclive/metrics.py	`96.93% <100.00%> (+0.78%)`	⬆️
dvclive/mmcv.py	`100.00% <0.00%> (ø)`

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ffce08...bbab7c8.

dberenbaum

LGTM.

Reviewing raised some questions that are outside the scope of this PR (I can extract to separate issues):

Custom steps

What is the intended use case for a custom step?
Should it overwrite existing values for the same step (it doesn't)?
Should results be ordered by write time or step (they are ordered by write time)?
Why set custom steps at the metric level with dvclive.log(step=n) since the step value should probably apply to all metrics?
If I log one metric and then set a different step for a second metric, which step number should be used for the first metric (it will have a different step in the tsv and the summary json)?

tsv -> summary workflow

Would summary -> tsv be more helpful (this would obviously require summary to always exist)? It's more intuitive to me (and follows the internal logic of MetricLogger._metrics) to gather all metrics for a step and then append to metrics logs. It also enables no-step scenarios like classical ML algorithms by logging the summary without ever creating the tsv files.
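
The "gather per step, then append" idea can be sketched roughly as follows. SummaryFirstLogger is a hypothetical class written for this illustration; it keeps everything in memory and does not reflect how MetricLogger is actually implemented.

```python
from typing import Dict, List, Tuple


class SummaryFirstLogger:
    """Hypothetical logger: collect a step's metrics, then append to histories."""

    def __init__(self) -> None:
        self.step = 0
        self.summary: Dict[str, float] = {}    # latest value per metric
        self._pending: Dict[str, float] = {}   # values logged in the current step
        self.history: Dict[str, List[Tuple[int, float]]] = {}

    def log(self, name: str, value: float) -> None:
        # Only the summary is updated here; no per-metric history exists yet.
        self.summary[name] = value
        self._pending[name] = value

    def next_step(self) -> None:
        # Flush everything gathered for this step in one go, so all metrics
        # logged together share the same step number.
        for name, value in self._pending.items():
            self.history.setdefault(name, []).append((self.step, value))
        self._pending.clear()
        self.step += 1


# No-step scenario: a summary exists without any history being created.
stepless = SummaryFirstLogger()
stepless.log("accuracy", 0.9)

# Step-based scenario: histories grow only when a step is closed.
stepped = SummaryFirstLogger()
for loss in (0.5, 0.25):
    stepped.log("loss", loss)
    stepped.next_step()
```

In this arrangement the history plays the role of the tsv files and is derived from the summary, rather than the other way around.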


pared · 2021-09-06T13:36:43Z

dvclive/__init__.py

+def get_step() -> None:
+    global _metric_logger  # pylint: disable=global-statement
+    _metric_logger = _lazy_init(_metric_logger)
+    return _metric_logger.step


This method returns int.
Also, I think we should change MetricLogger's step property into a get_step() method to maintain consistency with the API. @daavoo what do you think?

daavoo · 2021-09-06

Not sure, tbh. A get_X method instead of a @property feels kind of strange, but step doesn't sound good for a public method either.

pared · 2021-09-07

Well, my point is that at some point someone might want to parallelize their code and do something like dvclive = MetricLogger(); in that case, dvclive.get_step stops working.
Now that I mention it, it would probably be good to document that dvclive is not thread-safe, and that one needs to initialize separate loggers for parallel jobs.
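
A rough sketch of the "one logger per parallel job" pattern mentioned above. StepLogger is a hypothetical stand-in, not the real MetricLogger; the point is only that each job builds its own instance instead of sharing module-level state.

```python
from concurrent.futures import ThreadPoolExecutor


class StepLogger:
    """Hypothetical per-job logger; each instance owns its own step counter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.step = 0

    def get_step(self) -> int:
        return self.step

    def next_step(self) -> None:
        self.step += 1


def job(name: str, n_steps: int) -> int:
    logger = StepLogger(name)  # created inside the job, never shared
    while logger.get_step() < n_steps:
        logger.next_step()
    return logger.get_step()


# Two jobs run concurrently, each with an independent step counter.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(job, ["a", "b"], [3, 5]))
```

Since no counter is shared between threads, the jobs cannot interfere with each other's step values.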

daavoo · 2021-09-08

I see your point.

However, assuming we are focusing on "integrations first" (iterative/example-repos-dev#77 (comment)), the parallelization would happen at the ML Framework level and most ML Frameworks already take care of properly calling the callbacks/loggers.

dberenbaum · 2021-09-08

What's the downside to having the public API match MetricLogger?

I don't think I understand the point about ML framework integrations. Even if ML frameworks spawn a separate process for each model training, dvclive would try to read/write using the same file by default, right? Users might need to specify a different path for each one, which isn't supported yet in the callbacks.

daavoo · 2021-09-09

I would need to investigate each case but, at least in the deep learning frameworks I'm familiar with, parallelism usually occurs:

At the data loader level, which doesn't affect DVCLive callbacks.

In the distributed training strategy, where ML frameworks usually provide a decorator such as rank_zero_only / master_only, which is (or should be) used in the DVCLive callback.
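
For readers unfamiliar with such decorators, the idea can be sketched as below. This is a hypothetical re-implementation for illustration, not the code of any particular framework; reading the rank from a RANK environment variable is an assumption.

```python
import functools
import os
from typing import Any, Callable, Optional


def rank_zero_only(fn: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Hypothetical decorator: run fn only in the process whose RANK is 0."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        # Assumption: the launcher exports RANK; a missing value means rank 0.
        if int(os.environ.get("RANK", "0")) == 0:
            return fn(*args, **kwargs)
        return None

    return wrapper


written = []


@rank_zero_only
def log_metric(name: str, value: float) -> None:
    written.append((name, value))


os.environ["RANK"] = "0"
log_metric("loss", 0.5)  # recorded: this process is rank 0
os.environ["RANK"] = "1"
log_metric("loss", 0.4)  # skipped: non-zero ranks do not log
```

A callback whose logging methods are guarded this way writes from a single process, which sidesteps concurrent writes to the same files.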

daavoo added 5 commits August 25, 2021 12:54

Added get_step to test_initialization_error

1363e18

Added test_get_step

649fbc8

Added get_step. Remove step from public API

3019405

pre-commit

4bde17d

Added _lazy_init to get_step

c33ed90

daavoo requested review from dberenbaum and pared August 25, 2021 11:37

pre-commit

2ac9b6d

dberenbaum approved these changes Aug 26, 2021

This was referenced Aug 26, 2021

Custom step #144

Closed

logger: log to summary first, then to logs #145

Closed

Added test suggestions

bbab7c8

daavoo mentioned this pull request Aug 27, 2021

DVCLive get step iterative/dvc.org#2765

Merged

daavoo merged commit 1ae0372 into master Aug 27, 2021

daavoo deleted the step branch August 27, 2021 11:13

casperdcl assigned daavoo Aug 27, 2021

This was referenced Aug 27, 2021

Use infinite loop instead of set number of epochs iterative/dvc-checkpoints-mnist#13

Closed

unlimited epochs, misc fixes iterative/dvc-checkpoints-mnist#14

Merged

pared reviewed Sep 6, 2021

pared mentioned this pull request Sep 8, 2021

Add dvclive.set_step #154

Closed

daavoo mentioned this pull request Sep 10, 2021

Added dvclive.set_step #157

Merged
