dvclive: update how to track results #4674

Merged (10 commits) on Aug 8, 2023


dberenbaum
Contributor

Opening this in place of #4660 based on the comment to keep everything in one page.

Closes #4644.

This separates how dvclive tracks results and works with git and dvc into its own page. Before merging, we should decide where these explanations are sufficient, and where we need to make product updates to simplify.


<admon type="tip">

`save_dvc_exp=True` is ignored when [running with DVC](#run-with-dvc) since
Member

Should it be save_dvc_exp=False that is ignored? Or just save_dvc_exp?

Contributor

@daavoo daavoo left a comment

The increased complexity worries me, although I don't have that many ideas on how to fight that. I would prefer to keep DVCLive docs about the happy path.

I see some potential changes like:

  • making save_dvc_exp=True by default in DVCLive so we could drop all the paragraphs about it.

  • Dropping "Track large artifacts with DVC" from here. We could say something like "use log_artifact to track with DVC" and redirect to a DVC page about data management.

  • Dropping "Run with DVC". We could say "If you have or want to use a DVC pipeline go here" and link to a DVC page about pipelines.

  • Dropping "Customize with DVC". It feels like it should be part of "Run with DVC".

@shcheklein
Member

> The increased complexity worries me, although I don't have that many ideas on how to fight that.

Same. And I also don't know a good solution for this yet. It feels like we need to brainstorm the next iteration. What else can we do to make it simpler?

@dberenbaum
Contributor Author

> I would prefer to keep DVCLive docs about the happy path.

Not pretending to know the right balance of simplicity vs complexity which we are always struggling to get right, but my sense from recent feedback is that we have enough simple happy-path examples, and people struggle to understand how things work beyond that. This page to me is the equivalent of the dvclive user guide, where I would expect an in-depth explanation of how things work. How does it hurt the happy path?

> • making save_dvc_exp=True by default in DVCLive so we could drop all the paragraphs about it.

We can do this next release, but I think we should still mention here how it works or there's no way for people to understand what it does or the dangers of setting it to false.
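To make the behavior under discussion concrete, here is a minimal sketch of a script that opts in to `save_dvc_exp=True`. It assumes `dvclive` is installed and that the script runs inside a Git repository initialized with DVC. The names `simulated_losses` and `track_run` are hypothetical, and the loss values are synthetic stand-ins for a real training loop.

```python
def simulated_losses(epochs: int, lr: float) -> list[float]:
    """Synthetic stand-in for a training loop: one decreasing loss per epoch."""
    return [round(1.0 / (1.0 + lr * epoch), 4) for epoch in range(1, epochs + 1)]


def track_run(epochs: int = 3, lr: float = 0.1) -> None:
    """Log one run with DVCLive and save it as a DVC experiment."""
    # Imported here so the helper above can be used without dvclive installed.
    from dvclive import Live

    # save_dvc_exp=True asks DVCLive to save the run as a DVC experiment
    # when the run ends (here, when the `with` block exits).
    with Live(save_dvc_exp=True) as live:
        live.log_param("lr", lr)
        for loss in simulated_losses(epochs, lr):
            live.log_metric("loss", loss)
            live.next_step()  # advance the step counter after each epoch
```

Calling `track_run()` twice writes to the same DVCLive directory both times. With `save_dvc_exp=True` each run should still be recoverable as its own experiment (for example through `dvc exp show`), whereas without it the second run overwrites the first unless the results were committed in between.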

> • Dropping "Track large artifacts with DVC" from here. We could say something like "use log_artifact to track with DVC" and redirect to a DVC page about data management.
>
> • Dropping "Run with DVC". We could say "If you have or want to use a DVC pipeline go here" and link to a DVC page about pipelines.

This already links to those pages, but I think it's helpful to discuss how it specifically applies to the dvclive scenario.

> • Dropping "Customize with DVC". It feels like it should be part of "Run with DVC".

What about customizing plots? It doesn't feel to me like it belongs in Running with DVC.

@dberenbaum
Contributor Author

Discussed a couple concerns with @daavoo:

  1. How much of this is about pipelines? Is it enough to better explain how to use dvclive with pipelines?
  2. Can we put this info anywhere else to avoid developing a dvclive-specific guide?

Let me know if I missed anything. I'll think on these and try to do another draft.

@shcheklein shcheklein had a problem deploying to dvc-org-dvclive-clarifi-pvevvl July 21, 2023 18:42 Failure
@github-actions
Contributor

github-actions bot commented Jul 21, 2023

Link Check Report

There were no links to check!

@shcheklein shcheklein had a problem deploying to dvc-org-dvclive-clarifi-pvevvl July 21, 2023 18:54 Failure
@dberenbaum
Contributor Author

I took another pass at this and here's what I have:

  • Added a separate h2 for Run with DVC to discuss transitioning to pipelines in more depth. This section highlights the awkwardness of the current state, but I'd rather be explicit for now while we think of ways to make it smoother.
  • Under the existing h2 for Track the results, I added a short h3 for Customize with DVC and made a few minor updates but tried not to expand it much.

I'm also open to moving all the info into /docs/user-guide/experiment-management somewhere, but I have no strong opinion except that it probably doesn't belong in this PR.

@shcheklein @daavoo PTAL when you have a chance 🙏

@dberenbaum
Contributor Author

Also note that this would help with iterative/dvclive#631. We could catch cases where users call Live.log_artifact() inside dvc exp run but don't track the output in their pipeline and refer them back to this page.

@shcheklein shcheklein had a problem deploying to dvc-org-dvclive-clarifi-pvevvl July 21, 2023 19:36 Failure
@shcheklein shcheklein had a problem deploying to dvc-org-dvclive-clarifi-pvevvl July 21, 2023 19:52 Failure
@dberenbaum
Contributor Author

dberenbaum commented Jul 21, 2023

Seeing how much space we spend warning about not writing to dvclive/dvc.yaml, I'm very open to writing to the root dvc.yaml instead.

@dberenbaum
Contributor Author

@shcheklein @daavoo Any thoughts here? Do you feel it's better to close it?

same path and overwrite the results each time. Include
### Git integration

Unlike other experiment trackers, DVCLive relies on Git to track the [directory]
Member

My 2cs: I think track results can start with more basic material, something that more people can relate to and understand faster:

1. that we can track them in VS Code and Studio
2. maybe ways to compare experiments, or just experiments, or tracking experiments - that's where we can go into Git concepts to a certain degree, large files, etc. (even though I still think we need

The biggest issue with the explanation is that people don't expect it and most likely can't even understand why we put it here until they hit some issues.

Maybe another idea - "DVCLive vs other trackers: important workflow details".

Contributor Author

Renamed from "Track the results" to "Git and DVC integration" and introduced it by explaining that this differentiates it from other experiment trackers.


Using `Live.log_image()` to log multiple images may also grow too large to track
with Git, in which case you can use
[`Live(cache_images=True)`](/doc/dvclive/live#parameters) to cache them.
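
As a rough illustration of the excerpt above, the following sketch logs several images with `Live(cache_images=True)`. It assumes `dvclive`, NumPy, and Pillow are installed and that the script runs inside a DVC repository. The helper `solid_color_pixels`, the function `track_images`, and the `sample_{i}.png` names are hypothetical, and the images are synthetic.

```python
def solid_color_pixels(size: int, rgb: tuple[int, int, int]) -> list[list[list[int]]]:
    """Build a size x size RGB image as nested lists of 0-255 ints (synthetic data)."""
    return [[list(rgb) for _ in range(size)] for _ in range(size)]


def track_images(n_images: int = 4, size: int = 32) -> None:
    """Log several synthetic images and let DVC cache them instead of Git."""
    # Imported here so the helper above can be used without these packages.
    import numpy as np
    from dvclive import Live

    # cache_images=True asks DVCLive to cache the logged images with DVC,
    # so the image files themselves are not tracked by Git.
    with Live(cache_images=True) as live:
        for i in range(n_images):
            pixels = solid_color_pixels(size, ((i * 60) % 256, 0, 128))
            live.log_image(f"sample_{i}.png", np.array(pixels, dtype=np.uint8))
```

Whether caching is worth it depends on how many images a run produces; for a handful of small plots, tracking them with Git directly may be simpler.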

### Run with DVC
### Customize with DVC
Member

That's probably also a bit too much? Even if we keep it, should it be part of Run with DVC?

Contributor Author

Moved to part of Run with DVC and consolidated slightly.

experiment run. Instead, write customizations to a new `dvc.yaml` file at the
base of your repository or elsewhere outside the DVCLive directory.

## Run with DVC

Experimenting in Python interactively (like in notebooks) is great for
Member

Are there any other benefits?

Contributor Author

There are more benefits listed later in the paragraph.

Member

Yep, that's fine - it's just a bit abstract to me (as an end user). I mean the "more structured way to run reproducible experiments" part, and parallelized hyperparameter search jumps right into the advanced case. Again, I'm paying a lot of attention to this here since I expect the readers of this won't be DVC users, or even necessarily advanced Git users. There should be a story using their language / terminology as much as possible. Sorry, Dave, for all these iterations. No intent to block it. I'm fine to merge it any time since it's an improvement already.

Contributor Author

Changed the examples here from parallelized hyperparameter search to multi-step pipeline or queueing multiple experiments.

@daavoo
Contributor

daavoo commented Aug 4, 2023

> @shcheklein @daavoo Any thoughts here? Do you feel it's better to close it?

I think the added information is valuable, despite the concerns about formatting/location.
Better to have it (merge, iterate on follow-ups) than not.

@shcheklein shcheklein had a problem deploying to dvc-org-dvclive-clarifi-pvevvl August 5, 2023 12:19 Failure
@shcheklein shcheklein had a problem deploying to dvc-org-dvclive-clarifi-pvevvl August 5, 2023 12:23 Failure
@dberenbaum
Contributor Author

@shcheklein Did one more round of iterations. Let me know if you want to take a look.

DVCLive expects each run to be tracked by Git, so it will save each run to the
same path and overwrite the results each time. Include
DVCLive differs from some other experiment trackers by relying on Git and DVC
for tracking instead of a central database. This provides a closer connection to
Member

Quick thought: I guess it's somewhat similar to TensorBoard, btw (no Git, but also no central database).

@shcheklein shcheklein requested a deployment to dvc-org-dvclive-clarifi-pvevvl August 8, 2023 14:35 Abandoned
@shcheklein shcheklein temporarily deployed to dvc-org-dvclive-clarifi-pvevvl August 8, 2023 14:37 Inactive
@dberenbaum dberenbaum merged commit 7752926 into main Aug 8, 2023
2 checks passed
@dberenbaum dberenbaum deleted the dvclive-clarifications-2 branch August 8, 2023 14:37
Development

Successfully merging this pull request may close these issues.

Improve docs around dvclice/dvc.yaml
3 participants