remote/no-git logging #638

Closed
dberenbaum opened this issue Jul 25, 2023 · 7 comments
Labels
A: studio Area: Studio integration p2-medium

Comments

@dberenbaum
Contributor

There are several scenarios where it's not realistic to work in a Git repo:

  • Databricks
  • Google colab
  • SageMaker training jobs

People may run notebooks or jobs in these environments that are associated with a Git repo, but accessing the Git repo while running that notebook or job is awkward. In this case, DVCLive + Studio can still provide a typical experiment tracking experience (logging metrics, plots, etc. to the Studio server). If the Studio token and project are available, DVCLive should still be able to log to Studio to show live updates and associate them with an experiment.
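To make the scenario above concrete, here is a minimal sketch of how a logger could decide what to do depending on whether it can see a Git repo and Studio credentials. The enum, the function, and the mode names are hypothetical, and the environment variable names (DVC_STUDIO_TOKEN, DVC_STUDIO_REPO_URL) are assumptions for illustration, not a confirmed DVCLive interface.

```python
from enum import Enum
from typing import Mapping

# Assumed environment variable names, for illustration only.
TOKEN_VAR = "DVC_STUDIO_TOKEN"
PROJECT_VAR = "DVC_STUDIO_REPO_URL"


class LoggingMode(Enum):
    GIT = "git"                  # results tracked in Git (Studio updates may also be sent)
    STUDIO_ONLY = "studio-only"  # no Git access, but live updates go to Studio
    LOCAL_ONLY = "local-only"    # no Git and no Studio credentials


def choose_logging_mode(in_git_repo: bool, env: Mapping[str, str]) -> LoggingMode:
    """Pick how a (hypothetical) logger should behave in the current environment."""
    if in_git_repo:
        return LoggingMode.GIT
    # Without Git, Studio logging needs both a token and a project to attach to.
    if env.get(TOKEN_VAR) and env.get(PROJECT_VAR):
        return LoggingMode.STUDIO_ONLY
    return LoggingMode.LOCAL_ONLY
```

In a Databricks or SageMaker job this would typically be called as choose_logging_mode(False, os.environ); requiring both variables avoids sending results to Studio with nothing to associate them with.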

It may be out of scope for the initial implementation, but it would also be useful to persist the metrics and plots logged in this scenario back to Git. If they are not tracked in Git, they will be lost when that experiment is merged, reproduced, etc. It might be possible to reverse-engineer the metrics and plots from Studio and add them back to the Git repo.
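As a rough sketch of the reverse-engineering idea, assuming Studio could export the logged values as one record per step (a hypothetical format), the files could be rebuilt as a mapping of path to contents. The file layout loosely mirrors DVCLive's default output directory, but that is an assumption here, not a guarantee.

```python
import json
from typing import Dict, List


def rebuild_dvclive_files(history: List[Dict[str, float]]) -> Dict[str, str]:
    """Rebuild metrics and plot files from per-step records.

    history: hypothetical per-step export of what was logged to Studio,
    e.g. [{"step": 0, "acc": 0.5}, {"step": 1, "acc": 0.7}].
    Returns relative file path -> file contents (layout is an assumption).
    """
    records = sorted(history, key=lambda rec: rec["step"])
    files: Dict[str, str] = {}

    # One TSV per metric, with a step column, like a per-step plot file.
    names = sorted({key for rec in records for key in rec if key != "step"})
    for name in names:
        rows = [f"step\t{name}"]
        rows += [f"{int(rec['step'])}\t{rec[name]}" for rec in records if name in rec]
        files[f"dvclive/plots/metrics/{name}.tsv"] = "\n".join(rows) + "\n"

    # The summary keeps the most recent value of each metric and the last step.
    latest: Dict[str, float] = {}
    for rec in records:
        latest.update(rec)
    files["dvclive/metrics.json"] = json.dumps(latest, indent=4)
    return files
```

Writing the returned mapping into a checkout and committing it would be one way to get the results back into Git; returning contents instead of writing files keeps the rebuild step independent of where the repo lives.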

@dberenbaum dberenbaum added the p1-important Include in the next sprint label Jul 25, 2023
@shcheklein
Member

Does that mean people usually don't use Git on those platforms?

@dberenbaum
Contributor Author

I can't say with certainty, but from my experience and limited knowledge, it depends. They may not use it at all, or they may use it in a limited way. As a generalization, I think people sometimes track the notebook or script that gets run, but not its outputs, and with other experiment trackers there's not much reason to do so.

@dberenbaum
Contributor Author

Okay, I see. What is the plan for connecting it back? Those experiments will stay "studio"-only, is that correct?

And why can't we try to do a git clone? Are there any specific obstacles to that?

Originally posted by @shcheklein in #646 (comment)

@shcheklein Answering here so we don't have the discussion all in the PR, and since I don't have all the answers yet, there is likely to be more discussion.

Let me break it down by different possible use cases since they are all unique (and I'm sure there are plenty more):

  • Databricks: it's feasible to clone and commit using the Databricks UI, but DVC doesn't have access to the repo, so users have to do this manually. DVCLive sees it as a no-git scenario, so logging to Studio can only work in no-git mode. It's possible for the user to commit the DVCLive output using the Databricks UI and push it, but there's not currently any way to connect that commit to the live experiment sent to Studio.
  • SageMaker: in notebook mode, it's possible to clone and run like anywhere else, but when using their SDK to do remote training, there is no way to clone the Git repo or do anything other than invoke a single Python script (you can download a script or dir from a repo, but there is no way to access the repo itself). Maybe we can find a way to provide further customization, but my initial goal is to have something that works with minimal changes to the typical workflow. So far, it would be "Studio-only," but I think it's possible that at the end of training, the "driver" node could ping Studio and retrieve the results.
  • Colab: Users can clone and get the full experience, but it's nice to show metrics in Studio even if they don't.

@shcheklein
Member

Thanks @dberenbaum!

but I think it's possible that at the end of training, the "driver" node could ping Studio and retrieve the results.

It still means that it's at least Studio-based (Studio would be required to have experiments with DVC), which unfortunately undermines the point of DVC a bit in this case.

By clone, I meant: why can't DVCLive itself clone the repo internally, seamlessly for users, pretty much the way we do in CI? Would that be possible?

@dberenbaum
Contributor Author

@shcheklein Do you want to have a call to discuss? I worry this isn't an efficient way to decide the overall approach.

@shcheklein
Member

@dberenbaum Hey, yes, sure. I think it makes sense to do a call and go over it one more time, and probably record it as well.

@dberenbaum
Contributor Author

Here is a high-level proposal of where we could go with this:

Metrics and plots

With the Studio token, any metrics, plots, etc. should be pushed directly to Studio as a live experiment, even without access to a Git repo (this is what #646 is for). It should be possible to pull any live experiment via dvc exp pull, which will retrieve the info saved by Studio. dvc exp apply should be able to apply those files on top of the workspace, and Studio should be able to apply them on top of any existing commit in a new branch/PR. This makes it possible to connect all live experiments back to Git.
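The live-experiment part of this proposal can be sketched as a small event buffer: an experiment is identified by a name and a Studio project, and the baseline commit is optional, so the same stream works without Git. The class, the event types (start, data, done), and the field names are all hypothetical; this is not Studio's actual API. Replaying the data events recovers the latest metrics, which is roughly what a later pull/apply step would need.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LiveExperiment:
    """Hypothetical client-side buffer of live-experiment events for Studio."""

    name: str
    project: str                        # Studio project the experiment belongs to
    baseline_rev: Optional[str] = None  # None when no Git commit is available
    events: List[Dict[str, Any]] = field(default_factory=list)

    def _emit(self, kind: str, **payload: Any) -> None:
        # Every event carries enough context to be associated with the experiment.
        event = {"type": kind, "name": self.name, "project": self.project,
                 "baseline_rev": self.baseline_rev, **payload}
        self.events.append(event)

    def start(self) -> None:
        self._emit("start")

    def log_step(self, step: int, metrics: Dict[str, float]) -> None:
        self._emit("data", step=step, metrics=dict(metrics))  # copy to avoid aliasing

    def done(self) -> None:
        self._emit("done")

    def latest_metrics(self) -> Dict[str, float]:
        """Replay data events to get the most recent value of each metric."""
        latest: Dict[str, float] = {}
        for event in self.events:
            if event["type"] == "data":
                latest.update(event["metrics"])
        return latest

    def to_json_lines(self) -> str:
        """Serialize buffered events, one JSON object per line, e.g. for sending."""
        return "\n".join(json.dumps(event, sort_keys=True) for event in self.events)
```

Keeping baseline_rev optional is the key design point: with Git it can point at the commit the run started from, and without Git the experiment is still identifiable by project and name until it is applied on top of some commit.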

Artifacts

For artifacts, dvclive can collect and push hash info to Studio (even if not inside a Git/DVC repo), which can become part of the dvc exp pull/apply mentioned above. To store the artifact files, dvclive can include an option to auto-push to the remote. Studio can serve as a proxy to upload the artifacts to the remote if there is no Git/DVC repo config or the remote is unreachable for any reason. This could also be useful in recovering from an interrupted experiment, since you could regularly push your live experiment results to Studio and your remote, and you could pull and apply the latest result to recover from failures.
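Collecting hash info does not require a Git/DVC repo; it only needs the artifact file. Below is a minimal sketch that computes a raw-bytes MD5 digest and size by reading the file in chunks. DVC also uses MD5 for file hashes, but whether this exactly matches DVC's hashing in every version is not guaranteed, and the record format and function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Tuple, Union


def hash_chunks(chunks: Iterable[bytes]) -> Tuple[str, int]:
    """Return (MD5 hex digest, total size in bytes) for a stream of byte chunks."""
    digest = hashlib.md5()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def artifact_hash_info(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, Union[str, int]]:
    """Build a hypothetical hash-info record for one artifact file.

    Reads the file in chunks so large model files are never loaded into memory at once.
    """
    path = Path(path)
    with path.open("rb") as f:
        md5, size = hash_chunks(iter(lambda: f.read(chunk_size), b""))
    return {"path": path.as_posix(), "md5": md5, "size": size}
```

A record like this is small enough to send to Studio alongside metrics, while the file itself would be pushed to the remote (or proxied through Studio) separately.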

@dberenbaum dberenbaum changed the title no-git logging remote/no-git logging Aug 23, 2023
@daavoo daavoo added the A: studio Area: Studio integration label Sep 4, 2023
@dberenbaum dberenbaum added p2-medium and removed p1-important Include in the next sprint labels Feb 22, 2024
@dberenbaum dberenbaum closed this as not planned Won't fix, can't repro, duplicate, stale Apr 24, 2024

3 participants