Show pushed DVC experiments #45

Closed
dberenbaum opened this issue Jan 10, 2022 · 16 comments
Labels
feature request New feature or request priority-p1

Comments

@dberenbaum

If a user runs experiments through dvc exp run workflow, they may push experiments to GitHub or another git server (via dvc exp push). It would be great to see the results of these experiments in Studio.

This may be useful in several scenarios:

  • Sharing results with leadership when you also need to show a comparison to the other experiments you tried. Leaders may need to see those other experiments to give input on which experiment to choose, or for audit purposes.
  • Team experimentation. If multiple team members are running different experiments, it's useful to be able to compare them all side by side.
  • Individual user who wants to review old experiments. If I need to review results from a previously pushed experiment, I don't want to have to pull all of the experiment info locally to check its performance.
@dberenbaum
Author

From https://discord.com/channels/485586884165107732/485596304961962003/930150143259459644:

Hey I was wondering if there is an easy way to visualize experiment results without using the command line, ie to share experiments with a manager that is not technical? (but mb scientific). I was wondering whether there is an easy way to host dvc experiments on a server, a bit like neptune or wandb.

@dberenbaum
Author

If it is impossible to detect when new experiment refs have been pushed, IMO it would be acceptable to have users manually request to check for new experiments.
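One way a manual check could work, sketched under assumptions: list the remote's experiment refs and diff the listing against the previous one. The sketch assumes the "<sha><TAB><ref>" line format printed by git ls-remote and that DVC keeps experiment refs under the refs/exps/ namespace; parse_ls_remote and new_experiment_refs are hypothetical helpers, not existing Studio or DVC code.

```python
from typing import Dict

# Assumed namespace under which DVC stores experiment refs.
EXPS_NAMESPACE = "refs/exps/"


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map ref name -> commit SHA for each experiment ref in `git ls-remote` output."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        sha, ref = line.rstrip().split("\t", 1)
        if ref.startswith(EXPS_NAMESPACE):
            refs[ref] = sha
    return refs


def new_experiment_refs(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Refs that appeared since the last check or now point to a different commit."""
    return {ref: sha for ref, sha in current.items() if previous.get(ref) != sha}
```

The listing itself could come from something like subprocess.run(["git", "ls-remote", remote, "refs/exps/*"], capture_output=True, text=True, check=True).stdout, triggered whenever the user asks Studio to check for new experiments.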

@mvshmakov mvshmakov added the feature request New feature or request label Jan 10, 2022
@shcheklein shcheklein changed the title Show pushed dvc experiments Show pushed DVC experiments May 17, 2022
@dberenbaum
Author

Will this be possible when live metrics for local experiments are supported? If we are planning to show live metrics for local experiments, it seems like Studio could also show the final results or send completed experiments.

cc @daavoo

@daavoo

daavoo commented Jan 3, 2023

Will this be possible when live metrics for local experiments are supported?
If we are planning to show live metrics for local experiments, it seems like Studio could also show the final results or send completed experiments.

I think the UI components would be there but the client and backend require additional work to support the use case.

The current local live metrics only receive updates on what's logged with DVCLive.

@mvshmakov

I believe we have another request for this: https://iterativeai.slack.com/archives/CUSNDR35K/p1672740347317239

@dberenbaum
Author

I think the UI components would be there but the client and backend require additional work to support the use case.

The current local live metrics only receive updates on what's logged with DVCLive.

What would it take to send "non-live"/post-experiment metrics to Studio? Some thoughts on the scope:

  • I don't think it has to rely on dvc exp push or include any info besides what's posted for live metrics. It might be better to start without dvc exp push to keep it lightweight (no GitHub).
  • VS Code will probably also want to use it, so I'm not sure a Python API alone will be enough unless there's some way for VS Code to call it.

@mvshmakov

mvshmakov commented Jan 10, 2023

What would it take to send "non-live"/post-experiment metrics to Studio

I agree with @daavoo here. We'll need to:

  1. Implement a new ingestion API endpoint and a serving API endpoint (or modify the existing one) on the Studio backend side.
  2. Make dvc exp push aware of the ingestion endpoint and push data there.
  3. Support the serving API endpoint on the Studio client side and render the new rows appropriately.
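A very rough, in-memory sketch of how the backend pieces of steps 1 and 3 could fit together: ingestion stores an experiment record keyed by its baseline commit, and serving returns the rows to render under that commit. ExperimentRecord, ExperimentStore, ingest, and serve are hypothetical names; a real backend would sit behind HTTP endpoints and a database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentRecord:
    """One experiment as the hypothetical backend stores it."""
    name: str
    baseline_sha: str
    metrics: Dict[str, float] = field(default_factory=dict)
    params: Dict[str, object] = field(default_factory=dict)


class ExperimentStore:
    """In-memory stand-in for the ingestion (write) and serving (read) endpoints."""

    def __init__(self) -> None:
        self._by_baseline: Dict[str, Dict[str, ExperimentRecord]] = {}

    def ingest(self, record: ExperimentRecord) -> None:
        # Ingestion: pushing the same experiment name again overwrites the old record.
        self._by_baseline.setdefault(record.baseline_sha, {})[record.name] = record

    def serve(self, baseline_sha: str) -> List[ExperimentRecord]:
        # Serving: rows to render under their baseline commit, sorted by name.
        rows = self._by_baseline.get(baseline_sha, {})
        return sorted(rows.values(), key=lambda r: r.name)
```

Keying by baseline commit mirrors how the project table already groups experiments under their base commit.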

For DVC <> Studio BE communication we can reuse the contract we currently use for the DVCLive <> Studio BE communication: https://github.com/iterative/dvc-studio-client. We can also reuse existing UI components without a lot of modifications. Client-side is also prepared for more types of experiments.

Hard to say how much time it'll take, but it looks like a separate Story to implement tbh.

@dberenbaum
Author

Hard to say how much time it'll take, but it looks like a separate Story to implement tbh.

Thanks @mvshmakov! Yup, that's okay, just want to discuss it for now.

For DVC <> Studio BE communication we can reuse the contract we currently use for the DVCLive <> Studio BE communication: https://github.com/iterative/dvc-studio-client. We can also reuse existing UI components without a lot of modifications. Client-side is also prepared for more types of experiments.

Sorry, I'm not that familiar with the live metrics implementation, so I'm not following. If we have this existing project and contract, why are all the other modifications needed? If DVC fulfills the API contract, what would happen if Studio treats it like a live experiment?

@shcheklein
Member

Is it possible to read and subscribe to updates from GH/GL/BB to get these experiments the same way we get regular commits?

Implement a new ingestion API endpoint and another serving API endpoint (modify existing one) on the Studio backend side. Make dvc exp push aware of the ingestion endpoint and push data there. Support a serving API endpoint on the Studio client-side + render new rows appropriately.

We should think about keeping this protocol open. E.g. we can use and support Git protocol from the Studio end. In this case Studio becomes an extra Git remote that can accept dvc exp push / pull in a regular way.

@dberenbaum
Author

  • I don't think it has to rely on dvc exp push or include any info besides what's posted for live metrics. It might be better to start without dvc exp push to keep it lightweight (no GitHub).

To reiterate, this is a bit different from the initial request in this issue. I'm not focused on dvc exp push here and assume no Git ref is pushed anywhere, just the metrics and other info that are pushed for live experiments. In many cases this is probably enough, and pushing Git refs may actually be undesirable due to the extra complexity or because users don't want to clutter the GH repo with ephemeral experiments.

@shcheklein
Member

@dberenbaum got it. dvc exp push in the ticket's description confused me, I guess (and that's probably a signal to keep the semantics of the command simple: pushing Git refs). Do you feel it should be a CLI command, though, or could it be part of DVCLive? Something like log_experiment?

@mvshmakov

mvshmakov commented Jan 11, 2023

High-level: should we move the discussion internally and create a corresponding issue?

High-level: @dberenbaum, should we persist experiments in Studio in the first place if we plan to support custom Git refs? By decentralizing the experiments store (custom Git refs and Studio-stored experiments) we may confuse users even more. In addition, it is not clear at that point how to show all these different types of experiments in the project table without confusing users. I personally believe it'd be better for us to focus on parsing custom refs, but I may be missing something.

Overall, I think this issue is a bit confusing, and we should define the problem and scope better (maybe break it into several issues) so we can iterate on it.

If DVC fulfills the API contract, what would happen if Studio treats it like a live experiment?

Studio will treat it as a live experiment, with everything that implies. Namely, the Studio UI will show a loader next to the experiment, the backend will expect start, data, and end events (not just a single event with all the data), it will display the experiment under the base commit, etc. The user will be able to "delete" the experiment, which in the case of dvc exp push will not work as expected (it is deleted in Studio but not in the Git forge).

In theory we can push a new experiment with just an end event and all the data, so that it is persisted in the table. That seems to address the feature request, but I'm not sure of the consequences or whether it is the best way to tackle the problem.
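To make the start/data/end expectation and the single end event shortcut concrete, here is a sketch of backend-side state handling. The event shape (type, name, baseline_sha, data keys) and the apply_event helper are hypothetical and are not the actual dvc-studio-client contract.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical event types; not the real dvc-studio-client schema.
VALID_TYPES = {"start", "data", "end"}


@dataclass
class LiveExperiment:
    name: str
    baseline_sha: str
    running: bool = True  # while True, the UI would show a loader
    data: Dict[str, Any] = field(default_factory=dict)


def apply_event(experiments: Dict[str, LiveExperiment], event: Dict[str, Any]) -> None:
    """Update backend state from one event (hypothetical event shape)."""
    kind = event["type"]
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown event type: {kind!r}")
    name = event["name"]
    if kind == "start":
        # Normal path: a start event registers a running experiment.
        experiments[name] = LiveExperiment(name, event["baseline_sha"])
        return
    exp = experiments.get(name)
    if exp is None:
        if kind == "data":
            raise ValueError(f"data event for unknown experiment {name!r}")
        # Shortcut: a lone end event creates the experiment directly.
        exp = LiveExperiment(name, event["baseline_sha"])
        experiments[name] = exp
    exp.data.update(event.get("data", {}))
    if kind == "end":
        exp.running = False  # completed: persisted in the table without a loader
```

Under these assumptions, a completed dvc exp run result would arrive as a single end event carrying all its data, and would land in the table already marked as finished.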

Is it possible to read and subscribe to updates from GH/GL/BB to get these experiments in regular way as we are getting usual commits?
... we can use and support Git protocol from the Studio end. In this case Studio becomes an extra Git remote

Listening to push webhook events (say, for GitHub) and extracting custom Git refs is more desirable, I'd say. This way we stay within the GitOps boundaries, and the main forge remains the single source of truth. Otherwise, if the user manually removes some experiments' custom Git refs and pushes that to their main forge without going through DVC, Studio will still have them.
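A sketch of the webhook side, under assumptions: the ref layout refs/exps/<first two chars of baseline SHA>/<rest of SHA>/<exp name> is how DVC names experiment refs as far as I know, and ref, after, and deleted are fields of GitHub's push webhook payload. Whether each forge actually emits push events for refs outside refs/heads and refs/tags still needs to be verified; parse_exp_ref and handle_push are hypothetical helpers.

```python
from typing import Any, Dict, NamedTuple, Optional


class ExpRef(NamedTuple):
    baseline_sha: str
    name: str


def parse_exp_ref(ref: str) -> Optional[ExpRef]:
    """Parse 'refs/exps/<sha[:2]>/<sha[2:]>/<name>'; return None for any other ref."""
    parts = ref.split("/")
    if len(parts) != 5 or parts[0] != "refs" or parts[1] != "exps" or len(parts[2]) != 2:
        return None  # branches, tags, and DVC-internal refs like refs/exps/exec/...
    _, _, prefix, rest, name = parts
    return ExpRef(baseline_sha=prefix + rest, name=name)


def handle_push(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Turn a push webhook payload into an experiment upsert or delete."""
    exp = parse_exp_ref(payload.get("ref", ""))
    if exp is None:
        return None  # not an experiment ref; handled like regular commits
    # Mirroring deletions keeps the forge as the single source of truth.
    action = "delete" if payload.get("deleted") else "upsert"
    return {
        "action": action,
        "baseline_sha": exp.baseline_sha,
        "name": exp.name,
        "commit": payload.get("after"),
    }
```

Handling the deleted case is what addresses the concern above: removing an experiment ref on the forge would also remove it from Studio.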

@tapadipti

I’m late to the party here. But good to read all the discussions.

Looks like there are 3 ways being proposed to see results of DVC experiments:

  1. Listening to push events and extracting custom git refs (original request in this issue)
  2. DVC (without DVCLive) sending exp results directly to Studio without pushing them to Git
  3. Getting users to use DVCLive for any experiments that they want to see directly in Studio

Using DVCLive (3) is what we are already implementing (Live metrics for local experiments). We can improve it further in the future, and maybe add log_experiment as @shcheklein suggested (and other log_ methods - eg, log_artifact, log_plot etc.). But that is a separate issue.

Regarding DVC pushing directly to Studio (2), I agree with @mvshmakov that "by decentralizing the experiments store... we may confuse users even more." What would be our main reason to have this feature in DVC? What is lacking in the current implementation of sending live metrics for local experiments?

IMO extracting custom Git refs (1) is the desirable solution for this issue.

@dberenbaum
Author

@mvshmakov You are right that it's a bit off topic. I moved this to a proposal in https://www.notion.so/iterative/Enhancement-Proposal-07b8d53f87b94b7291c289bb1eb45159 to discuss further.

@tapadipti That's a good summary. Take a look at the proposal for why I think it's worthwhile to reuse live metrics functionality.

@shcheklein
Member

@dberenbaum thanks! That was helpful.

So we already have this dichotomy: there are already two (or even three) types of experiments, namely commits, exp Git refs, and live experiments in Studio. Even if we don't do what @dberenbaum suggests, we'll have to reconcile dvc exp experiments with live experiments anyway, I guess.

In terms of functionality (let's think about the UX in VS Code), it might complicate things a bit: "Share an experiment" becomes less clear. @dberenbaum it would be great to think through the UX in VS Code and the CLI in the proposal.

@amritghimire

@dberenbaum Should we close this since we now have dvc experiments visible in Studio?

6 participants