Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bringing in DVCLive to DVC #9709

Closed
omesser opened this issue Jul 6, 2023 · 16 comments
Closed

Bringing in DVCLive to DVC #9709

omesser opened this issue Jul 6, 2023 · 16 comments
Labels
A: api Related to the dvc.api A: live Related to DVCLive integration discussion requires active participation to reach a conclusion enhancement Enhances DVC

Comments

@omesser
Copy link
Contributor

omesser commented Jul 6, 2023

From a customer/user perspective, it's a long standing point of friction and confusion that DVCLive is it's own separate tool/product and needs to be pip installed / imported separately.

Without breaking / hard-deprecating this use case (and breaking people's code), we should address this - I think there's no real technical barrier to just import it under dvc.api.live or dvc.live and replace all the public references to it to just be part of DVC.

@omesser omesser added enhancement Enhances DVC A: api Related to the dvc.api A: live Related to DVCLive integration labels Jul 6, 2023
@dberenbaum
Copy link
Contributor

tldr I agree in a vacuum but it might not make sense to prioritize right now.

AFAICT there are two proposals/benefits here:

  1. Avoiding two separate installations.
  2. Reusing the dvc namespace when importing live.

Did I miss anything?

We did previously make dvclive a dependency of dvc. We eventually backtracked on it, and to my surprise it didn't seem to hurt dvclive to separate it from the dvc installation. There are also still discussions in #397 about use cases where they should be separate (for example, in some managed environment I might want a lightweight dvclive install without all of dvc). I don't think those issues are insurmountable, but we don't have good answers for them yet.

Reusing the dvc namespace also makes sense but I'm not eager to do it now. We have already started to establish dvclive, and rebranding it this way is likely to cause more confusion in the short term, and I'm not sure it's something we can afford to do right now.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Jul 10, 2023
@omesser
Copy link
Contributor Author

omesser commented Jul 11, 2023

Thanks @dberenbaum !

Did I miss anything?

Nope - what you wrote are indeed the 2 key benefits.
But I wouldn't dismiss their weight. I honestly find it peculiar we're still in this state where users are expected to know that dvclive is it's own thing - it's non standard in the space.
after some discussion with @daavoo and the dvc team on slack (internal) - it does not seem like there's a technical barrier (the details of how exactly can vary), or any reason to keep the current state which is, IMO not healthy.

and to my surprise it didn't seem to hurt dvclive to separate it from the dvc installation.

I'm not sure how this was measured but I'm always very weary of such conclusions "you don't know what you don't know". it's adding confusion and mental load to the users - it looks weird in every code snippet and in the docs and I don't see why 🤷 .

I might want a lightweight dvclive install without all of dvc

This was "debunked" AFAICT:

  • dvc is not that large (if installed in a notebook/file "just for live tracking" it's the minimal dvc), it's way smaller than any other experiment tracker and all users accept their size (see size comparison)
  • dvclive now requires dvc. so even if we were to argue over a few MiB before, now it's there anyways (dvclive uses dvc to manage experiment refs I believe)

I don't think those issues are insurmountable, but we don't have good answers for them yet.

I think that we're letting minor (solvable) technical questions, hold us off for obvious product moves/actions that I feel like a lot of us understand are needed (not just me). It really baffles me. to me it's obvious dvclive being a package is an internal implementation details and it's a first order problem burdening the users with it. goes clearly against our attempts of simplifying the ecosystem and make it approachable, and digestable

Reusing the dvc namespace also makes sense but I'm not eager to do it now. We have already started to establish dvclive, and rebranding it this way is likely to cause more confusion in the short term, and I'm not sure it's something we can afford to do right now.

I don't really understand this point, can you explain? there's no rebranding to do. DVCLive is not really a brand, the fact that it's standalone is the thing I identify as a problem to solve so I don't really understand how this is an argument against the change 🙃 it's doc section needs some superficial editing and to be moved under the Python API (create a "live()" section, but not sure what else you mean by it starting to be established. I don't think that this is a "rebranding" effort as it's already considered part of DVC, and the DVCLive package will still exist. but the users don't have to know about it, only import dvc.live and use it (similar to dvc_objects or dvc_data)

@omesser omesser added discussion requires active participation to reach a conclusion and removed awaiting response we are waiting for your reply, please respond! :) labels Jul 11, 2023
@dberenbaum
Copy link
Contributor

To reiterate, we are only discussing priority since I agree we should ultimately do this. WDYT about adding it to iterative/dvclive#484? It's not technically a breaking change (or doesn't have to be), but I would rather bundle it together with other changes so we don't change too frequently and disorient users (3.0 blog post and older posts and videos become outdated; code snippets used in vs code and studio need to be updated; users may feel like they should migrate their code at some point even though they would probably rather not have to touch it).

@skshetry
Copy link
Member

This was "debunked" AFAICT:
it's way smaller than any other experiment tracker and all users accept their size

@omesser, number of dependencies matter more than the size. dvc is heavy in that it brings a large number of dependencies, which usually means more conflicts/issues.

@aschuh-hf
Copy link

I might want a lightweight dvclive install without all of dvc

This was "debunked" AFAICT:
dvc is not that large (if installed in a notebook/file "just for live tracking" it's the minimal dvc) [...]

I second the point @skshetry is making. For me, it's also not about the size but the additional dependencies and the locked minimum versions or version ranges that a requirement to have dvc in the list of my dvclive using project pulls in. As long as dvc was mainly a command line tool, I could install DVC in a separate virtual environment without creating dependency conflicts for my project only because of a small utility such as DVCLive to integrate with it. I don't know if with DVC 3 or recent DVCLive changes this is still feasible, but it was a positive in this regard to have them separate.

@omesser
Copy link
Contributor Author

omesser commented Jul 13, 2023

Thanks @skshetry, I understand the concerns about dvc being "heavy" in terms of number of dependencies - and it is true, already. However dvclive doesn't change that at all 😄 it brings no dependencies that dvc doesn't already have in terms of external packages. Everything else in the optional dependencies of dvclive is only for testing and development and I'm not suggesting it's imported into dvc. So this is not a differentiating factor as far as I can tell - let me know if I miss something, but it seems inconsequential to this decision / issue.

@aschuh-hf - dvc is already a dependency of dvclive since early march (iterative/dvclive#481) ♻️ . So we have a circular dependency issue to resolve here. But, to your concern, this is only about dvc-related internal packages. dvclive will bring NO external packages to dvc. Even treating dvclive as a small utility is now misleading (since it required dvc for some functionality).
If anything, I'd say that this move is probably in the right direction in a technical sense as well - since we might want to move some parts from dvclive to dvc instead of having a package like dvclive requiring all of dvc. it actually might help make dvclive an actual small utility on its own, but this is a different discussion. the point here, is that it enables it and doesn't take step away from making/keeping dvclive "lightweight"

@omesser
Copy link
Contributor Author

omesser commented Jul 13, 2023

@dberenbaum
Thanks, I do believe there's not much effort here (not technical 😄 ), the hurdles are mainly psychological (as I'm not proposing we break current usage of importing dvclive, at least not for now)
Added it to: #9709

we gotta take care of the circular dependency and maybe make a note in the docs that dvclive can still be imported directly but as of dvc 3.x.x it can be used from dvc. that's basically it. updating code snippets everywhere is not a task/burden - it's the reward 😄 and there's not time pressure for this.

I do believe that the benefits of removing confusion for potential newcomers outweighs the effort needed here, and the confusion/decision existing users would "face" - having to decide whether to change their import from dvclive to dvc.live or not 😄 - it's gonna work either way

@aschuh-hf
Copy link

aschuh-hf commented Jul 13, 2023

dvc is already a dependency of dvclive since early march

Indeed. And I understand with this the two projects become more intertwined. But this wasn't the case before, and maybe one could alternatively have factored out the common components needed by both DVC and DVCLive. But I don't know enough about the reasons behind making dvclive dependent on dvc, which to end users is mainly a set of CLIs.

[the dependency seems to be related to the new save_dvc_exp (and dvcyaml) options of DVCLive; a functionality I don't at the moment envision to require / want; maybe I am just missing out on the added value of these new features]

@francesco086
Copy link

Observation from a user perspective (I'm an ML engineer) I was currently unable to install dvc along with the Data Scientist model due to a dependency conflict. However, I was able to install dvc with pipx and still track the experiment. This was possible because I did not rely on dvclive. This is imho a huge strategic advantage over other solutions like MLFlow, where this is simply not possible.

Now, dvclive would indeed be handy, but frankly I would be reluctant to use it, and would always prefer to use a static dvc.yaml, for the reason mentioned above. You have to consider that I am trying to set up a template project for data scientists, and I want to avoid any possible source of problems.

In conclusion, from my point of view, it would be great if dvclive had almost zero dependencies to avoid any possible conflict with other packages. It could be installed together with dvc, I don't mind that. But it should be possible to install dvclive without dvc.

@omesser
Copy link
Contributor Author

omesser commented Sep 5, 2023

Thanks for your inputs @francesco086 !

As mentioned before:

dvclive will bring NO external packages to dvc

😄

It's unfortunate that DVC is already very dependency heavy, maybe this can be improved btw, as a separate effort, but packaging dvclive in DVC should not put a dent in that either way (not improve but not hurt)!

@skshetry
Copy link
Member

skshetry commented Sep 5, 2023

@francesco086, what was the dependency that was in conflict? We try to be as compatible as possible with dependencies. So it would be helpful if we know where we can improve/fix.

@shcheklein
Copy link
Member

Just to clarify, I think pipx DVC+DVCLive won't work in @francesco086 's case. We won't be able to use DVCLive in that scenario. The best would be to have DVCLive that doesn't depend on DVC (or run DVC as a CLI not as Python lib), but that is unfortunately can be complicated to do atm.

@francesco086
Copy link

@francesco086, what was the dependency that was in conflict? We try to be as compatible as possible with dependencies. So it would be helpful if we know where we can improve/fix.

Sry, it was quite some time ago and I can't find it anymore. Now it seems to be possible to install them together. But as far as I remember it was not something for you to fix, rather the model depending on a very old version of some library.

@omesser
Copy link
Contributor Author

omesser commented Sep 8, 2023

@shcheklein

The best would be to have DVCLive that doesn't depend on DVC (or run DVC as a CLI not as Python lib), but that is unfortunately can be complicated to do atm.

🙏
Actually trying to enable this (again) might be another reason to integrate dvclive into dvc (for most common integrative use cases). it would allow us to move some functionality from dvclive to dvc in the future, make dvclive pure/lightweight (again 😄 ).
Then, we can reinstate support for the more rare/involced and specific use case of users wanting "only" dvclive instrumentation library in their python environment (without dvc) if they want

@shcheklein
Copy link
Member

to integrate dvclive into dvc (for most common integrative use cases).

🤔 do you have any specific use cases in mind already? or just a suggestion to think through, some intuition?

@skshetry
Copy link
Member

Closing as we don't have immediate plans to bring dvclive into dvc.

@skshetry skshetry closed this as not planned Won't fix, can't repro, duplicate, stale Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: api Related to the dvc.api A: live Related to DVCLive integration discussion requires active participation to reach a conclusion enhancement Enhances DVC
Projects
None yet
Development

No branches or pull requests

7 participants