Bringing in DVCLive to DVC #9709
Comments
TL;DR: I agree in a vacuum, but it might not make sense to prioritize this right now. AFAICT there are two proposals/benefits here:
Did I miss anything? We did previously make dvclive a dependency of dvc. We eventually backtracked on it, and to my surprise it didn't seem to hurt dvclive to separate it from the dvc installation. There are also still discussions in #397 about use cases where they should be separate (for example, in some managed environment I might want a lightweight dvclive install without all of dvc). I don't think those issues are insurmountable, but we don't have good answers for them yet. Reusing the dvc namespace also makes sense, but I'm not eager to do it now. We have already started to establish dvclive, and rebranding it this way is likely to cause more confusion in the short term, and I'm not sure it's something we can afford to do right now.
Thanks @dberenbaum !
Nope - what you wrote are indeed the 2 key benefits.
I'm not sure how this was measured, but I'm always very wary of such conclusions: "you don't know what you don't know". It's adding confusion and mental load for the users. It looks weird in every code snippet and in the docs, and I don't see why 🤷.
This was "debunked" AFAICT:
I think that we're letting minor (solvable) technical questions hold us off from obvious product moves/actions that I feel a lot of us understand are needed (not just me). It really baffles me. To me it's obvious that dvclive being a package is an internal implementation detail, and burdening the users with it is a first-order problem. It goes clearly against our attempts at simplifying the ecosystem and making it approachable and digestible.
I don't really understand this point, can you explain? There's no rebranding to do. DVCLive is not really a brand; the fact that it's standalone is the thing I identify as a problem to solve, so I don't really understand how this is an argument against the change 🙃 Its doc section needs some superficial editing and to be moved under the Python API (create a "live()" section), but I'm not sure what else you mean by it starting to be established. I don't think that this is a "rebranding" effort, as it's already considered part of DVC, and the DVCLive package will still exist. But the users don't have to know about it, only import
To reiterate, we are only discussing priority since I agree we should ultimately do this. WDYT about adding it to iterative/dvclive#484? It's not technically a breaking change (or doesn't have to be), but I would rather bundle it together with other changes so we don't change too frequently and disorient users (3.0 blog post and older posts and videos become outdated; code snippets used in vs code and studio need to be updated; users may feel like they should migrate their code at some point even though they would probably rather not have to touch it).
@omesser, the number of dependencies matters more than the size. dvc is heavy in that it brings a large number of dependencies, which usually means more conflicts/issues.
I second the point @skshetry is making. For me, it's also not about the size but the additional dependencies and the locked minimum versions or version ranges that a requirement to have
Thanks @skshetry, I understand the concerns about dvc being "heavy" in terms of number of dependencies, and that is already true. However, dvclive doesn't change that at all 😄 In terms of external packages, it brings no dependencies that dvc doesn't already have. Everything else in the optional dependencies of dvclive is only for testing and development, and I'm not suggesting it's imported into dvc. So this is not a differentiating factor as far as I can tell. Let me know if I'm missing something, but it seems inconsequential to this decision / issue. @aschuh-hf - dvc is already a dependency of dvclive since early March (iterative/dvclive#481) ♻️. So we have a circular dependency issue to resolve here. But, to your concern, this is only about dvc-related internal packages. dvclive will bring NO external packages to dvc. Even treating dvclive as a small utility is now misleading (since it requires dvc for some functionality).
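For anyone who wants to check the "no extra external packages" claim on their own environment, here is a rough sketch using only the standard library's importlib.metadata. The helper names requirement_names and extra_dependencies are made up for illustration, and the comparison only looks at direct, non-extra requirements of whatever versions happen to be installed, so treat the result as a hint rather than a full dependency resolution.

```python
import re
from importlib import metadata


def _normalize(name):
    # Normalize a distribution name the way packaging tools compare them.
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements):
    """Normalized names of non-extra requirements, e.g. from 'foo>=1.0'."""
    names = set()
    for req in requirements or []:
        spec, _, marker = req.partition(";")
        if "extra" in marker:
            continue  # skip dependencies that only apply to optional extras
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
        if match:
            names.add(_normalize(match.group(1)))
    return names


def extra_dependencies(candidate, base):
    """Direct requirements of `candidate` that `base` does not list itself.

    Returns None if either distribution is not installed.
    """
    try:
        wanted = requirement_names(metadata.requires(candidate))
        present = requirement_names(metadata.requires(base))
    except metadata.PackageNotFoundError:
        return None
    return wanted - present - {_normalize(base)}


if __name__ == "__main__":
    # Prints a set of package names, or None if dvc/dvclive are not installed.
    print(extra_dependencies("dvclive", "dvc"))
```

An empty set would be consistent with the claim above for direct dependencies; transitive dependencies and version pins are not covered by this sketch.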
@dberenbaum we gotta take care of the circular dependency and maybe make a note in the docs that dvclive can still be imported directly, but as of dvc 3.x.x it can be used from dvc. That's basically it. Updating code snippets everywhere is not a task/burden, it's the reward 😄 and there's no time pressure for this. I do believe that the benefits of removing confusion for potential newcomers outweigh the effort needed here, and the confusion/decision existing users would "face" - having to decide whether to change their import from
Indeed. And I understand with this the two projects become more intertwined. But this wasn't the case before, and maybe one could alternatively have factored out the common components needed by both DVC and DVCLive. But I don't know enough about the reasons behind making dvclive dependent on [the dependency seems to be related to the new
Observation from a user perspective (I'm an ML engineer): I was unable to install dvc along with the Data Scientist model due to a dependency conflict. However, I was able to install dvc with pipx and still track the experiment. This was possible because I did not rely on dvclive. This is imho a huge strategic advantage over other solutions like MLFlow, where this is simply not possible. Now, dvclive would indeed be handy, but frankly I would be reluctant to use it, and would always prefer to use a static dvc.yaml, for the reason mentioned above. Keep in mind that I am trying to set up a template project for data scientists, and I want to avoid any possible source of problems. In conclusion, from my point of view, it would be great if dvclive had almost zero dependencies to avoid any possible conflict with other packages. It could be installed together with dvc, I don't mind that. But it should be possible to install dvclive without dvc.
Thanks for your inputs @francesco086 ! As mentioned before:
😄 It's unfortunate that DVC is already very dependency heavy. Maybe this can be improved btw, as a separate effort, but packaging dvclive in DVC should not put a dent in that either way (not improve but not hurt)!
@francesco086, what was the dependency that was in conflict? We try to be as compatible as possible with dependencies. So it would be helpful if we know where we can improve/fix.
Just to clarify, I think
Sry, it was quite some time ago and I can't find it anymore. Now it seems to be possible to install them together. But as far as I remember it was not something for you to fix, rather the model depending on a very old version of some library.
🙏
🤔 do you have any specific use cases in mind already? or just a suggestion to think through, some intuition?
Closing as we don't have immediate plans to bring dvclive into dvc.
From a customer/user perspective, it's a long-standing point of friction and confusion that DVCLive is its own separate tool/product and needs to be pip installed / imported separately.
Without breaking / hard-deprecating this use case (and breaking people's code), we should address this - I think there's no real technical barrier to just import it under dvc.api.live or dvc.live and replace all the public references to it to just be part of DVC.
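To make the proposal a bit more concrete, here is a minimal sketch of what such an alias could look like. It is not how DVC does or would implement this: make_alias and the alias name dvc_live_alias are hypothetical, and a top-level name is used instead of dvc.live so the snippet runs without dvc installed. The alias is lazy, so dvclive would only be imported on first attribute access, and existing `from dvclive import Live` code would keep working unchanged.

```python
import importlib
import sys
import types


def make_alias(alias, target):
    """Register `alias` in sys.modules as a lazy stand-in for module `target`."""
    shim = types.ModuleType(alias)

    def __getattr__(name):
        # Leave dunder probes (e.g. __path__) alone so the import system
        # treats the shim as a plain module.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        try:
            real = importlib.import_module(target)
        except ImportError as exc:
            raise ImportError(
                f"{alias!r} needs the package {target!r}, which is not installed"
            ) from exc
        return getattr(real, name)

    # Module-level __getattr__ (PEP 562) is looked up in the module's dict.
    shim.__getattr__ = __getattr__
    sys.modules[alias] = shim
    return shim


# Hypothetical alias; nothing is imported until an attribute is requested.
make_alias("dvc_live_alias", "dvclive")
# from dvc_live_alias import Live  # resolves only if dvclive is installed
```

Because the lookup is deferred, a sketch like this would also sidestep the circular import at module load time, though the packaging-level circular dependency mentioned in the thread would still need its own answer.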