-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove dvc lib requirement #397
Comments
Mixed feelings. I am ok with adopting short-term solutions like using the CLI and reimplementing logic, but:
Not saying this is not True, but I think we should use the existing analytics for loyal users and check the numbers.
I agree it's heavy. Not a good excuse but in the Python data science context (DVCLive scope), perhaps is it not that worrying.
My main concern here is that the quick alternatives like using CLI and/or reimplementing existing logic (like we do with If we consider this really critical, we could consider what are the critical parts that can be decoupled from the rest of the DVC repo and create a separate, lightweight package. For some things is trickier, like
Studio Live metrics, currently. |
I would not use Python ecosystem as an example though. It's broken, it's heavy, it's painful. We should be trying to avoid this if possible.
that's what we already do I think in some other places. Especially if this is true: |
On my machine and a fresh .venv with only
If we are willing to invest efforts, it might be more beneficial to optimize how dependencies are handled in DVC (make optional imports here and there + configure install requirements) than to invest time in decoupling / extracting common code. |
Any specific ideas about this? Asking because I can't think about a good solution from the top of my head. The only thing that comes to my mind is to make |
Something like that but not so All the UI/CLI and queue stuff would be not needed and those appear to be the heaviest dependencies |
may be |
Is a thin version of dvc something that's worth spending time on at the moment? Many of the heaviest deps are scm-related, so they are needed anyway, and I don't really want to limit what parts of dvc can be used by dvclive if it's not necessary. It's useful to be able to install dvclive without dvc, but as an optional dependency To reiterate some points that I think we all already know: It's common for other helper libraries like this to depend on their "parent" library (lightning, keras, etc.), and dvclive already depends on these much heavier ML frameworks. This pattern is so common that I think trying to go against it will only confuse most Python/DS users. Also, looking at our event data, only about 5% come from binary installs. I'm not sure the global cli approach is that common among DS, because many DS are used to the conda or other python packaging system and terrified to change versions of a package that is working for their current model setup. |
Exactly, thats why isolating (e.g. install CLI globally) would be an easier way for users to deal with this. DVC opens the whole "can of worms" with all its dependencies- there will a lot of case when some wheel is missing, some conflict between tqdm that DVC needs and being used by my code, etc, etc. It's a very large surface of additional conflicts and pain.
Hmm. I'm not sure this is true tbh. Loggers come as part of the main frameworks mostly? And don't require any additional dependencies? Especially of the DVC size, especially the one that installs and overrides potentially CLI. Unless I'm missing something. |
@shcheklein I don't want to dispute your legitimate concerns, which we can keep discussing, but I want to acknowledge the reality of the pydata community, which encompasses most of our users. For tf, they started to bundle keras with tf2, but keras still has tf as a dependency and uses it as a lib. For torch, it doesn't come with any high-level framework, and the high-level frameworks for it (lightning, fast.ai) depend on it and use it as lib (they even have cli). Those deps lists include things like pandas, numpy, scipy, sklearn, etc. that are all way heavier than dvc (same with mlflow). My take: DVC is still changing frequently, so I would be wary to have a global install that could break my projects with updates or leave me with an old version forever. I think most DS would rather deal with dependency hell than risk a version change breaking the reproducibility of their project, so they install the latest version of dvc inside their venv at the start of each project and never touch it again (which the lack of binary installs seems to support). I'm not saying it's a good engineering practice, but this is what users do. |
I thought that most of them were including loggers out of the box, like this https://docs.wandb.ai/guides/integrations/lightning :
but from what I see wandb is quite lightweight anyways: https://github.com/wandb/wandb/blob/main/requirements.txt (no heavy DS/ML deps) For MLFlow there is https://pypi.org/project/mlflow-skinny/ . But I agree, that if people have to install it as a regular full mlfow with all extras - that's problematic as well. I doubt ppl are happy about this. (btw, I don't think that keras, tf, pytorch, etc - are good examples in this case. They are main frameworks, it's expected that they are heavy, might depend on each other, etc - it's natural. I would try to analyze callbacks, loggers in this case. If it becomes painful to install an extra logger that becomes painful and people might not have enough motivation) |
@shcheklein, any specific concerns regarding dvc's dependency? I know |
@skshetry a few concerns:
|
Recursive list of deps (click to expand)$ pipdeptree -p dvc -e dvc-http
Warning!! Cyclic dependencies found:
* dvc-http => dvc => dvc-http
* dvc => dvc-http => dvc
------------------------------------------------------------------------
dvc==2.33.3.dev137+g198daec42
- appdirs [required: >=1.4.3, installed: 1.4.4]
- colorama [required: >=0.3.9, installed: 0.4.6]
- configobj [required: >=5.0.6, installed: 5.0.6]
- six [required: Any, installed: 1.16.0]
- distro [required: >=1.3.0, installed: 1.8.0]
- dpath [required: >=2.0.2,<3, installed: 2.0.8]
- dvc-data [required: ==0.28.4, installed: 0.28.4]
- attrs [required: >=21.3.0, installed: 21.4.0]
- dictdiffer [required: >=0.8.1, installed: 0.9.0]
- diskcache [required: >=5.2.1, installed: 5.4.0]
- dvc-objects [required: ==0.14.0, installed: 0.14.0]
- fsspec [required: >=2022.10.0, installed: 2022.11.0]
- funcy [required: >=1.14, installed: 1.17]
- shortuuid [required: >=0.5.0, installed: 1.0.11]
- tqdm [required: >=4.63.1,<5, installed: 4.64.1]
- typing-extensions [required: >=3.7.4, installed: 4.4.0]
- funcy [required: >=1.14, installed: 1.17]
- nanotime [required: >=0.5.2, installed: 0.5.2]
- pygtrie [required: >=2.3.2, installed: 2.5.0]
- shortuuid [required: >=0.5.0, installed: 1.0.11]
- dvc-render [required: ==0.0.15, installed: 0.0.15]
- dvc-task [required: ==0.1.8, installed: 0.1.8]
- celery [required: >=5.2.0,<6, installed: 5.2.7]
- billiard [required: >=3.6.4.0,<4.0, installed: 3.6.4.0]
- click [required: >=8.0.3,<9.0, installed: 8.1.3]
- click-didyoumean [required: >=0.0.3, installed: 0.3.0]
- click [required: >=7, installed: 8.1.3]
- click-plugins [required: >=1.1.1, installed: 1.1.1]
- click [required: >=4.0, installed: 8.1.3]
- click-repl [required: >=0.2.0, installed: 0.2.0]
- click [required: Any, installed: 8.1.3]
- prompt-toolkit [required: Any, installed: 3.0.33]
- wcwidth [required: Any, installed: 0.2.5]
- six [required: Any, installed: 1.16.0]
- kombu [required: >=5.2.3,<6.0, installed: 5.2.4]
- amqp [required: >=5.0.9,<6.0.0, installed: 5.1.1]
- vine [required: >=5.0.0, installed: 5.0.0]
- vine [required: Any, installed: 5.0.0]
- pytz [required: >=2021.3, installed: 2022.6]
- vine [required: >=5.0.0,<6.0, installed: 5.0.0]
- funcy [required: >=1.17, installed: 1.17]
- kombu [required: >=5.2.0,<6, installed: 5.2.4]
- amqp [required: >=5.0.9,<6.0.0, installed: 5.1.1]
- vine [required: >=5.0.0, installed: 5.0.0]
- vine [required: Any, installed: 5.0.0]
- shortuuid [required: >=1.0.8, installed: 1.0.11]
- dvclive [required: >=1.2.2, installed: 1.2.2]
- dvc-render [required: >=0.0.12, installed: 0.0.15]
- ruamel.yaml [required: >=0.17.11, installed: 0.17.21]
- flatten-dict [required: >=0.4.1,<1, installed: 0.4.2]
- six [required: >=1.12,<2.0, installed: 1.16.0]
- flufl.lock [required: >=5, installed: 7.1.1]
- atpublic [required: >=2.3, installed: 3.1.1]
- psutil [required: >=5.9.0, installed: 5.9.4]
- funcy [required: >=1.14, installed: 1.17]
- grandalf [required: ==0.6, installed: 0.6]
- future [required: Any, installed: 0.18.2]
- pyparsing [required: Any, installed: 3.0.9]
- hydra-core [required: >=1.1.0, installed: 1.3.0]
- antlr4-python3-runtime [required: ==4.9.*, installed: 4.9.3]
- omegaconf [required: ~=2.2, installed: 2.2.3]
- antlr4-python3-runtime [required: ==4.9.*, installed: 4.9.3]
- PyYAML [required: >=5.1.0, installed: 5.4.1]
- packaging [required: Any, installed: 21.3]
- pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.9]
- iterative-telemetry [required: ==0.0.6, installed: 0.0.6]
- appdirs [required: Any, installed: 1.4.4]
- distro [required: Any, installed: 1.8.0]
- filelock [required: Any, installed: 3.8.2]
- requests [required: Any, installed: 2.28.1]
- certifi [required: >=2017.4.17, installed: 2022.9.24]
- charset-normalizer [required: >=2,<3, installed: 2.1.1]
- idna [required: >=2.5,<4, installed: 3.4]
- urllib3 [required: >=1.21.1,<1.27, installed: 1.26.13]
- networkx [required: >=2.5, installed: 2.8.8]
- packaging [required: >=19.0, installed: 21.3]
- pyparsing [required: >=2.0.2,!=3.0.5, installed: 3.0.9]
- pathspec [required: >=0.9.0,<0.10.0, installed: 0.9.0]
- psutil [required: >=5.8.0, installed: 5.9.4]
- pydot [required: >=1.2.4, installed: 1.4.2]
- pyparsing [required: >=2.1.4, installed: 3.0.9]
- pygtrie [required: >=2.3.2, installed: 2.5.0]
- pyparsing [required: >=2.4.7, installed: 3.0.9]
- requests [required: >=2.22.0, installed: 2.28.1]
- certifi [required: >=2017.4.17, installed: 2022.9.24]
- charset-normalizer [required: >=2,<3, installed: 2.1.1]
- idna [required: >=2.5,<4, installed: 3.4]
- urllib3 [required: >=1.21.1,<1.27, installed: 1.26.13]
- rich [required: >=10.13.0, installed: 12.6.0]
- commonmark [required: >=0.9.0,<0.10.0, installed: 0.9.1]
- pygments [required: >=2.6.0,<3.0.0, installed: 2.13.0]
- ruamel.yaml [required: >=0.17.11, installed: 0.17.21]
- scmrepo [required: ==0.1.4, installed: 0.1.4]
- asyncssh [required: >=2.7.1,<3, installed: 2.12.0]
- cryptography [required: >=3.1, installed: 38.0.3]
- cffi [required: >=1.12, installed: 1.15.1]
- pycparser [required: Any, installed: 2.21]
- typing-extensions [required: >=3.6, installed: 4.4.0]
- dulwich [required: >=0.20.46, installed: 0.20.50]
- urllib3 [required: >=1.25, installed: 1.26.13]
- fsspec [required: >=2021.7.0, installed: 2022.11.0]
- funcy [required: >=1.14, installed: 1.17]
- gitpython [required: >3, installed: 3.1.29]
- gitdb [required: >=4.0.1,<5, installed: 4.0.10]
- smmap [required: >=3.0.1,<6, installed: 5.0.0]
- pathspec [required: >=0.9.0,<0.10.0, installed: 0.9.0]
- pygit2 [required: >=1.10.0, installed: 1.11.1]
- cffi [required: >=1.9.1, installed: 1.15.1]
- pycparser [required: Any, installed: 2.21]
- pygtrie [required: >=2.3.2, installed: 2.5.0]
- shortuuid [required: >=0.5.0, installed: 1.0.11]
- shtab [required: >=1.3.4,<2, installed: 1.5.8]
- tabulate [required: >=0.8.7, installed: 0.9.0]
- tomlkit [required: >=0.11.1, installed: 0.11.6]
- tqdm [required: >=4.63.1,<5, installed: 4.64.1]
- typing-extensions [required: >=3.7.4, installed: 4.4.0]
- voluptuous [required: >=0.11.7, installed: 0.13.1]
- zc.lockfile [required: >=1.2.1, installed: 2.0]
- setuptools [required: Any, installed: 65.5.0]
$ pipdeptree -p dvc-http -e dvc
Warning!! Cyclic dependencies found:
* dvc-http => dvc => dvc-http
* dvc => dvc-http => dvc
------------------------------------------------------------------------
dvc-http==2.27.2
- aiohttp-retry [required: >=2.5.0, installed: 2.8.3]
- aiohttp [required: Any, installed: 3.8.3]
- aiosignal [required: >=1.1.2, installed: 1.3.1]
- frozenlist [required: >=1.1.0, installed: 1.3.3]
- async-timeout [required: >=4.0.0a3,<5.0, installed: 4.0.2]
- attrs [required: >=17.3.0, installed: 21.4.0]
- charset-normalizer [required: >=2.0,<3.0, installed: 2.1.1]
- frozenlist [required: >=1.1.1, installed: 1.3.3]
- multidict [required: >=4.5,<7.0, installed: 6.0.2]
- yarl [required: >=1.0,<2.0, installed: 1.8.1]
- idna [required: >=2.0, installed: 3.4]
- multidict [required: >=4.0, installed: 6.0.2]
- fsspec [required: Any, installed: 2022.11.0] Flattened list of dependencies (83 in total)
|
I think this is a good goal that we should see if we can work towards. In the short term, looking at issue like iterative/studio-support#73, I would suggest that we are better off making DVC a dependency of DVCLive (and removing DVCLive from DVC) until we are able to stop calling DVC as a lib. At least things will be more explicit, and if DVCLive is installed, DVC will also be installed as a lib and reduce the chances of things breaking. |
Workaround #397 for live experiments with `dvc exp run`. Allows to send live experiment updates when running inside `dvc exp run` and DVC is not available as a Python library.
Opened #481 , doing the opposite to the issue title 😅 but I think it is the best to do for now. |
We need to double check if we need to change VS Code or Studio now to install DVCLive explicitly. Also demo projects. |
Since we are discussing it, we should put DVCLive callbacks into the ML frameworks, and it shouldn't be dependent on this issue (none of the logger libs are installed as deps). |
Both example-get-started and example-get-started include dvclive explicitly already in their requirements. |
Closing in favor of iterative/dvc#9709 |
Originally posted by @shcheklein in #381 (comment)
Dvclive now depends on the dvc api for a few things like:
The text was updated successfully, but these errors were encountered: