.dvc file specific cache location #7622

Open
johan-sightic opened this issue Apr 25, 2022 · 7 comments
Labels
A: data-management (Related to dvc add/checkout/commit/move/remove), feature request (Requesting a new feature), p2-medium (Medium priority, should be done, but less important)

Comments

@johan-sightic

Scenario

I am working with a few datasets, some very large and some quite small, all tracked by DVC. I have set up a shared cache on a NAS to handle the large datasets, but there is no need to cache the smaller datasets on the NAS. And since I don't always have access to the NAS, I would like to cache the smaller ones locally.

Possible solution

Add an option to specify a cache location for a specific .dvc file, separate from the project-wide cache. For example, this could be another option on output entries, or the cache option could be modified to accept a path.
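
One way to picture the proposal, as a minimal sketch rather than anything DVC implements: each output entry could carry an optional cache path that takes precedence over the project-wide cache. The `cache_dir` key, the `resolve_cache_dir` helper, and all paths below are hypothetical names chosen for illustration; the dictionaries stand in for entries of the `outs` section of a .dvc file.

```python
from pathlib import Path
from typing import Any, Mapping


def resolve_cache_dir(out_entry: Mapping[str, Any], project_cache: Path) -> Path:
    """Pick the cache directory for one output entry.

    `cache_dir` is a hypothetical per-output key, not an existing DVC field.
    If it is absent or empty, the project-wide cache is used.
    """
    override = out_entry.get("cache_dir")
    if override:
        return Path(override)
    return project_cache


# Assumed locations: a shared NAS cache and a local cache for small data.
project_cache = Path("/mnt/nas/dvc-cache")
small = {"path": "small_dataset", "cache_dir": "/home/user/.cache/dvc-local"}
large = {"path": "large_dataset"}

print(resolve_cache_dir(small, project_cache))  # per-output override
print(resolve_cache_dir(large, project_cache))  # falls back to project-wide cache
```

With a rule like this, the large dataset keeps using the NAS cache, while the small dataset stays usable when the NAS is unreachable.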

@johan-sightic
Author

johan-sightic commented Apr 25, 2022

A similar feature has already been implemented for remotes: #6486

karajan1001 added the A: data-management and feature request labels Apr 25, 2022
dberenbaum added the p2-medium label Apr 25, 2022
@johan-sightic
Author

Any updates on this?

@efiop
Member

efiop commented Feb 3, 2023

@johan-sightic No updates, unfortunately. We do not plan on implementing this ourselves any time soon.

@johan-sightic
Author

@efiop I just found dvc-in-subdirectories which I think I can use instead

@johan-sightic
Author

johan-sightic commented Jan 5, 2024

I guess you still have no plans to implement this? It is still a big pain for us, and the subdir solution and other hacks we are trying are far from optimal.

I have been trying to implement this myself but it is not easy to get into the code at this point. Could I have some pointers on what steps to take and which files to modify?

@efiop
Member

efiop commented Jan 5, 2024

@johan-sightic Sorry, no plans from our side :( We are a small team and have our own priorities that we are trying to keep up with at this moment.

Regarding pointers, I'm sorry I can't slice it up fine enough to name particular files, but I would recommend looking into how something simple like dvc checkout works and going from there. The key point these days is really Index and DataIndex that we build as a part of it. See them and _load_storage_from_outs in dvc/repo/index.py. Also note that we already support a remote per output (see remote in https://dvc.org/doc/user-guide/project-structure/dvc-files), which should be fairly similar to the cache per output that you are trying to achieve.
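
To make the analogy with the per-output remote a bit more concrete, here is a rough sketch, not DVC's actual code: a toy storage map that records, for each output, which cache and which remote apply, falling back to project-wide defaults. The `remote` field on an output is the existing feature mentioned above; the `cache_dir` field, the `Storage` class, `build_storage_map`, and the example names are all hypothetical and do not reflect the real Index/DataIndex internals.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Mapping, Optional


@dataclass(frozen=True)
class Storage:
    """Where one output's data lives (illustrative only, not a DVC class)."""
    cache: str
    remote: Optional[str]


def build_storage_map(
    outs: Iterable[Mapping[str, Any]],
    default_cache: str,
    default_remote: Optional[str] = None,
) -> Dict[str, Storage]:
    """Map each output path to its storage, applying per-output overrides."""
    storage_map: Dict[str, Storage] = {}
    for out in outs:
        storage_map[out["path"]] = Storage(
            # Hypothetical per-output cache, resolved like the remote below.
            cache=out.get("cache_dir", default_cache),
            # Per-output remote name, falling back to the default remote.
            remote=out.get("remote", default_remote),
        )
    return storage_map


outs = [
    {"path": "large_dataset", "remote": "nas-remote"},
    {"path": "small_dataset", "cache_dir": ".dvc/local-cache"},
]
storage_map = build_storage_map(
    outs, default_cache="/mnt/nas/dvc-cache", default_remote="origin"
)
for path, storage in storage_map.items():
    print(path, storage)
```

The point of the sketch is only that the cache override could be resolved at the same place, and in the same way, as the remote override.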
