index: introduce fetch #341

Closed
efiop opened this issue Apr 2, 2023 · 3 comments · Fixed by #346

@efiop
Member

efiop commented Apr 2, 2023

We currently have a junky version of fetch based on odb that is not used anywhere. It was part of early experiments (not dvc exp) and is no longer needed.

In dvc fetch we currently do two things:

  1. collect and transfer objects from regular outputs
  2. download files to a temp location using an index built out of imports

We need to take 2), make it dedup based on source fs/path, and download the data into a temporary location (note that we are not talking about reproducing the structure of the indexes there, just stashing the data somewhere). This will allow us to download data optimally across different indexes (e.g. across different git revisions), which also means that fetch should probably accept multiple indexes rather than just one. It should probably also update storage_info.data as a result.
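To illustrate the dedup idea, here is a minimal sketch in plain Python. It is not the dvc-data API: `Entry` and `collect_unique_sources` are hypothetical names made up for this example, and each index is modeled as a simple list of entries. The point is only that entries from several indexes get grouped by (fs, path), so each source would be downloaded once no matter how many indexes reference it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for an index entry that points at remote data."""

    key: Tuple[str, ...]  # location of the entry inside its own index
    fs: str               # identifier of the source filesystem, e.g. "s3"
    path: str             # path of the data on that filesystem


def collect_unique_sources(
    indexes: Iterable[Iterable[Entry]],
) -> Dict[Tuple[str, str], List[Tuple[str, ...]]]:
    """Group entries from all indexes by (fs, path).

    Each dict key is one thing to download; the value lists every
    index key that wants that data.
    """
    sources: Dict[Tuple[str, str], List[Tuple[str, ...]]] = {}
    for index in indexes:
        for entry in index:
            sources.setdefault((entry.fs, entry.path), []).append(entry.key)
    return sources


# Two indexes (think: two git revisions) that share one imported file.
rev_a = [
    Entry(("data", "train.csv"), "s3", "bucket/train.csv"),
    Entry(("data", "test.csv"), "s3", "bucket/test.csv"),
]
rev_b = [
    Entry(("data", "train.csv"), "s3", "bucket/train.csv"),
]

sources = collect_unique_sources([rev_a, rev_b])
```

Here three entries across the two indexes collapse into two unique sources, so `bucket/train.csv` would only need to be transferred once.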

Related https://github.com/iterative/studio/issues/4782

@pmrowla pmrowla self-assigned this Apr 4, 2023
@efiop
Member Author

efiop commented Apr 9, 2023

Note that deduping by fs effectively means creating an index for a storage containing the files we are interested in and then downloading (aka caching) it locally in some location. For ObjectStorages we use object storages (aka .dvc/cache), but for FileStorages we would logically use a cloned/rsynced/etc local directory.

Ideally we would also avoid having to modify the existing indexes that are passed to us. Maybe we could do that by letting each Storage have its own cache, so that when we try to get data from the storage, it reaches into the cache first. There might be better ways to handle this too.
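A rough sketch of the "storage with its own cache" idea, again in plain Python rather than the real dvc-data classes. `CachedStorage` is a hypothetical name, and both the remote and the cache are modeled as dicts purely for illustration. Reads check the cache first and only fall back to the remote on a miss, so the indexes that reference the storage never need to be modified.

```python
from typing import Dict, Optional


class CachedStorage:
    """Hypothetical storage wrapper: reads try the local cache before the remote."""

    def __init__(
        self,
        remote: Dict[str, bytes],
        cache: Optional[Dict[str, bytes]] = None,
    ) -> None:
        self.remote = remote                          # stand-in for the source fs
        self.cache = {} if cache is None else cache   # stand-in for a local cache
        self.remote_reads = 0                         # counts how often we hit the remote

    def fetch(self, path: str) -> None:
        """Make sure `path` is in the cache, downloading it only if missing."""
        if path not in self.cache:
            self.cache[path] = self.remote[path]
            self.remote_reads += 1

    def get(self, path: str) -> bytes:
        """Return the data for `path`, reaching into the cache first."""
        self.fetch(path)
        return self.cache[path]


storage = CachedStorage(remote={"bucket/train.csv": b"a,b\n1,2\n"})
first = storage.get("bucket/train.csv")   # cache miss, read from the remote
second = storage.get("bucket/train.csv")  # served from the cache
```

With this shape, fetch only has to populate each storage's cache, and anything that later asks the storage for data picks up the cached copy transparently.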

@dberenbaum
Contributor

@efiop Could you give a high-level example? Is it mostly about imports and other external data?

@efiop efiop self-assigned this Apr 10, 2023
@efiop
Member Author

efiop commented Apr 10, 2023

@dberenbaum This is about all the data management that we have in dvc, so that we can get rid of the get_used_objs stuff and have all manipulations (like filtering by size, etc.) in one place.

@pmrowla pmrowla removed their assignment Apr 11, 2023
efiop added a commit to efiop/dvc-data that referenced this issue Apr 14, 2023
efiop added a commit to efiop/dvc-data that referenced this issue May 5, 2023
@efiop efiop closed this as completed in #346 May 6, 2023
efiop added a commit that referenced this issue May 6, 2023
index: introduce fetch

Fixes #341
efiop added a commit to efiop/dvc-data that referenced this issue May 8, 2023
efiop added a commit that referenced this issue May 8, 2023