
objects: hash_file: retrieve size from hash_info#6284

Merged
efiop merged 2 commits into treeverse:master from isidentical:speed-up-size-computing on Jul 6, 2021


@isidentical
Contributor

Resolves #6283. I am still not sure whether we should keep the getsize() logic at all, since this information should be propagated from an upper level (e.g. staging, which in this case fills the HashInfos). Retrieving it lazily is very bad for performance, especially when there are a lot of files.
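To illustrate the idea (not the actual diff in this PR), here is a minimal sketch of preferring a size that was already recorded upstream and only falling back to a lazy lookup when it is missing. `HashInfoSketch`, `resolve_size`, and `counting_getsize` are hypothetical names made up for this example; they are not DVC's classes or functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HashInfoSketch:
    """Hypothetical stand-in for a hash record that may already carry a size."""

    name: str
    value: str
    size: Optional[int] = None


def resolve_size(
    hash_info: HashInfoSketch, path: str, getsize: Callable[[str], int]
) -> int:
    """Return the recorded size if present, otherwise look it up lazily."""
    if hash_info.size is not None:
        # Size was filled in at an upper level (e.g. staging): no extra call.
        return hash_info.size
    # Fallback: one lookup per file, which is an API call on a remote.
    return getsize(path)


# Fake lookup that records every call so the saved work is visible.
calls: List[str] = []


def counting_getsize(path: str) -> int:
    calls.append(path)
    return 1024


staged = HashInfoSketch("md5", "abc", size=1024)  # size already known
unstaged = HashInfoSketch("md5", "def")           # size missing

size_a = resolve_size(staged, "data/a", counting_getsize)
size_b = resolve_size(unstaged, "data/b", counting_getsize)
print(size_a, size_b, calls)
```

Only the record without a size triggers the lookup, so with fully populated records the per-file call disappears entirely.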

@isidentical isidentical requested a review from a team as a code owner July 5, 2021 12:17
@isidentical isidentical requested review from efiop and skshetry July 5, 2021 12:17
@isidentical
Contributor Author

Command (100 x 1kb): time dvc import-url -vv s3://dvc-temp/batuhan-test/data --to-remote (master)

Master: 2m10.225s (2m10.564s, 2m11.069s, 2m10.225s)
This PR: 0m57.127s (0m57.318s, 0m57.127s, 0m57.685s)

The difference depends heavily on the number of files, since we perform an API call for each file.
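As a rough back-of-the-envelope check on the numbers above, the sketch below converts the best run of each variant to seconds and divides the saving by the file count. It assumes that "100 x 1kb" means 100 files and that the whole difference comes from the per-file call, which is a simplification; `to_seconds` is a helper made up for this example.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time`-style XmY.YYYs reading to seconds."""
    return minutes * 60 + seconds


master = to_seconds(2, 10.225)   # best run on master
this_pr = to_seconds(0, 57.127)  # best run with this PR
n_files = 100                    # assumption: "100 x 1kb" is 100 files

saved = master - this_pr         # total time saved
per_file = saved / n_files       # saving attributed to each file
speedup = master / this_pr       # relative speedup

print(f"saved {saved:.3f}s, {per_file:.3f}s per file, {speedup:.2f}x faster")
```

Under those assumptions the saving works out to roughly 0.73 s per file, which is why the gap grows with the number of files.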

@isidentical isidentical force-pushed the speed-up-size-computing branch from 47d77c4 to 8707aaa Compare July 6, 2021 07:06
@isidentical isidentical requested a review from efiop July 6, 2021 07:24
@efiop efiop merged commit 98f7d1e into treeverse:master Jul 6, 2021
@efiop efiop added the optimize Optimizes DVC label Jul 6, 2021

Successfully merging this pull request may close these issues.

import-url: unresponsive wait
