Add summary to dvc status -c [target] #3568

Closed
AljoSt opened this issue Apr 1, 2020 · 7 comments
Labels: enhancement (Enhances DVC), feature request (Requesting a new feature), p2-medium (Medium priority, should be done, but less important)

Comments

@AljoSt

AljoSt commented Apr 1, 2020

To estimate how long a pull is going to take, it would be good if dvc status -c [target] supplied a summary in the form: Total: N files, X GB.
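A minimal sketch of the requested summary line, assuming the per-file sizes in bytes are already known. The name format_summary, the use of decimal gigabytes (10^9 bytes), and the two-decimal rounding are illustrative assumptions, not part of DVC:

```python
def format_summary(sizes_in_bytes):
    """Build a 'Total: N files, X GB' line from an iterable of sizes in bytes.

    Hypothetical helper: the name, the decimal-GB unit, and the rounding
    are assumptions made for illustration only.
    """
    sizes = list(sizes_in_bytes)
    total_gb = sum(sizes) / 1e9  # decimal gigabytes
    return f"Total: {len(sizes)} files, {total_gb:.2f} GB"


if __name__ == "__main__":
    print(format_summary([1_500_000_000, 500_000_000]))
```

How the sizes would be obtained from the remote is the open question discussed in the rest of the thread.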

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Apr 1, 2020
@skshetry skshetry added enhancement Enhances DVC feature request Requesting a new feature good first issue p3-nice-to-have It should be done this or next sprint labels Apr 1, 2020
@triage-new-issues triage-new-issues bot removed triage Needs to be triaged labels Apr 1, 2020
@AljoSt
Author

AljoSt commented Apr 2, 2020

I had a brief look into this, but I'm not quite sure where to start. I guess we would have a method here that adds the file size for each object, which would then be summarized and printed here? I'm not sure how to get the file size, though.

@shcheklein
Member

@AljoSt this is a really great feature request. You pointed to the right places to make changes. The biggest question is how we get file sizes. We don't have that functionality in the remote class right now, only the exists method and list all, which check whether files exist remotely or return all files. Neither of them currently provides any additional info, so we would probably need to "teach" them to return more.

I suppose we can start with one of the remotes as a trial. What do you think, @efiop?

@AljoSt which remote storage are you most interested in?

@AljoSt
Author

AljoSt commented Apr 3, 2020

@shcheklein I'm currently using S3

@shcheklein
Member

@AljoSt so, in the case of S3 (check the s3.py file) we use s3.head_object and list_objects. I think both of them return sizes, so we can extend them to return file sizes (we might need to rename exists or add a second function).

Suggested TODO (@efiop please review :) ):

  • Check that s3.head_object returns actual object size (even if it's a multi-part upload).
  • Check that s3.list_objects and s3.list_objects_v2 return sizes. Again, we need to be sure that they return them properly in the multipart case.
  • Create a function like get_file_size that returns the size of the object if it exists.
  • Modify
  • Use get_file_size in the status check in the remote base
  • Modify cache_exists (and helpers it calls) in the remote base to deal with sizes.
  • Modify status itself to collect and print sizes.

If I'm not missing anything in this high-level picture, we can start with the checks and move forward. I'll try to provide more details and help with the discussion.
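The first three TODO items could be sketched roughly as follows. This is a hedged illustration, not DVC's remote code: it assumes a boto3-style S3 client is passed in, and list_file_sizes is a hypothetical helper name. The error-code check is a best guess at how a missing object is reported, and the sketch does not settle the multipart question raised above.

```python
def get_file_size(client, bucket, key):
    """Return the object's size in bytes, or None if it does not exist.

    Assumes `client` behaves like a boto3 S3 client, whose head_object
    response carries the size in "ContentLength".
    """
    try:
        response = client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # botocore's ClientError exposes the error code via exc.response;
        # treating these codes as "missing" is an assumption.
        error = (getattr(exc, "response", None) or {}).get("Error", {})
        if error.get("Code") in ("404", "NoSuchKey", "NotFound"):
            return None
        raise
    return response["ContentLength"]


def list_file_sizes(client, bucket, prefix=""):
    """Map every key under `prefix` to its size in bytes (hypothetical helper)."""
    sizes = {}
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            sizes[obj["Key"]] = obj["Size"]
    return sizes
```

With a real client this would be called as get_file_size(boto3.client("s3"), bucket, key). Taking the client as a parameter keeps the functions easy to exercise with a stub while the size checks are still being verified.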

@shcheklein shcheklein mentioned this issue Apr 7, 2020
3 tasks
@shcheklein shcheklein added p2-medium Medium priority, should be done, but less important and removed good first issue p3-nice-to-have It should be done this or next sprint labels Aug 26, 2020
@h3ndrk

h3ndrk commented Aug 26, 2020

@shcheklein We are using an SSH remote and would like to see the sizes in advance. I don't have enough knowledge of the code, but at a high level, might it be possible for SSH to determine the sizes of the files in the remote cache?

@shcheklein
Member

@h3ndrk yes, it should be possible to implement this with SSH. In fact, it should be pretty similar to other remote types at the code level, though some very specific details might differ (the actual command to get a file size is different on SSH and S3, for example).

Thanks for mentioning this. Just for context, we had a discussion on Discord regarding this:

https://discordapp.com/channels/485586884165107732/563406153334128681/748050741989474364
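For the SSH case, a size lookup might look like the sketch below. It assumes an SFTP client object in the style of paramiko's SFTPClient, whose stat() result exposes st_size. Whether DVC's SSH remote exposes such an object is an assumption, and get_remote_file_size is a hypothetical name.

```python
def get_remote_file_size(sftp, path):
    """Return the size in bytes of `path` on the SSH remote, or None if missing.

    Assumes `sftp` offers a paramiko-style stat(path) returning an object
    with an st_size attribute.
    """
    try:
        return sftp.stat(path).st_size
    except FileNotFoundError:
        # Assumption: a missing remote file surfaces as FileNotFoundError.
        return None
```

The shape mirrors the S3 idea from earlier in the thread: one call per object that yields either a size or "not there", so the status code could treat both remote types alike.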

@efiop
Member

efiop commented Oct 8, 2021

Closing in favor of #4682

status is migrating to an object-based workflow, so we might introduce some info like that as well later.

@efiop efiop closed this as completed Oct 8, 2021

5 participants