Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gs: support progress callback #1566

Closed
mhham opened this issue Feb 1, 2019 · 13 comments · Fixed by #2809
Closed

gs: support progress callback #1566

mhham opened this issue Feb 1, 2019 · 13 comments · Fixed by #2809
Assignees
Labels
enhancement Enhances DVC good first issue help wanted ui user interface / interaction

Comments

@mhham
Copy link
Contributor

mhham commented Feb 1, 2019

I may have missed an issue concerning this, but when I am dvc pushing to a google cloud remote, the progress bars that are currently displayed don't get updated progressively (They go from 0 when starting to 100 when finished).

It would be nice to have a dynamic progressbar with upload speed and remaining time estimation, especially when push huge sets of data.

@efiop
Copy link
Contributor

efiop commented Feb 1, 2019

Hi @mhham !

Indeed, callback for google cloud storate push/pull is not implemented yet. Let's keep this issue open as a reminder. Thank you for the feedback! 🙂

@efiop efiop changed the title Progressbar (with speed?) when using dvc pull or dvc push with cloud remote gs: support progress callback Feb 25, 2019
@casperdcl casperdcl added the ui user interface / interaction label Jul 28, 2019
@casperdcl casperdcl self-assigned this Jul 28, 2019
@casperdcl
Copy link
Contributor

casperdcl commented Jul 28, 2019

Problem is there's no current API.

google.cloud.storage.bucket.Bucket.:

Possibly could work around this by using upload_from_file and download_to_file with our own custom file objects which have progress for read() and write()

@ghost
Copy link

ghost commented Nov 13, 2019

Looks good, let me look at it.

@ghost
Copy link

ghost commented Nov 17, 2019

Let's focus on the downloading first.. .

with our own custom file objects

What do you mean @casperdcl ? Could you provide a link to that? I only found this

dvc/dvc/api.py

Line 56 in 38c2100

def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):

which doesn't seems to be helpful.

I think the more appropriate approach would be to use this google-resumable-media as it's recommended

For more fine-grained control over the download process, check out google-resumable-media. For example, this library allows downloading parts of a blob rather than the whole thing.

here.

I will check if using download_to_file with start & end gives the similar result to google-resumable-media. If so, we could avoid adding a new dependency.

@ghost
Copy link

ghost commented Nov 17, 2019

.. with our own custom file objects which have progress for read() and write()

Ok, got it. I think @casperdcl meant the custom file object is to be written as a part of this task.

@casperdcl
Copy link
Contributor

Based on the resumable media link it seems like the proposed download_to_file approach would be just as complicated to implement really. If you have issues though feel free to open an incomplete PR and I can take a look.

@casperdcl
Copy link
Contributor

My main problem is testing - @efiop how would you recommended setting up local testing for google cloud storage push/pull?

@efiop
Copy link
Contributor

efiop commented Nov 17, 2019

@casperdcl I'm not aware of any local implementation that you could run on your machine 🙁 So far I was just using real GS to test things out.

@ghost
Copy link

ghost commented Nov 17, 2019

@casperdcl in meantime i started to play a bit with your concept of custom file object and did this very limited PoC (since it's PoC I used a simple file in the root of the dir & unittests)
https://github.com/iterative/dvc/compare/master...xliiv:progress?expand=1

It's for StringIO, so some work needed to adjust it to our needs, but looks promising.

Maybe tqdm should expose objects like the ProgressStringIO?

@efiop
Copy link
Contributor

efiop commented Nov 17, 2019

@xliiv A side note: Please use pytest-style tests for new tests. We do have some unittest tests as a legacy, but all new ones are written with pytest.

Also, why StringIO and not BytestIO?

@ghost
Copy link

ghost commented Nov 17, 2019

@efiop sure, i'll be using pytest.

Also, why StringIO and not BytestIO?

It is PoC and I felt it would be simpler to use StringIO. I'll convert it to BytesIO.

@casperdcl
Copy link
Contributor

casperdcl commented Nov 17, 2019

So far I was just using real GS to test things out.

Ok @efiop what's the easiest way to set up real GS? I mean example scripts to

  1. add a remote (ideally one which already exists and doesn't need credentials if possible)
  2. push
  3. pull
  4. repeat steps 2 & 3 each time a code change is made while fixing this issue

@casperdcl casperdcl self-assigned this Nov 17, 2019
casperdcl added a commit to casperdcl/dvc that referenced this issue Nov 17, 2019
casperdcl added a commit to casperdcl/dvc that referenced this issue Nov 17, 2019
@efiop
Copy link
Contributor

efiop commented Nov 17, 2019

@casperdcl The easies (and the only way afaik) is to go to https://cloud.google.com/storage/ , register, create a project and use it :) That will surely take more time than writing this patch though, so could totally try it out for you, as the patch is simple enough anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC good first issue help wanted ui user interface / interaction
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants