Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

callbacks: Implement fsspec.callbacks #675

Merged
merged 21 commits into from
Jul 2, 2021

Conversation

isidentical
Copy link
Member

Resolves #668

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, a real review this time, so that we can move forward with this, which is certainly a good thing and improvement.

I find the API of the callback objects confusing. I think it makes sense to have concrete methods:

  • set_size(func, args). Note that the length of a list/set/bytes is always fast, so I wonder where being lazy helps
  • relative_update(val)
  • absolute_update(val)
  • normalise_path(path)
  • branch(path1, path2) ; I don't think we need the function too.
    rather than having to look up string values.

I assume you already have a tqdm callback in your code - can it be included? It should be possible to set the default callback to that instead of no-op. Or at least we should provide a dot-printing callback. If so, we need to try with all backends, to make sure we don't pass callback= to methods that don't expect it.

fsspec/implementations/reference.py Show resolved Hide resolved
fsspec/implementations/local.py Outdated Show resolved Hide resolved
fsspec/implementations/sftp.py Outdated Show resolved Hide resolved
fsspec/spec.py Outdated Show resolved Hide resolved
fsspec/spec.py Outdated Show resolved Hide resolved
fsspec/tests/test_async.py Show resolved Hide resolved
@isidentical
Copy link
Member Author

I find the API of the callback objects confusing. I think it makes sense to have concrete methods:

Being able to delegate other objects' methods into a self-contained object is something that I find useful, especially considering the objective here (being able to support multiple plug in and out different implementations). I can observe that using strings when calling is a bit weird, and error-prone (typos) (but it is very unlikely that you will miss out this, since callbacks tend to be very interacted with the overall UI) but I'd prefer them over restricted methods in fsspec itself (since callbacks might vary depending on the filesystem).

set_size(func, args). Note that the length of a list/set/bytes is always fast, so I wonder where being lazy helps

For example you might want to make a set_size call on a cloud provider where you don't have immediate access to the object details, so you would waste an API call even if the hook is not set (the same goes to making a stat call). I'd agree on len() is cheap compared to others (e.g a stat call).

branch(path1, path2) ; I don't think we need the function too.

Would you mind expanding this further? branch() is very useful, and very simplifying for certain codepaths (e.g put/get) but is not a callback method, just a utility.

I assume you already have a tqdm callback in your code - can it be included? It should be possible to set the default callback to that instead of no-op. Or at least we should provide a dot-printing callback. If so, we need to try with all backends, to make sure we don't pass callback= to methods that don't expect it.

I don't think fsspec should display a progress bar by default. It would break a lot of assumptions about the side effects of those methods.

@martindurant
Copy link
Member

Quick answers:

Would you mind expanding this further? branch() is very useful, and very simplifying for certain codepaths

It feels like a method, since it introspects the callbacks attributes and compares to a specific value of the callback

I don't think fsspec should display a progress bar by default.

Sorry, that's not what I suggested; I thought we should provide the no-op callback and at least one more. The user should be able to specify which callback is the default using the config or even a function. They would now have to set the global module attribute.

@isidentical
Copy link
Member Author

It feels like a method, since it introspects the callbacks attributes and compares to a specific value of the callback

But it is not a callback by itself (as well as normalize_path), so I feel like they should be separated from the callback object. They are simply convenience utilities for filesystem developers.

Sorry, that's not what I suggested; I thought we should provide the no-op callback and at least one more. The user should be able to specify which callback is the default using the config or even a function. They would now have to set the global module attribute.

Right, it is a private variable now but we might make it part of the filesystem config to make it accessible by the users (maybe not on this PR directly, since it is huge as is).

@isidentical
Copy link
Member Author

@martindurant I've tried to address all the general points, let me know if there are any more comments (or directions that you want this patch take). I'd really love to pass this before 2021.7.0.

@martindurant
Copy link
Member

I think this is ready (except for the small conflict that's come up), I'll give it a once-over tomorrow, likely will merge.

@martindurant
Copy link
Member

So, I'm going to merge this now, as it doesn't have downsides for anyone. However, I think it needs a lot of documentation and examples for usage. How do we actually get a TDQM set of progress bars from this?

I also pushed alternative code for you to consider, along the lines of my previous comments. To my (possibly old-fashioned brain), my version seems a lot easier to understand.

@martindurant martindurant merged commit 47f72bc into fsspec:master Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[IDEA] Implement progress callbacks
2 participants