Optimize zarr multiscale writing based on #237 #257
Conversation
Thanks for this contribution. Could you show how you're using this? Is it only used for … ?

Also, it would be great to have this parameter tested - I realise that we don't have …

> ome-zarr-py/tests/test_writer.py, line 149 in 2c4d489

A test similar to that which tested the … Would it make sense to add this same parameter to … ?
Hello @will-moore, thanks for the prompt response. Yes, I can give you some examples of how we're using this feature to write datasets in our team: example 1 and example 2. For both examples, we are currently using … As for the testing, I agree, it is absolutely necessary to test this parameter. As I pointed out in this comment, we're using … Lastly, I believe it makes sense to add this parameter to the … as well.
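The usage pattern being discussed - a writer that either executes its work eagerly or hands the deferred jobs back to the caller - can be sketched with a stdlib-only stand-in. In the actual PR the jobs are dask delayed objects returned by the `write_*` functions; `write_levels` below is a hypothetical simplification, not the library's API.

```python
def write_levels(levels, compute=True):
    """Pretend to write each pyramid level, following the dask-style
    `compute` convention discussed in this PR.

    compute=True  -> run the work eagerly, return an empty job list.
    compute=False -> return zero-argument jobs for the caller to run
                     (or schedule/batch) later.
    """
    store = {}
    # One deferred job per resolution level; sum() stands in for the
    # actual downscale-and-write step.
    jobs = [lambda i=i, lvl=lvl: store.setdefault(i, sum(lvl))
            for i, lvl in enumerate(levels)]
    if compute:
        for job in jobs:
            job()
        return store, []
    return store, jobs


# Eager path: data is "written" immediately, nothing left to run.
store, jobs = write_levels([[1, 2], [3, 4]], compute=True)
assert jobs == [] and store == {0: 3, 1: 7}

# Deferred path: nothing is written until the caller runs the jobs.
store, jobs = write_levels([[1, 2], [3, 4]], compute=False)
assert store == {} and len(jobs) == 2
for job in jobs:
    job()
assert store == {0: 3, 1: 7}
```

With dask, the deferred branch would return `dask.delayed` objects and the caller would execute them with `dask.compute(*jobs)`, choosing the scheduler and batching that suit a large dataset.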
```python
if not len(dask_delayed_jobs):
```
Maybe before this line it makes sense to `assert compute == (len(dask_delayed_jobs) > 0)`?
Yes, I agree. I tried to follow the dask convention for the `compute` flag: when `compute` is False we return the delayed jobs, and assert `not compute == len(dask_delayed_jobs)`.
Codecov Report

Patch coverage: …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #257      +/-   ##
==========================================
+ Coverage   84.79%   84.84%   +0.05%
==========================================
  Files          13       13
  Lines        1473     1485      +12
==========================================
+ Hits         1249     1260      +11
- Misses        224      225       +1
```

☔ View full report in Codecov by Sentry.
Looks good. That's all from me 👍
Hello @will-moore! I just wanted to follow up on this PR. Do I have to do anything else to get the branch merged with main?
Hi - apologies for the delay - I have assigned this to be included in the next release, but I wanted to get feedback from others on the OME team (some of us are away just now - Easter etc). But not forgotten! - cc @joshmoore @sbesson.
As a preamble, I am not actively using the ome-zarr Python library and have even less input on the dask functionality. I am unsure whether the maintenance responsibilities for this repository have been redefined. If not, I would suggest a member of the academically-funded OME team add their review.

I don't have much to add to this PR beyond the fact that the proposed API addition feels simple enough. My only suggestion is that the new return value for each of the write methods might be worth communicating in the docstring (and hence the generated readthedocs). In particular, my understanding is that `compute` will be a no-op and the return list will be `[]` if the image is a simple numpy array - is that correct? If so, it might be worth clarifying the expectation for consumers.
Thanks for this, @camilolaiton. Definitely like the idea and it seems straight-forward enough. The only question I have is whether or not there is any duck-typing (and/or wrapping) to be considered on the return type to improve the future-proofing. It's likely not a particularly critical issue at this point so I wouldn't want us to get into over-engineering, but if there is a common idiom out there (e.g. from https://data-apis.org/) then I could see adding that here. Alternatively, we (continue to) go all in on Dask as part of the public API. Does anyone have any thoughts?
I totally agree. I'll add this to the docstring.
Yes, I get your point. However, I think it's not strictly necessary to add a wrapper for the return value here, as it would probably make things more complicated than they should be. I think going all in on Dask is better at this point for the public API.
Resolved conflicts and build now running.

Everything is green. If there are no other comments, I'd propose to get this shipped as 0.7.0. We may eventually re-consider the all-in-on-dask decision, but introducing a new interface at this stage seems tricky at best.

Taking that as confirmation and releasing. Thanks again, @camilolaiton!
Note that this might have broken the conda-forge build:
see conda-forge/ome-zarr-feedstock#12 for the failing build
Please consider this PR as a possible solution to #237. It is more convenient to let the user decide how and when to compute the dask delayed objects; this is certainly useful when working with large image datasets.