Fix --bf option for bioformats2raw 0.3.0 #75
- Remove dropped --file-type option
- Handle arbitrary number of options
- Do not create target directory
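As a rough illustration of the second and third commits, the sketch below assembles a bioformats2raw command line from an arbitrary list of extra options without creating the target directory beforehand. The function name `build_bf2raw_command` and its parameters are hypothetical and are not taken from the plugin code; the options in the usage line are only examples and are passed through untouched.

```python
from pathlib import Path
from typing import List, Sequence, Union


def build_bf2raw_command(
    source: Union[str, Path],
    target: Union[str, Path],
    extra_options: Sequence[str] = (),
) -> List[str]:
    """Return the argument list for a bioformats2raw call (hypothetical helper).

    Any number of extra options is appended verbatim. The target directory
    is deliberately NOT created here; it is left to the converter itself.
    """
    return ["bioformats2raw", str(source), str(target), *extra_options]


# Example: forward two options unchanged (option names are illustrative).
command = build_bf2raw_command(
    "image.ome.tif", "image.zarr", ["--tile_width", "1024"]
)
print(command)
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without going through a shell.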
As an initial data point for discussion, together with IDR/deployment#343 and https://github.com/openmicroscopy/management_tools/pull/1458, I tested this PR against https://idr.openmicroscopy.org/webclient/img_detail/8343617 i.e. a fairly large pixel volume with minimal metadata overhead
This gives a two-fold improvement in the conversion speed. Interestingly, the generated sizes are different.
Looking at the internal structure, the most noticeable difference might come from the chunk size.
(base) [sbesson@pilot-zarr1-dev ~]$ diff -wu /data/ExpD_chicken_embryo_stitched.ome.tif_default/0/0/.zarray /data/8343617.zarr/0/.zarray
--- /data/ExpD_chicken_embryo_stitched.ome.tif_default/0/0/.zarray 2021-08-12 12:39:20.579408467 +0000
+++ /data/8343617.zarr/0/.zarray 2021-08-12 13:19:01.616102479 +0000
@@ -3,17 +3,18 @@
1,
1,
1,
- 1024,
- 1024
+ 4491,
+ 3540
],
"compressor" : {
- "clevel" : 5,
"blocksize" : 0,
- "shuffle" : 1,
+ "clevel": 5,
"cname" : "lz4",
- "id" : "blosc"
+ "id": "blosc",
+ "shuffle": 1
},
- "dtype" : ">u2",
+ "dimension_separator": "/",
+ "dtype": "<u2",
"fill_value" : 0,
"filters" : null,
"order" : "C",
@@ -24,6 +25,5 @@
4491,
3540
],
- "zarr_format" : 2,
- "dimension_separator" : "/"
+ "zarr_format": 2
}
\ No newline at end of file
Adjusting the export command to use the same chunk size reduces the execution time further.
Interestingly, the dataset sizes are unchanged (70G vs 52G).
Possibly a difference in compression efficiency between the Java and Python implementations?
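Since the textual diff also reports differences in key order and spacing, a semantic comparison of the two .zarray files can help isolate what really differs. The following is a minimal sketch using only the standard library; the helper names `load_zarray` and `zarray_diff` are hypothetical, and the two dictionaries only reproduce the fields visible in the diff.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple


def load_zarray(path: str) -> Dict[str, Any]:
    """Load a .zarray metadata file as a dictionary."""
    return json.loads(Path(path).read_text())


def zarray_diff(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return the keys whose values differ, ignoring key order and formatting."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Fields copied from the diff above (other fields omitted for brevity).
first = {
    "chunks": [1, 1, 1, 1024, 1024],
    "compressor": {"clevel": 5, "blocksize": 0, "shuffle": 1,
                   "cname": "lz4", "id": "blosc"},
    "dtype": ">u2",
    "zarr_format": 2,
    "dimension_separator": "/",
}
second = {
    "chunks": [1, 1, 1, 4491, 3540],
    "compressor": {"blocksize": 0, "clevel": 5, "cname": "lz4",
                   "id": "blosc", "shuffle": 1},
    "dimension_separator": "/",
    "dtype": "<u2",
    "zarr_format": 2,
}

for key, (left, right) in zarray_diff(first, second).items():
    print(f"{key}: {left} != {right}")
```

With real files, `zarray_diff(load_zarray(path_a), load_zarray(path_b))` would apply the same comparison. On the fields shown here, only the chunk shape and the byte order of the dtype differ; the compressor settings are identical once key order is ignored.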
Leaving out some of the wider issues for a moment, the 0.3-compatible changes as they stand all look good.
self.ctx.err(stderr)
else:
self.ctx.out(f"Image exported to {target.resolve()}")
self.ctx.err(stderr.decode("utf-8"))
Is the err printing removed because the output of bioformats2raw is now so much more limited?
No, the problem is that there is always some output sent to stderr by the utility (the standard OpenJDK 64-Bit Server VM warning: You have loaded library ...), so the Image exported... message would be unconditionally skipped.
I will look into reinstating a conditional behavior based on process.returncode.
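A conditional behavior based on the return code could look roughly like the sketch below. It is only an illustration under assumptions: `run_and_report` is a hypothetical helper, and it returns a status and a message where the plugin would presumably call `self.ctx.out` or `self.ctx.err`. The child process here is a small Python one-liner that stands in for a converter which writes a warning to stderr but still succeeds.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_and_report(command: Sequence[str], target: str) -> Tuple[bool, str]:
    """Run a command and decide success from the exit code, not from stderr."""
    process = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    stderr = process.stderr.decode("utf-8")
    if process.returncode != 0:
        # Only a non-zero exit code is treated as a failure.
        return False, f"Export failed (exit code {process.returncode}):\n{stderr}"
    # Output on stderr alone (e.g. a JVM warning) does not hide the success message.
    return True, f"Image exported to {target}"


# Stand-in for a tool that warns on stderr but exits with code 0.
noisy_success = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('warning: loaded library'); sys.exit(0)",
]
ok, message = run_and_report(noisy_success, "image.zarr")
print(ok, message)
```

Keying on the exit code keeps the success message visible even when the tool always emits a warning on stderr.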
Tested the last commit with successive exports:
👍
From a quick discussion with @joshmoore, I would propose to get this merged and look into the unification of the Zarr layouts as the next step using Given the
The breaking changes of bioformats2raw 0.3.0 have broken omero zarr export --bf. These commits provide the minimal set of changes to restore the command functionality. This currently only supports bioformats2raw 0.3.0 and drops support for 0.2.x, but see below for the discussion on the flag support.

Tested on pilot-zarr1-dev with omero --debug DEBUG zarr export --bf Image:13422206.

This initial work raises a few high-level questions on the future of this option. As a general rule, most of the support and testing has been added to the OMERO-only export workflow. This means most of the nice recent features do not apply to the --bf option, like the rendering settings in the Zarr metadata, the HCS support... Additionally, the layout of the OMERO export is different from the bioformats2raw export, since the latter works at the fileset level rather than the image level.

I can conceive two general export strategies:
- omero zarr export command with two export paths, i.e. OMERO API vs bioformats2raw
- bioformats2raw first, followed by a second command that would 1- update the layout, 2- enrich the OME-Zarr metadata. This workflow is what https://github.com/IDR/idr-zarr-tools/blob/master/merge.py currently aims at doing. Arguably this could be generalized as a subcommand of this plugin, e.g. omero zarr update, as the code will be effectively duplicated.