Fix --bf option for bioformats2raw 0.3.0 #75

Merged (3 commits) on Aug 18, 2021

Conversation

sbesson
Member

@sbesson sbesson commented Aug 12, 2021

The breaking changes in bioformats2raw 0.3.0 have broken omero zarr export --bf. These commits provide the minimal set of changes needed to restore the command's functionality. The command now supports only bioformats2raw 0.3.0 and drops support for 0.2.x; see below for the discussion on the future of the flag.

Tested on pilot-zarr1-dev with omero --debug DEBUG zarr export --bf Image:13422206.

This initial work raises a few high-level questions about the future of this option. As a general rule, most of the support and testing has gone into the OMERO-only export workflow. As a result, most of the recent features, such as the rendering settings in the Zarr metadata and the HCS support, do not apply to the --bf option. Additionally, the layout of the OMERO export differs from that of the bioformats2raw export, since the latter works at the fileset level rather than the image level.

I can conceive of two general export strategies:

  • either we keep a single omero zarr export command with two export paths, i.e. OMERO API vs bioformats2raw
  • or we separate the commands, i.e. let consumers invoke bioformats2raw first, followed by a second command that would (1) update the layout and (2) enrich the OME-Zarr metadata. This workflow is what https://github.com/IDR/idr-zarr-tools/blob/master/merge.py currently aims to do. Arguably this could be generalized as a subcommand of this plugin, e.g. omero zarr update, since the code would otherwise be effectively duplicated.

- Remove dropped --file-type option
- Handle arbitrary number of options
- Do not create target directory
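
As a rough illustration of the three commits listed above, the sketch below assembles a bioformats2raw argument list that omits the dropped --file-type option, forwards any number of extra options, and does not create the target directory. The helper name build_bf2raw_command and its signature are hypothetical and are not taken from the plugin's code; the argument order mirrors the command lines echoed in the logs in this thread.

```python
from pathlib import Path
from typing import List, Sequence


def build_bf2raw_command(
    source: str, target: Path, extra_options: Sequence[str] = ()
) -> List[str]:
    """Assemble the argument list for a bioformats2raw call.

    Hypothetical helper for illustration only; not the plugin's actual code.
    """
    # Positional arguments: input file first, then output location.
    # No --file-type option is passed, and the target directory is not
    # created here; bioformats2raw is left to create it.
    command = ["bioformats2raw", source, str(target)]
    # Forward any number of additional options unchanged.
    command.extend(extra_options)
    return command


print(build_bf2raw_command(
    "input.ome.tif",
    Path("output.zarr"),
    ["--tile_width=3540", "--tile_height=4491"],
))
```

Returning a list rather than a single string means the command can be passed to subprocess without shell quoting.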
@sbesson
Member Author

sbesson commented Aug 12, 2021

As an initial data point for discussion, together with IDR/deployment#343 and https://github.com/openmicroscopy/management_tools/pull/1458, I tested this PR against https://idr.openmicroscopy.org/webclient/img_detail/8343617, i.e. a fairly large pixel volume with minimal metadata overhead:

(zarr) [sbesson@pilot-zarr1-dev data]$ time omero --debug DEBUG zarr export --bf Image:8343617
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
bioformats2raw /nfs/bioimage/drop/idr0066-voigt-mesospim/20190821-ftp/ExperimentD/data/ExpD_chicken_embryo_stitched.ome.tif /data/ExpD_chicken_embryo_stitched.ome.tif
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp11206441239058845902/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/ExpD_chicken_embryo_stitched.ome.tif

real    25m18.067s
user    53m21.899s
sys     2m22.052s
(zarr) [sbesson@pilot-zarr1-dev data]$ omero login public@idr.openmicroscopy.org -w public
Previous session expired for public on idr.openmicroscopy.org:4064
Created session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
(zarr) [sbesson@pilot-zarr1-dev data]$ time omero --debug DEBUG zarr export Image:8343617
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Exporting to 8343617.zarr (0.2)
Finished.

real    59m24.492s
user    42m19.202s
sys     6m6.100s

This gives a roughly two-fold improvement in conversion speed (about 25 minutes with --bf versus about 59 minutes without). Interestingly, the generated sizes are different:

(base) [sbesson@pilot-zarr1-dev ~]$ du -csh /data/8343617.zarr/
52G	/data/8343617.zarr/
52G	total
(base) [sbesson@pilot-zarr1-dev ~]$ du -csh /data/ExpD_chicken_embryo_stitched.ome.tif/
69G	/data/ExpD_chicken_embryo_stitched.ome.tif/
69G	total

Looking at the internal structure, the most noticeable difference is the chunk size, which might explain the size gap. bioformats2raw uses 1024x1024, while omero zarr export seems to use the full plane dimensions by default:

(base) [sbesson@pilot-zarr1-dev ~]$ diff -wu /data/ExpD_chicken_embryo_stitched.ome.tif_default/0/0/.zarray /data/8343617.zarr/0/.zarray
--- /data/ExpD_chicken_embryo_stitched.ome.tif_default/0/0/.zarray	2021-08-12 12:39:20.579408467 +0000
+++ /data/8343617.zarr/0/.zarray	2021-08-12 13:19:01.616102479 +0000
@@ -3,17 +3,18 @@
     1,
     1,
     1,
-    1024,
-    1024
+        4491,
+        3540
   ],
   "compressor" : {
-    "clevel" : 5,
     "blocksize" : 0,
-    "shuffle" : 1,
+        "clevel": 5,
     "cname" : "lz4",
-    "id" : "blosc"
+        "id": "blosc",
+        "shuffle": 1
   },
-  "dtype" : ">u2",
+    "dimension_separator": "/",
+    "dtype": "<u2",
   "fill_value" : 0,
   "filters" : null,
   "order" : "C",
@@ -24,6 +25,5 @@
     4491,
     3540
   ],
-  "zarr_format" : 2,
-  "dimension_separator" : "/"
+    "zarr_format": 2
 }
\ No newline at end of file
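
To make the chunking difference in the diff above concrete, here is a small sketch that counts how many chunks cover one 4491x3540 plane under each layout. The function name chunks_per_plane is an illustrative choice; the plane and chunk dimensions are the ones shown in the two .zarray files.

```python
import math


def chunks_per_plane(height: int, width: int, chunk_height: int, chunk_width: int) -> int:
    """Number of chunks needed to tile one 2D plane (edge chunks count as whole chunks)."""
    return math.ceil(height / chunk_height) * math.ceil(width / chunk_width)


# Plane dimensions taken from the .zarray shape shown above.
PLANE_HEIGHT, PLANE_WIDTH = 4491, 3540

# bioformats2raw default: 1024x1024 chunks.
tiled = chunks_per_plane(PLANE_HEIGHT, PLANE_WIDTH, 1024, 1024)
# omero zarr export: one chunk spanning the full plane.
full_plane = chunks_per_plane(PLANE_HEIGHT, PLANE_WIDTH, PLANE_HEIGHT, PLANE_WIDTH)

print(f"1024x1024 chunks per plane: {tiled}")
print(f"full-plane chunks per plane: {full_plane}")
```

With these dimensions the tiled layout writes 20 chunk files per plane versus a single one for the full-plane layout, and the edge chunks of the tiled layout are only partly filled with image data.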

@sbesson
Member Author

sbesson commented Aug 12, 2021

Adjusting the export command to use the same chunk size reduces the execution time further:

(zarr) [sbesson@pilot-zarr1-dev data]$ time omero --debug DEBUG zarr export --bf --tile_width=3540 --tile_height=4491 Image:8343617
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
bioformats2raw /nfs/bioimage/drop/idr0066-voigt-mesospim/20190821-ftp/ExperimentD/data/ExpD_chicken_embryo_stitched.ome.tif /data/ExpD_chicken_embryo_stitched.ome.tif --tile_width=3540 --tile_height=4491
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp10586280222332712237/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/ExpD_chicken_embryo_stitched.ome.tif

real    16m29.448s
user    44m52.741s
sys     2m45.422s
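
For reference, the wall-clock ratios between the three runs reported in this thread can be worked out with a short script. This is only arithmetic on the "real" values copied from the time output; the helper name to_seconds is an illustrative choice.

```python
import re


def to_seconds(real: str) -> float:
    """Convert a `time` value such as '25m18.067s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", real)
    if match is None:
        raise ValueError(f"Unrecognised time format: {real!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" timings copied from the runs in this thread.
omero_api = to_seconds("59m24.492s")        # omero zarr export
bf_default = to_seconds("25m18.067s")       # --bf, default 1024x1024 chunks
bf_full_plane = to_seconds("16m29.448s")    # --bf, full-plane chunks

speedup_default = omero_api / bf_default
speedup_full_plane = omero_api / bf_full_plane

print(f"--bf (default chunks) vs OMERO API: {speedup_default:.2f}x")
print(f"--bf (full-plane chunks) vs OMERO API: {speedup_full_plane:.2f}x")
```

These are single runs on one machine, so the ratios are indicative rather than a benchmark.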

Interestingly, the dataset size is essentially unchanged: the bioformats2raw output is still larger than the OMERO export (70G vs 52G).

(zarr) [sbesson@pilot-zarr1-dev data]$ diff -urw 8343617.zarr/0/.zarray ExpD_chicken_embryo_stitched.ome.tif/0/0/.zarray 
--- 8343617.zarr/0/.zarray	2021-08-12 13:19:01.616102479 +0000
+++ ExpD_chicken_embryo_stitched.ome.tif/0/0/.zarray	2021-08-12 15:58:58.383689254 +0000
@@ -7,14 +7,13 @@
         3540
     ],
     "compressor": {
-        "blocksize": 0,
         "clevel": 5,
+    "blocksize" : 0,
+    "shuffle" : 1,
         "cname": "lz4",
-        "id": "blosc",
-        "shuffle": 1
+    "id" : "blosc"
     },
-    "dimension_separator": "/",
-    "dtype": "<u2",
+  "dtype" : ">u2",
     "fill_value": 0,
     "filters": null,
     "order": "C",
@@ -25,5 +24,6 @@
         4491,
         3540
     ],
-    "zarr_format": 2
+  "zarr_format" : 2,
+  "dimension_separator" : "/"
 }
\ No newline at end of file
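
Most of the textual diff above comes from key order and whitespace. A semantic comparison of the parsed JSON, sketched below with the standard json module, narrows it down. The function name zarray_differences is hypothetical, and the two documents contain only the fields visible in the diff (the chunks and shape arrays are left out because they are not fully shown).

```python
import json

_MISSING = "<missing>"


def zarray_differences(left_text: str, right_text: str) -> dict:
    """Return {key: (left_value, right_value)} for keys whose parsed values differ."""
    left = json.loads(left_text)
    right = json.loads(right_text)
    differences = {}
    for key in sorted(set(left) | set(right)):
        left_value = left.get(key, _MISSING)
        right_value = right.get(key, _MISSING)
        if left_value != right_value:
            differences[key] = (left_value, right_value)
    return differences


# Subset of 8343617.zarr/0/.zarray (omero zarr export), as seen in the diff.
omero_export = """{
    "compressor": {"blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1},
    "dimension_separator": "/",
    "dtype": "<u2",
    "fill_value": 0,
    "filters": null,
    "order": "C",
    "zarr_format": 2
}"""

# Subset of the bioformats2raw .zarray, as seen in the diff.
bioformats2raw_export = """{
  "compressor" : {"clevel" : 5, "blocksize" : 0, "shuffle" : 1, "cname" : "lz4", "id" : "blosc"},
  "dtype" : ">u2",
  "fill_value" : 0,
  "filters" : null,
  "order" : "C",
  "zarr_format" : 2,
  "dimension_separator" : "/"
}"""

print(zarray_differences(omero_export, bioformats2raw_export))
```

On these fields the only remaining difference is the dtype byte order ("<u2" vs ">u2"). Whether that contributes to the size gap is not established here.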

Possibly a difference in compression efficiency between the Java and Python implementations?

@joshmoore joshmoore added this to In progress in OME-NGFF v0.3 (axes) Aug 14, 2021
Member

@joshmoore joshmoore left a comment

Leaving out some of the wider issues for a moment, the 0.3-compatible changes as they stand all look good.

self.ctx.err(stderr)
else:
self.ctx.out(f"Image exported to {target.resolve()}")
self.ctx.err(stderr.decode("utf-8"))
Member

Is the removal of the err printing because the output of bioformats2raw is now so much more limited?

Member Author

No, the problem is that the utility always sends some output to stderr (the standard OpenJDK 64-Bit Server VM warning: You have loaded library ...), so the Image exported... message would be unconditionally skipped.

I will look into reinstating a conditional behavior based on process.returncode.
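
A minimal sketch of what such a conditional could look like, assuming the command is run through Python's subprocess module. The function name run_and_report is hypothetical, and the child process below is a stand-in that writes a warning to stderr, not bioformats2raw itself.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_report(command: List[str]) -> Tuple[bool, str]:
    """Run a command and decide success from its exit code, not from stderr.

    Hypothetical helper: returns (succeeded, decoded_stderr).
    """
    process = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stderr = process.stderr.decode("utf-8")
    # A non-empty stderr is not treated as failure, since the JVM may print
    # warnings there even on a successful run.
    return process.returncode == 0, stderr


# Stand-in child process: warns on stderr but exits with code 0.
succeeded, stderr = run_and_report(
    [sys.executable, "-c", "import sys; sys.stderr.write('VM warning\\n')"]
)
if succeeded:
    print("Image exported")
else:
    print("Export failed")
print(stderr, end="")
```

Keying on the exit code lets the warning pass through to the user while the success message is still printed.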

OME-NGFF v0.3 (axes) automation moved this from In progress to Reviewer approved Aug 17, 2021
@sbesson
Member Author

sbesson commented Aug 17, 2021

Tested the last commit with successive exports:

(zarr) [sbesson@pilot-zarr1-dev data]$ omero zarr export --bf Image:9842129
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp15643632484399810721/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/2017-07-24-Brain-01_01_R3D.dv
(zarr) [sbesson@pilot-zarr1-dev data]$ omero zarr export --bf Image:9842129
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp5507427887558727498/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while calling command (com.glencoesoftware.bioformats2raw.Converter@5e2c3d18): java.lang.IllegalArgumentException: Output path /data/2017-07-24-Brain-01_01_R3D.dv already exists
	at picocli.CommandLine.executeUserObject(CommandLine.java:1962)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
	at picocli.CommandLine.call(CommandLine.java:2761)
	at com.glencoesoftware.bioformats2raw.Converter.main(Converter.java:1756)
Caused by: java.lang.IllegalArgumentException: Output path /data/2017-07-24-Brain-01_01_R3D.dv already exists
	at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:451)
	at com.glencoesoftware.bioformats2raw.Converter.call(Converter.java:92)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
	... 9 more

(zarr) [sbesson@pilot-zarr1-dev data]$ rm -rf 2017-07-24-Brain-01_01_R3D.dv
(zarr) [sbesson@pilot-zarr1-dev data]$ omero zarr export --bf Image:9842129
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp4557707144645268562/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/2017-07-24-Brain-01_01_R3D.dv
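
The second run above fails inside bioformats2raw because the output path already exists. As an optional illustration, a caller could detect this before launching the JVM. The check below is a sketch with a hypothetical helper name, ensure_target_absent, and is not something this PR implements.

```python
from pathlib import Path


def ensure_target_absent(target: Path) -> Path:
    """Raise early if the output path exists; otherwise return it unchanged.

    Hypothetical pre-flight check; the directory itself is never created here.
    """
    if target.exists():
        raise FileExistsError(f"Output path {target} already exists")
    return target


try:
    # The current directory always exists, so this demonstrates the failure path.
    ensure_target_absent(Path("."))
except FileExistsError as error:
    print(error)
```

Failing early like this would replace the Java stack trace with a short message, at the cost of duplicating a check that bioformats2raw already performs.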

@joshmoore
Member

👍

@sbesson
Member Author

sbesson commented Aug 17, 2021

Following a quick discussion with @joshmoore, I propose getting this merged and looking into the unification of the Zarr layouts as the next step, using bioformats2raw and the series index.

Given the --series feature is only available in bioformats2raw 0.3.0, I suspect there will soon be almost no value in maintaining backwards compatibility with older versions of bioformats2raw and we should only support bioformats2raw 0.3 or later.

@sbesson sbesson merged commit aab16a6 into ome:master Aug 18, 2021
OME-NGFF v0.3 (axes) automation moved this from Reviewer approved to Done Aug 18, 2021
@sbesson sbesson deleted the bf2raw_0.3.0 branch August 18, 2021 09:33