bioformats2raw export: unify Zarr layout and add OMERO metadata #76

Merged on Sep 3, 2021 (5 commits)


sbesson
Member

@sbesson sbesson commented Aug 18, 2021

As a follow-up to #75, this PR:

  • unifies the layout of the generated Zarr so that it matches the output of omero zarr export Image:<id>. The image series is passed to bioformats2raw, and the single resulting image group (<fileset>/0) is renamed to <id>.zarr
  • separates the addition of the multiscales metadata from the addition of the omero metadata in raw_pixels
  • adds the omero and creator metadata to the Zarr generated by bioformats2raw

The add_omero_metadata and add_toplevel_metadata functions are arguably outside the scope of the raw_pixels module and could be moved elsewhere (utils? a new module?)

- pass --series argument using image.series
- pass --no-root-group and --no-ome-meta
- rename exported image folder as image_id.zarr
Pass --profile black to isort pre-commit to solve incompatibility when committing
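The commit notes above describe how the bioformats2raw call is parameterised. A minimal sketch of assembling such a command line follows. The helper name `build_bf2raw_command` and the example paths are hypothetical, the flag names are copied as written in the commit notes rather than checked against the bioformats2raw help, and the command is only built, not executed.

```python
from pathlib import Path
from typing import List, Optional


def build_bf2raw_command(
    source: Path,
    target: Path,
    series: int,
    max_workers: Optional[int] = None,
) -> List[str]:
    """Assemble (but do not run) a bioformats2raw command line.

    Hypothetical helper: the flag names follow the commit notes in this PR.
    """
    cmd = [
        "bioformats2raw",
        str(source),
        str(target),
        "--series", str(series),  # convert only the series of the requested image
        "--no-root-group",        # as named in the commit notes
        "--no-ome-meta",          # as named in the commit notes
    ]
    if max_workers is not None:
        cmd.extend(["--max_workers", str(max_workers)])
    return cmd


cmd = build_bf2raw_command(Path("input.fake"), Path("output_dir"), series=2)
print(" ".join(cmd))
```

Passing the result to `subprocess.run` would be the natural next step, but that requires bioformats2raw 0.3.0 on the PATH, so it is left out of the sketch.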
    abs_path = Path("/") / Path(p)
else:
    if self.client is None:
        raise Exception("This cannot happen")  # mypy is confused
Member

This happens if the field is ever assigned None, as in:

self.client = None  # type: ignore

I don't know whether changing that to del self.client would keep mypy and everyone else happy.
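For illustration, one common way to keep mypy satisfied without a `# type: ignore` is to declare the attribute as `Optional` and narrow it with an explicit check before use. The sketch below uses hypothetical `Client` and `Control` classes. It is not the plugin's actual code and does not explain why mypy treats self.gateway differently.

```python
from typing import Optional


class Client:
    """Hypothetical stand-in for the real client object."""

    def ping(self) -> str:
        return "pong"


class Control:
    """Hypothetical owner of an optional client."""

    def __init__(self) -> None:
        # Declared Optional because close() resets the attribute to None.
        self.client: Optional[Client] = Client()

    def close(self) -> None:
        self.client = None  # valid for Optional[Client], no "type: ignore" needed

    def ping(self) -> str:
        if self.client is None:
            # The check narrows Optional[Client] to Client below this point.
            raise RuntimeError("client is not connected")
        return self.client.ping()


control = Control()
print(control.ping())
control.close()
```

After `close()`, calling `control.ping()` raises `RuntimeError` with an explicit message instead of a generic "This cannot happen".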

Member Author

The error is still confusing to me. I don't understand why mypy does not report the same error for self.gateway, which behaves the same way as self.client.

Member

There's certainly an element of magic involved, at least partially related to finally blocks and other if checks.

@sbesson
Member Author

sbesson commented Aug 18, 2021

An initial set of sample files was generated with and without --bf and uploaded to a temporary public bucket for comparison:

| Image ID | Dimensions (XYZCT) | omero zarr export | omero zarr export --bf |
| --- | --- | --- | --- |
| 13422206 | 256 x 256 x 735 x 3 x 1 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/13422206.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/13422206.zarr |
| 3491626 | 2048 x 2048 x 1 x 5 x 20 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/3491626.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/3491626.zarr |
| 8343617 | 3540 x 4491 x 2977 x 1 x 1 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/8343617.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/8343617.zarr |
| 13383974 | 3000 x 3000 x 1 x 3 x 1 | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/omero/13383974.zarr | https://hms-dbmi.github.io/vizarr?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/13383974.zarr |

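From a performance perspective, exporting Image:8343617 took about 59 minutes of wall-clock time without --bf and about 30 minutes with it: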

(zarr) [sbesson@pilot-zarr1-dev data]$  time omero zarr --output /data/omero-cli-zarr_76/omero/ export Image:8343617
Previous session expired for public on idr.openmicroscopy.org:4064
Server: [idr.openmicroscopy.org:4064]
Username: [public]
Password:
Created session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
Exporting to /data/omero-cli-zarr_76/omero/8343617.zarr (0.3)
Finished.

real    58m52.531s
user    42m19.562s
sys     5m45.421s
(zarr) [sbesson@pilot-zarr1-dev data]$  time omero zarr --output /data/omero-cli-zarr_76/bf/ export --bf Image:8343617
Previous session expired for public on idr.openmicroscopy.org:4064
Server: [idr.openmicroscopy.org:4064]
Username: [public]
Password:
Created session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp5020382743495610090/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/omero-cli-zarr_76/bf/8343617.zarr

real    29m38.845s
user    55m35.191s
sys     2m17.729s
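To make the comparison explicit, the two `time` reports above can be converted to seconds and compared. This is a small sketch: `parse_time` is a hypothetical helper, and the figures are copied from the logs above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` value such as '58m52.531s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the two runs on Image:8343617.
omero = {"real": parse_time("58m52.531s"), "user": parse_time("42m19.562s")}
bf = {"real": parse_time("29m38.845s"), "user": parse_time("55m35.191s")}

speedup = omero["real"] / bf["real"]      # wall-clock ratio, plain export vs --bf
cpu_ratio = bf["user"] / omero["user"]    # user CPU ratio, --bf vs plain export

print(f"wall-clock speed-up with --bf: {speedup:.2f}x")
print(f"user CPU time with --bf: {cpu_ratio:.2f}x")
```

For this single image the --bf run finishes in roughly half the wall-clock time while using about 1.3 times the user CPU time, which is consistent with the conversion running in parallel.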

Member

@joshmoore joshmoore left a comment

A few minor thoughts, but no real worries.

@@ -294,7 +308,21 @@ def _bf_export(self, abs_path: Path, args: argparse.Namespace) -> None:
        if stderr:
            self.ctx.err(stderr.decode("utf-8"))
        if process.returncode == 0:
            self.ctx.out(f"Image exported to {target.resolve()}")
            image_source = target / "0"
Member

It's always 0 despite the series index? Interesting.

Member Author

Yes, --series x,y removes the unselected series from the root metadata object, so the remaining series are reindexed starting from 0.

Member

Unfortunate, it means that you can't then make use of OME-XML if enabled.
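The layout step discussed in this thread (the single exported group always ends up at <fileset>/0 and is then renamed to <id>.zarr) can be sketched with the standard library alone. The function name `promote_single_series` and the temporary directory names are hypothetical, and the plugin's real implementation may differ.

```python
import shutil
import tempfile
from pathlib import Path


def promote_single_series(target: Path, output_dir: Path, image_id: int) -> Path:
    """Move <target>/0 to <output_dir>/<image_id>.zarr and remove <target>."""
    image_source = target / "0"  # always "0" because the kept series is reindexed
    if not image_source.is_dir():
        raise FileNotFoundError(f"expected a single image group at {image_source}")
    image_target = output_dir / f"{image_id}.zarr"
    if image_target.exists():
        raise FileExistsError(f"{image_target} already exists")
    shutil.move(str(image_source), str(image_target))
    shutil.rmtree(target)  # drop the leftover fileset directory
    return image_target


with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp)
    fileset = output_dir / "fileset.tmp"
    # Fake the output of a conversion: one group named "0" inside the fileset.
    (fileset / "0").mkdir(parents=True)
    (fileset / "0" / ".zgroup").write_text('{"zarr_format": 2}')

    result = promote_single_series(fileset, output_dir, 8343617)
    layout_ok = (result / ".zgroup").is_file() and not fileset.exists()
    print(result.name, layout_ok)
```

Refusing to overwrite an existing <id>.zarr is a design choice of this sketch, not something stated in the PR.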

)
root = open_group(store)
add_omero_metadata(root, image)
add_toplevel_metadata(root)
Member

This just sets _creator. Wouldn't bf2raw be the more appropriate value, or a mix?

Member Author

@sbesson sbesson Aug 19, 2021

Definitely a mix. There is some Bio-Formats metadata under multiscales/metadata but more could be captured.
Happy to look into capturing that but I am a bit worried you'll ask me to solve #48 ;)

Member

If you want to, ... 😉
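As a rough illustration of what a "mix" could look like, the sketch below merges a `_creator` entry into the `.zattrs` file of a Zarr v2 group using only the standard library. The helper `set_creator`, the nested `converter` key, and the values written are all assumptions made for the example. They are not the plugin's schema or its add_toplevel_metadata implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def set_creator(group_dir: Path, creator: Dict[str, Any]) -> Dict[str, Any]:
    """Write `creator` under the `_creator` key of a Zarr v2 group's .zattrs."""
    zattrs = group_dir / ".zattrs"
    # Keep whatever attributes are already present (for example multiscales).
    attrs = json.loads(zattrs.read_text()) if zattrs.exists() else {}
    attrs["_creator"] = creator
    zattrs.write_text(json.dumps(attrs, indent=4))
    return attrs


with tempfile.TemporaryDirectory() as tmp:
    group = Path(tmp)
    # Pretend the converter already wrote some multiscales metadata.
    (group / ".zattrs").write_text(json.dumps({"multiscales": []}))

    attrs = set_creator(
        group,
        {
            "name": "omero-zarr",  # assumed value for the exporting plugin
            # Hypothetical extra key recording the tool that wrote the pixels.
            "converter": {"name": "bioformats2raw", "version": "0.3.0"},
        },
    )
    print(sorted(attrs))
```

Recording both tools keeps the provenance of the pixel data separate from the provenance of the OMERO metadata, which is one possible reading of "a mix".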

@will-moore
Member

This worked once I'd updated to the 0.3.0 release of bioformats2raw.

One difference I noticed is in the number of pyramid levels exported.
For a 512 x 512 image, --bf exported 2 levels, whereas omero exported 4 levels, down to 64 x 64.

Known difference: --bf exports v0.2 whereas omero export uses v0.3 (with axes)

@sbesson
Member Author

sbesson commented Aug 19, 2021

> This worked once I'd updated to the 0.3.0 release of bioformats2raw.

Thanks. 0.3.0 is absolutely a requirement for this work as this PR uses some of the new options. I'll make this clear in the help.

> One difference I noticed is in the number of pyramid levels exported.
> For a 512 x 512 image, --bf exported 2 levels, whereas omero exported 4 levels, down to 64 x 64.

Yes, that's one of the implementation differences. It comes from different defaults for the maximal size of the smallest resolution: 96 for omero-cli-zarr vs 256 for bioformats2raw - https://github.com/glencoesoftware/bioformats2raw/blob/4114f1ef8340317df67d8940151f3b7a0159a5a3/src/main/java/com/glencoesoftware/bioformats2raw/Converter.java#L105.
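The effect of the two thresholds can be reproduced with a short sketch. It assumes that both tools keep halving the XY size while the larger dimension exceeds the threshold. That assumption matches the level counts reported above for a 512 x 512 image, but `pyramid_levels` is a hypothetical helper, not code from either project.

```python
from typing import List, Tuple


def pyramid_levels(size_x: int, size_y: int, min_size: int) -> List[Tuple[int, int]]:
    """List the XY size of each level, halving while a side exceeds min_size."""
    levels = [(size_x, size_y)]
    while max(size_x, size_y) > min_size:
        size_x, size_y = size_x // 2, size_y // 2
        levels.append((size_x, size_y))
    return levels


omero_levels = pyramid_levels(512, 512, min_size=96)   # omero-cli-zarr default
bf_levels = pyramid_levels(512, 512, min_size=256)     # bioformats2raw default

print(len(omero_levels), omero_levels[-1])
print(len(bf_levels), bf_levels[-1])
```

With these thresholds the sketch gives 4 levels ending at 64 x 64 for the 96 default and 2 levels ending at 256 x 256 for the 256 default.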

> --bf exports v0.2 whereas omero export uses v0.3 (with axes)

Yes that's captured as glencoesoftware/bioformats2raw#113

)
export.add_argument(
    "--max_workers", default=None, help="For use with bioformats2raw"
    "--max_workers",
Member

It's not specific to this PR, but I could imagine having a method for setting arbitrary bioformats2raw properties:

--bf:max_workers=

Member Author

Interesting. We could even go as far as --bf2raw-config=<config_file>, with an INI-style or YAML list of key/value pairs to be passed to bioformats2raw.

Member

Well, if you're going to go that far, I'd take a config file for all of the properties here, separated into [bioformats2raw] and [default] (or [omero]) sections ;)

Member Author

Should we turn this into an issue?
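To make the idea in this thread concrete, here is a minimal sketch of an INI-style file with [omero] and [bioformats2raw] sections, read with the standard library `configparser`. Neither the file format nor the helpers `load_sections` and `to_cli_args` exist in the plugin. They are hypothetical, and the only bioformats2raw option used is the --max_workers one already mentioned above.

```python
import configparser
from typing import Dict, List

# Hypothetical configuration file content.
EXAMPLE = """
[omero]
output = /data/export

[bioformats2raw]
max_workers = 16
"""


def load_sections(text: str) -> Dict[str, Dict[str, str]]:
    """Parse INI text into a plain {section: {key: value}} mapping."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def to_cli_args(options: Dict[str, str]) -> List[str]:
    """Turn key/value pairs into '--key value' command-line arguments."""
    args: List[str] = []
    for key, value in options.items():
        args.extend([f"--{key}", value])
    return args


sections = load_sections(EXAMPLE)
bf2raw_args = to_cli_args(sections.get("bioformats2raw", {}))
print(bf2raw_args)
```

Everything in the [bioformats2raw] section is forwarded untouched, so new converter options would not need a matching plugin argument. The trade-off is that typos in option names are only caught by bioformats2raw itself.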

@sbesson
Member Author

sbesson commented Aug 23, 2021

Another, large-scale example of using this PR: converting the >1TB light-sheet dataset from McDole et al. (https://doi.org/10.17867/10000116):

(zarr) [sbesson@pilot-zarr1-dev ~]$ time omero zarr --output /data/omero-cli-zarr_76/bf/ export --bf Image:4007801 --max_workers 16
Using session for public@idr.openmicroscopy.org:4064. Idle timeout: 10 min. Current group: Public
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp6140859124054326326/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

Image exported to /data/omero-cli-zarr_76/bf/4007801.zarr

real    1934m27.700s
user    22426m47.134s
sys     452m8.650s

(zarr) [sbesson@pilot-zarr1-dev ~]$ time aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 cp --recursive /data/omero-cli-zarr_76/bf/4007801.zarr/ s3://omero-cli-zarr_76/bf/4007801.zarr/
...
real    1510m25.233s
user    666m44.302s

So after about 2.5 days of processing and S3 upload, the data can be viewed at:

https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007801.zarr
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007802.zarr
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007803.zarr
https://hms-dbmi.github.io/vizarr/?source=https://uk1s3.embassy.ebi.ac.uk/omero-cli-zarr_76/bf/4007804.zarr

@sbesson
Member Author

sbesson commented Aug 27, 2021

Of the outstanding points in this discussion, the command-line overhaul has been turned into an issue, and glencoesoftware/bioformats2raw#114 should make it possible to align the number of resolutions generated with and without --bf in the near future.

Any objections to getting this merged, @joshmoore @will-moore? I would propose a release of the plugin with bioformats2raw 0.3.0 support, and we can start capturing the next items to review as issues.

@joshmoore
Member

joshmoore commented Aug 27, 2021

SGTM 👍

@joshmoore joshmoore merged commit 14e1a3c into ome:master Sep 3, 2021
@sbesson sbesson deleted the bf2raw_layout branch September 3, 2021 06:49
@sbesson sbesson mentioned this pull request Oct 12, 2021