Initial OME Zarr HCS implementation #69

melissalinkert · 2020-11-13T16:22:33Z

Automatically uses HCS layout if a plate is present in the reader's MetadataStore, unless the --no-hcs option is used.

`--no-hcs` falls back to the original `series/resolution` group name format.

sbesson · 2020-11-26T20:28:40Z

src/main/java/com/glencoesoftware/bioformats2raw/Converter.java

+    plateMap.put("name", meta.getPlateName(plate));
+
+    List<Map<String, Object>> columns = new ArrayList<Map<String, Object>>();
+    for (int c=0; c<meta.getPlateColumns(plate).getValue(); c++) {


After 30min of conversion of a plate, the call ended with a java.lang.NullPointerException while trying to write the HCS metadata. Now I understand better why ome/bioformats#3638 is required :)

Two questions:

- does this mean progress is blocked on having ome/bioformats#3638 (Populate plate row and column dimensions) available in a mainline release?
- or do we need to implement some fallback anyway for the readers where this metadata might be lacking?
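For illustration, the fallback asked about above could look roughly like the following. This is a minimal sketch and not the converter's actual code: `PlateDimensions` and `resolve` are hypothetical names, and the idea of deriving the size from the largest 0-based well index is an assumption about what a reasonable fallback would be.

```java
import java.util.List;

/** Hypothetical helper; not part of bioformats2raw or Bio-Formats. */
final class PlateDimensions {

  private PlateDimensions() { }

  /**
   * Returns the declared plate dimension when it is present. When it is
   * missing (null), falls back to the largest 0-based well index seen
   * along that axis, plus one.
   *
   * @param declared value of Plate.Rows or Plate.Columns, or null if unset
   * @param wellIndexes 0-based row or column index of every well
   */
  static int resolve(Integer declared, List<Integer> wellIndexes) {
    if (declared != null) {
      return declared;
    }
    int max = -1;
    for (Integer index : wellIndexes) {
      // Wells without an index are skipped rather than dereferenced.
      if (index != null && index > max) {
        max = index;
      }
    }
    return max + 1; // 0 when no well carries an index
  }
}
```

In the converter, `declared` would come from a null-checked `meta.getPlateColumns(plate)` (or the row equivalent) instead of calling `getValue()` on it directly, which is where the exception above is raised when a reader does not populate the value.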

sbesson · 2020-12-08T14:18:41Z

Tested the last commit with the IDR Bio-Formats JAR and --file_type zarr against https://idr.openmicroscopy.org/webclient/?show=plate-2551. Conversion successfully completed in ~45min and generated a structure very close to the samples published and announced in https://blog.openmicroscopy.org/file-formats/community/2020/12/01/zarr-hcs/.

A few comments:

- the first two levels under the top-level data.zarr are now the rows and columns, and contain 8 and 12 elements respectively, as expected
- this representation uses 0-based indexing for all the groups (vs letter-based indexing and 1-based indexing for the samples generated by OME-Zarr)
  - as per the above, a next step will be to test these generated datasets against some of the clients we have used for driving the HCS spec; this might lead us to review some assumptions
  - are the more usual A-Z and 1-12 names for wells stored anywhere in the OME metadata, or is that an OMERO concept?
- each well contains 12 images (unexpected) but only 6 are referenced in the .zattrs (as expected)

(base) [sbesson@pilot-zarr1-dev 2551.zarr]$ less data.zarr/0/0/
0/       1/       10/      11/      2/       3/       4/       5/       6/       7/       8/       9/       .zattrs  .zgroup
(base) [sbesson@pilot-zarr1-dev 2551.zarr]$ cat data.zarr/0/0/.zattrs 
{"well":{"images":[{"path":0,"acquisition":"0"},{"path":0,"acquisition":"0"},{"path":1,"acquisition":"1"},{"path":1,"acquisition":"1"},{"path":2,"acquisition":"2"},{"path":2,"acquisition":"2"},{"path":3,"acquisition":"3"},{"path":3,"acquisition":"3"},{"path":4,"acquisition":"4"},{"path":4,"acquisition":"4"},{"path":5,"acquisition":"5"},{"path":5,"acquisition":"5"}]}}

- more generally, as I am trying to diff JSON files with different formatting conventions, any immediate suggestion of the most appropriate tool?

melissalinkert · 2020-12-08T14:58:45Z

> are the more usual A-Z and 1-12 names for wells stored anywhere in the OME metadata, or is that an OMERO concept?

The metadata contains 0-based indexes and optional RowNamingConvention and ColumnNamingConvention on Plate: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#NamingConvention. The typical well names would have to be calculated accordingly.
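As a rough illustration of that calculation, the sketch below derives a well name from 0-based indexes. `WellNames`, `Convention`, `axisLabel` and `wellName` are hypothetical names, and the local enum only mirrors the `letter` and `number` values of the schema's NamingConvention; it is not the OME model class. Continuing past Z with AA, AB, ... is an assumption, since the schema does not say how to label more than 26 rows or columns.

```java
/** Hypothetical helper; not part of bioformats2raw or the OME model. */
final class WellNames {

  /** Local stand-in for the "letter" and "number" naming conventions. */
  enum Convention { LETTER, NUMBER }

  private WellNames() { }

  /** Converts a 0-based row or column index to its display label. */
  static String axisLabel(int index, Convention convention) {
    if (index < 0) {
      throw new IllegalArgumentException("index must be >= 0: " + index);
    }
    if (convention == Convention.NUMBER) {
      return String.valueOf(index + 1); // 0 -> "1", 11 -> "12"
    }
    // Letter labels: 0 -> "A", 25 -> "Z", then (by assumption) 26 -> "AA".
    StringBuilder label = new StringBuilder();
    int n = index;
    do {
      label.append((char) ('A' + n % 26));
      n = n / 26 - 1;
    } while (n >= 0);
    return label.reverse().toString();
  }

  /** Joins the row label and the column label, e.g. "A" + "1". */
  static String wellName(int row, int column,
      Convention rowConvention, Convention columnConvention) {
    return axisLabel(row, rowConvention)
        + axisLabel(column, columnConvention);
  }
}
```

With letter rows and number columns, row 0 / column 0 becomes `A1` and row 7 / column 11 becomes `H12`, which covers the 8 x 12 plate discussed here. Which convention to use when the plate leaves both attributes unset is left to the caller.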

> more generally, as I am trying to diff JSON files with different formatting conventions, any immediate suggestion of the most appropriate tool?

Not really, unfortunately. I run the bioformats2raw JSON through a pretty-printer (https://github.com/makamaka/JSON-PP) as needed, but that doesn't completely solve the problem.

melissalinkert · 2020-12-09T16:36:06Z

Which IDR Bio-Formats commit specifically were you using? I am having trouble finding a version that detects 6 fields and plate acquisitions without throwing an exception.

sbesson · 2020-12-09T16:40:04Z

Sorry, I should have been more precise: I built bioformats2raw from 998e4cf with the following local modification

(base) [sbesson@pilot-zarr1-dev bioformats2raw]$ git diff
diff --git a/build.gradle b/build.gradle
index 54006ec..8af170e 100644
--- a/build.gradle
+++ b/build.gradle
@@ -26,7 +26,7 @@ repositories {
 }
 
 dependencies {
-    implementation 'ome:formats-gpl:6.5.1'
+    implementation 'idr:formats-gpl:0.6.5'
     implementation 'info.picocli:picocli:4.2.0'
     implementation 'com.univocity:univocity-parsers:2.8.4'
     implementation 'org.janelia.saalfeldlab:n5:2.2.0'

and the conversion command I used was

 sudo docker run --rm -it -u 11615:1030 -v /data:/data -v /nfs:/nfs bf2rawhcs /nfs/bioimage/drop/idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/Meas_01\(2012-07-31_10-41-12\)/001001001.flex /data/idr0001-graml-sysgro/2551.zarr --file_type zarr

melissalinkert · 2020-12-10T04:04:44Z

Ah, OK. That's the same version I was using, but a different dataset (I had https://downloads.openmicroscopy.org/images/Flex/idr0001/).

As far as I can tell by looking at the original data for idr0001-graml-sysgro/20151116-verified/JL_120731_S6A, there should in fact be 12 images per well - 2 fields x 6 plate acquisitions. showinf -nopix -omexml with https://github.com/idr/bioformats/tree/IDR-0.9.0 shows 2 WellSamples for each well in each plate acquisition, and the .flex file names match up with those dimensions. The JSON in #69 (comment) lists 12 images per well, but the paths are wrong for half of them. This is fixed in 48dccc8.
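To make the path problem concrete, here is a minimal sketch of one way to give every (plate acquisition, field) pair its own image path within a well. It is not taken from 48dccc8; `WellImagePaths` and `pathIndex` are hypothetical names, and the sketch assumes every acquisition has the same number of fields and that images are ordered by acquisition first, as in the JSON listing quoted earlier.

```java
/** Hypothetical helper; not the code from the fix referenced above. */
final class WellImagePaths {

  private WellImagePaths() { }

  /**
   * Maps a 0-based (acquisition, field) pair to a unique 0-based image
   * path within a well, ordered by acquisition and then by field.
   *
   * Assumes each acquisition contains exactly fieldsPerAcquisition fields.
   */
  static int pathIndex(int acquisition, int field, int fieldsPerAcquisition) {
    if (acquisition < 0) {
      throw new IllegalArgumentException("negative acquisition index");
    }
    if (fieldsPerAcquisition <= 0) {
      throw new IllegalArgumentException("need at least one field");
    }
    if (field < 0 || field >= fieldsPerAcquisition) {
      throw new IllegalArgumentException("field index out of range: " + field);
    }
    return acquisition * fieldsPerAcquisition + field;
  }
}
```

For 6 plate acquisitions with 2 fields each, this yields the paths 0 through 11, one per image group in the well, instead of repeating the acquisition index for both fields.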

sbesson · 2020-12-10T16:15:51Z

After double-checking the data, you are absolutely right: the dataset contains multiple fields of view per well for each acquisition.

$ ls -alh /nfs/bioimage/drop//idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/Meas_01\(2012-07-31_10-41-12\)/
total 17G
drwxrwx---. 2 6166 idrnfs  20K May 15  2016 .
drwxrwx---. 8 6166 idrnfs 4.0K May 15  2016 ..
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001001001.flex
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001001002.flex
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001002001.flex
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001002002.flex

Unfortunately this means the dataset is not directly comparable to the data exported from IDR as only the first field of view of each well seems to have been imported. Proposing to discuss at the next Formats meeting the next steps for validating this PR as well as the handling of empty rows/columns.

chris-allan · 2021-01-26T15:00:52Z

@sbesson: We would like to get this in for the forthcoming 0.3.0 release. Is there anything further you'd like to discuss/test here beforehand?

melissalinkert · 2021-01-26T15:26:35Z

Conflicts resolved.

sbesson

@chris-allan thanks for the heads up.

Summarizing briefly, most of my testing has focused on the use case of plates with multiple acquisitions, primarily because this was the scenario we were trying to work out in the specification. Unfortunately, for the reasons discussed above, it is not straightforward to compare the output of bioformats2raw vs omero-cli-zarr.

A potential test might be to convert the other sample plates discussed in the public OME-Zarr HCS spec blog post using bioformats2raw and compare the output with the public Zarr representations on S3. Given the preliminary testing as well as my current capacity, I certainly don't feel a compelling reason to block the upcoming minor release though.

Instead I'd propose to schedule this conversion test using bioformats2raw 0.3.0 and capture issues separately.

chris-allan · 2021-01-27T13:34:08Z

@melissalinkert: This PR is ready to have conflicts resolved prior to merge.

melissalinkert · 2021-01-27T14:48:13Z

Conflicts resolved.

melissalinkert added 4 commits November 11, 2020 17:53

Automatically use HCS spec group format if a plate is present

ef08126

`--no-hcs` falls back to the original `series/resolution` group name format.

Initial plate/well metadata population

6cb98af

Add HCS conversion tests

77a8550

...and update the converter to make them pass.

Test HCS metadata

443abbf

melissalinkert mentioned this pull request Nov 19, 2020

Populate plate row and column dimensions ome/bioformats#3638

Merged

Update HCS layout to match current IDR datasets

796305f

In particular, removes leading plate index.

melissalinkert mentioned this pull request Nov 25, 2020

Initial OME Zarr HCS implementation glencoesoftware/raw2ometiff#45

Merged

sbesson reviewed Nov 26, 2020

melissalinkert added 2 commits November 30, 2020 17:47

Don't require Plate.Rows and Plate.Columns to be set

29df734

Update plate acquisition handling and associated test

998e4cf

Fix image path when there are multiple fields and a single plate acquisition

48dccc8

Merge branch 'master' of github.com:glencoesoftware/bioformats2raw into hcs

e303bf6

sbesson approved these changes Jan 26, 2021

Merge branch 'master' of github.com:glencoesoftware/bioformats2raw into hcs

c1d458c

chris-allan merged commit 8a597e2 into glencoesoftware:master Jan 27, 2021

sbesson mentioned this pull request Mar 10, 2021

Plate conversion IDR/idr-zarr-tools#1

Closed
