Initial OME Zarr HCS implementation #69

Merged: 10 commits, Jan 27, 2021

melissalinkert
Member

See ome/omero-ms-zarr#75

Automatically uses HCS layout if a plate is present in the reader's MetadataStore, unless the --no-hcs option is used.

`--no-hcs` falls back to the original `series/resolution` group name format.
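
As a rough illustration of the two layouts, the sketch below builds group paths for each case. It is not the converter's code: the class and method names are hypothetical, the row/column/field nesting is inferred from the directory listing later in this thread, and the resolution level beneath each field is an assumption.

```java
final class GroupPaths {
    /** HCS is used only when a plate is present and --no-hcs was not given. */
    static boolean useHcs(boolean plateInMetadata, boolean noHcsOption) {
        return plateInMetadata && !noHcsOption;
    }

    /** Assumed HCS layout: row/column/field/resolution, all 0-based. */
    static String hcs(int row, int column, int field, int resolution) {
        return row + "/" + column + "/" + field + "/" + resolution;
    }

    /** Original layout: series/resolution. */
    static String flat(int series, int resolution) {
        return series + "/" + resolution;
    }

    public static void main(String[] args) {
        // Plate present, --no-hcs not set: HCS layout is chosen.
        if (useHcs(true, false)) {
            System.out.println(hcs(0, 0, 11, 0));
        }
        // Plate present but --no-hcs set: fall back to series/resolution.
        if (!useHcs(true, true)) {
            System.out.println(flat(3, 1));
        }
    }
}
```
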
...and update the converter to make them pass.
In particular, removes leading plate index.
plateMap.put("name", meta.getPlateName(plate));

List<Map<String, Object>> columns = new ArrayList<Map<String, Object>>();
for (int c=0; c<meta.getPlateColumns(plate).getValue(); c++) {

After 30min of conversion of a plate, the call ended with a java.lang.NullPointerException while trying to write the HCS metadata. Now I understand better why ome/bioformats#3638 is required :)

Two questions:

@sbesson
Member

sbesson commented Dec 8, 2020

Tested the last commit with the IDR Bio-Formats JAR and --file_type zarr against https://idr.openmicroscopy.org/webclient/?show=plate-2551. Conversion successfully completed in ~45min and generated a structure very close to the samples published and announced in https://blog.openmicroscopy.org/file-formats/community/2020/12/01/zarr-hcs/.

A few comments:

  • the first two levels under the top-level data.zarr are now the rows and columns and contain 8 elements and 12 elements respectively as expected

  • this representation uses 0-based indexing for all the groups (vs letter-based indexing and 1-based indexing for the samples generated by OME-Zarr)

  • as per the above, a next step will be to test these generated datasets against some of the clients we have used for driving the HCS spec - this might lead us to reviewing some assumptions

  • are the more usual A-Z and 1-12 names for wells stored anywhere in the OME metadata or is it an OMERO concept?

  • each well contains 12 images (unexpected) but only 6 are referenced in the .zattrs (as expected)

    (base) [sbesson@pilot-zarr1-dev 2551.zarr]$ less data.zarr/0/0/
    0/       1/       10/      11/      2/       3/       4/       5/       6/       7/       8/       9/       .zattrs  .zgroup
    (base) [sbesson@pilot-zarr1-dev 2551.zarr]$ cat data.zarr/0/0/.zattrs 
    {"well":{"images":[{"path":0,"acquisition":"0"},{"path":0,"acquisition":"0"},{"path":1,"acquisition":"1"},{"path":1,"acquisition":"1"},{"path":2,"acquisition":"2"},{"path":2,"acquisition":"2"},{"path":3,"acquisition":"3"},{"path":3,"acquisition":"3"},{"path":4,"acquisition":"4"},{"path":4,"acquisition":"4"},{"path":5,"acquisition":"5"},{"path":5,"acquisition":"5"}]}}(base) [sbesson@pilot-zarr1-dev 2551.zarr]$ 
    
  • more generally, as I am trying to diff JSON files with different formatting conventions, any immediate suggestion of the most appropriate tool?

@melissalinkert
Member Author

> are the more usual A-Z and 1-12 names for wells stored anywhere in the OME metadata or is it an OMERO concept?

The metadata contains 0-based indexes and optional RowNamingConvention and ColumnNamingConvention on Plate: https://www.openmicroscopy.org/Schemas/Documentation/Generated/OME-2016-06/ome_xsd.html#NamingConvention. The typical well names would have to be calculated accordingly.
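
A minimal sketch of that calculation, assuming the two convention values are "letter" and "number". The `WellNames` class and its `Naming` enum are hypothetical stand-ins, not part of Bio-Formats or bioformats2raw, and the continuation past Z (AA, AB, ...) is an assumption rather than something the schema specifies.

```java
final class WellNames {
    /** Hypothetical stand-in for the "letter" and "number" naming conventions. */
    enum Naming { LETTER, NUMBER }

    /** Converts a 0-based index to a label: A, B, ... for LETTER; 1, 2, ... for NUMBER. */
    static String label(int index, Naming naming) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        if (naming == Naming.NUMBER) {
            return Integer.toString(index + 1);
        }
        // Assumption: indexes past 25 continue spreadsheet-style (AA, AB, ...).
        StringBuilder sb = new StringBuilder();
        int n = index;
        do {
            sb.insert(0, (char) ('A' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return sb.toString();
    }

    /** Joins the row label and column label into a well name. */
    static String wellName(int row, int column, Naming rowNaming, Naming columnNaming) {
        return label(row, rowNaming) + label(column, columnNaming);
    }

    public static void main(String[] args) {
        System.out.println(wellName(0, 0, Naming.LETTER, Naming.NUMBER));  // A1
        System.out.println(wellName(7, 11, Naming.LETTER, Naming.NUMBER)); // H12
    }
}
```

With letter rows and number columns, the 8 x 12 plate discussed above would run from A1 to H12.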

> more generally, as I am trying to diff JSON files with different formatting conventions, any immediate suggestion of the most appropriate tool?

Not really, unfortunately. I run the bioformats2raw JSON through a pretty-printer (https://github.com/makamaka/JSON-PP) as needed, but that doesn't completely solve the problem.

@melissalinkert
Member Author

Which IDR Bio-Formats commit specifically were you using? I am having trouble finding a version that detects 6 fields and plate acquisitions without throwing an exception.

@sbesson
Member

sbesson commented Dec 9, 2020

Sorry, I should have been more precise: I built bioformats2raw from 998e4cf with the following local modification

(base) [sbesson@pilot-zarr1-dev bioformats2raw]$ git diff
diff --git a/build.gradle b/build.gradle
index 54006ec..8af170e 100644
--- a/build.gradle
+++ b/build.gradle
@@ -26,7 +26,7 @@ repositories {
 }
 
 dependencies {
-    implementation 'ome:formats-gpl:6.5.1'
+    implementation 'idr:formats-gpl:0.6.5'
     implementation 'info.picocli:picocli:4.2.0'
     implementation 'com.univocity:univocity-parsers:2.8.4'
     implementation 'org.janelia.saalfeldlab:n5:2.2.0'

and the conversion command I used was

 sudo docker run --rm -it -u 11615:1030 -v /data:/data -v /nfs:/nfs bf2rawhcs /nfs/bioimage/drop/idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/Meas_01\(2012-07-31_10-41-12\)/001001001.flex /data/idr0001-graml-sysgro/2551.zarr --file_type zarr

@melissalinkert
Member Author

Ah, OK. That's the same version I was using, but a different dataset (I had https://downloads.openmicroscopy.org/images/Flex/idr0001/).

As far as I can tell by looking at the original data for idr0001-graml-sysgro/20151116-verified/JL_120731_S6A, there should in fact be 12 images per well - 2 fields x 6 plate acquisitions. showinf -nopix -omexml with https://github.com/idr/bioformats/tree/IDR-0.9.0 shows 2 WellSamples for each well in each plate acquisition, and the .flex file names match up with those dimensions. The JSON in #69 (comment) lists 12 images per well, but the paths are wrong for half of them. This is fixed in 48dccc8.
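
To make the expected shape concrete, here is a sketch that enumerates 12 well images with one unique path each. It is illustrative only and is not the change made in 48dccc8: the `WellImages` class is hypothetical, and the acquisition-major ordering is an assumption based on the acquisition values in the JSON quoted earlier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WellImages {
    /**
     * Builds one entry per (acquisition, field) pair. Each entry gets its own
     * path, unlike the earlier output where the path repeated the acquisition index.
     */
    static List<Map<String, Object>> images(int acquisitions, int fieldsPerAcquisition) {
        List<Map<String, Object>> images = new ArrayList<>();
        int path = 0;
        for (int a = 0; a < acquisitions; a++) {
            for (int f = 0; f < fieldsPerAcquisition; f++) {
                Map<String, Object> image = new LinkedHashMap<>();
                image.put("path", path++);                     // integer, as in the quoted JSON
                image.put("acquisition", Integer.toString(a)); // string, as in the quoted JSON
                images.add(image);
            }
        }
        return images;
    }

    public static void main(String[] args) {
        // 6 plate acquisitions x 2 fields, as described for this dataset.
        for (Map<String, Object> image : images(6, 2)) {
            System.out.println(image);
        }
    }
}
```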

@sbesson
Member

sbesson commented Dec 10, 2020

After double-checking the data, you are absolutely right: the dataset contains multiple fields of view per well for each acquisition.

$ ls -alh /nfs/bioimage/drop//idr0001-graml-sysgro/20151116-verified/JL_120731_S6A/Meas_01\(2012-07-31_10-41-12\)/
total 17G
drwxrwx---. 2 6166 idrnfs  20K May 15  2016 .
drwxrwx---. 8 6166 idrnfs 4.0K May 15  2016 ..
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001001001.flex
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001001002.flex
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001002001.flex
-rw-rw----. 1 6166 idrnfs  88M May  4  2016 001002002.flex

Unfortunately this means the dataset is not directly comparable to the data exported from IDR as only the first field of view of each well seems to have been imported. Proposing to discuss at the next Formats meeting the next steps for validating this PR as well as the handling of empty rows/columns.

@chris-allan
Member

@sbesson: We would like to get this in for the forthcoming 0.3.0 release. Is there anything further you'd like to discuss/test here beforehand?

@melissalinkert
Member Author

Conflicts resolved.

@sbesson sbesson left a comment

@chris-allan thanks for the heads up.

Summarizing briefly, most of my testing has focused on the use case of plates with multiple acquisitions, primarily because this was the scenario we were trying to work through in terms of specification. Unfortunately, for the reasons discussed above, it is not straightforward to compare the output of bioformats2raw vs omero-cli-zarr.

A potential test might be to convert the other sample plates discussed in the public OME-Zarr HCS spec blog post using bioformats2raw and compare the output with the public Zarr representations on S3. Given the preliminary testing as well as my current capacity, I certainly don't feel a compelling reason to block the upcoming minor release though.

Instead I'd propose to schedule this conversion test using bioformats2raw 0.3.0 and capture issues separately.

@chris-allan
Member

@melissalinkert: This PR is ready to have conflicts resolved prior to merge.

@melissalinkert
Member Author

Conflicts resolved.

@chris-allan chris-allan merged commit 8a597e2 into glencoesoftware:master Jan 27, 2021