created proposal for minimal RO crate metadata #2

sherwoodf · 2024-07-02T16:57:17Z

Created proposal with examples, diagrams, and .md file with details.

joshmoore · 2024-07-03T07:38:14Z

Thanks, @sherwoodf! Merging so that others will see it in the mainline. Looking forward to chatting about it.

cc: @normanrz @sukunis @Tom-TBT

imagesc-bot · 2024-07-03T18:22:03Z

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome2024-ngff-challenge/97363/15

joshmoore

Various post-merge comments while reading through the JSON-LD more carefully. Sorry for the wait.

joshmoore · 2024-07-11T13:06:59Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            "BioChemEntity": "https://schema.org/BioChemEntity",
+            "obo": "http://purl.obolibrary.org/obo/",
+            "acquisiton_method": {
+                "@reverse": "https://schema.org/result",


TIL @reverse. It's interesting that I've not run into it before. Thanks, @sherwoodf. It does leave me wondering whether or not that bumps us to a higher level of complexity.

As you'll see I mention a lot below, this was the result of me a) not finding very specific ontology terms that I think would fit better and b) trying to preserve what felt like a more sensible sequence when reading the JSON objects. Ideally we would have better terms to connect these objects.

a) definitely understood. Under the GIDE banner, I assume we can also consider either updating or creating ontologies as necessary. I don't assume we want to do that for the challenge, so maybe more a question of just documenting whether use of idioms like @reverse are temporary or not.

b) Big 👍 for a focus on readability.
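For readers who, like the reviewer, have not met @reverse before, here is a minimal sketch of what it changes. It is not a JSON-LD processor: the `context` dict is a hypothetical, simplified stand-in for the example file's context (the term names mirror the diff above, the expanded term shape for `biological_entity` is an assumption), and `to_triples` is an illustrative helper, not part of any library.

```python
SCHEMA = "https://schema.org/"

# Hypothetical, simplified context: only the two term shapes discussed here.
context = {
    "acquisition_method": {"@reverse": SCHEMA + "result", "@type": "@id"},
    "biological_entity": {"@id": SCHEMA + "about", "@type": "@id"},
}

# A node as it might appear in the flattened @graph (ids are placeholders).
node = {"@id": "./", "acquisition_method": ["_:b0"], "biological_entity": ["_:b1"]}


def to_triples(node, context):
    """Return (subject, predicate, object) triples for terms in the toy context."""
    triples = []
    for key, values in node.items():
        term = context.get(key)
        if term is None:
            continue  # skips keywords such as "@id" and any unknown terms
        for value in values if isinstance(values, list) else [values]:
            if "@reverse" in term:
                # Reverse property: the referenced node becomes the subject.
                triples.append((value, term["@reverse"], node["@id"]))
            else:
                triples.append((node["@id"], term["@id"], value))
    return triples


triples = to_triples(node, context)
for triple in triples:
    print(triple)
```

So the JSON reads as "the dataset has an acquisition method", while the graph says "the method has the dataset as its result". The JSON stays readable in the intended order, at the cost of one more JSON-LD feature for consumers to understand.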

joshmoore · 2024-07-11T13:08:32Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            "organism_classification": "https://schema.org/taxonomicRange",
+            "BioChemEntity": "https://schema.org/BioChemEntity",
+            "obo": "http://purl.obolibrary.org/obo/",
+            "acquisiton_method": {


As a sidenote, the ARC community (https://github.com/nfdi4plants/ARC-Symposium, etc.) has suggested that the new term LabProcess be used: BioSchemas/specifications#669. In the graph, that would then add an entity which takes inputs and outputs, which may touch on some of what is happening here with the @reverse of result.

joshmoore · 2024-07-11T13:09:26Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+                "@reverse": "https://schema.org/result",
+                "@type": "@id"
+            },
+            "biological_entity": "https://schema.org/about",


This aliasing surprised me a bit.

This is what I meant by 'not wrong, but very generic'. I was trying to constrain myself to JSON that read well to non-linked-data people, e.g. by having evocative field names and keeping the structure simple, but I wasn't able to find ontology terms out there that accurately describe the relationships between these entities in the biological imaging context. I was able to find lots of terms within domains, e.g. describing protocols and methods, but struggled to find terms connecting them. I suspect foundingGide work might provide a solution there, and we could then switch over the ontological term without upsetting the JSON at a later date? But keen to hear more about the constraints you think are important for the metadata.

not wrong, but very generic

nods

I suspect foundingGide work might provide a solution there

👍

But keen to hear more about the constraints you think are important for the metadata.

Perhaps what might help would be a collection from challengers (or really, anyone) on the terms they would like to use. I imagine the best we can hope for at the moment would be a UNION of various sources. (I've listed on the agenda for the 17th)

As an alternative to retain better interoperability, you could also handle this by having rules in a profile that specify what @type (and other properties) the entity referenced by about should have (e.g. maybe it must be a BioChemEntity). But that is more helpful for devs than non-linked-data folks, I guess
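A rough sketch of what such a profile rule could look like when checked in code. Everything here is an assumption for illustration: the function name `check_about_types`, the default property name `biological_entity`, and the required type `BioChemEntity` are taken from the example under discussion, not from any published profile.

```python
def check_about_types(graph, prop="biological_entity", required_type="BioChemEntity"):
    """Return a list of problems where `prop` points at a missing or mistyped entity."""
    by_id = {entity["@id"]: entity for entity in graph}
    problems = []
    for entity in graph:
        refs = entity.get(prop, [])
        if not isinstance(refs, list):
            refs = [refs]
        for ref in refs:
            # Accept both {"@id": "..."} references and bare id strings.
            ref_id = ref["@id"] if isinstance(ref, dict) else ref
            target = by_id.get(ref_id)
            if target is None:
                problems.append(f"{entity['@id']}: {prop} -> {ref_id} is not in the graph")
                continue
            types = target.get("@type", [])
            if not isinstance(types, list):
                types = [types]
            if required_type not in types:
                problems.append(f"{entity['@id']}: {prop} -> {ref_id} lacks @type {required_type}")
    return problems


# Placeholder graph: "_:b1" satisfies the rule, "_:b2" does not.
graph = [
    {"@id": "./", "@type": "Dataset", "biological_entity": ["_:b1", "_:b2"]},
    {"@id": "_:b1", "@type": ["BioChemEntity"]},
    {"@id": "_:b2", "@type": "Thing"},
]
problems = check_about_types(graph)
print(problems)
```

With a rule like this the JSON keeps the generic schema.org term while the profile carries the domain constraint.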

joshmoore · 2024-07-11T13:12:14Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            }
+        },
+        {
+            "@id": "./",


I keep coming back to having this within the Zarr, though it's fair to consider what happens with an OME-TIFF. The most recent issue I had when discussing this issue on the RO-Crate Regional Drop-In Calls is that one could imagine that someone might want to put "yet-another" RO-Crate at a level higher-up, then we would have:

- new top RO-Crate
  - this RO-Crate
    - zarr
      - possibly more RO-Crates within

It just feels like we could save a level in the generic case.

https://www.researchobject.org/ro-crate/specification/1.1/appendix/implementation-notes.html#combining-with-other-packaging-schemes may be relevant.
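To make the "levels" concern concrete, here is a small sketch that lists every directory holding an `ro-crate-metadata.json` (the metadata file name from RO-Crate 1.1) together with how many enclosing crates sit above it. The helper `find_crates` and the directory names in the demo are hypothetical, chosen only to mirror the layout described above.

```python
import tempfile
from pathlib import Path

METADATA_NAME = "ro-crate-metadata.json"


def find_crates(root):
    """Return sorted (relative_dir, enclosing_crate_count) pairs under root."""
    root = Path(root)
    found = []
    for meta in root.rglob(METADATA_NAME):
        crate_dir = meta.parent
        # Count ancestors inside `root` that are themselves crates.
        depth = sum(
            1
            for parent in crate_dir.parents
            if (parent == root or root in parent.parents)
            and (parent / METADATA_NAME).is_file()
        )
        found.append((crate_dir.relative_to(root).as_posix(), depth))
    return sorted(found)


# Demo on a throwaway layout: top crate / this crate / crate inside the zarr.
with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    for sub in ("", "this-crate", "this-crate/image.zarr"):
        directory = top / sub
        directory.mkdir(parents=True, exist_ok=True)
        (directory / METADATA_NAME).write_text("{}")
    levels = find_crates(top)

print(levels)
```

Placing the metadata inside the Zarr itself would remove the middle entry in the generic case, which is the saving suggested above.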

joshmoore · 2024-07-11T13:13:14Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            "@type": "Dataset",
+            "name": "OME-ZARR files",
+            "description": "the ome zarr files of the fly.",
+            "acquisition_method": [


These could validly be written as:

"acquisition_method": "_:b0"

(i.e. non-list)?

Yes, though i'm always aware that biological images contain a lot of edge cases & I mostly only know that i'm not knowledgeable in this area. E.g. what happens if an image is created via the combination of various imaging techniques?

So in general, i went with values in lists. The RDF users won't see any difference, but I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it.

what happens if an image is created via the combination of various imaging techniques?

Definitely agreed that a multi-answer is possible.

I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it

This is a good point.

I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it

Noting that this would be a good rule to enshrine in any profile you make if you want to enforce it for your user base, since it's more restrictive than base RO-Crate

And in fact this contradicts the "compact" requirement.
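Since base RO-Crate allows both the single-value and the list form, a consumer can normalise on read, and a profile check can flag the single-value form if the always-a-list rule is adopted. A minimal sketch follows; both helper names are hypothetical and the ids are placeholders.

```python
def as_list(value):
    """Wrap a value in a list unless it already is one; None becomes []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def scalar_valued(entity, properties):
    """Names from `properties` present on the entity but not list-valued."""
    return [p for p in properties if p in entity and not isinstance(entity[p], list)]


dataset = {"@id": "./", "acquisition_method": "_:b0", "preparation_method": ["_:b1"]}

# Reading side: works for either form.
methods = as_list(dataset.get("acquisition_method"))

# Profile side: report properties that break an "always a list" rule.
violations = scalar_valued(dataset, ["acquisition_method", "preparation_method"])
print(methods, violations)
```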

joshmoore · 2024-07-11T13:13:46Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            "acquisition_method": [
+                "_:b0"
+            ],
+            "preparation_method": [


Another syntactic point: I imagine the wider audience for these would prefer embedded blank nodes rather than references.

While i 100% agree and much prefer embedded (not sure what to call non-flattened), compacted json-ld, the RO-Crate standard asks for flattened:

The RO-Crate Metadata File MUST contain RO-Crate JSON-LD; a valid [JSON-LD 1.0](https://www.w3.org/TR/2014/REC-json-ld-20140116/) document in [flattened](https://www.w3.org/TR/json-ld/#flattened-document-form) and [compacted](https://www.w3.org/TR/json-ld/#compacted-document-form) form

from: https://www.researchobject.org/ro-crate/specification/1.1/structure.html

Maybe the flattened structure is less of a hard requirement than it being JSON-LD (JSON-LD libraries shouldn't have issues converting between the different profiles), but I went with flattened being a hard requirement just in case. Would be worth investigating further.

RO-Crate Metadata File MUST contain RO-Crate JSON-LD; a valid JSON-LD 1.0 document in flattened and [compacted]

Wow! 🫨 TIL. I will ask. I find that quite surprising.

The flattened structure is a hard requirement.

This part of the spec appendix has more formal extra guidance and clarity on this
https://www.researchobject.org/ro-crate/specification/1.1/appendix/jsonld#describing-entities-in-json-ld

And the 1.2-DRAFT version is clearer on when to use local identifiers vs blank nodes https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/appendix/jsonld#describing-entities-in-json-ld

In particular, 1.2-DRAFT states:

The use of a blank node identifier SHOULD be taken as hint by RO-Crate presentation software* to display the entity in-line, not as a separate entity with its own view, such as a page.

* e.g. Crate-O or roc2html, but I don't know if they are implemented in this way
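If the stored file has to stay flattened, an embedded view can still be produced for display. The following is only a sketch of that idea using plain dictionaries, not a JSON-LD framing implementation: `embed_blank_nodes` is a hypothetical helper, it treats both bare `"_:b0"` strings and `{"@id": "_:b0"}` objects as references, and its output should not be written back as RO-Crate metadata, since the flattened form is required there.

```python
def embed_blank_nodes(graph):
    """Inline blank-node references so named entities read as nested objects."""
    by_id = {entity["@id"]: entity for entity in graph}

    def blank_id(value):
        # Accept {"@id": "_:x"} or a bare "_:x" string that exists in the graph.
        if isinstance(value, dict) and set(value) == {"@id"}:
            value = value["@id"]
        if isinstance(value, str) and value.startswith("_:") and value in by_id:
            return value
        return None

    def expand(value, seen):
        if isinstance(value, list):
            return [expand(item, seen) for item in value]
        ref = blank_id(value)
        if ref is None or ref in seen:
            return value  # not a blank-node reference, or a cycle: leave as is
        return embed(by_id[ref], seen | {ref})

    def embed(entity, seen):
        # Keywords such as "@id" and "@type" are copied; other values are expanded.
        return {
            key: value if key.startswith("@") else expand(value, seen)
            for key, value in entity.items()
        }

    # Blank nodes appear only where referenced; unreferenced ones are dropped.
    return [
        embed(entity, frozenset({entity["@id"]}))
        for entity in graph
        if not entity["@id"].startswith("_:")
    ]


flattened = [
    {"@id": "./", "@type": "Dataset", "acquisition_method": ["_:b0"]},
    {"@id": "_:b0", "name": "example method"},  # placeholder content
]
embedded = embed_blank_nodes(flattened)
print(embedded)
```

This matches the spirit of the 1.2-DRAFT hint quoted above: blank nodes are shown in-line, while entities with real identifiers keep their own entry.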

joshmoore · 2024-07-11T13:17:16Z

dev3/2024-07-02/example-metadata/minimal.json

+        },
+        {
+            "@id": "_:b0",
+            "@type": [


I'm left wondering if we couldn't come up with a more readable way to make use of the ontology classes. Are there other idioms that you considered, @sherwoodf?

This is a little tied up with using flattened form. I was considering having a property (that isn't rdf:type, but again, we'd probably need to define this) to link to these objects & then could neatly include a name field in that object alongside the ID, but with flattened this would be a whole other node in the list of objects.

nod understood. Let's hear what the folks on the seek4science slack say and go from there.

as @sherwoodf says, this would normally be handled as a separate node with its own name (and description) for human comprehensibility.

You could look at how the Process Run Crate uses CreateAction and instrument, but I can see the limitations of it when it comes to describing and categorising physical experimental methods rather than computational processes.
To me it seems as if both acquisition_method and preparation_method could potentially be Process Run Crates of their own accord, if you wanted to track all the provenance of how those were carried out (or follow a similar structure within this larger crate).

joshmoore · 2024-07-12T12:19:05Z

Links:

sherwoodf · 2024-07-12T12:31:25Z

https://www.researchobject.org/packaging_data_with_ro-crate/07-cross-references/index.html#about-cross-references

https://www.researchobject.org/ro-crate/specification/1.1/appendix/jsonld#describing-entities-in-json-ld

Thanks for reaching out to someone about that - very helpful to know it's a strong requirement!

elichad

Hello! I work on RO-Crate and was speaking with @joshmoore last week. I'm still fairly new to RO-Crate and the linked data space myself (and I'm no bioinformatician at all), but I've added some thoughts to this PR that I hope are helpful as you continue to think about this.

elichad · 2024-07-12T12:18:51Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            "@type": "Dataset",
+            "name": "OME-ZARR files",
+            "description": "the ome zarr files of the fly.",
+            "acquisition_method": [


I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it

Noting that this would be a good rule to enshrine in any profile you make if you want to enforce it for your user base, since it's more restrictive than base RO-Crate

elichad · 2024-07-12T12:20:36Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+                "@reverse": "https://schema.org/result",
+                "@type": "@id"
+            },
+            "biological_entity": "https://schema.org/about",


As an alternative to retain better interoperability, you could also handle this by having rules in a profile that specify what @type (and other properties) the entity referenced by about should have (e.g. maybe it must be a BioChemEntity). But that is more helpful for devs than non-linked-data folks, I guess

elichad · 2024-07-12T12:24:17Z

dev3/2024-07-02/example-metadata/min-specimen-biosample.json

+            "acquisition_method": [
+                "_:b0"
+            ],
+            "preparation_method": [


The flattened structure is a hard requirement.

This part of the spec appendix has more formal extra guidance and clarity on this
https://www.researchobject.org/ro-crate/specification/1.1/appendix/jsonld#describing-entities-in-json-ld

And the 1.2-DRAFT version is clearer on when to use local identifiers vs blank nodes https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/appendix/jsonld#describing-entities-in-json-ld

In particular, 1.2-DRAFT states:

The use of a blank node identifier SHOULD be taken as hint by RO-Crate presentation software* to display the entity in-line, not as a separate entity with its own view, such as a page.

* e.g. Crate-O or roc2html, but I don't know if they are implemented in this way

elichad · 2024-07-12T12:41:01Z

dev3/2024-07-02/example-metadata/minimal.json

+        },
+        {
+            "@id": "_:b0",
+            "@type": [


as @sherwoodf says, this would normally be handled as a separate node with its own name (and description) for human comprehensibility.

You could look at how the Process Run Crate uses CreateAction and instrument, but I can see the limitations of it when it comes to describing and categorising physical experimental methods rather than computational processes.
To me it seems as if both acquisition_method and preparation_method could potentially be Process Run Crates of their own accord, if you wanted to track all the provenance of how those were carried out (or follow a similar structure within this larger crate).

created proposal for minimal RO crate metadata

b38a75b

joshmoore merged commit f3b56cc into ome:main Jul 3, 2024

joshmoore reviewed Jul 11, 2024


joshmoore mentioned this pull request Jul 11, 2024

Where to place RO Crate Metadata #4

Closed

elichad reviewed Jul 12, 2024

