created proposal for minimal RO crate metadata #2

Merged · 1 commit · Jul 3, 2024

@sherwoodf (Contributor)

Created a proposal with examples, diagrams, and a .md file with details.

@joshmoore (Member)

Thanks, @sherwoodf! Merging so that others will see it in the mainline. Looking forward to chatting about it.

cc: @normanrz @sukunis @Tom-TBT

@joshmoore joshmoore merged commit f3b56cc into ome:main Jul 3, 2024
@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ome2024-ngff-challenge/97363/15

@joshmoore (Member) left a comment

Various post-merge comments while reading through the JSON-LD more carefully. Sorry for the wait.

"BioChemEntity": "https://schema.org/BioChemEntity",
"obo": "http://purl.obolibrary.org/obo/",
"acquisiton_method": {
"@reverse": "https://schema.org/result",
@joshmoore (Member)

TIL @reverse. It's interesting that I've not run into it before. Thanks, @sherwoodf. It does leave me wondering whether that bumps us up to a higher level of complexity.
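To make the @reverse mapping concrete, here is a minimal Python sketch of what a reverse-mapped alias means: the term is written on the Dataset, but the resulting triple points from the referenced node to the Dataset. The one-term context (spelled `acquisition_method`, as used on the Dataset below) and the `triples_for` helper are illustrative assumptions; this is not a JSON-LD processor, and a real library such as PyLD would perform full expansion.

```python
# Illustrative only: the context mirrors the proposal's reverse-mapped term, and
# triples_for() is a hypothetical helper, not part of any JSON-LD library.
SCHEMA_RESULT = "https://schema.org/result"

context = {
    # "@reverse" means: node X has acquisition_method Y  <=>  Y schema:result X
    "acquisition_method": {"@reverse": SCHEMA_RESULT, "@type": "@id"},
}

dataset = {"@id": "./", "acquisition_method": ["_:b0"]}


def triples_for(node, context):
    """Return (subject, predicate, object) triples for reverse-mapped terms only."""
    triples = []
    subject = node["@id"]
    for key, value in node.items():
        term = context.get(key)
        if not isinstance(term, dict) or "@reverse" not in term:
            continue  # keywords and forward terms are ignored in this sketch
        for ref in value if isinstance(value, list) else [value]:
            # The referenced node becomes the subject; the Dataset becomes the object.
            triples.append((ref, term["@reverse"], subject))
    return triples


print(triples_for(dataset, context))
# [('_:b0', 'https://schema.org/result', './')]
```

The JSON reads "dataset → method", while the graph says "method → result → dataset", which is exactly the extra indirection a reader has to keep in mind.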

@sherwoodf (Contributor Author)

As you'll see I mention a lot below, this was the result of me a) not finding very specific ontology terms that I think would fit better, and b) trying to preserve what felt like a more sensible sequence when reading the JSON objects. Ideally we would have better terms to connect these objects.

@joshmoore (Member)

  • a) Definitely understood. Under the GIDE banner, I assume we can also consider either updating or creating ontologies as necessary. I don't assume we want to do that for the challenge, so it's maybe more a question of documenting whether the use of idioms like @reverse is temporary or not.
  • b) Big 👍 for a focus on readability.

"organism_classification": "https://schema.org/taxonomicRange",
"BioChemEntity": "https://schema.org/BioChemEntity",
"obo": "http://purl.obolibrary.org/obo/",
"acquisiton_method": {
@joshmoore (Member)

As a side note, the ARC community (https://github.com/nfdi4plants/ARC-Symposium, etc.) has suggested using the new term LabProcess: BioSchemas/specifications#669. In the graph, that would add an entity that takes inputs and produces outputs, which may cover some of what is happening here with the @reverse of result.

"@reverse": "https://schema.org/result",
"@type": "@id"
},
"biological_entity": "https://schema.org/about",
@joshmoore (Member)

This aliasing surprised me a bit.

@sherwoodf (Contributor Author)

This is what I meant by 'not wrong, but very generic'. I was trying to constrain myself to JSON that reads well to non-linked-data people, e.g. by having evocative field names and keeping the structure simple, but I wasn't able to find ontology terms that accurately describe the relationships between these entities in the biological imaging context. I found lots of terms within domains, e.g. describing protocols and methods, but struggled to find terms connecting them. I suspect foundingGide work might provide a solution there, and we could then switch over the ontology term at a later date without upsetting the JSON? But keen to hear more about the constraints you think are important for the metadata.
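The "switch over the ontology term without upsetting the JSON" idea works because the friendly key is only an alias defined in the @context. A minimal sketch, assuming simple string aliases; `expand_keys` is a hypothetical helper and the second IRI is a made-up placeholder, not a real ontology term:

```python
# Illustrative sketch: the friendly JSON key stays fixed; only the @context changes.
payload = {"@id": "./", "biological_entity": ["#specimen"]}

# Mapping used in the proposal today.
context_now = {"biological_entity": "https://schema.org/about"}
# HYPOTHETICAL placeholder IRI standing in for a future, more specific term.
context_later = {"biological_entity": "https://example.org/hypothetical/imagedSpecimen"}


def expand_keys(node, context):
    """Swap aliased keys for full IRIs (handles plain string aliases only)."""
    return {context.get(key, key): value for key, value in node.items()}


print(expand_keys(payload, context_now))
# {'@id': './', 'https://schema.org/about': ['#specimen']}
print(expand_keys(payload, context_later))
# {'@id': './', 'https://example.org/hypothetical/imagedSpecimen': ['#specimen']}
```

JSON-only consumers keep reading `biological_entity`; only RDF consumers see the predicate change.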

@joshmoore (Member)

not wrong, but very generic

nods

I suspect foundingGide work might provide a solution there

👍

But keen to hear more about the constraints you think are important for the metadata,

Perhaps it would help to collect, from challengers (or really, anyone), the terms they would like to use. I imagine the best we can hope for at the moment would be a UNION of various sources. (I've listed this on the agenda for the 17th.)

@elichad

As an alternative that retains better interoperability, you could also handle this by having rules in a profile that specify what @type (and other properties) the entity referenced by about should have (e.g. maybe it must be a BioChemEntity). But that is more helpful for devs than for non-linked-data folks, I guess.
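A profile rule like that could be checked mechanically. The sketch below is a hypothetical validator over a flattened @graph, not part of any published RO-Crate profile; the `check_about_types` name and the "must be a BioChemEntity" rule are assumptions taken from the example above.

```python
# Hypothetical profile rule (not from any published profile): every entity that a
# node points to via `about` must carry the BioChemEntity type.


def as_list(value):
    return value if isinstance(value, list) else [value]


def check_about_types(graph, prop="about", required_type="BioChemEntity"):
    """Return a list of human-readable violations for a flattened @graph."""
    by_id = {node["@id"]: node for node in graph}
    errors = []
    for node in graph:
        for ref in as_list(node.get(prop, [])):
            # References may be {"@id": ...} objects or bare strings (with "@type": "@id").
            ref_id = ref["@id"] if isinstance(ref, dict) else ref
            target = by_id.get(ref_id)
            if target is None:
                errors.append(f"{node['@id']}: {prop} -> {ref_id} is not in @graph")
            elif required_type not in as_list(target.get("@type", [])):
                errors.append(f"{node['@id']}: {prop} -> {ref_id} is not a {required_type}")
    return errors


graph = [
    {"@id": "./", "@type": "Dataset", "about": [{"@id": "#specimen"}]},
    {"@id": "#specimen", "@type": "BioChemEntity", "name": "fly (illustrative)"},
]
print(check_about_types(graph))  # []
```

For the aliased form in the proposal, `check_about_types(graph, prop="biological_entity")` would check the friendly key instead.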

}
},
{
"@id": "./",
@joshmoore (Member)

I keep coming back to having this within the Zarr, though it's fair to consider what happens with an OME-TIFF. The most recent concern that came up when I discussed this on the RO-Crate Regional Drop-In Calls is that someone might want to put "yet another" RO-Crate one level higher up, in which case we would have:

  • new top RO-Crate
    • this RO-Crate
      • zarr
        • possibly more RO-Crates within

It just feels like we could save a level in the generic case.
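One way to read "having this within the Zarr" is to write `ro-crate-metadata.json` directly at the Zarr root, so that the crate's root data entity `./` is the Zarr itself and no wrapper level is needed. A minimal sketch under that assumption; `write_crate_in_zarr` is a hypothetical helper, and only the bare RO-Crate 1.1 skeleton is shown:

```python
import json
from pathlib import Path


def write_crate_in_zarr(zarr_dir, name, description):
    """Hypothetical helper: place ro-crate-metadata.json at the root of an existing
    Zarr directory so the root data entity "./" *is* the Zarr. Only the minimal
    skeleton is written; RO-Crate 1.1 also says the root dataset SHOULD have
    properties such as datePublished and license, omitted here for brevity."""
    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {"@id": "./", "@type": "Dataset", "name": name, "description": description},
        ],
    }
    path = Path(zarr_dir) / "ro-crate-metadata.json"
    path.write_text(json.dumps(crate, indent=2))
    return path
```

The directory must already exist (e.g. an existing `image.zarr`); a higher-level crate could then reference the Zarr as a single Dataset instead of adding another intermediate crate.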

"@type": "Dataset",
"name": "OME-ZARR files",
"description": "the ome zarr files of the fly.",
"acquisition_method": [
@joshmoore (Member)

These could validly be written as:

"acquisition_method": "_:b0"

(i.e. non-list)?

@sherwoodf (Contributor Author)

Yes, though I'm always aware that biological images contain a lot of edge cases, and mostly what I know is that I'm not knowledgeable in this area. E.g. what happens if an image is created via the combination of various imaging techniques?

So in general, I went with values in lists. RDF users won't see any difference, but I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it.
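Whichever convention the producer picks, a plain-JSON consumer can also normalise on read. A small sketch with a hypothetical `as_list` helper; it relies on the fact that, in JSON-LD, a single value and a one-element array expand to the same RDF:

```python
def as_list(value):
    """Normalise a JSON-LD property value to a list for plain-JSON consumers.

    "x": "_:b0" and "x": ["_:b0"] yield the same RDF (JSON-LD arrays are unordered
    sets by default), so wrapping a scalar changes nothing for RDF tools."""
    if value is None:
        return []  # property absent
    return value if isinstance(value, list) else [value]


node = {"@id": "./", "acquisition_method": "_:b0", "preparation_method": ["_:b1", "_:b2"]}
for prop in ("acquisition_method", "preparation_method", "biological_entity"):
    print(prop, as_list(node.get(prop)))
# acquisition_method ['_:b0']
# preparation_method ['_:b1', '_:b2']
# biological_entity []
```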

@joshmoore (Member)

what happens if an image is created via the combination of various imaging techniques?

Definitely agreed that a multi-answer is possible.

I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it

This is a good point.

@elichad

I expected JSON users would prefer to always expect a list even if there's occasionally only one element in it

Note that this would be a good rule to enshrine in any profile you make if you want to enforce it for your user base, since it's more restrictive than base RO-Crate.
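Such a "always a list" rule is easy to check. A hedged sketch: the property names come from the proposal, but the rule and the `check_list_valued` helper are hypothetical, not part of base RO-Crate:

```python
# Hypothetical profile rule: these properties must always be JSON arrays,
# even when they hold a single value. Not part of base RO-Crate.
LIST_VALUED = ("acquisition_method", "preparation_method", "biological_entity")


def check_list_valued(graph, props=LIST_VALUED):
    """Return violations for nodes in a flattened @graph."""
    errors = []
    for node in graph:
        for prop in props:
            if prop in node and not isinstance(node[prop], list):
                errors.append(f"{node.get('@id', '?')}: '{prop}' must be a JSON array")
    return errors


graph = [
    {"@id": "./", "acquisition_method": ["_:b0"], "preparation_method": "_:b1"},
]
print(check_list_valued(graph))
# ["./: 'preparation_method' must be a JSON array"]
```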

@joshmoore (Member)

And in fact this contradicts the "compact" requirement.
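For context: by default (the `compactArrays` option), JSON-LD compaction replaces single-element arrays with the bare value, but a term declared with `"@container": "@set"` in the context keeps its array. The toy function below imitates only that one rule so the two behaviours can be compared; it is not a JSON-LD processor, `compact_arrays` is a hypothetical name, and the `preparation_method` IRI is a placeholder.

```python
def compact_arrays(node, context):
    """Toy imitation of one compaction rule: unwrap single-element arrays
    unless the term is declared with "@container": "@set"."""
    compacted = {}
    for key, value in node.items():
        term = context.get(key)
        keep_array = isinstance(term, dict) and term.get("@container") == "@set"
        if isinstance(value, list) and len(value) == 1 and not keep_array:
            compacted[key] = value[0]
        else:
            compacted[key] = value
    return compacted


context = {
    # As in the proposal: no @container, so compaction would unwrap single values.
    "acquisition_method": {"@reverse": "https://schema.org/result", "@type": "@id"},
    # Placeholder IRI; "@container": "@set" keeps the list even with one element.
    "preparation_method": {
        "@id": "https://example.org/hypothetical/preparationMethod",
        "@type": "@id",
        "@container": "@set",
    },
}
node = {"@id": "./", "acquisition_method": ["_:b0"], "preparation_method": ["_:b1"]}
print(compact_arrays(node, context))
# {'@id': './', 'acquisition_method': '_:b0', 'preparation_method': ['_:b1']}
```

Declaring list-valued terms with `@set` may be one way to keep "always a list" and "compacted" compatible.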

"acquisition_method": [
"_:b0"
],
"preparation_method": [
@joshmoore (Member)

Another syntactic point: I imagine the wider audience for these would prefer embedded blank nodes rather than references.

@sherwoodf (Contributor Author)

While I 100% agree, and much prefer embedded (not sure what to call non-flattened), compacted JSON-LD, the RO-Crate standard asks for flattened:

The RO-Crate Metadata File MUST contain RO-Crate JSON-LD; a valid [JSON-LD 1.0](https://www.w3.org/TR/2014/REC-json-ld-20140116/) document in [flattened](https://www.w3.org/TR/json-ld/#flattened-document-form) and [compacted](https://www.w3.org/TR/json-ld/#compacted-document-form) form

from: https://www.researchobject.org/ro-crate/specification/1.1/structure.html

Maybe the flattened structure is less of a hard requirement than it being JSON-LD (JSON-LD libraries shouldn't have issues converting between the different profiles), but I treated flattened as a hard requirement just in case. It would be worth investigating further.
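To illustrate the difference between the embedded and flattened forms, here is a toy Python flattener: it lifts every embedded node object into one flat list and replaces it with an `{"@id": ...}` reference, minting blank node identifiers where needed. It is a sketch only; a real JSON-LD library (e.g. PyLD's `jsonld.flatten`) also handles contexts, `@value` objects in all positions, and merging of duplicate `@id`s. The `flatten` helper and the example values are illustrative.

```python
import itertools


def flatten(root):
    """Toy flattener: lift every embedded node object into one flat list and
    replace it with an {"@id": ...} reference, minting blank node identifiers
    ("_:b0", "_:b1", ...) for nodes that have no @id."""
    graph = []
    counter = itertools.count()

    def visit(node):
        node = dict(node)  # do not mutate the caller's data
        if "@id" not in node:
            node["@id"] = f"_:b{next(counter)}"
        for key, value in list(node.items()):
            if key.startswith("@"):
                continue  # keywords such as @id/@type are left as-is
            if isinstance(value, list):
                node[key] = [lift(v) for v in value]
            else:
                node[key] = lift(value)
        graph.append(node)
        return node["@id"]

    def lift(value):
        # A dict with properties beyond @id (and not a @value literal) is an embedded node.
        if isinstance(value, dict) and "@value" not in value and set(value) - {"@id"}:
            return {"@id": visit(value)}
        return value

    visit(root)
    return graph


embedded = {
    "@id": "./",
    "@type": "Dataset",
    "acquisition_method": [{"@type": "DefinedTerm", "name": "confocal microscopy (illustrative)"}],
}
for node in flatten(embedded):
    print(node)
# {'@type': 'DefinedTerm', 'name': 'confocal microscopy (illustrative)', '@id': '_:b0'}
# {'@id': './', '@type': 'Dataset', 'acquisition_method': [{'@id': '_:b0'}]}
```

In an RO-Crate the resulting list would become the value of `@graph`.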

@joshmoore (Member)

RO-Crate Metadata File MUST contain RO-Crate JSON-LD; a valid JSON-LD 1.0 document in flattened and compacted

Wow! 🫨 TIL. I will ask. I find that quite surprising.

@elichad · Jul 12, 2024

The flattened structure is a hard requirement.

This part of the spec appendix has more formal guidance and clarity on this:
https://www.researchobject.org/ro-crate/specification/1.1/appendix/jsonld#describing-entities-in-json-ld

And the 1.2-DRAFT version is clearer on when to use local identifiers vs. blank nodes: https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/appendix/jsonld#describing-entities-in-json-ld

In particular, 1.2-DRAFT states:

The use of a blank node identifier SHOULD be taken as hint by RO-Crate presentation software* to display the entity in-line, not as a separate entity with its own view, such as a page.

* e.g. Crate-O or roc2html, but I don't know whether they are implemented this way
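As a sketch of how presentation software might act on that hint, the hypothetical helper below separates entities that get their own page from blank-node entities that would be rendered in-line wherever they are referenced. It is not taken from Crate-O or roc2html, whose actual behaviour is not known here.

```python
def split_for_display(graph):
    """Hypothetical presentation logic following the 1.2-DRAFT hint:
    entities with blank node identifiers ("_:...") are shown in-line wherever
    they are referenced; every other entity gets its own view/page."""
    pages = [node for node in graph if not node["@id"].startswith("_:")]
    inline = {node["@id"]: node for node in graph if node["@id"].startswith("_:")}
    return pages, inline


graph = [
    {"@id": "./", "@type": "Dataset", "acquisition_method": [{"@id": "_:b0"}]},
    {"@id": "_:b0", "@type": "DefinedTerm", "name": "confocal microscopy (illustrative)"},
]
pages, inline = split_for_display(graph)
print([node["@id"] for node in pages])  # ['./']
print(sorted(inline))                   # ['_:b0']
```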

},
{
"@id": "_:b0",
"@type": [
@joshmoore (Member)

I'm left wondering if we couldn't come up with a more readable way to make use of the ontology classes. Are there other idioms that you considered, @sherwoodf?

@sherwoodf (Contributor Author)

This is a little tied up with using the flattened form. I was considering having a property (other than rdf:type, but again, we'd probably need to define it) to link to these objects, and then I could neatly include a name field in that object alongside the ID. But in flattened form this would be a whole other node in the list of objects.

@joshmoore (Member)

nods. Understood. Let's hear what the folks on the seek4science Slack say and go from there.

@elichad

As @sherwoodf says, this would normally be handled as a separate node with its own name (and description) for human comprehensibility.

You could look at how the Process Run Crate uses CreateAction and instrument, but I can see its limitations when it comes to describing and categorising physical experimental methods rather than computational processes.
To me it seems as if both acquisition_method and preparation_method could potentially be Process Run Crates in their own right, if you wanted to track the full provenance of how those were carried out (or follow a similar structure within this larger crate).
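A rough sketch of what that could look like inside this crate: one acquisition step as a `CreateAction` (the pattern Process Run Crate uses for computational runs) with the instrument as a separate, named node in the flattened @graph. The helper name, all identifiers and names, and the choice of schema.org `IndividualProduct` for a physical microscope are assumptions for illustration.

```python
def acquisition_action(action_id, name, instrument_id, instrument_name, sample_id, result_id="./"):
    """Hypothetical sketch: describe one acquisition step as a CreateAction,
    with the instrument as a separate, named entity in the flattened @graph.
    All identifiers and names passed in are illustrative placeholders."""
    action = {
        "@id": action_id,
        "@type": "CreateAction",
        "name": name,
        "instrument": {"@id": instrument_id},
        "object": [{"@id": sample_id}],  # what the action acted on (input)
        "result": [{"@id": result_id}],  # what it produced (here: the image dataset)
    }
    # Assumption: a physical microscope typed as schema.org IndividualProduct.
    instrument = {"@id": instrument_id, "@type": "IndividualProduct", "name": instrument_name}
    return [action, instrument]


nodes = acquisition_action(
    "#acquisition-1", "Confocal acquisition (illustrative)",
    "#microscope-1", "Microscope (illustrative)",
    "#specimen-1",
)
for node in nodes:
    print(node["@id"], node["@type"])
# #acquisition-1 CreateAction
# #microscope-1 IndividualProduct
```

In this shape `result` points forward from the action to the Dataset, i.e. the same schema:result relation that the proposal's context reaches through @reverse.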

@sherwoodf (Contributor Author)

Thanks for reaching out to someone about that - very helpful to know it's a strong requirement!

@elichad left a comment

Hello! I work on RO-Crate and was speaking with @joshmoore last week. I'm still fairly new to the RO-Crate and linked-data space myself (and I'm no bioinformatician at all), but I've added some thoughts to this PR that I hope are helpful as you continue to think about this.

4 participants