Adding Darwin Core IRI equivalents to the IPT #1947

Open
albenson-usgs opened this issue Feb 14, 2023 · 13 comments

Comments

@albenson-usgs
Contributor

One of the proposals in the current Darwin Core Public Review is intended to improve the use of the Darwin Core IRI equivalents. However, I don't think the IPT is currently set up to receive the dwciri equivalents and include them in the DwC-A if people were to start using them. For instance, sformel-usgs is currently working on a dataset in which he would like to include dwciri:sampleSizeUnit = http://vocab.nerc.ac.uk/collection/P06/current/SQKM/. We're not sure how to format the file to include the IRI equivalents. Should we prepend "dwc:" to all the regular Darwin Core terms and "dwciri:" to the IRI equivalent ones? Or only prepend "dwciri:" to the IRI equivalents and leave the regular Darwin Core terms as we have been providing them to the IPT to date (without "dwc:" at the beginning)? Would it be possible to modify the IPT to accept these IRI equivalents and include them in the DwC-A?

@mike-podolskiy90
Contributor

@albenson-usgs Thank you for contacting us.
I need to explore this issue. We'll get back to you as soon as we find out.

@dagendresen

Agree that adding the possibility to use the dwciri terms with the IPT would be very good! (I believe these terms were designed for use with RDF, but I think they could work fine with a DwC-A as well).

@timrobertson100
Copy link
Member

@baskaufs - any thoughts on this please?

@baskaufs

@dagendresen is correct. Although the IRI dwciri: terms were originally created for use in RDF, I think there are circumstances where they might be very useful to use in tabular data as well -- basically any situation where there's an expectation that the value will be a valid IRI identifier, such as the example @albenson-usgs gave.

I don't know what the technical solution is for implementing this in the IPT. It seems like one should have the option of specifying a namespace, at least in cases where it isn't dwc:. From the standpoint of the meta.xml file, it already uses full IRIs anyway, so it's just a question of implementing a way for the IPT to recognize that the IRI should not start with http://rs.tdwg.org/dwc/terms/. I'm assuming that there is already some kind of hack to handle the dcterms: terms, which don't start with that namespace either.
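To make the meta.xml point concrete, here is a minimal sketch of how prefixed column headers could be expanded to the full term IRIs that meta.xml already uses. It does not describe how the IPT actually works: the function names `expand_term` and `field_elements`, the choice to treat unprefixed headers as dwc:, and the sample headers are all assumptions made for illustration. Only the three namespace IRIs come from the published vocabularies.

```python
import xml.etree.ElementTree as ET

# Namespace IRIs as published by TDWG (dwc, dwciri) and DCMI (dcterms).
NAMESPACES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand_term(header, default_prefix="dwc"):
    """Expand a header such as 'dwciri:sampleSizeUnit' to a full term IRI.

    Assumption: a header without a prefix belongs to default_prefix.
    """
    prefix, sep, local = header.partition(":")
    if not sep:
        prefix, local = default_prefix, header
    if prefix not in NAMESPACES:
        raise ValueError(f"unknown namespace prefix: {prefix!r}")
    return NAMESPACES[prefix] + local


def field_elements(headers):
    """Build <field index=... term=...> elements in the shape meta.xml uses."""
    return [
        ET.Element("field", {"index": str(i), "term": expand_term(h)})
        for i, h in enumerate(headers)
    ]


if __name__ == "__main__":
    # Hypothetical column headers for an event table.
    headers = ["eventID", "sampleSizeValue", "dwciri:sampleSizeUnit"]
    for element in field_elements(headers):
        print(ET.tostring(element, encoding="unicode"))
```

Under this convention only the non-default namespaces need a prefix in the header, which matches the second option raised in the opening question.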

@dshorthouse
Contributor

How might the IPT (and a DwC-A) deal with cases where instances of dwciri could form an array in relation to the core table? This is partly the reason why we came up with identifiedByID and recordedByID (arguably a hack) when dwciri:identifiedBy and dwciri:recordedBy would have been just as effective.

@baskaufs

@dshorthouse That is a legitimate question. Currently it seems like the only option is to restrict them to single values.

However, I think if the TAG can come up with a consensus on how to handle the issue of multiple values systematically, there might be a way out. I think @ben-norton is working on a proposal for that. Perhaps a JSON array as a value?

If the values could be serialized as some bit of JSON-LD, it seems like it could be possible (with some understood rules) to ingest them directly as machine readable metadata rather than requiring people to parse and disambiguate them as would be required for the ID terms. I think that creating a mechanism for direct ingestion as linked data entities without requiring further processing is the real spirit behind the dwciri: terms.

@ben-norton

ben-norton commented Feb 27, 2023

@baskaufs @dshorthouse
Steve is correct. I am currently working on the complex values/arrays white paper. My goal is to send it out for review by March 10th. I'm dedicated to this goal as this topic keeps resurfacing. More to come very soon. David, I can add you to the list of initial reviewers if you would like. I'd appreciate your insight on the issue.

@timrobertson100
Member

Note that the Humboldt Eco Extension authors consciously chose not to use ID fields instead favoring dwciri.

@dshorthouse
Contributor

Note that the Humboldt Eco Extension authors consciously chose not to use ID fields instead favoring dwciri.

Good decision. The first step for adding dwciri to the IPT is figuring out how to include the equivalent terms in DwC-A (or in a Frictionless Data implementation). TBH, I am uneasy about making a dwc sandwich with dwciri ingredients stuffed within; I expect we'd either have to continually adjust to the shifting make-up of the ingredients or be distracted in trying to define what is a sandwich.

@albenson-usgs
Contributor Author

TBH, I am uneasy about making a dwc sandwich with dwciri ingredients stuffed within.

I think I'm struggling to understand what this means. I am envisioning that the dwciri would be available in the same way the ID terms are available, I guess just with a different prepending to them? If we can have identifiedBy and identifiedByID in the same file then is there a reason we can't have identifiedBy and dwciri:identifiedBy in the same file?

@dshorthouse
Contributor

dshorthouse commented Sep 7, 2023

You may always count on me to muddy the waters; particularly more acute under the brain fog of round 2 of COVID 😳.

Here's what I mean about the sandwich.

An alternate namespace like dwciri to me means something "different", not quite a full participant in the usual depiction of dwc as a "bag-of-terms". So, what makes it different besides a namespace and why does it have a home in expressions where most other terms are in the dwc namespace? Are its contents serialized in some special way? Like say RDF/XML, Turtle, JSON-LD, JSON-AD (atomic data), other? Does it include a schema that can be invoked to validate its contents, say something from schema.org? What makes it operationally different? Should it always be expressed as a JSON array when included in flat serializations like in the core file in a DwC-A even though there may usually be a single value as is typically the case in identifiedBy? The rationale here is to mirror the arrays we'd expect in dwciri:recordedBy.

There's much canoodling left to do here if a dwciri term is to be used in serializations like DwC-A. The above is (mostly) nonsensical because as is stated, "Terms in the dwciri namespace are intended to be used in RDF with non-literal objects." We have no non-literal objects in serializations like DwC-A & so in effect, dwciri has no real home in the IPT until it can produce RDF. What makes me uneasy is the inability to extract asserted relationships between dwciri:identifiedBy, dwc:scientificName, and dwc:dateIdentified, though that is still true of dwc:identifiedBy and no comparable IRI-based terms for the other two. I'd just hope that by now, we could push ourselves to make this trident more tightly connected in an IRI namespace.

@baskaufs

baskaufs commented Sep 8, 2023

You are right that it's stated that "Terms in the dwciri namespace are intended to be used in RDF with non-literal objects." However, many tools get used in ways that weren't originally intended. Take JSON itself: it was originally a hack of JavaScript to meet a particular need on the Internet, but now it's used all over the place as a way to transfer structured data (morphed now even more as JSON-LD). So I think it's legitimate to consider using dwciri terms in ways that weren't originally envisioned.

With an appropriate metadata description file, it's possible for tabular data to serve as an RDF serialization (see https://www.w3.org/TR/csv2rdf/). So it seems fine to me to use dwciri terms in column headers along with dwc ones. The main problems with doing this come when the value you want to provide isn't a single IRI. For example, one reason for not completely abandoning the ID terms in favor of the IRI terms is that you can use non-IRI identifiers with ID terms (e.g. unaltered UUIDs). That wouldn't be kosher with dwciri terms.
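The single-IRI restriction could be checked softly at mapping time. The sketch below is only a rough heuristic built on Python's `urllib.parse.urlsplit`, not a full RFC 3987 validator, and the function name `looks_like_absolute_iri` and the sample UUID are invented for illustration. It shows why a bare UUID would fit an ID term but not a dwciri: term, while the same UUID wrapped as a `urn:uuid:` IRI would pass.

```python
from urllib.parse import urlsplit


def looks_like_absolute_iri(value):
    """Heuristic check: a scheme, no whitespace, and something after the scheme.

    This is a soft check only; it does not validate against RFC 3987.
    """
    if not value or any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. unbalanced brackets.
        return False
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


if __name__ == "__main__":
    candidates = [
        "https://orcid.org/0000-0002-4391-107X",
        "http://vocab.nerc.ac.uk/collection/P06/current/SQKM/",
        "123e4567-e89b-12d3-a456-426614174000",           # made-up bare UUID
        "urn:uuid:123e4567-e89b-12d3-a456-426614174000",  # same UUID as an IRI
    ]
    for candidate in candidates:
        print(candidate, looks_like_absolute_iri(candidate))
```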

The other complication comes when we want to have multiple values or some complex thing involving blank nodes as we are discussing here. We couldn't put those kinds of things in a single table cell and expect them to be convertible to RDF without establishing some kind of serializing and processing rules (like requiring the contents to be valid bits of JSON-LD with a standard context). With appropriate rules, one could certainly produce RDF from the table if those rules were known. We could certainly write those rules if they served a useful purpose.
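For the simple single-value case, the kind of rule being described might look like the sketch below: a dwciri: column yields a triple whose object is an IRI, and any other column yields a plain literal. This is an illustration under stated assumptions, not an implementation of the csv2rdf recommendation. The function names, the subject IRI under example.org, and the literal "square kilometre" are all invented for the example.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
DWCIRI = "http://rs.tdwg.org/dwc/iri/"


def escape_literal(text):
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(subject_iri, row):
    """Turn one table row (header -> cell value) into N-Triples lines.

    Assumed rule: 'dwciri:' columns hold exactly one IRI and become IRI
    objects; unprefixed columns are dwc: terms and become plain literals.
    """
    lines = []
    for header, value in row.items():
        if value == "":
            continue  # empty cells produce no triple
        if header.startswith("dwciri:"):
            predicate = DWCIRI + header[len("dwciri:"):]
            obj = f"<{value}>"
        else:
            predicate = DWC + header
            obj = f'"{escape_literal(value)}"'
        lines.append(f"<{subject_iri}> <{predicate}> {obj} .")
    return lines


if __name__ == "__main__":
    row = {
        "sampleSizeUnit": "square kilometre",  # illustrative literal
        "dwciri:sampleSizeUnit": "http://vocab.nerc.ac.uk/collection/P06/current/SQKM/",
    }
    # Hypothetical subject IRI for the event the row describes.
    for line in row_to_ntriples("https://example.org/event/1", row):
        print(line)
```

Multiple values and blank nodes are exactly what this rule cannot express, which is why additional serialization rules would be needed for those cases.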

@dshorthouse
Contributor

This has been a great discussion. Perhaps it's time to propose some examples for how dwciri-namespaced terms could be presented in the core file within DwC-A, alongside I suppose some form of (soft?) validation of their contents &/or some transformations made upon entry/mapping. And so, in that spirit I can think of a range of techniques, using dwciri:recordedBy as example:

Ultra-basic (array of strings):

["https://orcid.org/0000-0002-4391-107X", "https://orcid.org/0000-0003-4365-3135"]

Slightly more complicated (array of objects):

[{ "@id": "https://orcid.org/0000-0002-4391-107X"}, { "@id": "https://orcid.org/0000-0003-4365-3135"}]

Where I get a bit confused here is whether or not we could get away with calling either of the above dwciri:recordedBy. And, if the second one is attractive...because it's an array of objects...would it be permissive of any additional keys, such as more explicit ordering as might be accomplished through schema:Role?
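As a rough illustration of the soft validation or transformation on entry mentioned above, the following sketch reads a dwciri:recordedBy cell in either of the two shapes and reduces it to a list of IRIs. The function name `iris_from_cell` and the decision to also accept a bare single IRI are assumptions, and extra keys on the objects (such as a role) are simply ignored here rather than interpreted.

```python
import json


def iris_from_cell(cell):
    """Return the IRIs held in one table cell.

    Accepted shapes (an assumption, not an agreed convention):
      - a bare IRI string
      - a JSON array of IRI strings
      - a JSON array of objects that each carry an "@id" key
    """
    cell = cell.strip()
    if not cell:
        return []
    if not cell.startswith("["):
        return [cell]
    iris = []
    for item in json.loads(cell):
        if isinstance(item, str):
            iris.append(item)
        elif isinstance(item, dict) and "@id" in item:
            iris.append(item["@id"])  # any other keys are ignored in this sketch
        else:
            raise ValueError(f"unsupported array item: {item!r}")
    return iris


if __name__ == "__main__":
    basic = '["https://orcid.org/0000-0002-4391-107X", "https://orcid.org/0000-0003-4365-3135"]'
    objects = '[{ "@id": "https://orcid.org/0000-0002-4391-107X"}, { "@id": "https://orcid.org/0000-0003-4365-3135"}]'
    print(iris_from_cell(basic))
    print(iris_from_cell(objects))
```

Both shapes reduce to the same ordered list, so a consumer that only needs the identifiers would not have to care which one a publisher chose.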
