Resolver to resolve material sample ids separately to the occurrence ids #19

Open
rukayaj opened this issue Jan 17, 2022 · 2 comments

Comments

@rukayaj
Collaborator

rukayaj commented Jan 17, 2022

We have a problem with the way we're publishing data currently: it's not possible to separately identify material samples vs occurrences for most of our records.

I don't think the resolver should issue identifiers for the data records; I think we should publish the identifiers and the resolver should resolve them.

We do have separate material sample IDs for DNA datasets from Corema, so step 1 could be to make the resolver resolve those separately.

@dagendresen
Member

(1) I think that we CAN unambiguously publish (real) Occurrences separately from voucher specimens and tissue samples, using the IPT and DwC-A and distinguishing them by basisOfRecord. (However, I also think that the GBIF data portal and Artskart do not present these appropriately/correctly.)

(2) Agree! The resolver should not issue PIDs, only resolve them.

(3) Agree! The MaterialSamples with materialSampleIDs should be resolved by separate endpoints from the corresponding Occurrences they are linked to. The respective occurrenceID should then be an attribute of the MaterialSample endpoint metadata ...
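A rough sketch of what (3) could look like on the resolver side, assuming the records arrive as plain dicts keyed by Darwin Core term names. The function name `build_indexes` and the shape of the returned metadata are made up for illustration and are not the resolver's actual code:

```python
def build_indexes(records):
    """Build two separate lookup tables from Darwin Core style records.

    Hypothetical helper: one index per object class, so that a
    materialSampleID and an occurrenceID each resolve to their own entry.
    """
    occurrences = {}
    material_samples = {}
    for record in records:
        occ_id = record.get("occurrenceID")
        ms_id = record.get("materialSampleID")
        if occ_id:
            # One occurrence entry, listing the samples linked to it
            occurrences.setdefault(
                occ_id, {"occurrenceID": occ_id, "materialSampleIDs": []}
            )
        if ms_id:
            # The linked occurrenceID is an attribute of the sample metadata
            material_samples[ms_id] = {
                "materialSampleID": ms_id,
                "occurrenceID": occ_id,
            }
            if occ_id:
                occurrences[occ_id]["materialSampleIDs"].append(ms_id)
    return occurrences, material_samples
```

Each table could then back its own endpoint, with the material sample entry pointing to its occurrence and the occurrence entry pointing back to its samples.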

@dagendresen
Member

dagendresen commented Jan 17, 2022

I think we should also extract and resolve organismIDs, eventIDs, taxonIDs, etc. (when these IDs follow a reasonable name string syntax that we can trust will be persistent ... TODO: decide on a test for the PID name syntax)
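Since the syntax test is still undecided, here is only one possible candidate, sketched to make the TODO concrete: accept a value if it looks like a URN or an http(s) URI without whitespace. The rule itself, the `ID_TERMS` list, and the function name are assumptions, and the predicate is passed in so it can be swapped once a real test is agreed:

```python
import re

# Darwin Core ID terms mentioned in this thread
ID_TERMS = ("occurrenceID", "materialSampleID", "organismID", "eventID", "taxonID")

# Candidate rule (an assumption, not a decision): a URN or an http(s) URI
CANDIDATE_PID = re.compile(
    r"^(urn:[a-z0-9][a-z0-9-]{0,31}:\S+|https?://\S+)$", re.IGNORECASE
)


def extract_resolvable_ids(record, looks_persistent=CANDIDATE_PID.match):
    """Return the ID terms of a record whose values pass the syntax test."""
    found = {}
    for term in ID_TERMS:
        value = (record.get(term) or "").strip()
        if value and looks_persistent(value):
            found[term] = value
    return found
```

With this rule a bare local number such as `12345` in eventID would be skipped, while a `urn:uuid:...` occurrenceID would be kept.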

Notice also that, for the Norwegian GBIF datasets, no end-point (machine-readable or not) yet exists anywhere for occurrenceID, materialSampleID, organismID, etc., except for the resolver we are building.

(The global GBIF portal sort of almost provides something that resembles an end-point for data-records, but obviously not for any of the other real-life object classes ...).

The envisioned workflow is for the data publisher to mint (create) persistent identifiers for their MaterialSamples, Occurrences, etc., and add these to their DwCA datasets. The envisioned agreement is then for GBIF-Norway to create the end-points for these publisher-provided persistent identifiers. Currently we support urn:uuid:UUID type identifiers (including the PURL form), but exploring other (more robust?) identifier types such as Handles and DOIs would be perfect.
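To illustrate the currently supported identifier type, a minimal sketch of normalizing both the urn:uuid form and a PURL form to one canonical UUID string. The `PURL_PREFIX` value is a placeholder, not the real PURL base, and the function name is hypothetical:

```python
import uuid

# Placeholder prefix for the PURL form (an assumption, not the real base URL)
PURL_PREFIX = "https://purl.example.org/id/"


def normalize_uuid_identifier(identifier):
    """Return the canonical lowercase UUID string, or None if unsupported."""
    value = identifier.strip()
    lowered = value.lower()
    if lowered.startswith("urn:uuid:"):
        candidate = value[len("urn:uuid:"):]
    elif lowered.startswith(PURL_PREFIX):
        candidate = value[len(PURL_PREFIX):]
    else:
        return None  # not one of the two supported forms
    try:
        # uuid.UUID validates the string; str() gives the canonical form
        return str(uuid.UUID(candidate))
    except ValueError:
        return None
```

Normalizing first means both forms of the same identifier would hit the same resolver entry. Handles and DOIs would need their own parsing and are not covered here.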

I think that establishing these end-points is the important rationale for the resolver :-)
