Source DOI as alternative identifier in EML? #15

Closed
peterdesmet opened this issue Apr 28, 2022 · 19 comments

@peterdesmet
Member

Me:

Should the source DOI be an alternative identifier for the dataset in EML?

@timrobertson100:

I think so. I’m not sure if GBIF will then use it though… can you try perhaps?

@peterdesmet peterdesmet self-assigned this May 4, 2022
@peterdesmet
Member Author

Hi @timrobertson100, if I set the source dataset DOI as alternativeIdentifier then GBIF will use it as the main identifier. See:

https://www.gbif-uat.org/dataset/0ef15f32-b41d-4274-ae96-eb5d0059fee6

I guess that means that GBIF would not mint a DOI for the dataset (in the example above it did that on first publication) and we would thus have a single DOI for both the source and the [subsampled representation] version.

Pro:

  • People are directed to the source dataset
  • Citations contribute to the source dataset

Con:

  • No DOI for the derived dataset, one would have to use the GBIF.org URL to access it.

I'm not against this approach. @sarahcd @timrobertson100 what do you think?
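
For reference, a minimal sketch of what setting the source DOI as an alternative identifier could look like at the EML level. It assumes the EML element is named `alternateIdentifier` and sits at the start of `<dataset>`, before `<title>`. The sample document is simplified (no namespaces), the helper name `add_alternate_identifier` is hypothetical, and the DOI is a placeholder. This is not how the IPT or movepub does it.

```python
import xml.etree.ElementTree as ET


def add_alternate_identifier(eml_xml: str, doi_url: str) -> str:
    """Return the EML with doi_url added as an <alternateIdentifier> of the dataset."""
    root = ET.fromstring(eml_xml)
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found")

    existing = dataset.findall("alternateIdentifier")
    if doi_url not in [el.text for el in existing]:
        new_el = ET.Element("alternateIdentifier")
        new_el.text = doi_url
        # Assumption: any existing alternateIdentifier elements are the first
        # children of <dataset>, so the new one goes right after them.
        dataset.insert(len(existing), new_el)
    return ET.tostring(root, encoding="unicode")


# Simplified, namespace-free sample; the DOI is a placeholder, not a real dataset.
sample = "<eml><dataset><title>Example dataset</title></dataset></eml>"
print(add_alternate_identifier(sample, "https://doi.org/10.5281/zenodo.0000000"))
```

Calling the helper a second time with the same DOI leaves the document unchanged, so it can be rerun on every republication.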

@sarahcd
Collaborator

sarahcd commented May 10, 2022

This looks good to me. Open to feedback from @timrobertson100 based on how this is used by others.

If we were using DataCite's schema we could do this:
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsDerivedFrom">
Not sure if a similar clarification is possible here (I think all we have now is the suffix on the dataset title).
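
As an illustration only, the DataCite-style relation mentioned above could be generated like this. The function name `related_identifier_xml` and the DOI value are hypothetical; the element and attribute names are the ones quoted in this comment.

```python
import xml.etree.ElementTree as ET


def related_identifier_xml(source_doi: str, relation_type: str = "IsDerivedFrom") -> str:
    """Serialize a DataCite-style <relatedIdentifier> element pointing at source_doi."""
    el = ET.Element(
        "relatedIdentifier",
        {"relatedIdentifierType": "DOI", "relationType": relation_type},
    )
    el.text = source_doi
    return ET.tostring(el, encoding="unicode")


# Placeholder DOI standing in for the source dataset.
print(related_identifier_xml("10.5281/zenodo.0000000"))
```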

@peterdesmet
Member Author

The GBIF IPT (or EML) does not have a field for relatedIdentifiers, only alternativeIdentifiers, which I guess should be interpreted as "same as".

@sarahcd
Collaborator

sarahcd commented May 10, 2022

Ok, I think this is fine as is then.

@timrobertson100

> I think this is fine as is then

I tend to agree too

@peterdesmet
Member Author

@timrobertson100 so to be clear, you are fine that the subsampled Darwin Core version of the dataset is not assigned a new DOI, but reuses the one from the source?

@timrobertson100

> @timrobertson100 so to be clear, you are fine that the subsampled Darwin Core version of the dataset is not assigned a new DOI, but reuses the one from the source?

Yes, to me this makes sense, because a dataset DOI in GBIF represents the concept of a living dataset and not a specific version of that dataset (our downloads, on the other hand, are immutable datasets). Here, we are sharing a downsampled dataset 1) to aid in discovery (i.e. people with a taxonomic/geographic/temporal filter will find it in GBIF), and 2) so that the broad location data can contribute to scientific questions asked of the GBIF aggregate dataset.

@dnoesgaard leads all the GBIF citation tracking - does this also seem reasonable to you please?

@dnoesgaard

We don't track citations of DOIs that aren't minted by GBIF—except a few prefixes assigned and used by specific IPTs only. That being said, most use of GBIF-mediated data happens via downloads, in which case a GBIF DOI is minted to represent the download (which of course could be a single dataset).

How many datasets are we talking about here? Will they all have Zenodo DOIs?

@peterdesmet
Member Author

The current scope is 11 datasets, all from Zenodo. But in the future there could be more, with DOIs from the Movebank Data Repository.

@dnoesgaard

Ok. At the moment, we will not be able to track citations of datasets with non-GBIF DOIs. The reason being that we track based on DOI prefix, so it's only feasible for us to do that when a prefix is (almost) exclusively used for datasets published in GBIF.
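
To make the prefix-based approach concrete, here is a rough sketch of such a check. It is not GBIF's actual tracking code. The set of tracked prefixes is an assumption that contains only `10.15468`, the prefix of the download DOI cited later in this thread; `10.5281` is Zenodo's prefix, used here with a placeholder suffix. Function names are hypothetical.

```python
# Assumption: an illustrative set of prefixes, not GBIF's real configuration.
TRACKED_PREFIXES = {"10.15468"}


def doi_prefix(doi: str) -> str:
    """Return the registrant prefix (the part before the first slash) of a DOI."""
    value = doi.strip()
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(lead):
            value = value[len(lead):]
            break
    prefix, sep, suffix = value.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def is_tracked(doi: str, tracked=TRACKED_PREFIXES) -> bool:
    """True if the DOI's prefix is one of the tracked prefixes."""
    return doi_prefix(doi) in tracked


print(is_tracked("https://doi.org/10.15468/dl.5tm8an"))  # DOI from this thread
print(is_tracked("10.5281/zenodo.0000000"))              # placeholder Zenodo DOI
```

A prefix shared by many unrelated records, as with a general-purpose repository, cannot be distinguished this way, which matches the limitation described above.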

When/if proper dataset citations are more common and included in DOI metadata, we might be able to pull them directly from Event Data—as a supplement, if nothing else.

@sarahcd
Collaborator

sarahcd commented May 11, 2022

I think given the proposed solution here, it's ok to leave it to the DOI-granting repositories to track dataset citations.

Regarding GBIF-mediated data downloads: It would be great if there was a way to recognize DOIs of data that contribute to these downloads. That and/or other ways to track use of their data would certainly get more movement ecologists interested in having their data on GBIF.

@dnoesgaard

As I mentioned, most use of GBIF-mediated data happens through downloads that are also assigned (unique) GBIF-minted DOIs. We will still track use of downloaded data containing records from datasets without GBIF-assigned DOIs. This information is also aggregated at the level of the contributing datasets and their publishers. The metadata of download DOIs contains <relatedIdentifier> records of all contributing datasets (using "relationType": "References"), and citations of downloads are also recorded in the metadata using "relationType": "IsCitedBy" relationships.

Example: https://doi.org/10.15468/dl.5tm8an

This download was cited by this paper: https://doi.org/10.1016/j.ecss.2022.107883

You'll see this reflected in the DOI metadata (https://api.datacite.org/dois/10.15468/dl.5tm8an) as:

{
  "relationType": "IsCitedBy",
  "relatedIdentifier": "10.1016/j.ecss.2022.107883",
  "relatedIdentifierType": "DOI"
}
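
A small sketch of how these relations could be pulled out of the DOI metadata once it has been fetched and parsed. It assumes the DataCite REST API response nests the records under `data.attributes.relatedIdentifiers`. The sample payload is hand-written: the `IsCitedBy` record is the one shown above, while the `References` record uses a placeholder DOI. The function name `related_dois` is hypothetical.

```python
def related_dois(payload: dict, relation_type: str) -> list:
    """Return the DOIs in a DataCite-style payload that have the given relationType."""
    attributes = payload.get("data", {}).get("attributes", {})
    records = attributes.get("relatedIdentifiers") or []
    return [
        record["relatedIdentifier"]
        for record in records
        if record.get("relationType") == relation_type
        and record.get("relatedIdentifierType") == "DOI"
    ]


# Hand-written sample in the assumed response shape (not a real API response).
sample_payload = {
    "data": {
        "attributes": {
            "relatedIdentifiers": [
                {
                    "relationType": "References",
                    "relatedIdentifier": "10.5281/zenodo.0000000",  # placeholder
                    "relatedIdentifierType": "DOI",
                },
                {
                    "relationType": "IsCitedBy",
                    "relatedIdentifier": "10.1016/j.ecss.2022.107883",
                    "relatedIdentifierType": "DOI",
                },
            ]
        }
    }
}

print(related_dois(sample_payload, "IsCitedBy"))   # papers citing the download
print(related_dois(sample_payload, "References"))  # contributing datasets
```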

@peterdesmet
Member Author

That's cool! So if a paper cites a GBIF download, and that download includes a dataset with a non-GBIF-minted DOI, the GBIF dataset page for that dataset would show that paper as a citation. It's just that a direct citation of the dataset wouldn't show up, since you don't track those?

@timrobertson100

Correct

@peterdesmet
Member Author

Documented in function documentation: https://inbo.github.io/movepub/reference/write_dwc.html#metadata

alternative identifier: DOI of original dataset. This way, no new DOI will be created when publishing to GBIF.

@peterdesmet
Member Author

Reopening this with a question regarding versioned DOIs. How will GBIF handle the following workflow:

  1. Original dataset is published on Zenodo, with DOI v1.
  2. Subsampled representation of the dataset is created and published for the first time via a GBIF IPT, with DOI v1 as alternative identifier.
  3. Dataset is registered with GBIF, a dataset key is made. No DOI is minted, as dataset already has DOI v1 as identifier.
  4. DOI v1 makes it into GBIF downloads and collects citations.
  5. A new version of the dataset is created on Zenodo (e.g. additional records), with DOI v2.
  6. The subsampled representation is created again and replaces the previous one on the same IPT. This dataset has DOI v2 as identifier (replacing DOI v1).
  7. GBIF harvests the dataset under the same dataset key.
  • What will happen with the already collected citations? Are they still visible on the dataset page?
  • Should the dataset have both DOI v1 and DOI v2 as alternative identifiers, or can one be replaced as described in the scenario above?

@peterdesmet peterdesmet reopened this May 16, 2022
@dnoesgaard

Fwiw, citations are linked to the GBIF dataset key (via downloads), so they remain unchanged.

@peterdesmet
Member Author

Great, so no issues to be expected when updating the DOI.

@timrobertson100

> Great, so no issues to be expected when updating the DOI.

Correct, a dataset can change its DOI and we'll still link the citations that GBIF has tracked in the GBIF database. If others are tracking citations through a DOI metadata graph (e.g. DataCite) they won't be updated.
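
A toy model of why the DOI change is harmless on the GBIF side, assuming (as stated above) that tracked citations hang off the dataset key and not the DOI. Class and method names, the dataset key and all DOIs are hypothetical placeholders; this is not GBIF's data model.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    key: str                      # stable dataset key
    doi: str                      # current DOI, may be replaced on republication
    citations: list = field(default_factory=list)


class Registry:
    """Minimal in-memory registry keyed by dataset key."""

    def __init__(self):
        self._records = {}

    def register(self, key: str, doi: str) -> DatasetRecord:
        self._records[key] = DatasetRecord(key=key, doi=doi)
        return self._records[key]

    def add_citation(self, key: str, citing_doi: str) -> None:
        self._records[key].citations.append(citing_doi)

    def republish(self, key: str, new_doi: str) -> DatasetRecord:
        # Only the DOI changes; the record (and its citations) stay under the same key.
        record = self._records[key]
        record.doi = new_doi
        return record


registry = Registry()
registry.register("example-dataset-key", "10.5281/zenodo.0000001")    # DOI v1
registry.add_citation("example-dataset-key", "10.1234/example-paper")
updated = registry.republish("example-dataset-key", "10.5281/zenodo.0000002")  # DOI v2
print(updated.doi, updated.citations)
```

A tracker that indexes citations by DOI would, in contrast, still have them filed under DOI v1 after step 6, which is the DataCite caveat mentioned above.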
