Are dataset and file metadata records already sent to EZID/DataCite being updated? #5060

Closed
jggautier opened this issue Sep 17, 2018 · 6 comments


@jggautier

jggautier commented Sep 17, 2018

Are metadata records already sent to EZID/DataCite being updated as Dataverse adds metadata in the DataCite schema?

I'm assuming (so it's possible I'm wrong :) that in some cases it's not being updated because:

However, this OAI-PMH record from EZID includes all of the authors added to this dataset's second version (the first version had just one author), so it looks like some of the DataCite schema's required fields, like creator, are getting updated, and some, like resourceType, are not. (Will this affect Dataverse's ability to update the resourceType displayed in DataCite Fabrica (#5086)?)

Could the issue be on DataCite's and EZID's ends (maybe with the way they're updating metadata they make available over OAI-PMH)? Or with how DataCite produces the DataCite XML we can download for each work on DataCite Search?

It's important that the existing metadata records that EZID and DataCite have (and make available over OAI-PMH) are updated as Dataverse continues to improve the amount of metadata it sends to these data hubs, which redistribute this metadata and rely on some of it, like the relatedIdentifier metadata, to help generate citation metrics (for the Make Data Count work, #4821, which will be less effective if the metadata that DataCite has for old datasets doesn't include related identifier metadata).

@djbrooke

djbrooke commented Oct 3, 2018

For now - Document what metadata is sent and when
For now - Verify that complete information is being sent for new datasets

Different issue - Resending metadata to PID provider when the schema is modified (Julian to create a new issue)

@kcondon

kcondon commented Oct 17, 2018

  • related identifier now works for EZID.
  • update metadata endpoint throws 404: curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/admin/$dataset-id/modifyRegistrationMetadata

@landreev

@sekmiller and everybody:
Martin Fenner @mfenner (DataCite Technical Director) has pointed this out after the migration from EZID:

something I see a lot in your DataCite metadata is multiple authors in one <creatorName>, e.g. (for DOI 10.7910/dvn/l2ltsc registered yesterday)
<creator>
  <creatorName>M. Takayasu, L. Chiesa, J.V. Minervini</creatorName>
</creator>
It should instead be
<creator>
  <creatorName>Takayasu, M.</creatorName>
</creator>
<creator>
  <creatorName>Chiesa, L.</creatorName>
</creator>
<creator>
  <creatorName>Minervini, J.V.</creatorName>
</creator>
Multiple authors in one <creatorName> is not only contrary to the schema documentation, but also makes many things hard or impossible, e.g. adding an ORCID or generating a properly formatted citation in any of the available citation styles.

This does seem like a real issue. Since this (#5060) is related to DOI metadata, and it's already in dev., should we take a look at this problem as well, while we're at it?
Otherwise, let me know and I'll open a new issue.
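
To make the target shape concrete, here is a minimal Python sketch that emits one <creator> element per author, as Martin describes. The function name build_creators is hypothetical and this is not Dataverse's actual export code; it only illustrates the structure and assumes the authors are already available as separate "Family, Given" strings.

```python
import xml.etree.ElementTree as ET


def build_creators(names):
    """Return a <creators> element holding one <creator> per author name.

    Hypothetical helper for illustration only; not part of Dataverse.
    """
    creators = ET.Element("creators")
    for name in names:
        creator = ET.SubElement(creators, "creator")
        # Each author gets its own <creatorName>, never a joined string.
        ET.SubElement(creator, "creatorName").text = name
    return creators


# Author names taken from the example quoted above.
authors = ["Takayasu, M.", "Chiesa, L.", "Minervini, J.V."]
creators_xml = ET.tostring(build_creators(authors), encoding="unicode")
print(creators_xml)
```

With separate elements, a per-author identifier such as an ORCID can be attached to the right <creator>, which is not possible when all names share a single <creatorName>.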

@landreev

A quick investigation: I checked a couple of brand-new DOIs that have been minted w/ DataCite since the upgrade yesterday - and they appear to have separate creatorName entries; for example:

<identifier identifierType="DOI">10.7910/DVN/YLUPSB</identifier>  
  <creators> 
    <creator> 
      <creatorName>Dossou-Yovo, Elliott</creatorName>  
      <nameIdentifier schemeURI="https://orcid.org/" nameIdentifierScheme="ORCID">0000-0002-3565-8879</nameIdentifier>  
      <affiliation>(Africa Rice Center)</affiliation> 
    </creator>  
    <creator> 
      <creatorName>Baggie, Idriss</creatorName>  
      <affiliation>(Sierra Leone Agricultural Research Institute)</affiliation> 
    </creator>  
    <creator> 
      <creatorName>Djagba, Justin Fagnombo</creatorName>  
      <affiliation>(Africa Rice Center)</affiliation> 
    </creator>  
    <creator> 
      <creatorName>Swart, Sander</creatorName>  
      <affiliation>(Africa Rice Center)</affiliation> 
    </creator> 
  </creators>  

So my first guess was that this was only a problem with the DOIs migrated from EZID; but then the example Martin provided, above, is ALSO brand-new; minted yesterday.
So this means, I'm guessing, that those 3 authors are actually stored as the single author name in our database (??)
And that would mean this is a problem with our metadata, not with our DOI implementation... OK, either way, this needs to be investigated, whether in the context of this github issue or not.
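
One way to start that investigation is to scan exported DataCite XML for <creatorName> values that look like several authors joined together. The sketch below is a rough heuristic, not an existing Dataverse tool: the function name suspicious_creator_names and the rules it applies (more than one comma, a semicolon, or the word "and") are assumptions, and it will misjudge some names (for example a suffix such as "Smith, John, Jr." is flagged, while two joined authors separated by a single comma are missed).

```python
import xml.etree.ElementTree as ET


def suspicious_creator_names(datacite_xml):
    """Return creatorName values that may contain more than one author.

    Hypothetical heuristic for illustration; the rules are assumptions.
    """
    root = ET.fromstring(datacite_xml)
    flagged = []
    for elem in root.iter():
        # Drop any XML namespace prefix, e.g. "{http://...}creatorName".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "creatorName" or not elem.text:
            continue
        text = elem.text.strip()
        # "Family, Given" has one comma; more separators suggest a joined list.
        if text.count(",") > 1 or ";" in text or " and " in text:
            flagged.append(text)
    return flagged


# Sample built from the two records quoted in this thread.
sample = """<resource>
  <creators>
    <creator><creatorName>M. Takayasu, L. Chiesa, J.V. Minervini</creatorName></creator>
    <creator><creatorName>Dossou-Yovo, Elliott</creatorName></creator>
  </creators>
</resource>"""

flagged = suspicious_creator_names(sample)
print(flagged)
```

Anything flagged this way would still need a manual check against the author fields stored for the dataset.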

@sekmiller

There was a typo in the doc. The curl command for the API is:
curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$dataset-id/modifyRegistrationMetadata
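
For anyone scripting this rather than using curl, the same request can be sketched with Python's standard library. The path and the X-Dataverse-key header come from the corrected command above; the function name and the server, dataset id, and token values are placeholders, and the request is only built here, not sent.

```python
import urllib.request


def build_modify_registration_request(server, dataset_id, api_token):
    """Build (but do not send) the POST request from the curl command above.

    Hypothetical helper; server, dataset_id and api_token are placeholders.
    """
    url = f"http://{server}/api/datasets/{dataset_id}/modifyRegistrationMetadata"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Dataverse-key": api_token},
    )


# Placeholder values for illustration only.
req = build_modify_registration_request("localhost:8080", 42, "xxxx-token")
print(req.get_method(), req.full_url)

# To actually send it (requires a reachable Dataverse installation):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Using /api/admin/ in place of /api/datasets/ in the path is the variant that returned the 404 reported earlier in this thread.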

@jggautier

Regarding @landreev's comment: "So this means, I'm guessing, that those 3 authors are actually stored as the single author name in our database (??)", I'm not sure why I never looked into this until now, but Leonid's hunch was right. Many of the datasets in the collection at https://dataverse.harvard.edu/dataverse/MIT-PSFC have multiple authors added in one field. This problem's been documented at #4035, though I suppose the name of that issue should be broadened.
