Provide ability to batch assign and update DOI metadata and URLs #7

Open
mjordan opened this issue Oct 13, 2017 · 3 comments

Comments

@mjordan
Collaborator

mjordan commented Oct 13, 2017

This is a general issue to track ideas for batch assignment and updating of DOIs. Work on individual aspects of the functionality described below should be addressed in their own issues.

Batch assignment of DOIs

A known use case (Simon Fraser University Library's) is that after enabling this module, the repo admin wants to assign DOIs to a large set of existing objects.

In principle this should be easy, since we can reuse the functionality we already have for assigning DOIs to individual objects via the GUI. The biggest challenge is that only objects whose metadata meets DataCite's (and probably other registrars') requirements can be assigned DOIs, and in practice such metadata is not very common. One particularly challenging aspect of DataCite's metadata requirements is that its mandatory 'resourceType' element must carry a 'resourceTypeGeneral' value from the following list: 'Audiovisual', 'Collection', 'Dataset', 'Event', 'Image', 'InteractiveResource', 'Model', 'PhysicalObject', 'Service', 'Software', 'Sound', 'Text', 'Workflow', and 'Other'.

Any bulk DOI assignment tool needs to be able to detect whether an object has the required metadata, and report out and skip objects that do not. It might also be useful to be able to "audit" a set of objects prior to assignment to see how many pass and how many fail. The repo admin can then add metadata to the failures and try again.
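To make the "audit" idea concrete, here is a minimal sketch in Python (not the module's own language, and not its actual implementation). It assumes each object's metadata has already been extracted into a plain dict; the field names in `REQUIRED_FIELDS` and the function names are hypothetical, and the registrar's real list of mandatory properties may differ. Only the resource type list is taken from the text above.

```python
# Controlled resource type values, as listed in the issue text above.
ALLOWED_RESOURCE_TYPES = {
    "Audiovisual", "Collection", "Dataset", "Event", "Image",
    "InteractiveResource", "Model", "PhysicalObject", "Service",
    "Software", "Sound", "Text", "Workflow", "Other",
}

# Hypothetical field names; the registrar's real requirements may differ.
REQUIRED_FIELDS = ("title", "creator", "publisher", "publication_year", "resource_type")


def audit_object(metadata):
    """Return a list of problems; an empty list means the object passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing {name}")
    resource_type = str(metadata.get("resource_type") or "").strip()
    if resource_type and resource_type not in ALLOWED_RESOURCE_TYPES:
        problems.append(f"invalid resource_type: {resource_type}")
    return problems


def audit_batch(objects):
    """objects maps PID -> metadata dict. Returns (passed PIDs, failures by PID)."""
    passed, failed = [], {}
    for pid, metadata in objects.items():
        problems = audit_object(metadata)
        if problems:
            failed[pid] = problems
        else:
            passed.append(pid)
    return passed, failed
```

A bulk assignment run could call `audit_batch()` first, mint DOIs only for the PIDs in `passed`, and write `failed` out as the report the repo admin works from before trying again.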

What does it mean to "update" a DOI?

Assuming that the DOI identifier itself never changes (that's its purpose), "updating" a DOI should be limited to updating the metadata and location (URL) of the resource identified by the DOI. There is already an option to do this manually in the GUI, and the DataCite module supports it. The most common use case here is that we want to push changes in local metadata to the registrar, e.g., after the local object's MODS datastream is updated.

But why would we update an object's URL? URLs for Islandora are persistent. Is the only use case for updating the URL associated with an Islandora object's DOI the one where the Islandora URL is no longer viable, e.g., after migration from Islandora to another platform?

Batch updating of DOIs

How likely is it that we will need to update the metadata for a set of objects? If we have a way of updating the DOI metadata when the local objects are modified, probably not very likely. If we don't automatically update the metadata, we may want to be able to give the repo admin the ability to "sync" the local metadata with the DOIs.
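One way a "sync" could decide which DOIs need their metadata pushed again is to compare a fingerprint of the current local metadata with the fingerprint recorded at the last push. The Python sketch below only illustrates that idea; the function names and the notion of a stored fingerprint per PID are assumptions, not something the module provides.

```python
import hashlib


def fingerprint(metadata_xml):
    """Stable digest of a metadata serialization (e.g. the MODS XML as a string)."""
    return hashlib.sha256(metadata_xml.encode("utf-8")).hexdigest()


def find_stale(current, last_pushed):
    """Return the PIDs whose local metadata differs from what was last pushed.

    current:     PID -> current metadata serialization
    last_pushed: PID -> fingerprint recorded at the last push (hypothetical store)
    """
    return sorted(
        pid for pid, xml in current.items()
        if last_pushed.get(pid) != fingerprint(xml)
    )
```

Objects that have never been pushed have no stored fingerprint and so are reported as stale as well, which is probably what a repo admin running a sync would expect.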

As suggested above, updating the URLs associated with a DOI will probably only happen after a migration away from Islandora, in which case the URLs associated with the DOIs will need to be updated.

@bondjimbond

Some thoughts on those questions...

What does it mean to "update" a DOI?

Does "Update" also include "delete", or is that a separate task? Cases where a DOI should be deleted - if object already has one from another source and a new one was minted accidentally.

URL change use cases

Yes, ideally a URL would only change with migrations, but in reality humans work a bit more messily. e.g. one of our sites batch loaded thousands of theses, decided the metadata was no good, and mass deleted them all. Then they re-ingested those same objects, which of course got new PIDs assigned and therefore new URLs.

This happens fairly frequently. If DOIs were minted for each of these, their URLs would certainly need to be updated.

The above is also a use case for the batch updating question.

@mjordan
Collaborator Author

mjordan commented Oct 13, 2017

Thanks, excellent.

Does "Update" also include "delete", or is that a separate task? Cases where a DOI should be deleted - if object already has one from another source and a new one was minted accidentally.

The DataCite REST API doesn't allow deletions of DOIs, just deletion of the metadata associated with a DOI: https://support.datacite.org/v1.1/reference#delete-metadata-record-for-doi-name. Although I can't find any specific coverage of the situation you mention, DataCite does provide information on tombstones that may cover that situation. Not sure how other DOI registrars handle "deletions". I think we need to get an authoritative answer on this before doing any work.

I'd characterize your URL change use case, which I can see as more common than we'd like to admit, as a delete and remint action, since the object has a new PID. But... if what I said in the previous paragraph is accurate, we can't delete DOIs and remint new ones. Without the same PID, how do we determine if two objects are the same? Local identifiers in MODS, etc. could be used, but it would be complicated. Oh gawd I think I'm gonna be sick 🤢
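For what it's worth, the matching on local identifiers could look roughly like the Python sketch below. It assumes the local identifier (e.g. one taken from MODS) has already been extracted for both the deleted and the re-ingested objects; the function name and the three-way result are hypothetical. It also shows where the complication comes from: an identifier that is missing or shared by several new objects cannot be resolved automatically.

```python
from collections import defaultdict


def match_reingested(old_ids, new_ids):
    """Pair old PIDs with new PIDs that carry the same local identifier.

    old_ids / new_ids: PID -> local identifier.
    Returns (matched, ambiguous, unmatched):
      matched:   old PID -> the single new PID with the same identifier
      ambiguous: old PID -> sorted list of candidate new PIDs (more than one)
      unmatched: old PIDs with no candidate at all
    """
    by_identifier = defaultdict(list)
    for pid, identifier in new_ids.items():
        by_identifier[identifier].append(pid)

    matched, ambiguous, unmatched = {}, {}, []
    for old_pid, identifier in old_ids.items():
        candidates = by_identifier.get(identifier, [])
        if len(candidates) == 1:
            matched[old_pid] = candidates[0]
        elif candidates:
            ambiguous[old_pid] = sorted(candidates)
        else:
            unmatched.append(old_pid)
    return matched, ambiguous, unmatched
```

Only the `matched` pairs would be safe candidates for a batch URL update on the existing DOIs; the other two groups would need a human to look at them.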

mjordan added a commit that referenced this issue Oct 21, 2017
mjordan added a commit that referenced this issue Oct 22, 2017
mjordan added a commit that referenced this issue Oct 23, 2017
mjordan added a commit that referenced this issue Oct 24, 2017
mjordan added a commit that referenced this issue Oct 24, 2017
mjordan added a commit that referenced this issue Oct 24, 2017
@mjordan
Collaborator Author

mjordan commented Oct 25, 2017

As of c70a5ca, there is a drush command in the DataCite module that will assign DOIs for objects listed in a PID file.


2 participants