New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reserve PID at publication if needed #7147
Reserve PID at publication if needed #7147
Conversation
…_publish_when_needed
throw new IllegalCommandException(BundleUtil.getStringFromBundle("publishDatasetCommand.pidNotReserved"), this); | ||
//Backward compatibility and recovery from PID provider outage - if we should have reserved at creation time but didn't, try here as well to keep moving. If we still can't reserve (service down, connflict, etc.) then bail | ||
try { | ||
String returnString = idServiceBean.createIdentifier(this.getDataset()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going back to trying to register an unreserved DOI during publishing, it should probably happen in the other, asynchronous part of the workflow - in FinalizeDatasetPublicationCommand. Otherwise we can again end up in a situation where Datacite decides to hang for a while, and this is happening while the dataset is still unlocked, etc.
And in the finalize command we already have this code:
if ( theDataset.getGlobalIdCreateTime() == null ) {
try {
registerExternalIdentifier(theDataset, ctxt);
}
...
but of course that registerExternalIdentifier method will need to be modified, because in its current form it would get a new pid assigned to the dataset, if the old one already exists on the Datacite end...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does DataCite hang/delay an http call (versus fail)? If so I can move this/integrate with registerExternalIdentifier to avoid waiting for the call to succeed/fail. It seems like going async and not being able to respond to the user when the first step fails (before file validation, file DOI registrations) except with the async notification isn't great either, but if there are cases where reserving the dataset DOI might induce people to click again, I guess it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it is a network call - so it can hang... I believe we have seen registration calls actually taking a long time and/or timing out, during the worst of their service troubles.
But yes, one nice thing about the current implementation is that we can exit right away with a clear message, before the dataset gets locked and the command goes async.
(we could alternatively try and leave it in the first part of the command - but make the call with a forced short timeout?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - updated registerExternalIdentifier and moved things to Finalize. As far as I can tell, the httpClient should be limiting the wait for a response to one minute (system default) but that could be lowered in the DataCIteRESTfullClient (easiest to set once for all calls since there's one shared client) and I think the whole call to registerExternalIdentifier could move to Publish then with a cut/paste.
} | ||
theDataset.setGlobalIdCreateTime(getTimestamp()); | ||
theDataset.setIdentifierRegistered(true); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was just to avoid the retry attempts - same set of calls otherwise.
} | ||
try { | ||
if (globalIdServiceBean.alreadyExists(theDataset)) { | ||
int attempts = 0; | ||
|
||
while (globalIdServiceBean.alreadyExists(theDataset) && attempts < FOOLPROOF_RETRIAL_ATTEMPTS_LIMIT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With while, there's an extra call to alreadyExists in the first iteration, so switched to do while loop.
} | ||
|
||
if (globalIdServiceBean.alreadyExists(theDataset)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar - extra call to alreadyExists since attempts will be over the limit if the while call above doesn't find alreadyExists()==false by the end.
if ( theDataset.getGlobalIdCreateTime() == null ) { | ||
registerExternalIdentifier(theDataset, ctxt); | ||
registerExternalIdentifier(theDataset, ctxt, idServiceBean.registerWhenPublished()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent of the larger discussion about the PR, if we keep the new boolean flag, it could always be set to false here - PID providers that do not register draft DOIs would then try once to create a DOI at publication time and fail if the DOI is already in use, versus retrying and causing all of the files to be 'lost' (at the path for the old/conflicted DOI instead of one found in a retry). Retries would then only occur in the create command (for providers that can reserve).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does make sense - maybe we should just hard-code it to false, when the method is called from the publication command. It's not exactly clear what the point would be, in keeping the current behavior for handles.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made one last comment/question; but I'm still moving it into QA.
files don't disappear, but they are stored with a path reflecting the initial PID so changing the PID means that can't be retrieved using any newly created PID - the reason reserve at create time has been implemented for DataCIte and is preferred overall
IQSS/7146-reserve_at_publish_when_needed Conflicts: src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java
Should this issue be tested as well? Hat tip to @mheppler who found it. Feels related.
|
At a minimum the recent changes would make that issue obsolete. I think between the several PRs to reserve PIDs, add notification, etc. most/all of the recommendations there are implemented. |
What this PR does / why we need it: Reserves a PID if needed at publication time (using a reserve at creation-time PID provider but none has been created, so either a backward-compatibility issue or after a PID provider outage)
Which issue(s) this PR closes:
Closes #7146
Special notes for your reviewer: Would slightly change the guides documentation if adopted
Suggestions on how to test this: Try publishing without having reserved DOIs at DataCite
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: No
Additional documentation: