Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reserve PID at publication if needed #7147

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Jul 29, 2020

What this PR does / why we need it: Reserves a PID if needed at publication time (using a reserve at creation-time PID provider but none has been created, so either a backward-compatibility issue or after a PID provider outage)

Which issue(s) this PR closes:

Closes #7146

Special notes for your reviewer: Would slightly change the guides documentation if adopted

Suggestions on how to test this: Try publishing without having reserved DOIs at DataCite

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: No

Additional documentation:

@qqmyers qqmyers changed the title reserve if needed Reserve PID at publication if needed Jul 29, 2020
@coveralls
Copy link

coveralls commented Jul 29, 2020

Coverage Status

Coverage increased (+0.002%) to 19.547% when pulling 09a8401 on GlobalDataverseCommunityConsortium:IQSS/7146-reserve_at_publish_when_needed into 8ca934b on IQSS:develop.

throw new IllegalCommandException(BundleUtil.getStringFromBundle("publishDatasetCommand.pidNotReserved"), this);
//Backward compatibility and recovery from PID provider outage - if we should have reserved at creation time but didn't, try here as well to keep moving. If we still can't reserve (service down, connflict, etc.) then bail
try {
String returnString = idServiceBean.createIdentifier(this.getDataset());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going back to trying to register an unreserved DOI during publishing, it should probably happen in the other, asynchronous part of the workflow - in FinalizeDatasetPublicationCommand. Otherwise we can again end up in a situation where Datacite decides to hang for a while, and this is happening while the dataset is still unlocked, etc.
And in the finalize command we already have this code:

if ( theDataset.getGlobalIdCreateTime() == null ) {
   try {
      registerExternalIdentifier(theDataset, ctxt);
   }
   ...

but of course that registerExternalIdentifier method will need to be modified, because in its current form it would get a new pid assigned to the dataset, if the old one already exists on the Datacite end...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does DataCite hang/delay an http call (versus fail)? If so I can move this/integrate with registerExternalIdentifier to avoid waiting for the call to succeed/fail. It seems like going async and not being able to respond to the user when the first step fails (before file validation, file DOI registrations) except with the async notification isn't great either, but if there are cases where reserving the dataset DOI might induce people to click again, I guess it makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is a network call - so it can hang... I believe we have seen registration calls actually taking a long time and/or timing out, during the worst of their service troubles.
But yes, one nice thing about the current implementation is that we can exit right away with a clear message, before the dataset gets locked and the command goes async.
(we could alternatively try and leave it in the first part of the command - but make the call with a forced short timeout?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - updated registerExternalIdentifier and moved things to Finalize. As far as I can tell, the httpClient should be limiting the wait for a response to one minute (system default) but that could be lowered in the DataCIteRESTfullClient (easiest to set once for all calls since there's one shared client) and I think the whole call to registerExternalIdentifier could move to Publish then with a cut/paste.

}
theDataset.setGlobalIdCreateTime(getTimestamp());
theDataset.setIdentifierRegistered(true);
return;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was just to avoid the retry attempts - same set of calls otherwise.

@djbrooke djbrooke added this to the Dataverse 5 milestone Jul 30, 2020
}
try {
if (globalIdServiceBean.alreadyExists(theDataset)) {
int attempts = 0;

while (globalIdServiceBean.alreadyExists(theDataset) && attempts < FOOLPROOF_RETRIAL_ATTEMPTS_LIMIT) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With while, there's an extra call to alreadyExists in the first iteration, so switched to do while loop.

}

if (globalIdServiceBean.alreadyExists(theDataset)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar - extra call to alreadyExists since attempts will be over the limit if the while call above doesn't find alreadyExists()==false by the end.

if ( theDataset.getGlobalIdCreateTime() == null ) {
registerExternalIdentifier(theDataset, ctxt);
registerExternalIdentifier(theDataset, ctxt, idServiceBean.registerWhenPublished());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent of the larger discussion about the PR, if we keep the new boolean flag, it could always be set to false here - PID providers that do not register draft DOIs would then try once to create a DOI at publication time and fail if the DOI is already in use, versus retrying and causing all of the files to be 'lost' (at the path for the old/conflicted DOI instead of one found in a retry). Retries would then only occur in the create command (for providers that can reserve).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does make sense - maybe we should just hard-code it to false, when the method is called from the publication command. It's not exactly clear what the point would be, in keeping the current behavior for handles.
?

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made one last comment/question; but I'm still moving it into QA.

files don't disappear, but they are stored with a path reflecting the
initial PID so changing the PID means that can't be retrieved using any
newly created PID - the reason reserve at create time has been
implemented for DataCIte and is preferred overall
@kcondon kcondon self-assigned this Aug 3, 2020
IQSS/7146-reserve_at_publish_when_needed

Conflicts:
	src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java
@pdurbin
Copy link
Member

pdurbin commented Aug 3, 2020

Should this issue be tested as well? Hat tip to @mheppler who found it. Feels related.

@kcondon kcondon merged commit 2b6be39 into IQSS:develop Aug 3, 2020
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA 🔎✅ to Done 🚀 Aug 3, 2020
@qqmyers
Copy link
Member Author

qqmyers commented Aug 4, 2020

At a minimum the recent changes would make that issue obsolete. I think between the several PRs to reserve PIDs, add notification, etc. most/all of the recommendations there are implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Backward compatibility/error recovery for PID reservation during publish
6 participants