Use of AlternativePersistentIdentifiers for storage location not properly implemented #7925

Closed
landreev opened this issue Jun 6, 2021 · 1 comment · Fixed by #8354
landreev commented Jun 6, 2021

Short version:
The current implementation of designating an AlternativePersistentIdentifier for storage ignores the storageLocationDesignator boolean. It only works in production because all of our existing alternative identifiers (handles) are being used for storage, and we have only one of them per dataset.

Long version:

We have 10K+ legacy datasets with handle identifiers in production. They have all had DOIs assigned, which are now their primary IDs. The handles are preserved as AlternativePersistentIdentifiers. A mechanism was created to allow keeping the dataset files in their old directories/S3 pseudofolders named after the handle, by designating an AlternativePersistentIdentifier as the storage location via a boolean in the table.

As part of cleaning up our production storage, I wanted to finally move all such files in the S3 storage buckets into pseudofolders uniformly named after the DOIs (which would make checking on files more straightforward, etc.). But I realized I can't: unsetting the storageLocationDesignator boolean in the table isn't going to work, because the code never consults it as it should. (And I'm sure we don't want to delete these AlternativePersistentIdentifiers completely, because we still want the handles to work, in case anyone has them bookmarked, etc.)

This is what we have in the code, in Dataset.java:

public String getIdentifierForFileStorage() {
   String retVal = getIdentifier();
   if (this.getAlternativePersistentIndentifiers() != null && !this.getAlternativePersistentIndentifiers().isEmpty()) {
      for (AlternativePersistentIdentifier api : this.getAlternativePersistentIndentifiers()) {
         retVal = api.getIdentifier();
      }
   }
   return retVal;
}

There is similar code in getAuthorityForStorage(). All it needs is an if (api.isStorageLocationDesignator()) around the retVal = api.getIdentifier();. Without that check, the code above always uses an alternative identifier for storage whenever one exists; note also that if a dataset has more than one, it will end up using whichever one comes last in the list (effectively at random). So this is only working for us because we don't have any datasets with more than one alternative ID, and because we've been using every one of them for storage.

See the deprecated method getFileSystemDirectory() in that class for an example of the right logic.
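
To make the proposed check concrete, here is a minimal, self-contained sketch of the guarded lookup. It is not the actual patch: the two classes below are hypothetical, stripped-down stand-ins for the real entities, and the constructors, the authority fields, and getAuthority() on the alternative identifier are assumptions made for illustration. Only the method names quoted in this issue are taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real entity; only what the sketch needs.
class AlternativePersistentIdentifier {
    private final String authority;   // assumed field, for illustration
    private final String identifier;
    private final boolean storageLocationDesignator;

    AlternativePersistentIdentifier(String authority, String identifier,
                                    boolean storageLocationDesignator) {
        this.authority = authority;
        this.identifier = identifier;
        this.storageLocationDesignator = storageLocationDesignator;
    }

    String getAuthority() { return authority; }
    String getIdentifier() { return identifier; }
    boolean isStorageLocationDesignator() { return storageLocationDesignator; }
}

// Hypothetical stand-in for Dataset, holding the primary (DOI) id.
class Dataset {
    private final String authority;
    private final String identifier;
    private final List<AlternativePersistentIdentifier> alternativePersistentIndentifiers =
            new ArrayList<>();

    Dataset(String authority, String identifier) {
        this.authority = authority;
        this.identifier = identifier;
    }

    String getAuthority() { return authority; }
    String getIdentifier() { return identifier; }

    List<AlternativePersistentIdentifier> getAlternativePersistentIndentifiers() {
        return alternativePersistentIndentifiers;
    }

    // Falls back to the primary identifier unless an alternative
    // identifier is explicitly flagged as the storage location.
    public String getIdentifierForFileStorage() {
        String retVal = getIdentifier();
        for (AlternativePersistentIdentifier api : getAlternativePersistentIndentifiers()) {
            if (api.isStorageLocationDesignator()) {
                retVal = api.getIdentifier();
            }
        }
        return retVal;
    }

    // Same guard applied to the authority part of the storage path.
    public String getAuthorityForStorage() {
        String retVal = getAuthority();
        for (AlternativePersistentIdentifier api : getAlternativePersistentIndentifiers()) {
            if (api.isStorageLocationDesignator()) {
                retVal = api.getAuthority();
            }
        }
        return retVal;
    }
}
```

With this guard, a handle whose flag is unset no longer affects the storage path, so the flag can be cleared without deleting the handle itself. The sketch assumes at most one alternative identifier per dataset is flagged; if several were flagged, the last one in the list would still win.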

djbrooke commented Jan 12, 2022

  • We could investigate storing under DB ID instead of DOI (as that should not change)
  • This is a code-only change and won't require any changes as part of the release. No script required to move files around - the files can stay where they are

@djbrooke djbrooke added the Small label Jan 12, 2022
@landreev landreev moved this from Up Next 🛎 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 19, 2022
@landreev landreev self-assigned this Jan 19, 2022
landreev added a commit that referenced this issue Jan 19, 2022
…ating file directories based on Alternative Persistent Identifiers (#7925)