Use of AlternativePersistentIdentifiers for storage location not properly implemented #7925

landreev · 2021-06-06T15:46:25Z

Short version:
The current implementation of designating an AlternativePersistentIdentifier for storage ignores the storageLocationDesignator boolean. It only works in prod. because all of our existing alternative identifiers (handles) are used for storage, AND we only have one of those per dataset.

Long version:

We have 10K+ legacy datasets with handle identifiers in prod. All of them have since been assigned DOIs, which are now their primary ids. The handles are preserved as AlternativePersistentIdentifiers. A mechanism was created to allow keeping the dataset files in their old directories/S3 pseudo folders named after the handle, by designating an AlternativePersistentIdentifier as the storage location via a boolean in the table.

As part of cleaning up our prod. storage, I wanted to finally move all such files in the S3 storage buckets into pseudo folders uniformly named after the DOIs (that would make checking on files more straightforward, etc.). But I realized I can't: unsetting the storageLocationDesignator boolean in the table isn't going to work, because the code never consults it as it should. (And I'm sure we don't want to delete these AlternativePersistentIdentifiers completely, because we still want the handles to work in case anyone has them bookmarked, etc.)

This is what we have in the code, in Dataset.java:

public String getIdentifierForFileStorage() {
    String retVal = getIdentifier();
    if (this.getAlternativePersistentIndentifiers() != null && !this.getAlternativePersistentIndentifiers().isEmpty()) {
        for (AlternativePersistentIdentifier api : this.getAlternativePersistentIndentifiers()) {
            retVal = api.getIdentifier();
        }
    }
    return retVal;
}

There is similar code in getAuthorityForStorage(). All it needs is an if (api.isStorageLocationDesignator()) check around retVal = api.getIdentifier();. Without it, the method above always uses an alternative identifier for storage whenever one exists; and if a dataset has more than one, it ends up using whichever comes last in the list, which is effectively random. So this only works for us because we don't have any datasets with more than one alternative id, and because we've been using every one of them for storage.
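A minimal sketch of the fix described above, assuming hypothetical stand-in classes (AltPid, DatasetSketch) in place of the real Dataverse entities, which carry many more fields. It contrasts the current loop with a lookup that consults isStorageLocationDesignator(). One deliberate choice beyond the one-line fix: it returns the first flagged PID and shares a single helper between the identifier and authority lookups, so the two values can never come from different alternative PIDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down stand-in for AlternativePersistentIdentifier:
// only the fields this lookup needs are modeled.
class AltPid {
    private final String authority;
    private final String identifier;
    private final boolean storageLocationDesignator;

    AltPid(String authority, String identifier, boolean storageLocationDesignator) {
        this.authority = authority;
        this.identifier = identifier;
        this.storageLocationDesignator = storageLocationDesignator;
    }

    String getAuthority() { return authority; }
    String getIdentifier() { return identifier; }
    boolean isStorageLocationDesignator() { return storageLocationDesignator; }
}

// Hypothetical stand-in for Dataset: the primary PID (e.g. the DOI)
// plus any alternative PIDs (e.g. legacy handles).
class DatasetSketch {
    private final String authority;
    private final String identifier;
    private final List<AltPid> alternativePids = new ArrayList<>();

    DatasetSketch(String authority, String identifier) {
        this.authority = authority;
        this.identifier = identifier;
    }

    List<AltPid> getAlternativePids() { return alternativePids; }

    // Current behavior as quoted in the issue: the flag is ignored and retVal
    // is overwritten on every pass, so the last alternative PID always wins.
    String getIdentifierForFileStorageBuggy() {
        String retVal = identifier;
        for (AltPid api : alternativePids) {
            retVal = api.getIdentifier();
        }
        return retVal;
    }

    // Returns the first alternative PID flagged as the storage location,
    // or null if none is flagged.
    private AltPid storageLocationPid() {
        for (AltPid api : alternativePids) {
            if (api.isStorageLocationDesignator()) {
                return api;
            }
        }
        return null;
    }

    // Corrected lookup: use the primary identifier unless an alternative
    // PID is explicitly designated for storage.
    String getIdentifierForFileStorage() {
        AltPid api = storageLocationPid();
        return api != null ? api.getIdentifier() : identifier;
    }

    // Same rule for the authority, via the same helper, so authority and
    // identifier always come from one PID.
    String getAuthorityForStorage() {
        AltPid api = storageLocationPid();
        return api != null ? api.getAuthority() : authority;
    }
}
```

With the flag cleared on a legacy handle, the corrected lookup falls back to the DOI while the handle itself stays in the table, which is what moving files into DOI-named pseudo folders requires.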

See the deprecated method getFileSystemDirectory() in that class for an example of the right logic.


djbrooke · 2022-01-12T19:33:36Z

We could investigate storing under the DB ID instead of the DOI (as that should not change).
This is a code-only change and won't require any changes as part of the release. No script is required to move files around; the files can stay where they are.
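To make the DB ID suggestion concrete, here is a hypothetical comparison of two ways to derive a dataset's storage folder. Neither method is Dataverse API; the "authority/identifier" layout and the "dataset-<id>" naming are assumptions made purely for illustration.

```java
// Hypothetical helpers contrasting PID-keyed and DB-ID-keyed storage folders.
final class StorageFolderSketch {
    private StorageFolderSketch() {}

    // Folder keyed on a persistent identifier: it changes whenever the PID
    // used for storage changes (e.g. a handle superseded by a DOI).
    static String folderForPid(String authority, String identifier) {
        return authority + "/" + identifier;
    }

    // Folder keyed on the database id: stable for the dataset's lifetime,
    // regardless of which PIDs are attached to it.
    static String folderForDatabaseId(long datasetId) {
        return "dataset-" + datasetId;
    }
}
```

Under the DB ID scheme, a dataset whose primary PID moved from a handle to a DOI would keep the same folder. The trade-off is that a storage path alone no longer tells you which dataset it belongs to without a database lookup.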


PaulBoon mentioned this issue Sep 22, 2021

Thumbnails aren't created during s3/direct upload #7749 (Open)

djbrooke added this to Up Next 🛎 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 12, 2022

djbrooke added the Small label Jan 12, 2022

landreev moved this from Up Next 🛎 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 19, 2022

landreev self-assigned this Jan 19, 2022

landreev added a commit that referenced this issue Jan 19, 2022

9517d0a: Added code to check the isStorageLocationDesignator boolean, when locating file directories based on Alternative Persistent Identifiers (#7925)

landreev mentioned this issue Jan 19, 2022

Fix for the bug in the implementation of storage dirs based on Alternative Persistent Identifiers #8354 (Merged)

landreev removed this from IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jan 19, 2022

kcondon closed this as completed in #8354 Jan 20, 2022
