Short version:
Current implementation of designating an AlternativePersistentIdentifier for storage ignores the storageLocationDesignator boolean. It's only working in prod. because all of our existing alternative identifiers (handles) are being used for storage, AND we only have 1 of those per dataset.
TL;DR:
We have 10K+ legacy datasets with handle identifiers in prod. They have all had DOIs assigned, which are now their primary ids. The handles are preserved as AlternativePersistentIdentifiers. A mechanism was created to allow keeping the dataset files in their old directories/S3 pseudofolders named after the handle, by designating an AlternativePersistentIdentifier as a storage location, via a boolean in the table.
As part of cleaning up our prod. storage, I wanted to finally move all such files in the S3 storage buckets into pseudofolders uniformly named after the DOIs (that would make checking on files more straightforward etc.). But realized I can't: unsetting the storageLocationDesignator boolean in the table isn't going to work, because the code never consults it as it should. (And I'm sure we don't want to delete these AlternativePersistentIdentifiers completely - because we still want the handles to work; in case anyone has them bookmarked, etc.)
This is what we have in the code, in Dataset.java:
public String getIdentifierForFileStorage(){
    String retVal = getIdentifier();
    if (this.getAlternativePersistentIndentifiers() != null && !this.getAlternativePersistentIndentifiers().isEmpty()) {
        for (AlternativePersistentIdentifier api : this.getAlternativePersistentIndentifiers()) {
            retVal = api.getIdentifier();
        }
    }
    return retVal;
}
Similar code in getAuthorityForStorage(). All it needs is an if (api.isStorageLocationDesignator()) around the retVal = api.getIdentifier(); line. Without that check, the above always uses an alternative identifier for storage; note also that if a dataset has more than one, it will end up using whichever one the loop visits last (effectively at random). So this is only working for us because we don't have any datasets with more than one alternative id, and because we've been using every one of them for storage.
See the deprecated method getFileSystemDirectory() in that class for an example of the right logic.
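For illustration, here is a minimal, self-contained sketch of what the fix could look like. The two nested classes are simplified stand-ins, not the real entities: only getIdentifier(), isStorageLocationDesignator(), getAlternativePersistentIndentifiers() and getIdentifierForFileStorage() come from the code discussed above; the constructors, the wrapper class and the sample identifiers are made up for the example. Stopping at the first designated identifier is also an assumption; the actual fix may choose differently.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageIdentifierSketch {

    // Simplified stand-in for the real entity; only the fields needed here.
    static class AlternativePersistentIdentifier {
        private final String identifier;
        private final boolean storageLocationDesignator;

        AlternativePersistentIdentifier(String identifier, boolean storageLocationDesignator) {
            this.identifier = identifier;
            this.storageLocationDesignator = storageLocationDesignator;
        }

        String getIdentifier() {
            return identifier;
        }

        boolean isStorageLocationDesignator() {
            return storageLocationDesignator;
        }
    }

    // Simplified stand-in for Dataset; the method name spelling follows the code above.
    static class Dataset {
        private final String identifier;
        private final List<AlternativePersistentIdentifier> alternativePersistentIndentifiers = new ArrayList<>();

        Dataset(String identifier) {
            this.identifier = identifier;
        }

        String getIdentifier() {
            return identifier;
        }

        List<AlternativePersistentIdentifier> getAlternativePersistentIndentifiers() {
            return alternativePersistentIndentifiers;
        }

        String getIdentifierForFileStorage() {
            // Default to the primary identifier.
            String retVal = getIdentifier();
            if (getAlternativePersistentIndentifiers() != null) {
                for (AlternativePersistentIdentifier api : getAlternativePersistentIndentifiers()) {
                    // Only use an alternative id that is explicitly flagged for storage.
                    if (api.isStorageLocationDesignator()) {
                        retVal = api.getIdentifier();
                        break; // assumption: the first flagged identifier wins
                    }
                }
            }
            return retVal;
        }
    }

    public static void main(String[] args) {
        // Made-up identifiers, for demonstration only.
        Dataset notDesignated = new Dataset("FK2/ABCDEF");
        notDesignated.getAlternativePersistentIndentifiers()
                .add(new AlternativePersistentIdentifier("1902.1/12345", false));
        System.out.println(notDesignated.getIdentifierForFileStorage()); // prints FK2/ABCDEF

        Dataset designated = new Dataset("FK2/GHIJKL");
        designated.getAlternativePersistentIndentifiers()
                .add(new AlternativePersistentIdentifier("1902.1/67890", true));
        System.out.println(designated.getIdentifierForFileStorage()); // prints 1902.1/67890
    }
}
```

With this check in place, unsetting the boolean on a handle would make the dataset fall back to its primary identifier for storage, while the handle itself stays registered as an alternative id. The same guard would presumably apply in getAuthorityForStorage().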
We could investigate storing under the DB ID instead of the DOI (as that should not change).
This is a code-only change and won't require any changes as part of the release. No script is required to move files around; the files can stay where they are.