This repository has been archived by the owner on Mar 14, 2022. It is now read-only.

TACO & SDR3 Identifier schema

Christina Harlow edited this page Apr 6, 2018 · 3 revisions

DRUIDs are reserved for collections and objects and stay the same across all of their user-defined versions. This stability matters because of how collections and objects are tied to discovery, access, and publication.

Files and filesets, however, are not identified by a DRUID. Original filenames must be preserved for users (and are therefore a required field in the File resource MAP), so files can be named or identified differently for management purposes.

SDR3 Identifier Schema

externalIdentifier

  1. Collection (includes all aggregations, such as admin sets): ⇒ its own DRUID, kept across all versions (i.e., the DRUID from version 1)

  2. Object: ⇒ its own DRUID, kept across all versions (i.e., the DRUID from version 1)

  3. Fileset: ⇒ its own completely opaque identifier (UUID), kept across all versions (i.e., the UUID from version 1)

  4. File: ⇒ its own completely opaque identifier (UUID), kept across all versions (i.e., the UUID from version 1)
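The rule above (DRUIDs for collections and objects, opaque UUIDs for filesets and files, and the version 1 identifier reused afterwards) can be sketched as a single function. This is a minimal illustration, not TACO's implementation: `external_identifier_for` and the `mint_druid` callable are hypothetical names, and `mint_druid` stands in for whatever service actually mints DRUIDs.

```python
import uuid
from typing import Callable, Optional

# Resource types that carry a DRUID; filesets and files get opaque UUIDs instead.
DRUID_TYPES = {"Collection", "Object"}
UUID_TYPES = {"Fileset", "File"}


def external_identifier_for(
    resource_type: str,
    previous_external_id: Optional[str],
    mint_druid: Callable[[], str],
) -> str:
    """Return the externalIdentifier for a new version of a resource.

    `mint_druid` is a hypothetical stand-in for a DRUID minting service.
    """
    if resource_type not in DRUID_TYPES | UUID_TYPES:
        raise ValueError(f"unknown resource type: {resource_type}")
    # Any version after the first reuses the identifier assigned at version 1.
    if previous_external_id is not None:
        return previous_external_id
    # Version 1: collections and objects get a DRUID ...
    if resource_type in DRUID_TYPES:
        return mint_druid()
    # ... while filesets and files get a random, opaque UUID.
    return str(uuid.uuid4())


# Usage with a fake minter returning a hypothetical DRUID.
fake_minter = lambda: "druid:bc123df4567"
v1 = external_identifier_for("Object", None, fake_minter)
v2 = external_identifier_for("Object", v1, fake_minter)
print(v1 == v2)  # the object keeps its version 1 DRUID
```

Passing the previous identifier in explicitly keeps the function free of storage concerns; the caller decides whether this is version 1.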

TACO Internal Identifier Schema

tacoIdentifier, identification.identifier

All resources (Collection, Object, Fileset, File) get an internal UUID that is unique to every version. The metadata persistence layer is set up as follows to support the expected lookups:

  1. database primary id == internal UUID

  2. global secondary index with hash key identification.identifier (the DRUID or UUID) and range key version (an integer representing the version).

  3. global secondary index with hash key identification.identifier (the DRUID or UUID) and range key currentVersion (a boolean indicating whether this is the current version).

  4. secondary index on identification.sourceId for objects and collections, to ensure no new object or collection is created with a sourceId that is already in use.
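To make the four lookups concrete, here is a minimal in-memory sketch in which each dict plays the role of one key or index from the list above. It is an illustration under assumptions, not the actual persistence layer: the class and method names are hypothetical, and the currentVersion index is modelled as a direct map because only one version per identifier is current at a time.

```python
import uuid
from typing import Dict, Optional, Tuple


class MetadataStore:
    """Hypothetical in-memory stand-in for the primary key and indexes."""

    def __init__(self) -> None:
        # 1. primary id (internal UUID) -> record
        self.by_taco_id: Dict[str, dict] = {}
        # 2. (identification.identifier, version) -> internal UUID
        self.by_identifier_version: Dict[Tuple[str, int], str] = {}
        # 3. identification.identifier -> internal UUID of the current version
        self.current_by_identifier: Dict[str, str] = {}
        # 4. identification.sourceId -> identification.identifier
        self.identifier_by_source_id: Dict[str, str] = {}

    def save_version(self, identifier: str, source_id: Optional[str] = None) -> dict:
        """Store a new version of `identifier` under a fresh internal UUID."""
        if source_id is not None:
            owner = self.identifier_by_source_id.get(source_id)
            # A new version of the same resource may repeat its sourceId;
            # a different resource may not.
            if owner is not None and owner != identifier:
                raise ValueError(f"sourceId {source_id!r} already used by {owner}")
            self.identifier_by_source_id[source_id] = identifier

        version = 1
        previous_id = self.current_by_identifier.get(identifier)
        if previous_id is not None:
            previous = self.by_taco_id[previous_id]
            previous["currentVersion"] = False  # demote the old current version
            version = previous["version"] + 1

        taco_id = str(uuid.uuid4())  # unique for every version
        record = {
            "tacoIdentifier": taco_id,
            "identification": {"identifier": identifier, "sourceId": source_id},
            "version": version,
            "currentVersion": True,
        }
        self.by_taco_id[taco_id] = record
        self.by_identifier_version[(identifier, version)] = taco_id
        self.current_by_identifier[identifier] = taco_id
        return record

    def get_version(self, identifier: str, version: int) -> dict:
        return self.by_taco_id[self.by_identifier_version[(identifier, version)]]

    def get_current(self, identifier: str) -> dict:
        return self.by_taco_id[self.current_by_identifier[identifier]]


# Usage with a hypothetical DRUID and sourceId.
store = MetadataStore()
first = store.save_version("druid:bc123df4567", source_id="sul:example-1")
second = store.save_version("druid:bc123df4567", source_id="sul:example-1")
```

Both saves share one external identifier but receive distinct internal UUIDs, and only the second is marked current.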

Other Identifiers

  • dedupeIdentifier == a copy of the provided identification.sourceID, used for deduplication checks within TACO.

  • identification == a microschema of identification-related metadata, making it easier to share, limit, and assign ownership of that metadata outside of TACO but within SDR3.
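A rough sketch of how these identifiers might sit together on one stored version follows. The field set is an assumption: this page does not define the full microschema, and it spells the source field both `sourceId` and `sourceID`; the sketch assumes `sourceId`. The function names and example values are hypothetical.

```python
import copy
import uuid
from typing import Iterable


def build_record(identification: dict, version: int) -> dict:
    """Assemble one stored version carrying the identifiers described above.

    Assumes `identification` has at least `identifier` and `sourceId` keys.
    """
    return {
        # Internal identifier: a fresh UUID for every version.
        "tacoIdentifier": str(uuid.uuid4()),
        # Copy of the provided sourceId, kept for deduplication checks.
        "dedupeIdentifier": identification.get("sourceId"),
        # The shareable microschema, deep-copied so later edits to the input
        # dict do not leak into the stored record.
        "identification": copy.deepcopy(identification),
        "version": version,
    }


def is_duplicate(records: Iterable[dict], identification: dict) -> bool:
    """True if a different resource already claims this sourceId."""
    source_id = identification.get("sourceId")
    return source_id is not None and any(
        r["dedupeIdentifier"] == source_id
        and r["identification"].get("identifier") != identification.get("identifier")
        for r in records
    )


# Usage with hypothetical values.
existing = [build_record({"identifier": "druid:bc123df4567", "sourceId": "sul:example-1"}, 1)]
print(is_duplicate(existing, {"identifier": "druid:zz999zz9999", "sourceId": "sul:example-1"}))
```

Keeping dedupeIdentifier as a top-level copy lets the deduplication check run without reaching into the nested microschema.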