This repository has been archived by the owner on Mar 14, 2022. It is now read-only.

TACO Internal Processing Steps

Christina Harlow edited this page Apr 2, 2018 · 20 revisions

Nota bene: The higher-level processing model guiding this analysis is Object-focused (i.e., a client makes low-level calls to the TACO API by processing the Object first, then the File or Collection).

THIS IS ACTIVELY BEING REDESIGNED AND SHOULD NOT BE CONSIDERED VALID

Taken from the original work in the Prototype Processing Framework.

Questions Answered by this Documentation

  1. Clarify what processing steps happen when + where within TACO Code
    1. What other code (using the TACO codebase packages framing) is recommended for each step.
    2. What other data specs are required for each step.
  2. What data serialization is required for each step: Recommendation is to use map[string]interface{}.
  3. Can we separate out Swagger (+ generated code), MAPs, and Example updates to make PRs / shifts smaller?
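The recommendation to use map[string]interface{} means nested fields such as Identification.SourceId have to be read with type assertions ("casting"). The following is a minimal sketch of that pattern, not TACO's actual code: the helper name nestedString, the lowercase key names, and the sample payload are all assumptions made for illustration.

```go
package main

import "fmt"

// nestedString walks a generic resource along the given keys and returns the
// string found at the end of the path. ok is false if any key is missing or
// if a value along the way has an unexpected type.
// (Hypothetical helper; not part of the TACO codebase.)
func nestedString(resource map[string]interface{}, path ...string) (string, bool) {
	if len(path) == 0 {
		return "", false
	}
	current := resource
	// Every key except the last must hold a nested map.
	for _, key := range path[:len(path)-1] {
		next, ok := current[key].(map[string]interface{})
		if !ok {
			return "", false
		}
		current = next
	}
	// The last key must hold a string.
	value, ok := current[path[len(path)-1]].(string)
	return value, ok
}

func main() {
	// Hypothetical payload; the key names are assumed, not taken from the MAPs.
	resource := map[string]interface{}{
		"@type": "http://example.org/types/Object",
		"identification": map[string]interface{}{
			"sourceId": "bib12345",
		},
	}
	sourceID, ok := nestedString(resource, "identification", "sourceId")
	fmt.Println(sourceID, ok)
}
```

Using the two-value form of the type assertion avoids a panic when a client sends a payload whose shape differs from what the handler expects.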

—————————————————

Deposit Resource

  • Input: operations.DepositResourceParams (struct) > Params
  • Request handling: save Params data to variables:
    • Save Params.Payload (map[string]interface{}) to Resource variable
    • Save params.HTTPRequest.Header["On-Behalf-Of"] (string) to Agent variable
    • Save resource["@type"] value (URI string) to Resourcetype (string, controlled value) variable
    • If any of the above are empty or an unrecognized value (for Resourcetype), return data error to Client.
  • Permissions check:
    • Pass Agent, Resourcetype, Deposit to Permissions Service via HTTP call
      • If response from Permissions Service is True, proceed
      • If response from Permissions Service is False, return a permissions error response to User (“You are not allowed etc.”)
  • Validation Checks: Based on Resourcetype value, map the Resource to the appropriate deposit validation call
    • Input: Resource (map[string]interface{}) & Resourcetype (string)
    • Encode / Marshal Resource (map[string]interface{}) to a JSON String
    • Run JSON Schema Validation on the Resource JSON string against its maps/DepositType.json (knows which to validate against due to Resourcetype)
      • If invalid, return data error to API caller (e.g. “Missing %v”, field)
      • If valid, proceed (below)
    • Run any other type-dependent validation checks
      • If Resourcetype is Collection or Object:
        • Check Resource (map[string]interface{}) for Identification.SourceId (string: map > string: string, will require casting)
          • If Resource.Identification.SourceId exists, retrieve & save to sourceId (string):
            • Call to Dynamo’s SourceId Secondary Index to check that sourceId value does not already exist
              • If it already exists, return error to API caller (i.e. “Item is not unique; source ID already exists.”)
              • If it does not, proceed
          • If Identification.SourceId is not in Resource, proceed
      • If Resourcetype == Collection:
        • Check if Resource for a Collection has Structural.HasMember (array of string, druid + version)
          • If so, check that all members are either type Collection or type Object
          • If not, proceed
      • If Resourcetype == Object:
        • Check if Resource for an Object has Structural.HasMember (array of string, druid + version)
          • If so, check that all members have @type == Object URI
          • If not, proceed
        • Check if Resource for an Object has Structural.Contains (array of string, UUID + version)
          • If so, check that all contained resources have @type == Fileset URI
          • If not, proceed
      • If Resourcetype == Fileset:
        • Check if Resource for a Fileset has Structural.ContainedBy (array of string, druid + version)
          • If so, check that all parents (ContainedBys) have @type == Object URI
          • If not, return structural data error (e.g. "You are creating a Fileset without a representative Object")
      • If Resourcetype == File:
        • Check if Resource for a File has Structural.ContainedBy (array of string, UUID + version)
          • If so, check that all parents (ContainedBys) have @type == Fileset URI
          • If not, return structural data error (e.g. "You are creating a File without a container Fileset")
  • Identifiers Minting: Request a new ID(s) for Resource based on Resourcetype
    • If ResourceType == Collection or Object, request a DRUID (SDR Identifier)
      • If DRUID minted, assert the DRUID to Resource at Identification.Identifier (map, string) (requires casting)
      • If DRUID cannot be minted, error out / return a “process cannot be completed at this time” error to client.
    • If ResourceType == Fileset or File, generate a UUID (SDR Identifier)
      • If UUID generated, assert the UUID to Resource at Identification.Identifier (map, string) (requires casting)
      • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
    • For all resourceType values, generate a secondary UUID (TACO Internal Identifier)
      • If UUID can be generated, assert the UUID to Resource at Id (string)
      • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
  • System Data Munging:
    • Add Version information
      • On Resource, assert version (top level field) == 1 (integer).
      • On Resource, assert currentVersion (top level) == True (boolean).
    • Add System Metadata
      • On Resource, assert depositor (top level field, Agent object) using Agent value (open policy question here).
  • Persist Resource Metadata:
    • Write Resource (map[string]interface{}) to DynamoDB.
  • Notify Processing Stream:
    • Send Message to Kinesis Router Stream with TACO Internal Identifier (Resource.Id [string, UUID]), Resourcetype, & Action == Deposit (this may and probably will continue to evolve).
  • Return to client the SDR Identifier (Resource.identification.identifier [string, DRUID]).
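The type-dependent structural checks in the validation step above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the TACO implementation: the type URIs, the lowercase key names (structural, hasMember, contains, containedBy), the typeLookup callback standing in for a DynamoDB query, and the choice to treat an unresolvable reference as an error are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical type URIs; the real values would come from the TACO data model.
const (
	typeCollection = "http://example.org/types/Collection"
	typeObject     = "http://example.org/types/Object"
	typeFileset    = "http://example.org/types/Fileset"
	typeFile       = "http://example.org/types/File"
)

// typeLookup resolves a referenced identifier to its @type URI.
// Assumed stand-in for a query against the metadata store.
type typeLookup func(id string) (string, bool)

// stringSlice reads m[key] as a list of identifiers, as decoded from JSON.
func stringSlice(m map[string]interface{}, key string) []string {
	raw, _ := m[key].([]interface{})
	out := make([]string, 0, len(raw))
	for _, item := range raw {
		if s, ok := item.(string); ok {
			out = append(out, s)
		}
	}
	return out
}

// checkRefs verifies that every identifier under structural[key] resolves to
// one of the allowed types. When required is true, an empty list is an error.
func checkRefs(structural map[string]interface{}, key string, required bool, lookup typeLookup, allowed ...string) error {
	ids := stringSlice(structural, key)
	if required && len(ids) == 0 {
		return fmt.Errorf("structural.%s is required", key)
	}
	for _, id := range ids {
		got, found := lookup(id)
		if !found {
			return fmt.Errorf("%s: referenced resource %q not found", key, id)
		}
		allowedType := false
		for _, want := range allowed {
			if got == want {
				allowedType = true
				break
			}
		}
		if !allowedType {
			return fmt.Errorf("%s: %q has disallowed type %s", key, id, got)
		}
	}
	return nil
}

// validateStructure applies the per-type rules listed in the outline.
func validateStructure(resourceType string, resource map[string]interface{}, lookup typeLookup) error {
	structural, _ := resource["structural"].(map[string]interface{})
	switch resourceType {
	case typeCollection:
		return checkRefs(structural, "hasMember", false, lookup, typeCollection, typeObject)
	case typeObject:
		if err := checkRefs(structural, "hasMember", false, lookup, typeObject); err != nil {
			return err
		}
		return checkRefs(structural, "contains", false, lookup, typeFileset)
	case typeFileset:
		return checkRefs(structural, "containedBy", true, lookup, typeObject)
	case typeFile:
		return checkRefs(structural, "containedBy", true, lookup, typeFileset)
	}
	return errors.New("unrecognized resource type")
}

func main() {
	known := map[string]string{"druid:obj1": typeObject}
	lookup := func(id string) (string, bool) {
		t, ok := known[id]
		return t, ok
	}
	fileset := map[string]interface{}{
		"structural": map[string]interface{}{
			"containedBy": []interface{}{"druid:obj1"},
		},
	}
	fmt.Println(validateStructure(typeFileset, fileset, lookup))
	fmt.Println(validateStructure(typeFileset, map[string]interface{}{}, lookup))
}
```

Keeping the rules in one switch makes the asymmetry visible: Collection and Object references are optional, while a Fileset or File without a parent is rejected.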

—————————————————

Deposit File

  • Input: operations.DepositFileParams (struct) > Params
  • Request handling: save Params data to variables:
    • Save Params.Upload (runtime.File) to Binary variable
    • Save Params.Filename (runtime.Filename) to Filename variable
    • Save Params.FilesetID (string) to FilesetID variable
    • Save Params.MIMEType value (string, MIME Type) to FileMIMEType variable
    • Always save the value File (string) to the ResourceType variable
    • Save params.HTTPRequest.Header["On-Behalf-Of"] (string) to Agent variable
    • If any of the above are empty or an unrecognized value, return data error to Client.
  • Permissions check:
    • Pass Agent, Resourcetype, Deposit, Context (FilesetID) to Permissions Service via HTTP call
      • If response from Permissions Service is True, proceed
      • If response from Permissions Service is False, return a permissions error response to User (“You are not allowed etc.”)
  • Build Minimal File (metadata) Resource:
    • Create a Resource (map[string]interface{}) instance
    • On Resource:
      • Assert Filename on Resource as label
      • Assert MIMEType on Resource as hasMimeType
      • Assert Resourcetype on Resource as @type (need to generate File type URI)
      • Assert FilesetID on Resource as structural.isContainedBy (map, string)
      • Assert File processing defaults:
        • Access.Access set to Dark
        • Access.Download set to Dark
        • Admin.Preserve set to False
  • Identifiers Minting: Request a new ID(s) for Resource based on Resourcetype
    • Generate a UUID (SDR Identifier) & Assert UUID on Resource at Identification.Identifier (map, string)
      • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
    • Generate a secondary UUID (TACO Internal Identifier) & Assert the UUID to Resource at Id (string)
      • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
  • Persist File Binary:
    • Save Binary to the persistence file store
    • Assert the returned filestore address on Resource as file-location
  • System Data Munging:
    • Add Version information
      • On Resource, assert version (top level field) == 1 (integer).
      • On Resource, assert currentVersion (top level) == True (boolean).
    • Add System Metadata
      • On Resource, assert depositor (top level field, Agent object) using Agent value (open policy question here).
  • Persist File Resource Metadata:
    • Write Resource (map[string]interface{}) to DynamoDB.
  • Notify Processing Stream:
    • Send Message to Kinesis Router Stream with TACO Internal Identifier (Resource.Id [string, UUID]), File, & Action == Deposit (this may and probably will continue to evolve).
  • Return to client the SDR Identifier (Resource.identification.identifier [string, UUID]).
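The "Build Minimal File Resource", "Identifiers Minting", and version steps above might look something like the sketch below. It is only an illustration: the File type URI, the key names, and the lowercase "dark" values are assumptions, and the UUIDs are generated locally with crypto/rand as random (version 4) UUIDs, which may differ from how TACO mints identifiers. The file-location and depositor fields are omitted because they depend on the binary store and on an open policy question.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// Hypothetical File type URI; the outline notes this still needs to be generated.
const fileTypeURI = "http://example.org/types/File"

// newUUID returns a random (version 4) UUID string using only the standard library.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// buildFileResource assembles the minimal metadata record for a deposited file.
// Key names are assumed for illustration.
func buildFileResource(filename, mimeType, filesetID string) (map[string]interface{}, error) {
	sdrID, err := newUUID() // SDR Identifier
	if err != nil {
		return nil, err
	}
	internalID, err := newUUID() // TACO Internal Identifier
	if err != nil {
		return nil, err
	}
	return map[string]interface{}{
		"id":             internalID,
		"@type":          fileTypeURI,
		"label":          filename,
		"hasMimeType":    mimeType,
		"identification": map[string]interface{}{"identifier": sdrID},
		"structural":     map[string]interface{}{"isContainedBy": filesetID},
		// File processing defaults from the outline.
		"access":         map[string]interface{}{"access": "dark", "download": "dark"},
		"admin":          map[string]interface{}{"preserve": false},
		"version":        1,
		"currentVersion": true,
	}, nil
}

func main() {
	resource, err := buildFileResource("report.pdf", "application/pdf", "fileset-123")
	if err != nil {
		fmt.Println("cannot complete at this time:", err)
		return
	}
	fmt.Println(resource["label"], resource["version"], resource["currentVersion"])
}
```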

—————————————————

Retrieve Resource

  • Input: operations.RetrieveResourceParams (struct) > Params
  • Request handling: save Params data to variables:
    • params.ID to variable requestedID (string): provided resource identifier to retrieve.
    • params.Version to variable requestedVersion (integer | null): provided version of resource to retrieve.
    • params.HTTPRequest.Header["On-Behalf-Of"] (string) to Agent variable.
  • Query Resource (ID, Version) from Metadata Persistence Layer:
    • If Version exists / was provided:
      • Query DynamoDB resources table for id == ID, version == Version
      • if no records are returned from Dynamo:
        • return a Resource Not Found error to client
      • if a record is found from Dynamo:
        • Save response map[string]*dynamoAttribute to GenericResource map (i.e. map[string]interface{})
        • Return GenericResource to Resource
    • If Version is null / was not provided:
      • Query DynamoDB resources table for id == ID && currentVersion == true
      • if no records are returned from Dynamo:
        • return a Resource Not Found error to client
      • if a record is found from Dynamo:
        • Save response map[string]*dynamoAttribute to GenericResource map (i.e. map[string]interface{})
        • Return GenericResource to Resource
  • Permissions check:
    • Pass Agent, Resourcetype, Retrieve, requestedID to Permissions Service via HTTP call
      • If response from Permissions Service is True, proceed
      • If response from Permissions Service is False, return a permissions error response to User (“You are not allowed etc.”)
  • Save Resource (map[string]interface{}) as instance of models.GenericResource
  • Return to client Resource as payload through operations.NewRetrieveResourceOK().WithPayload(response)
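The two query shapes in the retrieval step (a specific version versus the current version) can be illustrated with an in-memory stand-in for the DynamoDB resources table. This is a sketch only: findResource and the record layout are hypothetical, and a real implementation would issue DynamoDB queries rather than scan a slice.

```go
package main

import "fmt"

// findResource mimics the lookup described in the outline. When version is
// non-nil it matches id and version; otherwise it matches id and
// currentVersion == true. found is false when no record matches, which maps
// to the Resource Not Found error.
// (Hypothetical helper operating on an in-memory table.)
func findResource(table []map[string]interface{}, id string, version *int) (map[string]interface{}, bool) {
	for _, rec := range table {
		if rec["id"] != id {
			continue
		}
		if version != nil {
			if rec["version"] == *version {
				return rec, true
			}
			continue
		}
		if rec["currentVersion"] == true {
			return rec, true
		}
	}
	return nil, false
}

func main() {
	table := []map[string]interface{}{
		{"id": "abc", "version": 1, "currentVersion": false},
		{"id": "abc", "version": 2, "currentVersion": true},
	}
	requested := 1
	old, _ := findResource(table, "abc", &requested)
	current, _ := findResource(table, "abc", nil)
	_, found := findResource(table, "missing", nil)
	fmt.Println(old["version"], current["version"], found)
}
```

Passing the version as a pointer mirrors the "integer | null" parameter: nil means the client did not ask for a particular version.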

—————————————————

Update Resource

  • Input: operations.DepositResourceParams (struct) > Params
  • Request handling: save Params data to variables:
    • params.ID to variable requestedID (string): provided resource identifier to update.
    • params.Payload (map[string]interface{}) to NewResource variable
    • params.HTTPRequest.Header["On-Behalf-Of"] (string) to Agent variable
    • resource["@type"] value (URI string) to Resourcetype (string, controlled value) variable
    • If any of the above are empty or an unrecognized value (for Resourcetype), return data error to Client.
  • Query Resource (ID, Version) from Metadata Persistence Layer:
    • Query DynamoDB resources table for id == ID && currentVersion == true
    • If no records are returned from Dynamo:
      • Return a Resource Not Found error to client
    • if a record is found from Dynamo:
      • Save response map[string]*dynamoAttribute to GenericResource map (i.e. map[string]interface{})
      • Return GenericResource to ExistingResource
  • Permissions check:
    • Pass Agent, Resourcetype, Update, requestedID to Permissions Service via HTTP call
      • If response from Permissions Service is True, proceed
      • If response from Permissions Service is False, return a permissions error response to User (“You are not allowed etc.”)
  • Validation Checks: Based on Resourcetype value, map the NewResource to the appropriate validation call
    • Input: NewResource (map[string]interface{}) & Resourcetype (string)
    • Encode / Marshal NewResource (map[string]interface{}) to a JSON String
    • Run JSON Schema Validation on the NewResource JSON string against its maps/[ResourceType].json
      • If invalid, return data error to API caller (e.g. “Missing %v”, field)
      • If valid, proceed (below)
    • Run any other type-dependent validation checks
      • If Resourcetype is Collection or Object:
        • Check NewResource (map[string]interface{}) for a new or changed Identification.SourceId (string: map > string: string, will require casting)
          • If NewResource.Identification.SourceId exists and is changed, retrieve & save to sourceId (string):
            • Call to Dynamo’s SourceId Secondary Index to check that sourceId value does not already exist (except on ExistingResource)
              • If it already exists, return error to API caller (i.e. “Item is not unique; source ID already exists.”)
              • If it does not, proceed
          • If Identification.SourceId is not in NewResource or has not changed from ExistingResource, proceed
      • If Resourcetype == Collection:
        • Check if NewResource for a Collection has new or changed Structural.HasMember (array of string, druid + version)
          • If so, check that all members are either type Collection or type Object
          • If not (or it hasn't changed from ExistingResource), proceed
      • If Resourcetype == Object:
        • Check if Resource for an Object has new or changed Structural.HasMember (array of string, druid + version)
          • If so, check that all members have @type == Object URI
          • If not (or it hasn't changed from ExistingResource), proceed
        • Check if Resource for an Object has new or changed Structural.Contains (array of string, UUID + version)
          • If so, check that all contained resources have @type == Fileset URI
          • If not (or it hasn't changed from ExistingResource), proceed
      • If Resourcetype == Fileset:
        • Check if Resource for a Fileset has new or changed Structural.ContainedBy (array of string, druid + version)
          • If so, check that all parents (ContainedBys) have @type == Object URI
          • If not, return structural data error (e.g. "You are creating a Fileset without a representative Object")
          • If it hasn't changed from ExistingResource, proceed
      • If Resourcetype == File:
        • Check if Resource for a File has new or changed Structural.ContainedBy (array of string, UUID + version)
          • If so, check that all parents (ContainedBys) have @type == Fileset URI
          • If not, return structural data error (e.g. "You are creating a File without a container Fileset")
          • If it hasn't changed from ExistingResource, proceed
  • System Data Munging:
    • Handle Version changes
      • If NewResource.version (integer) is same as ExistingResource.version (integer):
        • Proceed with overlaying NewResource on ExistingResource
          • Add System Metadata:
            • On merged NewResource, assert Administrative.remediatedBy (Agent object) using Agent value.
          • Persist Resource Metadata:
            • Write merged NewResource (map[string]interface{}) to DynamoDB.
          • Notify Processing Stream:
            • Send Message to Kinesis Router Stream with TACO Internal Identifier (NewResource.Id [string, UUID]), Resourcetype, & Action == Update (this may and probably will continue to evolve).
      • If NewResource.version (integer) differs from ExistingResource.version (integer):
        • Proceed with overlaying NewResource on ExistingResource
        • Create new DynamoDB record with merged NewResource:
          • Identifier Minting: Request a new TACO Internal ID for merged NewResource
            • Generate a secondary UUID (TACO Internal Identifier) & overwrite the UUID to Resource at Id (string)
              • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
          • Update Version Metadata on NewResource:
            • On merged NewResource, assert version (top level field) == NewResource.version + 1 (integer).
            • On merged NewResource, assert currentVersion (top level) == True (boolean).
            • On merged NewResource, assert precedingVersion (top level) == ExistingResource.id (string, UUID).
          • Add System Metadata:
            • On merged NewResource, assert Administrative.remediatedBy (Agent object) using Agent value.
          • Persist Resource Metadata:
            • Write merged NewResource (map[string]interface{}) to DynamoDB.
          • Notify Processing Stream:
            • Send Message to Kinesis Router Stream with TACO Internal Identifier (NewResource.Id [string, UUID]), Resourcetype, & Action == NewVersion (this may and probably will continue to evolve).
        • Update existing DynamoDB record for ExistingResource:
          • Update Version Metadata on ExistingResource:
            • On ExistingResource, assert currentVersion (top level) == False (boolean).
            • On ExistingResource, assert followingVersion (top level) == merged NewResource.id (string, UUID).
          • Update System Metadata:
            • On ExistingResource, assert Administrative.remediatedBy (Agent object) using Agent value.
            • On ExistingResource, assert Access.access, Access.download == Dark (need to confirm).
            • On ExistingResource, assert Permissions to admin only editing (need to confirm).
          • Persist Resource Metadata:
            • Write updated ExistingResource (map[string]interface{}) to DynamoDB.
          • Notify Processing Stream:
            • Send Message to Kinesis Router Stream with TACO Internal Identifier (ExistingResource.Id [string, UUID]), Resourcetype, & Action == OldVersion (this may and probably will continue to evolve).
      • Return to client the SDR Identifier (NewResource.identification.identifier [string, DRUID], should not have changed).
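The outline does not define what "overlaying NewResource on ExistingResource" means precisely. One plausible reading, sketched below, is a recursive merge in which values from NewResource win and nested maps are merged key by key. The function names overlay and needsNewVersion are hypothetical, and the version comparison assumes both version values are stored as the same Go type (JSON decoding would normally yield float64, so a real handler would need to normalize first).

```go
package main

import "fmt"

// overlay returns a new map containing every field of existing with the
// fields of incoming laid on top. Nested maps present on both sides are
// merged recursively; any other incoming value replaces the existing one.
// Neither input is modified, but untouched nested values are shared with
// existing rather than deep-copied.
func overlay(existing, incoming map[string]interface{}) map[string]interface{} {
	merged := make(map[string]interface{}, len(existing))
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range incoming {
		inMap, inOK := v.(map[string]interface{})
		exMap, exOK := merged[k].(map[string]interface{})
		if inOK && exOK {
			merged[k] = overlay(exMap, inMap)
			continue
		}
		merged[k] = v
	}
	return merged
}

// needsNewVersion reports whether the incoming resource carries a different
// version from the stored one, which selects the "new record" branch above.
func needsNewVersion(existing, incoming map[string]interface{}) bool {
	return existing["version"] != incoming["version"]
}

func main() {
	existing := map[string]interface{}{
		"label":   "Old title",
		"version": 1,
		"access":  map[string]interface{}{"access": "dark", "download": "dark"},
	}
	incoming := map[string]interface{}{
		"label":   "New title",
		"version": 1,
		"access":  map[string]interface{}{"access": "world"},
	}
	merged := overlay(existing, incoming)
	fmt.Println(merged["label"], merged["access"], needsNewVersion(existing, incoming))
}
```

A recursive merge keeps fields the client did not send (here, access.download), whereas a plain top-level replacement would silently drop them.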

—————————————————

Update File (WIP)

  • Input: operations.DepositFileParams (struct) > Params
  • Request handling: save Params data to variables:
    • Save Params.ID (string) to ResourceID variable
    • Save Params.Upload (runtime.File) to Binary variable
    • Save Params.Filename (runtime.Filename) to Filename variable
    • Save Params.MIMEType value (string, MIME Type) to FileMIMEType variable
    • Always save the value File (string) to the ResourceType variable
    • Save params.HTTPRequest.Header["On-Behalf-Of"] (string) to Agent variable
    • If any of the above are empty or an unrecognized value, return data error to Client.
  • Query File [metadata] Resource (ID, Version) from Metadata Persistence Layer:
    • Query DynamoDB resources table for id == ID && currentVersion == true
    • If no records are returned from Dynamo:
      • Return a Resource Not Found error to client
    • if a record is found from Dynamo:
      • Save response map[string]*dynamoAttribute to GenericResource map (i.e. map[string]interface{})
      • Return GenericResource to ExistingResource
    • Save DynamoDB response as Resource
    • Overlay Resource with Params.Filename, Params.MIMEType
    • Retrieve Resource.structural.isContainedBy and save to FilesetID.
  • Permissions check:
    • Pass Agent, Resourcetype, Update, Context (FilesetID) to Permissions Service via HTTP call
      • If response from Permissions Service is True, proceed
      • If response from Permissions Service is False, return a permissions error response to User (“You are not allowed etc.”)
  • Identifiers Minting: Request a new ID(s) for Resource based on Resourcetype
    • Generate a UUID (SDR Identifier) & Assert UUID on Resource at Identification.Identifier (map, string)
      • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
    • Generate a secondary UUID (TACO Internal Identifier) & Assert the UUID to Resource at Id (string)
      • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
  • Persist New File Binary:
    • Save Binary to the persistence file store
    • Assert the returned filestore address on Resource as file-location
  • Validation Checks: Based on Resourcetype value, map the NewResource to the appropriate validation call
    • Run JSON Schema Validation on the Resource as encoded JSON string against its maps/File.json
      • If invalid, return data error to API caller (e.g. “Missing %v”, field)
      • If valid, proceed (below)
    • Run any other type-dependent validation checks
      • Check if Resource for a File has new or changed Structural.ContainedBy (array of string, UUID + version)
        • If so, check that all parents (ContainedBys) have @type == Fileset URI
        • If not, return structural data error (e.g. "You are creating a File without a container Fileset")
        • If it hasn't changed from ExistingResource, proceed
  • System Data Munging:
    • Handle Version changes (new binary File always triggers a version uptick)
      • Create new DynamoDB record with merged Resource:
        • Identifier Minting: Request a new TACO Internal ID for merged Resource
          • Generate a secondary UUID (TACO Internal Identifier) & overwrite the UUID to Resource at Id (string)
            • If UUID cannot be generated, error out / return a “process cannot be completed at this time” error to client.
        • Update Version Metadata on Resource:
          • On merged Resource, assert version (top level field) == NewResource.version + 1 (integer).
          • On merged Resource, assert currentVersion (top level) == True (boolean).
          • On merged Resource, assert precedingVersion (top level) == ExistingResource.id (string, UUID).
        • Add System Metadata:
          • On merged Resource, assert Administrative.remediatedBy (Agent object) using Agent value.
        • Persist Resource Metadata:
          • Write merged Resource (map[string]interface{}) to DynamoDB.
        • Notify Processing Stream:
          • Send Message to Kinesis Router Stream with TACO Internal Identifier (Resource.Id [string, UUID]), Resourcetype, & Action == NewVersion (this may and probably will continue to evolve).
      • Update existing DynamoDB record for ExistingResource:
        • Update Version Metadata on ExistingResource:
          • On ExistingResource, assert currentVersion (top level) == False (boolean).
          • On ExistingResource, assert followingVersion (top level) == merged NewResource.id (string, UUID).
        • Update System Metadata:
          • On ExistingResource, assert Administrative.remediatedBy (Agent object) using Agent value.
          • On ExistingResource, assert Access.access, Access.download == Dark (need to confirm).
          • On ExistingResource, assert Permissions to admin only editing (need to confirm).
        • Persist Resource Metadata:
          • Write updated ExistingResource (map[string]interface{}) to DynamoDB.
        • Notify Processing Stream:
          • Send Message to Kinesis Router Stream with TACO Internal Identifier (ExistingResource.Id [string, UUID]), Resourcetype, & Action == OldVersion (this may and probably will continue to evolve).
      • Return to client the SDR Identifier (NewResource.identification.identifier [string, DRUID], should not have changed).
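The version bookkeeping in the "Handle Version changes" step (new record becomes current and points back, old record stops being current and points forward) could be sketched as below. This is an assumption-laden illustration: linkVersions is a hypothetical helper, the key names are assumed, and the new version number is computed as the existing record's version plus one, which is one reading of the outline's "version + 1".

```go
package main

import (
	"errors"
	"fmt"
)

// shallowCopy duplicates the top level of a resource so inputs stay untouched.
func shallowCopy(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// linkVersions takes the stored record and the merged record that will
// replace it, and returns the two records to be written: next (the new
// current version) and previous (the superseded version).
// newID is the freshly minted TACO Internal Identifier for the new record.
func linkVersions(existing, merged map[string]interface{}, newID string) (next, previous map[string]interface{}, err error) {
	oldVersion, ok := existing["version"].(int)
	if !ok {
		return nil, nil, errors.New("existing record has no integer version")
	}

	next = shallowCopy(merged)
	next["id"] = newID
	next["version"] = oldVersion + 1 // assumption: increment the stored version
	next["currentVersion"] = true
	next["precedingVersion"] = existing["id"]

	previous = shallowCopy(existing)
	previous["currentVersion"] = false
	previous["followingVersion"] = newID

	return next, previous, nil
}

func main() {
	existing := map[string]interface{}{"id": "uuid-old", "version": 1, "currentVersion": true, "label": "a.txt"}
	merged := map[string]interface{}{"id": "uuid-old", "version": 1, "currentVersion": true, "label": "b.txt"}
	next, previous, err := linkVersions(existing, merged, "uuid-new")
	fmt.Println(next, previous, err)
}
```

Returning both records from one function makes it harder to persist the new version while forgetting to retire the old one; whether the two writes should also be transactional is left open here, as it is in the outline.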

—————————————————

Delete Resource

  • Input: operations.RetrieveResourceParams (struct) > Params
  • Request handling: save Params data to variables:
    • params.ID to variable requestedID (string): provided resource identifier to delete.
    • params.Version to variable requestedVersion (integer | null): provided version of resource to delete.
    • params.HTTPRequest.Header["On-Behalf-Of"] (string) to Agent variable.
  • Query Resource (ID, Version) from Metadata Persistence Layer:
    • If Version exists / was provided:
      • Query DynamoDB resources table for id == ID, version == Version
      • if no records are returned from Dynamo:
        • return a Resource Not Found error to client
      • if a record is found from Dynamo:
        • Save response map[string]*dynamoAttribute to GenericResource map (i.e. map[string]interface{})
        • Return GenericResource to Resource
    • If Version is null / was not provided:
      • Query DynamoDB resources table for id == ID && currentVersion == true
      • if no records are returned from Dynamo:
        • return a Resource Not Found error to client
      • if a record is found from Dynamo:
        • Save response map[string]*dynamoAttribute to GenericResource map (i.e. map[string]interface{})
        • Return GenericResource to Resource
  • Permissions check:
    • Pass Agent, Resourcetype, Delete, requestedID to Permissions Service via HTTP call
      • If response from Permissions Service is True, proceed
      • If response from Permissions Service is False, return a permissions error response to User (“You are not allowed etc.”)
  • Delete Resource:
    • If ResourceType == Collection or Object or Fileset:
      • Delete that DynamoDB record that matches id == requestedID (string) and Resource.version == Version (integer)
    • If ResourceType == File:
      • Delete the S3 file at Resource.location (string, IRI).
      • Delete that DynamoDB record that matches id == requestedID (string) and Resource.version == Version (integer)
  • Notify Processing Stream:
    • Send Message to Kinesis Router Stream with TACO Internal Identifier (Resource.Id [string, UUID]), Resourcetype, & Action == Delete (this may and probably will continue to evolve).
  • Return to client Resource.Id as payload through operations.NewRetrieveResourceOK().WithPayload(response)
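The type-dependent delete step above (File resources also remove their binary) might be structured along these lines. This is a sketch with assumed names: the metadataStore and binaryStore interfaces, the in-memory fakes, the File type URI, and the "location" key are hypothetical stand-ins for DynamoDB, S3, and the real data model.

```go
package main

import "fmt"

// Hypothetical File type URI.
const fileTypeURI = "http://example.org/types/File"

// metadataStore and binaryStore are assumed abstractions over DynamoDB and S3.
type metadataStore interface {
	Delete(id string, version int) error
}

type binaryStore interface {
	Delete(location string) error
}

// deleteResource removes the binary first when the resource is a File, then
// removes the metadata record identified by id and version.
func deleteResource(resource map[string]interface{}, meta metadataStore, binaries binaryStore) error {
	id, _ := resource["id"].(string)
	version, _ := resource["version"].(int)
	if resource["@type"] == fileTypeURI {
		location, ok := resource["location"].(string)
		if !ok {
			return fmt.Errorf("file resource %s has no location", id)
		}
		if err := binaries.Delete(location); err != nil {
			return err
		}
	}
	return meta.Delete(id, version)
}

// In-memory fakes used only to make the sketch runnable.
type memoryMeta map[string]bool

func (m memoryMeta) Delete(id string, version int) error {
	delete(m, fmt.Sprintf("%s@%d", id, version))
	return nil
}

type memoryBinaries map[string]bool

func (b memoryBinaries) Delete(location string) error {
	delete(b, location)
	return nil
}

func main() {
	meta := memoryMeta{"file-1@1": true}
	binaries := memoryBinaries{"s3://bucket/file-1": true}
	file := map[string]interface{}{
		"id": "file-1", "version": 1, "@type": fileTypeURI,
		"location": "s3://bucket/file-1",
	}
	err := deleteResource(file, meta, binaries)
	fmt.Println(err, len(meta), len(binaries))
}
```

Deleting the binary before the metadata record means a failure partway through leaves a record pointing at a missing file rather than an orphaned file with no record; which failure mode is preferable is a design choice the outline does not settle.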