This repository has been archived by the owner on Mar 14, 2022. It is now read-only.

Robots & Workflows Replacement Analysis

Christina Harlow edited this page Apr 30, 2018 · 1 revision

Current Robots + Workflows Steps

Reviewed for a first pass at mapping the required processing steps to our TACO Processing Framework.

Core

Assembly (Items Only): assemblyWF (robot suite | workflow definition | config.yaml)

  1. Start-assembly (Initiate assembly of the object)
    1. Just starts assembly workflow
  2. Content-metadata-create (method) (Create content-metadata from stub content metadata if it exists)
    1. Checks resource is an Item (Object)
    2. Raises an issue if both stub content metadata and full content metadata exist
    3. Checks whether full content metadata already exists (so as not to overwrite it)
    4. If stub content metadata exists: iterates through the list of Files and adds assertions to the Item for each file it contains (order, filepath)
    5. If no content metadata exists: queries the current filepath / directory for all files, iterates through that list, and adds the files to the Item metadata
  3. Jp2-create (method) (Create JP2 derivatives for any images in object)
    1. For each supported image type that is part of specific resource types, generate a jp2 derivative and modify content metadata XML to reflect the new file.
    2. For an Item, grab all files matching a supported type
    3. Create a new Image object for the JP2
    4. Fail / stop job if a JP2 for the starting / original File already exists
    5. Create the JP2
    6. Rename the JP2 based on the starting / original File’s name
    7. Associate with Image object and Item object
  4. Checksum-compute (method) (Compute and compare checksums for any files referenced in contentMetadata)
    1. For each file in the Item (Object):
    2. Compute new checksums (md5 and sha1)
    3. If File metadata contains a checksum, compare and raise an error if they do not match
    4. Otherwise, assert checksum in File metadata
  5. Exif-collect (method) (Calculate and add exif, mimetype, file size and other attributes to each file node in contentMetadata)
    1. For each file in the Item (Object):
    2. Add the following regardless of Item type:
      1. mimetype (unless already exists)
      2. filesize (unless already exists)
      3. Add preservation and release information based on mimetype, unless that information is already provided
        1. NEEDS MORE INFO
        2. preserve?
        3. publish?
        4. Shelve?
    3. If image, adds image info (height + width) metadata
    4. Otherwise, just ensures type in metadata is ‘File’
  6. Accessioning-initiate (method) (Initiate workspace and start common accessioning)
    1. Creates druid-compliant workspace via dor-services
    2. Posts resource to that workspace via POST to dor-services
    3. Kicks off accessioning / accessionWF via dor-workflow-services
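
The checksum-compute step above can be sketched as follows. This is a minimal illustration only, not the robot's actual code: the record layout (a dict with `path` and `checksums` keys), the `ChecksumMismatch` exception, and the function names are all hypothetical, chosen to show the compare-or-assert logic.

```python
import hashlib
from pathlib import Path


class ChecksumMismatch(Exception):
    """Hypothetical error: a recorded checksum disagrees with the computed one."""


def compute_checksums(path, chunk_size=1 << 20):
    """Return {'md5': ..., 'sha1': ...} for the file at `path`, read in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def checksum_compute(file_records, base_dir):
    """Verify recorded checksums, or assert new ones, for each file record.

    `file_records` is an assumed stand-in for the File metadata of an Item:
    a list of dicts with a 'path' key and an optional 'checksums' dict.
    """
    for record in file_records:
        computed = compute_checksums(Path(base_dir) / record["path"])
        recorded = record.setdefault("checksums", {})
        for algorithm, value in computed.items():
            if algorithm in recorded:
                # A checksum already exists in the metadata: compare it.
                if recorded[algorithm].lower() != value:
                    raise ChecksumMismatch(
                        f"{record['path']}: {algorithm} mismatch"
                    )
            else:
                # No checksum yet: assert the computed one in the metadata.
                recorded[algorithm] = value
    return file_records
```

Both digests are computed in a single pass over the file, so large files are read only once.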

Accession: accessionWF (robot suite | workflow definition)

  1. start-accession (Start Accessioning)
    1. Just starts common accessioning / accession workflow
  2. Descriptive-metadata (method) (default XML) (Descriptive Metadata)
    1. Checks that a newer descriptive metadata datastream doesn’t already exist
    2. Builds a descriptive metadata datastream in Fedora based on a file in the local workspace
  3. Rights-metadata (method) (default XML) (Rights Metadata)
    1. Checks that a newer rights metadata datastream doesn’t already exist
    2. Builds a rights metadata datastream in Fedora based on a file in the local workspace
  4. content-metadata (method) (default XML) (Content Metadata)
    1. Checks that a newer content metadata datastream doesn’t already exist
    2. Builds a content metadata datastream in Fedora based on a file in the local workspace
  5. technical-metadata (method) (Technical Metadata)
    1. Checks that a newer technical metadata datastream doesn’t already exist
    2. Builds a technical metadata datastream in Fedora based on a file in the local workspace
  6. shelve (method) (Shelve content in Digital Stacks)
    1. Determine if Files have changed between the object save and what is in persistence
    2. Determine the location of the Files in persistence
    3. Determine the workspace location of the Files to be added / removed
    4. For each changed File, one of:
      1. Remove the file from Stacks
      2. Rename the file in Stacks
      3. Move the file into Stacks
  7. publish (method) (Publish Metadata)
    1. Check rights > access > discover for world
    2. If so, copies public_xml over to Purl’s cache
    3. Otherwise, prunes the object from the current document cache
  8. provenance-metadata (method) (Provenance Metadata)
    1. Add repository, object DRUID, who / workflow process, event text to provenance streams
  9. sdr-ingest-transfer (method) (Initiate Ingest into Preservation)
    1. Transfers an object to SDR Ingest Service. Appears to have once required an Agreement for this, but robot passes in “”.
  10. sdr-ingest-received (method) (Signal from SDR that object has been received)
    1. ??
    2. Receive notice the ingest was received (where?)
  11. reset-workspace (method) (Reset workspace by renaming the druid-tree to a versioned directory)
    1. Clean up workspace based on DRUID and version
  12. End-accession (method) (Clean up any diff caches and set disseminationWF:cleanup to waiting)
    1. Search for a special additional dissemination workflow from the object’s APO
    2. Moves on to dissemination
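
The shelve step above (working out which Files changed, then removing, renaming, or moving them) could be planned along these lines. This is a rough sketch under stated assumptions, not the robot's implementation: the two manifests are assumed to be plain dicts mapping filepath to checksum, and `plan_shelve_actions` and the action tuples are hypothetical names.

```python
def plan_shelve_actions(preserved, workspace):
    """Plan Stacks changes from two manifests mapping filepath -> checksum.

    `preserved` describes what is already in persistence; `workspace`
    describes the object as just saved. Both shapes are assumptions made
    for illustration. Returns a list of action tuples.
    """
    actions = []

    # Paths that disappeared, grouped by checksum so renames can be detected.
    vanished = {}
    for path in sorted(preserved):
        if path not in workspace:
            vanished.setdefault(preserved[path], []).append(path)

    for path in sorted(workspace):
        digest = workspace[path]
        if preserved.get(path) == digest:
            continue  # Same path, same content: nothing to do.
        if path not in preserved and vanished.get(digest):
            # Same content under a new path: rename the file in Stacks.
            actions.append(("rename", vanished[digest].pop(0), path))
        else:
            # New or modified content: move the file into Stacks.
            actions.append(("move", path))

    # Anything that vanished and was not matched as a rename is removed.
    for paths in vanished.values():
        for path in paths:
            actions.append(("remove", path))
    return actions
```

Separating the planning from the file operations keeps the comparison logic testable without touching Stacks or the workspace.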

Dissemination: disseminationWF (robot suite [??] | workflow definition)

  1. Clean up work space?

Other

gisAssemblyWF

  • Start-gis-assembly-workflow
  • Register-druid
  • Author-metadata
  • Approve-metadata
  • extract-thumbnail
  • extract-iso19139
  • generate-geo-metadata
  • generate-mods
  • assign-placenames
  • finish-metadata
  • wrangle-data
  • approve-data
  • package-data
  • normalize-data
  • extract-boundingbox
  • finish-data
  • generate-content-metadata
  • load-geo-metadata
  • finish-gis-assembly-workflow
  • start-assembly-workflow
  • Start-delivery-workflow

eemsAccessionWF

  • Register-object
  • submit-tech-services
  • eems-transfer
  • submit-marc
  • check-marc
  • catalog-status
  • other-metadata
  • Start-accession

hydrusAssemblyWF

  • start-deposit
  • submit
  • approve
  • Start-assembly

etdSubmitWF

  • register-object
  • submit
  • reader-approval
  • registrar-approval
  • submit-marc
  • check-marc
  • catalog-status
  • other-metadata
  • start-accession
  • Binder-transfer

wasSeedPreassemblyWF

  • start
  • build-was-seed-druid-tree
  • desc-metadata-generator
  • thumbnail-generator
  • content-metadata-generator
  • End-was-seed-preassembly

wasCrawlPreassemblyWF

  • start
  • build-was-crawl-druid-tree
  • metadata-extractor
  • content-metadata-generator
  • technical-metadata-generator
  • desc-metadata-generator
  • End-was-crawl-preassembly

registrationWF

  • register
  • Digitization

digitizationWF

  • initiate
  • digitize
  • Start-accession

dpgImageWF

  • initiate
  • tracking_db
  • scan
  • completeness
  • postprocessing
  • imageqc
  • import_files
  • md5_gen
  • copy_to_assembly
  • md5_verify_assembly
  • delete_scratch
  • Digitized

goobiWF

  • start
  • goobi-notify

versioningWF

  • start-version
  • submit-version
  • start-accession

gisDeliveryWF

  • start-gis-delivery-workflow
  • load-vector
  • load-raster
  • load-geoserver
  • load-geowebcache
  • seed-geowebcache
  • finish-gis-delivery-workflow
  • Start-gis-discovery-workflow

releaseWF

  • start
  • release-members
  • release-publish
  • update-marc

wasCrawlDisseminationWF

  • start
  • cdx-generator
  • cdx-merge-sort-publish
  • Path-indexer

wasDisseminationWF

  • start
  • Start-special-dissemination

wasSeedDisseminationWF

  • start
  • update-thumbnail-generator

gisDiscoveryWF

  • start-gis-discovery-workflow
  • generate-geoblacklight
  • load-geoblacklight
  • export-opengeometadata
  • Finish-gis-discovery-workflow

swIndexWF

  • indexed_to_localhost
  • indexed_to_sw-solr-test

preservationAuditWF

  • moab-valid
  • Preservation-audit

sdrAuditWF

  • Audit-verify

sdrIngestWF

  • start-ingest
  • register-sdr
  • transfer-object
  • validate-bag
  • verify-agreement
  • complete-deposit
  • update-catalog
  • create-replica
  • Ingest-cleanup

sdrMigrationWF

  • migration-complete
  • migration-metadata
  • migration-register
  • migration-start
  • migration-transfer

googleScannedBookWF

  • register-object
  • descriptive-metadata
  • google-convert
  • google-download
  • process-content
  • sdr-ingest-transfer
  • sdr-ingest-deposit
  • shelve
  • cleanup
  • sdr-ingest-archive