Structural and Checksum Validation: a quick walkthrough
This is a quick-ish walkthrough of the Moab validation code, from higher level moab-versioning methods to the actual checksum calculation calls. (accurate as of 2021-05-21)
Below is the script to validate a Moab from the README. Note that it can be used for any Moab anywhere as long as the configuration can point to the correct location.
It requires only the moab-versioning gem and a rails console.
Usage:
- Copy the script below to a box with Ruby installed.
- Have a Moab you want to check on that box, with the druid-tree directory layout under sdr2objects (or whatever your storage trunk is called).
- Call the script with a single argument: the druid that you wish to check. E.g.:
[~/ruby ]$ ruby moab_check.rb bb294sf0065
Script:
# moab_check.rb
require 'moab'
require 'moab/stanford'
require 'druid-tools'
Moab::Config.configure do
  storage_roots ['/pres-01', '/pres-02', '/pres-03']
  storage_trunk 'sdr2objects'
  deposit_trunk 'deposit'
  path_method :druid_tree
end
# Read druid from command line arg.
druid = "druid:#{ARGV[0]}"
# druid = 'cq580gn5234'
moab_path = Moab::StorageServices.object_path(druid)
puts "#{druid} found at #{moab_path}"
moab = Moab::StorageObject.new(druid, moab_path)
# Validation checks for file existence, but not content, of a well-formed Moab.
# It does not read files or perform checksum validation.
object_validator = Stanford::StorageObjectValidator.new(moab)
validation_errors = object_validator.validation_errors # Returns an array of hashes with error codes
puts "\nChecking structural validation of #{druid}\n"
if validation_errors.empty?
  puts "\nYay! Moab #{moab.digital_object_id} passed structural validation.\n"
else
  puts validation_errors
end
puts "\n"
# Iterate through each moab version and perform verification. This includes discovery and checksum verification of files.
moab.version_list.each do |ver|
  puts "\nChecking version #{ver.version_id}\n"
  # add to_hash(verbose: true) or .to_json for more details on each
  puts "\nVerify signature catalog (ensures all files listed in signatureCatalog.xml exist)\n"
  puts ver.verify_signature_catalog.to_hash
  # verify_version_storage includes:
  #   verify_manifest_inventory (which computes and compares v000x/manifest file checksums),
  #   verify_version_inventory,
  #   verify_version_additions (which computes v000x/data file checksums and compares them with values in signatureCatalog.xml)
  puts "\nVerify version storage (includes checksum validation of v000x/data and v000x/manifest files)\n"
  puts ver.verify_version_storage.to_hash
end
StorageObjectValidator#validation_errors
(detects structural, file layout, expected file/directory, etc. errors for the Moab as a whole)
https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/storage_object_validator.rb#L44-L50 (calls check_correctly_named_version_dirs, check_sequential_version_dirs, check_correctly_formed_moab)
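To make the first two checks concrete, here is a minimal sketch of validating version directory names and their sequence. It is not the gem's code: version_dir_errors and VERSION_DIR_PATTERN are hypothetical names, and the pattern assumes version directories are named "v" plus four digits (v0001, v0002, ...).

```ruby
# Hypothetical helper, not part of moab-versioning. It only illustrates the
# kind of naming and sequence checks described above.
# Assumption: version directories are named "v" followed by four digits.
VERSION_DIR_PATTERN = /\Av\d{4}\z/

def version_dir_errors(dir_names)
  errors = []

  # Naming check: anything that does not match the pattern is reported.
  misnamed = dir_names.reject { |name| name.match?(VERSION_DIR_PATTERN) }
  errors << "incorrectly named version directories: #{misnamed.join(', ')}" unless misnamed.empty?

  # Sequence check: the well-named directories must be exactly 1, 2, ..., n.
  numbers = (dir_names - misnamed).map { |name| name[1..-1].to_i }.sort
  expected = (1..numbers.size).to_a
  errors << "version directories are not sequential: #{numbers.inspect}" unless numbers == expected

  errors
end

puts version_dir_errors(%w[v0001 v0002 v0003]).inspect # => []
puts version_dir_errors(%w[v0001 v0003 vers4]).inspect # reports both a naming and a sequence error
```

Note that, like the structural validation in the script, this only looks at names; it never opens a file.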
inits a VerificationResult for the druid version
calls FileInventory.new(type: 'directory').inventory_from_directory(data_directory)
-- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/storage_object_version.rb#L315
calls FileGroup.new(group_id: group_id).group_from_directory(data_dir)
-- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/file_inventory.rb#L180
calls harvest_directory(directory, true)
-- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/file_group.rb#L190
calls add_physical_file
-- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/file_group.rb#L214
calls FileSignature.new.signature_from_file(pathname)
-- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/file_group.rb#L232
calls FileSignature.from_file(pathname)
-- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/file_signature.rb#L169
which actually computes the checksums using the file paths -- https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/file_signature.rb#L73-L89
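For readers who do not want to follow the links, the end of that call chain amounts to reading each file from disk and feeding its bytes to digest objects. The following is a rough sketch using only Ruby's standard library, not the gem's implementation: checksums_for and CHUNK_SIZE are hypothetical names, and the choice of MD5, SHA-1 and SHA-256 is an assumption about which algorithms the signature records.

```ruby
require 'digest'
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'
require 'tempfile'

# Hypothetical chunk size for this sketch; the gem may use a different value.
CHUNK_SIZE = 8192

# Hypothetical helper: reads the file once and updates several digests per
# chunk, so large files are never loaded into memory whole.
def checksums_for(path)
  digests = {
    md5: Digest::MD5.new,
    sha1: Digest::SHA1.new,
    sha256: Digest::SHA256.new
  }
  File.open(path, 'rb') do |io|
    while (chunk = io.read(CHUNK_SIZE))
      digests.each_value { |digest| digest << chunk }
    end
  end
  digests.transform_values(&:hexdigest).merge(size: File.size(path))
end

# Usage: checksum a small temporary file.
Tempfile.create('moab-sketch') do |file|
  file.write('hello')
  file.flush
  puts checksums_for(file.path)
end
```

Verification then reduces to comparing values like these against the ones recorded in the manifests.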
These are also called directly by the script; see below for how they work. Is it probably redundant to call them again from the script?
calls FileInventory.new.inventory_from_directory(@version_pathname.join('manifests'), 'manifests')
-- see the verify_version_storage walkthrough for an explanation of how this computes checksums for the directory it's given. This time it is called on the manifests dir instead of the data dir.
loads the signature catalog, returns "true if files & signatures listed in version inventory can all be found". My read of the code is that this only looks for the presence of the files listed in the signature catalog, and that it doesn't compute checksum values anew from the content of the files as they are on disk when this is run.
...or maybe it validates checksums by re-computing them from the files? That is a little hard to tell without digging into https://github.com/sul-dlss/moab-versioning/blob/1cfe59cfb4f9fea7a1cc35b936a8f5a7ddd94029/lib/moab/storage_object_version.rb#L284-L296 (esp. catalog_entry = signature_catalog.signature_hash[file.signature]).
After a closer look, this really does appear to just check for the presence of all the files listed in the signature catalog.
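The difference between a presence check and a checksum check can be shown with a small standalone sketch. It does not use the gem: missing_files and checksum_mismatches are hypothetical helpers, and the "catalog" is just a hash of relative path to expected MD5, standing in for the signature catalog.

```ruby
require 'digest'
require 'digest/md5'
require 'tmpdir'

# Hypothetical presence-only check: never reads file content.
def missing_files(root, catalog)
  catalog.keys.reject { |rel_path| File.exist?(File.join(root, rel_path)) }
end

# Hypothetical checksum check: re-reads each existing file and compares its
# MD5 with the recorded value.
def checksum_mismatches(root, catalog)
  catalog.select do |rel_path, expected_md5|
    path = File.join(root, rel_path)
    File.exist?(path) && Digest::MD5.file(path).hexdigest != expected_md5
  end.keys
end

Dir.mktmpdir do |root|
  # The catalog records the checksum of the original content...
  catalog = { 'data.txt' => Digest::MD5.hexdigest('original') }
  # ...but the file on disk has since been altered.
  File.write(File.join(root, 'data.txt'), 'changed')

  puts missing_files(root, catalog).inspect       # => []
  puts checksum_mismatches(root, catalog).inspect # => ["data.txt"]
end
```

An altered file passes the presence check and is caught only when the content is re-read, which is why the script also runs verify_version_storage.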