-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deposit_bag_validator class added #122
Conversation
4fef405
to
fa918f9
Compare
In case it helps, some vague parallels here: Methods:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you got the main test cases for errors.
There are some edge cases I can think of but I think they might be too edge casey. I went back and forth from your methods and the methods in which you abstracted them and everything looks good.
PS. I don't mind the name payload
but if others want it to be changed then thats fine.
Edit: Went through specs. This looks good to me. I would like to see this get tested, maybe inserting this code in place of the existing validate-bag
step in the sdrIngestWF
.
- code has at least as much actual validation of the bag as the original
- code makes sense (suggestions for improvements welcome!)
- tests check what they should
- test code makes sense (ditto)
# returns Array of tiny error hashes, allowing multiple occurrences of a single error code | ||
def validation_errors | ||
return [single_error_hash(BAG_DIR_NOT_FOUND, bag_dir: deposit_bag_pathname)] unless deposit_bag_pathname.exist? | ||
return result_array unless required_bag_files_exist? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want an error code here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required_bag_files_exist?
is a method that adds errors to result_array if all required files aren't present. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your question points out that that method is doing two things: writing errors and answering a boolean. perhaps that is bad.
end | ||
end | ||
|
||
def verify_manifest_checksums(manifest_type, manifests_checksum_hash, generated_checksum_hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verify_manifest_checksums
is the combination of verify_manifests
and manifest_diff
. I see manifest_diff
uses FileFixity.new(file_id: file_id)
and we are using an empty hash here. Just wanted some clarification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class FileFixity
def initialize(options=nil)
@checksums=Hash.new
options = {} if options.nil?
options.each do |key,value|
#instance_variable_set("@#{key}", value)
send "#{key}=", value
end
end
.....
So they are passing in file_id
to this hash obj but we are leaving it empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the original, they use this "FileFixity" structure as a mean of associating a file name with its checksums. And there is a bunch of code about making FileFixity a class that can be used as a hash key ... but it never is.
When I unraveled what was going on, it appeared we needed something simpler: a hash with the relative file name as the key, and a hash of checksums as the value. No need for all the indirection.
We don't have to worry about file name clashes, because they can't happen with relative file names in the bag. And we are only verifying the checksums that exist for the file, so it seemed expedient to simply work with those.
It may be that some comments would be helpful explaining the hashes.
It may also be that I am wrong in my assessment.
A further note: FileFixity in archive-utils is similar to FileSignature in moab-versioning. Since I was already up to my neck in alligators, I did not take the next step to using existing code in moab-versioning. I saw that the bag validation code was totally separate for unknown reasons so I kept my focus there and didn't look to re-use moab-versioning existing classes. Perhaps that was a bad decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Perhaps it was a bad decision that the original ingest-robots didn't leverage the moab-versioning code for the bag validation, since they do leverage it for deposit-bag. I have no idea why the existing production code is so inconsistent and strange. -- wait -- yes, yes I do. Le sigh.)
storage_obj = instance_double(Moab::StorageObject, deposit_bag_pathname: bag_pathname, current_version_id: 0) | ||
dbv = described_class.new(storage_obj) | ||
code = described_class::CHECKSUM_MISMATCH | ||
from_file = '6666666666666666666666666666666666666666666666666666666666666666' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have 3 small nitpicks. if you think they're off base, or would prefer to ticket them, or think they're a good point but not worth the effort, i'm happy to just approve and merge.
nice work, definitely more readable, and likely more maintainable!
pathname = deposit_bag_pathname.join(filename) | ||
result_array << single_error_hash(REQUIRED_FILE_NOT_FOUND, file_pathname: pathname) unless pathname.exist? | ||
end | ||
result_array.empty? ? true : false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not totally opposed to this, but sort of assumes that required_bag_files_exist?
will be called before any other checks, since it assumes result_array
will be empty if it didn't add anything, right? comment maybe? is a local result array overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding comment - good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty?
already returns true
or false
. No need to reinterpret that, right?
lib/moab/deposit_bag_validator.rb
Outdated
nil | ||
end | ||
|
||
def verify_version_in_xml_file(file_pathname, found) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about verify_version_from_xml_file
? small difference, but every time i read the method name, it made me think it was going to read the xml, but it's doing stuff with already-extracted values.
lib/moab/deposit_bag_validator.rb
Outdated
diff_hash = {} | ||
# NOTE: these are intentionally & and | instead of && and || | ||
matching_checksum_types = (left_checksums.keys & right_checksums.keys) | ||
matching_checksum_types = (left_checksums.keys | right_checksums.keys) if matching_checksum_types.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about... checksum_types_to_check
? checksum_types_to_match
? checksum_types_to_comp
?
fa918f9
to
088b957
Compare
@jmartin-sul I believe I have addressed your concerns. Thanks for the 👀 . The code is better with the small tweaks. |
This is a Shameless Green implementation of Moab::DepositBagValidator, to support the validate_bag robot in sul-dlss/preservation_robots (which will be a replacement for the old robot code in sul-dlss/sdr-preservation-core).
The logic represented by this code was originally in:
The code's new home is here because the common_accessioning robots ultimately calls (via dor-services preservable concern ...) Moab::Bagger in this moab-versioning gem to create the deposit bag. So it makes sense to have the bag validation code live here as well.
Please check for:
closes #117
connects to sul-dlss/preservation_robots#11