Remove hash value from Datafile GCS metadata #156
Codecov Report
@@              Coverage Diff              @@
##        release/0.1.17    #156    +/-   ##
===========================================
- Coverage        94.77%   94.76%   -0.01%
===========================================
  Files               55       55
  Lines             1569     1567       -2
===========================================
- Hits              1487     1485       -2
  Misses              82       82
…or/remove-hash-value-from-gcs-metadata
thclark left a comment
Good; one minor question on a default value of the hash to resolve before merging
octue/resources/datafile.py
Outdated
      id=kwargs.get("id", custom_metadata.get("id", ID_DEFAULT)),
      path=storage.path.generate_gs_path(bucket_name, datafile_path),
-     hash_value=kwargs.get("hash_value", custom_metadata.get("hash_value", metadata.get("crc32c", None))),
+     hash_value=metadata.get("crc32c", None),
This is intentionally default None, rather than EMPTY_STRING_HASH_VALUE?
It was getting set to EMPTY_STRING_HASH_VALUE deeper in Datafile._calculate_hash, but I've now realised I can refactor it so it doesn't have to go deeper at all by changing the default above to EMPTY_STRING_HASH_VALUE 👌
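The refactor described in this reply can be sketched roughly as below. This is only an illustration, not the code in octue/resources/datafile.py: the helper name `hash_value_from_gcs_metadata` is hypothetical, and the constant's value is assumed to be the "AAAAAA==" quoted in the PR description.

```python
# Assumed value: the empty string hash quoted in this PR's description.
EMPTY_STRING_HASH_VALUE = "AAAAAA=="


def hash_value_from_gcs_metadata(metadata: dict) -> str:
    """Hypothetical helper: take the hash from the GCS `crc32c` metadata,
    falling back to the empty string hash value when the key is absent,
    so that no later step has to replace a `None`."""
    return metadata.get("crc32c", EMPTY_STRING_HASH_VALUE)


# With the key present, the stored value is used unchanged.
print(hash_value_from_gcs_metadata({"crc32c": "example-value"}))
# With the key missing, the default is returned instead of None.
print(hash_value_from_gcs_metadata({}))
```

Putting the default in the lookup itself means the fallback is visible at the call site rather than being applied later inside `Datafile._calculate_hash`.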
Contents
Breaking changes
- Remove hash_value from Datafile GCS metadata
- For Datafiles, only hash represented file (i.e. stop hashing metadata)
- For Datasets and Manifests, only hash the files contained (i.e. stop hashing metadata)
- … Hashable instance with _ATTRIBUTES_TO_HASH=None the empty string hash value "AAAAAA=="
Fixes
- … Datafile if GCS crc32c metadata isn't present
- … Manifest, Dataset, and Datafile
Minor improvements
- … kwargs when using Datafile.from_cloud
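
As background on why "AAAAAA==" is the empty string hash value: GCS reports an object's CRC32C checksum as the base64 encoding of its 4 big-endian bytes, and the CRC32C of zero bytes is 0. The sketch below is a minimal pure-Python illustration of hashing file content only, under that assumption. The function names are hypothetical and this is not octue's implementation; a real project would more likely use a compiled CRC32C library.

```python
import base64


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), using the reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit, applying the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def gcs_style_hash(content: bytes) -> str:
    """Hypothetical helper: base64 of the big-endian CRC32C of the content only.
    No metadata (id, tags, timestamps, etc.) contributes to the hash."""
    return base64.b64encode(crc32c(content).to_bytes(4, "big")).decode()


# Nothing to hash gives the empty string hash value.
print(gcs_style_hash(b""))
```

Because only the bytes of the file are hashed, two files with identical content but different metadata get the same hash value.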