Skip to content

Conversation

@cortadocodes
Copy link
Member

@cortadocodes cortadocodes commented May 4, 2021

Contents

Breaking changes

  • Close Redundant hash value in metadata #148: remove hash_value from Datafile GCS metadata
  • When hashing Datafiles, only hash represented file (i.e. stop hashing metadata)
  • When hashing Datasets and Manifests, only hash the files contained (i.e. stop hashing metadata)
  • Make hash of Hashable instance with _ATTRIBUTES_TO_HASH=None the empty string hash value "AAAAAA=="

Fixes

  • Use the empty string hash value for Datafile if GCS crc32c metadata isn't present
  • Stop serialising hash value of Manifest, Dataset, and Datafile

Minor improvements

  • Remove ability to set custom hash value via kwargs when using Datafile.from_cloud

@cortadocodes cortadocodes self-assigned this May 4, 2021
@cortadocodes cortadocodes requested a review from thclark May 4, 2021 11:19
@codecov-commenter
Copy link

codecov-commenter commented May 4, 2021

Codecov Report

Merging #156 (06b6782) into release/0.1.17 (ee8cc7f) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@                Coverage Diff                 @@
##           release/0.1.17     #156      +/-   ##
==================================================
- Coverage           94.77%   94.76%   -0.01%     
==================================================
  Files                  55       55              
  Lines                1569     1567       -2     
==================================================
- Hits                 1487     1485       -2     
  Misses                 82       82              
Impacted Files Coverage Δ
octue/mixins/hashable.py 100.00% <100.00%> (ø)
octue/resources/datafile.py 98.06% <100.00%> (-0.03%) ⬇️
octue/resources/dataset.py 100.00% <100.00%> (ø)
octue/resources/manifest.py 94.36% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee8cc7f...06b6782. Read the comment docs.

Copy link
Contributor

@thclark thclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good; one minor question on a default value of the hash to resolve before merging

id=kwargs.get("id", custom_metadata.get("id", ID_DEFAULT)),
path=storage.path.generate_gs_path(bucket_name, datafile_path),
hash_value=kwargs.get("hash_value", custom_metadata.get("hash_value", metadata.get("crc32c", None))),
hash_value=metadata.get("crc32c", None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentionally default None, rather than EMPTY_STRING_HASH_VALUE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was getting set to EMPTY_STRING_HASH_VALUE deeper in Datafile._calculate_hash, but I've now realised I can refactor it so it doesn't have to go deeper at all by changing the default above to EMPTY_STRING_HASH_VALUE 👌

@cortadocodes cortadocodes merged commit 4720602 into release/0.1.17 May 5, 2021
@cortadocodes cortadocodes deleted the refactor/remove-hash-value-from-gcs-metadata branch May 5, 2021 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants