@cortadocodes cortadocodes commented May 5, 2021

Contents

New Features

  • Allow Datafile to be used as a context manager for changes to local datafiles
  • Allow Datafile.from_cloud to be used as a context manager for changes to cloud datafiles
  • Allow Datafile to remember where in the cloud it came from
  • Add the following methods to Datafile:
    • get_cloud_metadata
    • update_cloud_metadata
    • clear_from_file_cache
    • _get_cloud_location
    • _store_cloud_location
    • _check_for_attribute_conflict
  • Avoid re-uploading Datafile file or metadata if they haven't changed
  • Raise error if implicit cloud location is missing from Datafile
  • Add GoogleCloudStorageClient.update_metadata method
  • Allow option to not update cloud metadata in Datafile cloud methods
  • Allow tags to contain capitals and forward slashes (but not start or end in a forward slash)
  • Allow datetime and posix timestamps for Datafile.timestamp
  • Add Datafile.posix_timestamp property
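
The timestamp and tag rules above can be illustrated with a minimal sketch. The helper names `to_datetime`, `to_posix_timestamp`, and `is_valid_tag` are hypothetical and not part of octue's API. The permitted tag character set (letters, digits, colons, hyphens, and forward slashes) and the interpretation of posix timestamps as UTC are assumptions made for illustration only.

```python
import re
from datetime import datetime, timezone

# Assumed character set, for illustration only; the real rule may differ.
TAG_PATTERN = re.compile(r"[A-Za-z0-9:/-]+")


def to_datetime(timestamp):
    """Return a datetime given a datetime, a posix timestamp, or None."""
    if timestamp is None or isinstance(timestamp, datetime):
        return timestamp
    # Assumption: posix timestamps are interpreted as UTC.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def to_posix_timestamp(timestamp):
    """Return the posix form of a timestamp given in either representation."""
    as_datetime = to_datetime(timestamp)
    return None if as_datetime is None else as_datetime.timestamp()


def is_valid_tag(tag):
    """Allow capitals and forward slashes, but not a leading or trailing slash."""
    if tag.startswith("/") or tag.endswith("/"):
        return False
    return TAG_PATTERN.fullmatch(tag) is not None
```

Under these assumptions, `is_valid_tag("System/A-1")` is accepted while `"/System"` and `"System/"` are rejected, and a timestamp supplied in either form can be read back in the other.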

Breaking changes

  • Close #148 (Redundant hash value in metadata): remove hash_value from Datafile GCS metadata
  • When hashing Datafiles, only hash the represented file (i.e. stop hashing metadata)
  • When hashing Datasets and Manifests, only hash the files contained (i.e. stop hashing metadata)
  • Make the hash of a Hashable instance with _ATTRIBUTES_TO_HASH=None the empty-string hash value "AAAAAA=="
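
The value "AAAAAA==" is what a CRC32C checksum of zero bytes looks like when its four bytes are base64-encoded in big-endian order, which is how Google Cloud Storage reports crc32c values. The sketch below assumes that this is the encoding behind the hash values here. The function names are hypothetical, and the bitwise CRC32C is a slow pure-Python stand-in for whichever library is used in practice.

```python
import base64


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode_hash(data: bytes) -> str:
    """Base64-encode the checksum as four big-endian bytes."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode()


print(encode_hash(b""))  # AAAAAA==
```

Because the checksum of empty input is zero, every object with nothing to hash maps to the same value.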

Minor improvements

  • Simplify output of GoogleCloudStorageClient.get_metadata
  • Make Hashable instances re-calculate their hash_value every time unless an immutable_hash_value is explicitly provided (e.g. for cloud datafiles where you don't have the file locally to hash)
  • Add private Identifiable._set_id method
  • Close #147 (Improved ease of construction for object metadata): pull metadata gathering for Datafile into a method
  • Get datetime objects directly from GCS blob instead of parsing string serialisations
  • Add time utils module
  • Add hash preparation function to Hashable for datetime instances
  • Use the empty string hash value for Datafile if GCS crc32c metadata isn't present
  • Stop serialising hash value of Manifest, Dataset, and Datafile
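
The re-calculation behaviour described for Hashable can be sketched roughly as follows. This is not octue's implementation: `HashableSketch` and `Record` are hypothetical classes, and `zlib.crc32` is used as a stand-in checksum rather than CRC32C. Only the names `_ATTRIBUTES_TO_HASH`, `hash_value`, and `immutable_hash_value` come from the notes above.

```python
import base64
import zlib

EMPTY_STRING_HASH_VALUE = "AAAAAA=="


class HashableSketch:
    """Hypothetical stand-in for a Hashable mixin."""

    _ATTRIBUTES_TO_HASH = None

    def __init__(self, immutable_hash_value=None):
        self._immutable_hash_value = immutable_hash_value

    @property
    def hash_value(self):
        # An explicitly provided value always wins, e.g. for a cloud file
        # with no local copy available to hash.
        if self._immutable_hash_value is not None:
            return self._immutable_hash_value
        # Otherwise the hash is recomputed on every access.
        return self._calculate_hash()

    def _calculate_hash(self):
        if self._ATTRIBUTES_TO_HASH is None:
            return EMPTY_STRING_HASH_VALUE
        payload = "".join(str(getattr(self, name)) for name in self._ATTRIBUTES_TO_HASH)
        # zlib.crc32 is a stand-in checksum here, not CRC32C.
        checksum = zlib.crc32(payload.encode())
        return base64.b64encode(checksum.to_bytes(4, "big")).decode()


class Record(HashableSketch):
    """Hypothetical subclass that hashes a single attribute."""

    _ATTRIBUTES_TO_HASH = ("name",)

    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)
        self.name = name
```

Recomputing on access means a mutated attribute is reflected in the next `hash_value` read, at the cost of repeating the calculation each time.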

Fixes

  • Close #146 (Key values in object metadata are stringified): stop serialising GCS metadata as JSON. This avoids strings in the metadata appearing in two sets of quotation marks on Google Cloud Storage. This is a breaking change for any files already persisted with JSON-encoded metadata.
  • Remove ability to set custom hash value via kwargs when using Datafile.from_cloud
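
The double-quotation-mark problem fixed above is easy to reproduce with the standard library alone. The sketch below uses illustrative metadata keys and a hypothetical `decode_legacy_value` helper; it does not call any octue or Google Cloud Storage API.

```python
import json

# Illustrative metadata; the keys and values are examples only.
metadata = {"cluster": "0", "sequence": "3"}

# JSON-encoding each value wraps strings in an extra pair of quotation marks.
json_encoded = {key: json.dumps(value) for key, value in metadata.items()}
print(json_encoded["cluster"])  # "0"

# Storing the strings as they are avoids the extra quotation marks.
plain = dict(metadata)
print(plain["cluster"])  # 0


def decode_legacy_value(value):
    """Decode a value known to have been persisted in the old JSON-encoded form."""
    return json.loads(value)
```

Values written in the old form still carry the extra quotation marks, which is why the change is breaking: they would need decoding, for example with something like `decode_legacy_value`, before being read as plain strings.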

Testing

  • Factor out cloud datafile creation in datafile tests

Quality Checklist

  • New features are fully tested (No matter how much Coverage Karma you have)

cortadocodes and others added 30 commits May 3, 2021 16:29
…ia-method

Refactor: Pull Datafile metadata gathering into method
This avoids needless deserialisation.
Posix timestamps are converted to datetime timestamps on Datafile
instantiation
This avoids strings in the metadata appearing in two sets of
quotation marks on Google Cloud Storage.
…-not-appear-in-quotation-marks

Ensure Google Cloud Storage metadata strings do not appear in quotation marks
…s-and-slashes-in

Feature: Allow tags to have capitals and forward slashes in
…ata-via-method

Allow datetime timestamps for datafiles; use GCS custom time field for Datafile.timestamp
…om-gcs-metadata

Remove hash value from Datafile GCS metadata
@cortadocodes cortadocodes self-assigned this May 7, 2021
@cortadocodes cortadocodes requested a review from thclark May 7, 2021 17:56
codecov-commenter commented May 7, 2021

Codecov Report

Merging #158 (6ad67e9) into main (aa1f9cc) will increase coverage by 0.39%.
The diff coverage is 99.39%.

@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
+ Coverage   94.61%   95.00%   +0.39%     
==========================================
  Files          54       55       +1     
  Lines        1540     1622      +82     
==========================================
+ Hits         1457     1541      +84     
+ Misses         83       81       -2     
| Impacted Files | Coverage Δ |
|---|---|
| octue/resources/datafile.py | 99.02% <99.15%> (+0.52%) ⬆️ |
| octue/cloud/storage/client.py | 97.29% <100.00%> (+1.35%) ⬆️ |
| octue/exceptions.py | 100.00% <100.00%> (ø) |
| octue/mixins/hashable.py | 100.00% <100.00%> (ø) |
| octue/mixins/identifiable.py | 100.00% <100.00%> (ø) |
| octue/resources/dataset.py | 100.00% <100.00%> (ø) |
| octue/resources/manifest.py | 94.36% <100.00%> (ø) |
| octue/resources/tag.py | 99.00% <100.00%> (+0.99%) ⬆️ |
| octue/utils/time.py | 100.00% <100.00%> (ø) |

... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@cortadocodes cortadocodes marked this pull request as ready for review May 7, 2021 17:57
@cortadocodes cortadocodes merged commit 8924d88 into main May 7, 2021
@cortadocodes cortadocodes deleted the release/0.1.17 branch May 7, 2021 18:04