Merged
26 commits
e8d8b4f
IMP: Allow Datafile to remember where in the cloud it came from
cortadocodes May 4, 2021
c3fb7b6
IMP: Add Datafile.update_metadata method
cortadocodes May 4, 2021
a54f64a
MRG: Merge remote-tracking branch 'origin/release/0.1.17' into refact…
cortadocodes May 5, 2021
c6b6d7e
MRG: Merge branch 'release/0.1.17' into refactor/consolidate-cloud-da…
cortadocodes May 5, 2021
521f6c4
IMP: Raise error if implicit cloud location is missing from Datafile
cortadocodes May 5, 2021
7446c8b
IMP: Add Datafile._store_cloud_location method and use in cloud methods
cortadocodes May 5, 2021
a98bbc0
TST: Test Datafile cloud functions with/without implicit cloud locations
cortadocodes May 5, 2021
02819d5
TST: Factor out cloud datafile creation in datafile tests
cortadocodes May 5, 2021
5d3cfd7
IMP: Avoid re-uploading Datafile file or metadata if they haven't cha…
cortadocodes May 5, 2021
9570807
REF: Simplify output of GoogleCloudStorageClient.get_metadata
cortadocodes May 5, 2021
7a9d9cd
FIX: Add missing dictionary subscription
cortadocodes May 5, 2021
debb757
TST: Ensure file cache doesn't leak between tests
cortadocodes May 5, 2021
fdcd715
IMP: Allow Datafile to be used as a context manager for cloud changes
cortadocodes May 5, 2021
d27ec46
FIX: Get empty dict if custom metadata empty
cortadocodes May 5, 2021
265a658
TST: Test Datafile can be used as context manager for local changes
cortadocodes May 5, 2021
d91cc80
IMP: Allow option to not update cloud metadata in Datafile cloud methods
cortadocodes May 5, 2021
a81c3a9
FIX: Propagate __exit__ exception parameters
cortadocodes May 5, 2021
d727669
MRG: Merge remote-tracking branch 'origin/release/0.1.17' into refact…
cortadocodes May 7, 2021
8b819b9
REF: Rename context manager inside Datafile
cortadocodes May 7, 2021
2bc160a
REF: Move DatafileContextManager out of Datafile and make private
cortadocodes May 7, 2021
33396e9
DOC: Add docstring to _DatafileContextManager
cortadocodes May 7, 2021
68be5dd
IMP: Use hash of local file if cloud datafile's file has been downloaded
cortadocodes May 7, 2021
aa34562
DOC: Add more docstrings to datafile module
cortadocodes May 7, 2021
f7d3c93
DOC: Add documentation on Datafile usages
cortadocodes May 7, 2021
cfb4022
DOC: Update datafile documentation with image and correction
cortadocodes May 7, 2021
759de13
TST: Add Hashable immutable hash test
cortadocodes May 7, 2021
131 changes: 131 additions & 0 deletions docs/source/datafile.rst
@@ -12,3 +12,134 @@ the following main attributes:
- ``sequence`` - a sequence number of this file within its cluster (if sequences are appropriate)
- ``tags`` - a space-separated string or iterable of tags relevant to this file
- ``timestamp`` - a posix timestamp associated with the file, in seconds since epoch, typically when it was created but could relate to a relevant time point for the data


-----
Usage
-----

``Datafile`` can be used functionally or as a context manager. When used as a context manager, it is analogous to the
builtin ``open`` function context manager. On exiting the context (``with`` block), it closes the datafile locally and,
if it is a cloud datafile, updates the cloud object with any data or metadata changes.


.. image:: images/datafile_use_cases.png


Example A
---------
**Scenario:** Download a cloud object, calculate Octue metadata from its contents, and add the new metadata to the cloud object

**Starting point:** Object in cloud with or without Octue metadata

**Goal:** Object in cloud with updated metadata

.. code-block:: python

    from octue.resources import Datafile


    project_name = "my-project"
    bucket_name = "my-bucket"
    datafile_path = "path/to/data.csv"

    with Datafile.from_cloud(project_name, bucket_name, datafile_path, mode="r") as (datafile, f):
        data = f.read()
        new_metadata = metadata_calculating_function(data)

        datafile.timestamp = new_metadata["timestamp"]
        datafile.cluster = new_metadata["cluster"]
        datafile.sequence = new_metadata["sequence"]
        datafile.tags = new_metadata["tags"]


Example B
---------
**Scenario:** Add or update Octue metadata on an existing cloud object *without downloading its content*

**Starting point:** A cloud object with or without Octue metadata

**Goal:** Object in cloud with updated metadata

.. code-block:: python

    from datetime import datetime
    from octue.resources import Datafile


    project_name = "my-project"
    bucket_name = "my-bucket"
    datafile_path = "path/to/data.csv"

    datafile = Datafile.from_cloud(project_name, bucket_name, datafile_path)

    datafile.timestamp = datetime.now()
    datafile.cluster = 0
    datafile.sequence = 3
    datafile.tags = {"manufacturer:Vestas", "output:1MW"}

    datafile.to_cloud()  # Or, datafile.update_cloud_metadata()


Example C
---------
**Scenario:** Read in the contents and Octue metadata of an existing cloud object without intent to update it in the cloud

**Starting point:** A cloud object with Octue metadata

**Goal:** Cloud object data (contents) and metadata held locally in local variables

.. code-block:: python

    from octue.resources import Datafile


    project_name = "my-project"
    bucket_name = "my-bucket"
    datafile_path = "path/to/data.csv"

    datafile = Datafile.from_cloud(project_name, bucket_name, datafile_path)

    with datafile.open("r") as f:
        data = f.read()

    metadata = datafile.metadata()


Example D
---------
**Scenario:** Create a new cloud object from local data, adding Octue metadata

**Starting point:** A local file-like object (or content data in a local variable), with Octue metadata stored in local variables

**Goal:** A new object in the cloud with data and Octue metadata

For creating new data in a new local file:

.. code-block:: python

    from octue.resources import Datafile


    sequence = 2
    tags = {"cleaned:True", "type:linear"}


    with Datafile(path="path/to/local/file.dat", timestamp=None, sequence=sequence, tags=tags, mode="w") as (datafile, f):
        f.write("This is some cleaned data.")

    datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")


For existing data in an existing local file:

.. code-block:: python

    from octue.resources import Datafile


    sequence = 2
    tags = {"cleaned:True", "type:linear"}

    datafile = Datafile(path="path/to/local/file.dat", timestamp=None, sequence=sequence, tags=tags)
    datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")
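
Note on the context-manager examples: unpacking two names from a ``with`` statement only works if ``__enter__`` returns a pair and the target is parenthesised. The sketch below is not the SDK's implementation. ``TrackedFile``, ``synced_tags`` and the choice to sync only on a clean exit are assumptions made purely for illustration. It shows a context manager that hands back itself plus an open file, closes the file on exit, and then records metadata changes.

```python
import os
import tempfile


class TrackedFile:
    """Hypothetical stand-in for a datafile that syncs its metadata when the context exits."""

    def __init__(self, path, mode="r", tags=None):
        self.path = path
        self.mode = mode
        self.tags = set(tags or [])
        self.synced_tags = None  # Stands in for metadata held in the cloud.
        self._file = None

    def __enter__(self):
        self._file = open(self.path, self.mode)
        # Returning a tuple is what lets `as (datafile, f)` unpack into two names.
        return self, self._file

    def __exit__(self, exc_type, exc_value, traceback):
        self._file.close()

        if exc_type is None:
            # Assumption of this sketch: only sync if the block finished without an error.
            self.synced_tags = set(self.tags)

        return False  # Do not swallow exceptions raised inside the block.


path = os.path.join(tempfile.mkdtemp(), "data.csv")

with TrackedFile(path, mode="w") as (datafile, f):
    f.write("a,b\n1,2\n")
    datafile.tags.add("cleaned:True")
```

Without the parentheses, ``with X as datafile, f:`` is parsed as two separate context managers (``X as datafile`` and ``f``), which fails because ``f`` is not defined.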
Binary file added docs/source/images/datafile_use_cases.png
36 changes: 28 additions & 8 deletions octue/cloud/storage/client.py
@@ -27,6 +27,7 @@ def __init__(self, project_name, credentials=OCTUE_MANAGED_CREDENTIALS):
             credentials = credentials

         self.client = storage.Client(project=project_name, credentials=credentials)
+        self.project_name = project_name

     def create_bucket(self, name, location=None, allow_existing=False, timeout=_DEFAULT_TIMEOUT):
         """Create a new bucket. If the bucket already exists, and `allow_existing` is `True`, do nothing; if it is
@@ -82,6 +83,17 @@ def upload_from_string(self, string, bucket_name, path_in_bucket, metadata=None,
         self._update_metadata(blob, metadata)
         logger.info("Uploaded data to Google Cloud at %r.", blob.public_url)

+    def update_metadata(self, bucket_name, path_in_bucket, metadata):
+        """Update the metadata for the given cloud file.
+
+        :param str bucket_name:
+        :param str path_in_bucket:
+        :param dict metadata:
+        :return None:
+        """
+        blob = self._blob(bucket_name, path_in_bucket)
+        self._update_metadata(blob, metadata)
+
     def download_to_file(self, bucket_name, path_in_bucket, local_path, timeout=_DEFAULT_TIMEOUT):
         """Download a file to a file from a Google Cloud bucket at gs://<bucket_name>/<path_in_bucket>.

@@ -118,14 +130,22 @@ def get_metadata(self, bucket_name, path_in_bucket, timeout=_DEFAULT_TIMEOUT):
         """
         bucket = self.client.get_bucket(bucket_or_name=bucket_name)
         blob = bucket.get_blob(blob_name=self._strip_leading_slash(path_in_bucket), timeout=timeout)
-        metadata = blob._properties
-
-        # Get timestamps from blob rather than properties so they are datetime.datetime objects rather than strings.
-        metadata["updated"] = blob.updated
-        metadata["timeCreated"] = blob.time_created
-        metadata["timeDeleted"] = blob.time_deleted
-        metadata["customTime"] = blob.custom_time
-        return metadata
+
+        if blob is None:
+            return None
+
+        return {
+            "custom_metadata": blob.metadata or {},
+            "crc32c": blob.crc32c,
+            "size": blob.size,
+            "updated": blob.updated,
+            "time_created": blob.time_created,
+            "time_deleted": blob.time_deleted,
+            "custom_time": blob.custom_time,
+            "project_name": self.project_name,
+            "bucket_name": bucket_name,
+            "path_in_bucket": path_in_bucket,
+        }

     def delete(self, bucket_name, path_in_bucket, timeout=_DEFAULT_TIMEOUT):
         """Delete the given file from the given bucket.
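
Note on the new `get_metadata` return value: callers now have two cases to handle, `None` for a missing object and a flat dictionary otherwise. The sketch below mirrors that shape without touching Google Cloud. `summarise_blob` and the `SimpleNamespace` fake blob are illustrative names, not part of the SDK, and the timestamp keys are omitted for brevity. The fake assumes that a blob with no custom metadata reports `metadata` as `None`, which is what the `or {}` guards against.

```python
from types import SimpleNamespace


def summarise_blob(blob, project_name, bucket_name, path_in_bucket):
    """Build a flat metadata dictionary from a blob-like object, or return None if there is no blob."""
    if blob is None:
        return None

    return {
        # Fall back to an empty dict so callers can always iterate over custom metadata.
        "custom_metadata": blob.metadata or {},
        "crc32c": blob.crc32c,
        "size": blob.size,
        "project_name": project_name,
        "bucket_name": bucket_name,
        "path_in_bucket": path_in_bucket,
    }


# A fake blob with no custom metadata set (assumed here to be reported as None).
fake_blob = SimpleNamespace(metadata=None, crc32c="AAAAAA==", size=0)

present = summarise_blob(fake_blob, "my-project", "my-bucket", "path/to/data.csv")
missing = summarise_blob(None, "my-project", "my-bucket", "path/to/data.csv")
```

Carrying `project_name`, `bucket_name` and `path_in_bucket` in the result is what lets a caller remember where the object came from without being passed the location again.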
6 changes: 6 additions & 0 deletions octue/exceptions.py
@@ -82,3 +82,9 @@ class AttributeConflict(OctueSDKException):

 class MissingServiceID(OctueSDKException):
     """Raise when a specific ID for a service is expected to be provided, but is missing or None."""
+
+
+class CloudLocationNotSpecified(OctueSDKException):
+    """Raise when attempting to interact with a cloud resource implicitly but the implicit details of its location are
+    missing.
+    """
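
Note on how `CloudLocationNotSpecified` might be used: the commit messages describe raising an error when a datafile's implicit cloud location is missing. The following is a sketch of that idea only. `resolve_cloud_location` and its arguments are hypothetical, and both exception classes are redefined locally so the example runs without the SDK.

```python
class OctueSDKException(Exception):
    """Local stand-in for the SDK's base exception."""


class CloudLocationNotSpecified(OctueSDKException):
    """Local stand-in for the exception added in this diff."""


def resolve_cloud_location(stored_location, project_name=None, bucket_name=None, path_in_bucket=None):
    """Use an explicitly given location if complete; otherwise fall back to the stored one, or raise."""
    explicit = (project_name, bucket_name, path_in_bucket)

    if all(explicit):
        return explicit

    if stored_location is None:
        raise CloudLocationNotSpecified(
            "No cloud location was given and none has been stored from a previous cloud operation."
        )

    return tuple(stored_location)


stored = ("my-project", "my-bucket", "path/to/data.csv")

implicit = resolve_cloud_location(stored)
explicit = resolve_cloud_location(stored, "other-project", "other-bucket", "other/path.csv")
```

Calling `resolve_cloud_location(None)` raises `CloudLocationNotSpecified`, which tells the caller to pass the location explicitly.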
18 changes: 10 additions & 8 deletions octue/mixins/hashable.py
@@ -21,8 +21,8 @@ class Hashable:
     _ATTRIBUTES_TO_HASH = None
     _HASH_TYPE = "CRC32C"

-    def __init__(self, hash_value=None, *args, **kwargs):
-        self._hash_value = hash_value
+    def __init__(self, immutable_hash_value=None, *args, **kwargs):
+        self._immutable_hash_value = immutable_hash_value
         self._ATTRIBUTES_TO_HASH = self._ATTRIBUTES_TO_HASH or []
         super().__init__(*args, **kwargs)

@@ -40,11 +40,10 @@ class Holder(cls):
     @property
     def hash_value(self):
         """Get the hash of the instance."""
-        if self._hash_value:
-            return self._hash_value
+        if self._immutable_hash_value is None:
+            return self._calculate_hash()

-        self._hash_value = self._calculate_hash()
-        return self._hash_value
+        return self._immutable_hash_value

     @hash_value.setter
     def hash_value(self, value):
@@ -53,14 +52,17 @@ def hash_value(self, value):
         :param str value:
         :return None:
         """
-        self._hash_value = value
+        if self._immutable_hash_value is not None:
+            raise ValueError(f"The hash of {self!r} is immutable - hash_value cannot be set.")
+
+        self._immutable_hash_value = value

     def reset_hash(self):
         """Reset the hash value to the calculated hash (rather than whatever value has been set).

         :return None:
         """
-        self._hash_value = self._calculate_hash()
+        self._immutable_hash_value = None

     def _calculate_hash(self, hash_=None):
         """Calculate the hash of the sorted attributes in self._ATTRIBUTES_TO_HASH. If hash_ is not None and is
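
Note on the behaviour change in `Hashable`: the hash is no longer cached after the first calculation. It is recomputed on every access unless a value has been pinned, and a pinned value cannot be overwritten until `reset_hash` is called. The sketch below reproduces that logic in a standalone class. `HashableSketch` and its `content` attribute are illustrative, and `zlib.crc32` (plain CRC32) is used only as a stand-in for the CRC32C that the SDK names.

```python
import zlib


class HashableSketch:
    """Standalone illustration of the pinned-or-calculated hash logic in this diff."""

    def __init__(self, content, immutable_hash_value=None):
        self.content = content
        self._immutable_hash_value = immutable_hash_value

    @property
    def hash_value(self):
        if self._immutable_hash_value is None:
            return self._calculate_hash()
        return self._immutable_hash_value

    @hash_value.setter
    def hash_value(self, value):
        if self._immutable_hash_value is not None:
            raise ValueError(f"The hash of {self!r} is immutable - hash_value cannot be set.")
        self._immutable_hash_value = value

    def reset_hash(self):
        self._immutable_hash_value = None

    def _calculate_hash(self):
        # Stand-in hash: plain CRC32 of the content, formatted as 8 hex digits.
        return format(zlib.crc32(self.content.encode()), "08x")


item = HashableSketch("abc")
calculated = item.hash_value  # Computed on demand; nothing is cached.

item.hash_value = "pinned"  # The first assignment pins the value.
item.content = "changed"  # Content changes no longer affect the reported hash.
pinned = item.hash_value

item.reset_hash()  # Unpin: the hash is computed from the current content again.
recalculated = item.hash_value
```

One consequence worth checking in review: under the old code a falsy stored hash (such as an empty string) triggered recalculation, whereas the new `is None` check treats it as a pinned value.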
51 changes: 30 additions & 21 deletions octue/mixins/identifiable.py
@@ -25,27 +25,7 @@ def __init__(self, *args, id=None, name=None, **kwargs):
         """Constructor for Identifiable class"""
         self._name = name
         super().__init__(*args, **kwargs)
-
-        # Store a boolean record of whether this object was created with a previously-existing uuid or was created new.
-        self._created = True if id is None else False
-
-        if isinstance(id, uuid.UUID):
-            # If it's a uuid, stringify it
-            id = str(id)
-
-        elif isinstance(id, str):
-            # If it's a string (or something similar which can be converted to UUID) check it's valid
-            try:
-                id = str(uuid.UUID(id))
-            except ValueError:
-                raise InvalidInputException(f"Value of id '{id}' is not a valid uuid string or instance of class UUID")
-
-        elif id is not None:
-            raise InvalidInputException(
-                f"Value of id '{id}' must be a valid uuid string, an instance of class UUID or None"
-            )
-
-        self._id = id or gen_uuid()
+        self._set_id(id)

     def __str__(self):
         return f"{self.__class__.__name__} {self._id}"
@@ -60,3 +40,32 @@ def id(self):
     @property
     def name(self):
         return self._name
+
+    def _set_id(self, value):
+        """Set the ID to the given value.
+
+        :param str|uuid.UUID|None value:
+        :return None:
+        """
+        # Store a boolean record of whether this object was created with a previously-existing uuid or was created new.
+        self._created = True if value is None else False
+
+        if isinstance(value, uuid.UUID):
+            # If it's a uuid, stringify it
+            value = str(value)
+
+        elif isinstance(value, str):
+            # If it's a string (or something similar which can be converted to UUID) check it's valid
+            try:
+                value = str(uuid.UUID(value))
+            except ValueError:
+                raise InvalidInputException(
+                    f"Value of id '{value}' is not a valid uuid string or instance of class UUID"
+                )
+
+        elif value is not None:
+            raise InvalidInputException(
+                f"Value of id '{value}' must be a valid uuid string, an instance of class UUID or None"
+            )
+
+        self._id = value or gen_uuid()
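
Note on the ID handling that `_set_id` factors out: the normalisation rules can be tried in isolation with the standard library alone. The function below is a sketch under assumptions. `normalise_id` is a hypothetical name, built-in `ValueError` and `TypeError` replace the SDK's `InvalidInputException`, and `uuid.uuid4()` stands in for `gen_uuid()`, whose implementation is not shown in this diff.

```python
import uuid


def normalise_id(value):
    """Return a canonical UUID string for a UUID, a UUID-like string, or None (which generates a new one)."""
    if isinstance(value, uuid.UUID):
        return str(value)

    if isinstance(value, str):
        try:
            # Round-tripping through uuid.UUID validates the string and canonicalises its format.
            return str(uuid.UUID(value))
        except ValueError:
            raise ValueError(f"Value of id {value!r} is not a valid uuid string") from None

    if value is not None:
        raise TypeError(f"Value of id {value!r} must be a uuid string, a UUID instance or None")

    return str(uuid.uuid4())


from_compact_string = normalise_id("12345678123456781234567812345678")
from_uuid_object = normalise_id(uuid.UUID("12345678-1234-5678-1234-567812345678"))
generated = normalise_id(None)
```

Both `from_compact_string` and `from_uuid_object` come out as `"12345678-1234-5678-1234-567812345678"`, showing that differently formatted inputs converge on one canonical string.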