Octue SDK is currently integrated with Google Cloud Storage. Later on, we intend to integrate with other cloud providers, too. If you need a certain cloud provider and are interested in contributing to or sponsoring its development in Octue SDK, please join the discussion in this issue.
All of the data container classes in the SDK have a to_cloud
and a from_cloud
method, which handles their
upload/download to/from the cloud, including all relevant metadata from the instance (e.g. labels, ID). Data integrity is
checked before and after upload and download to ensure any data corruption is avoided.
Assuming you have an instance of Datafile
called my_datafile
:
my_datafile.to_cloud(project_name=<project-name>, bucket_name=<bucket-name>, path_in_bucket=<path/in/bucket>)
>>> gs://<bucket-name>/<path/in/bucket>
downloaded_datafile = Datafile(path="gs://<bucket-name>/<path/in/bucket>, project_name=<project-name>)
Datasets are uploaded into an output directory. Within, a directory named after the dataset is created containing the dataset's datafiles:
<output-directory>
|
|--- <dataset.name>
|
|--- <file-0>
|--- <file-1>
|--- <file-2>
...
Datasets are downloaded from their directory. Assuming you have an instance of Dataset
called my_dataset
:
my_dataset.to_cloud(project_name=<project-name>, bucket_name=<bucket-name>, output_directory=<output-directory>)
>>> gs://<bucket-name>/<output-directory>/<my_dataset.name>
downloaded_dataset = Dataset.from_cloud(project_name=<project-name>, bucket_name=<bucket-name>, path_to_dataset_directory=<output-directory>/<my_dataset.name>)
Manifests are uploaded as a file to the given path and downloaded by providing the same path. Assuming you have an
instance of Manifest
called my_manifest
:
my_manifest.to_cloud(project_name=<project-name>, bucket_name=<bucket-name>, path_to_manifest_file=<path/to/manifest/file.json>)
>>> gs://<bucket-name>/<path/to/manifest/file.json>
downloaded_manifest = Manifest.from_cloud(project_name=<project-name>, bucket_name=<bucket-name>, path_to_manifest_file=<path/to/manifest/file.json>)
The datasets specified in a manifest can be located anywhere in any bucket as long as their paths are correct in their
Dataset
instances. A manifest that takes advantage of this can be created locally with datasets whose paths are set
to cloud locations.