Definitions
Manifest <octue.resources.manifest.Manifest>
A set of related cloud and/or local
datasets <dataset>
, metadata, and helper methods. Typically produced by or needed for processing by an Octue service.
Tip
Use a manifest to send datasets <dataset>
to an Octue service as a question (for processing) - the service will send an output manifest back with its answer if the answer includes output datasets.
Make a clear grouping of datasets needed for a particular analysis.
from octue.resources import Manifest
manifest = Manifest(
datasets={
"my_dataset_0": "gs://my-bucket/my_dataset_0",
"my_dataset_1": "gs://my-bucket/my_dataset_1",
"my_dataset_2": "gs://another-bucket/my_dataset_2",
}
)
Get an Octue service to analyse data for you as part of a larger analysis.
from octue.resources import Child
child = Child(
name="wind_speed",
id="4acbf2aa-54ce-4bae-b473-a062e21b3d57",
backend={"name": "GCPPubSubBackend", "project_name": "my-project"},
)
answer = child.ask(input_manifest=manifest)
See here <asking_questions>
for more information.
Get output datasets from an Octue service from the cloud when you're ready.
answer["output_manifest"]["an_output_dataset"].files
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>
Hint
Datasets in an output manifest are stored in the cloud. You’ll need to keep a reference to where they are to access them - the output manifest is this reference. You’ll need to use it straight away or save it to make use of it.
You can include local datasets in your manifest if you can guarantee all services that need them can access them. A use case for this is, for example, a supercomputer cluster running several octue
services locally that process and transfer large amounts of data. It is much faster to store and access the required datasets locally than upload them to the cloud and then download them again for each service (as would happen with cloud datasets).
Warning
If you want to ask a child a question that includes a manifest containing one or more local datasets, you must include the allow_local_files <octue.resources.child.Child.ask>
parameter. For example, if you have an analysis object with a child called "wind_speed":
input_manifest = Manifest(
datasets={
"my_dataset_0": "gs://my-bucket/my_dataset_0",
"my_dataset_1": "local/path/to/my_dataset_1",
}
)
analysis.children["wind_speed"].ask(
input_values=analysis.input_values,
input_manifest=analysis.input_manifest,
allow_local_files=True,
)