# Pipelines

The GEOAnalytics Canada Pipeline system helps you develop and build 
portable, scalable Earth Observation pre-processing pipelines and machine 
learning (ML) workflows based on Docker containers.
The underlying system uses Argo Workflows, so the Argo Project documentation
can be useful when debugging or extending a pipeline's functionality. 

**The Pipelines platform consists of:**

* A UI for managing and tracking pipelines and their execution.
* An engine for scheduling a pipeline’s execution.
* An SDK for defining, building, and deploying pipelines in Python. 
  The SDK we use is the 
  [Hera Python library](https://hera-workflows.readthedocs.io).

A pipeline is a representation of a workflow containing the parameters required 
to run the workflow and the inputs and outputs of each component. 
Each pipeline component is a self-contained code block, packaged as a Docker image.

A Workflow can also leverage GEOAnalytics Canada's Cloud Storage as an 
Artifact store to share artifacts between Tasks/Steps. 


In [8]:
# Import the required libraries 

import os

from hera.shared import global_config
from hera.workflows import Container, Workflow, Steps

### Configuration

Setting a few global values in the notebook makes it easier to point the
workflow at a different container registry, image, or tag. 

In [9]:
# CONFIG
# -------------------------
CR_URL = "someregistry.domain.com"  # container registry host
IMG_LABEL = "repository/imagename"  # repository and image name
TAG_LABEL = "0.1.0"  # image tag
IMG_TAG = f"{CR_URL}/{IMG_LABEL}:{TAG_LABEL}"  # full image reference
# -------------------------
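
The image reference assembled above follows the usual `registry/repository:tag` layout. As a minimal sketch, the helper below builds the same string and rejects empty parts, which can catch a misconfigured value before a workflow is submitted. The name `build_image_ref` is hypothetical; it is not part of `hera` or the platform.

```python
def build_image_ref(registry: str, repository: str, tag: str = "latest") -> str:
    """Join registry, repository, and tag into a `registry/repository:tag` string."""
    parts = {"registry": registry, "repository": repository, "tag": tag}
    for label, value in parts.items():
        if not value or not value.strip():
            raise ValueError(f"{label} must be a non-empty string")
    # Strip stray slashes so "registry/" + "/repo" does not produce "//".
    return f"{registry.strip().strip('/')}/{repository.strip().strip('/')}:{tag.strip()}"


IMG_TAG = build_image_ref("someregistry.domain.com", "repository/imagename", "0.1.0")
print(IMG_TAG)  # someregistry.domain.com/repository/imagename:0.1.0
```

Keeping the three parts separate means a new release usually only requires changing the tag.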

### Setting Up The Workflow

The next cell is a template for a single-step Workflow. 
It defines one Container, which is then invoked as a step inside the 
Workflow's `Steps` block. 

In [10]:
global_config.api_version = "argoproj.io/v1alpha1"  # API version of the Argo Workflow resource
global_config.host = os.getenv('PIPELINE_HOST')

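with Workflow(
    name='nameofworkflow'.lower(),  # workflow names must be lowercase
    namespace=os.getenv('PIPELINE_NS'),
    entrypoint='name-of-step-template',  # must match the name of a template defined below
    parallelism=1,  # Maximum number of tasks to run in parallel
) as w:
    # Container template: the image and the command to run inside it
    t = Container(
        name='unique-container-name',
        image=IMG_TAG,
        command=["sh", "./entrypoint.sh"],
    )

    # Steps template: calls the container template as a single step
    with Steps(name="name-of-step-template"):
        t(name="name-of-task-step")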
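
In Argo Workflows, the `entrypoint` of a workflow has to name one of its templates. The sketch below illustrates that rule on a plain dictionary shaped like an Argo Workflow manifest. Both the `check_entrypoint` helper and the sample manifest are hypothetical and written by hand for illustration; they are not produced by `hera`.

```python
def check_entrypoint(manifest: dict) -> str:
    """Return the entrypoint if it names a template, otherwise raise ValueError."""
    spec = manifest.get("spec", {})
    entrypoint = spec.get("entrypoint")
    template_names = [t.get("name") for t in spec.get("templates", [])]
    if entrypoint not in template_names:
        raise ValueError(
            f"entrypoint {entrypoint!r} is not one of the templates {template_names}"
        )
    return entrypoint


# Hand-written sample with one container template and one steps template.
sample_manifest = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"name": "nameofworkflow"},
    "spec": {
        "entrypoint": "name-of-step-template",
        "templates": [
            {
                "name": "unique-container-name",
                "container": {
                    "image": "someregistry.domain.com/repository/imagename:0.1.0",
                    "command": ["sh", "./entrypoint.sh"],
                },
            },
            {
                "name": "name-of-step-template",
                "steps": [
                    [{"name": "name-of-task-step", "template": "unique-container-name"}]
                ],
            },
        ],
    },
}

print(check_entrypoint(sample_manifest))  # name-of-step-template
```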


### Running a Workflow

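Here we set the configuration values to the public `docker/whalesay` image 
and define a Workflow whose single container prints a greeting. 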

In [None]:
# CONFIG
# -------------------------
CR_URL = "docker.io"  # container registry host
IMG_LABEL = "docker/whalesay"  # repository and image name
TAG_LABEL = "latest"  # image tag
IMG_TAG = f"{CR_URL}/{IMG_LABEL}:{TAG_LABEL}"  # full image reference
# -------------------------

global_config.api_version = "argoproj.io/v1alpha1"  # API version of the Argo Workflow resource
global_config.host = os.getenv('PIPELINE_HOST')

with Workflow(
    name='whales-can-talk'.lower(),
    namespace=os.getenv('PIPELINE_NS'),
    entrypoint='whalesay-docker',
    parallelism=1,  # Number of tasks to run in parallel
) as w:
    t = Container(
        name='whalesay-docker',
        image=IMG_TAG,
        command=["cowsay"],
        args=['Hi!'],
    )


#### Submitting A Workflow to Pipelines

All that is left is to submit your workflow by calling the `.create()` 
method on the workflow object. 
This uses the GEOAnalytics Canada preconfigured backend settings to ensure 
your workflows are submitted with the correct permissions and security. 

In [None]:
w.create()
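
Because the cells above read `PIPELINE_HOST` and `PIPELINE_NS` with `os.getenv`, an unset variable quietly becomes `None`. A small pre-flight check can surface that before `.create()` is called. This is only a sketch: `missing_env_vars` is a hypothetical helper, not part of `hera` or the platform.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(["PIPELINE_HOST", "PIPELINE_NS"])
if missing:
    print(f"Set these variables before submitting: {', '.join(missing)}")
else:
    print("Pipeline environment looks configured.")
```

Passing a dictionary as `environ` lets the check be exercised without touching the real environment.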

#### Inspect Workflow

You can inspect portions of your workflow with the 
`IPython.display.JSON` widget. 
Some parameters may be missing: the Workflow Controller inserts certain 
defaults that enable the workflow to execute on the GEOAnalytics Platform. 

In [None]:
from IPython.display import JSON
JSON(w.to_dict())
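
Outside a notebook, the same dictionary can be walked with ordinary Python. The sketch below lists each template with its container image. It assumes the dictionary follows the Argo manifest layout (`spec.templates[*].container.image`); the `sample` dictionary is a hand-written stand-in rather than real output from `w.to_dict()`, and `summarize_templates` is a hypothetical helper.

```python
def summarize_templates(workflow_dict: dict) -> list:
    """Return (template name, image) pairs; image is None for non-container templates."""
    templates = workflow_dict.get("spec", {}).get("templates", [])
    return [
        (t.get("name"), t.get("container", {}).get("image"))
        for t in templates
    ]


# Hand-written stand-in for the dictionary returned by w.to_dict().
sample = {
    "metadata": {"name": "whales-can-talk"},
    "spec": {
        "entrypoint": "whalesay-docker",
        "templates": [
            {
                "name": "whalesay-docker",
                "container": {
                    "image": "docker.io/docker/whalesay:latest",
                    "command": ["cowsay"],
                    "args": ["Hi!"],
                },
            }
        ],
    },
}

for name, image in summarize_templates(sample):
    print(f"{name}: {image}")  # whalesay-docker: docker.io/docker/whalesay:latest
```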

#### References

- [Argo Workflows](https://argoproj.github.io/argo-workflows/) - The tool underlying Pipelines
- [hera API Reference](https://hera-workflows.readthedocs.io/en/latest/api/shared/) - API documentation for the `hera` library
- [hera GitHub](https://github.com/argoproj-labs/hera) - Source code for `hera`