# Exporting pipelines to Argo

[Ploomber](https://github.com/ploomber/ploomber) is an open-source framework for writing data pipelines. By using Ploomber in combination with Soopervisor, you can quickly convert your Jupyter notebooks into production-ready pipelines that run on Kubernetes. No need to write a complex Argo YAML spec!

This tutorial will show you how to export Ploomber pipelines to Kubernetes (via Argo workflows).

```{note}
This example requires:

- [kind](https://kind.sigs.k8s.io/) (for creating a local Kubernetes cluster)
- Docker
- A Python 3 environment
```

In [2]:
# conda activate {env} doesn't work well here
# so we manually modify the path
PATH=$CONDA_PREFIX/envs/soopervisor/bin:$PATH

In [5]:
kind delete cluster

Deleting cluster "kind" ...


Let's create a local Kubernetes cluster using `kind`:

In [6]:
kind create cluster --config kind-config.yaml

Creating cluster "kind" ...
 [32m✓[0m Ensuring node image (kindest/node:v1.24.0) 🖼7l
 [32m✓[0m Preparing nodes 📦 7l
 [32m✓[0m Writing configuration 📜7l
 [32m✓[0m Starting control-plane 🕹️7l
 [32m✓[0m Installing CNI 🔌7l
 [32m✓[0m Installing StorageClass 💾7l
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂


Note that we pass a configuration file that mounts the local `./outputs` directory in the cluster node at `/host`. This lets us retrieve the pipeline's outputs after execution:

In [34]:
cat kind-config.yaml

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: outputs
        containerPath: /host
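
The configuration above uses a relative `hostPath`. As a minimal sketch, the function below creates the host directory and writes an equivalent configuration with an absolute path, which removes any ambiguity about how a relative path is resolved. The helper name `write_kind_config` is hypothetical and not part of `kind`; it has to run before the cluster is created.

```shell
# Hypothetical helper (not part of kind): write a kind config that mounts
# a host directory at /host inside the control-plane node.
write_kind_config() {
    host_dir=$1
    config_path=$2

    # Create the directory up front so the mount source exists.
    mkdir -p "$host_dir"

    # Resolve the directory to an absolute path.
    abs_dir=$(cd "$host_dir" && pwd)

    cat > "$config_path" <<EOF
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: $abs_dir
        containerPath: /host
EOF
}
```

For example, `write_kind_config outputs kind-config.yaml` followed by `kind create cluster --config kind-config.yaml`.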


Let's check if the cluster is ready:

In [3]:
kubectl get nodes

NAME                 STATUS   ROLES           AGE   VERSION
kind-control-plane   Ready    control-plane   82s   v1.24.0


Install Argo (for details, see the [docs](https://argoproj.github.io/argo-workflows/quick-start/)):

In [4]:
kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.3.9/install.yaml

namespace/argo created
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtaskresults.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtasksets.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io created
serviceaccount/argo created
serviceaccount/argo-server created
role.rbac.authorization.k8s.io/argo-role created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-cluster-role created
clust

Patch the Argo server deployment so we can bypass the UI's login form:

In [5]:
kubectl patch deployment \
  argo-server \
  --namespace argo \
  --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": [
  "server",
  "--auth-mode=server"
]}]'

deployment.apps/argo-server patched


```{note}
To access Argo's UI, open a terminal and execute:

`kubectl -n argo port-forward deployment/argo-server 2746:2746`

Then, open: https://localhost:2746/
```

Let's give it a few seconds for the cluster to fully initialize:

In [37]:
sleep 5
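
A fixed `sleep` may be too short on a slow machine. A minimal sketch of an alternative is to retry a readiness command once per second until it succeeds or a timeout expires. The helper name `retry_until` is hypothetical and not part of kubectl or Argo.

```shell
# Hypothetical helper: run a command repeatedly (once per second) until it
# succeeds, giving up after roughly TIMEOUT seconds of waiting.
# Usage: retry_until TIMEOUT command [args...]
retry_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

One possible use is `retry_until 120 kubectl -n argo rollout status deployment/argo-server --timeout=5s`, which keeps polling the deployment's rollout instead of relying on a fixed delay.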

If the pods show as "Running", we're ready to go:

In [11]:
kubectl get pods -n argo

NAME                                   READY   STATUS    RESTARTS   AGE
argo-server-57cf87c886-wfrg6           1/1     Running   0          27s
workflow-controller-77c44779bf-lb64s   1/1     Running   0          29s
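
The check above is visual. As a sketch, a small filter can read the table printed by `kubectl get pods` and succeed only when every pod reports `Running`. The helper name `all_pods_running` is hypothetical, and it assumes the default column layout shown above, with `STATUS` as the third column.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and exit 0
# only if there is at least one pod and every pod's STATUS is "Running".
all_pods_running() {
    awk '
        # Skip the header row and any blank lines.
        NR > 1 && NF > 0 {
            rows++
            if ($3 != "Running") bad++
        }
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}
```

For example: `kubectl get pods -n argo | all_pods_running && echo "ready"`.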


Let's install the dependencies:

In [31]:
pip install ploomber soopervisor --quiet

We now download an example:

In [14]:
ploomber examples -n templates/ml-intermediate -o ml-intermediate
cd ml-intermediate

Loading examples...
Next steps:

$ cd ml-intermediate/
$ ploomber install

Open ml-intermediate/README.md for details.

Install the example's dependencies:

In [38]:
cp requirements.txt requirements.lock.txt

In [17]:
pip install -r requirements.txt --quiet

We now create a new `soopervisor` environment that uses Argo as backend:

In [18]:
soopervisor add training --backend argo-workflows

No pipeline.training.yaml found, looking for pipeline.yaml instead
Found /Users/Edu/dev/soopervisor/kind/doc/ml-intermediate/pipeline.yaml. Loading...
= Adding /Users/Edu/dev/soopervisor/kind/doc/ml-intermediate/training/Dockerfile... =
Environment added, to export it:
	 $ soopervisor export training
To force execution of all tasks:
	 $ soopervisor export training --mode force


Let's download our configuration files:

In [19]:
curl https://raw.githubusercontent.com/ploomber/soopervisor/master/tutorials/kubernetes/env-k8s.yaml -o env.yaml

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    38  100    38    0     0    151      0 --:--:-- --:--:-- --:--:--   151


This will allow us to store all pipeline outputs in a shared folder:

In [46]:
cat env.yaml

sample: False
root: /mnt/shared-folder

In [20]:
curl https://raw.githubusercontent.com/ploomber/soopervisor/master/tutorials/kubernetes/soopervisor-k8s.yaml -o soopervisor.yaml

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   284  100   284    0     0   1378      0 --:--:-- --:--:-- --:--:--  1378


And this will mount the `/host` folder in the pod (so we can see the outputs in our local `outputs/` directory):

In [48]:
cat soopervisor.yaml

training:
  backend: argo-workflows
  # we are not uploading the docker image, set to null
  repository: null
  # mount the /host folder, it will be visible to pods in /mnt/shared-folder
  mounted_volumes:
    - name: shared-folder
      spec:
        hostPath:
          path: /host
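
If downloading the files is not convenient, the same two configuration files can be written locally. The sketch below reproduces the contents shown above for `env.yaml` and `soopervisor.yaml`. The helper name `write_k8s_configs` is hypothetical and not a soopervisor command.

```shell
# Hypothetical helper: write env.yaml and soopervisor.yaml into TARGET_DIR
# with the same contents as the files downloaded in this tutorial.
write_k8s_configs() {
    target_dir=$1

    # Pipeline parameters: store every output under the shared folder.
    cat > "$target_dir/env.yaml" <<'EOF'
sample: False
root: /mnt/shared-folder
EOF

    # Soopervisor settings: Argo backend, no image upload, /host mounted.
    cat > "$target_dir/soopervisor.yaml" <<'EOF'
training:
  backend: argo-workflows
  # we are not uploading the docker image, set to null
  repository: null
  # mount the /host folder, it will be visible to pods in /mnt/shared-folder
  mounted_volumes:
    - name: shared-folder
      spec:
        hostPath:
          path: /host
EOF
}
```

Calling `write_k8s_configs .` from the `ml-intermediate` directory would produce both files in place.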


The `soopervisor export` command will generate the Argo YAML spec and build the docker image:

In [21]:
soopervisor export training --skip-tests --ignore-git --mode force

No pipeline.training.yaml found, looking for pipeline.yaml instead
Found /Users/Edu/dev/soopervisor/kind/doc/ml-intermediate/pipeline.yaml. Loading...
No pipeline.training.yaml found, looking for pipeline.yaml instead
Found /Users/Edu/dev/soopervisor/kind/doc/ml-intermediate/pipeline.yaml. Loading...
100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 3221.43it/s]
Copying soopervisor.yaml -> dist/ml-intermediate/soopervisor.yaml
Copying env.serve.yaml -> dist/ml-intermediate/env.serve.yaml
Copying tasks/__init__.py -> dist/ml-intermediate/tasks/__init__.py
Copying tasks/serve.py -> dist/ml-intermediate/tasks/serve.py
Copying tasks/features.py -> dist/ml-intermediate/tasks/features.py
Copying tasks/join.py -> dist/ml-intermediate/tasks/join.py
Copying tasks/get.py -> dist/ml-intermediate/tasks/get.py
Copying fit.py -> dist/ml-intermediate/fit.py
Copying requirements.txt -> dist/ml-intermediate/requirements.txt
Copying environment.yml -> dist/ml-intermediate/environment.y

Load the docker image into the `kind` cluster:

In [22]:
kind load docker-image ml-intermediate:latest-default

Image: "ml-intermediate:latest-default" with ID "sha256:e3d92729b49c8dcfb33f1b092fad49cecdb5a44a47d369960c22216ba3e7eda1" not yet present on node "kind-control-plane", loading...


```{note}
This is only required for this example. When using
a production Kubernetes cluster, `soopervisor export` will
automatically push the image to the registry.
```

Let's now submit the workflow:

In [23]:
argo submit -n argo training/argo.yaml

Name:                ml-intermediate-c56tj
Namespace:           argo
ServiceAccount:      default
Status:              Pending
Created:             Sun Aug 28 00:47:00 -0500 (now)
Progress:            


Wait for the workflow to finish:

In [24]:
argo wait @latest -n argo

@latest Succeeded at 2022-08-28 00:48:43 -0500 CDT
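
The submit and wait steps can also be combined in a script. This is a minimal sketch that assumes the `argo` CLI is on the `PATH` and that `argo wait` exits with a non-zero status when the workflow does not succeed (worth verifying for your Argo version). The helper name `run_workflow` is hypothetical.

```shell
# Hypothetical helper: submit a workflow spec and block until it finishes.
# Usage: run_workflow SPEC [NAMESPACE]   (NAMESPACE defaults to "argo")
run_workflow() {
    spec=$1
    namespace=${2:-argo}

    # Stop early if the submission itself fails.
    argo submit -n "$namespace" "$spec" || return 1

    # The function's exit status is whatever `argo wait` returns.
    argo wait @latest -n "$namespace"
}
```

For example: `run_workflow training/argo.yaml && echo "workflow finished"`.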


Let's get the status:

In [25]:
argo get @latest -n argo

Name:                ml-intermediate-c56tj
Namespace:           argo
ServiceAccount:      default
Status:              Succeeded
Conditions:          
 PodRunning          False
 Completed           True
Created:             Sun Aug 28 00:47:00 -0500 (2 minutes ago)
Started:             Sun Aug 28 00:47:00 -0500 (2 minutes ago)
Finished:            Sun Aug 28 00:48:43 -0500 (41 seconds ago)
Duration:            1 minute 43 seconds
Progress:            5/5
ResourcesDuration:   1m59s*(1 cpu),1m59s*(100Mi memory)

STEP                      TEMPLATE  PODNAME                           DURATION  MESSAGE
 ✔ ml-intermediate-c56tj  dag
 ├─✔ get                  run-task  ml-intermediate-c56tj-4292949941  26s
 ├─✔ petal-area           run-task  ml-intermediate-c56tj-469948449   14s
 ├─✔ sepal-area           run-task  ml-intermediate-c56tj-2014493210  14s
 ├─✔ join

All the outputs are stored in the `outputs` directory:

In [30]:
ls ../outputs/sample=False

get.parquet         model.pickle        petal_area.parquet
join.parquet        nb.html             sepal_area.parquet
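
To confirm that the run produced everything it should, a small check can compare the directory against the file names listed above. This is a sketch: `check_outputs` is a hypothetical helper, and the file list is copied from this example's output, so it would need adjusting for other pipelines.

```shell
# Hypothetical helper: verify that every output of this example exists in
# OUT_DIR. Prints each missing file to stderr and returns 1 if any is absent.
check_outputs() {
    out_dir=$1
    missing=0
    for name in get.parquet join.parquet petal_area.parquet \
                sepal_area.parquet model.pickle nb.html; do
        if [ ! -e "$out_dir/$name" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_outputs ../outputs/sample=False`.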
