# From notebook to Kubernetes pipeline

This tutorial will show you how to automatically convert a Jupyter notebook into a Kubernetes pipeline.

Let's set up the environment and download a sample notebook:

In [2]:
# conda activate {env} doesn't work well here
# so we manually modify the path
PATH=$CONDA_PREFIX/envs/soopervisor/bin:$PATH



In [4]:
mkdir pipeline
cd pipeline

In [5]:
curl -O https://raw.githubusercontent.com/ploomber/soorgeon/main/examples/machine-learning/nb.ipynb

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5782  100  5782    0     0  27145      0 --:--:-- --:--:-- --:--:-- 27145


```{note}
The sample notebook is a typical machine
learning pipeline; you can see it
[here](https://github.com/ploomber/soorgeon/blob/main/examples/machine-learning/nb.ipynb).
```

Let's now use [soorgeon](https://github.com/ploomber/soorgeon) to refactor the notebook:

In [7]:
pip install soorgeon --quiet
soorgeon refactor nb.ipynb -p /mnt/shared-folder -d parquet

Added README.md
Finished refactoring 'nb.ipynb', use Ploomber to continue.

Install dependencies (this will install ploomber):
    $ pip install -r requirements.txt

List tasks:
    $ ploomber status

Execute pipeline:
    $ ploomber build

Plot pipeline:
    $ ploomber plot

* Documentation: https://docs.ploomber.io
* Jupyter integration: https://ploomber.io/s/jupyter
* Other editors: https://ploomber.io/s/editors


```{note}
Soorgeon uses static analysis to split the notebook into
several files. The output is a [Ploomber](https://github.com/ploomber/ploomber)
pipeline that we can then export to Kubernetes.

The `-p` option tells Soorgeon to store all pipeline
outputs in the `/mnt/shared-folder` directory, and the `-d`
option tells it to use `.parquet` files for the outputs.
```
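To confirm where the refactored pipeline will write its outputs, you can search the generated spec for the prefix passed to `-p`. The following is a minimal sketch: `show_outputs` is a hypothetical helper (not part of Soorgeon or Ploomber), and it assumes the generated spec is named `pipeline.yaml` and lists output paths as plain text.

```shell
# Hypothetical helper: print every line of a pipeline spec
# that mentions a given path prefix, with line numbers.
show_outputs() {
    prefix="$1"
    spec="${2:-pipeline.yaml}"
    # -F treats the prefix as a literal string, not a regular expression
    grep -n -F -- "$prefix" "$spec"
}

# Only run the check if the spec exists in the current directory
if [ -f pipeline.yaml ]; then
    show_outputs /mnt/shared-folder pipeline.yaml
fi
```

If the command prints nothing, the prefix does not appear in the spec, and the `-p` value is worth double-checking.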

We now configure the Argo Workflows backend:

In [8]:
# soopervisor add requires a requirements.lock.txt file
cp requirements.txt requirements.lock.txt
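Copying `requirements.txt` is enough for this tutorial, but it does not pin exact versions. As an alternative sketch, assuming the active environment is the one where you ran `pip install -r requirements.txt`, `pip freeze` can write a fully pinned lock file. Note that this pins every package installed in the environment, including tooling, and overwrites the copy made above.

```shell
# Assumption: the active environment is the one where
# requirements.txt was installed.
# Write the exact version of every installed package to the lock file.
python3 -m pip freeze > requirements.lock.txt

# Show how many packages were pinned
wc -l < requirements.lock.txt
```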

In [9]:
# add the target environment
soopervisor add training --backend argo-workflows

No pipeline.training.yaml found, looking for pipeline.yaml instead
Found /Users/Edu/.Trash/pipeline/pipeline/pipeline.yaml. Loading...
Environment added, to export it:
	 $ soopervisor export training
To force execution of all tasks:
	 $ soopervisor export training --mode force


Soopervisor uses a `soopervisor.yaml` file to configure your project. We'll download a pre-configured one:

In [10]:
curl https://raw.githubusercontent.com/ploomber/soopervisor/master/tutorials/workflow/soopervisor-workflow.yaml -o soopervisor.yaml

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   152  100   152    0     0    730      0 --:--:-- --:--:-- --:--:--   727


In [11]:
cat soopervisor.yaml

training:
  backend: argo-workflows
  repository: null
  mounted_volumes:
    - name: shared-folder
      spec:
        hostPath:
          path: /host


The `soopervisor export` command will create the Docker image and the Argo YAML spec:

In [None]:
soopervisor export training --skip-tests --ignore-git --mode force

Here's the generated Argo YAML spec:

## Executing