# Jsonnet Pipeline Specs

A **Jsonnet pipeline specification** is a pipeline specification written in `jsonnet` (pronounced "jay sonnet"), a data templating language that evaluates to JSON. As we'll see in this notebook, this provides several benefits:

1. Wrap a pipeline definition in a function.
2. Parameterize pipelines at creation time.
3. Validate arguments with internal logic.
4. Host pipeline specs and create pipelines from a URL.

## Step 0: OpenCV Setup

Create a repo of images that will be used by the edges and montage pipelines.

In [1]:
!pachctl create repo images

In [2]:
!pachctl list repo

NAME   CREATED                SIZE (MASTER) ACCESS LEVEL 
images Less than a second ago ≤ 0B          [repoOwner]   


In [3]:
!pachctl put file images@master:liberty.png -f http://imgur.com/46Q8nDz.png
!pachctl put file images@master:AT-AT.png -f http://imgur.com/8MN9Kg0.png
!pachctl put file images@master:kitten.png -f http://imgur.com/g2QnNqa.png

## Step 1: Jsonnet Pipeline Specs Intro

Previously, Pachyderm pipelines could be written only in `json` or `yaml`. With Jsonnet pipeline specs, you can now also write them in Jsonnet, a templating language that generates JSON.

Unlike other templating languages (like the Go templates used by helm, for example), Jsonnet has functions and is actually Turing complete.

Let's take a look at a jsonnet file.

In [4]:
!cat edges.jsonnet

////
// Template arguments:
//
// suffix : An arbitrary suffix appended to the name of this pipeline, for
//          disambiguation when multiple instances are created.
// src : the repo from which this pipeline will read the images to which
//       it applies edge detection.
////
function(suffix, src)
{
  pipeline: { name: "edges-"+suffix },
  description: "OpenCV edge detection on "+src,
  input: {
    pfs: {
      name: "images",
      glob: "/*",
      repo: src,
    }
  },
  transform: {
    cmd: [ "python3", "/edges.py" ],
    image: "pachyderm/opencv:0.0.1"
  }
}


The first thing to notice is that the format is not that different from the `json` version of the edges pipeline. The difference is that the `jsonnet` file is a template: the spec is wrapped in a function, so we can parameterize the pipeline when we create it.

This means that pipeline specs written in `jsonnet` can take parameters and have internal logic, and generally can be made much more adaptable and reusable.

For this pipeline, the parameters are `suffix`, which only affects the pipeline's name, and `src`, which is the repo this pipeline reads its images from.
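
To make the template's behavior concrete, here is a rough Python analogue of `edges.jsonnet`: a plain function that takes the same two parameters and returns the spec as a dictionary. The function name `edges_spec` is hypothetical, and this only illustrates what the template evaluates to, not how `pachctl` evaluates Jsonnet.

```python
import json


def edges_spec(suffix, src):
    """Mirror edges.jsonnet: build the edges pipeline spec for the given arguments."""
    return {
        "pipeline": {"name": "edges-" + suffix},
        "description": "OpenCV edge detection on " + src,
        "input": {"pfs": {"name": "images", "glob": "/*", "repo": src}},
        "transform": {
            "cmd": ["python3", "/edges.py"],
            "image": "pachyderm/opencv:0.0.1",
        },
    }


# Example arguments: suffix "1" and the images repo created in Step 0.
spec = edges_spec("1", "images")
print(json.dumps(spec, indent=2))
```

Calling the function with different arguments yields a different, fully formed spec, which is the same idea the Jsonnet template expresses.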

Jsonnet pipeline specs can be called and parameterized directly from the CLI. We do this by passing the template with the `--jsonnet` flag and setting each argument with `--arg`.

```shell
$ pachctl create pipeline -h 

Create a new pipeline from a pipeline specification. For details on the format, see https://docs.pachyderm.com/latest/reference/pipeline_spec/.

Usage:
  pachctl create pipeline [flags]

Flags:
      --arg strings       Top-level argument passed to the Jsonnet template in --jsonnet (which must be set if any --arg arugments are passed). Value must be of the form 'param=value'. For multiple args, --arg may be set more than once, or it may be passed a comma-separated list of 'param=value' pairs.
  -f, --file string       A JSON file (url or filepath) containing one or more pipelines. "-" reads from stdin (the default behavior). Exactly one of --file and --jsonnet must be set.
  -h, --help              help for pipeline
      --jsonnet string    BETA: A Jsonnet template file (url or filepath) for one or more pipelines. "-" reads from stdin. Exactly one of --file and --jsonnet must be set. Jsonnet templates must contain a top-level function; strings can be passed to this function with --arg (below)
  -p, --push-images       If true, push local docker images into the docker registry.
  -r, --registry string   The registry to push images to. (default "index.docker.io")
  -u, --username string   The username to push images as.

Global Flags:
      --no-color   Turn off colors.
  -v, --verbose    Output verbose logs
```
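
The `--arg` format described in the help text can be sketched in Python as follows. This only illustrates the documented `param=value` convention (repeated flags or comma-separated pairs). `parse_args` is a hypothetical helper, not part of `pachctl`, and the handling of values that contain `=` is an assumption of this sketch.

```python
def parse_args(arg_values):
    """Turn the values given to repeated --arg flags into a dict of template arguments.

    Each value may hold one 'param=value' pair or a comma-separated list of pairs.
    """
    args = {}
    for value in arg_values:
        for pair in value.split(","):
            # Split on the first '=' only, so the value itself may contain '='
            # (an assumption of this sketch).
            name, sep, val = pair.partition("=")
            if not sep or not name:
                raise ValueError(f"expected 'param=value', got {pair!r}")
            args[name] = val
    return args


# Two equivalent ways of passing the same arguments.
repeated = parse_args(["suffix=1", "src=images"])
combined = parse_args(["suffix=1,src=images"])
print(repeated)
```

Note that every value arrives as a string, which matches the help text: strings are what `--arg` passes to the template's top-level function.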

For the edges pipeline, we set the `suffix` arg to `1`, which makes the pipeline name `edges-1`, and we set `src` to the `images` repo that we created earlier.

In [5]:
!pachctl create pipeline --jsonnet edges.jsonnet --arg suffix=1 --arg src=images

In [6]:
!pachctl list pipeline

NAME    VERSION INPUT     CREATED        STATE / LAST JOB  DESCRIPTION                     
edges-1 1       images:/* 20 seconds ago running / success OpenCV edge detection on images 


In [7]:
!pachctl inspect pipeline edges-1

Name: edges-1
Description: OpenCV edge detection on images
Created: About a minute ago 
State: running
Reason: 
Workers Available: 1/1
Stopped: false
Parallelism Spec: <nil>


Datum Timeout: (duration: nil Duration)
Job Timeout: (duration: nil Duration)
Input:
{
  "pfs": {
    "name": "images",
    "repo": "images",
    "repo_type": "user",
    "branch": "master",
    "glob": "/*"
  }
}

Output Branch: master
Transform:
{
  "image": "pachyderm/opencv:0.0.1",
  "cmd": [
    "python3",
    "/edges.py"
  ]
}




## Step 2: Create a Pipeline from a URL
Next, we'll create the montage pipeline. This time, we'll create it from a hosted version of the template by passing the template's address on GitHub.

To view this template, [click here](https://raw.githubusercontent.com/pachyderm/examples/master/jsonnet/montage.jsonnet).

The montage pipeline has a few additional parameters that we can set; for example, each input can be specified as `repo` or `repo@branch`. It also shows how we can incorporate assertions, which let us validate arguments before a pipeline is created.
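
The hosted template is not reproduced here, but the kind of logic it describes can be sketched in Python. The sketch below is an assumption-laden illustration: `parse_repo` and `montage_input` are hypothetical names, the default branch of `master` is assumed, and the `ValueError` checks merely play the role that `assert` plays in a Jsonnet template.

```python
def parse_repo(arg):
    """Split 'repo' or 'repo@branch' into (repo, branch).

    The default branch is assumed to be 'master' for this sketch.
    """
    repo, _, branch = arg.partition("@")
    # Validate before any spec is produced, in the spirit of a Jsonnet assert.
    if not repo:
        raise ValueError(f"missing repo name in {arg!r}")
    if "@" in branch:
        raise ValueError(f"too many '@' in {arg!r}")
    return repo, branch or "master"


def montage_input(left, right):
    """Build a cross input over two repos, one PFS input per side."""
    inputs = []
    for name, arg in (("left", left), ("right", right)):
        repo, branch = parse_repo(arg)
        inputs.append(
            {"pfs": {"name": name, "repo": repo, "branch": branch, "glob": "/"}}
        )
    return {"cross": inputs}


print(montage_input("images@master", "edges-1"))
```

Because the checks run before the spec is returned, a malformed argument fails early instead of producing a broken pipeline.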

Even with all of this logic, we can still configure the pipeline at creation time using `--arg` parameters, like so:

In [8]:
!pachctl create pipeline \
--jsonnet https://raw.githubusercontent.com/pachyderm/examples/master/jsonnet/montage.jsonnet \
--arg suffix=1 \
--arg left=images@master \
--arg right=edges-1@master

In [9]:
!pachctl inspect pipeline montage-1

Name: montage-1
Description: A pipeline that combines images from images@master and edges-1@master into a montage.
Created: 34 seconds ago 
State: running
Reason: 
Workers Available: 1/1
Stopped: false
Parallelism Spec: <nil>


Datum Timeout: (duration: nil Duration)
Job Timeout: (duration: nil Duration)
Input:
{
  "cross": [
    {
      "pfs": {
        "name": "left",
        "repo": "images",
        "repo_type": "user",
        "branch": "master",
        "glob": "/"
      }
    },
    {
      "pfs": {
        "name": "right",
        "repo": "edges-1",
        "repo_type": "user",
        "branch": "master",
        "glob": "/"
      }
    }
  ]
}

Output Branch: master
Transform:
{
  "image": "dpokidov/imagemagick:7.0.10-58",
  "cmd": [
    "sh"
  ],
  "stdin": [
    "montage -shadow -background SkyBlue -geometry 300x300+2+2 /pfs/left/* /pfs/right/* /pfs/out/montage.png"
  ]
}




## Step 3: Reusing pipelines (Montage within a montage)
To show where this feature is headed, let's create another instance of the montage pipeline.

Our montage pipeline takes in two repos of images and creates a montage. Normally, we would have to duplicate our pipeline spec, but now we don't have to touch the pipeline spec at all. We just modify the arguments we send to the Jsonnet pipeline spec, and we have another instance of the pipeline (`montage-2`).

In this case, we create another pipeline from the same template by changing the `suffix` argument and pointing `left` at the output of `montage-1`.

In [10]:
!pachctl create pipeline \
--jsonnet https://raw.githubusercontent.com/pachyderm/examples/master/jsonnet/montage.jsonnet \
--arg suffix=2 \
--arg left=montage-1@master \
--arg right=edges-1@master

In [11]:
!pachctl inspect pipeline montage-2

Name: montage-2
Description: A pipeline that combines images from montage-1@master and edges-1@master into a montage.
Created: 13 seconds ago 
State: running
Reason: 
Workers Available: 1/1
Stopped: false
Parallelism Spec: <nil>


Datum Timeout: (duration: nil Duration)
Job Timeout: (duration: nil Duration)
Input:
{
  "cross": [
    {
      "pfs": {
        "name": "left",
        "repo": "montage-1",
        "repo_type": "user",
        "branch": "master",
        "glob": "/"
      }
    },
    {
      "pfs": {
        "name": "right",
        "repo": "edges-1",
        "repo_type": "user",
        "branch": "master",
        "glob": "/"
      }
    }
  ]
}

Output Branch: master
Transform:
{
  "image": "dpokidov/imagemagick:7.0.10-58",
  "cmd": [
    "sh"
  ],
  "stdin": [
    "montage -shadow -background SkyBlue -geometry 300x300+2+2 /pfs/left/* /pfs/right/* /pfs/out/montage.png"
  ]
}




## Import hosted Jsonnet

One additional feature is that we can import hosted Jsonnet. This allows you to compose logic and reuse common functions across templates, which works especially well when standard validation is shared across pipeline specifications.

For example:

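```jsonnet
// Bind the imported file to a local name; `functions` is an illustrative name.
local functions = import "https://raw.githubusercontent.com/pachyderm_specs/myspecs/functions.jsonnet";

function(suffix, src)
{
  pipeline: { name: "edges-"+suffix },
  description: "OpenCV edge detection on "+src,
...
```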
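
The idea of sharing validation across templates can be sketched in Python. Everything here is hypothetical: `check_suffix` stands in for a function that a shared file might provide, and the rule it enforces (letters, digits, `_` and `-` only) is an assumption chosen for illustration.

```python
import re


def check_suffix(suffix):
    """Shared check (hypothetical rule): letters, digits, '_' and '-' only."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", suffix):
        raise ValueError(f"invalid suffix: {suffix!r}")
    return suffix


def edges_name(suffix):
    """Name for an edges pipeline, validated by the shared helper."""
    return "edges-" + check_suffix(suffix)


def montage_name(suffix):
    """Name for a montage pipeline, validated by the same helper."""
    return "montage-" + check_suffix(suffix)


print(edges_name("1"), montage_name("2"))
```

Both builders rely on one definition of the rule, so changing it in one place changes it everywhere it is used.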

## Cleanup Example

In [49]:
# Delete pipelines created
!pachctl delete pipeline montage-2
!pachctl delete pipeline montage-1
!pachctl delete pipeline edges-1

# Delete images repo
!pachctl delete repo images
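
The cleanup above removes downstream pipelines first, since a repo or pipeline that other pipelines still read from generally cannot be deleted without forcing. As a rough sketch, that order can be derived from the inputs used in this notebook with Python's standard `graphlib` module (Python 3.9+). The `reads_from` mapping is written by hand here and is not queried from Pachyderm.

```python
from graphlib import TopologicalSorter

# Each repo or pipeline mapped to the repos it reads from, as created in this notebook.
reads_from = {
    "images": [],
    "edges-1": ["images"],
    "montage-1": ["images", "edges-1"],
    "montage-2": ["montage-1", "edges-1"],
}

# static_order() yields inputs before the pipelines that read them (creation order).
creation_order = list(TopologicalSorter(reads_from).static_order())

# Reversing the creation order puts downstream pipelines first.
deletion_order = creation_order[::-1]
print(deletion_order)
```

For this notebook the result is `montage-2`, `montage-1`, `edges-1`, then `images`, which matches the commands in the cleanup cell.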