# Submit Kubeflow Pipelines as `ScheduledWorkflow`s
This notebook shows examples of submitting Kubeflow [`ScheduledWorkflow`s](https://github.com/kubeflow/pipelines/blob/0.1.31/backend/src/crd/pkg/apis/scheduledworkflow/register.go) (SWFs) that run [Kubeflow Pipelines](https://www.kubeflow.org/docs/pipelines/) (KFPs).

It contains helper functions for fetching KFPs and wrapping them in SWFs, as well as [examples demonstrating these capabilities](#examples).

## Wrapping a Kubeflow Pipeline in a `ScheduledWorkflow`
Conceptually, a "scheduled" Kubeflow Pipeline must take exactly one argument: the timestamp it was scheduled to run at. Any other parameters it normally accepts must be [partially applied](https://en.wikipedia.org/wiki/Partial_application) away (i.e. bound to specific, concrete values), leaving just the timestamp parameter to be filled in by the SWF machinery on each pipeline instantiation.
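
The partial-application idea can be sketched with `functools.partial`. The `train_pipeline` function, its parameters, and the bucket path below are hypothetical stand-ins for a real KFP; only the shape of the idea is meant to carry over.

```python
from functools import partial

# Hypothetical pipeline entry point: two ordinary parameters plus the scheduled timestamp
def train_pipeline(dataset, learning_rate, scheduled_at):
    return "train %s (lr=%s) for run scheduled at %s" % (dataset, learning_rate, scheduled_at)

# Bind every parameter except the timestamp to a concrete value
scheduled_pipeline = partial(train_pipeline, "gs://my-bucket/data.csv", 0.01)

# The scheduler then only has to supply the timestamp on each run
result = scheduled_pipeline("20190929-00:00:00")
print(result)
```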

## Past and Present Runs

The timestamp passed by the SWF controller to each KFP instance it spawns will generally be one of two types:
- a timestamp close to the current time (when the SWF is caught up to the present, and spawning KFPs according to a `crontab`-style schedule or fixed interval)
- a timestamp in the past (when the SWF is created with a "start" time in the past, and is back-filling KFP runs to catch up to the present)
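
A rough sketch of the back-filling case for a fixed interval: given a start time in the past, scheduled timestamps are enumerated until they catch up to "now". The `scheduled_times` helper is hypothetical and only illustrates the idea; the actual SWF controller's logic may differ.

```python
from datetime import datetime, timedelta, timezone

def scheduled_times(start, now, interval_seconds):
    """Yield every scheduled timestamp from `start` up to and including `now`."""
    step = timedelta(seconds=interval_seconds)
    t = start
    while t <= now:
        yield t
        t += step

start = datetime(2019, 9, 29, 0, 0, tzinfo=timezone.utc)
now = datetime(2019, 9, 29, 0, 3, 30, tzinfo=timezone.utc)  # pretend "current" time

# Timestamps the scheduler would need to back-fill before reaching the present
backfill = list(scheduled_times(start, now, interval_seconds=60))
for t in backfill:
    print(t.isoformat())
```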


## TODOs
- [ ] factor out helpers
- tests:
  - example pipelines:
      - [x] `gs://ml-pipeline-playground/coin.tar.gz`
      - [ ] [`https://storage.googleapis.com/ml-pipeline-playground/coin.tar.gz`](https://storage.googleapis.com/ml-pipeline-playground/coin.tar.gz)
      - [x] [`https://raw.githubusercontent.com/kubeflow/pipelines/0.1.31/samples/core/condition/condition.py`](https://raw.githubusercontent.com/kubeflow/pipelines/0.1.31/samples/core/condition/condition.py)
      - [ ] `component.yaml` example(s) (TODO: which one(s)?)
      - [ ] Bitcoin BigQuery monthly rollups
      - [ ] Current Weather (OpenWeather)
      - [ ] Stock movements (IEX API)
  - scheduling:
      - [ ] intervals
      - [x] start/end times
      - [ ] updating/clearing time-bounds
- [x] `!pip install` requirements
- [ ] (optionally) pass scheduled time to pipeline
- [ ] verify provided pipelines take no arguments (other than e.g. scheduled datetime)
- [ ] parameterize notebook w/ `papermill`
- [ ] support wrapping `component.yaml`s into Pipelines and running
  - [ ] verify no extra arguments to provided components
- [x] publish runner container publicly
- [x] publish SWF+KFP YAML template

## Install KFP SDK

In [1]:
from sys import executable as python
kfp_version = '0.1.31'
!{python} -m pip install kfp=={kfp_version} --upgrade -q

# Helper Functions
Below are utilities for:
- [fetching/loading pipelines](#Pipeline-fetching/loading)
- [parsing their YAML definitions](#Pipeline-spec-accessors)
- [building `ScheduledWorkflow` resources](#ScheduledWorkflow-builders)
- [`import`ing remote `.py` files](#importing-remote-modules)

## Pipeline fetching/loading

In [121]:
from contextlib import contextmanager

def url_to_stream(url):
    """Return a file-like-object from a local or remote ("gs" or "https?") URL"""
    from urllib.parse import urlparse
    parsed = urlparse(url)
    scheme = parsed.scheme
    if not scheme:
        return open(url, 'rb')
    elif scheme == 'http' or scheme == 'https':
        @contextmanager
        def UrlOpen(url):
            from urllib.request import urlopen
            stream = urlopen(url)
            try:
                yield stream
            finally:
                stream.close()

        return UrlOpen(url)
    elif scheme == 'gs':
        from google.cloud import storage
        gcs = storage.Client()
        from tempfile import NamedTemporaryFile, TemporaryDirectory
        tmp = TemporaryDirectory()
        from os.path import basename, join, exists
        name = basename(parsed.path)
        path = join(tmp.name, name)

        with open(path, 'wb') as dest:
            gcs.download_blob_to_file(url, dest)

        @contextmanager
        def GcsFetchOpen(url):
            # Yield the downloaded file; afterwards close it and delete the temporary directory
            try:
                with open(path, 'rb') as f:
                    yield f
            finally:
                tmp.cleanup()

        return GcsFetchOpen(url)
    else:
        raise Exception("Unsure how to handle URL scheme '%s' (%s)" % (scheme, url))

def url_to_bytes(url):
    """Return the contents of a local or remote ("gs" or "https?") URL"""
    with url_to_stream(url) as f:
        return f.read()

In [122]:
def try_extract_pipeline_tar(bytes):
    """Attempt to parse `bytes` as a TAR archive and extract a `pipeline.yaml`"""
    import tarfile
    from tarfile import TarError
    try:
        from io import BytesIO
        with tarfile.open(fileobj=BytesIO(bytes), mode='r') as f:
            names = f.getnames()
            if names == ['pipeline.yaml']:
                tar_info = f.getmember('pipeline.yaml')
                if tar_info.isfile():
                    return f.extractfile(tar_info).read()
                raise Exception('"pipeline.yaml" in TAR archive is not a regular file')
            raise Exception('Expected TAR archive to contain only a "pipeline.yaml"; found %s' % names)
    except TarError:
        return None

In [123]:
def try_extract_pipeline_zip(bytes):
    """Attempt to parse `bytes` as a ZIP archive and extract a `pipeline.yaml`"""
    from zipfile import BadZipFile, ZipFile
    try:
        from io import BytesIO
        with ZipFile(BytesIO(bytes), mode='r') as f:
            names = f.namelist()
            if names == ['pipeline.yaml']:
                return f.read('pipeline.yaml')
            raise Exception('Expected ZIP archive to contain only a "pipeline.yaml"; found %s' % names)
    except BadZipFile:
        return None
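
To see why the TAR and ZIP helpers can be tried in sequence, the sketch below builds a tiny gzipped TAR in memory and shows that `tarfile.open(..., mode='r')` reads it (compression is detected transparently), while plain YAML bytes raise a `TarError` subclass and so fall through to the next format. The archive contents are made up for illustration.

```python
import tarfile
from io import BytesIO

yaml_bytes = b"metadata: {generateName: demo-pipeline-}\n"  # made-up pipeline spec

# Build a gzipped TAR archive in memory containing only a pipeline.yaml
buf = BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    info = tarfile.TarInfo(name='pipeline.yaml')
    info.size = len(yaml_bytes)
    tar.addfile(info, BytesIO(yaml_bytes))
archive_bytes = buf.getvalue()

# mode='r' auto-detects the gzip compression
with tarfile.open(fileobj=BytesIO(archive_bytes), mode='r') as tar:
    names = tar.getnames()
    extracted = tar.extractfile('pipeline.yaml').read()

# Bytes that are not an archive raise tarfile.ReadError (a TarError subclass)
try:
    tarfile.open(fileobj=BytesIO(yaml_bytes), mode='r')
    not_a_tar = False
except tarfile.TarError:
    not_a_tar = True

print(names, extracted == yaml_bytes, not_a_tar)
```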

In [124]:
def is_pipeline_func(pipeline):
    return \
        hasattr(pipeline, '__call__') and \
        hasattr(pipeline, '_pipeline_name') and \
        hasattr(pipeline, '_pipeline_description')

In [125]:
def load_pipeline(pipeline):
    """Load a pipeline's spec as a dict
    
    Input pipeline can be:
    - a @dsl.pipeline function (in which case it is compiled to YAML which is then parsed)
    - a local or remote ("gs" and "https?" schemes supported) YAML file (or ZIP or TAR archive containing a pipeline.yaml)
    """
    import yaml
    if is_pipeline_func(pipeline):
        from kfp.compiler import Compiler
        compiler = Compiler()
        pipeline_yaml = compiler.compile(pipeline, package_path=None)
        return yaml.safe_load(pipeline_yaml)
    bytes = url_to_bytes(pipeline)
    pipeline_yaml = try_extract_pipeline_tar(bytes)
    if pipeline_yaml is None:
        pipeline_yaml = try_extract_pipeline_zip(bytes)
        if pipeline_yaml is None:
            pipeline_yaml = bytes
    return yaml.safe_load(pipeline_yaml)

In [126]:
def load_yaml(path):
    """Load YAML from a local or remote URL"""
    import yaml
    return yaml.safe_load(url_to_bytes(path))

## Pipeline-spec accessors
Utilities for pulling various fields out of a Kubeflow Pipeline's spec.

In [127]:
def get_metadata(pipeline):
    if 'metadata' not in pipeline:
        raise Exception('No "metadata" found in pipeline')
    return pipeline['metadata']

In [144]:
def get_annotations(pipeline):
    metadata = get_metadata(pipeline)
    if 'annotations' not in metadata:
        return ("", "")
    annotations = metadata['annotations']
    if 'pipelines.kubeflow.org/pipeline_spec' not in annotations:
        return ("", "")
    import json
    annotations = json.loads(annotations['pipelines.kubeflow.org/pipeline_spec'])
    return annotations['name'], annotations['description']

In [145]:
def get_name(pipeline):
    metadata = get_metadata(pipeline)

    if 'generateName' not in metadata:
        raise Exception('No "generateName" found in pipeline metadata')

    name = metadata['generateName']
    if name[-1] == '-':
        name = name[:-1]

    return name

In [146]:
def get_description(pipeline):
    (_, description) = get_annotations(pipeline)
    return description
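
For reference, a minimal sketch of the spec shape these accessors expect: a `generateName` with a trailing hyphen, and a JSON-encoded `pipelines.kubeflow.org/pipeline_spec` annotation. The dict below is a hand-written stand-in (its values are assumed, loosely based on the coin-flip example), not a real compiled pipeline.

```python
import json

# Hand-written stand-in for a compiled pipeline's parsed YAML (values are assumptions)
pipeline = {
    'metadata': {
        'generateName': 'conditional-execution-pipeline-',
        'annotations': {
            'pipelines.kubeflow.org/pipeline_spec': json.dumps({
                'name': 'Conditional execution pipeline',
                'description': 'Shows how to use dsl.Condition().',
            }),
        },
    },
}

metadata = pipeline['metadata']

# Drop the single trailing "-" that the generateName convention appends
name = metadata['generateName']
if name.endswith('-'):
    name = name[:-1]

# The annotation value is itself a JSON document
spec = json.loads(metadata['annotations']['pipelines.kubeflow.org/pipeline_spec'])
print(name, '|', spec['description'])
```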

## `ScheduledWorkflow` builders

### Load template `ScheduledWorkflow`+KFP YAML
To construct `ScheduledWorkflow`s below, we start with this template and fill in a few missing fields (pipeline spec, name, and description, as well as any other parameter overrides the user provides).

In [202]:
SWF_TEMPLATE_URL = 'https://raw.githubusercontent.com/ryan-williams/pipelines/swf/backend/src/crd/samples/scheduledworkflow/kfp.yaml'
SWF_TEMPLATE = load_yaml(SWF_TEMPLATE_URL)

In [16]:
!{python} -m pip install -q pandas
import pandas as pd

### Generate `ScheduledWorkflow`s for running Kubeflow Pipelines
`make_swf_kfp` generates the spec for a `ScheduledWorkflow` resource that will run a given KFP on a desired schedule.

#### Parameters
| Name | Type | Description |
| :-:| :-:| :---|
| `pipeline` | `str` \| function | `dsl.pipeline` function or path to a file (or ZIP or TAR archive) containing the pipeline's YAML specification (paths can be to local files or `gs`- or `http`-schemed URLs). |
| `name` | `str` | Name of the ScheduledWorkflow resource to create; constructed from the underlying pipeline's name by default. |
| `description` | `str` | Description of the ScheduledWorkflow resource to create; constructed from the underlying pipeline's description by default. |
| `cron` | `str` | Crontab-formatted string specifying the schedule the pipeline should be run on; if neither `cron` nor `intervalSecond` is provided, `KfpSwf.DEFAULT_CRON_SCHEDULE` is used. At most one of `cron` and `intervalSecond` should be provided. |
| `intervalSecond` | `int` | Interval at which to trigger runs of the provided pipeline. At most one of `cron` and `intervalSecond` should be provided. |
| `start` | `datetime` \| `str` | If provided, begin scheduling pipelines at this date+time (UTC is assumed if timezone is not made explicit) |
| `end` | `datetime` \| `str` | If provided, stop scheduling pipelines at this date+time (UTC is assumed if timezone is not made explicit) |
| `maxHistory` | `int` | Maximum number of pipeline runs' histories to store (default: 10) |
| `maxConcurrency` | `int` | Maximum number of concurrent runs of the pipeline (default: 10) |
| `enabled` | `bool` | Whether the generated ScheduledWorkflow should be enabled when it is created (default: True) |
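
The `cron` / `intervalSecond` choice determines which trigger key ends up in the generated spec. A minimal sketch of the two shapes follows; the `make_trigger` helper is hypothetical and only mirrors the `cronSchedule` and `periodicSchedule` field names used by `KfpSwf.build`. Unlike `build`, it requires exactly one of the two rather than falling back to a default cron schedule.

```python
def make_trigger(cron=None, intervalSecond=None):
    """Return the `trigger` section for exactly one of a cron or interval schedule."""
    if (cron is None) == (intervalSecond is None):
        raise ValueError('Provide exactly one of cron or intervalSecond')
    if cron is not None:
        return {'cronSchedule': {'cron': cron}}
    return {'periodicSchedule': {'intervalSecond': intervalSecond}}

# Six-field cron string, as used by this notebook's every-minute example
every_minute = make_trigger(cron='0 * * * * *')
every_five_minutes = make_trigger(intervalSecond=300)
print(every_minute)
print(every_five_minutes)
```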


In [185]:
from copy import deepcopy
class KfpSwf(object):

    DEFAULT_CRON_SCHEDULE = "1 * * * * *"
    ARGS = dict(
        group="kubeflow.org",
        version="v1beta1",
        namespace="default",
        plural="scheduledworkflows",
    )    
    
    def __init__(self, pipeline, **kwargs):
        self._init_pipeline(pipeline)
        self.kwargs = kwargs
        self._build()
        from kubernetes import config
        from kubernetes.client import CustomObjectsApi
        config.load_kube_config()
        self.api = CustomObjectsApi()
        self.name = None
        self.description = None
    
    def _init_pipeline(self, pipeline):
        if pipeline is None:
            return
        self.pipeline = pipeline
        self.parsed_pipeline = load_pipeline(pipeline)

    def create(self):
        obj = self.api.create_namespaced_custom_object(body=self.body, **self.ARGS)
        self.name = obj['metadata']['name']
        self.description = obj
        return self.description
    
    def describe(self):
        self.description = self.api.get_namespaced_custom_object(name=self.name, **self.ARGS)
        return self.description
    
    def history(self, key=None):
        history = self.describe()['status']['workflowHistory']

        if key is None:
            return history

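        if key in history:
            # Reuse the history fetched above instead of querying the API a second time
            runs = pd.DataFrame(history[key])
            # Parse the timestamp columns into pandas datetimes
            for k in [ 'started', 'scheduled', 'created', 'finished' ]:
                col = k + 'At'
                runs[col] = pd.to_datetime(runs[col])

            return runs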
    
    def completed(self):
        return self.history(key='completed')

    def active(self):
        return self.history(key='active')

    def patch(self, **kwargs):
        name = self.name
        if name is None:
            raise Exception('Must call create() before you can patch()')

        new_pipeline = kwargs.pop('pipeline', None)
        self._init_pipeline(new_pipeline)

        new_args = deepcopy(self.kwargs)
        new_args.update(kwargs)
        self.kwargs = new_args
        
        self._build()
        
        return self.api.patch_namespaced_custom_object(name=name, body=self.body, **self.ARGS)

    def delete(self):
        name = self.name
        if name is None:
            raise Exception('Must call create() before you can delete()')

        from kubernetes.client import V1DeleteOptions
        return self.api.delete_namespaced_custom_object(name=name, body=V1DeleteOptions(), **self.ARGS)
    
    def _build(self):
        self.body = self.build(self.parsed_pipeline, **self.kwargs)
    
    @classmethod
    def build(
        cls,
        pipeline,
        name=None, description=None, 
        cron=None, intervalSecond=None, 
        start=None, end=None,
        maxHistory=10, maxConcurrency=10, enabled=True
    ):
        """Create a ScheduledWorkflow resource that will run a given pipeline on a desired schedule.

        :param str pipeline: @dsl.pipeline function or path to a file (or ZIP or TAR archive) containing the pipeline's YAML specification (paths can be to local files or "gs"- or "http"-schemed URLs).
        :param str name: Name of the ScheduledWorkflow resource to create; constructed from the underlying pipeline's name by default.
        :param str description: Description of the ScheduledWorkflow resource to create; constructed from the underlying pipeline's description by default.
        :param str cron: Crontab-formatted string specifying the schedule the pipeline should be run on; if neither `cron` nor `intervalSecond` is provided, the `DEFAULT_CRON_SCHEDULE` above is used. At most one of `cron` and `intervalSecond` should be provided.
        :param int intervalSecond: Interval at which to trigger runs of the provided pipeline. At most one of `cron` and `intervalSecond` should be provided.
        :param datetime|str start: If provided, begin scheduling pipelines at this date+time (UTC is assumed if timezone is not made explicit)
        :param datetime|str end: If provided, stop scheduling pipelines at this date+time (UTC is assumed if timezone is not made explicit)
        :param int maxHistory: Maximum number of pipeline runs' histories to store (default: 10)
        :param int maxConcurrency: Maximum number of concurrent runs of the pipeline (default: 10)
        :param bool enabled: Whether the generated ScheduledWorkflow should be enabled when it is created (default: True)
        """
        swf = deepcopy(SWF_TEMPLATE)

        spec = swf['spec']

        if name is None:
            name = get_name(pipeline)
            
        if description is None:
            description = get_description(pipeline)

        if (cron is not None) and (intervalSecond is not None):
            raise Exception('At most one of {"cron","intervalSecond"} should be provided; received cron %s, intervalSecond %s' % (cron, intervalSecond))

        if (cron is None) and (intervalSecond is None):
            # The default lives on the class, so it must be looked up there (a bare name raises NameError)
            cron = KfpSwf.DEFAULT_CRON_SCHEDULE

        def set_date(key, dt):
            from dateutil.parser import parse
            if dt is not None:
                if isinstance(dt, str):
                    dt = parse(dt)
                if dt.tzinfo is None:
                    from pytz import utc
                    dt = utc.localize(dt)
                schedule[key] = dt

        schedule = {}
        set_date('startTime', start)
        set_date('endTime', end)

        msg_parts = []  # accumulate pieces of scheduling metadata here (for inclusion in the ScheduledWorkflow's description)
        trigger = {}
        if cron is not None:
            schedule['cron'] = cron
            msg_parts.append('cron: %s' % cron)
            trigger['cronSchedule'] = schedule
        else:
            schedule['intervalSecond'] = intervalSecond
            msg_parts.append('interval: %ds' % intervalSecond)
            trigger['periodicSchedule'] = schedule

        spec['enabled'] = enabled
        spec['maxHistory'] = maxHistory
        spec['maxConcurrency'] = maxConcurrency
        spec['trigger'] = trigger

        if start is not None:
            msg_parts.append('start: %s' % str(start))
        if end is not None:
            msg_parts.append('end: %s' % str(end))

        spec['description'] = 'ScheduledWorkflow (%s): %s' % (', '.join(msg_parts), description)

        # Name for the SWF resource
        swf_name = 'swf-%s' % name
        spec['name'] = swf_name
        metadata = swf['metadata']
        metadata['name'] = swf_name

        workflow = spec['workflow']

        # Inline pipeline YAML into SWF YAML as a parameter to each run of the SWF, 
        # which will parse the pipeline and submit it
        parameters = workflow['parameters']
        import yaml
        parameters[1]['value'] = yaml.dump(pipeline)

        workflow_spec = workflow['spec']
        templates = workflow_spec['templates']
        template = templates[0]
        container = template['container']
        args = container['args']

        # populate "name" argument in template
        assert args[-2:] == [ '--name', '' ], "Unexpected final SWF args: %s" % args[-2:]
        args[-1] = name

        return swf
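
The `start`/`end` handling in `build` assumes UTC when no timezone is given. The same rule can be sketched with only the standard library; the `to_utc_aware` name is hypothetical, and `datetime.fromisoformat` accepts a narrower set of string formats than `dateutil.parser.parse`.

```python
from datetime import datetime, timezone

def to_utc_aware(dt):
    """Parse ISO-format strings and attach UTC to naive datetimes."""
    if isinstance(dt, str):
        dt = datetime.fromisoformat(dt)
    if dt.tzinfo is None:
        # No explicit timezone: assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

start = to_utc_aware('2019-09-29 00:00')
print(start.isoformat())
```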

## `import`ing remote modules
A context manager that downloads (or, for local files, copies) a file to a temporary directory and makes it importable for the duration of the `with` block.

In [137]:
import sys
class Import(object):
    """Context manager that downloads (or copies) a file to a temporary directory and makes it importable"""
    def __init__(self, url):
        self.url = url
        self.tmpdir = None
    
    def __enter__(self):
        from urllib.parse import urlparse
        url = self.url
        parsed = urlparse(url)
        scheme = parsed.scheme

        from tempfile import TemporaryDirectory
        self.tmpdir = TemporaryDirectory()

        from os.path import basename, join
        path = parsed.path
        name = basename(path)  # Preserve file's basename (for ease of importing from it)
        tmpdir = self.tmpdir.name
        dest = join(tmpdir, name)  # File to import from will be downloaded here
        
        if not scheme:
            # Local file: copy it to temporary directory (for consistent import-isolation semantics with remote "import"s)
            with open(url, 'rb') as src, open(dest, 'wb') as dst:
                from shutil import copyfileobj
                copyfileobj(src, dst)
        elif scheme == 'http' or scheme == 'https':
            from urllib.request import urlretrieve
            urlretrieve(url, dest)
        elif scheme == 'gs':
            from google.cloud import storage
            gcs = storage.Client()
            bucket_name = parsed.hostname
            bucket = gcs.get_bucket(bucket_name)
            key = parsed.path.lstrip('/')  # blob names do not start with a slash
            blob = bucket.blob(key)
            blob.download_to_filename(dest)  # `dest` is a path, not an open file object
        else:
            raise Exception("Unsure how to handle URL scheme '%s' (%s)" % (scheme, url))    

        # Add temporary directory to "import" path
        sys.path.append(tmpdir)

        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Close+Delete the temporary directory
        self.tmpdir.__exit__(exc_type, exc_val, exc_tb)
        tmpdir = self.tmpdir.name
        # Remove it from sys.path
        sys.path.remove(tmpdir)
        return False
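
The core mechanism (put a file in a temporary directory, add that directory to `sys.path`, import, then clean up) can be sketched on its own with a throwaway local module. The module name `demo_remote_module` and its contents are made up for this illustration.

```python
import importlib
import sys
from os.path import join
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmpdir:
    # Write a throwaway module into the temporary directory
    with open(join(tmpdir, 'demo_remote_module.py'), 'w') as f:
        f.write('ANSWER = 42\n')

    sys.path.append(tmpdir)
    try:
        importlib.invalidate_caches()  # make sure the new file is visible to the import system
        from demo_remote_module import ANSWER
    finally:
        sys.path.remove(tmpdir)

# The imported name is still usable after the directory is gone
print(ANSWER)
```

Note that the module object stays cached in `sys.modules` after the directory is removed, so a later `import` of the same name returns the cached module rather than re-reading the file.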

# Examples
- ["Coin flip" sample](#coin-flip-sample) (via GitHub)
- ["Coin flip" sample](#coin.tar.gz) (via Google Cloud Storage)

## "Coin flip" sample
- wrap [the "coin flip" example from the Kubeflow Pipelines repo](https://github.com/kubeflow/pipelines/blob/0.1.31/samples/core/condition/condition.py) in a `ScheduledWorkflow`
- create it, let it run for a few minutes, update ("patch") it, and delete it.

### `import` pipeline definition from the pinned GitHub release (`kfp_version`)

In [138]:
coin_flip_sample_url = 'https://raw.githubusercontent.com/kubeflow/pipelines/%s/samples/core/condition/condition.py' % kfp_version
with Import(coin_flip_sample_url):
    # condition.py has been downloaded to a temporary directory, which is on sys.path inside this block
    from condition import flipcoin_pipeline

### Make a `ScheduledWorkflow` wrapping the pipeline
Build a `ScheduledWorkflow` spec that runs the "coin flip" pipeline every minute, on the minute:

In [186]:
swf = KfpSwf(flipcoin_pipeline, cron="0 * * * * *", start='2019-09-29 00:00', end='2019-09-29 00:10')

### Create, Read, Update, and Delete the `ScheduledWorkflow`

#### Create:

In [183]:
swf.create()

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'ScheduledWorkflow',
 'metadata': {'creationTimestamp': '2019-09-30T03:29:48Z',
  'generation': 1,
  'name': 'swf-conditional-execution-pipeline',
  'namespace': 'default',
  'resourceVersion': '287261',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/default/scheduledworkflows/swf-conditional-execution-pipeline',
  'uid': 'c91da570-0a32-4101-885f-21404ea8375f'},
 'spec': {'description': 'ScheduledWorkflow (cron: 0 * * * * *, start: 2019-09-29 00:00, end: 2019-09-29 00:10): Shows how to use dsl.Condition().',
  'enabled': True,
  'maxConcurrency': 10,
  'maxHistory': 10,
  'name': 'swf-conditional-execution-pipeline',
  'trigger': {'cronSchedule': {'cron': '0 * * * * *',
    'endTime': '2019-09-29T00:10:00+00:00',
    'startTime': '2019-09-29T00:00:00+00:00'}},
  'workflow': {'parameters': [{'name': 'datetime',
     'value': '[[ScheduledTime.20060102-15:04:05]]'},
    {'name': 'pipeline_yaml',
     'value': 'apiVersion: argoproj.io/v
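
The `datetime` parameter's `20060102-15:04:05` suffix is a Go-style reference-time layout (year 2006, month 01, day 02, hour 15, minute 04, second 05). A pipeline step written in Python could parse the substituted value roughly as below. The `parse_scheduled_time` helper is hypothetical, and treating the value as UTC is an assumption.

```python
from datetime import datetime, timezone

GO_LAYOUT = '20060102-15:04:05'  # layout as it appears in the `datetime` parameter
PY_FORMAT = '%Y%m%d-%H:%M:%S'    # assumed strptime equivalent of that layout

def parse_scheduled_time(value):
    """Parse a substituted `datetime` parameter value, treating it as UTC (an assumption)."""
    return datetime.strptime(value, PY_FORMAT).replace(tzinfo=timezone.utc)

# Sanity check: the layout string itself parses to Go's reference time
reference = datetime.strptime(GO_LAYOUT, PY_FORMAT)

scheduled = parse_scheduled_time('20190929-00:10:00')
print(reference.isoformat(), scheduled.isoformat())
```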

#### Read

In [196]:
swf.completed()

Unnamed: 0,createdAt,finishedAt,index,name,namespace,phase,scheduledAt,selfLink,startedAt,uid
0,2019-09-30 03:43:57+00:00,2019-09-30 03:44:33+00:00,34,swf-conditional-execution-pipeline-34-1284205302,default,Succeeded,2019-09-30 00:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:43:57+00:00,557af601-0af5-4537-b29f-b4b856114883
1,2019-09-30 03:43:47+00:00,2019-09-30 03:44:22+00:00,33,swf-conditional-execution-pipeline-33-1334538159,default,Succeeded,2019-09-29 23:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:43:47+00:00,cb461448-62cb-47ef-942f-c80d640306b2
2,2019-09-30 03:43:37+00:00,2019-09-30 03:44:11+00:00,32,swf-conditional-execution-pipeline-32-1317760540,default,Succeeded,2019-09-29 22:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:43:37+00:00,54e27532-319d-499e-b4f6-efdc44e609ad
3,2019-09-30 03:43:27+00:00,2019-09-30 03:44:03+00:00,31,swf-conditional-execution-pipeline-31-1368093397,default,Succeeded,2019-09-29 21:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:43:27+00:00,874ee017-0387-4608-acd8-e0f6a15b2514
4,2019-09-30 03:43:17+00:00,2019-09-30 03:43:53+00:00,30,swf-conditional-execution-pipeline-30-1351315778,default,Succeeded,2019-09-29 20:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:43:17+00:00,e011998d-0c99-4a5b-bf8d-e1025893ab10
5,2019-09-30 03:43:07+00:00,2019-09-30 03:43:42+00:00,29,swf-conditional-execution-pipeline-29-1301130016,default,Succeeded,2019-09-29 19:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:43:07+00:00,81318487-842a-4241-858b-49ab0755037d
6,2019-09-30 03:42:57+00:00,2019-09-30 03:43:32+00:00,28,swf-conditional-execution-pipeline-28-1317907635,default,Succeeded,2019-09-29 18:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:42:57+00:00,17c684b2-fe3f-4e02-9d91-798c897e009a
7,2019-09-30 03:42:47+00:00,2019-09-30 03:43:21+00:00,27,swf-conditional-execution-pipeline-27-1536016682,default,Succeeded,2019-09-29 17:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:42:47+00:00,00398be3-8efc-4933-8abf-7255e3e3411a
8,2019-09-30 03:42:37+00:00,2019-09-30 03:43:12+00:00,26,swf-conditional-execution-pipeline-26-1552794301,default,Succeeded,2019-09-29 16:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:42:37+00:00,f073bd7b-4610-4881-9329-1b58fa731173
9,2019-09-30 03:42:27+00:00,2019-09-30 03:43:01+00:00,25,swf-conditional-execution-pipeline-25-1502461444,default,Succeeded,2019-09-29 15:00:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 03:42:27+00:00,d92d39c4-8cc9-4a19-b859-56dffaa97d64


#### Update ("patch")

In [192]:
# Make any desired change to the ScheduledWorkflow body here:
swf.patch(cron="0 0 * * * *", start='2019-09-29', end='2019-09-30')

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'ScheduledWorkflow',
 'metadata': {'creationTimestamp': '2019-09-30T03:29:48Z',
  'generation': 46,
  'labels': {'scheduledworkflows.kubeflow.org/enabled': 'true',
   'scheduledworkflows.kubeflow.org/status': 'Enabled'},
  'name': 'swf-conditional-execution-pipeline',
  'namespace': 'default',
  'resourceVersion': '289273',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/default/scheduledworkflows/swf-conditional-execution-pipeline',
  'uid': 'c91da570-0a32-4101-885f-21404ea8375f'},
 'spec': {'description': 'ScheduledWorkflow (cron: 0 0 * * * *, start: 2019-09-29, end: 2019-09-30): Shows how to use dsl.Condition().',
  'enabled': True,
  'maxConcurrency': 10,
  'maxHistory': 10,
  'name': 'swf-conditional-execution-pipeline',
  'trigger': {'cronSchedule': {'cron': '0 0 * * * *',
    'endTime': '2019-09-30T00:00:00+00:00',
    'startTime': '2019-09-29T00:00:00+00:00'}},
  'workflow': {'parameters': [{'name': 'datetime',
     'value': 

#### Delete

In [197]:
swf.delete()

{'apiVersion': 'v1',
 'details': {'group': 'kubeflow.org',
  'kind': 'scheduledworkflows',
  'name': 'swf-conditional-execution-pipeline',
  'uid': 'c91da570-0a32-4101-885f-21404ea8375f'},
 'kind': 'Status',
 'metadata': {},
 'status': 'Success'}

All set!

## `coin.tar.gz`
This example runs the same "coin flip" pipeline, but fetches it from Google Cloud Storage in a couple of ways and exercises more of `ScheduledWorkflow`'s scheduling parameters:

### via `gs`-scheme URL

In [17]:
from os import environ as env
env['GOOGLE_APPLICATION_CREDENTIALS'] = '/Users/ryan/.gcloud/kubeflow-232704@appspot.gserviceaccount.com.json'

In [18]:
# Need at least 1.17.0 to download a blob directly without storage.buckets.get access to its bucket
!{python} -m pip install --upgrade -q google-cloud-storage==1.17.0

In [198]:
url = 'gs://ml-pipeline-playground/coin.tar.gz'

In [203]:
swf = KfpSwf(url, cron="0 * * * * *")

In [204]:
swf.create()

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'ScheduledWorkflow',
 'metadata': {'creationTimestamp': '2019-09-30T04:01:19Z',
  'generation': 1,
  'name': 'swf-pipeline-flip-coin',
  'namespace': 'default',
  'resourceVersion': '293869',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/default/scheduledworkflows/swf-pipeline-flip-coin',
  'uid': '0d67ff01-147f-4a39-ae8d-0881ac3b34f1'},
 'spec': {'description': 'ScheduledWorkflow (cron: 0 * * * * *): ',
  'enabled': True,
  'maxConcurrency': 10,
  'maxHistory': 10,
  'name': 'swf-pipeline-flip-coin',
  'trigger': {'cronSchedule': {'cron': '0 * * * * *'}},
  'workflow': {'parameters': [{'name': 'datetime',
     'value': '[[ScheduledTime.20060102-15:04:05]]'},
    {'name': 'pipeline_yaml',
     'value': 'apiVersion: argoproj.io/v1alpha1\nkind: Workflow\nmetadata: {generateName: pipeline-flip-coin-}\nspec:\n  arguments:\n    parameters: []\n  entrypoint: pipeline-flip-coin\n  serviceAccountName: pipeline-runner\n  templates:\n  - inp

In [205]:
swf.completed()

Unnamed: 0,createdAt,finishedAt,index,name,namespace,phase,scheduledAt,selfLink,startedAt,uid
0,2019-09-30 04:04:28+00:00,2019-09-30 04:04:47+00:00,3,swf-pipeline-flip-coin-3-759420850,default,Succeeded,2019-09-30 04:04:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 04:04:28+00:00,003eb3e3-8ba1-4898-8f6b-9ab4a65f5651
1,2019-09-30 04:03:08+00:00,2019-09-30 04:03:42+00:00,2,swf-pipeline-flip-coin-2-776198469,default,Succeeded,2019-09-30 04:03:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 04:03:08+00:00,0fdc7e2c-1d92-4181-b97a-81fcfbb0d0ee
2,2019-09-30 04:02:28+00:00,2019-09-30 04:03:11+00:00,1,swf-pipeline-flip-coin-1-725865612,default,Succeeded,2019-09-30 04:02:00+00:00,/apis/argoproj.io/v1alpha1/namespaces/default/...,2019-09-30 04:02:28+00:00,64bad254-06fb-467f-a56e-a67011272c82
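
As a quick way to read this table, the sketch below computes, for each of the three runs, the lag between the scheduled time and run creation, and the run's duration. The timestamps are copied from the output above; the standard library is used here so the snippet stands alone, though the same subtraction should work directly on the DataFrame's datetime columns.

```python
from datetime import datetime

# (scheduledAt, createdAt, finishedAt) copied from the three completed runs above
runs = [
    ('2019-09-30 04:04:00+00:00', '2019-09-30 04:04:28+00:00', '2019-09-30 04:04:47+00:00'),
    ('2019-09-30 04:03:00+00:00', '2019-09-30 04:03:08+00:00', '2019-09-30 04:03:42+00:00'),
    ('2019-09-30 04:02:00+00:00', '2019-09-30 04:02:28+00:00', '2019-09-30 04:03:11+00:00'),
]

summary = []
for row in runs:
    scheduled, created, finished = (datetime.fromisoformat(s) for s in row)
    lag = (created - scheduled).total_seconds()      # delay between schedule and creation
    duration = (finished - created).total_seconds()  # wall-clock time of the run
    summary.append((lag, duration))
    print('lag: %ds, duration: %ds' % (lag, duration))
```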


In [206]:
swf.delete()

{'apiVersion': 'v1',
 'details': {'group': 'kubeflow.org',
  'kind': 'scheduledworkflows',
  'name': 'swf-pipeline-flip-coin',
  'uid': '0d67ff01-147f-4a39-ae8d-0881ac3b34f1'},
 'kind': 'Status',
 'metadata': {},
 'status': 'Success'}

### via `storage.googleapis.com` (HTTPS)

In [None]:
http = 'https://storage.googleapis.com/ml-pipeline-playground/coin.tar.gz'