# Using MLRun with Dask Distributed Jobs

In [1]:
# recommended: install the same package versions that the workers use
#!pip install dask==2.12.0 distributed==2.14.0 

## Writing the function code

In [2]:
# function that will be distributed 
def inc(x):
    return x+2

With Dask, the MLRun context carries an extra attribute, `dask_client`,
which is initialized from the function spec (below) and can be used to submit Dask commands.

In [3]:
# wrapper function, uses the Dask client object from the MLRun context
def hndlr(context, x=1, y=2):
    context.logger.info('params: x={},y={}'.format(x, y))
    print('params: x={},y={}'.format(x, y))
    # submit() returns a future; result() blocks until the worker finishes
    future = context.dask_client.submit(inc, x)
    print(future)
    print(future.result())
    context.log_result('y', future.result())
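
The handler only needs three things from the context: a `dask_client` with `submit()`, a `logger`, and `log_result()`. As a rough sketch, it can be exercised without a cluster by passing a stand-in object. `StubContext` below is hypothetical (it is not part of MLRun), and `ThreadPoolExecutor` is swapped in for the Dask client only because its `submit()` also returns a future with `result()`.

```python
from concurrent.futures import ThreadPoolExecutor
import logging


class StubContext:
    """Hypothetical stand-in for the MLRun context (not an MLRun class)."""

    def __init__(self):
        # Stand-in for the Dask client: submit() returns a future with result()
        self.dask_client = ThreadPoolExecutor(max_workers=1)
        self.logger = logging.getLogger("stub-context")
        self.results = {}

    def log_result(self, key, value):
        # Keep results in a dict instead of sending them to the MLRun DB
        self.results[key] = value


def inc(x):
    return x + 2


def hndlr(context, x=1, y=2):
    context.logger.info("params: x={},y={}".format(x, y))
    future = context.dask_client.submit(inc, x)
    context.log_result("y", future.result())


ctx = StubContext()
hndlr(ctx, x=12)
ctx.dask_client.shutdown()
print(ctx.results)  # {'y': 14}
```

This only checks the handler logic; it says nothing about serialization or version mismatches between client and workers, which need a real Dask cluster to surface.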

In [4]:
# nuclio: end-code
# marks the end of a code section, do not delete

In [5]:
# nuclio: ignore
import nuclio

In [6]:
from mlrun import new_function, mlconf, code_to_function, mount_v3io, NewTask
#mlconf.dbpath = 'http://mlrun-api:8080'

## Define a Dask function object
Dask functions can be local (local workers) or remote (containers in the cluster).
For a remote function you can specify the number of replicas (optional) or leave it blank for auto-scale.

We use `code_to_function()`, which packs the function code into the function object/YAML and eliminates the need to update the function image. Use `new_function()` instead if the function code is part of the image or can be mounted remotely (e.g. via a v3io mount).

The Dask function spec has several unique attributes:
* **.remote** - bool, use local or clustered Dask
* **.replicas** - number of desired replicas, keep 0 for auto-scale
* **.min_replicas, .max_replicas** - set the replica range for auto-scale
* **.scheduler_timeout** - the cluster is killed after this period of inactivity, default is `'60 minutes'`
* **.nthreads** - number of worker threads
* **.kfp_image** - optional, container image used by the KFP Pipeline runner (defaults to mlrun/dask)

To access the Dask dashboard remotely, use the `NodePort` service type (set **.service_type** to 'NodePort') and specify the external IP in the MLRun configuration (`mlconf.remote_host`); this is set automatically when running on an Iguazio cluster.
To also use the `NodePort` for remote access to the scheduler, specify `function.use_remote=True`.
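
Since these attributes are plain assignments on the spec, a misspelled name is easy to miss. The sketch below shows one possible guard: a hypothetical helper, `apply_dask_settings`, that accepts only the attribute names listed above. `SimpleNamespace` stands in for `dsf.spec` so the example runs without MLRun; the helper is an illustration, not an MLRun API.

```python
from types import SimpleNamespace

# Attribute names taken from the list above, plus .service_type
ALLOWED = {
    "remote", "replicas", "min_replicas", "max_replicas",
    "scheduler_timeout", "nthreads", "kfp_image", "service_type",
}


def apply_dask_settings(spec, **settings):
    """Hypothetical helper: copy settings onto a spec, rejecting unknown names."""
    unknown = set(settings) - ALLOWED
    if unknown:
        raise ValueError("unknown Dask spec attributes: {}".format(sorted(unknown)))
    for name, value in settings.items():
        setattr(spec, name, value)
    return spec


# SimpleNamespace is a stand-in for dsf.spec
spec = apply_dask_settings(
    SimpleNamespace(),
    remote=True,
    replicas=0,                      # 0 leaves the replica count to auto-scale
    min_replicas=1,
    max_replicas=4,
    scheduler_timeout="60 minutes",
    service_type="NodePort",
)
print(spec.replicas, spec.min_replicas, spec.max_replicas)  # 0 1 4
```

With a real function object, the same call would presumably take `dsf.spec` as its first argument.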

In [8]:
# create the function from the notebook code + annotations; to add volumes, append .apply(mount_v3io())
dsf = code_to_function('mydask', kind='dask')

In [9]:
dsf.spec.image = 'mlrun/ml-models'
dsf.spec.remote = True
dsf.spec.replicas = 1
dsf.spec.service_type = 'NodePort'
dsf.spec.image_pull_policy = 'Always'

## Build the function with extra packages
We can skip the build section if we don't add packages; in that case we need to specify the image instead (e.g. `dsf.spec.image='daskdev/dask:2.9.1'`).

In [10]:
# uncomment if you want to add packages to the workers and build a new image
# dsf.build_config(base_image='daskdev/dask:2.9.1', commands=['pip install pandas'])
# dsf.deploy()

## Run a task using our distributed dask function (cluster)

In [11]:
myrun = dsf.run(handler=hndlr, params={'x': 12})

[mlrun] 2020-05-03 03:25:11,427 starting run mydask-hndlr uid=f9367fff72c546c48983ce46768c784b  -> http://mlrun-api:8080
[mlrun] 2020-05-03 03:25:17,664 trying dask client at: tcp://mlrun-mydask-f4f532c3-0.kubeflow:8786


tornado.application - ERROR - Exception in callback functools.partial(<bound method Runner.run of <tornado.gen.Runner object at 0x7f39fcbaf490>>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 764, in run
    if not self.handle_yield(yielded):
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 789, in handle_yield
    self.io_loop.add_future(self.future, inner)
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 693, in add_future
    assert is_future(future)
AssertionError

blosc
+-----------+---------+
|           | version |
+-----------+---------+
| client    | None    |
| scheduler | 1.7.0   |
+-----------+---------+

cloudpickle
+-----------+---------+
|           | version |
+-----------+---------+
| client    | 1.4.1   |
| scheduler | 1.1.1   |
+-----------+---------+

da

KeyboardInterrupt: 

tornado.application - ERROR - Exception in callback functools.partial(<bound method PeriodicCallback._run of <tornado.ioloop.PeriodicCallback object at 0x7f39fcba6250>>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 759, in _run_callback
    self.add_future(ret, self._discard_future_result)
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 693, in add_future
    assert is_future(future)
AssertionError
tornado.application - ERROR - Exception in callback functools.partial(<function Runner.handle_yield.<locals>.inner at 0x7f39fc2cf200>, <Future finished result=None>)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
    ret = callback()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 787, in inner
    self.run()
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 764, in run
    if not self.h

In [13]:
# get the function status and addresses
dsf.status.to_dict()

{'scheduler_address': 'tcp://mlrun-mydask-09b289cf-1.default-tenant:8786',
 'cluster_name': 'mlrun-mydask-09b289cf-1',
 'node_ports': {'dashboard': 31630, 'scheduler': 32384}}
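
The `node_ports` entry can be combined with the cluster's external IP to form the remote addresses. The sketch below assumes the dashboard is served over plain HTTP on its node port; `203.0.113.10` is a placeholder address and `remote_addresses` is a hypothetical helper, not an MLRun function.

```python
def remote_addresses(status, host):
    """Build external dashboard and scheduler addresses from a status dict."""
    ports = status["node_ports"]
    return {
        "dashboard": "http://{}:{}".format(host, ports["dashboard"]),
        "scheduler": "tcp://{}:{}".format(host, ports["scheduler"]),
    }


# Status dict as returned by dsf.status.to_dict() above
status = {
    "scheduler_address": "tcp://mlrun-mydask-09b289cf-1.default-tenant:8786",
    "cluster_name": "mlrun-mydask-09b289cf-1",
    "node_ports": {"dashboard": 31630, "scheduler": 32384},
}

addresses = remote_addresses(status, "203.0.113.10")  # placeholder external IP
print(addresses["dashboard"])  # http://203.0.113.10:31630
print(addresses["scheduler"])  # tcp://203.0.113.10:32384
```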

### Accessing the Dask client directly
You can get the Dask client object and cluster information by reading the function's `client` attribute.

> Note: the cluster can time out. When you access the client, MLRun also verifies that the cluster is live and active; if it is not, MLRun restarts the Dask cluster and refreshes the function object with the latest addresses and status.

In [14]:
c = dsf.client

[mlrun] 2020-04-23 22:45:47,709 trying dask client at: tcp://mlrun-mydask-09b289cf-1.default-tenant:8786
[mlrun] 2020-04-23 22:45:47,716 using remote dask scheduler (mlrun-mydask-09b289cf-1) at: tcp://mlrun-mydask-09b289cf-1.default-tenant:8786



python
+-------------------------+---------------+
|                         | version       |
+-------------------------+---------------+
| client                  | 3.6.8.final.0 |
| scheduler               | 3.7.6.final.0 |
| tcp://10.200.0.53:41445 | 3.7.6.final.0 |
+-------------------------+---------------+


### Access a dask function using the DB
If we want to access the Dask function (or its cluster), we can load the function object from the DB (assuming we already called `.run()` or `.save()` on it).

This is useful if we want to load the same function in a different notebook or container, or if we restarted our notebook.

In [15]:
from mlrun import import_function
# Functions url: db://<project>/<name>[:tag]
dsf_obj = import_function('db://default/mydask')
c = dsf_obj.client

[mlrun] 2020-04-23 22:47:46,151 trying dask client at: tcp://mlrun-mydask-09b289cf-1.default-tenant:8786
[mlrun] 2020-04-23 22:47:46,159 using remote dask scheduler (mlrun-mydask-09b289cf-1) at: tcp://mlrun-mydask-09b289cf-1.default-tenant:8786
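
The URL passed to `import_function` follows the `db://<project>/<name>[:tag]` pattern noted in the cell above. A small sketch of building such a URL is shown here; `function_db_url` is a hypothetical helper, and the `demo` project and `v1` tag are made-up values.

```python
def function_db_url(name, project="default", tag=None):
    """Build a db://<project>/<name>[:tag] function URL."""
    url = "db://{}/{}".format(project, name)
    if tag:
        # The tag is optional and appended after a colon
        url += ":{}".format(tag)
    return url


print(function_db_url("mydask"))                            # db://default/mydask
print(function_db_url("mydask", project="demo", tag="v1"))  # db://demo/mydask:v1
```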


## Building a Pipeline using dask functions

In [16]:
import kfp
from kfp import dsl
from mlrun import run_pipeline

In [17]:
@dsl.pipeline(name="dask_pipeline")
def dask_pipe(x=1,y=10):
    # use_db option will use a function (DB) pointer instead of adding the function spec to the YAML
    myrun = dsf.as_step(NewTask(handler=hndlr, name="dask_pipeline", params={'x': x, 'y': y}), use_db=True)
    
    # if the step (Dask client) needs v3io access, you should add: .apply(mount_v3io())
    
    # if it's a new image, we may want to tell Kubeflow to reload the image
    # myrun.container.set_image_pull_policy('Always')

In [18]:
# for pipeline debug
kfp.compiler.Compiler().compile(dask_pipe, 'daskpipe.yaml', type_check=False)



In [19]:
arguments={'x':4,'y':-5}
artifact_path = '/User/test'
run_pipeline(dask_pipe, arguments, artifact_path=artifact_path,
             run="DaskExamplePipeline", experiment="dask pipe")

[mlrun] 2020-04-23 22:47:59,224 Pipeline run id=f0486fd1-bd83-4d4b-8b08-415c6c1c6556, check UI or DB for progress


'f0486fd1-bd83-4d4b-8b08-415c6c1c6556'