# Using MLRun with Dask Distributed Jobs

In [1]:
from mlrun import new_function, mlconf
mlconf.remote_host = '13.58.34.174'  # remote cluster IP/DNS for link to dask dashboard
mlconf.dbpath = 'http://mlrun-api:8080'

## Writing the function code

In [2]:
# function that Dask will run on the workers (note: it adds 2, despite the name)
def inc(x):
    return x + 2

When the function kind is Dask, the MLRun context carries an extra attribute, `dask_client`,
which is initialized from the function spec (below) and can be used to submit Dask commands.

In [3]:
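def hndlr(context, x=1, y=2):
    # submit inc(x) to the Dask cluster; a Future is returned immediately
    future = context.dask_client.submit(inc, x)
    print(future)
    # .result() blocks until the worker has finished
    print(future.result())
    context.log_result('y', future.result())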
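
The handler's logic can be dry-run without a cluster by passing it a stand-in context. The sketch below is neither MLRun nor Dask code: `FakeContext` and `dry_run` are hypothetical helpers, and a standard-library `ThreadPoolExecutor` stands in for `dask_client`, on the assumption that the handler only relies on `submit(fn, *args)` returning a future with a `.result()` method. It checks what the handler logs, not how the work is distributed.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    return x + 2


def hndlr(context, x=1, y=2):
    future = context.dask_client.submit(inc, x)
    context.log_result('y', future.result())


class FakeContext:
    """Hypothetical stand-in for the MLRun context (not part of MLRun)."""

    def __init__(self, executor):
        # the executor plays the role of the Dask client
        self.dask_client = executor
        self.results = {}

    def log_result(self, key, value):
        # record the result locally instead of sending it to the MLRun DB
        self.results[key] = value


def dry_run(x):
    # run the handler against a single local worker thread
    with ThreadPoolExecutor(max_workers=1) as pool:
        ctx = FakeContext(pool)
        hndlr(ctx, x=x)
    return ctx.results


print(dry_run(12))  # {'y': 14}
```

Because the stand-in only mimics `submit` and `log_result`, a handler that uses other Dask client calls would need a real local Dask client instead.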

## Define the function
Dask functions can be local (local workers) or remote (containers in the cluster).
For `remote` functions, users can optionally specify the number of replicas, or leave it blank for auto-scaling.

In [4]:
dsf = new_function('dask-tst', kind='dask')
dsf.spec.remote = True
dsf.spec.replicas = 1
dsf.spec.service_type = 'NodePort'
dsf.spec.image_pull_policy = 'Always'

## Build the function with extra packages
The build step can be skipped if no extra packages are needed; in that case, specify the image directly, e.g. `dsf.spec.image='daskdev/dask:2.9.1'`.

In [5]:
dsf.build_config(base_image='daskdev/dask:2.9.1', commands=['pip install pandas'])
dsf.deploy()

[mlrun] 2020-01-10 00:41:53,710 starting remote build, image: .mlrun/func-default-dask-tst-latest
INFO[0000] Resolved base name daskdev/dask:latest to daskdev/dask:latest
INFO[0000] Resolved base name daskdev/dask:latest to daskdev/dask:latest
INFO[0000] Downloading base image daskdev/dask:latest
INFO[0000] Error while retrieving image from cache: getting file info: stat /cache/sha256:2ac5385ebc20fe2982a22f8fcf3cf765e7a01dc5e5003b42aa44493af0a06438: no such file or directory
INFO[0000] Downloading base image daskdev/dask:latest
INFO[0000] Built cross stage deps: map[]
INFO[0000] Downloading base image daskdev/dask:latest
INFO[0000] Error while retrieving image from cache: getting file info: stat /cache/sha256:2ac5385ebc20fe2982a22f8fcf3cf765e7a01dc5e5003b42aa44493af0a06438: no such file or directory
INFO[0000] Downloading base image daskdev/dask:latest
INFO[0000] Un

True

## Run a task using our distributed dask function (cluster)

In [6]:
myrun = dsf.run(handler=hndlr, params={'x': 12})

[mlrun] 2020-01-10 00:42:16,351 starting run hndlr uid=877916f9d4a74ab39fb140b8bc0540b9  -> http://mlrun-api:8080
[mlrun] 2020-01-10 00:42:16,883 saving function: dask-tst, tag: latest
[mlrun] 2020-01-10 00:42:22,229 using remote dask scheduler (mlrun-dask-tst-4c2600bc-d) at: 13.58.34.174:30064
[mlrun] 2020-01-10 00:42:22,229 remote dashboard (node) port: 13.58.34.174:31613
<Future: status: pending, key: inc-08e262e75605e7017d08869db52a832d>
14

[mlrun] 2020-01-10 00:42:26,135 run ended with state 


| uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|
| ...0540b9 | 0 | Jan 10 00:42:16 | completed | hndlr | kind=dask, owner=admin, host=jupyter-dulwoc9x63-ixir3-68dccc6b7-tsk75 | | x=12 | y=14 | |


to track results use .show() or .logs() or in CLI: 
!mlrun get run 877916f9d4a74ab39fb140b8bc0540b9  , !mlrun logs 877916f9d4a74ab39fb140b8bc0540b9 
[mlrun] 2020-01-10 00:42:26,180 run executed, status=completed
