## Using an MLRun function as a Kubernetes Job

In [1]:
# nuclio: ignore
# do not remove the comment above (it is a directive to nuclio, ignore that cell during build)
# if the nuclio-jupyter package is not installed run !pip install nuclio-jupyter and restart the kernel 
import nuclio 

We use `%nuclio` magic commands to set package dependencies and configuration:
<a id='nuclio'></a>

In [19]:
%nuclio cmd -c pip install pandas
%nuclio config spec.build.baseImage = "mlrun/mlrun"

%nuclio: setting spec.build.baseImage to 'mlrun/mlrun'


### Import mlrun modules

In [20]:
from mlrun import mlconf, get_or_create_ctx, code_to_function, NewTask
from mlrun.artifacts import TableArtifact

### MLRun API
Point the client at the MLRun API service, which was pre-deployed for this scenario:

In [21]:
mlconf.dbpath = 'http://mlrun-api:8080'
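
MLRun can also pick up the API address from the `MLRUN_DBPATH` environment variable. The following is a minimal sketch, not part of the original notebook, of resolving the address with a fallback to the URL used above. The helper name `resolve_dbpath` is hypothetical, and the HTTP(S) check assumes this scenario always talks to an API service rather than a local path.

```python
import os
from urllib.parse import urlparse

DEFAULT_DBPATH = "http://mlrun-api:8080"  # the address used in this notebook


def resolve_dbpath(env=None, default=DEFAULT_DBPATH):
    """Return the API URL from MLRUN_DBPATH if set, otherwise the default."""
    env = os.environ if env is None else env
    url = env.get("MLRUN_DBPATH") or default
    parsed = urlparse(url)
    # assumption: in this scenario the DB path is always an HTTP(S) service URL
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return url


print(resolve_dbpath({}))
```

With this helper in place, the cell above could be written as `mlconf.dbpath = resolve_dbpath()`.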

<a id="build"></a>
### Define cluster jobs and build images

In order to use our function in a cluster we need to package our code and dependencies.

The ```code_to_function``` call automatically generates a ```function``` object from the specified Python file, including its list of dependencies and runtime configuration.

In [22]:
# create an MLRun job function from the code in functions.py
trainer = code_to_function(name='my-trainer', kind='job', filename='functions.py')
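
The contents of `functions.py` are not shown in this notebook. The sketch below is one plausible shape for its two handlers, inferred from the run logs further down (the `training` and `validation` handler names, `p1`/`p2` parameters, the `accuracy` and `loss` results, the `category=tests` label, and a 10-byte `model.txt` artifact). The formulas and the artifact body are assumptions chosen to match those logs. `StubContext` is a hypothetical stand-in for the MLRun context so the handlers can be dry-run locally without a cluster.

```python
import logging


def training(context, p1: int = 1, p2: int = 2) -> None:
    """Log two results, a label, and a small 'model' artifact."""
    print(f'Run: {context.name} (uid={context.uid})')
    print(f'Params: p1={p1}, p2={p2}')
    context.logger.info('started training')
    # assumption: results derived from p1 so that p1=9 yields accuracy=18, loss=27
    context.log_result('accuracy', p1 * 2)
    context.log_result('loss', p1 * 3)
    context.set_label('category', 'tests')
    # assumption: a 10-byte body, matching "size: 10" in the run log
    context.log_artifact('model', body=b'abc is 123', local_path='model.txt')


def validation(context, model) -> None:
    """Read the model input and print its location and content."""
    print(f'Run: {context.name} (uid={context.uid})')
    print(f'file - {model.url}:\n{model.get()}\n')


class StubContext:
    """Hypothetical local stand-in for the MLRun context (dry runs only)."""

    def __init__(self, name='dry-run', uid='0'):
        self.name, self.uid = name, uid
        self.logger = logging.getLogger(name)
        self.results, self.labels, self.artifacts = {}, {}, {}

    def log_result(self, key, value):
        self.results[key] = value

    def set_label(self, key, value):
        self.labels[key] = value

    def log_artifact(self, item, body=None, local_path=None, **kwargs):
        # record the artifact in memory instead of writing it to storage
        self.artifacts[item] = (local_path, body)


ctx = StubContext()
training(ctx, p1=9)
print(ctx.results, ctx.labels)
```

Writing handlers against the context object only, with no cluster-specific calls, is what lets the same code run locally and as a Kubernetes job.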

In [23]:
#print(trainer.to_yaml())

The functions need shared storage (file or object) media to pass and store artifacts.

You can add _**Kubernetes**_ resources like volumes, environment variables, secrets, cpu/mem/gpu, etc. to a function.

```mlrun``` uses _**Kubeflow**_ modifiers (applied with `apply()`) to configure resources. You can build your own or use predefined ones, e.g. for [AWS resources](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/aws.py).
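
To make the modifier idea concrete, here is a toy illustration of the pattern: a modifier is a callable that receives the function object, changes its resource spec, and returns it. This is a sketch under stated assumptions, not MLRun's or Kubeflow's implementation. `ToyFunction`, `mount_volume`, and `set_env` are hypothetical names, and `artifacts-pvc` is a made-up volume name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyFunction:
    """Hypothetical stand-in for a function object; not an MLRun class."""
    name: str
    volumes: Dict[str, str] = field(default_factory=dict)  # mount path -> volume name
    env: Dict[str, str] = field(default_factory=dict)

    def apply(self, modifier: Callable[["ToyFunction"], "ToyFunction"]) -> "ToyFunction":
        # a modifier receives the object, mutates it, and returns it for chaining
        return modifier(self)


def mount_volume(volume_name: str, mount_path: str):
    """Build a modifier that attaches a named volume at mount_path."""
    def _modifier(fn: ToyFunction) -> ToyFunction:
        fn.volumes[mount_path] = volume_name
        return fn
    return _modifier


def set_env(key: str, value: str):
    """Build a modifier that sets one environment variable."""
    def _modifier(fn: ToyFunction) -> ToyFunction:
        fn.env[key] = value
        return fn
    return _modifier


toy = (
    ToyFunction('my-trainer')
    .apply(mount_volume('artifacts-pvc', '/home/jovyan/mlrun/artifacts'))
    .apply(set_env('STAGE', 'dev'))
)
print(toy)
```

Because each modifier is a plain closure, storage and credential settings can be packaged once and reused across functions.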


##### _**Option 1: Using file volumes for artifacts**_
If you are using the [Iguazio data science platform](https://www.iguazio.com/), use the `mount_v3io()` auto-mount modifier.<br>
If you use other k8s PVC volumes, you can use the `mlrun.platforms.mount_pvc(..)` modifier with the required params.

Applying ```mount_v3io()``` will attach the function to Iguazio's real-time data fabric (mounted by default to _**home**_ of the current user).

**Note**: if the notebook is not on the managed platform (running remotely), you need to create and use a v3io secret. Run:

`kubectl create -n <namespace> secret generic my-v3io --from-literal=accessKey=<your access key> --from-literal=username=<your user name> --type v3io/fuse`

and use: `trainer.apply(mount_v3io(user='admin', secret='my-v3io'))`.

So for our current ```trainer``` function, when using the Iguazio data science platform, run `trainer.apply(mount_v3io())`.

______________________________________________

<a id="deploy-build"></a>
### **deploy (build) the function container**

The `deploy()` command will build a custom container image (create a cluster build job) from the outlined function dependencies.

If a pre-built container image already exists, pass the `image` name instead. _**Note that the code and params can be updated per run without building a new image**_.

The image is stored in a container repository; by default, this is the repository configured on the MLRun API service. To use your own Docker registry, first create a registry secret, then add that secret name to the build configuration, for example: `trainer.build_config(image='target/image:tag', secret='my_docker')`

In [24]:
trainer.deploy()

[mlrun] 2020-04-22 13:54:35,942 running build to add mlrun package, set with_mlrun=False to skip if its already in the image
[mlrun] 2020-04-22 13:54:35,949 starting remote build, image: .mlrun/func-default-my-trainer-latest


True

In [73]:
out='/home/jovyan/mlrun/artifacts/'

<a id="run-on-cluster"></a>
### **run the function on the cluster**


In case we made changes to the code, ```with_code``` will inject the latest code into the function (it doesn't require a new build).

In [74]:
# create the base task (common to both steps), and set the output path and experiment label
base_task = NewTask(artifact_path=out).set_label('stage', 'dev')

In [76]:
# run our training task with the parameter p1=9, based on the common base task
train_task = NewTask(name='my-training', handler='training', params={'p1': 9}, base=base_task)
train_run = trainer.run(train_task, artifact_path=out)

[mlrun] 2020-04-22 14:48:41,481 starting run my-training uid=5665f29af2944e8eaa88c4f7e2ddde2f  -> http://mlrun-api:8080
[mlrun] 2020-04-22 14:48:41,580 Job is running in the background, pod: my-training-kkhnz
Run: my-training (uid=5665f29af2944e8eaa88c4f7e2ddde2f)
Params: p1=9, p2=2
[mlrun] 2020-04-22 14:48:44,014 started training
[mlrun] 2020-04-22 14:48:44,035 log artifact model at /home/jovyan/mlrun/artifacts/model.txt, size: 10, db: Y

[mlrun] 2020-04-22 14:48:44,045 run executed, status=completed
final state: succeeded


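| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...ddde2f | 0 | Apr 22 14:48:43 | completed | my-training | category=tests, host=my-training-kkhnz, kind=job, owner=jovyan, stage=dev | | p1=9 | accuracy=18, loss=27 | model |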


to track results use .show() or .logs() or in CLI: 
!mlrun get run 5665f29af2944e8eaa88c4f7e2ddde2f  , !mlrun logs 5665f29af2944e8eaa88c4f7e2ddde2f 
[mlrun] 2020-04-22 14:48:47,669 run executed, status=completed


In [72]:
# run validation, using the model output from the previous step
model_path = train_run.outputs['model']
trainer.run(base_task, handler='validation', inputs={'model': model_path}, watch=True)

[mlrun] 2020-04-22 14:45:23,573 starting run my-trainer-validation uid=f80fed0a50e243a3abbd2d1fdd6393a5  -> http://mlrun-api:8080
[mlrun] 2020-04-22 14:45:23,709 Job is running in the background, pod: my-trainer-validation-mxp2n
Run: my-trainer-validation (uid=f80fed0a50e243a3abbd2d1fdd6393a5)
[mlrun] 2020-04-22 14:45:26,385 Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/mlrun-0.4.6-py3.6.egg/mlrun/runtimes/local.py", line 184, in exec_from_params
    val = handler(*args_list)
  File "main.py", line 53, in validation
    print(f'file - {model.url}:\n{model.get()}\n')
  File "/usr/local/lib/python3.6/site-packages/mlrun-0.4.6-py3.6.egg/mlrun/datastore.py", line 245, in get
    return self._store.get(self._path, size=size, offset=offset)
  File "/usr/local/lib/python3.6/site-packages/mlrun-0.4.6-py3.6.egg/mlrun/datastore.py", line 278, in get
    with open(self._join(key), 'rb') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jovya

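| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...6393a5 | 0 | Apr 22 14:45:26 | error | my-trainer-validation | host=my-trainer-validation-mxp2n, kind=job, owner=jovyan, stage=dev | model | | | |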


to track results use .show() or .logs() or in CLI: 
!mlrun get run f80fed0a50e243a3abbd2d1fdd6393a5  , !mlrun logs f80fed0a50e243a3abbd2d1fdd6393a5 
[mlrun] 2020-04-22 14:45:29,844 run executed, status=error
runtime error: [Errno 2] No such file or directory: '/home/jovyan/mlrun/artifacts/model.txt'


RunError: [Errno 2] No such file or directory: '/home/jovyan/mlrun/artifacts/model.txt'