# Using MLRun Projects and Git
  --------------------------------------------------------------------

Load or create a full project with multiple functions and workflows, and work with Git.

#### **notebook how-to's**
* Load a project with multiple functions from Git
* Run automated workflows (using Kubeflow)
* Update, maintain, and debug code

<a id='top'></a>
#### **steps**
**[Load project from Git or Archive](#load-project)**<br>
**[Run a pipeline workflow](#run-pipeline)**<br>
**[Updating the project and code](#update-project)**<br>

In [1]:
from mlrun import load_project, mlconf

<a id='load-project'></a>
## Load project from Git or Archive

Projects can be stored in a Git repo or in a tar archive on an object store (such as S3 or V3IO).

`load_project(context, url)` loads or clones the project into the local context directory and builds the project object from the `project.yaml` file in the root directory of the Git repo or archive.

> Note: If `url` is not specified, `load_project` uses the context directory and searches for a Git repo inside it. Alternatively, pass the `init_git=True` flag to initialize a Git repo in the target context directory.
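
The two storage options above can be told apart from the URL alone. The following is a minimal sketch of that distinction, not MLRun's own logic: the helper `classify_source` and the list of archive suffixes are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Archive suffixes this sketch recognizes (an assumption, not MLRun's list).
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar", ".zip")


def classify_source(url: str) -> str:
    """Return 'git', 'archive', or 'directory' for a project source URL.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(url)
    path = parsed.path.lower()
    if parsed.scheme == "git" or path.endswith(".git"):
        return "git"
    if path.endswith(ARCHIVE_SUFFIXES):
        return "archive"
    # Anything else is treated as a plain directory on a shared file system.
    return "directory"


for source in (
    "git://github.com/mlrun/demo-xgb-project.git",
    "v3io:///users/admin/tars/src-project.tar.gz",
    "v3io:///users/admin/my-proj",
):
    print(source, "->", classify_source(source))
```

The third URL is the shared file system form used later in this notebook when the project source is replaced for faster debugging.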

In [2]:
# source Git repo; forking it to your own account is recommended
url = 'git://github.com/mlrun/demo-xgb-project.git'

# alternatively, use a tar archive
#url = 'v3io:///users/admin/tars/src-project.tar.gz'

proj = load_project('/User/my-proj', url)

Examine the project object. It contains lists of the `functions` and `workflows` used in the project. Functions can be local to the project or referenced by URL (to an .ipynb, .py, or .yaml file and/or a container image).

In [3]:
print(proj.to_yaml())

name: iris
functions:
- url: function.yaml
  name: xgb
- url: https://raw.githubusercontent.com/mlrun/functions/master/serving/xgboost/xgb_serving.ipynb
  name: serving
workflows:
  main: workflow.py
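
The listing above mixes a project-relative reference (`function.yaml`) with an absolute URL (the serving notebook). A minimal sketch of how such references could be resolved against the context directory follows. `resolve_function_url` is a hypothetical helper written for this example, and MLRun's actual resolution may differ.

```python
import posixpath
from urllib.parse import urlparse


def resolve_function_url(context: str, url: str) -> str:
    """Resolve a function reference against the project context directory.

    Hypothetical helper for illustration only.
    """
    if urlparse(url).scheme:
        # Already an absolute URL (https://, v3io://, ...): use it as-is.
        return url
    # Otherwise treat it as a path relative to the project root.
    return posixpath.normpath(posixpath.join(context, url))


print(resolve_function_url("/User/my-proj", "function.yaml"))
print(resolve_function_url(
    "/User/my-proj",
    "https://raw.githubusercontent.com/mlrun/functions/master/serving/xgboost/xgb_serving.ipynb",
))
```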



In [4]:
proj.source

'git://github.com/mlrun/demo-xgb-project.git'

In [5]:
# read specific function spec
print(proj.func('xgb').to_yaml())

kind: job
metadata:
  name: iris-demo
  project: iris
spec:
  command: iris.py
  args: []
  image: ''
  volumes: []
  volume_mounts: []
  env: []
  description: ''
  build:
    source: git://github.com/mlrun/demo-xgb-project.git
    base_image: mlrun/mlrun
    commands:
    - pip install sklearn
    - pip install xgboost
    - pip install matplotlib
    code_origin: https://github.com/mlrun/demo-xgb-project.git#262443b810b158daca1a6058c45299b410edf7fa



<a id='run-pipeline'></a>
## Run a pipeline workflow
You can check the `workflow.py` file to see how function objects are initialized and used (by name) inside the workflow.
The `workflow.py` file has two parts: initializing the function objects and defining the pipeline DSL (connecting the function inputs and outputs).

> Note: The pipeline can include CI steps such as building container images and deploying models.

### Initializing the functions (e.g. mount them on the v3io fabric)
```python
from mlrun import mount_v3io


def init_functions(functions: dict, params=None, secrets=None):
    # mount the v3io data fabric on every function in the project
    for f in functions.values():
        f.apply(mount_v3io())
```
<br>

### Workflow DSL:
```python
from kfp import dsl

# `funcs`, `df_path`, and `artifacts_path` are defined outside this excerpt
@dsl.pipeline(
    name='My XGBoost training pipeline',
    description='Shows how to use mlrun.'
)
def kfpipeline(
        eta=[0.1, 0.2, 0.3], gamma=[0.1, 0.2, 0.3]
):
    # first step build the function container
    builder = funcs['xgb'].deploy_step(with_mlrun=False)

    # use xgb.iris_generator function to generate data (container image from the builder)
    ingest = funcs['xgb'].as_step(name='ingest_iris', handler='iris_generator',
        image=builder.outputs['image'],
        params={'target': df_path},
        outputs=['iris_dataset'], out_path=artifacts_path)

    # use xgb.xgb_train function to train on the data (from the generator step)
    train = funcs['xgb'].as_step(name='xgb_train', handler='xgb_train',
        image=builder.outputs['image'],
        hyperparams={'eta': eta, 'gamma': gamma},
        selector='max.accuracy',
        inputs={'dataset': ingest.outputs['iris_dataset']},
        outputs=['model'], out_path=artifacts_path)

    # deploy the trained model using a nuclio real-time function
    deploy = funcs['serving'].deploy_step(models={'iris_v1': train.outputs['model']})
```
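
The training step passes lists of values through `hyperparams` and picks a winner with `selector='max.accuracy'`. The sketch below shows what that amounts to, assuming the lists are expanded into a full grid of combinations. The helpers `expand_grid` and `select_best` and the scoring formula are made up for illustration and are not MLRun code.

```python
from itertools import product


def expand_grid(hyperparams: dict) -> list:
    """Return one parameter dict per combination of the listed values."""
    names = list(hyperparams)
    return [dict(zip(names, values)) for values in product(*hyperparams.values())]


def select_best(results: list, selector: str) -> dict:
    """Pick a result using a '<max|min>.<metric>' selector such as 'max.accuracy'."""
    op, metric = selector.split(".", 1)
    chooser = max if op == "max" else min
    return chooser(results, key=lambda r: r[metric])


grid = expand_grid({"eta": [0.1, 0.2, 0.3], "gamma": [0.1, 0.2, 0.3]})

# Made-up scoring function standing in for a real training run.
results = [
    dict(p, accuracy=1.0 - abs(p["eta"] - 0.2) - abs(p["gamma"] - 0.3))
    for p in grid
]
best = select_best(results, "max.accuracy")
print(len(grid), "combinations; best:", best)
```

With two parameters of three values each, the grid contains nine runs, and only the output of the selected run feeds the deploy step.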

### Run
Use the `run` method to execute a workflow. You can provide alternative arguments and specify the default target for workflow artifacts.<br>
The workflow ID is returned and can be used to track progress, or you can use the hyperlinks.

> Note: The same run can be started from the CLI:<br>
    `mlrun project my-proj/ -r main -p "v3io:///users/admin/mlrun/kfp/{{workflow.uid}}/"`


In [6]:
proj.run('main', arguments={}, artifacts_path='v3io:///users/admin/mlrun/kfp/{{workflow.uid}}/')



[mlrun] 2020-02-22 23:49:56,310 Pipeline run id=0c9d0ecc-3b58-45eb-9f63-6da00420567a, check UI or DB for progress


'0c9d0ecc-3b58-45eb-9f63-6da00420567a'
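
The `{{workflow.uid}}` placeholder in `artifacts_path` gives every run its own artifact directory. As a rough illustration of the effect, assuming a plain string substitution (the real substitution happens inside the workflow engine at run time, and `expand_artifacts_path` is a hypothetical helper):

```python
def expand_artifacts_path(template: str, workflow_uid: str) -> str:
    """Replace the {{workflow.uid}} placeholder with a concrete run ID.

    Hypothetical helper for illustration only.
    """
    return template.replace("{{workflow.uid}}", workflow_uid)


template = "v3io:///users/admin/mlrun/kfp/{{workflow.uid}}/"
run_id = "0c9d0ecc-3b58-45eb-9f63-6da00420567a"
print(expand_artifacts_path(template, run_id))
```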

<a id='update-project'></a>
## Updating the project and code

You can update the code using the standard Git process (commit, push, and so on). If you update or edit the project object, run `proj.save()` to update the `project.yaml` file in your context directory, and then push your updates.

Alternatively, `proj.push(branch, commit_message)` does this work for you (saves the YAML, commits the updates, and pushes).

> Note: You must push updates before you build functions or run workflows, since the builder pulls the code from the Git repo.

If you are using containerized Jupyter, you may need to set your Git parameters first, e.g.:

```
!git config --global user.email "<my@email.com>"
!git config --global user.name "<name>"
!git config --global credential.helper store
```

In [7]:
proj.push('master', 'some edits')
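
Once the project YAML has been saved, the rest of what `proj.push` wraps is ordinary Git. A rough manual equivalent is sketched below as a dry run that only prints the commands. The helper `push_steps`, the `origin` remote, and the use of `git add --update` are assumptions, and the commands MLRun runs internally may differ.

```python
import shlex


def push_steps(branch: str, message: str, remote: str = "origin") -> list:
    """Return the Git commands for a commit-and-push cycle.

    Hypothetical helper for illustration only; nothing is executed.
    """
    return [
        ["git", "add", "--update"],        # stage changes to tracked files
        ["git", "commit", "-m", message],  # record them with the given message
        ["git", "push", remote, branch],   # publish to the remote branch
    ]


for cmd in push_steps("master", "some edits"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check what would be committed before running them in the context directory.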

To pull changes made by others, use `proj.pull()`. If you need to update the project spec (because the YAML file changed), use `proj.reload()`. To update the local/remote function specs, use `proj.sync_functions()` (or add `sync=True` to `.reload()`).

In [8]:
proj.pull()

## Replacing the source path to speed up debugging

Instead of updating Git every time we modify the code, we can build the code from the shared file system on the cluster (the build container mounts the same location that holds the code instead of reading from Git).

To do so, change the project source to point to the shared file system URL of the context directory (e.g. on v3io), and then re-run the workflow.

In [8]:
proj.source = 'v3io:///users/admin/my-proj'

**[back to top](#top)**