# Creating A New MLRun Project

Create a full project with multiple functions and a workflow, and manage it with Git.

#### **notebook how-to's**
* Add local or library/remote functions
* Add a workflow
* Save to a remote git
* Run pipeline

<a id='top'></a>
#### **steps**
**[Add functions](#load-functions)**<br>
**[Create and save a workflow](#create-workflow)**<br>
**[Update remote git](#git-remote)**<br>
**[Run a pipeline workflow](#run-pipeline)**<br>

```python
from mlrun import new_project, code_to_function
```

```python
# update the dir and repo to reflect real locations
# the remote Git repo must already exist in GitHub
project_dir = "/User/new-proj"
remote_git = "https://github.com/<my-org>/<my-repo>.git"
newproj = new_project("new-project", project_dir, init_git=True)
```
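
If the `<my-org>`/`<my-repo>` placeholders are left in place, `create_remote` and `pull` will fail later with a less obvious error. The following is a minimal standard-library sketch of a fail-fast check. `has_placeholders` is a hypothetical helper and is not part of MLRun.

```python
import re

# Hypothetical helper (not an MLRun API): matches an unreplaced <...> segment
PLACEHOLDER = re.compile(r"<[^<>]+>")


def has_placeholders(url: str) -> bool:
    """Return True if the URL still contains a <placeholder> segment."""
    return PLACEHOLDER.search(url) is not None


remote_git = "https://github.com/<my-org>/<my-repo>.git"
if has_placeholders(remote_git):
    print("remote_git still contains placeholders; set a real GitHub URL first")
```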

Set the remote Git repo, then pull from it to sync any content it already holds:

```python
newproj.create_remote(remote_git)
```

```python
newproj.pull()
```

<a id='load-functions'></a>
### Load functions from remote URLs or the marketplace
We create two functions:
1. Load a function from the function marketplace (it is converted into a function object).
2. Create a function from a file in the context dir (we first copy a demo file into the dir).

```python
newproj.set_function("hub://load_dataset", "ingest").doc()
```

```text
function: load-dataset
load a toy dataset from scikit-learn
default handler: load_dataset
entry points:
  load_dataset: Loads a scikit-learn toy dataset for classification or regression

The following datasets are available ('name' : desription):

    'boston'          : boston house-prices dataset (regression)
    'iris'            : iris dataset (classification)
    'diabetes'        : diabetes dataset (regression)
    'digits'          : digits dataset (classification)
    'linnerud'        : linnerud dataset (multivariate regression)
    'wine'            : wine dataset (classification)
    'breast_cancer'   : breast cancer wisconsin dataset (classification)

The scikit-learn functions return a data bunch including the following items:
- data              the features matrix
- target            the ground truth labels
- DESCR             a description of the dataset
- feature_names     header for data

The features (and their names) are stored with the target labels in a DataFrame.
```
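
The last line of the help output describes the shape of the result. Each feature column is named by `feature_names`, and the target labels sit alongside the features in one table. The following is a standard-library sketch of that layout that uses toy values instead of a real scikit-learn dataset. `to_records` is hypothetical, and the label column name `labels` is an assumption: the real `load_dataset` function may name it differently.

```python
from typing import Dict, List, Sequence


def to_records(data: Sequence[Sequence[float]],
               target: Sequence[int],
               feature_names: Sequence[str],
               label_column: str = "labels") -> List[Dict[str, object]]:
    """Join each feature row with its label, keyed by column name.

    label_column is an assumed name, not taken from load_dataset.
    """
    if len(data) != len(target):
        raise ValueError("data and target must have the same number of rows")
    records = []
    for row, label in zip(data, target):
        record: Dict[str, object] = dict(zip(feature_names, row))
        record[label_column] = label
        records.append(record)
    return records


rows = to_records([[5.1, 3.5], [6.2, 2.9]], [0, 1], ["sepal_length", "sepal_width"])
print(rows[0])  # {'sepal_length': 5.1, 'sepal_width': 3.5, 'labels': 0}
```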

### Create a local function (using code from the MLRun examples)

```python
!curl -o {project_dir}/handler.py https://raw.githubusercontent.com/mlrun/mlrun/master/examples/handler.py
```

```python
# add function with build config (base image, run command)
fn = code_to_function("tstfunc", filename="handler.py", kind="job")
fn.build_config(base_image="mlrun/mlrun", commands=["pip install pandas"])
newproj.set_function(fn)
print(newproj.func("tstfunc").to_yaml())
```

```yaml
kind: job
metadata:
  name: tstfunc
  tag: ''
  project: new-project
  categories: []
spec:
  command: ''
  args: []
  volumes: []
  volume_mounts: []
  env: []
  default_handler: ''
  description: ''
  build:
    functionSourceCode: CmRlZiBteV9mdW5jKGNvbnRleHQsIHAxOiBpbnQgPSAxLCBwMj0nYS1zdHJpbmcnKToKICAgICIiInRoaXMgaXMgYSB0d28gcGFyYW0gZnVuY3Rpb24KCiAgICA6cGFyYW0gcDEgIGZpcnN0IHBhcmFtCiAgICA6cGFyYW0gcDIgIDJuZCBwYXJhbQogICAgIiIiCiAgICAjIGFjY2VzcyBpbnB1dCBtZXRhZGF0YSwgdmFsdWVzLCBmaWxlcywgYW5kIHNlY3JldHMgKHBhc3N3b3JkcykKICAgIHByaW50KCdSdW46IHt9ICh1aWQ9e30pJy5mb3JtYXQoY29udGV4dC5uYW1lLCBjb250ZXh0LnVpZCkpCiAgICBwcmludCgnUGFyYW1zOiBwMT17fSwgcDI9e30nLmZvcm1hdChwMSwgcDIpKQogICAgY29udGV4dC5sb2dnZXIuaW5mbygncnVubmluZyBmdW5jdGlvbicpCgogICAgIyBSVU4gc29tZSB1c2VmdWwgY29kZSBlLmcuIE1MIHRyYWluaW5nLCBkYXRhIHByZXAsIGV0Yy4KCiAgICAjIGxvZyBzY2FsYXIgcmVzdWx0IHZhbHVlcyAoam9iIHJlc3VsdCBtZXRyaWNzKQogICAgY29udGV4dC5sb2dfcmVzdWx0KCdhY2N1cmFjeScsIHAxICogMikKICAgIGNvbnRleHQubG9nX3Jlc3VsdCgnbG9zcycsIHAxICogMykKICAgIG
```
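
The `functionSourceCode` field embeds the contents of `handler.py`. Judging by the output, it is base64-encoded. The following is a minimal standard-library sketch for checking what was embedded. `decode_source` is a hypothetical helper. Because the value printed above is cut off, the helper drops any incomplete trailing 4-character group before decoding.

```python
import base64


def decode_source(encoded: str) -> str:
    """Decode a base64 functionSourceCode value back to Python source.

    Trailing characters that do not form a full 4-char base64 group are
    dropped, so a truncated value copied from the output still decodes.
    """
    usable = len(encoded) - len(encoded) % 4
    return base64.b64decode(encoded[:usable]).decode("utf-8", errors="replace")


# First characters of the functionSourceCode value printed above
prefix = "CmRlZiBteV9mdW5jKGNvbnRleHQsIHAxOiBpbnQgPSAxLCBwMj0nYS1zdHJpbmcnKToK"
print(decode_source(prefix))
```

The decoded prefix shows the `my_func(context, p1, p2)` handler, which the workflow below calls as `handler='my_func'`.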

<a id='create-workflow'></a>
### Create a workflow file and store it in the context dir

```python
%%writefile {project_dir}/workflow.py
from kfp import dsl
from mlrun import mount_v3io

funcs = {}

def init_functions(functions: dict, project=None, secrets=None):
    # apply the v3io volume mount to the ingest function
    functions['ingest'].apply(mount_v3io())

@dsl.pipeline(
    name='demo project', description='Shows how to use mlrun project.'
)
def kfpipeline(p1=3):
    # build the tstfunc container image using the build config set earlier
    builder = funcs['tstfunc'].deploy_step(with_mlrun=False)

    # load the boston dataset with the marketplace function
    ingest = funcs['ingest'].as_step(name='load-data', params={'dataset': 'boston'})

    # run my_func from handler.py in the image produced by the build step
    s1 = funcs['tstfunc'].as_step(name='step-one', handler='my_func',
         image=builder.outputs['image'],
         params={'p1': p1})
```


```text
Overwriting /User/new-proj/workflow.py
```
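
In the pipeline, `step-one` consumes `builder.outputs['image']`. As a result, Kubeflow Pipelines runs it only after the build step. `load-data` has no such input, so it can run alongside the build. The following sketch illustrates that ordering with the standard library's `graphlib` (Python 3.9+). The dependency map is written by hand here rather than extracted from KFP, and the label `build-tstfunc` is a hypothetical name for the build step.

```python
from graphlib import TopologicalSorter

# Hand-written dependency map mirroring workflow.py (hypothetical step labels):
# step-one needs the image produced by the build step; load-data needs nothing.
dependencies = {
    "build-tstfunc": set(),
    "load-data": set(),
    "step-one": {"build-tstfunc"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose inputs are all available
    waves.append(ready)
    sorter.done(*ready)

print(waves)  # [['build-tstfunc', 'load-data'], ['step-one']]
```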


```python
newproj.set_workflow("main", "workflow.py")
```

```python
print(newproj.to_yaml())
```

```yaml
name: new-project
functions:
- url: hub://load_dataset
  name: ingest
- name: tstfunc
  spec:
    kind: job
    metadata:
      name: tstfunc
      tag: ''
      project: new-project
      categories: []
    spec:
      command: ''
      args: []
      env: []
      default_handler: ''
      description: ''
      build:
        functionSourceCode: CmRlZiBteV9mdW5jKGNvbnRleHQsIHAxOiBpbnQgPSAxLCBwMj0nYS1zdHJpbmcnKToKICAgICIiInRoaXMgaXMgYSB0d28gcGFyYW0gZnVuY3Rpb24KCiAgICA6cGFyYW0gcDEgIGZpcnN0IHBhcmFtCiAgICA6cGFyYW0gcDIgIDJuZCBwYXJhbQogICAgIiIiCiAgICAjIGFjY2VzcyBpbnB1dCBtZXRhZGF0YSwgdmFsdWVzLCBmaWxlcywgYW5kIHNlY3JldHMgKHBhc3N3b3JkcykKICAgIHByaW50KCdSdW46IHt9ICh1aWQ9e30pJy5mb3JtYXQoY29udGV4dC5uYW1lLCBjb250ZXh0LnVpZCkpCiAgICBwcmludCgnUGFyYW1zOiBwMT17fSwgcDI9e30nLmZvcm1hdChwMSwgcDIpKQogICAgY29udGV4dC5sb2dnZXIuaW5mbygncnVubmluZyBmdW5jdGlvbicpCgogICAgIyBSVU4gc29tZSB1c2VmdWwgY29kZSBlLmcuIE1MIHRyYWluaW5nLCBkYXRhIHByZXAsIGV0Yy4KCiAgICAjIGxvZyBzY2FsYXIgcmVzdWx0IHZhbHVlcyAoam9iIHJlc3VsdCBtZXRyaWNzKQ
```
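
The printed project spec stores functions in two forms:
- `ingest` is kept only as a reference (`url: hub://load_dataset`).
- `tstfunc` is embedded inline under `spec`.

The following sketch walks a dict that mirrors that YAML and reports which entries are references and which are inline. The dict was abbreviated by hand rather than loaded through MLRun, and `classify_functions` is a hypothetical helper.

```python
from typing import Dict, List

# Abbreviated by hand from the YAML printed above; not produced by MLRun
project_spec = {
    "name": "new-project",
    "functions": [
        {"url": "hub://load_dataset", "name": "ingest"},
        {"name": "tstfunc", "spec": {"kind": "job", "metadata": {"name": "tstfunc"}}},
    ],
}


def classify_functions(spec: Dict) -> Dict[str, List[str]]:
    """Split function entries into URL references and inline specs."""
    result: Dict[str, List[str]] = {"by_url": [], "inline": []}
    for entry in spec.get("functions", []):
        if "url" in entry:
            result["by_url"].append(entry["name"])
        elif "spec" in entry:
            result["inline"].append(entry["name"])
    return result


print(classify_functions(project_spec))  # {'by_url': ['ingest'], 'inline': ['tstfunc']}
```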

<a id='git-remote'></a>
### Register and push the project to the remote Git repo

```python
newproj.push("master", "first push", add=["handler.py", "workflow.py"])
```
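
`push` commits the listed files and pushes the branch to the remote that `create_remote` configured. As a rough mental model only, this corresponds to the plain Git commands built by the hypothetical helper below. MLRun drives Git through its own code, so the exact steps may differ. The helper returns the commands instead of running them, and the remote name `origin` is an assumption.

```python
import shlex
from typing import List, Sequence


def git_push_commands(branch: str, message: str, add: Sequence[str],
                      remote: str = "origin") -> List[str]:
    """Build (but do not run) git commands for add, commit, and push.

    Hypothetical helper for illustration; remote="origin" is an assumption.
    """
    return [
        "git add " + " ".join(shlex.quote(path) for path in add),
        "git commit -m " + shlex.quote(message),
        f"git push {shlex.quote(remote)} {shlex.quote(branch)}",
    ]


for cmd in git_push_commands("master", "first push", ["handler.py", "workflow.py"]):
    print(cmd)
```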

```python
newproj.source
```

<a id='run-pipeline'></a>
### Run the workflow

```python
newproj.run(
    "main",
    arguments={},
    artifact_path="v3io:///users/admin/mlrun/kfp/{{workflow.uid}}/",
)
```





```text
[mlrun] 2020-03-30 20:26:21,411 Pipeline run id=526642b2-c595-421c-b81c-c51d7f47fb98, check UI or DB for progress

'526642b2-c595-421c-b81c-c51d7f47fb98'
```
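
The `{{workflow.uid}}` part of `artifact_path` is an Argo workflow variable. The pipeline engine fills it in at run time, so each run writes its artifacts to its own directory. The following is a minimal sketch of that substitution. `resolve_artifact_path` is hypothetical, and the uid is a made-up value. The Kubeflow run id printed above is not necessarily the same as the Argo workflow uid.

```python
def resolve_artifact_path(template: str, workflow_uid: str) -> str:
    """Substitute the {{workflow.uid}} placeholder, as the engine does at run time."""
    return template.replace("{{workflow.uid}}", workflow_uid)


template = "v3io:///users/admin/mlrun/kfp/{{workflow.uid}}/"
# Hypothetical uid, for illustration only
print(resolve_artifact_path(template, "0a1b2c3d"))  # v3io:///users/admin/mlrun/kfp/0a1b2c3d/
```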

**[back to top](#top)**