[Runtimes] Support code archives, enhance run cli and add CI/CD support and docs (#888)
yaronha committed May 19, 2021
1 parent 17fd21e commit a38b6af
Showing 20 changed files with 1,387 additions and 251 deletions.
Binary file added docs/_static/images/git-pipeline.png
90 changes: 90 additions & 0 deletions docs/ci-pipeline.md
@@ -0,0 +1,90 @@
# Integrating with CI Pipelines

Users may want to run their ML pipelines using CI frameworks like GitHub Actions, GitLab CI/CD, etc.
MLRun supports simple and native integration with CI systems. In the following example we combine
local code (from the repository) with MLRun marketplace functions to build an automated ML pipeline which:

* runs data preparation
* trains a model
* tests the trained model
* deploys the model into a cluster
* tests the deployed model

The pipeline uses the `RunNotifications` class to report tracking information to the Git dashboard (as PR comments) and/or to Slack.
Note that the same pipeline script can be executed locally (just comment out the `notifier.git_comment()` line or place it under an `if` condition).
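The `if` condition can key off environment variables that the CI runners set automatically (`GITHUB_ACTIONS` and `GITLAB_CI` are real variables set by GitHub Actions and GitLab CI, respectively). The `in_ci` helper below is our own sketch, not an MLRun API:

```python
import os

def in_ci(env=None) -> bool:
    """Return True when running under GitHub Actions or GitLab CI.
    Both runners set these variables automatically."""
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS") == "true" or env.get("GITLAB_CI") == "true"

# guard the Git notification so the same script also runs locally:
# if in_ci():
#     notifier.git_comment()
```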

```python
# MLRun CI Example
# ================
# this code can run in the IDE or inside a CI/CD script (GitHub Actions or GitLab CI/CD)
# and requires setting the following env vars (can be done in the CI system):
#
# MLRUN_DBPATH - URL of the MLRun cluster
# V3IO_USERNAME - username in the remote Iguazio cluster
# V3IO_ACCESS_KEY - access key to the remote Iguazio cluster
# GIT_TOKEN or GITHUB_TOKEN - GitHub/GitLab API token (set automatically in GitHub Actions)
# SLACK_WEBHOOK - optional, Slack webhook URL when using Slack notifications
#

import json
from mlrun.utils import RunNotifications
import mlrun
from mlrun.platforms import auto_mount

project = "ci"
mlrun.set_environment(project=project)

# create notification object (console, Git, Slack as outputs) and push start message
notifier = RunNotifications(with_slack=True).print()
# use the following line only when running inside Github actions or Gitlab CI
notifier.git_comment()

notifier.push_start_message(project)

# define and run a local data prep function
data_prep_func = mlrun.code_to_function("prep-data", filename="../scratch/prep_data.py", kind="job",
image="mlrun/mlrun", handler="prep_data").apply(auto_mount())

# Set the source-data URL
source_url = 'https://s3.wasabisys.com/iguazio/data/iris/iris.data.raw.csv'
prep_data_run = data_prep_func.run(name='prep_data', inputs={'source_url': source_url})

# train the model using a library (hub://) function and the generated data
train = mlrun.import_function('hub://sklearn_classifier').apply(auto_mount())
train_run = train.run(name='train',
inputs={'dataset': prep_data_run.outputs['cleaned_data']},
params={'model_pkg_class': 'sklearn.linear_model.LogisticRegression',
'label_column': 'label'})

# test the model using a library (hub://) function and the generated model
test = mlrun.import_function('hub://test_classifier').apply(auto_mount())
test_run = test.run(name="test",
params={"label_column": "label"},
inputs={"models_path": train_run.outputs['model'],
"test_set": train_run.outputs['test_set']})

# push results via notification to Git, Slack, ..
notifier.push_run_results([prep_data_run, train_run, test_run])

# Create model serving function using the new model
serve = mlrun.import_function('hub://v2_model_server').apply(auto_mount())
model_name = 'iris'
serve.add_model(model_name, model_path=train_run.outputs['model'])
addr = serve.deploy()

notifier.push(f"model {model_name} is deployed at {addr}")

# test the model serving function
inputs = [[5.1, 3.5, 1.4, 0.2],
[7.7, 3.8, 6.7, 2.2]]
my_data = json.dumps({'inputs': inputs})
serve.invoke(f'v2/models/{model_name}/infer', my_data)

notifier.push(f"model {model_name} test passed Ok")
```
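To wire a script like the one above into CI, a minimal GitHub Actions workflow could look like the following sketch. The file paths (`.github/workflows/ml-pipeline.yml`, `pipeline.py`), the Python version, and the secret names are assumptions for illustration, not part of this commit (only `GITHUB_TOKEN` is provided automatically by GitHub Actions):

```yaml
# .github/workflows/ml-pipeline.yml -- hypothetical workflow sketch
name: ml-pipeline
on: [pull_request]

jobs:
  run-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.8"
      - run: pip install mlrun
      - name: run the CI pipeline script
        env:
          MLRUN_DBPATH: ${{ secrets.MLRUN_DBPATH }}
          V3IO_USERNAME: ${{ secrets.V3IO_USERNAME }}
          V3IO_ACCESS_KEY: ${{ secrets.V3IO_ACCESS_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: python pipeline.py
```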

**The results will appear in the CI system in the following way:**

<img src="./_static/images/git-pipeline.png" alt="git-pipeline" width="800"/><br>


2 changes: 1 addition & 1 deletion docs/hyper-params.ipynb
@@ -18,7 +18,7 @@
"\n",
"MLRun iterations can be viewed as child runs under the main task/run, each child run will get a set of parameters which will be computed/selected from the input hyper parameters based on the chosen strategy (Grid, List, Random or Custom).\n",
"\n",
"The hyper parameters and options are specified in the `task` or the `function.run()` command through the `hyperparams` (for hyper param values) and `hyper_param_options` (for {py:class}`~mlrun.model.HyperParamOptions`) properties, see examples below. hyper parameters can also be loaded directly from a CSV or Json file (by setting the `param_file` hyper option).\n",
"The hyper parameters and options are specified in the `task` or the {py:meth}`~mlrun.runtimes.BaseRuntime.run` command through the `hyperparams` (for hyper param values) and `hyper_param_options` (for {py:class}`~mlrun.model.HyperParamOptions`) properties, see examples below. hyper parameters can also be loaded directly from a CSV or Json file (by setting the `param_file` hyper option).\n",
"\n",
"The hyper params are specified as a struct of `key: list` values for example: `{\"p1\": [1,2,3], \"p2\": [10,20]}`, the values can be of any type (int, string, float, ..), the list are used to compute the parameter combinations using one of the following strategies: \n",
"1. Grid Search (`grid`) - running all the parameter combinations\n",
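The grid strategy described in the hyper-params passage above simply enumerates the cartesian product of the parameter lists. As an illustrative sketch in plain Python (not MLRun internals):

```python
import itertools

# the hyper params struct from the text: key -> list of candidate values
hyperparams = {"p1": [1, 2, 3], "p2": [10, 20]}

# grid search runs every combination of the lists (3 * 2 = 6 runs)
keys = list(hyperparams)
grid = [dict(zip(keys, combo))
        for combo in itertools.product(*hyperparams.values())]

print(len(grid))   # 6
print(grid[0])     # {'p1': 1, 'p2': 10}
```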
22 changes: 9 additions & 13 deletions docs/index.rst
@@ -59,11 +59,18 @@ Table Of Content

.. toctree::
:maxdepth: 1
:caption: ML Pipelines:
:caption: Functions and ML Pipelines:

job-submission-and-tracking
runtimes/functions
hyper-params
projects
ci-pipeline
load-from-marketplace

.. toctree::
:maxdepth: 1
:caption: Online Pipelines & Serving:

serving/index
model_monitoring/model-monitoring-deployment

@@ -77,17 +84,6 @@ Table Of Content
feature-store/basic-demo
feature-store/end-to-end-demo/index

.. toctree::
:maxdepth: 1
:caption: Serverless Runtimes:

runtimes/functions
runtimes/mlrun_jobs
runtimes/dask-overview
runtimes/horovod
runtimes/spark-operator
load-from-marketplace

.. toctree::
:maxdepth: 1
:caption: Artifact Management:

0 comments on commit a38b6af
