[Docs] Add section on node affinity #1227

Merged
merged 44 commits into from
Aug 24, 2021
Changes from 38 commits
0f4af81
New content for Model Monitoring feature. Added introduction, Archite…
jasonnIguazio Jul 4, 2021
6ff205a
Merge branch 'mlrun:development' into Model_monitoring
jasonnIguazio Jul 5, 2021
ee8158c
Links to download dashboard files are working. Added and edited deplo…
jasonnIguazio Jul 5, 2021
5fa2a0d
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 8, 2021
90e1af4
Added new screen captures for Iguazio UI with explanations. Moved the…
jasonnIguazio Jul 8, 2021
055c6d1
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 11, 2021
fded57c
Merge branch 'development' of https://github.com/jasonnIguazio/mlrun …
jasonnIguazio Jul 11, 2021
3565b21
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 13, 2021
a691b9b
Merge branch 'development' of https://github.com/jasonnIguazio/mlrun …
jasonnIguazio Jul 13, 2021
9a40861
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 14, 2021
1fba425
Merge branch 'mlrun:development' into Model_monitoring
jasonnIguazio Jul 14, 2021
361e5b0
Merge remote-tracking branch 'jasonorigin/Model_monitoring' into Mode…
jasonnIguazio Jul 14, 2021
5fb8141
Added Model Monitorting demo to the TOC. Split the original Model Mon…
jasonnIguazio Jul 14, 2021
f6947db
Changed definition of error count. Tightened up text in the initial s…
jasonnIguazio Jul 14, 2021
ae0104f
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 15, 2021
5fc9e98
Merge branch 'mlrun:development' into Model_monitoring
jasonnIguazio Jul 15, 2021
ac00019
Edits and changes before doc review and publishing.
jasonnIguazio Jul 15, 2021
13e06a7
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 18, 2021
0f5fe10
Merge branch 'mlrun:development' into Model_monitoring
jasonnIguazio Jul 18, 2021
180f8b0
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 19, 2021
ccb655f
Merge branch 'mlrun:development' into Model_monitoring
jasonnIguazio Jul 19, 2021
8d18eb0
Debugged index.rst from Model Monitoring. Edited model deployment jup…
jasonnIguazio Jul 19, 2021
9b8e6c9
Merge branch 'development' of https://github.com/jasonnIguazio/mlrun …
jasonnIguazio Jul 19, 2021
ae59bf9
Corrected indentation on configuring grafana dashboards.
jasonnIguazio Jul 19, 2021
a6bae74
Corrected indentation on configuring grafana dashboards.
jasonnIguazio Jul 19, 2021
d5fd7c5
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 20, 2021
7c71f27
Merge branch 'mlrun:development' into Model_monitoring
jasonnIguazio Jul 20, 2021
919a830
Adjustments in the Architecture section of model-monitoring-deploymen…
jasonnIguazio Jul 20, 2021
64903c9
Merge remote-tracking branch 'jasonorigin/Model_monitoring' into Mode…
jasonnIguazio Jul 20, 2021
5f060ee
Merge branch 'development' of https://github.com/jasonnIguazio/mlrun …
jasonnIguazio Jul 20, 2021
b9238f6
Merge branch 'mlrun:development' into development
jasonnIguazio Jul 21, 2021
e22fe2f
Merge branch 'mlrun:development' into development
jasonnIguazio Aug 18, 2021
8019ab6
Added section called Node Affinity. Section explains assinging functi…
jasonnIguazio Aug 19, 2021
7246366
Added "Beta" to the title of the Model monitoring.
jasonnIguazio Aug 19, 2021
0abde80
Added "Beta" to the title of the Model monitoring.
jasonnIguazio Aug 19, 2021
0b52a73
Fixed formatting issues in node-affinity.md.
jasonnIguazio Aug 19, 2021
c0673d6
Gammmar corrections in node-affinity.md.
jasonnIguazio Aug 19, 2021
6b968a2
Merge branch 'mlrun:development' into node-affinity
jasonnIguazio Aug 19, 2021
896f4f4
Update docs/runtimes/node-affinity.md
jasonnIguazio Aug 23, 2021
e8fc95b
Removed references to Nuclio.
jasonnIguazio Aug 23, 2021
55ce659
Merge remote-tracking branch 'jasonorigin/node-affinity' into node-af…
jasonnIguazio Aug 23, 2021
e0d1d9a
Merge branch 'mlrun:development' into node-affinity
jasonnIguazio Aug 23, 2021
0782ab1
Added missing comma in conf.py.
jasonnIguazio Aug 23, 2021
458035a
Changes in spacing and formatting in node-affinity.md
jasonnIguazio Aug 24, 2021
Binary file added docs/_static/images/mlrun_jobs_key_preemtible.png
1 change: 1 addition & 0 deletions docs/conf.py
Original file line number Diff line number Diff line change
@@ -128,6 +128,7 @@ def current_version():
"replacements",
"linkify",
"substitution",
"myst_nb"
]
myst_url_schemes = ("http", "https", "mailto")
panels_add_bootstrap_css = False
5 changes: 3 additions & 2 deletions docs/index.rst
@@ -44,8 +44,8 @@ MLRun provides the following key benefits:
- **Feature management** – ingestion, preparation, and monitoring
- **Works anywhere** – your local IDE, multi-cloud, or on-prem

- Table Of Content
- ----------------
+ Table Of Contents
+ -------------------

.. toctree::
:maxdepth: 1
@@ -67,6 +67,7 @@ Table Of Content
ci-pipeline
load-from-marketplace
secrets
+ runtimes/node-affinity

.. toctree::
:maxdepth: 1
4 changes: 2 additions & 2 deletions docs/model_monitoring/index.rst
@@ -1,7 +1,7 @@
.. _model_monitoring:

- Model Monitoring Overview
- ========================
+ Model Monitoring Overview (Beta)
+ ==================================

MLRun provides a model monitoring service that tracks the performance of models in production.
See the following sections for details and examples.
2 changes: 1 addition & 1 deletion docs/model_monitoring/initial-setup-configuration.ipynb
@@ -9,7 +9,7 @@
}
},
"source": [
- "# Enable Model Monitoring\n",
+ "# Enable Model Monitoring (Beta)\n",
"To see tracking results, Model Monitoring needs to be enabled in each model.\n",
"\n",
"To enable Model Monitoring, include `serving_fn.set_tracking()` in the Model Server.\n",
2 changes: 1 addition & 1 deletion docs/model_monitoring/model-monitoring-deployment.ipynb
@@ -8,7 +8,7 @@
}
},
"source": [
- "# Model Monitoring Overview"
+ "# Model Monitoring Overview (Beta)"
]
},
{
86 changes: 86 additions & 0 deletions docs/runtimes/node-affinity.md
@@ -0,0 +1,86 @@
# Node affinity for MLRun jobs and Nuclio functions
Node affinity can be applied to MLRun jobs and Nuclio functions to determine which nodes
they can be placed on. The rules are defined using custom labels on nodes and label selectors.
Node affinity lets you direct jobs and functions toward Spot or On-Demand groups of nodes.

## On Demand vs Spot

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) Cloud.
Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy applications faster.

Using the Iguazio platform you can deploy two different kinds of EC2 instances: On-Demand and Spot.
On-Demand Instances provide full control over the EC2 instance lifecycle. You decide when to launch, stop, hibernate, start,
reboot, or terminate an instance. With Spot Instances you request EC2 capacity from specific Availability Zones, and the
request is subject to Spot capacity availability. Spot Instances are a good choice if you can be flexible about when your
applications run and if your applications can be interrupted.

## Stateless and Stateful Applications
When deploying your MLRun jobs and Nuclio functions to specific nodes, take into consideration that On-Demand
nodes are best suited to stateful applications, while Spot nodes are best suited to stateless applications.
Stateful MLRun jobs and Nuclio functions that are assigned to run on Spot nodes may be interrupted,
and must be designed so that the job/function state is saved when scaling to zero.
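
One common way to make a stateful job tolerate interruption is to checkpoint its progress and resume from the last checkpoint when it restarts. The following is a minimal sketch of that pattern, not an MLRun or Nuclio API: the helper names, the checkpoint file, and the layout of `state` are all hypothetical, and it assumes the checkpoint is written to storage that outlives the pod.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interruption never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def process(items, path, stop_after=None):
    """Sum items, checkpointing after each one.

    stop_after is only here to simulate an interruption after N items.
    """
    state = load_checkpoint(path)
    handled = 0
    while state["next_item"] < len(items):
        if stop_after is not None and handled >= stop_after:
            break  # simulated preemption
        state["total"] += items[state["next_item"]]
        state["next_item"] += 1
        save_checkpoint(path, state)
        handled += 1
    return state
```

Calling `process` again with the same checkpoint path after an interruption continues from `next_item` instead of starting over, which is the behavior a stateful job on a Spot node needs.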

## Node Selector
Using the **Node Selector** you can assign MLRun jobs and Nuclio functions to specific nodes within the cluster.
**Node Selector** is available for all modes of deployment in the platform including the platform UI,
command line, and programmable interfaces.

To assign MLRun jobs and Nuclio functions to specific nodes, use the Kubernetes node label
`app.iguazio.com/lifecycle` with one of the following values:

* preemptible – assign to EC2 Spot instances

* non-preemptible – assign to EC2 On Demand instances

```{admonition} Note
By default, Iguazio uses the key:value pair
<br>
```app.iguazio.com/lifecycle = preemptible```
<br>
or
<br>
```app.iguazio.com/lifecycle = non-preemptible```
<br>
to determine whether a node is a Spot node or an On-Demand node.
```

You can use multiple labels to assign MLRun jobs or Nuclio functions to specific nodes.
However, when you use multiple labels, a logical `and` is performed on the labels: a node is eligible only if it carries all of them.

```{admonition} Note
* Do not use node-specific labels, as this may result in eliminating all possible nodes.
* When assigning MLRun jobs to Spot instances, it is the user’s responsibility
to deal with preemption within the running application/function.
```
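
The `and` semantics described above, and the risk of an over-specific selector, can be illustrated with a small standalone sketch. It only models the matching rule in plain Python and does not call Kubernetes or MLRun; the node names and the `zone` label are made up for illustration.

```python
def matches(node_labels, node_selector):
    """A node is eligible only if it carries every key:value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes, node_selector):
    """Return the names of all nodes that satisfy the selector."""
    return [name for name, labels in nodes.items() if matches(labels, node_selector)]


# Hypothetical cluster: node names and the "zone" label are illustrative only.
nodes = {
    "node-a": {"app.iguazio.com/lifecycle": "preemptible", "zone": "z1"},
    "node-b": {"app.iguazio.com/lifecycle": "preemptible", "zone": "z2"},
    "node-c": {"app.iguazio.com/lifecycle": "non-preemptible", "zone": "z1"},
}

spot_only = {"app.iguazio.com/lifecycle": "preemptible"}
spot_in_z1 = {"app.iguazio.com/lifecycle": "preemptible", "zone": "z1"}
too_specific = {"app.iguazio.com/lifecycle": "non-preemptible", "zone": "z2"}
```

Here `eligible_nodes(nodes, spot_only)` returns both Spot nodes, `spot_in_z1` narrows the result to `node-a`, and `too_specific` matches no node at all, which is the situation the note above warns about.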

**To assign an MLRun job to a node:**
1. From the platform dashboard, press **Projects** in the left menu pane.
2. Press a project, and then press **Jobs and Workflows**.
3. Press **New Job**, or select a job from the list of running jobs.
4. Scroll to and open the Resources pane.
5. In the **Node Selector** section, press **+**.
<br>
<br/>
<img src="../_static/images/ml_run-job_resources_node_selector.png" width="800"/>
<br>
<br/>
6. Enter a **key:value** pair. For example:
<br>
<br/>
<img src="../_static/images/mlrun_jobs_key_non-preemtible.png" width="800"/>
<br>
<br/>
or
<br>
<br/>
<img src="../_static/images/mlrun_jobs_key_preemtible.png" width="800"/>
<br>
<br/>
When complete, press **Run now** or **Schedule for later**.

**To assign an MLRun job to a node using the SDK:**
You can apply node selection using the SDK by adding the key:value pairs in your Jupyter notebook.
For On-Demand instances, use the following function:
```func.with_node_selection(node_selector={'app.iguazio.com/lifecycle': 'non-preemptible'})```
For Spot instances, use the following function:
```func.with_node_selection(node_selector={'app.iguazio.com/lifecycle': 'preemptible'})```
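
Because the label value must match exactly, it can help to build the selector in one place rather than typing the strings by hand. The sketch below is one possible way to do that: `lifecycle_selector` and `LIFECYCLE_KEY` are hypothetical names, not part of the MLRun SDK, and only the `with_node_selection(node_selector=...)` call comes from the text above.

```python
LIFECYCLE_KEY = "app.iguazio.com/lifecycle"


def lifecycle_selector(preemptible):
    """Return the node selector for Spot (preemptible) or On-Demand nodes."""
    value = "preemptible" if preemptible else "non-preemptible"
    return {LIFECYCLE_KEY: value}


# Usage, assuming `func` is an existing MLRun function object:
# func.with_node_selection(node_selector=lifecycle_selector(preemptible=True))
```

Keeping the key and both values in a single helper means a typo or a stray space cannot creep into individual notebooks.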