
Commit

[Docs] Refresh mlrun_jobs example (#1752)
jillnogold committed Feb 16, 2022
1 parent 448bfe0 commit 14ace00
Showing 1 changed file with 28 additions and 70 deletions.
98 changes: 28 additions & 70 deletions docs/runtimes/mlrun_jobs.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Kubernetes Jobs & Images\n",
"\n",
"This topic describes running a kubernetes-based job using shared data, and building custom container images"
"This topic describes running a kubernetes-based job using shared data, and building custom container images."
]
},
{
@@ -36,22 +36,6 @@
"import mlrun "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the `%nuclio` magic commands to set package dependencies and configuration:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%nuclio cmd -c pip install pandas"
]
},
{
"cell_type": "code",
"execution_count": 3,
@@ -145,7 +129,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following end-code annotation tells ```nuclio``` to stop parsing the notebook from this cell. _**Do not remove this cell**_:"
"The following end-code annotation tells MLRun to stop parsing the notebook from this cell. _**Do not remove this cell**_:"
]
},
{
@@ -171,7 +155,7 @@
"## Convert the Code to a Serverless Job\n",
"\n",
"Create a ```function``` that defines the runtime environment (type, code, image, ..) and ```run()``` a job or experiment using that function.\n",
"In each run you can specify the function, inputs, parameters/hyper-parameters, etc.\n",
"In each run, you can specify the function, inputs, parameters/hyper-parameters, etc.\n",
"\n",
"Use the ```job``` runtime for running container jobs, or alternatively use another distributed runner like MpiJob, Spark, Dask, and Nuclio.\n",
"\n",
@@ -192,11 +176,11 @@
"metadata": {},
"source": [
"<a id=\"build\"></a>\n",
"### **Define the cluster jobs and build images**\n",
"### **Define the cluster jobs, build images, and set dependencies**\n",
"\n",
"To use the function in a cluster you need to package the code and its dependencies.\n",
"\n",
"The ```code_to_function``` call automatically generates a ```function``` object from the current notebook (or specified file) with its list of dependencies and runtime configuration."
"The ```code_to_function``` call automatically generates a ```function``` object from the current notebook (or specified file) with its list of dependencies and runtime configuration. In this example the code depends on the pandas package, so so it's specified in the ```code_to_function``` call."
]
},
{
@@ -206,7 +190,7 @@
"outputs": [],
"source": [
"# create an ML function from the notebook, attach it to iguazio data fabric (v3io)\n",
"trainer = mlrun.code_to_function(name='my-trainer', kind='job', image='mlrun/mlrun')"
"trainer = mlrun.code_to_function(name='my-trainer', kind='job', image='mlrun/mlrun', requirements=['pandas'])"
]
},
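Before deploying, the generated function can be exercised locally; a minimal sketch (the `local=True` run pattern is an assumption about typical usage, while the `training` handler and `p1` parameter come from the notebook code above):

```python
# Hedged sketch: validate the generated function by running it locally first.
run = trainer.run(handler='training', params={'p1': 5}, local=True)
print(run.outputs)  # dict of results/artifacts logged by the handler
```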
{
@@ -219,25 +203,27 @@
"\n",
"```mlrun``` uses _**KubeFlow**_ modifiers (apply) to configure resources. You can build your own resources or use predefined resources e.g. [AWS resources](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/aws.py).\n",
"\n",
"The example above uses built-in images. When you move to production, use specific tags. For more details on built-in and custom images, see [MLRun images and external docker images](../images.html#mlrun-images-and-external-docker-images).\n"
"The example above uses built-in images. When you move to production, use specific tags. For more details on built-in and custom images, see [MLRun images and external docker images](./images.html#mlrun-images-and-external-docker-images).\n"
]
},
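For example, a hedged sketch combining a predefined KubeFlow modifier with MLRun's own resource helper (the secret name and limit values are illustrative):

```python
# Hedged sketch: apply a predefined KubeFlow modifier and set resource limits.
from kfp.aws import use_aws_secret  # predefined AWS modifier from the kfp package

trainer.apply(use_aws_secret(secret_name='my-aws'))  # 'my-aws' is an illustrative secret name
trainer.with_limits(mem='2G', cpu=2)                 # cap the job's memory and CPU
```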
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### _**Option 1: Using file volumes for artifacts**_\n",
"If you're using the [MLOps platform](https://www.iguazio.com/), use the `mount_v3io()` auto-mount modifier.<br>\n",
"#### **Option 1: Using file volumes for artifacts**\n",
"MLRun automatically applies the most common storage configuration to functions. As a result, most cases do not require any additional storage configurations before executing a function. See more details in [Applying storage configurations to functions](./function-storage.md).\n",
"\n",
"If you're using the [Iguazio MLOps platform](https://www.iguazio.com/), and want to configure manually, use the `mount_v3io()` auto-mount modifier.<br>\n",
"If you're using another k8s PVC volume, use the `mlrun.platforms.mount_pvc(..)` modifier with the required parameters.\n",
"\n",
"This example uses the `auto_mount()` modifier. It auto-selects between the k8s PVC volume and the Iguazio data fabric. You can set the PVC volume configuration with the env var below or with the auto_mount params:\n",
"This example uses the `auto_mount()` modifier. It auto-selects between the k8s PVC volume and the Iguazio data fabric. You can set the PVC volume configuration with the env var below or with the `auto_mount` params:\n",
"```\n",
" MLRUN_PVC_MOUNT=<pvc-name>:<mount-path>\n",
"```\n",
"\n",
"If you apply `mount_v3io()` or `auto_mount()` when running the function in the MLOps platform, it attaches the function to Iguazio's real-time data fabric (mounted by default to _**home**_ of the current user).\n",
"\n",
"**Note**: If the notebook is not on the managed platform (it's running remotely) you may need to use secrets."
"**Note**: If the notebook is not on the managed platform (it's running remotely) you might need to use secrets."
]
},
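A minimal sketch of the manual mount options described above (the PVC name and mount paths are illustrative):

```python
import os
from mlrun.platforms import auto_mount, mount_pvc

# auto_mount() picks the v3io fabric on Iguazio, otherwise a k8s PVC; the env
# var below is only needed for the PVC case (name and path are illustrative).
os.environ['MLRUN_PVC_MOUNT'] = 'my-pvc:/home/jovyan/data'
trainer.apply(auto_mount())

# Or mount an explicit PVC instead:
# trainer.apply(mount_pvc(pvc_name='my-pvc', volume_name='data', volume_mount_path='/data'))
```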
{
@@ -280,40 +266,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When using AWS, you can use S3. You need a `secret` with AWS credentials. Create the AWS secret with the following command:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`kubectl create -n <namespace> secret generic my-aws --from-literal=AWS_ACCESS_KEY_ID=<access key> --from-literal=AWS_SECRET_ACCESS_KEY=<secret key>`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the secret:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# from kfp.aws import use_aws_secret"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# trainer.apply(use_aws_secret(secret_name='my-aws'))\n",
"# out = 's3://<your-bucket-name>/jobs/{{run.uid}}'"
"When using AWS, you can use S3. See more details in [S3](./store/datastore.html#s3)."
]
},
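As a hedged sketch, run outputs can be directed at an S3 path once credentials are configured (the bucket name is illustrative; see the linked datastore page for the credential options):

```python
# Hedged sketch: send run artifacts to S3 (bucket name is illustrative).
# Credentials must be configured first -- see the S3 datastore docs linked above.
out = 's3://my-bucket/jobs/{{run.uid}}'
run = trainer.run(handler='training', params={'p1': 5}, artifact_path=out)
```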
{
@@ -324,7 +277,7 @@
"\n",
"The `deploy()` command builds a custom container image (creates a cluster build job) from the outlined function dependencies.\n",
"\n",
"If a pre-built container image already exists, pass the `image` name instead. _**Note that the code and params can be updated per run without building a new image**_.\n",
"If a pre-built container image already exists, pass the `image` name instead. _**The code and params can be updated per run without building a new image**_.\n",
"\n",
"The image is stored in a container repository. By default it uses the repository configured on the MLRun API service. You can specify your own docker registry by first creating a secret, and adding that secret name to the build configuration:"
]
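A hedged sketch of pointing the build at a private registry (the secret, registry, and tag names are illustrative):

```python
# Hedged sketch: build into a private registry (names are illustrative).
# First create the registry secret in the cluster, e.g.:
#   kubectl -n <namespace> create secret docker-registry my-docker \
#     --docker-server=<registry-url> --docker-username=<user> --docker-password=<password>
trainer.build_config(image='<registry-url>/my-trainer:v1', secret='my-docker')
trainer.deploy()  # creates the cluster build job and pushes the image
```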
@@ -913,7 +866,8 @@
"source": [
"import kfp\n",
"from kfp import dsl\n",
"from mlrun import run_pipeline"
"from mlrun import run_pipeline\n",
"from mlrun import run_function, deploy_function"
]
},
{
@@ -934,14 +888,15 @@
" :param p1: A model parameter.\n",
" \"\"\"\n",
"\n",
" train = trainer.as_step(handler='training',\n",
" train = run_function('my-trainer',\n",
" handler='training',\n",
" params={'p1': p1},\n",
" outputs=['mymodel'])\n",
" \n",
" validate = trainer.as_step(handler='validation',\n",
" validate = run_function('my-trainer',\n",
" handler='validation',\n",
" inputs={'model': train.outputs['mymodel']},\n",
" outputs=['validation'])\n",
" "
" outputs=['validation']) "
]
},
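A hedged usage sketch for submitting this pipeline (it assumes the decorated pipeline function above is named `job_pipeline`; the experiment name, argument value, and artifact path are illustrative):

```python
# Hedged sketch: submit the pipeline via MLRun (the name `job_pipeline` is assumed).
arguments = {'p1': 8}
run_id = run_pipeline(job_pipeline, arguments=arguments, experiment='my-job',
                      artifact_path='v3io:///users/admin/kfp/{{workflow.uid}}/')
```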
{
@@ -955,19 +910,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Pipeline results are stored at the `artifact_path` location:"
"Pipeline results are stored at the `artifact_path` location. The artifact path for workflows can be one of:\n",
"- The project's `artifact_path` (set by `project.spec.artifact_path = '<some path>'`). MLRun adds `/{{workflow.uid}}` to the path if it does not already include it.\n",
"- MLRun's default `artifact-path`, if set. MLRun adds `/{{workflow.uid}}`' to the path if it does not already include it.\n",
"- The `artifact_path` as passed to the specific call for `run()`, as shown below. In this case, MLRun does not modify the user-provided path."
]
},
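A minimal sketch of the first two options (it assumes a `project` object obtained from `mlrun.get_or_create_project()`; the paths are illustrative):

```python
# Hedged sketch: project-level and global artifact-path defaults (paths illustrative).
project.spec.artifact_path = 'v3io:///projects/my-proj/artifacts'  # per-project default
mlrun.mlconf.artifact_path = 'v3io:///users/admin/artifacts'       # MLRun's global default
```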
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can generate a unique folder per workflow by adding ```/{{workflow.uid}}``` to the path ```mlrun```."
" If you want to customize the path, per workflow, use:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
