From cef96938e1ca0ceee70fa93d83ace3a5b8d138b0 Mon Sep 17 00:00:00 2001 From: Andrey Velichkevich Date: Fri, 2 Feb 2024 14:53:14 +0000 Subject: [PATCH] Katib: Update Katib Config Guide with the new APIs (#3665) * Katib: Update Katib Config Guide with new APIs * Fix a few lines * Update content/en/docs/components/katib/katib-config.md Co-authored-by: Yuki Iwai * Fix default value for waitAllProcesses --------- Co-authored-by: Yuki Iwai --- .../docs/components/katib/hyperparameter.md | 20 +- .../en/docs/components/katib/katib-config.md | 747 +++++++++--------- 2 files changed, 362 insertions(+), 405 deletions(-) diff --git a/content/en/docs/components/katib/hyperparameter.md b/content/en/docs/components/katib/hyperparameter.md index 819fe9f667..47a08d7587 100644 --- a/content/en/docs/components/katib/hyperparameter.md +++ b/content/en/docs/components/katib/hyperparameter.md @@ -43,8 +43,8 @@ Katib release (e.g. `v0.11.1`), modify `ref=master` to `ref=v0.11.1`. 1. **Basic Installation** - Run the following command to deploy Katib with the main components - (`katib-controller`, `katib-ui`, `katib-mysql`, `katib-db-manager`, and `katib-cert-generator`): + Run the following command to deploy the default Katib Control Plane components: + (`katib-controller`, `katib-ui`, `katib-mysql`, and `katib-db-manager`): ```shell kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master" @@ -73,7 +73,7 @@ Katib release (e.g. `v0.11.1`), modify `ref=master` to `ref=v0.11.1`. kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-cert-manager?ref=master" ``` - This installation uses Cert Manager instead of `katib-cert-generator` + This installation uses Cert Manager instead of Katib certificate generator to provision Katib webhooks certificates. You have to deploy Cert Manager on your Kubernetes cluster before deploying Katib using this installation. @@ -99,7 +99,7 @@ Katib release (e.g. 
`v0.11.1`), modify `ref=master` to `ref=v0.11.1`. kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-openshift?ref=master" ``` - This installation uses OpenShift service controller instead of `katib-cert-generator` + This installation uses OpenShift service controller instead of Katib certificate generator to provision Katib webhooks certificates. Above installations deploy @@ -123,7 +123,6 @@ Run the following command to verify that Katib components are running: $ kubectl get pods -n kubeflow NAME READY STATUS RESTARTS AGE -katib-cert-generator-79g7d 0/1 Completed 0 79s katib-controller-566595bdd8-8w7sx 1/1 Running 0 82s katib-db-manager-57cd769cdb-vt7zs 1/1 Running 0 82s katib-mysql-7894994f88-djp7m 1/1 Running 0 81s @@ -133,7 +132,12 @@ katib-ui-5767cfccdc-v9fcs 1/1 Running 0 80s - `katib-controller` - the controller to manage Katib Kubernetes CRDs ([`Experiment`](/docs/components/katib/overview/#experiment), [`Suggestion`](/docs/components/katib/overview/#suggestion), - [`Trial`](/docs/components/katib/overview/#trial)) + [`Trial`](/docs/components/katib/overview/#trial)). + + - (Optional) If certificate generator is enabled in + [Katib Config](/docs/components/katib/katib-config/), Katib controller deployment will create + self-signed certificate for the Katib webhooks. Learn more about the cert generator in the + [developer guide](https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md#katib-cert-generator). - `katib-ui` - the Katib user interface. @@ -141,10 +145,6 @@ katib-ui-5767cfccdc-v9fcs 1/1 Running 0 80s - `katib-mysql` - the `mysql` DB backend to store Katib experiments metrics. -- (Optional) `katib-cert-generator` - the certificate generator for Katib - standalone installation. 
Learn more about the cert generator in the - [developer guide](https://github.com/kubeflow/katib/blob/master/docs/developer-guide.md#katib-cert-generator) - ## Accessing the Katib UI You can use the Katib user interface (UI) to submit experiments and to monitor diff --git a/content/en/docs/components/katib/katib-config.md b/content/en/docs/components/katib/katib-config.md index 61859bf8aa..e678ae8630 100644 --- a/content/en/docs/components/katib/katib-config.md +++ b/content/en/docs/components/katib/katib-config.md @@ -6,436 +6,393 @@ weight = 70 +++ This guide describes -[Katib config](https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/components/controller/katib-config.yaml) — -the Kubernetes -[Config Map](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) that contains information about: +[the Katib Config](https://github.com/kubeflow/katib/blob/19268062f1b187dde48114628e527a2a35b01d64/manifests/v1beta1/installs/katib-standalone/katib-config.yaml) — +the main configuration file for every Katib component. We use Kubernetes +[ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) to +fetch that config into [the Katib control plane components](/docs/components/katib/hyperparameter/#katib-install). -1. Current - [metrics collectors](/docs/components/katib/experiment/#metrics-collector) - (`key = metrics-collector-sidecar`). - -1. Current - [algorithms](/docs/components/katib/experiment/#search-algorithms-in-detail) - (suggestions) (`key = suggestion`). - -1. Current - [early stopping algorithms](/docs/components/katib/early-stopping/#early-stopping-algorithms-in-detail) - (`key = early-stopping`). - -The Katib Config Map must be deployed in the +The ConfigMap must be deployed in the [`KATIB_CORE_NAMESPACE`](/docs/components/katib/env-variables/#katib-controller) -namespace with the `katib-config` name. The Katib controller parses the Katib config when -you submit your experiment. 
+namespace with the `katib-config` name. -You can edit this Config Map even after deploying Katib. - -If you are deploying Katib in the Kubeflow namespace, run this command to edit your Katib config: +Katib config has the initialization: `init` and the runtime: `runtime` parameters. You can modify +these parameters by editing the `katib-config` ConfigMap: ```shell kubectl edit configMap katib-config -n kubeflow ``` -## Metrics Collector Sidecar settings - -These settings are related to Katib metrics collectors, where: - -- key: `metrics-collector-sidecar` -- value: corresponding JSON settings for each metrics collector kind - -Example for the `File` metrics collector with all settings: - -```json -metrics-collector-sidecar: |- -{ - "File": { - "image": "docker.io/kubeflowkatib/file-metrics-collector", - "imagePullPolicy": "Always", - "resources": { - "requests": { - "memory": "200Mi", - "cpu": "250m", - "ephemeral-storage": "200Mi" - }, - "limits": { - "memory": "1Gi", - "cpu": "500m", - "ephemeral-storage": "2Gi" - } - }, - "waitAllProcesses": false - }, - ... -} +## Initialization Parameters + +Katib Config parameters set in `init` represent initialization settings for +the Katib control plane. These parameters can be modified before Katib control plane is deployed. + +```yaml +apiVersion: config.kubeflow.org/v1beta1 +kind: KatibConfig +init: + certGenerator: + enable: true + ... + controller: + trialResources: + - Job.v1.batch + - TFJob.v1.kubeflow.org + ... ``` -All of these settings except **`image`** can be omitted. If you don't specify any other settings, -a default value is set automatically. - -1. `image` - a Docker image for the `File` metrics collector's container (**must be specified**). - -1. `imagePullPolicy` - an [image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images) - for the `File` metrics collector's container. - - The default value is `IfNotPresent` - -1. 
`resources` - [resources](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) - for the `File` metrics collector's container. In the above example you - can check how to specify `limits` and `requests`. Currently, you can specify - only `memory`, `cpu` and `ephemeral-storage` resources. - - The default values for the `requests` are: - - - `memory = 10Mi` - - `cpu = 50m` - - `ephemeral-storage = 500Mi` - - The default values for the `limits` are: - - - `memory = 100Mi` - - `cpu = 500m` - - `ephemeral-storage = 5Gi` - - You can run your metrics collector's container without requesting - the `cpu`, `memory`, or `ephemeral-storage` resource from the Kubernetes cluster. - For instance, you have to remove `ephemeral-storage` from the container resources to use the - [Google Kubernetes Engine cluster autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#limitations). - - To remove specific resources from the metrics collector's container set the - negative values in requests and limits in your Katib config as follows: - - ```json - "requests": { - "cpu": "-1", - "memory": "-1", - "ephemeral-storage": "-1" - }, - "limits": { - "cpu": "-1", - "memory": "-1", - "ephemeral-storage": "-1" - } - ``` - -1. `waitAllProcesses` - a flag to define whether the metrics collector should - wait until all processes in the training container are finished before start - to collect metrics. - - The default value is `true` - -## Suggestion settings - -These settings are related to Katib suggestions, where: - -- key: `suggestion` -- value: corresponding JSON settings for each algorithm name - -If you want to use a new algorithm, you need to update the Katib config. 
For example, -using a `random` algorithm with all settings looks as follows: - -```json -suggestion: |- -{ - "random": { - "image": "docker.io/kubeflowkatib/suggestion-hyperopt", - "imagePullPolicy": "Always", - "resources": { - "requests": { - "memory": "100Mi", - "cpu": "100m", - "ephemeral-storage": "100Mi" - }, - "limits": { - "memory": "500Mi", - "cpu": "500m", - "ephemeral-storage": "3Gi" - } - }, - "serviceAccountName": "random-sa" - }, - ... -} +It has settings for the following Katib components: + +1. Katib certificate generator: `certGenerator` + +1. Katib controller: `controller` + +### Katib Certificate Generator Parameters + +The following parameters set in `.init.certGenerator` configure the Katib certificate generator: + +- `enable` - whether to enable Katib certificate generator. + + The default value is `false` + +- `webhookServiceName` - a service name for the Katib webhooks. If it is set, Katib certificate + generator is forcefully enabled. + + The default value is `katib-controller` + +- `webhookSecretName` - a secret name to store Katib webhooks certificates. If it is set, Katib + certificate generator is forcefully enabled. + + The default value is `katib-webhook-cert` + +### Katib Controller Parameters + +The following parameters set in `.init.controller` configure the Katib controller: + +- `experimentSuggestionName` - the implementation of Suggestion interface for + Experiment controller. + + The default value is `default` + +- `metricsAddr` - a TCP address that the Katib controller should bind to + for serving prometheus metrics. + + The default value is `8080` + +- `healthzAddr` - a TCP address that the Katib controller should bind to + for health probes. + + The default value is `18080` + +- `injectSecurityContext` - whether to inject security context to Katib metrics collector sidecar + container from Katib Trial training container. 
+ + The default value is `false` + +- `trialResources` - list of resources that can be used as a Trial template. The Trial resources + must be in this format: Kind.version.group (e.g. `TFJob.v1.kubeflow.org`). + Follow [this guide](/docs/components/katib/trial-template/#use-custom-kubernetes-resource-as-a-trial-template) + to understand how to make Katib Trial work with your Kubernetes CRDs. + + The default value is `[Job.v1.batch]` + +- `webhookPort` - a port number for Katib admission webhooks. + + The default value is `8443` + +- `enableLeaderElection` - whether to enable leader election for Katib controller. If this value + is true, only a single Katib controller Pod is active. + + The default value is `false` + +- `leaderElectionID` - an ID for the Katib controller leader election. + + The default value is `3fbc96e9.katib.kubeflow.org` + +## Runtime Parameters + +Katib Config parameters set in `runtime` represent runtime settings for +the Katib Experiment. These parameters can be modified before a Katib Experiment is created. When +a Katib Experiment is created, the Katib controller fetches the latest configuration from the +`katib-config` ConfigMap. + +```yaml +apiVersion: config.kubeflow.org/v1beta1 +kind: KatibConfig +runtime: + metricsCollectors: + - kind: StdOut + image: docker.io/kubeflowkatib/file-metrics-collector:latest + ... + suggestions: + - algorithmName: random + image: docker.io/kubeflowkatib/suggestion-hyperopt:latest + ... + earlyStoppings: + - algorithmName: medianstop + image: docker.io/kubeflowkatib/earlystopping-medianstop:latest + ... ``` -All of these settings except **`image`** can be omitted. If you don't specify -any other settings, a default value is set automatically. - -1. `image` - a Docker image for the suggestion's container with a `random` - algorithm (**must be specified**). - - Image example: `docker.io/kubeflowkatib/` - - For each algorithm (suggestion) you can specify one of the following - suggestion names in the Docker image: - -
-| Suggestion name | List of supported algorithms | Description |
-| --- | --- | --- |
-| suggestion-hyperopt | random, tpe | Hyperopt optimization framework |
-| suggestion-skopt | bayesianoptimization | Scikit-optimize optimization framework |
-| suggestion-goptuna | cmaes, random, tpe, sobol | Goptuna optimization framework |
-| suggestion-optuna | multivariate-tpe, tpe, cmaes, random, grid | Optuna optimization framework |
-| suggestion-hyperband | hyperband | Katib Hyperband implementation |
-| suggestion-pbt | pbt | Katib PBT implementation |
-| suggestion-enas | enas | Katib ENAS implementation |
-| suggestion-darts | darts | Katib DARTS implementation |
- -1. `imagePullPolicy` - an [image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images) - for the suggestion's container with a `random` algorithm. - - The default value is `IfNotPresent` - -1. `resources` - [resources](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) - for the suggestion's container with a `random` algorithm. - In the above example you can check how to specify `limits` and `requests`. - Currently, you can specify only `memory`, `cpu` and - `ephemeral-storage` resources. - - The default values for the `requests` are: - - - `memory = 10Mi` - - `cpu = 50m` - - `ephemeral-storage = 500Mi` - - The default values for the `limits` are: - - - `memory = 100Mi` - - `cpu = 500m` - - `ephemeral-storage = 5Gi` - - You can run your suggestion's container without requesting - the `cpu`, `memory`, or `ephemeral-storage` resource from the Kubernetes cluster. - For instance, you have to remove `ephemeral-storage` from the container resources to use the - [Google Kubernetes Engine cluster autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#limitations). - - To remove specific resources from the suggestion's container set the - negative values in requests and limits in your Katib config as follows: - - ```json - "requests": { - "cpu": "-1", - "memory": "-1", - "ephemeral-storage": "-1" - }, - "limits": { - "cpu": "-1", - "memory": "-1", - "ephemeral-storage": "-1" - } - ``` - -1. `serviceAccountName` - a [service account](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) - for the suggestion's container with a `random` algorithm. - - In the above example, the `random-sa` service account is attached for each - experiment's suggestion with a `random` algorithm until you change or delete - this service account from the Katib config. 
- - By default, the suggestion pod doesn't have any specific service account, - in which case, the pod uses the - [default](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server) - service account. - - **Note:** If you want to run your experiments with - [early stopping](/docs/components/katib/early-stopping/), - the suggestion's deployment must have permission to update the experiment's - trial status. If you don't specify a service account in the Katib config, - Katib controller creates required - [Kubernetes Role-based access control](https://kubernetes.io/docs/reference/access-authn-authz/rbac) - for the suggestion. - - If you need your own service account for the experiment's - suggestion with early stopping, you have to follow the rules: - - - The service account name can't be equal to - `-` - - - The service account must have sufficient permissions to update - the experiment's trial status. - -### Suggestion volume settings - -When you create an experiment with +### Metrics Collectors Parameters + +Parameters set in `.runtime.metricsCollectors` configure the container for +[the Katib metrics collector](/docs/components/katib/experiment/#metrics-collector). +The following settings are **required** for each Katib metrics collector that you want to use in your Katib Experiments: + +- `kind` - one of the Katib metrics collector types. + +- `image` - a Docker image for the metrics collector's container. + +The following settings are **optional**: + +- `imagePullPolicy` - an [image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images) + for the metrics collector's container. + + The default value is `IfNotPresent` + +- `resources` - [resources](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) + for the metrics collector's container.
+ + The default values for the `resources` are: + + ```yaml + metricsCollectors: + - kind: StdOut + image: docker.io/kubeflowkatib/file-metrics-collector:latest + resources: + requests: + cpu: 50m + memory: 10Mi + ephemeral-storage: 500Mi + limits: + cpu: 500m + memory: 100Mi + ephemeral-storage: 5Gi + ``` + + You can run your metrics collector's container without requesting + the `cpu`, `memory`, or `ephemeral-storage` resource from the Kubernetes cluster. + For instance, you have to remove `ephemeral-storage` from the container resources to use the + [Google Kubernetes Engine cluster autoscaler](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#limitations). + + To remove specific resources from the metrics collector's container, set + negative values in requests and limits in your Katib config as follows: + + ```yaml + resources: + requests: + cpu: -1 + memory: -1 + ephemeral-storage: -1 + limits: + cpu: -1 + memory: -1 + ephemeral-storage: -1 + ``` + +- `waitAllProcesses` - a flag to define whether the metrics collector should wait until all + processes in the Trial's training container are finished before it starts to collect metrics. + + The default value is `false` + +### Suggestions Parameters + +Parameters set in `.runtime.suggestions` configure the Deployment for +[the Katib Suggestions](/docs/components/katib/overview/#suggestion). Every Suggestion represents +one of the AutoML algorithms that you can use in Katib Experiments. +The following settings are **required** for Suggestion Deployment: + +- `algorithmName` - one of the Katib algorithm names. For example: `tpe` + +- `image` - a Docker image for the Suggestion Deployment's container. Image + example: `docker.io/kubeflowkatib/` + + For each algorithm you can specify one of the following suggestion names in the Docker image: + +
+| Suggestion name | List of supported algorithms | Description |
+| --- | --- | --- |
+| suggestion-hyperopt | random, tpe | Hyperopt optimization framework |
+| suggestion-skopt | bayesianoptimization | Scikit-optimize optimization framework |
+| suggestion-goptuna | cmaes, random, tpe, sobol | Goptuna optimization framework |
+| suggestion-optuna | multivariate-tpe, tpe, cmaes, random, grid | Optuna optimization framework |
+| suggestion-hyperband | hyperband | Katib Hyperband implementation |
+| suggestion-pbt | pbt | Katib PBT implementation |
+| suggestion-enas | enas | Katib ENAS implementation |
+| suggestion-darts | darts | Katib DARTS implementation |
+ +The following settings are **optional**: + +- `` - you can specify all + [container parameters](https://github.com/kubernetes/api/blob/669e693933c77e91648f8602dc2555d96e6279ad/core/v1/types.go#L2608) + inline for your Suggestion Deployment. For example, `resources` for container resources or + `env` for container environment variables. + + Configuration for `resources` works the same as for Katib metrics collector's container `resources`. + +- `serviceAccountName` - a [ServiceAccount](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) + for the Suggestion Deployment. + + By default, the Suggestion Pod doesn't have any specific ServiceAccount, + in which case, the Pod uses the + [default](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server) + service account. + + **Note:** If you want to run your Experiments with + [early stopping](/docs/components/katib/early-stopping/), + the Suggestion's Deployment must have permission to update the Experiment's + Trial status. If you don't specify a ServiceAccount in the Katib config, + Katib controller creates required + [Kubernetes Role-based access control](https://kubernetes.io/docs/reference/access-authn-authz/rbac) + for the suggestion. + + If you need your own ServiceAccount for the experiment's + suggestion with early stopping, you have to follow the rules: + + - The ServiceAccount name can't be equal to + `-` + + - The ServiceAccount must have sufficient permissions to update + the experiment's trial status. 
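As an illustration of the Suggestion settings described above, a minimal sketch of a single `runtime.suggestions` entry could look like the following. The resource values and the `my-suggestion-sa` ServiceAccount name are hypothetical placeholders chosen for this example, not Katib defaults; the remaining keys are the ones this guide documents.

```yaml
apiVersion: config.kubeflow.org/v1beta1
kind: KatibConfig
runtime:
  suggestions:
    - algorithmName: tpe
      image: docker.io/kubeflowkatib/suggestion-hyperopt:latest
      # Container parameters are written inline on the entry.
      imagePullPolicy: IfNotPresent
      resources:
        # Hypothetical values; omit `resources` to keep the defaults.
        requests:
          cpu: 100m
          memory: 100Mi
        limits:
          cpu: 500m
          memory: 500Mi
      # Hypothetical ServiceAccount; it must be able to update the
      # Trial status if the Experiment uses early stopping.
      serviceAccountName: my-suggestion-sa
```

If `serviceAccountName` is left out, the text above says the Katib controller creates the RBAC the Suggestion needs, so setting it is only necessary when you manage the permissions yourself.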
+ +#### Suggestion Volume Parameters + +When you create an Experiment with [`FromVolume` resume policy](/docs/components/katib/resume-experiment#resume-policy-fromvolume), you are able to specify [PersistentVolume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes) and [PersistentVolumeClaim (PVC)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) -settings for the experiment's suggestion. Learn more about Katib concepts -in the [overview guide](/docs/components/katib/overview/#suggestion). +settings for the Experiment's Suggestion to restore the state of the AutoML algorithm. If PV settings are empty, Katib controller creates only the PVC. -If you want to use the default volume specification, you can omit these settings. - -Follow the example for the `random` algorithm: - -```json -suggestion: |- -{ - "random": { - "image": "docker.io/kubeflowkatib/suggestion-hyperopt", - "volumeMountPath": "/opt/suggestion/data", - "persistentVolumeClaimSpec": { - "accessModes": [ - "ReadWriteMany" - ], - "resources": { - "requests": { - "storage": "3Gi" - } - }, - "storageClassName": "katib-suggestion" - }, - "persistentVolumeSpec": { - "accessModes": [ - "ReadWriteMany" - ], - "capacity": { - "storage": "3Gi" - }, - "hostPath": { - "path": "/tmp/suggestion/unique/path" - }, - "storageClassName": "katib-suggestion" - }, - "persistentVolumeLabels": { - "type": "local" - } - }, - ... -} +If you want to use the default volume specification, you can omit these parameters.
+ +For example, Suggestion volume config for `random` algorithm: + +```yaml +suggestions: + - algorithmName: random + image: docker.io/kubeflowkatib/suggestion-hyperopt:latest + volumeMountPath: /opt/suggestion/data + persistentVolumeClaimSpec: + accessModes: + - ReadWriteMany + resources: + requests: + storage: 3Gi + storageClassName: katib-suggestion + persistentVolumeSpec: + accessModes: + - ReadWriteMany + capacity: + storage: 3Gi + hostPath: + path: /tmp/suggestion/unique/path + storageClassName: katib-suggestion + persistentVolumeLabels: + type: local ``` -1. `volumeMountPath` - a [mount path](https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/#configure-a-volume-for-a-pod) - for the suggestion's container with `random` algorithm. +- `volumeMountPath` - a [mount path](https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/#configure-a-volume-for-a-pod) + for the Suggestion Deployment's container. - The default value is `/opt/katib/data` + The default value is `/opt/katib/data` -1. `persistentVolumeClaimSpec` - a [PVC specification](https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#persistentvolumeclaimspec-v1-core) - for the suggestion's PVC. +- `persistentVolumeClaimSpec` - a [PVC specification](https://github.com/kubernetes/api/blob/669e693933c77e91648f8602dc2555d96e6279ad/core/v1/types.go#L487) + for the Suggestion Deployment's PVC. 
- The default value is set, if you don't specify any of these settings: + The default value is: - - `persistentVolumeClaimSpec.accessModes[0]` - the default value is - [`ReadWriteOnce`](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) + ```yaml + persistentVolumeClaimSpec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 1Gi + ``` - - `persistentVolumeClaimSpec.resources.requests.storage` - the default value - is `1Gi` +- `persistentVolumeSpec` - a [PV specification](https://github.com/kubernetes/api/blob/669e693933c77e91648f8602dc2555d96e6279ad/core/v1/types.go#L324) + for the Suggestion Deployment's PV. -1. `persistentVolumeSpec` - a [PV specification](https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#persistentvolumespec-v1-core) - for the suggestion's PV. + Suggestion Deployment's PV always has **`persistentVolumeReclaimPolicy: Delete`** to properly + remove all resources once Katib experiment is deleted. To know more about PV reclaim policies + check the + [Kubernetes documentation](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming). - PV `persistentVolumeReclaimPolicy` is always equal to **`Delete`** to properly - remove all resources once Katib experiment is deleted. To know more about - PV reclaim policies check the - [Kubernetes documentation](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming). +- `persistentVolumeLabels` - [PV labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/) + for the Suggestion Deployment's PV. -1. `persistentVolumeLabels` - [PV labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/) - for the suggestion's PV. +### Early Stoppings Parameters -## Early stopping settings +Parameters set in `runtime.earlyStoppings` configure container for +[the Katib Early Stopping algorithms](/docs/components/katib/early-stopping/#early-stopping-algorithms-in-detail). 
+The following settings are **required** for each early stopping algorithm that you want +to use in your Katib Experiments: -These settings are related to Katib early stopping, where: +- `algorithmName` - one of the early stopping algorithm names (e.g. `medianstop`). -- key: `early-stopping` -- value: corresponding JSON settings for each early stopping algorithm name +- `image` - a Docker image for the early stopping container. -If you want to use a new early stopping algorithm, you need to update the -Katib config. For example, using a `medianstop` early stopping algorithm with -all settings looks as follows: +The following settings are **optional**: -```json -early-stopping: |- -{ - "medianstop": { - "image": "docker.io/kubeflowkatib/earlystopping-medianstop", - "imagePullPolicy": "Always" - }, - ... -} -``` +- `imagePullPolicy` - an [image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images) + for the early stopping's container. + + The default value is `IfNotPresent` + +- `resources` - [resources](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container) + for the early stopping's container. -All of these settings except **`image`** can be omitted. If you don't specify -any other settings, a default value is set automatically. - -1. `image` - a Docker image for the early stopping's container with a - `medianstop` algorithm (**must be specified**). - - Image example: `docker.io/kubeflowkatib/` - - For each early stopping algorithm you can specify one of the following - early stopping names in the Docker image: - -
-| Early stopping name | Early stopping algorithm | Description |
-| --- | --- | --- |
-| earlystopping-medianstop | medianstop | Katib Median Stopping implementation |
- -1. `imagePullPolicy` - an - [image pull policy](https://kubernetes.io/docs/concepts/configuration/overview/#container-images) - for the early stopping's container with a `medianstop` algorithm. - - The default value is `IfNotPresent` + Configuration for `resources` works the same as for Katib metrics collector's container `resources`. ## Next steps