diff --git a/SETUP.md b/SETUP.md
index 29e5c6b..db4ec25 100644
--- a/SETUP.md
+++ b/SETUP.md
@@ -10,8 +10,9 @@ defines a `mlbatch-edit` role which enforces these restrictions and will be used
 in the setup process for each team of MLBatch users that is onboarded.
 
-This setup has been developed on Red Hat OpenShift 4.14 and Kubernetes 1.27 and
-is intended to support Red Hat OpenShift 4.14 and up and/or Kubernetes 1.27 and up.
+This setup has been developed on Red Hat OpenShift 4.14, Red Hat OpenShift 4.16,
+and Kubernetes 1.29 and is intended to support Red Hat OpenShift 4.14 and up
+and/or Kubernetes 1.29 and up.
 
 To start with, recursively clone and enter this repository:
 ```sh
@@ -35,7 +36,7 @@ Instructions are provided for the following Red Hat OpenShift AI ***stable*** re
     + [RHOAI 2.16 Team Setup](./setup.RHOAI-v2.16/TEAM-SETUP.md)
     + [UPGRADING from RHOAI 2.13](./setup.RHOAI-v2.16/UPGRADE-STABLE.md)
     + [UPGRADING from RHOAI 2.15](./setup.RHOAI-v2.16/UPGRADE-FAST.md)
-    + [RHOAI 2.16 Uninstall](./setup.RHOAI-v2.15/UNINSTALL.md)
+    + [RHOAI 2.16 Uninstall](./setup.RHOAI-v2.16/UNINSTALL.md)
 + Red Hat OpenShift AI 2.13
     + [RHOAI 2.13 Cluster Setup](./setup.RHOAI-v2.13/CLUSTER-SETUP.md)
     + [RHOAI 2.13 Team Setup](./setup.RHOAI-v2.13/TEAM-SETUP.md)
@@ -44,11 +45,11 @@ Instructions are provided for the following Red Hat OpenShift AI ***stable*** re
     + [RHOAI 2.13 Uninstall](./setup.RHOAI-v2.13/UNINSTALL.md)
 
 Instructions are provided for the following Red Hat OpenShift AI ***fast*** releases:
-+ Red Hat OpenShift AI 2.15
-    + [RHOAI 2.15 Cluster Setup](./setup.RHOAI-v2.15/CLUSTER-SETUP.md)
-    + [RHOAI 2.15 Team Setup](./setup.RHOAI-v2.15/TEAM-SETUP.md)
-    + [UPGRADING from RHOAI 2.14](./setup.RHOAI-v2.15/UPGRADE.md)
-    + [RHOAI 2.15 Uninstall](./setup.RHOAI-v2.15/UNINSTALL.md)
++ Red Hat OpenShift AI 2.17
+    + [RHOAI 2.17 Cluster Setup](./setup.RHOAI-v2.17/CLUSTER-SETUP.md)
+    + [RHOAI 2.17 Team Setup](./setup.RHOAI-v2.17/TEAM-SETUP.md)
+    + [UPGRADING from RHOAI 2.16](./setup.RHOAI-v2.17/UPGRADE.md)
+    + [RHOAI 2.17 Uninstall](./setup.RHOAI-v2.17/UNINSTALL.md)
 
 ## Kubernetes
 
diff --git a/setup.RHOAI-v2.15/CLUSTER-SETUP.md b/setup.RHOAI-v2.17/CLUSTER-SETUP.md
similarity index 89%
rename from setup.RHOAI-v2.15/CLUSTER-SETUP.md
rename to setup.RHOAI-v2.17/CLUSTER-SETUP.md
index 1a4680b..3fee15f 100644
--- a/setup.RHOAI-v2.15/CLUSTER-SETUP.md
+++ b/setup.RHOAI-v2.17/CLUSTER-SETUP.md
@@ -7,7 +7,7 @@ cluster roles, and priority classes.
 
 Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 ```sh
-oc apply -f setup.RHOAI-v2.15/mlbatch-priorities.yaml
+oc apply -f setup.RHOAI-v2.17/mlbatch-priorities.yaml
 ```
 
 ## Coscheduler
@@ -20,15 +20,15 @@ helm install scheduler-plugins --namespace scheduler-plugins --create-namespace
 ```
 Patch Coscheduler pod priorities:
 ```sh
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.15/coscheduler-priority-patch.yaml scheduler-plugins-controller
-oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.15/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-controller
+oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```
 
 ## Red Hat OpenShift AI
 
 Create the Red Hat OpenShift AI subscription:
 ```sh
-oc apply -f setup.RHOAI-v2.15/mlbatch-subscription.yaml
+oc apply -f setup.RHOAI-v2.17/mlbatch-subscription.yaml
 ````
 Identify install plan:
 ```sh
@@ -45,11 +45,11 @@ oc patch ip -n redhat-ods-operator --type merge --patch '{"spec":{"approved":tru
 ```
 Create DSC Initialization:
 ```sh
-oc apply -f setup.RHOAI-v2.15/mlbatch-dsci.yaml
+oc apply -f setup.RHOAI-v2.17/mlbatch-dsci.yaml
 ```
 Create Data Science Cluster:
 ```sh
-oc apply -f setup.RHOAI-v2.15/mlbatch-dsc.yaml
+oc apply -f setup.RHOAI-v2.17/mlbatch-dsc.yaml
 ```
 The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift
 AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The
@@ -64,6 +64,7 @@ AI configuration as follows:
   - `batch/job` integration is disabled,
   - `waitForPodsReady` is disabled,
   - `LendingLimit` feature gate is enabled,
+  - `fairSharing` is enabled,
   - `enableClusterQueueResources` metrics is enabled,
 - Codeflare operator:
   - the AppWrapper controller is enabled and configured as follows:
@@ -79,14 +80,14 @@ AI configuration as follows:
 
 Create Kueue's default flavor:
 ```sh
-oc apply -f setup.RHOAI-v2.15/default-flavor.yaml
+oc apply -f setup.RHOAI-v2.17/default-flavor.yaml
 ```
 
 ## Cluster Role
 
 Create `mlbatch-edit` role:
 ```sh
-oc apply -f setup.RHOAI-v2.15/mlbatch-edit-role.yaml
+oc apply -f setup.RHOAI-v2.17/mlbatch-edit-role.yaml
 ```
 
 ## Slack Cluster Queue
diff --git a/setup.RHOAI-v2.15/TEAM-SETUP.md b/setup.RHOAI-v2.17/TEAM-SETUP.md
similarity index 100%
rename from setup.RHOAI-v2.15/TEAM-SETUP.md
rename to setup.RHOAI-v2.17/TEAM-SETUP.md
diff --git a/setup.RHOAI-v2.15/UNINSTALL.md b/setup.RHOAI-v2.17/UNINSTALL.md
similarity index 100%
rename from setup.RHOAI-v2.15/UNINSTALL.md
rename to setup.RHOAI-v2.17/UNINSTALL.md
diff --git a/setup.RHOAI-v2.15/UPGRADE.md b/setup.RHOAI-v2.17/UPGRADE.md
similarity index 67%
rename from setup.RHOAI-v2.15/UPGRADE.md
rename to setup.RHOAI-v2.17/UPGRADE.md
index 1221806..c16b6a2 100644
--- a/setup.RHOAI-v2.15/UPGRADE.md
+++ b/setup.RHOAI-v2.17/UPGRADE.md
@@ -1,11 +1,12 @@
-# Upgrading from RHOAI 2.14
+# Upgrading from RHOAI 2.16
 
-These instructions assume you installed and configured RHOAI 2.14 following
-the MLBatch [install instructions for RHOAI-v2.14](../setup.RHOAI-v2.14/CLUSTER-SETUP.md)
+These instructions assume you installed and configured RHOAI 2.16 following
+the MLBatch [install instructions for RHOAI-v2.16](../setup.RHOAI-v2.16/CLUSTER-SETUP.md)
+or the [fast stream upgrade instructions for RHOAI-V2.16](../setup.RHOAI-v2.16/UPGRADE-FAST.md)
 and are subscribed to the fast channel.
 
 Your subscription will have automatically created an unapproved
-install plan to upgrade to RHOAI 2.15.
+install plan to upgrade to RHOAI 2.17.
 
 Before beginning, verify that the expected install plan exists:
 ```sh
@@ -14,8 +15,8 @@ oc get ip -n redhat-ods-operator
 Typical output would be:
 ```sh
 NAME            CSV                     APPROVAL   APPROVED
-install-kpzzl   rhods-operator.2.15.0   Manual     false
-install-nqrbp   rhods-operator.2.14.0   Manual     true
+install-kpzzl   rhods-operator.2.17.0   Manual     false
+install-nqrbp   rhods-operator.2.16.0   Manual     true
 ```
 
 Assuming the install plan exists you can begin the upgrade process.
diff --git a/setup.RHOAI-v2.15/coscheduler-priority-patch.yaml b/setup.RHOAI-v2.17/coscheduler-priority-patch.yaml
similarity index 100%
rename from setup.RHOAI-v2.15/coscheduler-priority-patch.yaml
rename to setup.RHOAI-v2.17/coscheduler-priority-patch.yaml
diff --git a/setup.RHOAI-v2.15/default-flavor.yaml b/setup.RHOAI-v2.17/default-flavor.yaml
similarity index 100%
rename from setup.RHOAI-v2.15/default-flavor.yaml
rename to setup.RHOAI-v2.17/default-flavor.yaml
diff --git a/setup.RHOAI-v2.15/mlbatch-dsc.yaml b/setup.RHOAI-v2.17/mlbatch-dsc.yaml
similarity index 100%
rename from setup.RHOAI-v2.15/mlbatch-dsc.yaml
rename to setup.RHOAI-v2.17/mlbatch-dsc.yaml
diff --git a/setup.RHOAI-v2.15/mlbatch-dsci.yaml b/setup.RHOAI-v2.17/mlbatch-dsci.yaml
similarity index 100%
rename from setup.RHOAI-v2.15/mlbatch-dsci.yaml
rename to setup.RHOAI-v2.17/mlbatch-dsci.yaml
diff --git a/setup.RHOAI-v2.15/mlbatch-edit-role.yaml b/setup.RHOAI-v2.17/mlbatch-edit-role.yaml
similarity index 100%
rename from setup.RHOAI-v2.15/mlbatch-edit-role.yaml
rename to setup.RHOAI-v2.17/mlbatch-edit-role.yaml
diff --git a/setup.RHOAI-v2.15/mlbatch-priorities.yaml b/setup.RHOAI-v2.17/mlbatch-priorities.yaml
similarity index 100%
rename from setup.RHOAI-v2.15/mlbatch-priorities.yaml
rename to setup.RHOAI-v2.17/mlbatch-priorities.yaml
diff --git a/setup.RHOAI-v2.15/mlbatch-subscription.yaml b/setup.RHOAI-v2.17/mlbatch-subscription.yaml
similarity index 95%
rename from setup.RHOAI-v2.15/mlbatch-subscription.yaml
rename to setup.RHOAI-v2.17/mlbatch-subscription.yaml
index 8259a23..71b0c63 100644
--- a/setup.RHOAI-v2.15/mlbatch-subscription.yaml
+++ b/setup.RHOAI-v2.17/mlbatch-subscription.yaml
@@ -102,13 +102,9 @@ data:
     appwrapper:
       enabled: true
       Config:
-        manageJobsWithoutQueueName: true
-        userRBACAdmissionCheck: false
-        schedulerName: scheduler-plugins-scheduler
-        defaultQueueName: default-queue
-        slackQueueName: slack-cluster-queue
         autopilot:
-          injectAntiAffinities: false
+          injectAntiAffinities: true
+          monitorNodes: true
           resourceTaints:
             nvidia.com/gpu:
             - key: autopilot.ibm.com/gpuhealth
@@ -120,6 +116,16 @@ data:
             - key: autopilot.ibm.com/gpuhealth
               value: EVICT
               effect: NoExecute
+        defaultQueueName: default-queue
+        enableKueueIntegrations: true
+        kueueJobReconciller:
+          manageJobsWithoutQueueName: true
+          waitForPodsReady:
+            blockAdmission: false
+            enable: false
+        schedulerName: scheduler-plugins-scheduler
+        slackQueueName: slack-cluster-queue
+        userRBACAdmissionCheck: false
 ---
 apiVersion: v1
 kind: ConfigMap
@@ -200,6 +206,9 @@ data:
     #  - key: kubernetes.io/metadata.name
     #    operator: NotIn
     #    values: [ kube-system, kueue-system ]
+    fairSharing:
+      enable: true
+      preemptionStrategies: [LessThanOrEqualToFinalShare, LessThanInitialShare]
   manager_config_patch.yaml: |
     apiVersion: apps/v1
     kind: Deployment
@@ -266,7 +275,7 @@ spec:
   name: rhods-operator
   source: redhat-operators
   sourceNamespace: openshift-marketplace
-  startingCSV: rhods-operator.2.15.0
+  startingCSV: rhods-operator.2.17.0
   config:
     env:
     - name: "DISABLE_DSC_CONFIG"
diff --git a/setup.tmpl/Makefile b/setup.tmpl/Makefile
index 1629ad8..e6b9781 100644
--- a/setup.tmpl/Makefile
+++ b/setup.tmpl/Makefile
@@ -23,10 +23,10 @@ help: ## Display this help.
 docs: gotmpl
 	../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.13/CLUSTER-SETUP.md -values RHOAI-v2.13.yaml
 	../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.13/TEAM-SETUP.md -values RHOAI-v2.13.yaml
-	../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.15/CLUSTER-SETUP.md -values RHOAI-v2.15.yaml
-	../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.15/TEAM-SETUP.md -values RHOAI-v2.15.yaml
 	../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.16/CLUSTER-SETUP.md -values RHOAI-v2.16.yaml
 	../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.16/TEAM-SETUP.md -values RHOAI-v2.16.yaml
+	../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.17/CLUSTER-SETUP.md -values RHOAI-v2.17.yaml
+	../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.17/TEAM-SETUP.md -values RHOAI-v2.17.yaml
 	../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.k8s/CLUSTER-SETUP.md -values Kubernetes.yaml
 	../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.k8s/TEAM-SETUP.md -values Kubernetes.yaml
 
diff --git a/setup.tmpl/RHOAI-v2.15.yaml b/setup.tmpl/RHOAI-v2.15.yaml
deleted file mode 100644
index 8999636..0000000
--- a/setup.tmpl/RHOAI-v2.15.yaml
+++ /dev/null
@@ -1,7 +0,0 @@
-# Values for RHOAI 2.15
-
-OPENSHIFT: true
-VERSION: RHOAI-v2.15
-KUBECTL: oc
-SLACKCQ: true
-FAIRSHARE: false
diff --git a/setup.tmpl/RHOAI-v2.17.yaml b/setup.tmpl/RHOAI-v2.17.yaml
new file mode 100644
index 0000000..5ab765b
--- /dev/null
+++ b/setup.tmpl/RHOAI-v2.17.yaml
@@ -0,0 +1,7 @@
+# Values for RHOAI 2.17
+
+OPENSHIFT: true
+VERSION: RHOAI-v2.17
+KUBECTL: oc
+SLACKCQ: true
+FAIRSHARE: true