17 changes: 9 additions & 8 deletions SETUP.md
@@ -10,8 +10,9 @@ defines a `mlbatch-edit` role which enforces these restrictions and
will be used in the setup process for each team of MLBatch users that
is onboarded.

This setup has been developed on Red Hat OpenShift 4.14 and Kubernetes 1.27 and
is intended to support Red Hat OpenShift 4.14 and up and/or Kubernetes 1.27 and up.
This setup has been developed on Red Hat OpenShift 4.14, Red Hat OpenShift 4.16,
and Kubernetes 1.29 and is intended to support Red Hat OpenShift 4.14 and up
and/or Kubernetes 1.29 and up.
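
As a rough way to check a cluster version against the minimums stated above, the comparison can be sketched in shell. The helper name `version_at_least` is hypothetical and not part of this repository, and the sketch assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```sh
# Hypothetical helper: succeeds (exit 0) when version $2 is greater than
# or equal to the minimum version $1.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
  minimum="$1"
  actual="$2"
  # The lower of the two versions sorts first.
  lowest="$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}

version_at_least 1.29 1.30 && echo "Kubernetes 1.30 meets the minimum"
version_at_least 4.14 4.16 && echo "OpenShift 4.16 meets the minimum"
version_at_least 1.29 1.27 || echo "Kubernetes 1.27 is below the minimum"
```

The helper only compares two dotted version strings; how the version string is obtained from a cluster is deliberately left out.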

To start with, recursively clone and enter this repository:
```sh
@@ -35,7 +36,7 @@ Instructions are provided for the following Red Hat OpenShift AI ***stable*** re
+ [RHOAI 2.16 Team Setup](./setup.RHOAI-v2.16/TEAM-SETUP.md)
+ [UPGRADING from RHOAI 2.13](./setup.RHOAI-v2.16/UPGRADE-STABLE.md)
+ [UPGRADING from RHOAI 2.15](./setup.RHOAI-v2.16/UPGRADE-FAST.md)
+ [RHOAI 2.16 Uninstall](./setup.RHOAI-v2.15/UNINSTALL.md)
+ [RHOAI 2.16 Uninstall](./setup.RHOAI-v2.16/UNINSTALL.md)
+ Red Hat OpenShift AI 2.13
+ [RHOAI 2.13 Cluster Setup](./setup.RHOAI-v2.13/CLUSTER-SETUP.md)
+ [RHOAI 2.13 Team Setup](./setup.RHOAI-v2.13/TEAM-SETUP.md)
@@ -44,11 +45,11 @@ Instructions are provided for the following Red Hat OpenShift AI ***stable*** re
+ [RHOAI 2.13 Uninstall](./setup.RHOAI-v2.13/UNINSTALL.md)

Instructions are provided for the following Red Hat OpenShift AI ***fast*** releases:
+ Red Hat OpenShift AI 2.15
+ [RHOAI 2.15 Cluster Setup](./setup.RHOAI-v2.15/CLUSTER-SETUP.md)
+ [RHOAI 2.15 Team Setup](./setup.RHOAI-v2.15/TEAM-SETUP.md)
+ [UPGRADING from RHOAI 2.14](./setup.RHOAI-v2.15/UPGRADE.md)
+ [RHOAI 2.15 Uninstall](./setup.RHOAI-v2.15/UNINSTALL.md)
+ Red Hat OpenShift AI 2.17
+ [RHOAI 2.17 Cluster Setup](./setup.RHOAI-v2.17/CLUSTER-SETUP.md)
+ [RHOAI 2.17 Team Setup](./setup.RHOAI-v2.17/TEAM-SETUP.md)
+ [UPGRADING from RHOAI 2.16](./setup.RHOAI-v2.17/UPGRADE.md)
+ [RHOAI 2.17 Uninstall](./setup.RHOAI-v2.17/UNINSTALL.md)

## Kubernetes

@@ -7,7 +7,7 @@ cluster roles, and priority classes.

Create `default-priority`, `high-priority`, and `low-priority` priority classes:
```sh
oc apply -f setup.RHOAI-v2.15/mlbatch-priorities.yaml
oc apply -f setup.RHOAI-v2.17/mlbatch-priorities.yaml
```

## Coscheduler
@@ -20,15 +20,15 @@ helm install scheduler-plugins --namespace scheduler-plugins --create-namespace
```
Patch Coscheduler pod priorities:
```sh
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.15/coscheduler-priority-patch.yaml scheduler-plugins-controller
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.15/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-controller
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
```

## Red Hat OpenShift AI

Create the Red Hat OpenShift AI subscription:
```sh
oc apply -f setup.RHOAI-v2.15/mlbatch-subscription.yaml
oc apply -f setup.RHOAI-v2.17/mlbatch-subscription.yaml
```
Identify install plan:
```sh
@@ -45,11 +45,11 @@ oc patch ip -n redhat-ods-operator --type merge --patch '{"spec":{"approved":tru
```
Create DSC Initialization:
```sh
oc apply -f setup.RHOAI-v2.15/mlbatch-dsci.yaml
oc apply -f setup.RHOAI-v2.17/mlbatch-dsci.yaml
```
Create Data Science Cluster:
```sh
oc apply -f setup.RHOAI-v2.15/mlbatch-dsc.yaml
oc apply -f setup.RHOAI-v2.17/mlbatch-dsc.yaml
```
The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift
AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The
@@ -64,6 +64,7 @@ AI configuration as follows:
- `batch/job` integration is disabled,
- `waitForPodsReady` is disabled,
- `LendingLimit` feature gate is enabled,
- `fairSharing` is enabled,
- `enableClusterQueueResources` metrics are enabled,
- Codeflare operator:
- the AppWrapper controller is enabled and configured as follows:
@@ -79,14 +80,14 @@ AI configuration as follows:

Create Kueue's default flavor:
```sh
oc apply -f setup.RHOAI-v2.15/default-flavor.yaml
oc apply -f setup.RHOAI-v2.17/default-flavor.yaml
```

## Cluster Role

Create `mlbatch-edit` role:
```sh
oc apply -f setup.RHOAI-v2.15/mlbatch-edit-role.yaml
oc apply -f setup.RHOAI-v2.17/mlbatch-edit-role.yaml
```

## Slack Cluster Queue
File renamed without changes.
File renamed without changes.
13 changes: 7 additions & 6 deletions setup.RHOAI-v2.15/UPGRADE.md → setup.RHOAI-v2.17/UPGRADE.md
@@ -1,11 +1,12 @@
# Upgrading from RHOAI 2.14
# Upgrading from RHOAI 2.16

These instructions assume you installed and configured RHOAI 2.14 following
the MLBatch [install instructions for RHOAI-v2.14](../setup.RHOAI-v2.14/CLUSTER-SETUP.md)
These instructions assume you installed and configured RHOAI 2.16 following
the MLBatch [install instructions for RHOAI-v2.16](../setup.RHOAI-v2.16/CLUSTER-SETUP.md)
or the [fast stream upgrade instructions for RHOAI-v2.16](../setup.RHOAI-v2.16/UPGRADE-FAST.md)
and are subscribed to the fast channel.

Your subscription will have automatically created an unapproved
install plan to upgrade to RHOAI 2.15.
install plan to upgrade to RHOAI 2.17.

Before beginning, verify that the expected install plan exists:
```sh
@@ -14,8 +15,8 @@ oc get ip -n redhat-ods-operator
Typical output would be:
```sh
NAME CSV APPROVAL APPROVED
install-kpzzl rhods-operator.2.15.0 Manual false
install-nqrbp rhods-operator.2.14.0 Manual true
install-kpzzl rhods-operator.2.17.0 Manual false
install-nqrbp rhods-operator.2.16.0 Manual true
```
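
The check above could be scripted by filtering the install plan table for rows whose `APPROVED` column is `false`. This is only a sketch: the function name `pending_install_plans` is hypothetical, and it assumes the four-column layout shown in the typical output, which may differ across `oc` versions.

```sh
# Hypothetical helper: print the names of install plans that are not yet
# approved. Reads the tabular output of `oc get ip -n redhat-ods-operator`
# on stdin and assumes the column order NAME, CSV, APPROVAL, APPROVED.
pending_install_plans() {
  # Skip the header row, then keep rows whose fourth column is "false".
  awk 'NR > 1 && $4 == "false" { print $1 }'
}

# Sample input copied from the typical output shown above.
sample='NAME            CSV                     APPROVAL   APPROVED
install-kpzzl   rhods-operator.2.17.0   Manual     false
install-nqrbp   rhods-operator.2.16.0   Manual     true'

printf '%s\n' "$sample" | pending_install_plans
```

Against a live cluster, the same function could be fed from `oc get ip -n redhat-ods-operator` instead of the sample text.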

Assuming the install plan exists, you can begin the upgrade process.
File renamed without changes.
@@ -102,13 +102,9 @@ data:
appwrapper:
enabled: true
Config:
manageJobsWithoutQueueName: true
userRBACAdmissionCheck: false
schedulerName: scheduler-plugins-scheduler
defaultQueueName: default-queue
slackQueueName: slack-cluster-queue
autopilot:
injectAntiAffinities: false
injectAntiAffinities: true
monitorNodes: true
resourceTaints:
nvidia.com/gpu:
- key: autopilot.ibm.com/gpuhealth
@@ -120,6 +116,16 @@ data:
- key: autopilot.ibm.com/gpuhealth
value: EVICT
effect: NoExecute
defaultQueueName: default-queue
enableKueueIntegrations: true
kueueJobReconciller:
manageJobsWithoutQueueName: true
waitForPodsReady:
blockAdmission: false
enable: false
schedulerName: scheduler-plugins-scheduler
slackQueueName: slack-cluster-queue
userRBACAdmissionCheck: false
---
apiVersion: v1
kind: ConfigMap
@@ -200,6 +206,9 @@ data:
# - key: kubernetes.io/metadata.name
# operator: NotIn
# values: [ kube-system, kueue-system ]
fairSharing:
enable: true
preemptionStrategies: [LessThanOrEqualToFinalShare, LessThanInitialShare]
manager_config_patch.yaml: |
apiVersion: apps/v1
kind: Deployment
@@ -266,7 +275,7 @@ spec:
name: rhods-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
startingCSV: rhods-operator.2.15.0
startingCSV: rhods-operator.2.17.0
config:
env:
- name: "DISABLE_DSC_CONFIG"
4 changes: 2 additions & 2 deletions setup.tmpl/Makefile
@@ -23,10 +23,10 @@ help: ## Display this help.
docs: gotmpl
../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.13/CLUSTER-SETUP.md -values RHOAI-v2.13.yaml
../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.13/TEAM-SETUP.md -values RHOAI-v2.13.yaml
../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.15/CLUSTER-SETUP.md -values RHOAI-v2.15.yaml
../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.15/TEAM-SETUP.md -values RHOAI-v2.15.yaml
../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.16/CLUSTER-SETUP.md -values RHOAI-v2.16.yaml
../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.16/TEAM-SETUP.md -values RHOAI-v2.16.yaml
../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.RHOAI-v2.17/CLUSTER-SETUP.md -values RHOAI-v2.17.yaml
../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.RHOAI-v2.17/TEAM-SETUP.md -values RHOAI-v2.17.yaml
../tools/gotmpl/gotmpl -input ./CLUSTER-SETUP.md.tmpl -output ../setup.k8s/CLUSTER-SETUP.md -values Kubernetes.yaml
../tools/gotmpl/gotmpl -input ./TEAM-SETUP.md.tmpl -output ../setup.k8s/TEAM-SETUP.md -values Kubernetes.yaml
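
The RHOAI entries in the `docs` target follow a regular pattern: one `gotmpl` invocation per version and per document. A minimal sketch of that pattern is below. The function name `print_doc_commands` is hypothetical, and the sketch only prints the commands rather than running them, so it does not need `gotmpl` to be built.

```sh
# Hypothetical helper: print (rather than run) the gotmpl command for each
# RHOAI version and document. The version list mirrors the RHOAI entries
# in the docs target above.
print_doc_commands() {
  for version in RHOAI-v2.13 RHOAI-v2.16 RHOAI-v2.17; do
    for doc in CLUSTER-SETUP TEAM-SETUP; do
      printf '%s\n' "../tools/gotmpl/gotmpl -input ./${doc}.md.tmpl -output ../setup.${version}/${doc}.md -values ${version}.yaml"
    done
  done
}

print_doc_commands
```

The Kubernetes entries are left out of the loop because they use a different directory (`setup.k8s`) and values file (`Kubernetes.yaml`) and so do not fit the version-based naming.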

7 changes: 0 additions & 7 deletions setup.tmpl/RHOAI-v2.15.yaml

This file was deleted.

7 changes: 7 additions & 0 deletions setup.tmpl/RHOAI-v2.17.yaml
@@ -0,0 +1,7 @@
# Values for RHOAI 2.17

OPENSHIFT: true
VERSION: RHOAI-v2.17
KUBECTL: oc
SLACKCQ: true
FAIRSHARE: true