# RHOAI Platform Deployment

This notebook is an interactive walkthrough for deploying the full Red Hat OpenShift AI (RHOAI) platform stack.

## 1. Install OpenShift GitOps

**Important**: Install the GitOps operator directly with `oc` (not through ArgoCD), because no ArgoCD instance exists yet.

In [1]:
%%bash
# Step 1: Install the operator subscription
oc apply -k platform/gitops-operator/base/

namespace/openshift-gitops created
subscription.operators.coreos.com/openshift-gitops-operator created


In [2]:
%%bash
# Wait for the operator to install (1-2 minutes)
oc wait --for=condition=Available deployment/openshift-gitops-operator-controller-manager \
  -n openshift-operators --timeout=300s

deployment.apps/openshift-gitops-operator-controller-manager condition met


In [3]:
%%bash
# Step 2: Create the ArgoCD instance
oc apply -k platform/gitops-operator/instance/

clusterrolebinding.rbac.authorization.k8s.io/openshift-gitops-admin created
argocd.argoproj.io/openshift-gitops configured


In [4]:
%%bash
# Wait for ArgoCD server to be ready (1-2 minutes)
oc wait --for=condition=Ready pod -l app.kubernetes.io/name=openshift-gitops-server \
  -n openshift-gitops --timeout=300s

pod/openshift-gitops-server-674758b98b-f7p2l condition met


## 2. Install RHOAI

RHOAI 3.x requires several dependencies to be installed before the operator itself. The ArgoCD Application applied below installs them automatically.

In [15]:
%%bash
# This deploys:
# - Node Feature Discovery (NFD) operator
# - Red Hat build of Kueue operator
# - RHOAI operator subscription (fast-3.x channel)
# - DataScienceCluster instance
oc apply -f gitops/platform/rhoai-operator.yaml

application.argoproj.io/rhoai-operator unchanged


Wait about 5-10 minutes for the resources to install.

In [7]:
%%bash
# Verify the DataScienceCluster instance is ready
# ("No resources found" means the instance has not been created yet; re-run after a few minutes)
oc get dsc


No resources found
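
The readiness check above can also be scripted. The following is a minimal sketch, assuming the DataScienceCluster exposes a standard Kubernetes `Ready` condition in `status.conditions` (an assumption; confirm the condition type against your RHOAI version). The helper names `is_condition_true`, `all_ready`, and `dsc_ready` are hypothetical.

```python
import json
import subprocess


def is_condition_true(resource: dict, condition_type: str = "Ready") -> bool:
    """Return True if the resource reports the given condition with status "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == condition_type and c.get("status") == "True"
        for c in conditions
    )


def all_ready(resource_list: dict, condition_type: str = "Ready") -> bool:
    """True only if the list is non-empty and every item has the condition."""
    items = resource_list.get("items", [])
    return bool(items) and all(is_condition_true(i, condition_type) for i in items)


def dsc_ready() -> bool:
    """Query the cluster; requires a logged-in `oc` session."""
    result = subprocess.run(
        ["oc", "get", "dsc", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return all_ready(json.loads(result.stdout))
```

An empty list (the "No resources found" case) is treated as not ready, so `dsc_ready()` can be called repeatedly until it returns `True`.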


## 3. Install NVIDIA GPU Operator

**Required for GPU support.** The NVIDIA GPU Operator enables GPU workload scheduling by installing drivers, device plugins, and monitoring tools.

In [43]:
%%bash
# Deploy the NVIDIA GPU Operator
oc apply -f gitops/platform/nvidia-gpu-operator.yaml

application.argoproj.io/nvidia-gpu-operator created


Wait 2-5 minutes for the operator to install and create the ClusterPolicy.

In [None]:
%%bash
# Verify the GPU operator is installed
oc get csv -n nvidia-gpu-operator

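
A ClusterServiceVersion reports a completed install through `status.phase: Succeeded`. The sketch below summarises the phases from the JSON form of the verification command (`oc get csv -n nvidia-gpu-operator -o json`). The function names and the `gpu-operator-certified` name prefix are assumptions for illustration; check the actual CSV name on your cluster.

```python
def csv_phases(csv_list: dict) -> dict:
    """Map each ClusterServiceVersion name to its install phase."""
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in csv_list.get("items", [])
    }


def operator_succeeded(csv_list: dict, name_prefix: str = "gpu-operator-certified") -> bool:
    """True if a CSV whose name starts with name_prefix has phase Succeeded."""
    return any(
        name.startswith(name_prefix) and phase == "Succeeded"
        for name, phase in csv_phases(csv_list).items()
    )


# Hypothetical sample data shaped like `oc get csv -o json` output
sample = {
    "items": [
        {"metadata": {"name": "gpu-operator-certified.v0.0.0"},
         "status": {"phase": "Installing"}},
    ]
}
print(csv_phases(sample))
print(operator_succeeded(sample))
```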


## 4. Deploy GPU Nodes (AWS)

**Most demos require GPU nodes for model inference.** Deploy GPU infrastructure before running demos.

In [None]:
import subprocess
import json

# Get cluster name from infrastructure object
infra_result = subprocess.run(
    ["oc", "get", "infrastructure", "cluster", "-o", "json"],
    capture_output=True, text=True, check=True
)
infra_data = json.loads(infra_result.stdout)

# The infrastructure name serves as both the cluster name and the infrastructure ID
CLUSTER_NAME = infra_data["status"]["infrastructureName"]
INFRA_ID = CLUSTER_NAME
REGION = infra_data["status"]["platformStatus"]["aws"]["region"]

# Get first available worker machine to determine AZ
machines_result = subprocess.run(
    ["oc", "get", "machines", "-n", "openshift-machine-api", "-l", "machine.openshift.io/cluster-api-machine-role=worker", "-o", "json"],
    capture_output=True, text=True, check=True
)
machines_data = json.loads(machines_result.stdout)
AVAILABILITY_ZONE = machines_data["items"][0]["spec"]["providerSpec"]["value"]["placement"]["availabilityZone"]

# Get AMI ID from an existing worker MachineSet
machineset_result = subprocess.run(
    ["oc", "get", "machineset", "-n", "openshift-machine-api", "-o", "json"],
    capture_output=True, text=True, check=True
)
machineset_data = json.loads(machineset_result.stdout)
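# Find a non-GPU MachineSet and take the first non-empty AMI ID
AMI_ID = ""
for item in machineset_data["items"]:
    if "gpu" in item["metadata"]["name"]:
        continue
    provider_value = item["spec"]["template"]["spec"]["providerSpec"]["value"]
    AMI_ID = provider_value.get("ami", {}).get("id", "")
    if AMI_ID:
        break

# Fail early: later cells suppress CLI output, so an empty AMI ID would go unnoticed
if not AMI_ID:
    raise RuntimeError("No AMI ID found on any non-GPU MachineSet")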

print(f"Cluster Name: {CLUSTER_NAME}")
print(f"Region: {REGION}")
print(f"Availability Zone: {AVAILABILITY_ZONE}")
print(f"Infrastructure ID: {INFRA_ID}")
print(f"AMI ID: {AMI_ID}")

### Step 4a: Deploy GPU MachineSet

In [None]:
%%bash -s "$CLUSTER_NAME" "$REGION" "$AVAILABILITY_ZONE" "$INFRA_ID" "$AMI_ID"
# Deploy the GPU MachineSet ArgoCD Application with cluster-specific values
# For AWS g6.2xlarge (1x NVIDIA L4, 8 vCPU, 32GB RAM, ~$1.10/hr)

# Get ArgoCD admin password
ARGOCD_PASSWORD=$(oc get secret/openshift-gitops-cluster -n openshift-gitops -o jsonpath='{.data.admin\.password}' | base64 -d)
ARGOCD_SERVER=$(oc get route openshift-gitops-server -n openshift-gitops -o jsonpath='{.spec.host}')

# Login to ArgoCD CLI (--insecure skips TLS certificate verification)
argocd login "$ARGOCD_SERVER" --username admin --password "$ARGOCD_PASSWORD" --insecure

# Create the Application (without auto-sync)
oc apply -f gitops/infra/gpu-machineset-aws-g6.yaml

# Set Helm parameters with cluster values
# (output is suppressed; remove the redirect when debugging a failure)
argocd app set gpu-machineset-aws-g6 \
  -p clusterName="$1" \
  -p region="$2" \
  -p availabilityZone="$3" \
  -p infraID="$4" \
  -p amiId="$5" > /dev/null 2>&1

# Now enable auto-sync and sync
argocd app set gpu-machineset-aws-g6 --sync-policy automated --auto-prune --self-heal > /dev/null 2>&1
argocd app sync gpu-machineset-aws-g6 > /dev/null 2>&1
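
Because the cell above discards the output of `argocd app set`, an empty value (for example an AMI ID that was never found) would pass unnoticed. The following is a small sketch of validating the parameters in Python before handing them to the CLI. `build_app_set_command` is a hypothetical helper; the parameter names are the ones used in the cell above, and the sample values are placeholders.

```python
def build_app_set_command(app_name: str, params: dict) -> list:
    """Build an `argocd app set` argument list, rejecting empty values."""
    missing = [key for key, value in params.items() if not value]
    if missing:
        raise ValueError(f"Empty Helm parameter(s): {', '.join(missing)}")
    cmd = ["argocd", "app", "set", app_name]
    for key, value in params.items():
        cmd += ["-p", f"{key}={value}"]
    return cmd


# Placeholder values; in the notebook these come from the discovery cell
example = build_app_set_command(
    "gpu-machineset-aws-g6",
    {"clusterName": "my-cluster", "region": "us-east-2"},
)
print(example)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a CLI failure raises instead of being silently dropped.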


In [None]:
%%bash
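# Wait for GPU node to be Ready (5-10 minutes)
# The GPU operator daemonsets will also deploy to this node
# If no node carries the label yet, `oc wait` fails immediately with
# "no matching resources found"; re-run this cell after a minute or two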
oc wait --for=condition=Ready nodes -l nvidia.com/gpu.present=true --timeout=600s

node/ip-10-0-29-70.us-east-2.compute.internal condition met


### Step 4b: Verify GPU Deployment

In [None]:
%%bash
# Check MachineSet was created
oc get machineset -n openshift-machine-api | grep gpu

# Check Machine is provisioning
oc get machine -n openshift-machine-api | grep gpu

# Verify GPU node is Ready and has GPU resources available
oc get nodes -l nvidia.com/gpu.present=true

# Verify GPU is allocatable (should show nvidia.com/gpu: 1)
oc get node -l nvidia.com/gpu.present=true -o json | jq '.items[0].status.allocatable'

cluster-5rtpd-6dtsz-gpu-us-east-2a      1         1         1       1           6m
cluster-5rtpd-6dtsz-gpu-us-east-2a-29q86      Running   g6.2xlarge    us-east-2   us-east-2a   6m
NAME                                       STATUS   ROLES        AGE     VERSION
ip-10-0-29-70.us-east-2.compute.internal   Ready    gpu,worker   2m45s   v1.33.6
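
The allocatable check can also be done in Python rather than with `jq`. A minimal sketch, assuming the input is the parsed output of `oc get node -l nvidia.com/gpu.present=true -o json`; the function name `allocatable_gpus` and the sample node are hypothetical.

```python
def allocatable_gpus(node_list: dict) -> dict:
    """Map each node name to its allocatable nvidia.com/gpu count (0 if absent)."""
    counts = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resource quantities as strings, e.g. "1"
        counts[node["metadata"]["name"]] = int(allocatable.get("nvidia.com/gpu", "0"))
    return counts


# Hypothetical sample shaped like `oc get node -o json` output
sample_nodes = {
    "items": [
        {"metadata": {"name": "gpu-node-a"},
         "status": {"allocatable": {"cpu": "7500m", "nvidia.com/gpu": "1"}}},
        {"metadata": {"name": "gpu-node-b"},
         "status": {"allocatable": {"cpu": "7500m"}}},
    ]
}
print(allocatable_gpus(sample_nodes))
```

A count of 0 on a node that carries the `nvidia.com/gpu.present=true` label usually means the GPU operator's device plugin has not finished starting on that node yet.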


## 5. Download and Deploy Models

**Most demos require at least one model to be deployed.** Choose and deploy the models you need to support specific demos.

### Step 5a: Create HuggingFace Token Secret (optional for some models)

In [None]:
import os
import getpass

# Prompt for the HuggingFace token without echoing it
# (in VS Code, the input prompt appears at the top of the window)
HF_TOKEN = getpass.getpass("Enter HuggingFace token: ")

In [None]:
%%bash -s "$HF_TOKEN"
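# Create the namespace if needed, then create or update the token secret (safe to re-run)
oc create namespace demo --dry-run=client -o yaml | oc apply -f -
oc create secret generic huggingface-token \
  --from-literal=token="$1" \
  -n demo --dry-run=client -o yaml | oc apply -f -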
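
Passing the token as a command-line argument exposes it in the process list. As an alternative, here is a sketch that builds the Secret manifest in Python and pipes it to `oc apply -f -` on stdin. The helper names `build_token_secret` and `apply_manifest` are hypothetical; the secret name, key, and namespace match the cell above.

```python
import base64
import json
import subprocess


def build_token_secret(token: str, name: str = "huggingface-token",
                       namespace: str = "demo") -> dict:
    """Return an Opaque Secret manifest with the token base64-encoded under `token`."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"token": base64.b64encode(token.encode("utf-8")).decode("ascii")},
    }


def apply_manifest(manifest: dict) -> None:
    """Apply a manifest via stdin; requires a logged-in `oc` session."""
    subprocess.run(
        ["oc", "apply", "-f", "-"],
        input=json.dumps(manifest), text=True, check=True,
    )
```

In the notebook this would be called as `apply_manifest(build_token_secret(HF_TOKEN))` once the `demo` namespace exists.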

### Step 5b: Download Model (10-30 min)

**Model Choices:**

- **`qwen3-vl-8b`** - Multimodal vision-language model (~18GB, recommended for RAG demos)
- **`granite-7b`** - IBM's open instruction model (~14GB)
- **`llama-3-8b`** - Meta's Llama 3 (~16GB, requires HuggingFace license acceptance)

Select the model you want to deploy by setting the `MODEL` variable in the next cell.
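
To catch a mistyped model name before any `oc apply` runs, the selection can be validated against the list above. This is a sketch only: the `MODELS` table repeats the approximate sizes listed above, `manifest_paths` is a hypothetical helper, and it assumes every model follows the `gitops/platform/models/<model>-pvc.yaml` and `<model>-serving.yaml` naming used by the cells in this notebook.

```python
# Approximate download sizes in GB, taken from the model list above
MODELS = {
    "qwen3-vl-8b": 18,
    "granite-7b": 14,
    "llama-3-8b": 16,
}


def manifest_paths(model: str) -> dict:
    """Return the PVC and serving manifest paths for a known model."""
    if model not in MODELS:
        raise ValueError(f"Unknown model {model!r}; choose one of {sorted(MODELS)}")
    base = f"gitops/platform/models/{model}"
    return {"pvc": f"{base}-pvc.yaml", "serving": f"{base}-serving.yaml"}


print(manifest_paths("qwen3-vl-8b"))
```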


In [38]:
# Select a model to download from the choices above
MODEL = "qwen3-vl-8b"
NAMESPACE = "demo"

In [39]:
%%bash -s "$MODEL"
# Deploy the model download job via GitOps
oc apply -f gitops/platform/models/$1-pvc.yaml

application.argoproj.io/model-qwen3-vl-8b-pvc configured


In [18]:
%%bash -s "$NAMESPACE"
# Verify PVC was created and is bound
oc get pvc -n $1

NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
qwen3-vl-8b-model-storage   Bound    pvc-53ddba1f-b193-42bf-bc9a-6a60b13cfcde   18Gi       RWO            gp3-csi        <unset>                 125m


### Step 5c: Deploy Model Serving (3-5 min)

In [40]:
%%bash -s "$MODEL"
# Deploy the model serving via GitOps
oc apply -f gitops/platform/models/$1-serving.yaml

application.argoproj.io/model-qwen3-vl-8b-serving created


In [42]:
%%bash -s "$MODEL" "$NAMESPACE"
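# Set the KServe stop annotation to "false" so the InferenceService starts (or resumes) serving
oc patch inferenceservice "$1" -n "$2" \
  --type=merge -p '{"metadata":{"annotations":{"serving.kserve.io/stop":"false"}}}'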

inferenceservice.serving.kserve.io/qwen3-vl-8b patched


In [41]:
%%bash -s "$MODEL" "$NAMESPACE"
# Wait for the ArgoCD Application to become healthy.
# The fully qualified resource name avoids a clash with the app.k8s.io Application CRD
# (a bare "application" resolved to applications.app.k8s.io and failed with NotFound),
# and ArgoCD reports health in .status.health.status rather than a Ready condition.
oc wait --for=jsonpath='{.status.health.status}'=Healthy \
  application.argoproj.io/model-$1-serving \
  -n openshift-gitops --timeout=180s

# Wait for InferenceService to be ready (5-10 minutes for GPU node scheduling and model loading)
oc wait --for=condition=Ready inferenceservice/$1 \
  -n $2 --timeout=600s
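
An ArgoCD Application reports its state in `status.sync.status` and `status.health.status` rather than in a Kubernetes `Ready` condition. The sketch below checks both fields on the parsed JSON of an Application (for example from `oc get application.argoproj.io/<name> -n openshift-gitops -o json`) and adds a generic polling loop. The names `app_is_healthy` and `wait_until` are hypothetical helpers, not part of any CLI.

```python
import time


def app_is_healthy(app: dict) -> bool:
    """True if an ArgoCD Application is both Synced and Healthy."""
    status = app.get("status", {})
    synced = status.get("sync", {}).get("status") == "Synced"
    healthy = status.get("health", {}).get("status") == "Healthy"
    return synced and healthy


def wait_until(check, timeout: float = 180.0, interval: float = 5.0) -> bool:
    """Call check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`wait_until` returns `False` on timeout instead of raising, so the caller decides whether a slow sync is an error.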

### Step 5d: Test the Model

In [None]:
%%bash -s "$MODEL" "$NAMESPACE"
# Get the inference endpoint
INFERENCE_URL=$(oc get inferenceservice $1 -n $2 -o jsonpath='{.status.url}')
echo "Inference URL: $INFERENCE_URL"

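# Test the model from inside the cluster via the predictor service
# (-i without -t because notebook cells have no TTY; --restart=Never runs a one-off pod)
oc run curl-test --image=curlimages/curl -i --rm --restart=Never -n "$2" -- \
  curl -X POST "http://$1-predictor.$2.svc.cluster.local/v1/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$1\", \"prompt\": \"Hello\", \"max_tokens\": 50}"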
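
Hand-escaped JSON in a shell string is easy to break once the prompt contains quotes. A minimal sketch of building the same request body with `json.dumps` and reading the reply, assuming the endpoint returns an OpenAI-style completions response with a `choices[0].text` field (an assumption about the serving runtime; the helper names are hypothetical).

```python
import json


def build_completion_request(model: str, prompt: str, max_tokens: int = 50) -> str:
    """Serialise a /v1/completions request body with correct JSON escaping."""
    return json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})


def extract_completion_text(response_body: str) -> str:
    """Return the text of the first choice from an OpenAI-style completions reply."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    return choices[0].get("text", "")


body = build_completion_request("qwen3-vl-8b", 'Say "hello"')
print(body)
```

The resulting string can be passed to `curl -d` unchanged, or sent with any HTTP client that runs inside the cluster network.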