From cec81c5bce10d14eac35627ac14305848475ba63 Mon Sep 17 00:00:00 2001
From: Archit Kulkarni
Date: Thu, 31 Aug 2023 17:58:13 -0700
Subject: [PATCH] [Doc] [KubeRay] Add two KubeRay tutorials (Cherry-pick of #38858 and #38857) (#39186)

* [Doc] [KubeRay] Add tutorial for connecting to Google Cloud Storage bucket from GKE RayCluster (#38858)

This PR adds a self-contained tutorial for connecting to a Google Cloud Storage bucket. (Mostly self-contained; we do link out to the Google Cloud docs for creating a bucket.)

---------

Signed-off-by: Kai-Hsun Chen
Signed-off-by: Archit Kulkarni
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>

* [Doc] [KubeRay] Add end-to-end tutorial for real-world RayJob workload (batch inference) (#38857)

This PR adds a tutorial for running a batch inference workload on KubeRay using the RayJob CRD. It also updates the GPU/GKE doc (which this tutorial uses as a subroutine) to remove the instructions related to taints and tolerations and GPU driver installation, both of which GKE currently handles automatically.

---------

Signed-off-by: Kai-Hsun Chen
Signed-off-by: Archit Kulkarni
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Archit Kulkarni

---------

Signed-off-by: Kai-Hsun Chen
Signed-off-by: Archit Kulkarni
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
---
 doc/source/_toc.yml                                 |   2 +
 doc/source/cluster/kubernetes/examples.md           |   1 +
 .../examples/rayjob-batch-inference-example.md      | 159 ++++++++++++++++++
 .../getting-started/rayjob-quick-start.md           |   2 +-
 doc/source/cluster/kubernetes/user-guides.md        |   1 +
 .../user-guides/gcp-gke-gpu-cluster.md              |  45 ++---
 .../kubernetes/user-guides/gke-gcs-bucket.md        | 141 ++++++++++++++++
 doc/source/ray-overview/examples.rst                |   7 +
 8 files changed, 323 insertions(+), 35 deletions(-)
 create mode 100644 doc/source/cluster/kubernetes/examples/rayjob-batch-inference-example.md
 create mode 100644 doc/source/cluster/kubernetes/user-guides/gke-gcs-bucket.md

diff --git a/doc/source/_toc.yml b/doc/source/_toc.yml
index 74f0f47103d27..f9415c37349f4 100644
--- a/doc/source/_toc.yml
+++ b/doc/source/_toc.yml
@@ -297,6 +297,7 @@ parts:
       - file: cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md
       - file: cluster/kubernetes/user-guides/config.md
       - file: cluster/kubernetes/user-guides/configuring-autoscaling.md
+      - file: cluster/kubernetes/user-guides/gke-gcs-bucket.md
       - file: cluster/kubernetes/user-guides/logging.md
       - file: cluster/kubernetes/user-guides/gpu.md
       - file: cluster/kubernetes/user-guides/rayserve-dev-doc.md
@@ -312,6 +313,7 @@ parts:
       - file: cluster/kubernetes/examples/stable-diffusion-rayservice.md
       - file: cluster/kubernetes/examples/mobilenet-rayservice.md
       - file: cluster/kubernetes/examples/text-summarizer-rayservice.md
+      - file: cluster/kubernetes/examples/rayjob-batch-inference-example.md
       - file: cluster/kubernetes/k8s-ecosystem
         sections:
           - file: cluster/kubernetes/k8s-ecosystem/ingress.md
diff --git a/doc/source/cluster/kubernetes/examples.md b/doc/source/cluster/kubernetes/examples.md
index b6158a2e1f824..cf37b9ca96238 100644
--- a/doc/source/cluster/kubernetes/examples.md
+++ b/doc/source/cluster/kubernetes/examples.md
@@ -9,3 +9,4 @@ This section presents example Ray workloads to try out on your Kubernetes cluster.
 - {ref}`kuberay-mobilenet-rayservice-example` (CPU-only)
 - {ref}`kuberay-stable-diffusion-rayservice-example`
 - {ref}`kuberay-text-summarizer-rayservice-example`
+- {ref}`kuberay-batch-inference-example`
diff --git a/doc/source/cluster/kubernetes/examples/rayjob-batch-inference-example.md b/doc/source/cluster/kubernetes/examples/rayjob-batch-inference-example.md
new file mode 100644
index 0000000000000..91963e2e95c2e
--- /dev/null
+++ b/doc/source/cluster/kubernetes/examples/rayjob-batch-inference-example.md
@@ -0,0 +1,159 @@
(kuberay-batch-inference-example)=

# RayJob Batch Inference Example

This example demonstrates how to use the RayJob custom resource to run a batch inference job on a Ray cluster.

This example uses an image classification workload based on the batch inference example in the Ray Data documentation. See that example for a full explanation of the code.

## Prerequisites

You must have a Kubernetes cluster running, `kubectl` configured to use it, and GPUs available. This example provides a brief tutorial for setting up the necessary GPUs on Google Kubernetes Engine (GKE), but you can use any Kubernetes cluster with GPUs.

## Step 0: Create a Kubernetes cluster on GKE (Optional)

If you already have a Kubernetes cluster with GPUs, you can skip this step.

Otherwise, follow [this tutorial](kuberay-gke-gpu-cluster-setup), but substitute the following GPU node pool creation command to create a Kubernetes cluster on GKE with four NVIDIA T4 GPUs:

```sh
gcloud container node-pools create gpu-node-pool \
  --accelerator type=nvidia-tesla-t4,count=4,gpu-driver-version=default \
  --zone us-west1-b \
  --cluster kuberay-gpu-cluster \
  --num-nodes 1 \
  --min-nodes 0 \
  --max-nodes 1 \
  --enable-autoscaling \
  --machine-type n1-standard-64
```

This example uses four [NVIDIA T4](https://cloud.google.com/compute/docs/gpus#nvidia_t4_gpus) GPUs. The machine type is `n1-standard-64`, which has [64 vCPUs and 240 GB RAM](https://cloud.google.com/compute/docs/general-purpose-machines#n1_machine_types).

## Step 1: Install the KubeRay Operator

Follow [this document](kuberay-operator-deploy) to install the latest stable KubeRay operator from the Helm repository. The operator Pod schedules on the CPU node, because GKE taints the GPU nodes so that only Pods requesting GPUs run on them.

## Step 2: Submit the RayJob

Create the RayJob custom resource. The RayJob spec is defined in [ray-job.batch-inference.yaml](https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray-job.batch-inference.yaml).

Download the file with `curl`:

```bash
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-job.batch-inference.yaml
```

Note that the `RayJob` spec contains a spec for the `RayCluster` that KubeRay creates for the job. This tutorial uses a single-node cluster with 4 GPUs. For production use cases, use a multi-node cluster where the head node doesn't have GPUs, so that Ray can automatically schedule GPU workloads on worker nodes, where they won't interfere with critical Ray processes on the head node.

Note the following fields in the `RayJob` spec, which specify the Ray image and the GPU resources for the Ray node:

```yaml
spec:
  containers:
    - name: ray-head
      image: rayproject/ray-ml:2.6.3-gpu
      resources:
        limits:
          nvidia.com/gpu: "4"
          cpu: "54"
          memory: "54Gi"
        requests:
          nvidia.com/gpu: "4"
          cpu: "54"
          memory: "54Gi"
      volumeMounts:
        - mountPath: /home/ray/samples
          name: code-sample
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4 # The GPU type used in the GPU node pool.
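  # Note: the requests above (54 CPUs, 54Gi memory) stay below the node's
  # capacity (64 vCPUs, 240 GB RAM on n1-standard-64), which leaves headroom
  # for GKE system Pods scheduled on the same node.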
```

To submit the job, run the following command:

```bash
kubectl apply -f ray-job.batch-inference.yaml
```

Check the status with `kubectl describe rayjob rayjob-sample`.

Sample output:

```
[...]
Status:
  Dashboard URL:          rayjob-sample-raycluster-j6t8n-head-svc.default.svc.cluster.local:8265
  End Time:               2023-08-22T22:48:35Z
  Job Deployment Status:  Running
  Job Id:                 rayjob-sample-ft8lh
  Job Status:             SUCCEEDED
  Message:                Job finished successfully.
  Observed Generation:    2
  Ray Cluster Name:       rayjob-sample-raycluster-j6t8n
  Ray Cluster Status:
    Endpoints:
      Client:        10001
      Dashboard:     8265
      Gcs - Server:  6379
      Metrics:       8080
    Head:
      Pod IP:             10.112.1.3
      Service IP:         10.116.1.93
    Last Update Time:     2023-08-22T22:47:44Z
    Observed Generation:  1
    State:                ready
  Start Time:             2023-08-22T22:48:02Z
Events:
  Type    Reason   Age   From               Message
  ----    ------   ----  ----               -------
  Normal  Created  36m   rayjob-controller  Created cluster rayjob-sample-raycluster-j6t8n
  Normal  Created  32m   rayjob-controller  Created k8s job rayjob-sample
```

To view the logs, first find the name of the pod running the job with `kubectl get pods`.

Sample output:

```bash
NAME                                        READY   STATUS      RESTARTS   AGE
kuberay-operator-8b86754c-r4rc2             1/1     Running     0          25h
rayjob-sample-raycluster-j6t8n-head-kx2gz   1/1     Running     0          35m
rayjob-sample-w98c7                         0/1     Completed   0          30m
```

The Ray cluster is still running because `shutdownAfterJobFinishes` isn't set in the `RayJob` spec. If you set `shutdownAfterJobFinishes` to `true`, KubeRay shuts the cluster down after the job finishes.

Next, run:

```bash
kubectl logs rayjob-sample-w98c7
```

to get the standard output of the `entrypoint` command for the `RayJob`. Sample output:

```text
[...]
Running: 62.0/64.0 CPU, 4.0/4.0 GPU, 955.57 MiB/12.83 GiB object_store_memory:   0%|          | 0/200 [00:05<?, ?it/s]
[...]
Label: tench, Tinca tinca

Label: tench, Tinca tinca

Label: tench, Tinca tinca

Label: tench, Tinca tinca

Label: tench, Tinca tinca
2023-08-22 15:48:36,522 SUCC cli.py:33 -- -----------------------------------
2023-08-22 15:48:36,522 SUCC cli.py:34 -- Job 'rayjob-sample-ft8lh' succeeded
2023-08-22 15:48:36,522 SUCC cli.py:35 -- -----------------------------------
```
diff --git a/doc/source/cluster/kubernetes/getting-started/rayjob-quick-start.md b/doc/source/cluster/kubernetes/getting-started/rayjob-quick-start.md
index 86171dc5b1784..ff4b67eed99f8 100644
--- a/doc/source/cluster/kubernetes/getting-started/rayjob-quick-start.md
+++ b/doc/source/cluster/kubernetes/getting-started/rayjob-quick-start.md
@@ -3,7 +3,7 @@
 # RayJob
 
 :::{warning}
-This is the alpha version of RayJob Support in KubeRay. There will be ongoing improvements for RayJob in the future releases.
+RayJob support in KubeRay v0.x is in alpha.
 :::
 
 ## Prerequisites
diff --git a/doc/source/cluster/kubernetes/user-guides.md b/doc/source/cluster/kubernetes/user-guides.md
index db32fd9ae4dbd..3bbff6b7c542c 100644
--- a/doc/source/cluster/kubernetes/user-guides.md
+++ b/doc/source/cluster/kubernetes/user-guides.md
@@ -21,3 +21,4 @@ deployments of Ray on Kubernetes.
 * {ref}`kuberay-pod-security`
 * {ref}`kuberay-tls`
 * {ref}`deploy-a-static-ray-cluster-without-kuberay`
+* {ref}`kuberay-gke-bucket`
diff --git a/doc/source/cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md b/doc/source/cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md
index 2f9ad0b8129b0..4839ed2d16e5c 100644
--- a/doc/source/cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md
+++ b/doc/source/cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md
@@ -2,9 +2,11 @@
 
 # Start Google Cloud GKE Cluster with GPUs for KubeRay
 
+See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus) for full details, or continue reading for a quick start.
+
 ## Step 1: Create a Kubernetes cluster on GKE
 
-Run this command and all following commands on your local machine or on the [Google Cloud Shell](https://cloud.google.com/shell). If running from your local machine, you will need to install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install). The following command creates a Kubernetes cluster named `kuberay-gpu-cluster` with 1 CPU node in the `us-west1-b` zone. In this example, we use the `e2-standard-4` machine type, which has 4 vCPUs and 16 GB RAM.
+Run this command and all following commands on your local machine or on the [Google Cloud Shell](https://cloud.google.com/shell). If running from your local machine, you need to install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install). The following command creates a Kubernetes cluster named `kuberay-gpu-cluster` with 1 CPU node in the `us-west1-b` zone. This example uses the `e2-standard-4` machine type, which has 4 vCPUs and 16 GB RAM.
 
 ```sh
 gcloud container clusters create kuberay-gpu-cluster \
@@ -16,12 +18,11 @@ gcloud container clusters create kuberay-gpu-cluster \
 
 ## Step 2: Create a GPU node pool
 
-Run the following command to create a GPU node pool for Ray GPU workers.
-(You can also create it from the Google Cloud Console; see the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/node-taints#create_a_node_pool_with_node_taints) for more details.)
+Run the following command to create a GPU node pool for Ray GPU workers. You can also create it from the Google Cloud Console:
 
 ```sh
 gcloud container node-pools create gpu-node-pool \
-  --accelerator type=nvidia-l4-vws,count=1 \
+  --accelerator type=nvidia-l4-vws,count=1,gpu-driver-version=default \
   --zone us-west1-b \
   --cluster kuberay-gpu-cluster \
   --num-nodes 1 \
@@ -29,24 +30,17 @@ gcloud container node-pools create gpu-node-pool \
   --min-nodes 0 \
   --max-nodes 1 \
   --enable-autoscaling \
-  --machine-type g2-standard-4 \
-  --node-taints=ray.io/node-type=worker:NoSchedule
+  --machine-type g2-standard-4
 ```
 
-The `--accelerator` flag specifies the type and number of GPUs for each node in the node pool. In this example, we use the [NVIDIA L4](https://cloud.google.com/compute/docs/gpus#l4-gpus) GPU. The machine type `g2-standard-4` has 1 GPU, 24 GB GPU Memory, 4 vCPUs and 16 GB RAM.
+The `--accelerator` flag specifies the type and number of GPUs for each node in the node pool. This example uses the [NVIDIA L4](https://cloud.google.com/compute/docs/gpus#l4-gpus) GPU. The machine type `g2-standard-4` has 1 GPU, 24 GB GPU memory, 4 vCPUs, and 16 GB RAM.
 
-The taint `ray.io/node-type=worker:NoSchedule` prevents CPU-only Pods such as the Kuberay operator, Ray head, and CoreDNS Pods from being scheduled on this GPU node pool. This is because GPUs are expensive, so we want to use this node pool for Ray GPU workers only.
-
-Concretely, any Pod that does not have the following toleration will not be scheduled on this GPU node pool:
-
-```yaml
-tolerations:
-- key: ray.io/node-type
-  operator: Equal
-  value: worker
-  effect: NoSchedule
-```
-
-For more on taints and tolerations, see the [Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).
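+Optionally, verify that the nodes in the new pool report allocatable GPUs. This check reuses the verification command from the driver-installation step that this guide previously required; the node names in your output vary:
+
+```sh
+# Show each node and the number of NVIDIA GPUs it can schedule
+kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
+```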
+:::{note}
+GKE automatically installs the GPU drivers for you. For more details, see the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#create-gpu-pool-auto-drivers).
+:::
+
+:::{note}
+GKE automatically configures taints and tolerations so that only GPU Pods are scheduled on GPU nodes. For more details, see the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#create).
+:::
 
 ## Step 3: Configure `kubectl` to connect to the cluster
 
@@ -56,21 +50,4 @@ Run the following command to download Google Cloud credentials and configure the
 gcloud container clusters get-credentials kuberay-gpu-cluster --zone us-west1-b
 ```
 
-For more details, see the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl).
-
-## Step 4: Install NVIDIA GPU device drivers
-
-This step is required for GPU support on GKE. See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers) for more details.
-
-```sh
-# Install NVIDIA GPU device driver
-kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
-
-# Verify that your nodes have allocatable GPUs
-kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
-
-# Example output:
-# NAME                                          GPU
-# gke-kuberay-gpu-cluster-gpu-node-pool-xxxxx   1
-# gke-kuberay-gpu-cluster-default-pool-xxxxx    <none>
-```
+For more details, see [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl).
diff --git a/doc/source/cluster/kubernetes/user-guides/gke-gcs-bucket.md b/doc/source/cluster/kubernetes/user-guides/gke-gcs-bucket.md
new file mode 100644
index 0000000000000..9992d994bfbb6
--- /dev/null
+++ b/doc/source/cluster/kubernetes/user-guides/gke-gcs-bucket.md
@@ -0,0 +1,141 @@
(kuberay-gke-bucket)=
# Configuring KubeRay to use Google Cloud Storage Buckets in GKE

If you are already familiar with Workload Identity in GKE, you can skip this document. The gist is that, after linking your Kubernetes service account to your Google Cloud service account, you need to specify the service account in each of the Ray pods. Otherwise, read on.

This example is an abridged version of the GKE documentation on Workload Identity. The full documentation is worth reading if you are interested in the details.

## Create a Kubernetes cluster on GKE

This example creates a minimal KubeRay cluster using GKE.

Run this and all following commands on your local machine or on the [Google Cloud Shell](https://cloud.google.com/shell). If running from your local machine, install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install).

```bash
gcloud container clusters create cloud-bucket-cluster \
    --num-nodes=1 --min-nodes 0 --max-nodes 1 --enable-autoscaling \
    --zone=us-west1-b --machine-type e2-standard-8 \
    --workload-pool=my-project-id.svc.id.goog # Replace my-project-id with your GCP project ID
```

This command creates a Kubernetes cluster named `cloud-bucket-cluster` with one node in the `us-west1-b` zone. This example uses the `e2-standard-8` machine type, which has 8 vCPUs and 32 GB RAM.
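If you aren't sure which project ID you're using, a command along the following lines prints the active one. This sketch assumes you've already authenticated with `gcloud` and selected a project:

```bash
# Print the project ID that gcloud is currently configured to use
gcloud config get-value project
```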
For more information on how to find your project ID, see the Google Cloud documentation.

Now get credentials for the cluster to use with `kubectl`:

```bash
gcloud container clusters get-credentials cloud-bucket-cluster --zone us-west1-b --project my-project-id
```

## Create an IAM Service Account

```bash
gcloud iam service-accounts create my-iam-sa
```

## Create a Kubernetes Service Account

```bash
kubectl create serviceaccount my-ksa
```

## Link the Kubernetes Service Account to the IAM Service Account and vice versa

In the following two commands, replace `default` with your namespace if you aren't using the default namespace.

```bash
gcloud iam service-accounts add-iam-policy-binding my-iam-sa@my-project-id.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:my-project-id.svc.id.goog[default/my-ksa]"
```

```bash
kubectl annotate serviceaccount my-ksa \
    --namespace default \
    iam.gke.io/gcp-service-account=my-iam-sa@my-project-id.iam.gserviceaccount.com
```

## Create a Google Cloud Storage Bucket and allow the Google Cloud Service Account to access it

Follow the Google Cloud documentation to create a bucket using the Google Cloud Console or the `gsutil` command-line tool.

This example gives the principal `my-iam-sa@my-project-id.iam.gserviceaccount.com` "Storage Admin" permissions on the bucket. Enable the permissions in the Google Cloud Console ("Permissions" tab under "Buckets" > "Bucket Details") or with the following command:

```bash
gsutil iam ch serviceAccount:my-iam-sa@my-project-id.iam.gserviceaccount.com:roles/storage.admin gs://my-bucket
```

## Create a minimal RayCluster YAML manifest

You can download the RayCluster YAML manifest for this tutorial with `curl` as follows:

```bash
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.gke-bucket.yaml
```

The key parts are the following lines:

```yaml
  spec:
    serviceAccountName: my-ksa
    nodeSelector:
      iam.gke.io/gke-metadata-server-enabled: "true"
```

Include these lines in every pod spec of your Ray cluster. This example uses a single-node cluster (1 head node and 0 worker nodes) for simplicity.

## Create the RayCluster

```bash
kubectl apply -f ray-cluster.gke-bucket.yaml
```

## Test GCS bucket access from the RayCluster

Use `kubectl get pod` to get the name of the Ray head pod. Then run the following command to get a shell in the Ray head pod:

```bash
kubectl exec -it raycluster-mini-head-xxxx -- /bin/bash
```

In the shell, run `pip install google-cloud-storage` to install the Google Cloud Storage Python client library.

For production use cases, make sure `google-cloud-storage` is installed on every node of your cluster, or use `ray.init(runtime_env={"pip": ["google-cloud-storage"]})` to install the package as needed at runtime. See the Ray documentation on runtime environments for more details.
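As a minimal sketch of the `runtime_env` alternative (an illustration, not part of this tutorial's required steps), the driver can request the dependency at `ray.init` time instead of pre-installing it in the Pod:

```python
import ray

# Ask Ray to pip-install google-cloud-storage on demand for this job's
# workers, instead of baking it into the container image.
ray.init(address="auto", runtime_env={"pip": ["google-cloud-storage"]})
```

The test script below sticks with plain `ray.init(address="auto")` and assumes the manual `pip install` from the previous step.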
Then run the following Python code to test access to the bucket:

```python
import ray
from google.cloud import storage

GCP_GCS_BUCKET = "my-bucket"
GCP_GCS_FILE = "test_file.txt"

ray.init(address="auto")

@ray.remote
def check_gcs_read_write():
    client = storage.Client()
    bucket = client.get_bucket(GCP_GCS_BUCKET)
    blob = bucket.blob(GCP_GCS_FILE)

    # Write to the bucket
    blob.upload_from_string("Hello, Ray on GKE!")

    # Read from the bucket
    content = blob.download_as_text()

    return content

result = ray.get(check_gcs_read_write.remote())
print(result)
```

You should see the following output:

```text
Hello, Ray on GKE!
```
diff --git a/doc/source/ray-overview/examples.rst b/doc/source/ray-overview/examples.rst
index 6b231ff5aaeb5..f5c0eb8fdbdab 100644
--- a/doc/source/ray-overview/examples.rst
+++ b/doc/source/ray-overview/examples.rst
@@ -1402,3 +1402,10 @@ Ray Examples
       :link-type: ref
 
       Distributed Training with Hugging Face Accelerate and TorchTrainer
+
+    .. grid-item-card:: :bdg-secondary:`Code example`
+        :class-item: gallery-item inference huggingface cv
+        :link: kuberay-batch-inference-example
+        :link-type: ref
+
+        RayJob Batch Inference Example on Kubernetes with Ray