

[Doc] [KubeRay] Add tutorial for connecting to google cloud storage bucket from GKE RayCluster (ray-project#38858)

This PR adds a self-contained tutorial for connecting to a Google Cloud Storage bucket. (It is mostly self-contained; we link out to the Google Cloud docs for creating a bucket.)


Signed-off-by: Kai-Hsun Chen <>
Signed-off-by: Archit Kulkarni <>
Co-authored-by: Kai-Hsun Chen <>
Co-authored-by: angelinalg <>
Signed-off-by: Jim Thompson <>
4 people authored and jimthompson5802 committed Sep 12, 2023
1 parent 916aed2 commit b6c3024
Showing 3 changed files with 143 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/_toc.yml
@@ -299,6 +299,7 @@ parts:
- file: cluster/kubernetes/user-guides/
- file: cluster/kubernetes/user-guides/
- file: cluster/kubernetes/user-guides/
- file: cluster/kubernetes/user-guides/
- file: cluster/kubernetes/user-guides/
- file: cluster/kubernetes/user-guides/
- file: cluster/kubernetes/user-guides/
1 change: 1 addition & 0 deletions doc/source/cluster/kubernetes/
@@ -20,3 +20,4 @@ at the {ref}`introductory guide <kuberay-quickstart>` first.
* {ref}`kuberay-pod-security`
* {ref}`kuberay-tls`
* {ref}`deploy-a-static-ray-cluster-without-kuberay`
* {ref}`kuberay-gke-bucket`
141 changes: 141 additions & 0 deletions doc/source/cluster/kubernetes/user-guides/
@@ -0,0 +1,141 @@
# Configuring KubeRay to use Google Cloud Storage Buckets in GKE

If you are already familiar with Workload Identity in GKE, you can skip this document. The gist is that, after linking your Kubernetes service account to your Google Cloud service account, you need to specify the Kubernetes service account in each of the Ray pods. If you are not familiar with Workload Identity, read on.

This example is an abridged version of the documentation at <>. The full documentation is worth reading if you are interested in the details.

## Create a Kubernetes cluster on GKE

This example creates a minimal GKE cluster for running KubeRay.

Run this and all following commands on your local machine or in Google Cloud Shell. If running from your local machine, install the Google Cloud SDK first.

```bash
gcloud container clusters create cloud-bucket-cluster \
--num-nodes=1 --min-nodes 0 --max-nodes 1 --enable-autoscaling \
--zone=us-west1-b --machine-type e2-standard-8 \ # Replace my-project-id with your GCP project ID
```

This command creates a Kubernetes cluster named `cloud-bucket-cluster` with one node in the `us-west1-b` zone. This example uses the `e2-standard-8` machine type, which has 8 vCPUs and 32 GB RAM.
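Workload Identity has to be enabled on the cluster for the later steps to work, and GKE enables it through the project's workload identity pool. The Python sketch below assembles the cluster-creation command as an argument list under that assumption: it adds gcloud's `--workload-pool` flag with the pool name `PROJECT_ID.svc.id.goog`, which is the naming GKE uses for a project's pool. The helper `cluster_create_args` is hypothetical; confirm the flags against `gcloud container clusters create --help` for your SDK version.

```python
import shlex


def cluster_create_args(project_id, cluster="cloud-bucket-cluster",
                        zone="us-west1-b", machine_type="e2-standard-8"):
    """Build the argv for `gcloud container clusters create` (hypothetical helper)."""
    return [
        "gcloud", "container", "clusters", "create", cluster,
        "--num-nodes=1", "--min-nodes=0", "--max-nodes=1", "--enable-autoscaling",
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        # Assumption: Workload Identity is enabled via the project's workload
        # pool, which GKE names "<project-id>.svc.id.goog".
        f"--workload-pool={project_id}.svc.id.goog",
    ]


if __name__ == "__main__":
    # Replace my-project-id with your GCP project ID.
    print(shlex.join(cluster_create_args("my-project-id")))
```

Printing the joined arguments gives a command you can paste into a shell; keeping it as a list avoids quoting mistakes if you later pass it to `subprocess.run`.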

For more information on how to find your project ID, see <> or <>.

Now get credentials for the cluster to use with `kubectl`:

```bash
gcloud container clusters get-credentials cloud-bucket-cluster --zone us-west1-b --project my-project-id
```

## Create an IAM Service Account

```bash
gcloud iam service-accounts create my-iam-sa
```

## Create a Kubernetes Service Account

```bash
kubectl create serviceaccount my-ksa
```

## Link the Kubernetes Service Account to the IAM Service Account and vice versa

In the following two commands, replace `default` with your namespace if you are not using the default namespace.

```bash
gcloud iam service-accounts add-iam-policy-binding \
--role roles/iam.workloadIdentityUser \
--member "[default/my-ksa]"
```

```bash
kubectl annotate serviceaccount my-ksa \
--namespace default \
```
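The two commands in this section link the accounts in both directions: the IAM policy binding lets the Kubernetes service account act as the IAM service account, and the annotation tells GKE which IAM service account the Kubernetes service account maps to. The sketch below spells out the identifier formats these commands take, based on Google Cloud's documented conventions: `NAME@PROJECT_ID.iam.gserviceaccount.com` for the service account email, `serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]` for the Workload Identity member, and the `iam.gke.io/gcp-service-account` annotation key. The helper `workload_identity_link` is hypothetical, and `my-project-id` stands in for your own project ID.

```python
def workload_identity_link(project_id, iam_sa="my-iam-sa",
                           ksa="my-ksa", namespace="default"):
    """Return the identifiers used to link a KSA and an IAM service account.

    Hypothetical helper; the formats follow Google Cloud's documented
    Workload Identity conventions.
    """
    # Email of the IAM service account created with `gcloud iam service-accounts create`.
    iam_sa_email = f"{iam_sa}@{project_id}.iam.gserviceaccount.com"
    # Member that receives roles/iam.workloadIdentityUser on the IAM service account.
    member = f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"
    # Annotation that points the Kubernetes service account at the IAM service account.
    annotation = f"iam.gke.io/gcp-service-account={iam_sa_email}"
    return {"iam_sa_email": iam_sa_email, "member": member, "annotation": annotation}


if __name__ == "__main__":
    ids = workload_identity_link("my-project-id")
    print("gcloud iam service-accounts add-iam-policy-binding", ids["iam_sa_email"],
          "--role roles/iam.workloadIdentityUser", f'--member "{ids["member"]}"')
    print("kubectl annotate serviceaccount my-ksa --namespace default", ids["annotation"])
```

Passing a different `namespace` changes only the bracketed part of the member string, which is why both commands need updating if you are not using the `default` namespace.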

## Create a Google Cloud Storage Bucket and allow the Google Cloud Service Account to access it

Follow the documentation at <> to create a bucket using the Google Cloud Console or the `gsutil` command-line tool.

This example gives the principal `` "Storage Admin" permissions on the bucket. Enable the permissions in the Google Cloud Console ("Permissions" tab under "Buckets" > "Bucket Details") or with the following command:

```bash
gsutil iam ch gs://my-bucket
```
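`gsutil iam ch` takes one or more bindings of the form `MEMBER_TYPE:MEMBER_NAME:ROLE`, followed by the bucket URL. The sketch below assembles that argument list for granting `roles/storage.admin` (the "Storage Admin" role) to an IAM service account. The helper `gsutil_grant_args` is hypothetical, and the email shown assumes the standard `NAME@PROJECT_ID.iam.gserviceaccount.com` format with `my-project-id` as a placeholder.

```python
import shlex


def gsutil_grant_args(bucket, iam_sa_email, role="roles/storage.admin"):
    """Build argv for `gsutil iam ch` (hypothetical helper)."""
    # Binding format accepted by `gsutil iam ch`: MEMBER_TYPE:MEMBER_NAME:ROLE
    binding = f"serviceAccount:{iam_sa_email}:{role}"
    return ["gsutil", "iam", "ch", binding, f"gs://{bucket}"]


if __name__ == "__main__":
    # Replace my-project-id with your GCP project ID.
    email = "my-iam-sa@my-project-id.iam.gserviceaccount.com"
    print(shlex.join(gsutil_grant_args("my-bucket", email)))
```

A narrower role such as `roles/storage.objectAdmin` would also cover the read/write test later in this guide if you prefer not to grant bucket-level admin rights.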

## Create a minimal RayCluster YAML manifest

You can download the RayCluster YAML manifest for this tutorial with `curl` as follows:

```bash
curl -LO
```

The key parts are the following lines:

```yaml
serviceAccountName: my-ksa
nodeSelector: "true"
```

Include these lines in every pod spec of your Ray cluster. This example uses a single-node cluster (1 head node and 0 worker nodes) for simplicity.
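Because every Ray pod needs these settings, a quick pre-flight check can catch a worker group that was missed when you add more groups later. The sketch below walks a RayCluster manifest that has already been parsed into a Python dict (for example with a YAML parser, not shown) and reports pod templates that lack `serviceAccountName: my-ksa` or the Workload Identity node selector. It assumes the KubeRay layout `spec.headGroupSpec.template.spec` and `spec.workerGroupSpecs[*].template.spec`, and it assumes the selector key is `iam.gke.io/gke-metadata-server-enabled`, the node label GKE documents for Workload Identity node pools. The function name is hypothetical.

```python
# Assumption: GKE's node label for nodes that run the GKE metadata server.
WI_NODE_LABEL = "iam.gke.io/gke-metadata-server-enabled"


def missing_workload_identity(raycluster, ksa="my-ksa"):
    """Return the names of pod templates missing the Workload Identity settings.

    `raycluster` is a RayCluster manifest already parsed into a dict
    (hypothetical helper; the KubeRay field layout is assumed).
    """
    spec = raycluster.get("spec", {})
    groups = [("head", spec.get("headGroupSpec", {}))]
    for group in spec.get("workerGroupSpecs", []):
        groups.append((group.get("groupName", "worker"), group))

    problems = []
    for name, group in groups:
        pod_spec = group.get("template", {}).get("spec", {})
        selector = pod_spec.get("nodeSelector") or {}
        # Both the service account and the node selector must be present.
        if pod_spec.get("serviceAccountName") != ksa or selector.get(WI_NODE_LABEL) != "true":
            problems.append(name)
    return problems


if __name__ == "__main__":
    good_pod = {"serviceAccountName": "my-ksa", "nodeSelector": {WI_NODE_LABEL: "true"}}
    manifest = {"spec": {
        "headGroupSpec": {"template": {"spec": good_pod}},
        "workerGroupSpecs": [{"groupName": "workers", "template": {"spec": {}}}],
    }}
    print(missing_workload_identity(manifest))  # prints ['workers']
```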

## Create the RayCluster

```bash
kubectl apply -f ray-cluster.gke-bucket.yaml
```

## Test GCS bucket access from the RayCluster

Use `kubectl get pod` to get the name of the Ray head pod. Then run the following command to get a shell in the Ray head pod:

```bash
kubectl exec -it raycluster-mini-head-xxxx -- /bin/bash
```
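Rather than copying the head pod name by hand from `kubectl get pod`, you can pick it out of the JSON that `kubectl get pods -o json` prints. The sketch below filters pods by a `ray.io/node-type: head` label; we assume KubeRay sets this label on head pods, so verify it with `kubectl get pods --show-labels` on your cluster. The function `head_pod_names` is hypothetical.

```python
import json


def head_pod_names(pods_json):
    """Return Ray head pod names from `kubectl get pods -o json` output.

    Assumes KubeRay labels head pods with ray.io/node-type=head.
    """
    items = json.loads(pods_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in items
        if pod["metadata"].get("labels", {}).get("ray.io/node-type") == "head"
    ]
```

For example, save the output with `kubectl get pods -o json > pods.json` and call `head_pod_names(open("pods.json").read())`. If the label assumption holds, `kubectl get pods -l ray.io/node-type=head` applies the same filter directly in `kubectl`.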

In the shell, run `pip install google-cloud-storage` to install the Google Cloud Storage Python client library.

(For production use cases, make sure `google-cloud-storage` is installed on every node of your cluster, or use `ray.init(runtime_env={"pip": ["google-cloud-storage"]})` to have the package installed as needed at runtime. See <> for more details.)

Then run the following Python code to test access to the bucket:

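```python
import ray
from google.cloud import storage

GCP_GCS_BUCKET = "my-bucket"
GCP_GCS_FILE = "test_file.txt"

# Connect to the Ray cluster that is already running in this pod.
ray.init()


@ray.remote
def check_gcs_read_write():
    # The client obtains credentials through Application Default Credentials,
    # which on GKE with Workload Identity resolve to the linked IAM service account.
    client = storage.Client()
    bucket = client.get_bucket(GCP_GCS_BUCKET)
    blob = bucket.blob(GCP_GCS_FILE)

    # Write to the bucket
    blob.upload_from_string("Hello, Ray on GKE!")

    # Read from the bucket
    content = blob.download_as_text()

    return content


result = ray.get(check_gcs_read_write.remote())
print(result)
```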

You should see the following output:

```
Hello, Ray on GKE!
```
