# Introduction: Deploy Anyscale Ray on a New Google Kubernetes Engine (GKE) Cluster

© 2025, Anyscale. All Rights Reserved

This guide walks you through deploying Anyscale on GKE using the custom **`anyscale cloud register`** method. It covers every step, from the prerequisites to installing Ray with the Anyscale Operator.

Use it as a starting point and replace all placeholders (e.g., `{ANYSCALE_CLOUD_NAME}`) with values from your environment.

It is based on this [example](https://github.com/anyscale/terraform-kubernetes-anyscale-foundation-modules/blob/main/examples/gcp/gke-new_cluster/README.md); refer to it for more information.

## Prerequisites

Before you begin, complete the following setup steps and install the listed tools:

```bash
# Create Google Cloud Project
# https://cloud.google.com/resource-manager/docs/creating-managing-projects

# Install Google Cloud SDK/CLI
# https://cloud.google.com/sdk/docs/install

# Configure Google Cloud CLI Authentication
# https://cloud.google.com/docs/authentication/gcloud

# Install kubectl (version 1.25+)
# https://kubernetes.io/docs/tasks/tools/

# Install helm (version 3.10+)
# https://helm.sh/docs/intro/install/

# Install Anyscale CLI (version 0.5.86+)
# https://docs.anyscale.com/reference/quickstart-cli/

# Install Terraform (version 1.9+)
# https://developer.hashicorp.com/terraform/install
```

You also need:
- GCP Project Owner or Editor role
- Billing account enabled
- Required APIs enabled (we'll do this in the next step)
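
The tool list above can be checked quickly from Python. The following is a minimal sketch, not part of the original guide: the helper name `find_missing_tools` is hypothetical, and it only confirms that each executable is on your `PATH`, not that it meets the minimum version.

```python
import shutil

# Executables named in the prerequisites list above.
REQUIRED_TOOLS = ["gcloud", "kubectl", "helm", "anyscale", "terraform"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, preserving input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.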


<div class="alert alert-block alert-info">
<b>Alternative Terraform Installation:</b> If you cannot install <b>Terraform 1.9+</b> with Homebrew, try installing it with <code>tfenv</code>.

<details>
<summary>Click to expand installation steps</summary>

```bash
brew install tfenv
tfenv install 1.9.0
tfenv use 1.9.0
terraform version
```

</details>
</div>

## 1. Installation

### 1.1: Configure Google Cloud Authentication

<details>
<summary>Before starting, add the gcloud command to your PATH if you want to run it from any directory:
</summary>

First, find your gcloud executable. On your system it might be under:

```bash
/Users/{USER_NAME}/Downloads/google-cloud-sdk/bin/gcloud
```
 
Second, run:

```bash
echo $SHELL
```

The output might be:

```bash
/bin/zsh
```
Export the gcloud path. For example:

```bash
echo 'export PATH="/Users/{USER_NAME}/Downloads/google-cloud-sdk/bin:$PATH"' >> ~/.zshrc
```

Finally, run the following in your current terminal to start using the `gcloud` command:

```bash
source ~/.zshrc
```

</details>


Define the global variables and configure Google Cloud Authentication:

In [None]:
GCP_PROJECT_ID = "anyscale-enablement-june-2025" # Replace with your actual Google project ID
GCP_REGION = "us-west2" # Replace with your actual GCP region
ANYSCALE_CLOUD_NAME = "anyscale-cloud-gke-private-xxx" # Replace with your actual Anyscale cloud name

In [None]:
# Set up application default credentials
!gcloud auth application-default login

# Set your project (replace with your actual project ID)
!gcloud config set project {GCP_PROJECT_ID}

# Set your region
!gcloud config set compute/region {GCP_REGION}

# Verify configuration
!gcloud config list
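
Because the `{GCP_PROJECT_ID}` placeholder is substituted into each `!gcloud` command, a malformed value only surfaces as a CLI error. As a hedged sketch, the check below validates the value first. It assumes the documented Google Cloud project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The helper name `is_valid_project_id` is hypothetical.

```python
import re

# Project ID rules: 6-30 characters, lowercase letters, digits, and hyphens,
# starting with a letter and not ending with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id):
    """Return True if the string satisfies the project ID format rules."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None
```

For example, `is_valid_project_id("{GCP_PROJECT_ID}")` returns `False`, which catches a placeholder that was never replaced.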


### 1.2: Enable Required APIs

Enable all the necessary Google Cloud APIs for this deployment:


In [None]:
!gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  storage.googleapis.com \
  file.googleapis.com \
  iam.googleapis.com \
  cloudresourcemanager.googleapis.com

# Verify APIs are enabled
!gcloud services list --enabled --filter="name:container.googleapis.com OR name:compute.googleapis.com"


You should see output like:
```bash
NAME                      TITLE
compute.googleapis.com    Compute Engine API
container.googleapis.com  Kubernetes Engine API
```
This confirms that the Kubernetes Engine and Compute Engine APIs are enabled.
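
The verification above filters for only two of the six APIs. A minimal sketch of checking all six follows. It assumes you capture one service name per line, for example with `gcloud services list --enabled --format="value(config.name)"`. The helper name `missing_apis` is hypothetical.

```python
# The six APIs enabled in the step above.
REQUIRED_APIS = [
    "container.googleapis.com",
    "compute.googleapis.com",
    "storage.googleapis.com",
    "file.googleapis.com",
    "iam.googleapis.com",
    "cloudresourcemanager.googleapis.com",
]


def missing_apis(enabled_text, required=REQUIRED_APIS):
    """Return required APIs that do not appear in the newline-separated list."""
    enabled = {line.strip() for line in enabled_text.splitlines() if line.strip()}
    return [api for api in required if api not in enabled]
```

An empty list means every required API is already enabled.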

## 2. Create Anyscale Resources with Terraform

Now we'll set up the GKE infrastructure using Terraform.


### 2.1: Create terraform.tfvars

Create a `terraform.tfvars` file with your specific configuration:


```
google_project_id = "{GCP_PROJECT_ID}"
google_region = "{GCP_REGION}"
```
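
String values in a `.tfvars` file must be quoted. The sketch below renders the file contents from Python values so that the quoting is handled for you. It is illustrative only: the helper name `render_tfvars` is hypothetical, and it relies on JSON string quoting, which is sufficient for simple values such as project IDs and regions.

```python
import json


def render_tfvars(variables):
    """Render a dict of string variables as terraform.tfvars content."""
    lines = [f"{name} = {json.dumps(value)}" for name, value in variables.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values; replace them with your own project ID and region.
    print(render_tfvars({
        "google_project_id": "my-project-id",
        "google_region": "us-west2",
    }))
```

Write the returned text to `terraform.tfvars` in the directory where you run Terraform.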

### 2.2: Deploy Infrastructure

Initialize and deploy the Terraform configuration:


In [None]:
# Initialize Terraform
!terraform init

# Plan the deployment (review what will be created)
!terraform plan

# Apply the configuration (this may take 10-15 minutes)
!terraform apply -auto-approve

<div class="alert alert-block alert-info">
<b>Take note of the output of <code>terraform apply</code>!</b> You will need it when you register the Anyscale cloud with your cloud provider.
</div>

<details>
<summary>Sample output</summary>

```
Outputs:

anyscale_registration_command = <<EOT
anyscale cloud register \
    --name <anyscale_cloud_name> \
    --provider gcp \
    --region us-west2 \
    --compute-stack vm \
    --anyscale-operator-iam-identity anyscale-nodes-d3a9a7d0@xing-compute-engine-test.iam.gserviceaccount.com \
    --cloud-storage-bucket-name anyscale-compute-engine-bucket-xxx \
    --project-id xxx \
    --vpc-name xxx \
    --subnet-names anyscale-subnet \
    --anyscale-service-account-email anyscale-nodes-xxx@xxx.iam.gserviceaccount.com \
    --instance-service-account-email anyscale-nodes-xxx@xxx.iam.gserviceaccount.com \
    --provider-name projects/xxx/locations/global/workloadIdentityPools/xxxx/providers/anyscale-provider \
    --firewall-policy-names anyscale-firewall-policy-xxxx \
    --file-storage-id anyscale-filestore-xxxx \
    --filestore-location xxxxx-a
EOT
compute_nodes_service_account_email =
filestore_instance_name = 
filestore_location = 
firewall_policy_name = 
gcs_bucket_name = 
workload_identity_pool_provider = 
```
</details>
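
If you lose the terminal output, `terraform output -json` prints the same outputs again in machine-readable form. The sketch below parses that JSON and fills in the cloud name. It assumes the output names shown in the sample above (such as `anyscale_registration_command`) and the `<anyscale_cloud_name>` placeholder. Both helper names are hypothetical.

```python
import json


def parse_terraform_outputs(output_json):
    """Map each output name to its value from `terraform output -json` text."""
    raw = json.loads(output_json)
    return {name: entry["value"] for name, entry in raw.items()}


def registration_command(outputs, cloud_name):
    """Return the registration command with the cloud name filled in."""
    command = outputs["anyscale_registration_command"]
    return command.replace("<anyscale_cloud_name>", cloud_name)
```

Pass the captured JSON text to `parse_terraform_outputs`, then call `registration_command` with your chosen cloud name to get a command you can paste into a terminal.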

## 3. Troubleshooting GPU Availability

<div class="alert alert-block alert-warning">
<b>Common Issue:</b> T4 GPUs may not be available in all zones. If you encounter GPU availability errors, follow these steps:
</div>


In [None]:
# List the accelerator types available in the zones of your region, for example, us-west2
!gcloud compute accelerator-types list --filter="zone:{GCP_REGION}"

# If T4 GPUs are not available in us-west2-a, you may need to modify the Terraform configuration
# to use different zones where T4 GPUs are available
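
To find zones that do offer T4 GPUs, you can parse the JSON form of the same listing. The sketch below assumes the input is the output of `gcloud compute accelerator-types list --format=json`, where each entry has a `name` (for example `nvidia-tesla-t4`) and a `zone` given as a name or a URL ending in the zone name. The helper name `zones_with_accelerator` is hypothetical.

```python
import json


def zones_with_accelerator(accelerators_json, region, accelerator="nvidia-tesla-t4"):
    """Return the sorted zones in `region` that offer the given accelerator."""
    zones = set()
    for entry in json.loads(accelerators_json):
        # The zone may be a full resource URL; keep only the last path segment.
        zone = entry.get("zone", "").rsplit("/", 1)[-1]
        if entry.get("name") == accelerator and zone.startswith(region + "-"):
            zones.add(zone)
    return sorted(zones)
```

An empty result suggests that the accelerator is not offered anywhere in that region.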


## 4. kubectl Configuration

Configure kubectl to connect to your newly created GKE cluster:


In [None]:
# Install GKE auth plugin
!gcloud components install gke-gcloud-auth-plugin

# Get cluster credentials (replace with your actual values)
!gcloud container clusters get-credentials anyscale-gke --region {GCP_REGION} --project {GCP_PROJECT_ID}

# Verify connection
!kubectl get nodes


## 5. Install NGINX Ingress Controller

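Install the NGINX Ingress Controller to handle external traffic. Choose `sample-values_nginx_gke_private.yaml` or `sample-values_nginx_gke_public.yaml`, depending on whether you want a private or public ingress, and pass that file to `--values` in place of `nginx-values.yaml` in the command below.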


In [None]:
# Add Helm repository for NGINX
!helm repo add nginx https://kubernetes.github.io/ingress-nginx
!helm repo update

# Install NGINX Ingress Controller
!helm upgrade ingress-nginx nginx/ingress-nginx \
  --version 4.12.1 \
  --namespace ingress-nginx \
  --values nginx-values.yaml \
  --create-namespace \
  --install

In [None]:
# Verify NGINX installation
!kubectl get service --namespace ingress-nginx ingress-nginx-controller --output wide
!kubectl get pods --namespace ingress-nginx

# Wait for an external IP to be assigned (may take a few minutes).
# --watch does not exit on its own; interrupt the cell once EXTERNAL-IP is populated.
!kubectl get service --namespace ingress-nginx ingress-nginx-controller --watch
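
As a non-blocking alternative, you can inspect the Service's JSON and look for a load balancer address. The sketch below assumes the input comes from `kubectl get service --namespace ingress-nginx ingress-nginx-controller --output json`. It reads `status.loadBalancer.ingress`, which is where Kubernetes reports the assigned IP or hostname. The helper name `external_address` is hypothetical.

```python
import json


def external_address(service_json):
    """Return the load balancer IP or hostname, or None if not yet assigned."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None
```

A `None` result means the address is still pending, so re-run the check after a short wait.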

## 6. (Optional) Upgrade Anyscale Dependencies

Install the Anyscale GCP dependencies and upgrade the CLI to the latest version:


In [None]:
# Install GCP dependencies for Anyscale
!pip install 'anyscale[gcp]'

# Upgrade Anyscale CLI to latest version
!pip install --upgrade anyscale

# Verify installation
!anyscale --version

## 7. Register the Anyscale Cloud

First, make sure you are logged in to Anyscale. If the command does not work in this environment, run it in your local terminal:

In [None]:
!anyscale login

Then use the registration command from the `terraform apply` output to register the Anyscale cloud. You only need to replace the value of the `--name` parameter with your preferred `ANYSCALE_CLOUD_NAME`. The command looks like:

```bash
anyscale cloud register ...
```

You will get output like:

```text
Output
(anyscale +17.9s) For registering this cloud's Kubernetes Manager, use cloud deployment ID 'cldrsrc_12345abcdefgh67890ijklmnop'.
(anyscale +18.0s) Successfully created cloud anyscale-cloud-gke-private-xxxxx, and it's ready to use.
```

After running the command, note the Cloud Deployment ID in the output. It will look something like:
```
cldrsrc_12345abcdefgh67890ijklmnop
```
You'll need this for the next step.
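
If you captured the registration output as text, the ID can be pulled out with a regular expression instead of copied by hand. This is a sketch under one assumption taken from the sample above: the ID is the prefix `cldrsrc_` followed by letters and digits. The helper name `extract_cloud_deployment_id` is hypothetical.

```python
import re

# Assumed format, based on the sample output: "cldrsrc_" plus letters and digits.
DEPLOYMENT_ID_PATTERN = re.compile(r"cldrsrc_[A-Za-z0-9]+")


def extract_cloud_deployment_id(register_output):
    """Return the first cloud deployment ID found in the text, or None."""
    match = DEPLOYMENT_ID_PATTERN.search(register_output)
    return match.group(0) if match else None
```

A `None` result means no ID was found, in which case check the raw output for errors.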


## 8. Install the Anyscale Operator

In [None]:
# Set the cloud deployment ID from the previous step
CLOUD_DEPLOYMENT_ID = "cldrsrc_12345abcdefgh67890ijklmnop"  # Replace with your actual cloud deployment ID

!helm repo add anyscale https://anyscale.github.io/helm-charts
!helm upgrade anyscale-operator anyscale/anyscale-operator \
  --set-string cloudDeploymentId={CLOUD_DEPLOYMENT_ID} \
  --set-string cloudProvider=gcp \
  --set-string region={GCP_REGION} \
  --set-string workloadServiceAccountName=anyscale-operator \
  --namespace anyscale-operator \
  --create-namespace \
  --install
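
After the Helm release is installed, it is worth confirming that the operator pods have started. The sketch below assumes the input comes from `kubectl get pods --namespace anyscale-operator --output json` and reads each pod's `status.phase`. The helper name `pods_not_running` is hypothetical.

```python
import json


def pods_not_running(pod_list_json):
    """Return names of pods whose phase is neither Running nor Succeeded."""
    pods = json.loads(pod_list_json).get("items", [])
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]
```

An empty list means every pod in the namespace has started. A pod in the `Running` phase may still be failing its readiness checks, so treat this as a first check only.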

## 9. Test

Once the Anyscale Operator is installed, you can test the deployment by submitting a job:

In [None]:
!cd ../test && python test_job.py --cloud-name {ANYSCALE_CLOUD_NAME} --stack-type k8s

# You can check the job status by running:
!anyscale job list --cloud {ANYSCALE_CLOUD_NAME}

This starts a job. You can follow its logs and view its results in the Anyscale console under "Jobs".

## 10. Cleanup

When you're done, you can terminate the cluster and clean up resources:


In [None]:
# Step 1: Delete Anyscale Cloud Registration

!anyscale cloud delete {ANYSCALE_CLOUD_NAME}

# Step 2: Clean up Kubernetes resources (BEFORE deleting cluster)

!helm uninstall anyscale-operator --namespace anyscale-operator
!helm uninstall ingress-nginx --namespace ingress-nginx
!kubectl delete namespace anyscale-operator
!kubectl delete namespace ingress-nginx

# Step 3: Empty the GCS bucket (only needed if it still contains objects)
# (Optional) Confirm that the Compute Engine API and the other required APIs are still enabled:
#!gcloud services list --enabled --project={GCP_PROJECT_ID}

# Find your GCS bucket name in the output of the terraform apply command (gcs_bucket_name),
# then remove every object in the bucket:
!gsutil rm -r gs://{GCS_BUCKET_NAME}/*

# Step 4: Destroy Terraform resources
!terraform destroy -auto-approve -var="google_project_id={GCP_PROJECT_ID}" -var="google_region={GCP_REGION}"

# You may need to manually delete the service accounts and VPC network that Terraform created.