# Anyscale Administrator Overview

### 0. Overview

- For admins, to set up Anyscale in your infrastructure
- Overview of Anyscale Clouds
- Cloud Deployment types
- What you need to set up an Anyscale Cloud (resources etc.) with the Anyscale CLI
- Example: Understanding an Anyscale deployment on EC2
- The example is not fully executable, but it shows how the pieces fit together


### Outlook

- Difference between VM and K8s deployments
- Concrete code examples for:
    - AWS EC2 + GCP GCE
    - AWS EKS + GCP GKE
- You choose your setup

## 1. What is an Anyscale Cloud?

An **Anyscale Cloud** is a logical abstraction that **manages the infrastructure required to run Ray clusters**. It serves as an isolated deployment layer between your **Anyscale organization** and the Ray projects you create.

### Purpose

Deploying an Anyscale cloud establishes a **secure trust relationship** between the Anyscale control plane and your cloud provider resources (AWS, Google Cloud, etc.).

### Key Functions
- 🎯 **Cluster Management**: Maintains configuration for launching and managing Ray clusters
- 🚀 **Multi-Cloud Deployment**: Deploys clusters across AWS, Google Cloud, or Kubernetes environments  
- 📦 **Resource Organization**: Defines the collection of resources needed for Anyscale operations
- 🔒 **Isolation**: Provides quota-scoped, isolated environments for users within your organization

## 2. Cloud Deployment Types

You can deploy Anyscale Clouds on different infrastructure types (virtual machines vs. Kubernetes), and can choose the way you deploy them (managed vs. custom).

### Supported Infrastructure Types

| **Infrastructure** | **Supported Platforms** |
|-------------------|------------------------|
| **Virtual Machines (VMs)** | • AWS EC2<br>• Google Cloud Compute Engine<br>• Serverless Anyscale (hosted by Anyscale) |
| **Kubernetes (k8s)** | • Amazon EKS<br>• Google GKE<br>• Azure AKS<br>• On-premises Kubernetes clusters |

### How Resources Are Defined

| **Resource Management** | **Command** | **Best For** | **Key Features** |
|---------------------|------------|--------------|------------------|
| **🤖 Anyscale-Defined** | `anyscale cloud setup` | Quick setup, demos, evaluation | • Anyscale creates all resources<br>• Public subnet deployment (each Ray node has a public IP and is reachable over SSH)<br>• Minimal configuration required<br>• ⚠️ VM deployments only |
| **🛠️ Customer-Defined** | `anyscale cloud register` | Production, compliance, customization | • Full infrastructure control<br>• Private/public subnet options<br>• Custom VPCs, IAM, networking<br>• ✅ Required for Kubernetes |


**📝 Note**: All examples in this tutorial use `anyscale cloud register` to demonstrate full infrastructure control, including private subnets, custom VPCs, and detailed IAM configurations.
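
The choice between the two commands can be sketched as a small decision helper. This is only an illustration of the table above: the function name `choose_cloud_command` and its parameters are hypothetical and not part of the Anyscale CLI.

```python
def choose_cloud_command(infrastructure, need_custom_network=False):
    """Suggest an Anyscale CLI command based on the table above.

    infrastructure: "vm" or "kubernetes" (hypothetical labels for this sketch).
    need_custom_network: True when you want private subnets, custom VPCs, or custom IAM.
    """
    infra = infrastructure.lower()
    if infra not in {"vm", "kubernetes"}:
        raise ValueError(f"unknown infrastructure type: {infrastructure!r}")
    # Kubernetes deployments and custom infrastructure both need customer-defined resources.
    if infra == "kubernetes" or need_custom_network:
        return "anyscale cloud register"
    # Anyscale-defined resources are available for VM deployments only.
    return "anyscale cloud setup"


print(choose_cloud_command("vm"))
print(choose_cloud_command("kubernetes"))
```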

The following section walks through the resources that Anyscale requires for an AWS EC2 deployment. Treat it as a starting point for understanding the deployment process. Kubernetes deployments require additional K8s components, which the next course covers.

## 3. A Demonstrative Example of Resource Creation with AWS EC2

Deploying Anyscale requires creating several resources in your cloud provider. This section uses the **"Architecture of an Anyscale Cloud running on AWS EC2"** diagram to explain them.

<img src="https://docs.anyscale.com/assets/images/aws-customer-defined-2ec2f924ecfe532b9ac8c30376c32aa4.png" alt="Architecture of an Anyscale Cloud running on AWS EC2 with customer-defined resources" width="100%"/>

Streamline long-term management and enable customization by using modularized Terraform resources to create your cloud resources:


| Name | Optional/Required | Description |
|------|------------------|-------------|
| **[aws-anyscale-iam](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/tree/main/modules/aws-anyscale-iam)** | Required | Builds `IAM roles` and `policies` for secure cross-account access from the Anyscale control plane and EC2 instances in the data plane. |
| **[aws-anyscale-securitygroups](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/blob/main/modules/aws-anyscale-securitygroups/README.md)** | Required | Configures `Security Groups` essential for Anyscale clusters and (optional) EFS storage.|
| **[aws-anyscale-s3](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/blob/main/modules/aws-anyscale-s3/README.md)** | Required | Creates an `S3 Bucket` to store logs and shared resources.|
| **[aws-anyscale-s3-policy](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/blob/main/modules/aws-anyscale-s3-policy/README.md)** | Required | Implements an `S3 Bucket Policy`, integrating with the `aws-anyscale-iam` module for comprehensive access control.|
| **[aws-anyscale-vpc](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/blob/main/modules/aws-anyscale-vpc/README.md)** | Optional | Creates a basic (opinionated) `VPC` for Anyscale.|
| **[aws-anyscale-efs](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/blob/main/modules/aws-anyscale-efs/README.md)** | Optional | Deploys `EFS storage solutions` for workspace persistence and cluster shared storage.|
| **[aws-anyscale-memorydb](https://github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules/blob/main/modules/aws-anyscale-memorydb/README.md)** | Optional | Sets up `MemoryDB` as the Redis cache that Anyscale Services use for head node fault tolerance.|



### 3.1 IAM Role Definition

Deploy an Anyscale cloud by creating two Identity and Access Management (IAM) roles:

#### 3.1.1  Anyscale Control Plane Role (**anyscale-iam-role-id**)

* **Who uses it**: Anyscale's services
* **Purpose**: Allows Anyscale control plane to manage your resources
* **Trust**: Anyscale AWS account **525325868955** (plus External ID added automatically)
* **Permissions**: EC2, VPC, EFS, IAM PassRole, CloudFormation, etc.  
  See docs for full least‑privilege policy.

The trust policy on your `anyscale-iam-role-id` role states:
"Allow AWS account 525325868955 (Anyscale's account) to assume this role in my account."
```
The Cross-Account Sample Access Flow
    ↓
Anyscale's AWS Account (525325868955)
    ↓ "Hey, I want to assume this role"
    ↓
Your AWS Account (your-account-id)
├── anyscale-iam-role-id 
│   ├── Trust Policy: ✅ "Yes, 525325868955 can assume me"
│   └── IAM Policies: "This role can access S3 etc."
└── Your Resources (S3 etc.)
    ↑
    └── Now Anyscale can access these through the assumed role
```

<details>
<summary>📝 Example trust relationship</summary>

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "525325868955" },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}
```
</details>
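
The example above leaves `Condition` empty because the External ID is added automatically. As a rough sketch of what the resulting policy could look like, the helper below builds the same trust policy in Python and optionally adds an `sts:ExternalId` condition. The function name and the External ID value are hypothetical placeholders; the real value is supplied by Anyscale, not chosen by you.

```python
import json

ANYSCALE_AWS_ACCOUNT_ID = "525325868955"  # Anyscale's AWS account, as stated above


def build_trust_policy(external_id=None):
    """Build the control plane role's trust policy as a Python dict."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": ANYSCALE_AWS_ACCOUNT_ID},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # Only callers that present this External ID may assume the role.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# "example-external-id" is a placeholder for illustration only.
print(json.dumps(build_trust_policy("example-external-id"), indent=2))
```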

#### 3.1.2  Instance Role (**instance-iam-role-id**)
* **Who uses it**: Your Ray nodes/compute instances
* **Purpose**: Allows Ray workers to access S3 and other AWS services
* **Trusted by**: **ec2.amazonaws.com**
* **Permissions**: At minimum S3 read/write for your bucket, CloudWatch logs/metrics.<br>
  Extend as required (e.g. Secrets Manager, RDS, KMS).

<details>
<summary>📝 Example minimal S3 policy</summary>

```json
{
  "Version": "2012-10-17",
  "Statement": [
    { "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>"
    },
    { "Sid": "AllObjectActions",
      "Effect": "Allow",
      "Action": "s3:*Object",
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}
```
</details>
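
To see which S3 actions the `s3:*Object` wildcard in the policy above covers, the sketch below approximates IAM's wildcard matching with Python's `fnmatchcase`. This is a simplification: real IAM evaluation involves more rules (for example, action names are not case-sensitive), and the helper name `statements_matching` is hypothetical.

```python
from fnmatch import fnmatchcase

# Sid -> Action pattern, copied from the minimal policy above.
POLICY_ACTIONS = {
    "ListBucket": "s3:ListBucket",
    "AllObjectActions": "s3:*Object",
}


def statements_matching(action):
    """Return the Sids whose Action pattern matches the requested action."""
    return sorted(
        sid for sid, pattern in POLICY_ACTIONS.items() if fnmatchcase(action, pattern)
    )


for action in ("s3:GetObject", "s3:PutObject", "s3:DeleteObject",
               "s3:GetObjectAcl", "s3:ListBucket"):
    print(action, "->", statements_matching(action) or "no matching statement")
```

Under this approximation, actions that do not end in `Object`, such as `s3:GetObjectAcl`, fall outside the wildcard and would need their own statement.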

### 3.2 VPC

A VPC provides a logically isolated network environment for your Anyscale Cloud within your cloud provider. **Depending on your scenario**, create a new VPC for your Anyscale Cloud or reuse an existing VPC in your account.

<details>
<summary>📝 VPC Deployment Options</summary>

#### Option 1: Create a New VPC
```terraform
# Create a new VPC
module "anyscale_vpc" {
  source = "github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules//modules/aws-anyscale-vpc"
  anyscale_vpc_name = "anyscale-ec2-vpc"
  cidr_block        = var.vpc_cidr
}
```

#### Option 2: Use Existing VPC
```terraform
# Reference existing VPC
vpc_id = var.existing_vpc_id
```
</details>

### 3.3 Subnets

Subnets divide the VPC's IP address space into smaller, manageable segments, enabling organized network architecture for Ray clusters.

#### Dual-Subnet Architecture

Anyscale deployment uses a **dual-subnet architecture** optimized for Ray clusters:

| **Subnet Type** | **Purpose** | **Components** |
|----------------|-------------|----------------|
| **Public Subnets** | External connectivity | • Load Balancers (external access)<br>• NAT Gateways (outbound internet)<br>• Bastion Hosts (admin access) |
| **Private Subnets** | Secure compute environment | • Ray Head Nodes (cluster coordination)<br>• Ray Worker Nodes (distributed processing)<br>• Shared Storage (EFS mount targets) |

<details>
<summary>📝 Subnets sample configuration</summary>

```terraform
locals {
  public_subnets  = ["172.24.101.0/24", "172.24.102.0/24"]  # External facing
  private_subnets = ["172.24.20.0/24", "172.24.21.0/24"]    # Ray clusters
}
```
</details>
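
Before applying a subnet layout like the one above, it can help to check that every subnet sits inside the VPC range and that no two subnets overlap. The sketch below does this with Python's standard `ipaddress` module. The VPC CIDR `172.24.0.0/16` is an assumption made for this example, since the source only refers to `var.vpc_cidr`.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = ipaddress.ip_network("172.24.0.0/16")  # assumed value for var.vpc_cidr
PUBLIC_SUBNETS = ["172.24.101.0/24", "172.24.102.0/24"]
PRIVATE_SUBNETS = ["172.24.20.0/24", "172.24.21.0/24"]


def validate_subnets(vpc, cidrs):
    """Return a list of human-readable problems; an empty list means the layout is consistent."""
    subnets = [ipaddress.ip_network(c) for c in cidrs]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside the VPC range {vpc}")
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")
    return problems


issues = validate_subnets(VPC_CIDR, PUBLIC_SUBNETS + PRIVATE_SUBNETS)
print(issues or "subnet layout is consistent")
```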

**Key Benefits**

- 🔒 **Security**: Ray compute nodes isolated from direct internet exposure
- 🌍 **High Availability**: Resources distributed across multiple availability zones  
- 📈 **Scalability**: Organized IP allocation supports dynamic cluster scaling
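
To make the scalability point concrete, the following sketch estimates how many IP addresses the sample private subnets leave for Ray nodes. It relies on the fact that AWS reserves five addresses in every subnet. The result is an upper bound only, because EFS mount targets and other network interfaces in the same subnets also consume addresses.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one
PRIVATE_SUBNETS = ["172.24.20.0/24", "172.24.21.0/24"]  # from the sample configuration


def usable_addresses(cidr):
    """Number of addresses AWS leaves assignable in a subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


per_subnet = {cidr: usable_addresses(cidr) for cidr in PRIVATE_SUBNETS}
total = sum(per_subnet.values())
print(per_subnet)
print("upper bound on private IPs for Ray nodes:", total)
```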

### 3.4 Security Groups

Security groups act as stateful virtual firewalls that control traffic at the instance level, and each security group belongs to exactly one VPC. To find the rule details, go to [**Deploy an Anyscale cloud on AWS**](https://docs.anyscale.com/administration/cloud-deployment/deploy-aws-cloud/?cloud-deployment=custom), search for "**anyscale cloud register (custom)**", then click "**Security group**".

<details>
<summary>📝 Security group sample configuration</summary>

####  Create custom security group
```terraform
module "anyscale_security_group" {
  source = "github.com/anyscale/terraform-aws-anyscale-cloudfoundation-modules//modules/aws-anyscale-securitygroups"

  vpc_id = module.anyscale_vpc.vpc_id
  security_group_name = "anyscale-ec2-sg-new"
  security_group_description = "Anyscale EC2 Security Group (New VPC)"

  # HTTPS access from customer CIDR ranges (no SSH)
  ingress_from_cidr_map = [
    {
      rule        = "https-443-tcp"
      cidr_blocks = join(",", var.customer_ingress_cidr_ranges)
      description = "Allow HTTPS from customer CIDR ranges"
    }
  ]

  # Allow all traffic between members of this security group (self-referencing rule)
  ingress_with_self = [
    {
      rule        = "all-all"
      description = "Allow all traffic from members of this security group"
    }
  ]
}
```
</details>
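
The effect of these two ingress rules can be modeled in a few lines of Python. This is a simplified sketch, not how AWS evaluates security groups internally. It assumes `ingress_with_self` admits traffic from members of the same security group, and the customer CIDR `203.0.113.0/24` is a documentation-only placeholder for `var.customer_ingress_cidr_ranges`.

```python
import ipaddress

# Placeholder for var.customer_ingress_cidr_ranges (TEST-NET-3 documentation range).
CUSTOMER_CIDRS = ["203.0.113.0/24"]


def is_ingress_allowed(source_ip, port, protocol, from_same_group=False):
    """Model the two ingress rules: self-referencing all-all, and HTTPS from customer CIDRs."""
    if from_same_group:
        # Rule "all-all" with self: any protocol and port between group members.
        return True
    if protocol == "tcp" and port == 443:
        address = ipaddress.ip_address(source_ip)
        return any(address in ipaddress.ip_network(cidr) for cidr in CUSTOMER_CIDRS)
    # Anything else, including SSH on port 22, has no matching rule.
    return False


print(is_ingress_allowed("203.0.113.10", 443, "tcp"))
print(is_ingress_allowed("203.0.113.10", 22, "tcp"))
print(is_ingress_allowed("172.24.20.15", 6379, "tcp", from_same_group=True))
```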

### 3.5 S3
Create the bucket with permissions granted to both the instance IAM role and Anyscale IAM role. 

<details>
<summary>📝 Example permissions</summary>

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "allow-role-access",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<account_id>:role/<your-anyscale-iam-role-name>",
                    "arn:aws:iam::<account_id>:role/<your-instance-iam-role-name>"
                ]
            },
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket-name>/*",
                "arn:aws:s3:::<your-bucket-name>"
            ]
        }
    ]
}
```
</details>
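
If you generate this bucket policy programmatically, a helper along the following lines fills in the bucket name and role ARNs. It is a minimal sketch: the function name, account ID, bucket name, and role names are hypothetical placeholders, and the action list is copied from the example above.

```python
import json

# Actions copied from the example permissions above.
BUCKET_ACTIONS = [
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListMultipartUploadParts",
    "s3:AbortMultipartUpload",
    "s3:GetBucketLocation",
]


def build_bucket_policy(bucket_name, role_arns):
    """Grant the given IAM roles access to the bucket and to the objects in it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "allow-role-access",
                "Effect": "Allow",
                "Principal": {"AWS": list(role_arns)},
                "Action": BUCKET_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }


# Placeholder identifiers for illustration only.
policy = build_bucket_policy(
    "example-anyscale-bucket",
    [
        "arn:aws:iam::123456789012:role/example-anyscale-iam-role",
        "arn:aws:iam::123456789012:role/example-instance-iam-role",
    ],
)
print(json.dumps(policy, indent=2))
```

The resulting JSON string is the kind of document you would attach to the bucket, for example through the `aws-anyscale-s3-policy` Terraform module listed earlier.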

Additionally, if you plan to use the Anyscale UI to view job execution logs, add the following CORS rules to your bucket. This configuration allows the Anyscale UI to directly read and display logs from your S3 bucket without routing data through the Anyscale control plane.

<details>
<summary>📝 Example CORS rules</summary>

```json
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET", "PUT", "POST", "HEAD", "DELETE"
        ],
        "AllowedOrigins": [
            "https://*.anyscale.com"
        ],
        "ExposeHeaders": []
    }
]
```
</details>
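
The sketch below approximates how a CORS rule like the one above decides whether a browser request is allowed, by matching the request's origin and method against the rule. It is a simplification of the behavior S3 implements, and the origin `https://console.anyscale.com` is used only as an illustrative value.

```python
from fnmatch import fnmatchcase

# The CORS rule from the configuration above.
CORS_RULE = {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST", "HEAD", "DELETE"],
    "AllowedOrigins": ["https://*.anyscale.com"],
    "ExposeHeaders": [],
}


def cors_allows(origin, method, rule=CORS_RULE):
    """Return True when both the origin and the HTTP method match the rule."""
    origin_ok = any(fnmatchcase(origin, pattern) for pattern in rule["AllowedOrigins"])
    return origin_ok and method in rule["AllowedMethods"]


print(cors_allows("https://console.anyscale.com", "GET"))  # subdomain over HTTPS
print(cors_allows("http://console.anyscale.com", "GET"))   # wrong scheme
print(cors_allows("https://example.com", "GET"))           # unrelated origin
```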

### 3.6 EFS (Optional)

EFS (Elastic File System) provides POSIX-compliant shared file storage that multiple Ray nodes can mount at the same time.

Create the EFS file system with mount targets in the subnets and security groups configured for your cloud.

EFS is highly recommended for workspace persistence and for cluster shared storage.

### 3.7 MemoryDB (Optional)

MemoryDB is an optional component that is used solely for head node fault tolerance in Ray clusters.

### 3.8 Summary
These are the resources required to deploy Anyscale to AWS EC2.

To deploy to AWS EKS, please refer to [this project](https://github.com/anyscale/terraform-kubernetes-anyscale-foundation-modules/tree/main/examples/aws) for examples with modularized Terraform resources.

To deploy to GCP, the resource names differ. Refer to [this project](https://github.com/anyscale/terraform-google-anyscale-cloudfoundation-modules/tree/main/modules) for examples with modularized Terraform resources.

Additionally, handle the following:

- **Workload Identity Federation** (GCP): Configure trust between Anyscale’s AWS account and your GCP project for secure cross-cloud authentication.

More deployment details are available in [Anyscale's official documentation](https://docs.anyscale.com/administration/cloud-deployment/overview).

## 4. Register Anyscale Cloud to Your Cloud Provider

Register an Anyscale cloud to AWS EC2 using a command like this:

```
!anyscale cloud register --provider aws \
  --name {ANYSCALE_CLOUD_NAME} \
  --region {AWS_REGION} \
  --vpc-id {VPC_ID} \
  --subnet-ids {SUBNET_IDS} \
  --s3-bucket-id {S3_BUCKET_ID} \
  --anyscale-iam-role-id {ANYSCALE_IAM_ROLE_ARN} \
  --instance-iam-role-id {INSTANCE_IAM_ROLE_ARN} \
  --security-group-ids {SECURITY_GROUP_IDS} \
  --efs-id {EFS_ID}
```

The `anyscale cloud register` command takes the resource IDs (VPC, subnets, IAM roles, S3 bucket, etc.) that you created through Terraform and registers them as a unified Anyscale Cloud deployment.
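
If the resource IDs come from Terraform outputs, the command can be assembled programmatically. The sketch below is one way to do it, using only the flags shown in the command above. The helper name, the dictionary keys, and every resource ID are hypothetical, and joining multiple IDs with commas is an assumption to confirm against the CLI help.

```python
import shlex

# Dictionary key -> CLI flag, in the order used by the command above.
FLAGS = [
    ("name", "--name"),
    ("region", "--region"),
    ("vpc_id", "--vpc-id"),
    ("subnet_ids", "--subnet-ids"),
    ("s3_bucket_id", "--s3-bucket-id"),
    ("anyscale_iam_role_arn", "--anyscale-iam-role-id"),
    ("instance_iam_role_arn", "--instance-iam-role-id"),
    ("security_group_ids", "--security-group-ids"),
    ("efs_id", "--efs-id"),
]
OPTIONAL_KEYS = {"efs_id"}  # EFS is optional (section 3.6)


def build_register_command(resources):
    """Return the `anyscale cloud register` invocation as an argument list."""
    cmd = ["anyscale", "cloud", "register", "--provider", "aws"]
    for key, flag in FLAGS:
        value = resources.get(key)
        if value is None:
            if key in OPTIONAL_KEYS:
                continue
            raise KeyError(f"missing required resource: {key}")
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # assumption: multiple IDs are comma-separated
        cmd += [flag, value]
    return cmd


# Placeholder IDs for illustration only.
example = {
    "name": "example-cloud",
    "region": "us-west-2",
    "vpc_id": "vpc-0example",
    "subnet_ids": ["subnet-0aaa", "subnet-0bbb"],
    "s3_bucket_id": "example-anyscale-bucket",
    "anyscale_iam_role_arn": "arn:aws:iam::123456789012:role/example-anyscale-iam-role",
    "instance_iam_role_arn": "arn:aws:iam::123456789012:role/example-instance-iam-role",
    "security_group_ids": ["sg-0example"],
}
print(shlex.join(build_register_command(example)))
```

Building the argument list first, and only joining it for display, avoids quoting mistakes if you later pass the list to `subprocess.run`.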