This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Configuring Cloud Storage

Managed Cloud Storage

To keep storage management and configuration simple, CloudTik does two things for you when you create a workspace for a specific cloud:

  • CloudTik creates managed cloud storage for you (S3 for AWS, Data Lake Storage Gen 2 for Azure, GCS for GCP) that can be used without any configuration.
  • CloudTik creates roles in your account for accessing the cloud storage and assigns them to the cluster instances, so the instances gain access without any credential configuration.

These defaults are convenient for most use cases. For users who need to perform advanced configuration, CloudTik provides the flexibility to do so.

AWS S3

By default, CloudTik creates a workspace-managed S3 bucket that works out of the box without any user configuration. The following applies only when you want to create or use your own storage and configurations.

Creating an S3 bucket

Every object in Amazon S3 is stored in a bucket. Before you can store data in Amazon S3, you must create a bucket.

Please refer to the S3 guide Creating buckets for instructions.

Configuring S3 in CloudTik

The name of the S3 bucket will be used in the CloudTik S3 storage configuration.

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-west-2
    # S3 configurations for storage
    aws_s3_storage:
        s3.bucket: your_s3_bucket
        s3.access.key.id: your_s3_access_key_id
        s3.secret.access.key: your_s3_secret_access_key

s3.access.key.id: your AWS Access Key ID.

s3.secret.access.key: your AWS Secret Access Key.

To find your AWS Access Key ID and AWS Secret Access Key, follow the AWS guide Managing access keys.
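If you would rather not paste the keys into the file by hand, the storage section can be assembled programmatically. The following is a minimal sketch, assuming the credentials are exported in the conventional AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The helper build_s3_storage is hypothetical (it is not part of CloudTik), and the aws_s3_storage section name and s3.access.key.id key name are assumptions that you should check against your CloudTik version.

```python
import os

# Conventional environment variable names used by AWS tools.
ACCESS_KEY_VAR = "AWS_ACCESS_KEY_ID"
SECRET_KEY_VAR = "AWS_SECRET_ACCESS_KEY"


def build_s3_storage(bucket, env=None):
    """Build the S3 storage section of the provider config as a dict.

    Hypothetical helper for illustration; it is not part of CloudTik.
    The section and key names are assumptions to verify against your version.
    """
    env = os.environ if env is None else env
    if not bucket:
        raise ValueError("bucket name must not be empty")
    # Report every missing variable at once instead of failing on the first.
    missing = [name for name in (ACCESS_KEY_VAR, SECRET_KEY_VAR)
               if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "aws_s3_storage": {
            "s3.bucket": bucket,
            "s3.access.key.id": env[ACCESS_KEY_VAR],
            "s3.secret.access.key": env[SECRET_KEY_VAR],
        }
    }


if __name__ == "__main__":
    # Placeholder values stand in for real credentials.
    fake_env = {ACCESS_KEY_VAR: "your_s3_access_key_id",
                SECRET_KEY_VAR: "your_s3_secret_access_key"}
    print(build_s3_storage("your_s3_bucket", env=fake_env))
```

The returned dict can be merged under the provider section of the cluster configuration before it is written out as YAML. Passing env explicitly keeps the function easy to test without touching the real environment.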

Azure Storage

By default, CloudTik creates a workspace-managed storage account and a Data Lake Storage Gen 2 container that work out of the box without any user configuration. The following applies only when you want to create or use your own storage and configurations.

Creating Azure Storage

CloudTik supports both Azure Blob Storage and Azure Data Lake Storage Gen 2. For better performance, we suggest using Azure Data Lake Storage Gen 2.

If you want to create your own Azure storage account and a storage container within the storage account, please refer to Creating an Azure storage account for instructions.

Configuring Azure Storage in CloudTik

The storage account name and the storage container name will be used when configuring the Azure cluster YAML file.

You will also need the Azure account access key, which grants access to the created Azure storage.

With these values, you can fill out the azure_cloud_storage section of your cluster configuration YAML file.

# Cloud-provider specific configuration.
provider:
    type: azure
    location: westus
    subscription_id: your_subscription_id
    azure_cloud_storage:
        # Choose cloud storage type: blob (Azure Blob Storage) or datalake (Azure Data Lake Storage Gen 2).
        azure.storage.type: datalake
        azure.storage.account: your_storage_account
        azure.container: your_container
        azure.account.key: your_account_key

subscription_id: Subscription ID of your Azure account.

azure.storage.account: the name of your Azure storage account.

azure.container: the name of the Azure storage container that you have created.

azure.account.key: your Azure account access key.
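Because the storage type accepts only two values, it can be worth validating the section before launching a cluster. The following is a minimal sketch of such a check. The helper build_azure_storage is hypothetical (it is not part of CloudTik), and the azure.storage.type and azure.storage.account key names are assumptions to verify against your CloudTik version.

```python
# Storage types named in the configuration comment: Azure Blob Storage
# and Azure Data Lake Storage Gen 2.
ALLOWED_STORAGE_TYPES = ("blob", "datalake")


def build_azure_storage(storage_type, account, container, account_key):
    """Build the azure_cloud_storage section as a dict after basic checks.

    Hypothetical helper for illustration; it is not part of CloudTik.
    """
    if storage_type not in ALLOWED_STORAGE_TYPES:
        raise ValueError(
            f"storage type must be one of {ALLOWED_STORAGE_TYPES}, "
            f"got {storage_type!r}"
        )
    fields = {
        "azure.storage.account": account,
        "azure.container": container,
        "azure.account.key": account_key,
    }
    # Collect every empty field so the user can fix them in one pass.
    empty = [name for name, value in fields.items() if not value]
    if empty:
        raise ValueError("empty values for: " + ", ".join(empty))
    return {"azure_cloud_storage": {"azure.storage.type": storage_type, **fields}}


if __name__ == "__main__":
    print(build_azure_storage("datalake", "your_storage_account",
                              "your_container", "your_account_key"))
```

A typo such as "datalakes" then fails immediately with a clear message instead of surfacing later during cluster startup.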

Google GCS

By default, CloudTik creates a workspace-managed GCS bucket that works out of the box without any user configuration. The following applies only when you want to create or use your own storage and configurations.

Creating GCS Bucket

If you want to use your own GCS bucket, you can create one by following the guide Creating buckets. The name of the bucket will be used when configuring the GCP cluster YAML file.

Configuring GCS in CloudTik

You can use the same login service account to gain access to the bucket or create a dedicated service account. Refer to Creating a service account if you need to create a service account.

To use the service account through the API, you need a service account key. Refer to Create and manage service account keys for details.

To control access to the bucket, please refer to Google Cloud Storage: Use IAM permissions for instructions on adding a principal (the service account) and roles for the bucket resource. We suggest choosing the "Storage Admin" role to gain full access to the GCS bucket.

Using the downloaded JSON key file, you can fill out the gcp_cloud_storage section of your cluster configuration YAML file.

# Cloud-provider specific configuration.
provider:
    type: gcp
    region: us-central1
    availability_zone: us-central1-a
    project_id: your_project_id
    # GCS configurations for storage
    gcp_cloud_storage:
        gcs.bucket: your_gcs_bucket
        gcs.service.account.client.email: your_service_account_client_email
        gcs.service.account.private.key.id: your_service_account_private_key_id
        gcs.service.account.private.key: your_service_account_private_key

Download the JSON key file when you create the service account key, and keep it in a safe place.

project_id: "project_id" in the JSON file.

gcs.service.account.client.email: "client_email" in the JSON file.

gcs.service.account.private.key.id: "private_key_id" in the JSON file.

gcs.service.account.private.key: "private_key" in the JSON file, in the format of -----BEGIN PRIVATE KEY-----\n......\n-----END PRIVATE KEY-----\n
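The mapping from the JSON key file to the configuration values can also be done programmatically, which avoids copy-and-paste mistakes in the long private key. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical (they are not part of CloudTik), and the gcs.service.account.client.email and gcs.service.account.private.key.id key names are assumptions to verify against your CloudTik version.

```python
import json

# Fields of the service account JSON key file used by the configuration.
REQUIRED_FIELDS = ("project_id", "client_email", "private_key_id", "private_key")
PEM_HEADER = "-----BEGIN PRIVATE KEY-----"


def gcs_settings_from_key(key, bucket):
    """Map a parsed service account key (a dict) to configuration values.

    Hypothetical helper for illustration; it is not part of CloudTik.
    """
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if not key["private_key"].startswith(PEM_HEADER):
        raise ValueError("private_key does not start with " + PEM_HEADER)
    return {
        "project_id": key["project_id"],
        "gcp_cloud_storage": {
            "gcs.bucket": bucket,
            "gcs.service.account.client.email": key["client_email"],
            "gcs.service.account.private.key.id": key["private_key_id"],
            "gcs.service.account.private.key": key["private_key"],
        },
    }


def gcs_settings_from_key_file(path, bucket):
    """Read the downloaded JSON key file and map it to configuration values."""
    with open(path, encoding="utf-8") as handle:
        return gcs_settings_from_key(json.load(handle), bucket)
```

Note that json.load turns the escaped \n sequences of the key file into real newline characters, so the private key in the returned dict spans multiple lines. When copying the value into the YAML file by hand, use the escaped single-line form shown above.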