In this second notebook, we go over the basics of using InstaScale to scale up, and later back down, resources that your OpenShift cluster does not currently have available (in cloud environments).

In [2]:
# Import pieces from codeflare-sdk
from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration
from codeflare_sdk.cluster.auth import TokenAuthentication

In [None]:
# Create authentication object for oc user permissions
auth = TokenAuthentication(
    token="XXXXX",
    server="XXXXX",
    skip_tls=False,
)
auth.login()

This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale up the OpenShift cluster so that the resources we request are guaranteed. The added capacity is automatically scaled back down when we are finished working:

In [4]:
# Create and configure our cluster object (and appwrapper)
cluster = Cluster(ClusterConfiguration(
    name='instascaletest',
    namespace='default',
    min_worker=2,
    max_worker=2,
    min_cpus=2,
    max_cpus=2,
    min_memory=8,
    max_memory=8,
    gpu=1,
    instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request
    machine_types=["m5.xlarge", "g4dn.xlarge"] # Head, worker AWS machine types desired
))

Written to: instascaletest.yaml
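As a rough sanity check on what the configuration above asks for, the total worker resources can be worked out by hand. The following is a minimal sketch in plain Python, not part of codeflare-sdk: `WorkerRequest` is a hypothetical helper, memory is left in whatever unit the SDK uses for `min_memory`, and the head node is not counted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class WorkerRequest:
    """Hypothetical helper mirroring the worker fields set in the configuration."""
    workers: int
    cpus_per_worker: int
    memory_per_worker: int  # same unit as min_memory/max_memory in the configuration
    gpus_per_worker: int

    def totals(self) -> Dict[str, int]:
        # Multiply each per-worker value by the number of workers.
        return {
            "cpus": self.workers * self.cpus_per_worker,
            "memory": self.workers * self.memory_per_worker,
            "gpus": self.workers * self.gpus_per_worker,
        }


# Values copied from the ClusterConfiguration above (min and max are equal there).
request = WorkerRequest(workers=2, cpus_per_worker=2, memory_per_worker=8, gpus_per_worker=1)
totals = request.totals()
print(totals)
```

Because the minimum and maximum values are identical in this configuration, the request is a fixed size rather than a range.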


As in the previous notebook, we bring the cluster up, wait for it to be ready, and then confirm that the specs are as requested:

In [8]:
# Bring up the cluster
cluster.up()
cluster.wait_ready()

Waiting for requested resources to be set up...
Requested cluster up and running!
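Scaling up new machines can take a while, so it can help to understand what waiting with an upper bound looks like. The following is a generic polling sketch in plain Python, not the SDK's implementation of `wait_ready()`: `wait_until` and `fake_ready` are hypothetical names, and the readiness check is a stand-in rather than a real cluster query.

```python
import time
from typing import Callable


def wait_until(is_ready: Callable[[], bool], timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll is_ready() until it returns True or timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in readiness check (hypothetical) that succeeds on the third poll.
calls = {"count": 0}


def fake_ready() -> bool:
    calls["count"] += 1
    return calls["count"] >= 3


became_ready = wait_until(fake_ready, timeout_s=5.0, interval_s=0.01)
timed_out = wait_until(lambda: False, timeout_s=0.05, interval_s=0.01)
print(became_ready, timed_out)
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.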


While the resources are being scaled, we can also open the OpenShift console to look at the InstaScale logs and watch the new machines/nodes spin up.
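The same machine and node information can also be listed from the command line. The sketch below assumes the `oc` CLI is installed and already logged in, and that Machine objects live in the `openshift-machine-api` namespace; the helper names are hypothetical. It does not fetch the InstaScale logs, since where those live depends on how InstaScale was installed.

```python
import shutil
import subprocess
from typing import List

# Assumed namespace for OpenShift Machine objects.
MACHINE_API_NAMESPACE = "openshift-machine-api"


def machine_list_command() -> List[str]:
    """Command that lists Machine objects, including ones still provisioning."""
    return ["oc", "get", "machines", "-n", MACHINE_API_NAMESPACE]


def node_list_command() -> List[str]:
    """Command that lists the nodes currently registered with the cluster."""
    return ["oc", "get", "nodes"]


def run_if_available(command: List[str]) -> None:
    """Run the command if its executable is on PATH; otherwise report and skip."""
    if shutil.which(command[0]) is None:
        print(f"'{command[0]}' not found on PATH; skipping: {' '.join(command)}")
        return
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, check=False, timeout=30
        )
    except subprocess.TimeoutExpired:
        print(f"Timed out running: {' '.join(command)}")
        return
    # Show normal output, or the error text if the command failed.
    print(result.stdout or result.stderr)


run_if_available(machine_list_command())
run_if_available(node_list_command())
```

Running this cell repeatedly while the cluster is coming up should show the additional machines appear and the corresponding nodes join.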

Once the cluster is ready, we can confirm the specs:

In [9]:
# Print the cluster's status and resource specs
cluster.details()

RayCluster(name='instascaletest', status=<RayClusterStatus.READY: 'ready'>, min_workers=2, max_workers=2, worker_mem_min=8, worker_mem_max=8, worker_cpu=2, worker_gpu=1, namespace='default', dashboard='http://ray-dashboard-instascaletest-default.apps.meyceoz-032023.psap.aws.rhperfscale.org')

Finally, we bring our cluster down, which releases and terminates the associated resources and returns the OpenShift cluster to the state it was in before our cluster was brought up.

In [10]:
# Bring the cluster down and release its resources
cluster.down()

Once again, we can look at the machines/nodes and see that everything has been successfully scaled down!

In [None]:
# Log out of the OpenShift cluster
auth.logout()