# Clusters

A cluster is the most basic form of compute in Runhouse: a group of instances or VMs connected with Ray. Clusters fall into two broad categories:

1. Static Clusters: Any machine you have SSH access to, set up with IP addresses and SSH credentials.
2. On-Demand Clusters: Any cloud instance spun up automatically for you with your cloud credentials.
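
Which category you get depends on the arguments passed to `rh.cluster`, as the examples in this tutorial show: an `instance_type` for an on-demand cluster, or `ips` plus `ssh_creds` for a static one. The sketch below models that choice with a hypothetical `ClusterSpec` dataclass. It only illustrates the distinction and is not Runhouse's internal logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterSpec:
    """Hypothetical stand-in for the arguments given to rh.cluster (not a Runhouse class)."""
    name: str
    ips: List[str] = field(default_factory=list)
    ssh_creds: Optional[dict] = None
    instance_type: Optional[str] = None

    def kind(self) -> str:
        # A static cluster points at machines that already exist: IPs plus SSH credentials.
        if self.ips:
            if not self.ssh_creds:
                raise ValueError("static clusters need SSH credentials for their IPs")
            return "static"
        # An on-demand cluster is described by the resources to launch.
        if self.instance_type:
            return "on-demand"
        raise ValueError("provide either ips (static) or instance_type (on-demand)")


print(ClusterSpec(name="test-cluster", instance_type="CPU:2").kind())  # on-demand
print(ClusterSpec(name="cpu-cluster-existing", ips=["10.0.0.5"],
                  ssh_creds={"ssh_user": "ubuntu", "ssh_private_key": "~/.ssh/id_rsa"}).kind())  # static
```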

Runhouse provides various APIs for interacting with remote clusters, such as terminating an on-demand cloud cluster or running remote CLI or Python commands from your local dev environment.

Let's start with a simple example using AWS. Once your `~/.aws/credentials` file is set up with permission to create EC2 instances, you can install Runhouse with its AWS extra and create on-demand clusters in AWS.
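
Before creating clusters, it can help to confirm that the credentials file actually contains keys. A minimal standard-library sketch, assuming the usual AWS shared-credentials layout (an INI file whose `[default]` profile holds `aws_access_key_id` and `aws_secret_access_key`); `has_aws_credentials` is a hypothetical helper. It only checks that keys are present, not that they grant EC2 permissions, and it ignores other credential sources such as environment variables.

```python
import configparser
from pathlib import Path
from typing import Optional


def has_aws_credentials(path: Optional[Path] = None, profile: str = "default") -> bool:
    """Return True if `profile` in the AWS shared credentials file has an access key pair."""
    if path is None:
        path = Path.home() / ".aws" / "credentials"
    if not path.is_file():
        return False
    config = configparser.ConfigParser()
    config.read(path)
    if not config.has_section(profile):
        return False
    section = config[profile]
    # Both halves of the key pair must be present and non-empty.
    return bool(section.get("aws_access_key_id")) and bool(section.get("aws_secret_access_key"))


print(has_aws_credentials())
```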

In [None]:
! pip install "runhouse[aws]"

## On-Demand Clusters

We can start by using the `rh.cluster` factory function to create our cluster. By specifying an `instance_type`, Runhouse sets up an On-Demand Cluster in AWS EC2 for us.

Each cluster must be given a unique `name` identifier when it is constructed. The `name` is used to save and load clusters, and to refer to the cluster in various CLI commands.
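
Because the `name` is the key a cluster is saved and loaded under, reusing it for a different cluster would be ambiguous. The sketch below shows a minimal name-keyed store of JSON configs; `ClusterConfigStore` is hypothetical and does not reflect where or how Runhouse actually persists clusters.

```python
import json
import tempfile
from pathlib import Path


class ClusterConfigStore:
    """Hypothetical name-keyed store for cluster configs (not Runhouse's storage format)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        return self.root / f"{name}.json"

    def save(self, name: str, config: dict, overwrite: bool = False) -> None:
        path = self._path(name)
        # The name is the lookup key, so silently replacing another cluster's config is refused.
        if path.exists() and not overwrite:
            raise FileExistsError(f"a cluster named {name!r} is already saved")
        path.write_text(json.dumps(config))

    def load(self, name: str) -> dict:
        return json.loads(self._path(name).read_text())


with tempfile.TemporaryDirectory() as tmp:
    store = ClusterConfigStore(Path(tmp))
    store.save("test-cluster", {"instance_type": "CPU:2"})
    print(store.load("test-cluster"))  # {'instance_type': 'CPU:2'}
```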

In [6]:
import runhouse as rh

aws_cluster = rh.cluster(name="test-cluster", instance_type="CPU:2")
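
The `instance_type` above uses a `<resource>:<count>` shorthand (`"CPU:2"`) instead of naming a specific EC2 instance type. A minimal sketch of splitting such a string, assuming that shorthand; `parse_instance_type` is a hypothetical helper, and the resource names and count formats Runhouse accepts may differ from what it allows.

```python
from typing import Optional, Tuple


def parse_instance_type(spec: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split a '<resource>:<count>' shorthand such as 'CPU:2'.

    A string without a colon is returned unchanged with a count of None,
    treating it as a concrete instance name.
    """
    if ":" not in spec:
        return spec, None
    resource, count = spec.split(":", 1)
    # Require a non-empty resource name and a positive integer count.
    if not resource or not count.isdigit() or int(count) < 1:
        raise ValueError(f"expected '<resource>:<positive int>', got {spec!r}")
    return resource, int(count)


print(parse_instance_type("CPU:2"))  # ('CPU', 2)
```

This sketch accepts only a positive integer count; anything else is rejected with a `ValueError`.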

Next, we define a basic function to send to our cluster. For more information about the Functions & Modules you can send to a cluster, see [Functions & Modules](https://www.run.house/docs/tutorials/api-modules).

In [5]:
def run_home(name: str):
    return f"Run home {name}!"

remote_function = rh.function(run_home).to(aws_cluster)

After running `.to`, your function is set up on the cluster to be called from anywhere. When you call `remote_function`, it executes remotely on your AWS instance.
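
Conceptually, a remote call serializes the arguments, runs the function on the cluster, and serializes the result back. The sketch below imitates that round trip entirely in-process with JSON; `LocalStub` is a hypothetical illustration of the idea, not how Runhouse transports calls.

```python
import json
from typing import Any, Callable


class LocalStub:
    """Hypothetical in-process stand-in for a remote function: a JSON round trip only."""

    def __init__(self, fn: Callable[..., Any]):
        self.fn = fn

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # "Send" the call: the arguments must survive serialization, as they would over a network.
        request = json.dumps({"args": args, "kwargs": kwargs})
        payload = json.loads(request)
        # "Run remotely", then serialize the return value for the trip back.
        response = json.dumps({"data": self.fn(*payload["args"], **payload["kwargs"])})
        return json.loads(response)["data"]


def run_home(name: str):
    return f"Run home {name}!"


stub = LocalStub(run_home)
print(stub("in cluster!"))  # Run home in cluster!!
```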

In [4]:
remote_function("in cluster!")

INFO | 2024-02-20 16:38:13.733518 | Calling run_home.call
INFO | 2024-02-20 16:38:14.807770 | Time to call run_home.call: 1.07 seconds


'Run home in cluster!!'

### On-Demand Clusters with TLS Exposed

In the previous example, the cluster brought up in EC2 is accessible only to the original user, who has SSH credentials to the machine. However, you can also set up a cluster with ports exposed to the open internet, and then access its objects and functions via `curl`.

In [8]:
tls_cluster = rh.cluster(
    name="tls-cluster",
    instance_type="CPU:2",
    open_ports=[443],  # expose the HTTPS port to the public
    server_connection_type="tls",  # specify how Runhouse communicates with this cluster
    den_auth=False,  # no authentication required to hit this cluster (NOT recommended)
)



In [None]:
remote_tls_function = rh.function(run_home).to(tls_cluster)

In [10]:
remote_tls_function("Marvin")

INFO | 2024-02-20 17:09:03.605194 | Calling run_home.call
INFO | 2024-02-20 17:09:04.640570 | Time to call run_home.call: 1.04 seconds


'Run home Marvin!'

In [11]:
tls_cluster.address

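'3.86.210.191'

You can also call the function directly over HTTPS with `curl`, using the cluster's address. The `-k` flag tells `curl` to skip TLS certificate verification, which is needed when the server presents a certificate that is not signed by a trusted authority (for example, a self-signed one).
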
In [15]:
! curl "https://3.86.210.191/run_home/call?name=Marvin" -k

{"data":"\"Run home Marvin!\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}
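
The same call can be made from Python. A minimal standard-library sketch of three steps: building the URL used in the `curl` command above, sending it without certificate verification (the equivalent of `-k`, suitable only for testing), and decoding the response, whose `data` field is itself a JSON-encoded string. The helper names are hypothetical, and the error handling assumes that a non-null `error` field signals a failure and that `serialization` is `"json"`, as in this response.

```python
import json
import ssl
import urllib.request
from urllib.parse import urlencode


def build_call_url(address: str, function_name: str, **params: str) -> str:
    """Build the endpoint used in the curl example: https://<address>/<function>/call?..."""
    return f"https://{address}/{function_name}/call?{urlencode(params)}"


def call_insecure(url: str, timeout: float = 30.0) -> str:
    """GET `url` while skipping certificate and hostname checks, like `curl -k`. Testing only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode can be CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read().decode("utf-8")


def decode_response(body: str):
    """Unwrap a response whose `data` field holds a JSON-serialized result."""
    response = json.loads(body)
    if response.get("error"):
        raise RuntimeError(f"{response['error']}\n{response.get('traceback') or ''}")
    if response.get("serialization") != "json":
        raise ValueError(f"unsupported serialization: {response.get('serialization')!r}")
    return json.loads(response["data"])


body = r'{"data":"\"Run home Marvin!\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}'
print(build_call_url("3.86.210.191", "run_home", name="Marvin"))
print(decode_response(body))  # Run home Marvin!
```

Against a live cluster, `decode_response(call_insecure(build_call_url(...)))` chains the three steps.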

## Static Clusters

If you have existing machines within a VPC that you want to connect to, you can simply provide their IP addresses along with the SSH user and the path to the private key for those machines.

In [16]:
cluster = rh.cluster(  # using a private key
    name="cpu-cluster-existing",
    ips=["<ip of the cluster>"],
    ssh_creds={"ssh_user": "<user>", "ssh_private_key": "<path_to_key>"},
)
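
A static cluster is only reachable if the `ssh_creds` are usable, so a quick local check before calling `rh.cluster` can save a failed connection attempt. A minimal sketch; `check_ssh_creds` is a hypothetical helper that checks only the two keys used above and the local key file, and cannot confirm that the remote machine accepts the key.

```python
from pathlib import Path


def check_ssh_creds(ssh_creds: dict) -> Path:
    """Hypothetical pre-flight check for an ssh_creds dict like the one above.

    Returns the expanded private-key path, or raises if a field is missing,
    the key file does not exist, or its permissions are too open.
    """
    for field_name in ("ssh_user", "ssh_private_key"):
        if not ssh_creds.get(field_name):
            raise ValueError(f"ssh_creds is missing {field_name!r}")
    key_path = Path(ssh_creds["ssh_private_key"]).expanduser()
    if not key_path.is_file():
        raise FileNotFoundError(f"private key not found: {key_path}")
    # OpenSSH refuses private keys that group or others can access (e.g. mode 0644).
    if key_path.stat().st_mode & 0o077:
        raise PermissionError(f"{key_path} is too permissive; try chmod 600")
    return key_path
```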

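## Useful Cluster Functions

The cluster object also provides helpers for running shell commands (`run`) and Python code (`run_python`) on the remote machine from your local environment.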

In [17]:
tls_cluster.run(['pip install numpy && pip freeze | grep numpy'])



numpy==1.26.4


[(0,
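
Judging from the `run_python` output below, each result is a tuple of return code, stdout, and stderr, and `run` presumably returns one such tuple per command. A local analogue built on `subprocess` that returns the same shape is a handy way to try out a command list before sending it to a cluster; `run_local` is hypothetical and not part of Runhouse.

```python
import subprocess
from typing import List, Tuple


def run_local(commands: List[str]) -> List[Tuple[int, str, str]]:
    """Hypothetical local analogue of cluster.run: one (returncode, stdout, stderr) per command."""
    results = []
    for cmd in commands:
        # shell=True so pipes and && work, as in the remote example above.
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results.append((proc.returncode, proc.stdout, proc.stderr))
    return results


print(run_local(["echo hello | tr a-z A-Z", "exit 3"]))  # [(0, 'HELLO\n', ''), (3, '', '')]
```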

In [18]:
tls_cluster.run_python(['import numpy', 'print(numpy.__version__)'])

1.26.4


[(0, '1.26.4\n', '')]
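
Since each result tuple carries its return code, you can check for failures explicitly. A minimal sketch, assuming the `(return_code, stdout, stderr)` layout shown above; `check_results` is a hypothetical helper.

```python
from typing import List, Tuple


def check_results(results: List[Tuple[int, str, str]]) -> List[str]:
    """Hypothetical helper: raise on the first non-zero return code, else return each stdout."""
    outputs = []
    for returncode, stdout, stderr in results:
        if returncode != 0:
            raise RuntimeError(f"command failed with exit code {returncode}: {stderr.strip()}")
        outputs.append(stdout)
    return outputs


print(check_results([(0, "1.26.4\n", "")]))  # ['1.26.4\n']
```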