# Quick Start Guide

## Installation

Runhouse can be installed with:


In [None]:
!pip install runhouse

If using Runhouse with a cloud provider, you can additionally install cloud packages (e.g. the right versions of tools like boto, gsutil, etc.):

    $ pip install "runhouse[aws]"
    $ pip install "runhouse[gcp]"
    $ pip install "runhouse[azure]"
    # Or
    $ pip install "runhouse[all]"

To import runhouse:

In [None]:
import runhouse as rh
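If the import fails, it can help to confirm what pip actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Runhouse.

```python
from importlib import metadata


def installed_version(package="runhouse"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("runhouse is not installed in this environment")
else:
    print(f"runhouse {version} is installed")
```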

In [None]:
# Optional: to sync over secrets from your Runhouse account
# !runhouse login

## Cluster Setup

Runhouse provides APIs to make it easy to interact with your clusters. A cluster can be either an existing, on-prem cluster you have access to, or cloud instances that Runhouse spins up/down for you (through your own cloud account).

**Note that Runhouse is NOT managed compute. Everything runs inside your own compute and storage, using your credentials.**

### Bring-Your-Own Cluster

If you are using an existing, on-prem cluster, no additional setup is needed. Just have your cluster IP
address and path to SSH credentials ready:

In [None]:
cluster = rh.cluster(
              name="cpu-cluster",
              ips=['<ip of the cluster>'],
              ssh_creds={'ssh_user': '<user>', 'ssh_private_key':'<path_to_key>'},
          )
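A wrong key path is an easy mistake to make here, so one option is to check the path before building the `ssh_creds` dictionary. This is a sketch only: `build_ssh_creds` is a hypothetical helper, not a Runhouse function, and it assumes the `ssh_user` and `ssh_private_key` keys shown above.

```python
import tempfile
from pathlib import Path


def build_ssh_creds(user, key_path):
    """Return an ssh_creds dict after checking that the private key file exists."""
    key = Path(key_path).expanduser()  # allow paths such as ~/.ssh/id_rsa
    if not key.is_file():
        raise FileNotFoundError(f"SSH private key not found: {key}")
    return {"ssh_user": user, "ssh_private_key": str(key)}


# Demonstration with a temporary file standing in for a real key.
with tempfile.NamedTemporaryFile() as key_file:
    print(build_ssh_creds("ubuntu", key_file.name))
```

The returned dictionary could then be passed as `ssh_creds=` in the `rh.cluster(...)` call above.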


### On-Demand Cluster

For on-demand clusters through cloud accounts (e.g. AWS, Azure, GCP, LambdaLabs), Runhouse uses [SkyPilot](https://github.com/skypilot-org/skypilot) for much of the heavy lifting
of launching and terminating cloud instances.

To check which cloud providers are set up, and to see instructions for enabling the others,
run the following CLI command, or check out SkyPilot's
[cloud account setup](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#cloud-account-setup) for more detailed instructions.

In [4]:
!sky check

SkyPilot collects usage data to improve its services. `setup` and `run` commands are not collected to ensure privacy.
Usage logging can be disabled by setting the environment variable SKYPILOT_DISABLE_USAGE_COLLECTION=1.
Checking credentials to enable clouds for SkyPilot.
  AWS: enabled
  Azure: disabled
    Reason: ~/.azure/msal_token_cache.json does not exist. Run the following commands:
      $ az login
      $ az account set -s <subscription_id>
    For more info: https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli
  GCP: disabled
    Reason: GCP tools are not installed or credentials are not set. Run the following commands:
      $ pip install google-api-python-client
      $ conda install -c conda-forge google-cloud-sdk -y
      $ gcloud init
      $ gcloud auth application-default login
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html

In [None]:
cluster = rh.cluster(
              name="cpu-cluster",
              instance_type="CPU:8",
              provider="cheapest",      # options: "AWS", "GCP", "Azure", "Lambda", or "cheapest"
          )


## Secrets and Portability

Using Runhouse with only the OSS Python package is perfectly fine, but you can unlock some unique portability features by creating an (always free) [account](https://api.run.house/)
and saving down your secrets and/or resource metadata there.

Think of the OSS-package-only experience as
akin to Microsoft Office, while creating an account will make your cloud resources sharable and
accessible from anywhere like Google Docs.

Some notes on security:
* Our API servers only ever store light metadata about your resources (e.g. folder name, cloud provider, storage bucket, path). All actual data and compute stays inside your own cloud account and never hits our servers.
* Secrets are stored in [Hashicorp Vault](https://www.vaultproject.io/) (an industry standard for secrets management), never on our API servers, and our APIs simply call into Vault's APIs.

## Getting Started Example

In the following example, we demonstrate how you can use Runhouse to bridge the gap
between local and remote compute, and create Resources that can be saved, reused, and shared.

Before running this example, make sure you have completed the Installation and Cluster Setup sections above.

In [None]:
import runhouse as rh

### Running local functions on remote hardware

First let's define a simple local function which returns the number of CPUs available.


In [None]:
def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

num_cpus()

'Num cpus: 2'
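Note that the import sits inside the function body. As the Runhouse log message further down advises, a function sent to a cluster should not rely on local variables or module-level imports. The sketch below is only a local simulation of that constraint (it is not how Runhouse ships code): it defines each version of the function in an empty namespace, much as a fresh process would see it. The names `SELF_CONTAINED`, `RELIES_ON_CALLER`, and `run_in_fresh_namespace` are hypothetical.

```python
# Source of a function whose import lives inside the body.
SELF_CONTAINED = '''
def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"
'''

# Source of a function that relies on an import made elsewhere by the caller.
RELIES_ON_CALLER = '''
def num_cpus():
    return f"Num cpus: {multiprocessing.cpu_count()}"
'''


def run_in_fresh_namespace(source, fn_name):
    """Define and call fn_name in an empty namespace, mimicking a new process."""
    namespace = {}
    exec(source, namespace)
    return namespace[fn_name]()


print(run_in_fresh_namespace(SELF_CONTAINED, "num_cpus"))
try:
    run_in_fresh_namespace(RELIES_ON_CALLER, "num_cpus")
except NameError as err:
    print(f"Fails without the import: {err}")
```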

Next, instantiate the cluster that we want to run this function on. This can be either an existing
cluster, for which you pass in an IP address and SSH credentials, or a cluster associated with a supported
cloud account (AWS, GCP, Azure, LambdaLabs), in which case it is automatically launched (and optionally
terminated) for you.

In [None]:
# Run only one of the two options below; the later assignment overwrites the earlier one.

# Option 1: an existing, bring-your-own cluster
cluster = rh.cluster(
              name="cpu-cluster",
              ips=['<ip of the cluster>'],
              ssh_creds={'ssh_user': '<user>', 'ssh_private_key':'<path_to_key>'},
          )

# Option 2: a cloud provider
cluster = rh.cluster(
              name="cpu-cluster",
              instance_type="CPU:8",
              provider="cheapest",      # options: "AWS", "GCP", "Azure", "Lambda", or "cheapest"
          )

INFO | 2023-05-05 14:02:33,950 | Loaded Runhouse config from /root/.rh/config.yaml
INFO | 2023-05-05 14:02:33,956 | Attempting to load config for /carolineechen/cpu-cluster from RNS.
INFO | 2023-05-05 14:02:34,754 | No config found in RNS: {'detail': 'Resource does not exist'}


If using a cloud cluster, we can launch the cluster with `.up()` or `.up_if_not()`.

Note that it may take a few minutes for the cluster to be launched through the cloud provider and for its dependencies to be set up.

In [None]:
cluster.up_if_not()

Now that we have our function and remote cluster set up, we're ready to see how to run this function on our cluster!

We wrap our local function in `rh.function`, and associate this new function with the cluster. Now, whenever we call this new function, just as we would call any other Python function, it runs on the cluster instead of locally.

In [None]:
num_cpus_cluster = rh.function(name="num_cpus_cluster", fn=num_cpus).to(system=cluster, reqs=["./"])

INFO | 2023-05-05 14:31:58,659 | Attempting to load config for /carolineechen/num_cpus_cluster from RNS.
INFO | 2023-05-05 14:31:59,470 | No config found in RNS: {'detail': 'Resource does not exist'}
INFO | 2023-05-05 14:31:59,473 | Writing out function function to /content/num_cpus_fn.py. Please make sure the function does not rely on any local variables, including imports (which should be moved inside the function body).
INFO | 2023-05-05 14:31:59,476 | Setting up Function on cluster.
INFO | 2023-05-05 14:31:59,479 | Copying local package content to cluster <cpu-cluster>
INFO | 2023-05-05 14:32:04,026 | Installing packages on cluster cpu-cluster: ['./']
INFO | 2023-05-05 14:32:04,402 | Function setup complete.


In [None]:
num_cpus_cluster()

INFO | 2023-05-05 14:32:06,397 | Running num_cpus_cluster via gRPC
INFO | 2023-05-05 14:32:06,766 | Time to send message: 0.37 seconds


'Num cpus: 8'

### Saving, Reusing, and Sharing

Runhouse supports saving the metadata and configs for resources like clusters and functions, so that you can load them from a different environment or share them with your collaborators.

In [None]:
num_cpus_cluster.save()

INFO | 2023-05-05 14:32:31,248 | Saving config to RNS: {'name': '/carolineechen/cpu-cluster', 'resource_type': 'cluster', 'resource_subtype': 'OnDemandCluster', 'instance_type': 'CPU:8', 'num_instances': None, 'provider': 'cheapest', 'autostop_mins': 30, 'use_spot': False, 'image_id': None, 'region': None, 'sky_state': {'name': 'cpu-cluster', 'launched_at': 1683295614, 'handle': {'cluster_name': 'cpu-cluster', 'cluster_yaml': '~/.sky/generated/cpu-cluster.yml', 'head_ip': '3.87.203.10', 'launched_nodes': 1, 'launched_resources': {'cloud': 'AWS', 'instance_type': 'm6i.2xlarge', 'use_spot': False, 'disk_size': 256, 'region': 'us-east-1', 'zone': 'us-east-1a'}}, 'last_use': '/usr/local/lib/python3.10/dist-packages/ipykernel_launcher.py -f /root/.local/share/jupyter/runtime/kernel-729e54ec-f20d-48a4-8603-099468cb0df6.json', 'status': 'UP', 'autostop': 30, 'to_down': True, 'owner': 'AIDASQMZKHMBGKPSNXGMZ', 'metadata': {}, 'cluster_hash': 'b5ff32eb-425d-42af-ac6c-801be1f399de', 'public_key':

<runhouse.rns.function.Function at 0x7fb3b7ca1ff0>

In [None]:
num_cpus_cluster.share(
    users=["<email_to_runhouse_account>"],
    access_type="write",
)

Now you, or whoever you shared it with, can reload this function from another dev environment (like a different Colab, local, or on a cluster), as long as you are logged in to your Runhouse account.

In [None]:
reloaded_function = rh.function(name="num_cpus_cluster")
reloaded_function()

INFO | 2023-05-05 14:32:34,922 | Attempting to load config for /carolineechen/num_cpus_cluster from RNS.
INFO | 2023-05-05 14:32:35,708 | Attempting to load config for /carolineechen/cpu-cluster from RNS.
INFO | 2023-05-05 14:32:36,785 | Setting up Function on cluster.
INFO | 2023-05-05 14:32:48,041 | Copying local package content to cluster <cpu-cluster>
INFO | 2023-05-05 14:32:50,491 | Installing packages on cluster cpu-cluster: ['./']
INFO | 2023-05-05 14:32:50,862 | Function setup complete.
INFO | 2023-05-05 14:32:50,863 | Running num_cpus_cluster via gRPC
INFO | 2023-05-05 14:32:51,271 | Time to send message: 0.41 seconds


'Num cpus: 8'

### Terminate the Cluster

To terminate the cluster, you can run:

In [None]:
cluster.teardown()
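To avoid leaving an on-demand cluster running when a job fails midway, the launch and teardown calls can be paired in a context manager. This is a sketch under assumptions: `running` is a hypothetical helper, and `FakeCluster` is a stand-in so the example runs without Runhouse or a cloud account. It relies only on the `up_if_not()` and `teardown()` methods used in this guide.

```python
from contextlib import contextmanager


@contextmanager
def running(cluster):
    """Bring the cluster up, yield it, and always tear it down afterwards."""
    cluster.up_if_not()
    try:
        yield cluster
    finally:
        cluster.teardown()


class FakeCluster:
    """Hypothetical stand-in that records which methods were called."""

    def __init__(self):
        self.calls = []

    def up_if_not(self):
        self.calls.append("up_if_not")
        return self

    def teardown(self):
        self.calls.append("teardown")


fake = FakeCluster()
try:
    with running(fake):
        raise RuntimeError("simulated failure mid-job")
except RuntimeError:
    pass
print(fake.calls)  # ['up_if_not', 'teardown']
```

With a real on-demand cluster, `with running(cluster): ...` would wrap the function calls. For a bring-your-own cluster that should stay up, this pattern is probably not what you want.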

### Summary

In this tutorial, we demonstrated how to use Runhouse to create references to remote clusters, run local functions on the cluster, and save, share, and reuse functions with a Runhouse account.

Runhouse also lets you:
- Send and save data (folders, blobs, tables) between local, remote, and file storage
- Send, save, and share dev environments
- Reload and reuse saved resources (both compute and data) from different environments (with a Runhouse account)
- ... and much more!