# Runhouse

[Runhouse](https://github.com/run-house/runhouse) allows remote compute and data across environments and users. See the [Runhouse docs](https://runhouse-docs.readthedocs-hosted.com/en/latest/).

This example goes over how to use LangChain and [Runhouse](https://github.com/run-house/runhouse) to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.

**Note**: The code uses the `SelfHosted` name instead of `Runhouse`.

In [1]:
%pip install --upgrade --quiet "runhouse[sky]"
%pip install --upgrade --quiet 'skypilot'

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
import runhouse as rh
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import SelfHostedHuggingFaceLLM, SelfHostedPipeline
from langchain_community.llms.self_hosted_hugging_face import _generate_text, _load_transformer

In [3]:
# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='rh-a10x')


INFO | 2024-03-10 13:41:48.427922 | Saving config for rh-a10x-ssh-secret to Den
INFO | 2024-03-10 13:41:48.586415 | Saving secrets for rh-a10x-ssh-secret to Vault
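The commented-out options above differ only in the keyword arguments passed to `rh.cluster`. If you switch providers often, a small lookup like the one below keeps those choices in one place. This is a minimal sketch: `cluster_kwargs` is a hypothetical helper, and its values simply mirror the comments in the cell above rather than listing everything Runhouse accepts.

```python
def cluster_kwargs(provider: str) -> dict:
    """Hypothetical helper: keyword arguments for rh.cluster per provider.

    The values mirror the commented options in the notebook cell; they are
    not a complete list of what Runhouse accepts.
    """
    if provider == "aws":
        # AWS has no single-A100 instances, so use an A10G (g5.2xlarge) instead.
        return {"name": "rh-a10x", "instance_type": "g5.2xlarge", "provider": "aws"}
    if provider in ("gcp", "azure", "lambda"):
        # On-demand (non-spot) single A100; no provider is pinned,
        # matching the A100 example in the cell.
        return {"name": "rh-a10x", "instance_type": "A100:1", "use_spot": False}
    raise ValueError(f"No preset for provider {provider!r}")
```

You would then create the cluster with something like `gpu = rh.cluster(**cluster_kwargs("aws"))`.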


In [4]:
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)

In [5]:
model_env = rh.env(reqs=["transformers", "torch"])

In [6]:
load_transformer_remote = rh.function(fn=_load_transformer).to(gpu, env=model_env)

INFO | 2024-03-10 13:41:59.535978 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2024-03-10 13:42:00.383861 | Authentication (publickey) successful!
ERROR | 2024-03-10 13:42:00.388296 | Problem setting SSH Forwarder up: Couldn't open tunnel :32300 <> 127.0.0.1:32300 might be in use or destination not reachable
INFO | 2024-03-10 13:42:01.120154 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2024-03-10 13:42:01.952478 | Authentication (publickey) successful!
INFO | 2024-03-10 13:42:02.678318 | Server rh-a10x is up.
INFO | 2024-03-10 13:42:02.685062 | Copying package from file:///Users/sashabelousovrh/PycharmProjects/LangchainIntegration/langchain to: rh-a10x
INFO | 2024-03-10 13:42:04.563031 | Calling base_env.install
INFO | 2024-03-10 13:42:05.864888 | Time to call base_env.install: 1.3 seconds



INFO | 2024-03-10 13:42:09.530630 | Sending module _load_transformer to rh-a10x



In [7]:
generate_text_remote = rh.function(_generate_text).to(gpu, env=model_env)

INFO | 2024-03-10 13:42:14.461141 | Copying package from file:///Users/sashabelousovrh/PycharmProjects/LangchainIntegration/langchain to: rh-a10x
INFO | 2024-03-10 13:42:16.262320 | Calling base_env.install
INFO | 2024-03-10 13:42:17.572399 | Time to call base_env.install: 1.31 seconds



INFO | 2024-03-10 13:42:21.472036 | Sending module _generate_text to rh-a10x



In [8]:
# SelfHostedHuggingFaceLLM is not a Runhouse module, so calling rh.module(...)
# or .to(...) on it fails with `object has no field "_name"`. Pass the
# cluster through the `hardware` argument instead; gpt2 is a small, ungated model.
llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2", hardware=gpu, model_reqs=["pip:./", "transformers", "torch"]
)

In [None]:
llm_chain = LLMChain(prompt=prompt, llm=llm)

In [None]:
question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

llm_chain.run(question)
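Conceptually, the chain above does two things: it fills the `{question}` placeholder in the template, then hands the finished prompt to the LLM. The sketch below reproduces that flow in plain Python with a stub model so it runs anywhere. It is a simplified illustration, not LangChain's implementation; `run_chain_sketch`, `demo_template`, and `echo_llm` are hypothetical names chosen so they do not overwrite the notebook's `prompt`, `llm`, or `question` variables.

```python
from typing import Callable, List


def run_chain_sketch(template: str, llm_fn: Callable[[str], str], question: str) -> str:
    """Hypothetical, simplified view of a prompt-plus-LLM chain."""
    # Fill the {question} placeholder, as an f-string-style template does.
    filled = template.format(question=question)
    # Hand the finished prompt to the model and return its text.
    return llm_fn(filled)


demo_template = """Question: {question}

Answer: Let's think step by step."""

# A stub "model" that records the prompt it was given.
received: List[str] = []


def echo_llm(p: str) -> str:
    received.append(p)
    return "stub answer"


answer = run_chain_sketch(demo_template, echo_llm, "What is 2 + 2?")
```

With the real chain, `echo_llm` is replaced by the remote `llm`, and the returned text is whatever the model generates.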

You can also load more custom models through the `SelfHostedHuggingFaceLLM` interface:

In [None]:
llm = SelfHostedHuggingFaceLLM(
    model_id="google/flan-t5-small",
    task="text2text-generation",
    hardware=gpu,
)

In [None]:
llm.invoke("What is the capital of Germany?")
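The `task` argument matters because, as we understand the Hugging Face `transformers` pipelines, the two tasks used in this notebook shape their output differently: a `text-generation` pipeline returns the prompt followed by the continuation by default, while a `text2text-generation` model such as Flan-T5 returns only the new text. The sketch below shows that post-processing step; `postprocess` is a hypothetical helper, and the fake responses stand in for real pipeline output.

```python
from typing import Any, Dict, List


def postprocess(task: str, prompt: str, response: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: turn a raw pipeline response into the answer text."""
    generated = response[0]["generated_text"]
    if task == "text-generation":
        # Decoder-only models echo the prompt, so drop that prefix.
        return generated[len(prompt):]
    if task == "text2text-generation":
        # Encoder-decoder models return only the generated text.
        return generated
    raise ValueError(f"Unsupported task: {task}")


# Fake responses in the shape the pipelines return.
causal = postprocess(
    "text-generation",
    "Capital of Germany?",
    [{"generated_text": "Capital of Germany? Berlin."}],
)
seq2seq = postprocess(
    "text2text-generation",
    "Capital of Germany?",
    [{"generated_text": "Berlin"}],
)
```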

Using a custom load function, we can load a custom pipeline directly on the remote hardware:

In [None]:
def load_pipeline():
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        pipeline,
    )

    model_id = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
    )
    return pipe


def inference_fn(pipeline, prompt, stop=None):
    return pipeline(prompt)[0]["generated_text"][len(prompt) :]
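The `inference_fn` above accepts `stop` but ignores it. If you want stop sequences honored, one option is to truncate the generated text at the first match, as in the sketch below. `inference_fn_with_stop` and `fake_pipe` are hypothetical names; `fake_pipe` imitates a text-generation pipeline that echoes the prompt, so the example runs without a model.

```python
import re
from typing import List, Optional


def inference_fn_with_stop(pipeline, prompt: str, stop: Optional[List[str]] = None) -> str:
    # Text-generation pipelines return the prompt plus the continuation,
    # so slice off the prompt first.
    text = pipeline(prompt)[0]["generated_text"][len(prompt):]
    if stop:
        # Cut at the first occurrence of any stop sequence (escaped so that
        # characters like "." or "?" are matched literally).
        pattern = "|".join(re.escape(s) for s in stop)
        text = re.split(pattern, text, maxsplit=1)[0]
    return text


def fake_pipe(p: str):
    # Stand-in for a real pipeline: echoes the prompt, then keeps "talking".
    return [{"generated_text": p + " Paris.\nQuestion: next"}]


trimmed = inference_fn_with_stop(fake_pipe, "Capital of France?", stop=["\nQuestion:"])
untrimmed = inference_fn_with_stop(fake_pipe, "Capital of France?")
```

Passing `inference_fn=inference_fn_with_stop` in the next cell would use this version instead.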

In [None]:
llm = SelfHostedHuggingFaceLLM(
    model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn
)
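The point of splitting the work into `model_load_fn` and `inference_fn` is that the (expensive) load runs once on the remote hardware, and every later call reuses the loaded pipeline. The local-only sketch below mimics that pattern without Runhouse; `LocalSelfHostedSketch` and the `fake_*` functions are hypothetical and only illustrate the idea, not how the integration is implemented internally.

```python
from typing import Any, Callable, List, Optional


class LocalSelfHostedSketch:
    """Hypothetical local stand-in for the load-once, call-many pattern."""

    def __init__(self, model_load_fn: Callable[[], Any], inference_fn: Callable[..., str]):
        # Load the model once (remotely, in the real integration).
        self.pipeline_ref = model_load_fn()
        self.inference_fn = inference_fn

    def invoke(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Every call reuses the already-loaded pipeline object.
        return self.inference_fn(self.pipeline_ref, prompt, stop=stop)


load_calls: List[int] = []


def fake_load_pipeline():
    # Count loads so we can check that loading happens only once.
    load_calls.append(1)
    return lambda p: [{"generated_text": p + " ...continuation"}]


def fake_inference_fn(pipe, prompt, stop=None):
    return pipe(prompt)[0]["generated_text"][len(prompt):]


sketch_llm = LocalSelfHostedSketch(fake_load_pipeline, fake_inference_fn)
first = sketch_llm.invoke("Hello")
second = sketch_llm.invoke("Hi")
```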

In [None]:
llm.invoke("Who is the current US president?")

You can send your pipeline directly over the wire to the remote hardware, but this only works for small models (<2 GB) and will be fairly slow:

In [None]:
pipeline = load_pipeline()
llm = SelfHostedPipeline.from_pipeline(
    pipeline=pipeline, hardware=gpu, model_reqs=["pip:./", "transformers", "torch"]
)
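Because the over-the-wire route is only practical for small models, it can help to check the serialized size first. The sketch below pickles an object and compares its size with a roughly 2 GB limit taken from the guidance above; `fits_over_wire` and `WIRE_LIMIT_BYTES` are hypothetical names, and note that pickling a real pipeline holds its full serialized weights in memory.

```python
import pickle

# Hypothetical threshold based on the ~2 GB guidance above.
WIRE_LIMIT_BYTES = 2 * 1024**3


def serialized_size_bytes(obj) -> int:
    """Return the size of obj once pickled."""
    return len(pickle.dumps(obj))


def fits_over_wire(obj, limit_bytes: int = WIRE_LIMIT_BYTES) -> bool:
    """True if the pickled object is under the size limit."""
    return serialized_size_bytes(obj) < limit_bytes
```

For example, `fits_over_wire(pipeline)` before calling `SelfHostedPipeline.from_pipeline(...)`, falling back to the filesystem approach below when it returns `False`.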

Alternatively, we can send it to the hardware's filesystem, which is much faster.

In [None]:
import pickle

rh.blob(pickle.dumps(pipeline), path="models/pipeline.pkl").save().to(
    gpu, path="models"
)

llm = SelfHostedPipeline.from_pipeline(pipeline="models/pipeline.pkl", hardware=gpu)
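What lands on the cluster in this last step is an ordinary pickle file, which the remote side presumably unpickles when given the path. The local round-trip below illustrates that save and load using a temporary directory and a small stand-in object instead of a real pipeline; `save_pickle` and `load_pickle` are hypothetical helpers, not Runhouse or LangChain APIs.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path: str) -> None:
    # Create the parent directory (e.g. "models/") if needed, then write bytes.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path: str):
    # Only unpickle files you created yourself: unpickling can execute code.
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "models", "pipeline.pkl")
    stand_in = {"task": "text-generation", "model_id": "gpt2"}
    save_pickle(stand_in, pkl_path)
    restored = load_pickle(pkl_path)
```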