# Runhouse

[Runhouse](https://www.run.house/) allows remote compute and data across environments and users. See the [Runhouse docs](https://www.run.house/docs).

This example shows how to use LangChain and [Runhouse](https://github.com/run-house/runhouse) to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.

**Note**: The code uses the `SelfHosted` name instead of `Runhouse`.

```python
%pip install --upgrade --quiet "runhouse[sky]"
```

```python
import runhouse as rh
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import SelfHostedHuggingFaceLLM, SelfHostedPipeline
```

```python
# For an on-demand A10G with AWS (g5.4xlarge instance)
gpu = rh.cluster(name="langchain-rh-a10x", instance_type="g5.4xlarge")
gpu.up_if_not()  # launch the cluster only if it is not already running

# For an on-demand A100 with GCP, Azure, or Lambda (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='A100:1', use_spot=False)

# For a smaller on-demand A10G with AWS
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='rh-a10x')
```

<runhouse.resources.hardware.on_demand_cluster.OnDemandCluster at 0x107573fd0>

```python
model_env = rh.env(
    name="model_env",
    reqs=["transformers", "torch", "accelerate", "huggingface-hub"],
    secrets=["huggingface"],  # Hugging Face token, needed to download the gated google/gemma-2b-it model
).to(system=gpu)
```

```python
# Install LangChain on the cluster as well
gpu.run(commands=["pip install langchain"])
```

```python
llm = SelfHostedHuggingFaceLLM(
    model_id="google/gemma-2b-it",
    hardware=gpu,
    env=model_env,
)
```

```python
template = """Question: {question}

Answer: Let's think step by step."""
```
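`PromptTemplate.from_template` uses f-string-style `{question}` placeholders by default, so the exact text the model receives can be previewed with plain `str.format`. A minimal stdlib-only sketch, redefining the same `template` string so it runs on its own:

```python
template = """Question: {question}

Answer: Let's think step by step."""

# Fill the {question} placeholder the same way an f-string-style template would
filled = template.format(question="What is the capital of Germany?")
print(filled)
```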

```python
prompt = PromptTemplate.from_template(template)
```

```python
llm_chain = LLMChain(prompt=prompt, llm=llm)
```
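Conceptually, the chain fills the prompt template and passes the resulting string to the LLM once. Current LangChain releases mark `LLMChain` and its `run` method as deprecated in favor of composing `chain = prompt | llm` and calling `chain.invoke({"question": question})`. The sketch below mimics that flow in plain Python so it runs without a GPU cluster; `run_chain` and `fake_llm` are hypothetical stand-ins, not LangChain or Runhouse APIs, and the stub's reply is canned rather than generated.

```python
from typing import Callable

template = """Question: {question}

Answer: Let's think step by step."""

seen_prompts: list[str] = []  # records every prompt the stub "model" receives


def fake_llm(prompt_text: str) -> str:
    """Hypothetical local stub standing in for the remote SelfHostedHuggingFaceLLM."""
    seen_prompts.append(prompt_text)
    return "Berlin"  # canned reply; a real model would generate text remotely


def run_chain(prompt_template: str, llm: Callable[[str], str], **inputs: str) -> str:
    """Hypothetical prompt -> LLM chain: fill the template, then call the model once."""
    prompt_text = prompt_template.format(**inputs)
    return llm(prompt_text)


answer = run_chain(template, fake_llm, question="What is the capital of Germany?")
print(answer)
```

Swapping `fake_llm` for the real `llm` object is what the actual chain does, with the generation happening on the cluster.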

```python
question = "What is the capital of Germany?"

llm_chain.run(question)
```

INFO | 2024-03-24 12:43:27.185705 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-24 12:45:19.638441 | Time to call LangchainLLMModelPipeline.interface_fn: 112.45 seconds


'\n\nThe word "Germany" has numerous meanings and can refer to different countries around the world.\n\nThe capital city of Germany is Berlin, which is located in the center of the country.'

You can also execute the prediction function of the model directly:


```python
llm("Write me a short poem about Super Bowl")
```

INFO | 2024-03-24 12:45:29.004133 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-24 12:47:46.580013 | Time to call LangchainLLMModelPipeline.interface_fn: 137.58 seconds


' Sunday. \n\nThe crowd roars loud, the lights ignite,\nA symphony of cheers resound in the night.\nThe players take the field, the tension grows,\nFor victory or defeat, the battle flows.\n\nThe atmosphere is electric and alive,\nA spectacle of passion and strife.\nThe atmosphere'
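This output is a raw completion: it begins by continuing the prompt (`" Sunday."`) and stops mid-line (`"The atmosphere"`), likely because generation hit a length limit. If you only want complete sentences, a small post-processing step can cut the text after the last sentence-ending punctuation mark. `trim_to_last_sentence` below is a hypothetical helper, not part of LangChain or Runhouse:

```python
def trim_to_last_sentence(text: str) -> str:
    """Hypothetical helper: drop any trailing fragment after the last '.', '!' or '?'."""
    cut = max(text.rfind(mark) for mark in ".!?")
    if cut == -1:
        return text.strip()  # no sentence end found; keep the text, minus surrounding whitespace
    return text[: cut + 1].strip()


poem = (
    " Sunday. \n\nThe crowd roars loud, the lights ignite,\n"
    "A symphony of cheers resound in the night.\n"
    "The players take the field, the tension grows,\n"
    "For victory or defeat, the battle flows.\n\n"
    "The atmosphere is electric and alive,\n"
    "A spectacle of passion and strife.\nThe atmosphere"
)
print(trim_to_last_sentence(poem))
```

This is a purely textual heuristic; it does not make the model finish its output, so raising the generation length limit is the more direct fix when complete answers matter.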