# Runhouse

[Runhouse](https://www.run.house/) lets you use remote compute and data across environments and users. See the [Runhouse docs](https://www.run.house/docs).

This example shows how to use LangChain and [Runhouse](https://github.com/run-house/runhouse) to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.

**Note**: The code uses the `SelfHosted` name instead of `Runhouse`.

In [None]:
%pip install --upgrade --quiet "runhouse[sky]"

In [12]:
import runhouse as rh
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import SelfHostedHuggingFaceLLM, SelfHostedPipeline

In [None]:
# For an on-demand A10G with AWS (no single A100s on AWS)
gpu = rh.cluster(name='langchain-rh-a10x', instance_type='g5.4xlarge', provider='aws')
gpu.up_if_not()  # start the cluster only if it is not already running
# Alternative: a smaller on-demand A10G instance on AWS
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='rh-a10x')

In [None]:
model_env = rh.env(
    name="model_env",
    reqs=["transformers", "torch", "accelerate", "huggingface-hub"],
    secrets=["huggingface"],  # needed to download google/gemma-2b-it
).to(system=gpu)

In [None]:
gpu.run(commands=["pip install langchain"])

In [None]:
llm = SelfHostedHuggingFaceLLM(
    model_id="google/gemma-2b-it",
    task="text2text-generation",
    hardware=gpu,
    env=model_env,
)

In [7]:
template = """Question: {question}

Answer: Let's think step by step."""

In [8]:
prompt = PromptTemplate.from_template(template)
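
To see the text the model will actually receive, it can help to render the template locally before sending anything to the cluster. The sketch below is an illustration only: it assumes the template uses the default f-string format, where filling the `{question}` placeholder behaves like Python's built-in `str.format`, and it does not call LangChain or Runhouse at all.

```python
# Same template string as in the cell above.
template = """Question: {question}

Answer: Let's think step by step."""

# Assumption: with the default f-string template format, rendering the
# prompt is equivalent to calling str.format on the template string.
rendered = template.format(question="What is the capital of Germany?")
print(rendered)
```

The printed text is the question line, a blank line, and the step-by-step answer cue.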

In [18]:
llm_chain = LLMChain(prompt=prompt, llm=llm)

In [19]:
question = "What is the capital of Germany?"

llm_chain.run(question)

INFO | 2024-03-21 17:10:21.619127 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-21 17:11:38.269872 | Time to call LangchainLLMModelPipeline.interface_fn: 76.65 seconds


'\n\nThe word "Germany" is a country in Europe. The capital of Germany is Berlin.'

You can also execute the prediction function of the model directly:


In [20]:
llm("Write me a short poem about Super Bowl")

INFO | 2024-03-21 17:11:52.836679 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-21 17:14:47.459948 | Time to call LangchainLLMModelPipeline.interface_fn: 174.62 seconds


' Sunday.\n\nThe roar of the crowd, a deafening sound,\nA sea of colors, a vibrant ground.\nThe pigskin flies, the game is on,\nA spectacle of skill, a glorious dawn.\n\nThe halftime show, a dazzling light,\nA halftime show, a dazzling sight.\nFans unite, in this joyous day,\nSuper Bowl Sunday, a feast for the eye.'
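
The two calls above differ in what text reaches the model: the chain wraps the question in the prompt template first, while the direct call sends the string exactly as given. The following is a rough local sketch of that difference, not the LangChain implementation. `make_recording_llm` and `run_chain` are hypothetical helpers written for this illustration, and the stand-in model only records its input instead of generating text.

```python
from typing import Callable, List

TEMPLATE = """Question: {question}

Answer: Let's think step by step."""


def make_recording_llm(log: List[str]) -> Callable[[str], str]:
    """Return a hypothetical stand-in model that records each prompt it receives."""

    def recording_llm(prompt_text: str) -> str:
        log.append(prompt_text)
        return "<stub completion>"

    return recording_llm


def run_chain(llm: Callable[[str], str], template: str, question: str) -> str:
    """Rough equivalent of a single-input chain call: fill the template, then call the model."""
    return llm(template.format(question=question))


received: List[str] = []
stub_llm = make_recording_llm(received)

# Chain-style call: the model sees the full templated prompt.
run_chain(stub_llm, TEMPLATE, "What is the capital of Germany?")

# Direct call: the model sees only the raw string.
stub_llm("Write me a short poem about Super Bowl")

for text in received:
    print(repr(text))
```

In this sketch the first recorded prompt carries the "Let's think step by step." cue and the second does not, which is one reason the same model may answer in a different style depending on how it is called.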