# Runhouse

[Runhouse](https://github.com/run-house/runhouse) lets you use remote compute and data across environments and users. See the [Runhouse docs](https://runhouse-docs.readthedocs-hosted.com/en/latest/).

This example shows how to use LangChain and [Runhouse](https://github.com/run-house/runhouse) to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.

**Note**: The code uses the `SelfHosted` name rather than `Runhouse`; the classes imported below are `SelfHostedHuggingFaceLLM` and `SelfHostedPipeline`.

In [19]:
%pip install --upgrade --quiet "runhouse[sky]"

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
anthropic 0.3.11 requires anyio<4,>=3.5.0, but you have anyio 4.3.0 which is incompatible.
langchain 0.1.12 requires langsmith<0.2.0,>=0.1.17, but you have langsmith 0.1.5 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [20]:
import runhouse as rh
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import SelfHostedHuggingFaceLLM, SelfHostedPipeline

In [21]:
# For an on-demand A10G with AWS (no single A100s on AWS)
gpu = rh.cluster(name='sasha-rh-a10x', instance_type='g5.2xlarge', provider='aws')
gpu.up_if_not()  # launch the cluster if it is not already running

# For an on-demand A100 with GCP, Azure, or Lambda, pass that provider's
# instance type and provider name instead.

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='rh-a10x')

INFO | 2024-03-21 11:47:15.875884 | Saving config for sasha-rh-a10x-ssh-secret to Den
INFO | 2024-03-21 11:47:16.033507 | Saving secrets for sasha-rh-a10x-ssh-secret to Vault


<runhouse.resources.hardware.on_demand_cluster.OnDemandCluster at 0x14d387c70>

In [22]:
model_env = rh.env(
    name="model_env15",
    reqs=["transformers", "torch", "accelerate", "huggingface-hub"],
    secrets=["huggingface"]  # needed to download google/gemma-2b-it
).to(system=gpu)

INFO | 2024-03-21 11:47:25.613827 | Copying package from file:///Users/sashabelousovrh/PycharmProjects/LangchainIntegration/langchain to: sasha-rh-a10x
INFO | 2024-03-21 11:47:39.625265 | SSH tunnel on to server's port 32300 via server's ssh port 22 already created with the cluster.
INFO | 2024-03-21 11:47:40.174156 | Server sasha-rh-a10x is up.
INFO | 2024-03-21 11:47:40.474580 | Calling huggingface._write_to_file


Secrets already exist in ~/.cache/huggingface/token.

INFO | 2024-03-21 11:47:41.762476 | Time to call huggingface._write_to_file: 1.29 seconds


INFO | 2024-03-21 11:47:47.912961 | Calling model_env15.install
INFO | 2024-03-21 11:47:49.379459 | Time to call model_env15.install: 1.47 seconds


In [23]:
gpu.run(commands=["pip install langchain"])



[(0,
  '')]

In [24]:
llm = SelfHostedHuggingFaceLLM(model_id="google/gemma-2b-it", hardware=gpu, env=model_env)

INFO | 2024-03-21 11:48:15.227253 | Calling file_20240321_134741.exists_in_system
INFO | 2024-03-21 11:48:16.520308 | Time to call file_20240321_134741.exists_in_system: 1.29 seconds
INFO | 2024-03-21 11:48:16.523285 | Calling file_20240321_134741.resolved_state
INFO | 2024-03-21 11:48:17.811020 | Time to call file_20240321_134741.resolved_state: 1.29 seconds
INFO | 2024-03-21 11:48:17.821190 | Calling huggingface._write_to_file


Secrets already exist in .cache/huggingface/token.

INFO | 2024-03-21 11:48:19.110353 | Time to call huggingface._write_to_file: 1.29 seconds


INFO | 2024-03-21 11:48:25.127882 | Calling model_env15.install
INFO | 2024-03-21 11:48:26.416752 | Time to call model_env15.install: 1.29 seconds


INFO | 2024-03-21 11:48:31.719930 | Sending module LangchainLLMModelPipeline to sasha-rh-a10x


INFO | 2024-03-21 11:48:43.913541 | Calling LangchainLLMModelPipeline._remote_init
INFO | 2024-03-21 11:48:45.209235 | Time to call LangchainLLMModelPipeline._remote_init: 1.3 seconds
INFO | 2024-03-21 11:48:45.215294 | Calling file_20240321_134819.exists_in_system
INFO | 2024-03-21 11:48:46.505129 | Time to call file_20240321_134819.exists_in_system: 1.29 seconds
INFO | 2024-03-21 11:48:46.508051 | Calling file_20240321_134819.resolved_state
INFO | 2024-03-21 11:48:47.795749 | Time to call file_20240321_134819.resolved_state: 1.29 seconds
INFO | 2024-03-21 11:48:47.801950 | Calling LangchainLLMModelPipeline.load_model


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:03<00:03,  3.14s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.39s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.65s/it]

INFO | 2024-03-21 11:48:53.096467 | Time to call LangchainLLMModelPipeline.load_model: 5.29 seconds


In [25]:
template = """Question: {question}

Answer: Let's think step by step."""

In [26]:
prompt = PromptTemplate.from_template(template)
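
To see what text the model receives, it can help to render the template by hand. The following is a minimal sketch that uses plain `str.format` as a stand-in for the prompt template's formatting step; it assumes the template only contains simple `{name}` placeholders, as it does here, and it runs locally without LangChain or a GPU.

```python
# Same template string as in the notebook cell above.
template = """Question: {question}

Answer: Let's think step by step."""

# Stand-in for the PromptTemplate formatting step (an assumption for
# illustration): str.format fills the {question} placeholder.
rendered = template.format(question="What is the capital of Germany?")
print(rendered)
```

The rendered string ends with the "Let's think step by step." cue, so the model's completion continues from that sentence.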

In [27]:
llm_chain = LLMChain(prompt=prompt, llm=llm)

In [28]:
question = "What is the capital of Germany?"

llm_chain.run(question)

INFO | 2024-03-21 11:49:24.467459 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-21 11:50:40.950078 | Time to call LangchainLLMModelPipeline.interface_fn: 76.48 seconds


'\n\nThe word "Germany" is a country in Europe. The capital of Germany is Berlin.'

You can also call the model directly, without going through a chain:


In [29]:
llm("Write me a short poem about Super Bowl")

INFO | 2024-03-21 11:51:17.487237 | Calling LangchainLLMModelPipeline.interface_fn
INFO | 2024-03-21 11:52:28.955408 | Time to call LangchainLLMModelPipeline.interface_fn: 71.47 seconds


' Sunday.\n\nThe roar of the crowd, a deafening sound,\nA sea of colors, a vibrant ground.\nThe pigskin flies,'
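
The `Time to call ...` log lines in the outputs above report how long each remote call took. The following is a minimal sketch for collecting those durations; it assumes the log format shown in this notebook's output, which is not a documented Runhouse interface, and `call_timings` is a hypothetical helper rather than part of Runhouse or LangChain.

```python
import re

# Log lines copied from the outputs above. The format is assumed from this
# notebook's output only and may differ between Runhouse versions.
LOG = """\
INFO | 2024-03-21 11:48:53.096467 | Time to call LangchainLLMModelPipeline.load_model: 5.29 seconds
INFO | 2024-03-21 11:50:40.950078 | Time to call LangchainLLMModelPipeline.interface_fn: 76.48 seconds
INFO | 2024-03-21 11:52:28.955408 | Time to call LangchainLLMModelPipeline.interface_fn: 71.47 seconds
"""

# Captures the call name and the reported duration in seconds.
TIMING = re.compile(r"Time to call (?P<name>\S+): (?P<secs>[\d.]+) seconds")


def call_timings(log_text):
    """Hypothetical helper: group reported durations (seconds) by call name."""
    timings = {}
    for match in TIMING.finditer(log_text):
        timings.setdefault(match["name"], []).append(float(match["secs"]))
    return timings


timings = call_timings(LOG)
for name, secs in timings.items():
    print(f"{name}: n={len(secs)}, mean={sum(secs) / len(secs):.2f}s")
```

In this run, both `interface_fn` calls took over a minute each, while loading the model took only a few seconds.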