# RunPod

This notebook demonstrates how to use [RunPod](https://www.runpod.io/) with LangChain to run your own LLMs, embedding models, and more. RunPod provides GPU cloud computing with both dedicated and serverless options, allowing you to deploy inference endpoints for your AI applications.

You'll learn how to:

- Set up a RunPod account and obtain an API key
- Create a serverless endpoint on RunPod
- Connect LangChain to your RunPod endpoints
- Use streaming with LLMs
- Configure different models and parameters

Let's get started!

## Installation

First, we need to install the `langchain-runpod` integration package.

In [None]:
%pip install --upgrade langchain-runpod

We'll also install a few other packages that will be useful for this tutorial:

In [None]:
%pip install -U langchain langchain-core

## Setting Up RunPod
### Getting a RunPod API Key
To use RunPod with LangChain, you'll need a RunPod account and an API key. Here's how to get one:
1. Create an account on RunPod
2. Navigate to your account settings by clicking on your profile in the top-right corner
3. Select "API Keys" from the left sidebar
4. Click "Create API Key", give it a name, and copy the key

Let's set up the API key as an environment variable:

In [None]:
import os

# Set your RunPod API key
os.environ["RUNPOD_API_KEY"] = "your-runpod-api-key"  # Replace with your actual API key
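
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As an alternative, here is a minimal sketch that reuses `RUNPOD_API_KEY` if it is already set and otherwise prompts for it without echoing the input. `load_runpod_api_key` is a hypothetical helper written for this tutorial, not part of `langchain-runpod`.

```python
import os
from getpass import getpass


def load_runpod_api_key(env_var: str = "RUNPOD_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is missing.

    Hypothetical helper for this tutorial; not part of langchain-runpod.
    """
    key = os.environ.get(env_var, "").strip()
    # Treat the tutorial placeholder the same as a missing key.
    if not key or key == "your-runpod-api-key":
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key
    return key


# Usage (prompts interactively when the variable is not set):
# api_key = load_runpod_api_key()
```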

### Creating a RunPod Serverless Endpoint
To use LLMs through RunPod, you need to set up a serverless endpoint. Here's a step-by-step guide:
1. Log in to your RunPod account
2. Navigate to the "Serverless" section from the dashboard
3. Click "New Endpoint"
4. Select a template based on the LLM you want to use
    - For text-based LLMs, good options include templates for:
        - vLLM (recommended for performance)
        - Text Generation WebUI
        - FastChat
5. Choose a GPU type based on your model's requirements
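    - For Llama 3 70B, consider at least an 80 GB A100 or H100; at 16-bit precision the weights alone exceed a single 80 GB GPU, so multiple GPUs per worker or a quantized model are typically needed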
    - For smaller models (7B, 13B), RTX 4090 or A10 GPUs may be sufficient
6. Configure your endpoint settings:
    - Set a minimum and maximum number of workers
    - Set the idle timeout
    - Configure any template-specific settings
7. Deploy your endpoint
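
For step 5, a back-of-the-envelope estimate of weight memory can help narrow the GPU choice. The sketch below is not a RunPod or LangChain API: `weight_memory_gb` is a hypothetical helper, and it assumes memory is roughly the parameter count times the bytes per parameter, ignoring the KV cache, activations, and framework overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: parameters * bytes per parameter; real usage is higher.
    """
    return params_billion * bits_per_param / 8


# Compare a few model sizes and precisions.
examples = [
    ("7B at 16-bit", 7, 16),
    ("13B at 16-bit", 13, 16),
    ("70B at 16-bit", 70, 16),
    ("70B at 4-bit", 70, 4),
]
for label, size_b, bits in examples:
    print(f"{label}: ~{weight_memory_gb(size_b, bits):.0f} GB")
```

By this estimate, a 7B model at 16-bit precision needs about 14 GB for weights and fits on a 24 GB GPU, while a 70B model needs about 140 GB at 16-bit or about 35 GB at 4-bit. Leave headroom beyond these figures for the KV cache and runtime overhead.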

Once deployed, note your endpoint ID. You'll need it to connect LangChain to your endpoint.

**Important Tip**: Make sure your endpoint is running a text-based LLM server that accepts standard input/output formats. RunPod provides text-based templates, which we recommend using; you can find them in the Serverless section of the RunPod dashboard.
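
Before wiring the endpoint into LangChain, it can be useful to confirm that it is reachable. The sketch below assumes that RunPod's serverless REST API exposes a health route at `https://api.runpod.ai/v2/<endpoint_id>/health` with bearer-token authentication; check the RunPod documentation to confirm the path and response format. `build_health_request` is a hypothetical helper, and the network call is left commented out so the cell runs without credentials.

```python
import json
import os
import urllib.request


def build_health_request(endpoint_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the endpoint's health route.

    Assumption: the URL layout and Bearer auth follow RunPod's serverless API.
    """
    url = f"https://api.runpod.ai/v2/{endpoint_id}/health"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Usage (requires a real endpoint ID and API key):
# req = build_health_request("endpoint-id", os.environ["RUNPOD_API_KEY"])
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```
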
## Example

In [None]:
from langchain_runpod import ChatRunPod

# Initialize the ChatRunPod model
# Replace "endpoint-id" with your actual RunPod endpoint ID
chat = ChatRunPod(
    endpoint_id="endpoint-id",  # Your RunPod serverless endpoint ID
    model_name="llama3-70b-chat",  # Optional - helps with identification
    temperature=0.7,  # Control randomness (0.0 to 1.0)
    max_tokens=1024,  # Maximum tokens in the response
    # api_key="your-runpod-api-key",  # Alternative to using environment variable
)

In [None]:
# Basic invocation
response = chat.invoke("Explain how transformer models work in 3 sentences.")
print(response.content)