
---

# 🏁⚡ LangServe — Ultimate Cheatsheet

> **Mental model:** LangServe turns **LCEL/LangGraph runnables** into **typed HTTP APIs** (FastAPI) with **/invoke • /batch • /stream**, **Pydantic I/O**, **SSE streaming**, **RemoteRunnable clients**, plus hooks for **auth 🔐**, **observability 👀**, **caching 💾**, and **scaling 📈**.

---

## 0) 🧭 Core Concepts (Orientation)

| 🔧 Concept   | 📝 What it means                                             | 🎯 Why it matters             |
| ------------ | ------------------------------------------------------------ | ----------------------------- |
| **Runnable** | Your chain/agent/graph implementing `.invoke/.batch/.stream` | One uniform server contract   |
| **Route**    | `add_routes(app, runnable, path="/...")`                     | Zero-boilerplate API surface  |
| **Client**   | Python/JS/HTTP caller                                        | Local ↔ remote symmetry       |
| **Modes**    | `invoke` (one) • `batch` (many) • `stream` (tokens)          | Balance latency vs throughput |
| **Schemas**  | Pydantic request/response models                             | Strong contracts & OpenAPI ✅  |

**Interview one-liner:** *“LangServe is the thinnest way to put LCEL/LangGraph behind typed, streaming endpoints with observability and auth.”* ✨

---

## 1) 🛠️ Install & Project Scaffold

| 🧩 Task  | 💻 Command / Tip                                      | 🗒️ Notes             |
| -------- | ----------------------------------------------------- | --------------------- |
| Install  | `pip install "langserve[all]" uvicorn langchain-openai` | `[all]` = server + client extras; pin versions for prod |
| Scaffold | `app.py` + `schemas.py` + `runnables.py` + `settings.py` | Keep routes modular   |
| Run      | `uvicorn app:app --host 0.0.0.0 --port 8000 --reload` | `--reload` for dev 🔁 |

---

## 2) 🚧 Build Your First Service

### 2.1 🧾 IO Schemas (Pydantic)

```python
# schemas.py
from pydantic import BaseModel, Field

class ChatIn(BaseModel):
    question: str = Field(..., min_length=1)

class ChatOut(BaseModel):
    answer: str
```

### 2.2 🔌 Wrap a Runnable with `add_routes`

```python
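# app.py
from fastapi import FastAPI
from langserve import add_routes
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI
from schemas import ChatIn, ChatOut

app = FastAPI(title="LangServe Demo")

prompt = ChatPromptTemplate.from_template("Answer concisely:\nQ: {question}\nA:")
llm = ChatOpenAI(temperature=0)  # reads OPENAI_API_KEY from the environment
# Parse the model message to text and nest it under `answer` so the output
# matches ChatOut; RunnableParallel keeps token streaming intact.
chain = RunnableParallel(answer=prompt | llm | StrOutputParser())  # Runnable

add_routes(
    app,
    chain,
    path="/chat",
    input_type=ChatIn,      # optional: inferred from the runnable if omitted
    output_type=ChatOut,    # optional: documents the response shape in OpenAPI
)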
```

---

## 3) 🛣️ Routes & Endpoints

| 🔗 Endpoint    | 🧭 Method  | 📦 Body (typical)                                     | 🎁 Returns                 |
| -------------- | ---------- | ----------------------------------------------------- | -------------------------- |
| `/chat/invoke` | POST       | `{"input":{"question":"..."}, "config": {...}}`       | Single result              |
| `/chat/batch`  | POST       | `{"inputs":[{"question":"..."}...], "config": {...}}` | List of results            |
| `/chat/stream` | POST (SSE) | Same as `/invoke`                                     | `text/event-stream` tokens |
| Options & CORS | —          | Tags/description/CORS allowlist                       | OpenAPI auto-updates 📜    |

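> 💡 **Pro tip:** `config` carries `configurable` (e.g. `{"session_id":"abc"}`), which flows into LCEL/LangGraph. By default `add_routes` accepts only `configurable` from clients; pass `config_keys=[...]` to also accept `tags` or `metadata`.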
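
**Request bodies (sketch)**

A minimal, stdlib-only sketch of the JSON envelopes from the table above, assuming the `/chat` route and the `question` field used in this cheatsheet. The helper names `invoke_body` and `batch_body` are hypothetical, not part of LangServe.

```python
import json
from typing import Any, Optional


def invoke_body(input_: dict, config: Optional[dict] = None) -> str:
    """Serialize the body for POST <path>/invoke (also accepted by <path>/stream)."""
    body: dict = {"input": input_}
    if config:
        body["config"] = config
    return json.dumps(body)


def batch_body(inputs: list, config: Optional[dict] = None) -> str:
    """Serialize the body for POST <path>/batch: one entry per input."""
    body: dict = {"inputs": inputs}
    if config:
        body["config"] = config
    return json.dumps(body)
```

Send the result with any HTTP client and `Content-Type: application/json`, e.g. `invoke_body({"question": "Hi?"}, {"configurable": {"session_id": "abc"}})`.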

---

## 4) 📡 Streaming (Server & Clients)

### 4.1 🧨 SSE from `/stream`

* Content-Type: `text/event-stream`
* Emits `data` events carrying output chunks, then a closing `end` event → perfect for chat UIs 💬
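
**SSE frame parsing (sketch)**

For clients that read the raw stream instead of using `RemoteRunnable`, frames have to be split by hand. The sketch below is a simplified reading of the SSE wire format (fields `event:` and `data:`, blank line ends a frame); the function name `parse_sse` is hypothetical and the event names in the usage note are assumptions about what the server sends.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from already-decoded SSE lines."""
    event, data, seen = "message", [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line closes the frame. Unlike a browser EventSource,
            # frames without data are also yielded so callers see the end marker.
            if seen:
                yield event, "\n".join(data)
            event, data, seen = "message", [], False
        elif line.startswith(":"):
            continue  # comment line, typically a keep-alive ping
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "event":
                event, seen = value, True
            elif field == "data":
                data.append(value)
                seen = True
```

Feed it the lines of a `text/event-stream` response; each `data` payload is JSON, so pass it through `json.loads` before use.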

### 4.2 🐍 Python client streaming

```python
from langserve import RemoteRunnable
rr = RemoteRunnable("http://localhost:8000/chat/")
for chunk in rr.stream({"question": "Hello?"}):
    print(chunk, end="", flush=True)
```

### 4.3 🌐 Browser (JS) streaming

```js
const resp = await fetch("http://localhost:8000/chat/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ input: { question: "Hi?" } }),
});
const reader = resp.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
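  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  console.log(decoder.decode(value, { stream: true })); // raw SSE text; split on blank lines for frames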
}
```

---

## 5) 🤝 Clients & RemoteRunnable

| 🧰 Client      | 🏃 How to call                                 | 🗒️ Notes                  |
| -------------- | ---------------------------------------------- | -------------------------- |
| **Python**     | `RemoteRunnable(url).invoke/stream/batch(...)` | Mirrors local Runnable API |
| **JavaScript** | `fetch` or generated SDK from OpenAPI          | Prefer SSE for `/stream`   |
| **OpenAPI**    | `GET /openapi.json`                            | Generate TS/Java SDKs 🧪   |

---

## 6) 🔐 Auth & Security

| 🚧 Concern    | 🧭 Pattern                                          | 🔎 Snippet          |
| ------------- | --------------------------------------------------- | ------------------- |
| API key       | FastAPI dependency, read header (e.g., `x-api-key`) | See below           |
| CORS          | Allow specific origins                              | Add CORS middleware |
| Rate limiting | NGINX/Cloud or middleware                           | Per-IP / per-key    |
| Secrets       | Env/vault; never log                                | Pydantic Settings   |

**🔑 API-Key example**

```python
import os
from fastapi import Depends, HTTPException, Header

API_KEY = os.environ.get("API_KEY", "supersecret")  # demo default; set via env/vault in prod

def require_key(x_api_key: str | None = Header(default=None)):  # reads the `x-api-key` header
    if x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

add_routes(app, chain, path="/chat", dependencies=[Depends(require_key)])
```
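
**Constant-time key check (sketch)**

A plain `!=` comparison can, in principle, leak how much of a key matched through response timing. A hedged, stdlib-only alternative for the comparison step; the helper name `key_matches` is hypothetical.

```python
import hmac
from typing import Optional


def key_matches(presented: Optional[str], expected: str) -> bool:
    """Compare API keys in constant time; a missing or empty key never matches."""
    if not presented:
        return False
    # Encode to bytes first: compare_digest rejects non-ASCII str inputs.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Inside a FastAPI dependency this would replace the equality test, raising `HTTPException(401)` when it returns `False`.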

**🌍 CORS**

```python
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```
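
**Per-key rate limiting (sketch)**

The table lists rate limiting as a proxy or middleware concern. If it has to live in-process, a token bucket is one common shape. The sketch below is an illustration under assumptions (single process, in-memory state, hypothetical class name `TokenBucket`), not a LangServe feature.

```python
import time


class TokenBucket:
    """Allow `capacity` burst requests per key, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._state = {}  # key -> (tokens_left, last_seen_timestamp)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(key, (float(self.capacity), now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[key] = (tokens - 1.0, now)
            return True
        self._state[key] = (tokens, now)
        return False
```

Key it by API key or client IP and answer HTTP 429 when `allow(...)` returns `False`. With several workers, each process keeps its own buckets, so a shared store or the proxy is the safer place for hard limits.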

---

## 7) 👀 Observability (LangSmith & OTel)

| 📌 Area           | 🛠️ How                                   | 💡 Tip                          |
| ----------------- | ----------------------------------------- | ------------------------------- |
| LangSmith tracing | `LANGSMITH_TRACING=true` + `LANGSMITH_API_KEY` (older releases: `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`) + tags/metadata | tag `env`, `release`, `feature` |
| Logs & metrics    | OpenTelemetry for FastAPI/uvicorn         | Merge with infra dashboards     |
| Per-route KPIs    | Tag by route/node; watch p50/p95 & tokens | Drive SLOs by path ✅            |
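
**p50/p95 per route (sketch)**

To make "watch p50/p95" concrete: a stdlib-only sketch that computes nearest-rank percentiles from recorded latencies. It assumes you already collect per-route latency samples somewhere; `percentile` and `route_kpis` are hypothetical helper names.

```python
def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n avoids float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def route_kpis(latencies_by_route: dict) -> dict:
    """Map each route to its p50 and p95 latency."""
    return {
        route: {"p50": percentile(values, 50), "p95": percentile(values, 95)}
        for route, values in latencies_by_route.items()
    }
```

In production a metrics backend usually does this for you; the point is only that SLOs per path need samples keyed by route.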

---

## 8) 🧪 Testing

| 🧪 Type         | 🔧 How                                    | 🗒️ Notes                      |
| --------------- | ----------------------------------------- | ------------------------------ |
| OpenAPI & SDK   | Validate `/openapi.json`; generate TS SDK | Contract tests at boundary     |
| Unit & contract | `pytest` + `TestClient(app)`              | Golden outputs & schema checks |
| Load tests      | k6/Locust/Gatling vs `/batch` & `/stream` | Watch p95, CPU, memory         |

**Contract test**

```python
from fastapi.testclient import TestClient
from app import app

def test_invoke_contract():
    c = TestClient(app)
    r = c.post("/chat/invoke", json={"input":{"question":"ping"}})
    assert r.status_code == 200
    body = r.json()
    assert "output" in body and "answer" in body["output"]
```

---

## 9) 🚀 Performance & Scaling

| ⚙️ Lever         | 🧭 What to do                                  | 📝 Notes                    |
| ---------------- | ---------------------------------------------- | --------------------------- |
| Async everywhere | Non-blocking I/O in nodes                      | Offload blocking ops        |
| Concurrency      | Uvicorn/Gunicorn workers, thread/process pools | Size by CPU & model latency |
| Backpressure     | Queue & drop policies                          | Protect upstream LLMs       |
| Timeouts         | Client & server per-route                      | Avoid resource leaks        |
| Load balancing   | NGINX/LB, sticky if in-mem state               | Health probes on a cheap route (`/docs` works while docs are enabled) ❤️ |
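
**Offload + bound + timeout (sketch)**

The first four levers combine naturally: run blocking work in a thread, cap how many calls are in flight, and give each one a deadline. A minimal sketch assuming Python 3.9+ (`asyncio.to_thread`); `blocking_lookup` is a hypothetical stand-in for any synchronous call.

```python
import asyncio
import time


def blocking_lookup(q: str) -> str:
    """Stand-in for a synchronous call (DB driver, legacy SDK, file I/O)."""
    time.sleep(0.01)
    return q.upper()


async def run_all(questions, max_in_flight: int = 2, timeout_s: float = 1.0):
    # Created inside the coroutine so it binds to the running event loop.
    sem = asyncio.Semaphore(max_in_flight)

    async def one(q: str) -> str:
        async with sem:  # backpressure: at most max_in_flight concurrent calls
            # The timeout stops the wait; the worker thread itself is not killed.
            return await asyncio.wait_for(asyncio.to_thread(blocking_lookup, q), timeout_s)

    return await asyncio.gather(*(one(q) for q in questions))
```

Call it with `asyncio.run(run_all(["a", "b", "c"]))`; results come back in input order.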

---

## 10) 🧠 Caching & State

| 🧩 Topic             | 🔧 How                               | 🗒️ Notes                |
| -------------------- | ------------------------------------ | ------------------------ |
| LLM cache            | LangChain Cache (in-mem/Redis)       | Cache by prompt+params   |
| Tool/retriever cache | Memoize deterministic calls          | TTL & invalidation hooks |
| Session memory       | Pass `session_id` via `configurable` | Per-user chat state      |
| Idempotency          | Request IDs for dedup                | Record in traces         |

**Session config (client)**

```python
config = {"configurable": {"session_id": "user-123"}}
RemoteRunnable("http://.../chat/").invoke({"question": "..."}, config=config)
```
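
**Cache key + TTL (sketch)**

"Cache by prompt+params" and "TTL & invalidation" from the table can be sketched without any library. This is an in-memory illustration, not LangChain's cache implementation; `cache_key` and `TTLCache` are hypothetical names.

```python
import hashlib
import json
import time


def cache_key(prompt: str, params: dict) -> str:
    """Stable key: the same prompt and params hash identically regardless of dict order."""
    payload = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key: str):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # lazy invalidation on read
            return None
        return value

    def set(self, key: str, value) -> None:
        self._items[key] = (self.clock() + self.ttl_s, value)
```

Include every parameter that changes the output (model name, temperature, system prompt) in `params`, otherwise unrelated requests share an entry.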

---

## 11) 🛡️ Error Handling & Reliability

| 🚨 Concern           | 🧭 Pattern                                                | 💡 Tip                   |
| -------------------- | --------------------------------------------------------- | ------------------------ |
| Exceptions → 4xx/5xx | Raise `HTTPException`; map known errors                   | Clear error contracts    |
| Retries & backoff    | Client retries idempotent calls; server wraps fragile I/O | Tag retry reason         |
| Validation errors    | Pydantic → HTTP 422                                       | Keep schemas strict      |
| Circuit breakers     | Fail fast on dependency errors                            | Consider `Retry-After` ⏳ |

---

## 12) 🌐 Deployment

| ☁️ Target            | 🔧 How                                      | 🗒️ Notes           |
| -------------------- | ------------------------------------------- | ------------------- |
| Containers           | Multi-stage Dockerfile, slim base, non-root | Healthcheck a cheap route (`/docs` works while docs are enabled) |
| Cloud run/serverless | Container; set concurrency/timeouts         | Mind cold starts ❄️ |
| Reverse proxy & TLS  | NGINX/Traefik + SSL termination             | gzip/br compression |
| Env config           | Pydantic Settings + `.env`/vault            | Immutable on boot   |
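
**Immutable settings from env (sketch)**

The table recommends Pydantic Settings; the sketch below swaps in a frozen stdlib dataclass purely to illustrate "read once at boot, immutable afterwards". The variable names `API_KEY`, `REQUEST_TIMEOUT_S`, and `ALLOWED_ORIGINS` are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    api_key: str
    request_timeout_s: float = 30.0
    allowed_origins: Tuple[str, ...] = ()


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment; a missing API_KEY fails fast with KeyError."""
    env = os.environ if env is None else env
    origins = tuple(o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip())
    return Settings(
        api_key=env["API_KEY"],
        request_timeout_s=float(env.get("REQUEST_TIMEOUT_S", "30")),
        allowed_origins=origins,
    )
```

Call `load_settings()` once at startup and pass the object around; assigning to a field afterwards raises an error instead of silently changing config.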

---

## 13) 🧩 Advanced

| 🪄 Topic               | 🔧 How                                     | 🗒️ Notes                |
| ---------------------- | ------------------------------------------ | ------------------------ |
| Multimodal I/O         | Pydantic `bytes`/base64/URL fields         | Validate size & MIME     |
| Serve LangGraph agents | Wrap `graph_app` via `add_routes`          | Expose `invoke/stream`   |
| Tools & files          | `multipart/form-data` → parse to tool args | Handle temp files safely |

**Multipart upload (concept)**

```python
from fastapi import File, UploadFile

@app.post("/ingest")
async def ingest(file: UploadFile = File(...)):
    data = await file.read()
    return {"ok": True, "bytes": len(data)}
```
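
**Validate size & MIME (sketch)**

For base64 fields, the "validate size & MIME" note can be sketched with the stdlib. The size limit, the accepted types, and the helper name `decode_upload` are assumptions; the MIME check sniffs leading magic bytes rather than trusting a client-supplied header.

```python
import base64
import binascii

MAX_BYTES = 5 * 1024 * 1024  # assumed limit
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
}


def decode_upload(b64: str, max_bytes: int = MAX_BYTES):
    """Return (data, mime) or raise ValueError for oversized, malformed, or unknown input."""
    # Base64 grows data by 4/3, so oversized payloads can be rejected before decoding.
    if len(b64) > (max_bytes + 2) // 3 * 4:
        raise ValueError("payload too large")
    try:
        data = base64.b64decode(b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64") from exc
    if len(data) > max_bytes:
        raise ValueError("payload too large")
    for magic, mime in MAGIC.items():
        if data.startswith(magic):
            return data, mime
    raise ValueError("unsupported file type")
```

The same checks apply to `UploadFile` content: read at most the limit, then sniff the first bytes.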

---

## 14) 🖥️ Frontend Integration

| 🎯 Need        | 🧭 Pattern           | 💡 Tip                           |
| -------------- | -------------------- | -------------------------------- |
| Minimal web UI | `fetch` `/invoke`    | Debounce inputs                  |
| Live tokens    | `/stream` (SSE)      | `EventSource` / `ReadableStream` |
| CORS           | Allow your FE origin | Lock down in prod 🔒             |

**Browser fetch**

```js
const r = await fetch("/chat/invoke", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-api-key": "supersecret" },
  body: JSON.stringify({ input: { question: "Hello" } }),
});
const { output } = await r.json();
```

---

## 15) ⚡ “Answer-in-a-Minute” Boilerplates

**A) Minimal app**

```python
from fastapi import FastAPI
from langserve import add_routes
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

app = FastAPI()
chain = PromptTemplate.from_template("Q:{q}\nA:") | ChatOpenAI(temperature=0)
add_routes(app, chain, path="/qa")
```

**B) Protected streaming route**

```python
from fastapi import Depends, Header, HTTPException

def require_key(x_api_key: str | None = Header(default=None)):
    if x_api_key != "k":
        raise HTTPException(status_code=401, detail="bad key")

add_routes(app, chain, path="/qa", dependencies=[Depends(require_key)])
# POST /qa/stream with {"input":{"q":"..."}}
```

**C) Remote client (Python)**

```python
from langserve import RemoteRunnable
rr = RemoteRunnable("https://api.example.com/qa/")
ans = rr.invoke({"q": "What is LangServe?"})
```

**D) OpenAPI → SDK (concept)**

* Download `/openapi.json` → generate TypeScript SDK → call `invoke/batch/stream`.

---

## 16) ✅ Checklists

**Production Readiness**

* 🔒 Auth (key) • 🌐 CORS allowlist • ⏱️ Timeouts • 🧯 Retries/backoff
* 📈 Tracing/metrics on • 💾 Cache strategy • 🧵 Concurrency sized
* 🚦 Health probes • 🧪 Contract tests • 🔁 Rollback plan

**Performance**

* ⚡ Async everywhere • 🧰 Worker counts tuned • 🧱 Backpressure at ingress
* 🧮 Batch where possible • 🧊 Cache hot prompts • 📊 p95 & cost dashboards

**Safety**

* 🧰 Strict schemas • 📜 Error contracts • 🕵️ PII redaction • 🔐 Secret hygiene

---

## 17) 🚫 Common Pitfalls → ✅ Fix

| ⚠️ Pitfall                    | ✅ Fix                                  |
| ----------------------------- | -------------------------------------- |
| Free-form JSON inputs         | Always use Pydantic schemas            |
| Blocking calls in async nodes | Offload to thread/process pool         |
| No streaming for chat         | Use `/stream` SSE → better TTFB/UX     |
| CORS failures in browser      | Configure `allow_origins` correctly    |
| Secret leakage in logs        | Redact + disable debug logs in prod    |
| Unbounded concurrency         | Set worker limits + queue/backpressure |
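
**Log redaction (sketch)**

One way to approach the "secret leakage in logs" row is a logging filter that masks key-like values before records are written. The pattern below is an assumption about what secrets look like in your logs and will need tuning; `redact` and `RedactingFilter` are hypothetical names.

```python
import logging
import re

# Matches e.g. "x-api-key: abc", "api_key=abc", "Authorization: Bearer abc".
SECRET_PATTERN = re.compile(
    r"(x-api-key|authorization|api[_-]?key)(\s*[:=]\s*)((?:Bearer\s+)?\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value part of key-like pairs with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-style args are masked too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, just cleaned
```

Attach it to a handler with `handler.addFilter(RedactingFilter())`, since handler-level filters also cover records propagated from child loggers.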

---

## 18) 🎤 Quick Talking Points

* *“Any Runnable becomes `/invoke`, `/batch`, `/stream` with strict Pydantic I/O.”*
* *“**RemoteRunnable** mirrors local APIs—switch local ↔ remote with a URL.”*
* *“Auth + schemas + rate limits at the edge; SSE streaming for buttery UX.”*
* *“LangSmith/OTel tracing + caching + contract tests = safe, scalable prod.”*

---
