## FastAPI Integration with LangChain

**FastAPI** is a high-performance Python web framework for building HTTP APIs, and a common choice for exposing **LLM pipelines as production APIs**.
When combined with **LangChain**, it becomes the interface layer that:

* Receives user requests
* Runs LangChain chains / agents / RAG pipelines
* Returns structured responses

**Architecture Role**

```
Client → FastAPI → LangChain Pipeline → LLM / Vector DB / Tools → FastAPI → Client
```
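The flow above can be sketched without any framework. In this minimal standard-library sketch, `handle_ask` plays the API role (decode JSON, validate, encode the reply) and `fake_pipeline` is a hypothetical stand-in for a LangChain chain. Neither name comes from FastAPI or LangChain.

```python
import json
from typing import Callable


def fake_pipeline(question: str) -> str:
    """Hypothetical stand-in for a LangChain chain: question in, answer out."""
    return f"Echo: {question}"


def handle_ask(body: str, pipeline: Callable[[str], str]) -> tuple[int, str]:
    """Mimic the API layer: decode the request, run the pipeline, encode the reply."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"detail": "invalid JSON"})
    question = payload.get("question") if isinstance(payload, dict) else None
    if not isinstance(question, str):
        # FastAPI answers 422 when a body fails validation against the request model
        return 422, json.dumps({"detail": "field 'question' must be a string"})
    answer = pipeline(question)  # the LangChain step in the real service
    return 200, json.dumps({"answer": answer})
```

For example, `handle_ask('{"question": "hi"}', fake_pipeline)` returns `(200, '{"answer": "Echo: hi"}')`. In the real service, FastAPI and Pydantic do the decoding and validation, and the pipeline is a LangChain chain.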

---

### Basic Project Structure

```
app/
 ├─ main.py
 ├─ chains.py
 ├─ models.py
 └─ services.py
```

---

### Create a LangChain Pipeline

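#### Build Chain (chains.py)

`LLMChain` (and `RetrievalQA`, used later) are LangChain's legacy chain classes. Recent LangChain releases deprecate them in favor of LCEL composition such as `prompt | llm`, so pin the LangChain version this code is written against.
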
```python
from langchain_openai import ChatOpenAI  # pip install langchain-openai
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

# Reads the API key from the OPENAI_API_KEY environment variable
llm = ChatOpenAI(model="gpt-4")

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer clearly: {question}"
)

qa_chain = LLMChain(llm=llm, prompt=prompt)
```
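`LLMChain.invoke` formats the prompt, calls the model, and returns a dict holding the input variables plus the generated text under the `"text"` key, which is why the service layer reads `result["text"]`. The sketch below mimics that contract with a hypothetical `FakeLLMChain` and a stub model, so the data flow can be checked without an API key. It is an illustration, not LangChain's implementation.

```python
from typing import Callable


class FakeLLMChain:
    """Hypothetical stand-in mimicking LLMChain's invoke() input/output shape."""

    def __init__(self, llm: Callable[[str], str], template: str, input_variables: list[str]):
        self.llm = llm
        self.template = template
        self.input_variables = input_variables

    def invoke(self, inputs: dict) -> dict:
        missing = [v for v in self.input_variables if v not in inputs]
        if missing:
            raise KeyError(f"missing input variables: {missing}")
        # PromptTemplate's default "f-string" format fills {placeholders} like str.format
        prompt = self.template.format(**{v: inputs[v] for v in self.input_variables})
        # Inputs are echoed back alongside the generated text
        return {**inputs, "text": self.llm(prompt)}


def stub_llm(prompt: str) -> str:
    # Stub model: reports the prompt it received instead of calling an API
    return f"[model saw] {prompt}"


chain = FakeLLMChain(stub_llm, "Answer clearly: {question}", ["question"])
```

Here `chain.invoke({"question": "What is FAISS?"})["text"]` is `"[model saw] Answer clearly: What is FAISS?"`, showing that the template is filled before the model sees it.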

---

### Define Request / Response Models

#### Data Models (models.py)

```python
from pydantic import BaseModel

class QueryRequest(BaseModel):
    question: str

class QueryResponse(BaseModel):
    answer: str
```

---

### Build FastAPI Service Layer

#### Service Wrapper (services.py)

```python
from app.chains import qa_chain

def run_chain(question: str) -> str:
    # LLMChain returns a dict with the input variables plus the output under "text"
    result = qa_chain.invoke({"question": question})
    return result["text"]
```
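Keeping `run_chain` in its own module makes it easy to test without calling OpenAI. One option, sketched here, assumes a hypothetical change to the signature: inject any object that has an `invoke` method, and pass a stub in tests. `SupportsInvoke` and `StubChain` are illustrative names.

```python
from typing import Any, Protocol


class SupportsInvoke(Protocol):
    """Anything with an invoke(dict) -> dict method, such as an LLMChain."""

    def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]: ...


def run_chain(question: str, chain: SupportsInvoke) -> str:
    # Same logic as services.py, but the chain is passed in rather than imported
    result = chain.invoke({"question": question})
    return result["text"]


class StubChain:
    """Test double that records calls and returns a canned answer."""

    def __init__(self, answer: str):
        self.answer = answer
        self.calls: list[dict[str, Any]] = []

    def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
        self.calls.append(inputs)
        return {**inputs, "text": self.answer}
```

In services.py the parameter could default to the real chain (`chain: SupportsInvoke = qa_chain`), so main.py keeps calling `run_chain(request.question)`. FastAPI's `Depends` with `app.dependency_overrides` is an alternative way to swap in a stub at the endpoint level.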

---

### Create FastAPI Application

#### API Server (main.py)

```python
from fastapi import FastAPI
from app.models import QueryRequest, QueryResponse
from app.services import run_chain

app = FastAPI()

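@app.post("/ask", response_model=QueryResponse)
def ask_question(request: QueryRequest):
    # A plain `def` endpoint runs in FastAPI's thread pool, so the blocking
    # LLM call does not stall the event loop for other requests
    answer = run_chain(request.question)
    return {"answer": answer}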
```

---

### Run the Server

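Run from the directory that contains `app/`, with `OPENAI_API_KEY` set in the environment. The `--reload` flag restarts the server on code changes and is meant for development only.

```bash
uvicorn app.main:app --reload
```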

---

### Call the API

```bash
curl -X POST http://127.0.0.1:8000/ask \
     -H "Content-Type: application/json" \
     -d '{"question": "Explain black holes"}'
```

**Response**

```json
{
  "answer": "A black hole is..."
}
```
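Python clients can reproduce the curl call with the standard library alone. This sketch assumes the server from the previous step is listening on the default `127.0.0.1:8000`; `build_ask_request` and `ask` are illustrative helper names.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/ask"  # matches the uvicorn default used above


def build_ask_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = API_URL, timeout: float = 60.0) -> str:
    """Send the request and return the "answer" field (requires the server running)."""
    with urllib.request.urlopen(build_ask_request(question, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]
```

With the server running, `ask("Explain black holes")` returns the answer string. An LLM call can take several seconds, hence the generous timeout.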

---

### Adding RAG (Retrieval-Augmented Generation)

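#### RAG Chain Example

This assumes a FAISS index was previously built and saved to a local `db` folder with `save_local("db")`. Place the snippet in chains.py so that `llm` is in scope.

```python
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Use the same embedding model that built the index. The index metadata is
# pickled, so only enable deserialization for files you created yourself.
vectorstore = FAISS.load_local(
    "db", OpenAIEmbeddings(), allow_dangerous_deserialization=True
)
retriever = vectorstore.as_retriever()

rag_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
```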

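#### Use in FastAPI

Only the service layer changes: point `run_chain` in services.py at the RAG chain, and main.py stays as it is.

```python
from app.chains import rag_chain

def run_chain(question: str) -> str:
    # RetrievalQA takes its input under "query" and returns the answer under "result"
    return rag_chain.invoke({"query": question})["result"]
```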
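Conceptually, the `"stuff"` chain type retrieves the top-k documents and stuffs them into a single prompt together with the question. The sketch below reproduces that flow with a toy word-overlap retriever standing in for FAISS vector search. This is a deliberate simplification (real retrieval compares embeddings, not shared words); `retrieve` and `build_stuff_prompt` are hypothetical names, and the prompt wording is not LangChain's.

```python
import re

DOCS = [
    "A black hole is a region of spacetime where gravity prevents light from escaping.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "FastAPI is a Python web framework for building APIs.",
]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many query words they share."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    # Keep the top k, dropping documents with no overlap at all
    return [d for d in ranked[:k] if q & tokenize(d)]


def build_stuff_prompt(query: str, context_docs: list[str]) -> str:
    """'Stuff' strategy: concatenate all retrieved docs into one prompt."""
    context = "\n\n".join(context_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```

`retrieve("What is a black hole?", DOCS, k=1)` returns only the black-hole document, which `build_stuff_prompt` then places above the question. The "stuff" approach is simple but limited by the model's context window, which is why LangChain also offers other chain types.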

---

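### Async Support (High Throughput)

FastAPI runs plain `def` endpoints in a thread pool. An `async def` endpoint that awaits `ainvoke` instead waits on the LLM without occupying a thread, which helps when many slow requests are in flight at once. Avoid calling the blocking `invoke` inside an `async def` endpoint, since it would block the event loop. To switch, replace the sync handler in main.py:

```python
from app.chains import qa_chain  # add to main.py's imports

@app.post("/ask", response_model=QueryResponse)
async def ask_question(request: QueryRequest):
    # ainvoke is the async counterpart of invoke and returns the same dict
    result = await qa_chain.ainvoke({"question": request.question})
    return {"answer": result["text"]}
```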
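To see why awaiting matters, this sketch simulates several slow LLM calls with `asyncio.sleep` (a stand-in for `ainvoke` waiting on the network) and records how many are in flight at once. `fake_ainvoke`, `InFlightTracker`, and `serve_batch` are hypothetical; the point is that one event loop overlaps the waits instead of serializing them.

```python
import asyncio


class InFlightTracker:
    """Counts concurrent calls and remembers the peak."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0


async def fake_ainvoke(question: str, tracker: InFlightTracker, delay: float = 0.05) -> dict:
    tracker.current += 1
    tracker.peak = max(tracker.peak, tracker.current)
    try:
        await asyncio.sleep(delay)  # simulated network wait for the LLM
        return {"question": question, "text": f"answer to {question}"}
    finally:
        tracker.current -= 1


async def serve_batch(questions: list[str]) -> tuple[list[str], int]:
    tracker = InFlightTracker()
    # gather runs the calls concurrently and returns results in input order
    results = await asyncio.gather(*(fake_ainvoke(q, tracker) for q in questions))
    return [r["text"] for r in results], tracker.peak
```

`asyncio.run(serve_batch(["q1", "q2", "q3"]))` reports a peak of 3: all calls wait at the same time. A blocking call inside the coroutine would hold the event loop and keep the peak at 1.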

---

### Production Enhancements

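| Feature           | Purpose                                                     |
| ----------------- | ----------------------------------------------------------- |
| Rate limiting     | Protect the API and the LLM quota from abuse                |
| Caching           | Cut LLM cost and latency for repeated questions             |
| Input validation  | Reject malformed or oversized input; reduce injection risk  |
| Output validation | Check model output against the expected schema or policy   |
| Tracing           | Debug and monitor each chain step                           |
| Streaming         | Send tokens to the client as they are generated             |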

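Two of these enhancements are small enough to sketch with the standard library: an in-memory TTL cache keyed on the normalized question, and a token-bucket rate limiter. Both class names are hypothetical, both accept an injectable clock so they can be tested deterministically, and a multi-worker deployment would typically keep this state in a shared store such as Redis instead of process memory.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory answer cache; entries expire after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different questions share an entry
        return " ".join(question.lower().split())

    def get(self, question: str) -> Optional[str]:
        key = self._key(question)
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired: evict and report a miss
            return None
        return answer

    def set(self, question: str, answer: str) -> None:
        self._data[self._key(question)] = (self.clock(), answer)


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a FastAPI handler, these checks would typically run before `run_chain`: return HTTP 429 when `allow()` is false, and return the cached answer on a cache hit.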
---

### Mental Model

```
FastAPI = Network Interface
LangChain = Intelligence Engine
```

---

### Key Takeaways

* FastAPI exposes LangChain pipelines as HTTP web services
* Clear separation of concerns: API (`main.py`) → Service (`services.py`) → Chain (`chains.py`)
* The same structure handles sync, async, and RAG pipelines, and extends to streaming
* A widely used architecture for LLM applications
