# Middleware

[Middleware](https://docs.langchain.com/oss/python/langchain/middleware/overview) is the standout feature of this update. Many new capabilities are implemented through it, including Human-in-the-loop (HITL), dynamic system prompts, and dynamic context injection. A middleware is a hook: a function that runs at a fixed point in the agent's workflow. Embedding middleware in a workflow lets you extend and customize it without changing the agent itself.

In LangChain, you create **custom middleware** with [decorators](https://reference.langchain.com/python/langchain/middleware/#decorators).

```{dropdown} Decorator Types (Click to Expand)

  | DECORATOR | DESCRIPTION |
  | -- | -- |
  | `@before_agent` | Execute logic before Agent execution |
  | `@after_agent` | Execute logic after Agent execution |
  | `@before_model` | Execute logic before each model call |
  | `@after_model` | Execute logic after each model response is received |
  | `@wrap_model_call` | Control the model's calling process |
  | `@wrap_tool_call` | Control the tool's calling process |
  | `@dynamic_prompt` | Dynamically generate system prompts |
  | `@hook_config` | Configure hook behavior |

```

The **decorator type** determines where the middleware runs. For example, `@before_model` runs the decorated logic before each model call. The **decorated function** implements that logic. If this sounds abstract, the four examples in this section should make it concrete:

- Budget Control
- Message Truncation
- Sensitive Word Filtering
- PII Detection (Personally Identifiable Information Detection)
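
Before the real examples, the idea of "execution position" can be illustrated with a toy hook runner in plain Python. This is only a sketch of the general pattern, not LangChain's implementation: `fake_model`, `run_with_hooks`, and the three hook functions are hypothetical names invented for this illustration.

```python
from typing import Callable

ModelFn = Callable[[str], str]
WrapHook = Callable[[str, ModelFn], str]


def fake_model(prompt: str) -> str:
    """Toy stand-in for a model call: upper-cases the prompt."""
    return prompt.upper()


def _bind(hook: WrapHook, inner: ModelFn) -> ModelFn:
    """Turn a wrap hook plus the next handler into a plain handler."""
    return lambda prompt: hook(prompt, inner)


def run_with_hooks(
    prompt: str,
    before: list[ModelFn],
    wrap: list[WrapHook],
    after: list[ModelFn],
) -> str:
    # "before" hooks run first and may rewrite the input.
    for hook in before:
        prompt = hook(prompt)

    # "wrap" hooks surround the call; the first one listed ends up outermost.
    handler: ModelFn = fake_model
    for hook in reversed(wrap):
        handler = _bind(hook, handler)
    response = handler(prompt)

    # "after" hooks run last and may rewrite the output.
    for hook in after:
        response = hook(response)
    return response


def strip_whitespace(prompt: str) -> str:
    return prompt.strip()


def bracket_response(prompt: str, handler: ModelFn) -> str:
    # A wrap hook decides whether and how to call the handler.
    return f"[{handler(prompt)}]"


def add_signature(response: str) -> str:
    return response + " (bot)"


result = run_with_hooks(
    "  hello  ", [strip_whitespace], [bracket_response], [add_signature]
)
print(result)
```

The wrap-style hook is the only one that receives the handler, which is why it can change how the call itself is made. The first example below relies on exactly that property.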

## 1. Budget Control

As the number of conversation rounds increases, the conversation history becomes longer, leading to higher request costs. To control the budget, you can set up automatic switching to a lower-cost model when the conversation rounds exceed a certain threshold. Below we implement this feature using custom middleware.

In [1]:
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from langchain.messages import HumanMessage
from langgraph.graph import MessagesState

# Load model configuration
_ = load_dotenv()

# Low-cost model
basic_model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url=os.getenv("DASHSCOPE_BASE_URL"),
    model="qwen3-coder-plus",
)

# High-cost model
advanced_model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url=os.getenv("DASHSCOPE_BASE_URL"),
    model="qwen3-max",
)

Because we need to change which model handles the request, `@before_model` and `@after_model` are not sufficient here. We use the [`@wrap_model_call`](https://reference.langchain.com/python/langchain/middleware/#langchain.agents.middleware.wrap_model_call) decorator, which wraps the model call itself. The logic lives in the function `dynamic_model_selection`: when the conversation history exceeds 5 messages, it automatically switches to the lower-cost model.

In [2]:
@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    """Choose model based on conversation complexity."""
    message_count = len(request.state["messages"])

    if message_count > 5:
        # Use a basic model for longer conversations
        model = basic_model
    else:
        model = advanced_model

    print(f"message_count: {message_count}")
    print(f"model_name: {model.model_name}")

    return handler(request.override(model=model))

agent = create_agent(
    model=advanced_model,  # Default model
    middleware=[dynamic_model_selection]
)

The run below shows that once `message_count` exceeds 5, the agent switches from the high-cost model `qwen3-max` to the low-cost model `qwen3-coder-plus`. The budget control feature works as intended.

In [3]:
state: MessagesState = {"messages": []}
items = ['car', 'airplane', 'motorcycle', 'bicycle']
for idx, i in enumerate(items):
    print(f"\n=== Round {idx+1} ===")
    state["messages"] += [HumanMessage(content=f"{i}: How many wheels does it have? Answer briefly.")]
    result = agent.invoke(state)
    state["messages"] = result["messages"]
    print(f'content: {result["messages"][-1].content}')


=== Round 1 ===
message_count: 1
model_name: qwen3-max
content: A car typically has 4 wheels.

=== Round 2 ===
message_count: 3
model_name: qwen3-max
content: Most airplanes have 3 wheels (two main wheels and one nose wheel), but larger aircraft can have more.

=== Round 3 ===
message_count: 5
model_name: qwen3-max
content: A motorcycle has 2 wheels.

=== Round 4 ===
message_count: 7
model_name: qwen3-coder-plus
content: A bicycle has 2 wheels.
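
The message counts 1, 3, 5, 7 in the output follow from each round adding one human message before the model call and one AI reply after it. The sketch below reproduces that arithmetic without calling any model. It assumes exactly one model call per round (no tool calls); `pick_model` and `simulate` are hypothetical helpers that mirror the threshold rule in `dynamic_model_selection` and return labels rather than model objects.

```python
def pick_model(message_count: int, threshold: int = 5) -> str:
    """Mirror the threshold rule, returning a label instead of a model."""
    return "basic" if message_count > threshold else "advanced"


def simulate(rounds: int) -> list[tuple[int, int, str]]:
    """Return (round, message_count seen by the middleware, chosen model)."""
    history = 0  # number of messages stored in the state
    rows = []
    for round_no in range(1, rounds + 1):
        history += 1  # the new HumanMessage
        rows.append((round_no, history, pick_model(history)))
        history += 1  # the AI reply appended after the model call
    return rows


rows = simulate(4)
for row in rows:
    print(row)
```

Under these assumptions, round `k` sees `2k - 1` messages, so a threshold of 5 triggers the switch in round 4.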


## 2. Message Truncation

LLMs have context length limits. Once exceeded, the context needs to be compressed. Among the many processing solutions, message truncation is the simplest. Below we implement message truncation functionality through the `@before_model` decorator.

In [4]:
from langchain.messages import RemoveMessage
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.checkpoint.memory import InMemorySaver
from langchain.agents import create_agent, AgentState
from langchain.agents.middleware import before_model
from langgraph.runtime import Runtime
from langchain_core.runnables import RunnableConfig
from typing import Any

We try the following truncation strategy: keep the most recent messages, plus the first message. In the example below, because we told the agent "hi, my name is bob" in the first message, it still remembers the name.

In [5]:
@before_model
def trim_messages(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Keep the first message plus the last few messages to fit the context window."""
    messages = state["messages"]

    if len(messages) <= 3:
        return None  # No changes needed

    first_msg = messages[0]
    recent_messages = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    new_messages = [first_msg] + recent_messages

    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),
            *new_messages
        ]
    }

agent = create_agent(
    basic_model,
    middleware=[trim_messages],
    checkpointer=InMemorySaver(),
)

config: RunnableConfig = {"configurable": {"thread_id": "1"}}

def agent_invoke(agent):
    agent.invoke({"messages": "hi, my name is bob"}, config)
    agent.invoke({"messages": "write a short poem about cats"}, config)
    agent.invoke({"messages": "now do the same but for dogs"}, config)
    final_response = agent.invoke({"messages": "what's my name?"}, config)
    
    final_response["messages"][-1].pretty_print()

agent_invoke(agent)


Your name is Bob! You told me "hi, my name is bob" at the beginning of our conversation.


This result alone does not prove that the truncation middleware works: if the middleware never took effect, the agent would answer the same way. To verify it, we change the truncation strategy so that only the last two messages are kept. If the agent no longer remembers the name, the truncation middleware is indeed taking effect.

In [6]:
@before_model
def trim_without_first_message(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Keep only the last two messages, dropping the first message as well."""
    messages = state["messages"]

    return {
        "messages": [
            RemoveMessage(id=REMOVE_ALL_MESSAGES),
            *messages[-2:]
        ]
    }

agent = create_agent(
    basic_model,
    middleware=[trim_without_first_message],
    checkpointer=InMemorySaver(),
)

agent_invoke(agent)


I don't have access to your personal information, so I don't know your name. This is because privacy is important, and I'm designed not to store or access any personal data about users.

If you'd like to tell me your name, feel free to share it! But please remember that if you're using a shared device or account, you might want to be cautious about sharing personal information. 

Is there something specific I can help you with today?


Now the agent doesn't remember who I am, which means the middleware is indeed working!
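
To see why the two strategies behave differently, the slicing rules can be replayed on plain strings. This is a sketch under stated assumptions: one model call per invocation, and the trimmed list fully replaces the stored history. `keep_first_and_recent`, `keep_last_two`, and `simulate` are hypothetical helpers that copy only the slicing logic of the two middleware functions above.

```python
from typing import Callable

Trim = Callable[[list[str]], list[str]]


def keep_first_and_recent(messages: list[str]) -> list[str]:
    """Same slicing as trim_messages, applied to plain strings."""
    if len(messages) <= 3:
        return messages
    recent = messages[-3:] if len(messages) % 2 == 0 else messages[-4:]
    return [messages[0]] + recent


def keep_last_two(messages: list[str]) -> list[str]:
    """Same slicing as trim_without_first_message."""
    return messages[-2:]


def simulate(trim: Trim, turns: int) -> list[str]:
    """Return the messages the model would see on the final turn."""
    state: list[str] = []
    seen: list[str] = []
    for i in range(1, turns + 1):
        state = trim(state + [f"human{i}"])  # middleware runs before the model
        seen = list(state)                   # what the model receives
        state = state + [f"ai{i}"]           # the model's reply is appended
    return seen


print(simulate(keep_first_and_recent, 4))
print(simulate(keep_last_two, 4))
```

With the first strategy, `human1` (the message containing the name) is still present on the fourth turn. With the second, the model sees only `ai3` and `human4`, which is consistent with the agent no longer knowing the name.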

## 3. Sensitive Word Filtering

**Guardrails** is a general term for the content-safety capabilities an agent provides. Large models have some built-in content risk controls, but these are easily bypassed (search for "jailbreaking LLM" to find examples). Agents can add a layer of protection outside the model by enforcing mandatory checks in code.

In LangChain, guardrails can be easily implemented through middleware. Below we implement a simple guardrail: if the user's latest message contains certain sensitive words, the agent refuses to answer.

In [7]:
from typing import Any

from langchain.agents.middleware import before_agent, AgentState
from langgraph.runtime import Runtime

banned_keywords = ["hack", "exploit", "malware"]

@before_agent(can_jump_to=["end"])
def content_filter(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Deterministic guardrail: Block requests containing banned keywords."""
    # Nothing to check if there are no messages yet
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if last_message.type != "human":
        return None

    content = last_message.content.lower()

    # Check for banned keywords
    for keyword in banned_keywords:
        if keyword in content:
            # Block execution before any processing
            return {
                "messages": [{
                    "role": "assistant",
                    "content": "I cannot process requests containing inappropriate content. Please rephrase your request."
                }],
                "jump_to": "end"
            }

    return None

agent = create_agent(
    model=basic_model,
    middleware=[content_filter],
)

# This request will be blocked before any processing
result = agent.invoke({
    "messages": [{"role": "user", "content": "How do I hack into a database?"}]
})

In [8]:
for message in result["messages"]:
    message.pretty_print()


How do I hack into a database?

I cannot process requests containing inappropriate content. Please rephrase your request.
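
One caveat of the `keyword in content` check is that it matches substrings, so a harmless word such as "hackathon" is also blocked. The sketch below compares it with a whole-word regular expression. It uses only the standard library; `substring_match` and `whole_word_match` are hypothetical helpers and not part of LangChain.

```python
import re

BANNED_KEYWORDS = ["hack", "exploit", "malware"]

# \b anchors the match to word boundaries; re.escape guards special characters.
_WORD_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BANNED_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def substring_match(text: str) -> list[str]:
    """Same rule as the middleware above: case-insensitive substring test."""
    lowered = text.lower()
    return [keyword for keyword in BANNED_KEYWORDS if keyword in lowered]


def whole_word_match(text: str) -> list[str]:
    """Stricter variant: only match banned keywords as whole words."""
    return [match.lower() for match in _WORD_PATTERN.findall(text)]


print(substring_match("Any tips for our hackathon?"))
print(whole_word_match("Any tips for our hackathon?"))
print(whole_word_match("How do I HACK into a database?"))
```

The trade-off runs in both directions: whole-word matching avoids the "hackathon" false positive but no longer catches inflected forms such as "hacking", so the right choice depends on how strict the guardrail needs to be.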


## 4. PII Detection

Next, we write another guardrail. [PII](https://docs.langchain.com/oss/python/langchain/guardrails#pii-detection) (Personally Identifiable Information) detection finds personal information in user input, such as email addresses, IP addresses, postal addresses, and bank card numbers, and takes appropriate action.

The following example is a common real-world case: we often paste error messages into an LLM to help with debugging, but those messages may contain personal information. We handle this situation in two ways:

1. Refuse to answer the question
2. Mask the privacy information
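
For a few well-structured kinds of PII, both handling methods can also be sketched deterministically with regular expressions, without calling a model. The block below is a minimal illustration rather than a complete detector: the three patterns are simplified assumptions, `contains_pii` and `mask_pii` are hypothetical helpers, and the username `alice` is made up for the example.

```python
import re

# Simplified patterns for illustration; real-world PII needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOME_USER = re.compile(r"(?<=/home/)[^/\s]+")  # username in a Linux home path

PATTERNS = [EMAIL, IPV4, HOME_USER]


def contains_pii(text: str) -> bool:
    """Method 1: detect, so the caller can refuse to answer."""
    return any(pattern.search(text) for pattern in PATTERNS)


def mask_pii(text: str) -> str:
    """Method 2: replace each match with asterisks of the same length."""
    for pattern in PATTERNS:
        text = pattern.sub(lambda m: "*" * len(m.group()), text)
    return text


trace = 'File "/home/alice/proj/agent.py", line 53'
print(contains_pii(trace))
print(mask_pii(trace))
```

Patterns like these are fast and predictable but miss anything they were not written for, such as names in free text. That gap is one reason the examples in this section delegate detection to a model instead.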

In [9]:
from textwrap import dedent
from pydantic import BaseModel, Field

# Trusted model, usually a local model; for convenience, we still use qwen here
trusted_model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url=os.getenv("DASHSCOPE_BASE_URL"),
    model="qwen3-coder-plus",
)

# Used to format agent output; returns True if sensitive info is found, False otherwise
class PiiCheck(BaseModel):
    """Structured output indicating whether text contains PII."""
    is_pii: bool = Field(description="Whether the text contains PII")

def message_with_pii(pii_middleware):
    agent = create_agent(
        model=basic_model,
        middleware=[pii_middleware],
    )

    # Whether this request is blocked or masked depends on the middleware passed in
    result = agent.invoke({
        "messages": [{
            "role": "user",
            "content": dedent(
                """
                File "/home/luochang/proj/agent.py", line 53, in my_agent
                    agent = create_react_agent(
                ---
                Where is the error location?
                """).strip()
        }]
    })

    return result

**Handling Method 1**: If privacy information is detected, refuse to respond.

In [10]:
@before_agent(can_jump_to=["end"])
def content_blocker(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Model-based guardrail: block requests whose latest user message contains PII."""
    # Nothing to check if there are no messages yet
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if last_message.type != "human":
        return None

    content = last_message.content.lower()
    prompt = (
        "You are a privacy protection assistant. Please identify personally identifiable information (PII) in the following text, "
        "such as: name, ID number, passport number, phone number, email, address, bank card number, social media account, license plate, etc. "
        "Note that if code or file paths contain usernames, they should also be considered sensitive information. "
        "If sensitive information is found, return {\"is_pii\": true}, otherwise return {\"is_pii\": false}. "
        "Please strictly return in JSON format and only output the JSON. The text is:\n\n" + content
    )

    pii_agent = trusted_model.with_structured_output(PiiCheck)
    result = pii_agent.invoke(prompt)

    if result.is_pii is True:
        # Block execution before any processing
        return {
            "messages": [{
                "role": "assistant",
                "content": "I cannot process requests containing inappropriate content. Please rephrase your request."
            }],
            "jump_to": "end"
        }
    else:
        print("No PII found")

    return None

In [11]:
result = message_with_pii(pii_middleware=content_blocker)

for message in result["messages"]:
    message.pretty_print()


File "/home/luochang/proj/agent.py", line 53, in my_agent
    agent = create_react_agent(
---
Where is the error location?

I cannot process requests containing inappropriate content. Please rephrase your request.


**Handling Method 2**: If privacy information is detected, mask it with asterisks (`*****`).

In [12]:
@before_agent(can_jump_to=["end"])
def content_filter(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    """Model-based guardrail: mask PII found in the latest user message."""
    # Nothing to check if there are no messages yet
    if not state["messages"]:
        return None

    last_message = state["messages"][-1]
    if last_message.type != "human":
        return None

    content = last_message.content.lower()
    prompt = (
        "You are a privacy protection assistant. Please identify personally identifiable information (PII) in the following text, "
        "such as: name, ID number, passport number, phone number, email, address, bank card number, social media account, license plate, etc. "
        "Note that if code or file paths contain usernames, they should also be considered sensitive information. "
        "If sensitive information is found, return {\"is_pii\": true}, otherwise return {\"is_pii\": false}. "
        "Please strictly return in JSON format and only output the JSON. The text is:\n\n" + content
    )

    pii_agent = trusted_model.with_structured_output(PiiCheck)
    result = pii_agent.invoke(prompt)

    if result.is_pii is True:
        mask_prompt = (
            "You are a privacy protection assistant. Please replace all personally identifiable information (PII) in the following text with asterisks (*). "
            "Only replace sensitive fragments, keep other text unchanged. "
            "Only output the processed text, no explanations or additional content. The text is:\n\n" + last_message.content
        )
        masked_message = basic_model.invoke(mask_prompt)
        return {
            "messages": [{
                "role": "assistant",
                "content": masked_message.content
            }]
        }
    else:
        print("No PII found")

    return None

In [13]:
result = message_with_pii(pii_middleware=content_filter)

for message in result["messages"]:
    message.pretty_print()


File "/home/luochang/proj/agent.py", line 53, in my_agent
    agent = create_react_agent(
---
Where is the error location?

File "/home/******/proj/agent.py", line 53, in my_agent
    agent = create_react_agent(
---
Where is the error location?

The error is occurring at **line 53** in the file `/home/luochang/proj/agent.py`, specifically within the `my_agent` function where the `create_react_agent()` function is being called.

However, the traceback you've shown only shows the beginning of the error - it's incomplete. To see the exact error message and understand what's going wrong, you need to look at the rest of the traceback that comes after this line.

The complete error traceback would typically show:
- The specific error type (like `TypeError`, `ValueError`, `ImportError`, etc.)
- The detailed error message
- The full stack trace showing all function calls leading to the error

To get the complete error information, you should:

1. **Check the full console output** - the actual