### Middleware

Middleware gives you tighter control over what happens inside the agent. It is useful for the following:

* Tracking agent behaviour with logging, analytics, and debugging
* Transforming prompts, tool selection, and output formatting
* Adding retries, fallbacks, and early termination logic
* Applying rate limits
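To make the idea concrete, here is a minimal sketch of the wrapping pattern behind two of these uses (logging and retries). It is plain Python and does not use LangChain's middleware API: `Handler`, `Middleware`, `logging_middleware`, `retry_middleware`, `apply_middleware`, and `make_flaky_model` are hypothetical names made up for illustration, and the "model" is a stand-in function.

```python
import time
from typing import Callable

# Hypothetical types for illustration only; this is not LangChain's middleware API.
Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]


def logging_middleware(log: list[str]) -> Middleware:
    """Record every prompt and reply that passes through the wrapped handler."""
    def wrap(next_handler: Handler) -> Handler:
        def handler(prompt: str) -> str:
            log.append(f"request: {prompt}")
            reply = next_handler(prompt)
            log.append(f"response: {reply}")
            return reply
        return handler
    return wrap


def retry_middleware(max_attempts: int = 3, delay_s: float = 0.0) -> Middleware:
    """Re-run the wrapped handler when it raises RuntimeError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def wrap(next_handler: Handler) -> Handler:
        def handler(prompt: str) -> str:
            for attempt in range(1, max_attempts + 1):
                try:
                    return next_handler(prompt)
                except RuntimeError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay_s)
            raise RuntimeError("unreachable")
        return handler
    return wrap


def apply_middleware(handler: Handler, middleware: list[Middleware]) -> Handler:
    """Wrap handler so that the first middleware in the list is outermost."""
    for mw in reversed(middleware):
        handler = mw(handler)
    return handler


def make_flaky_model(failures: int) -> Handler:
    """Stand-in model that fails a fixed number of times, then upper-cases the prompt."""
    state = {"remaining": failures}

    def model(prompt: str) -> str:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("temporary failure")
        return prompt.upper()
    return model


call_log: list[str] = []
pipeline = apply_middleware(
    make_flaky_model(failures=1),
    [logging_middleware(call_log), retry_middleware(max_attempts=3)],
)
result = pipeline("what is 2+2?")
print(result)
print(call_log)
```

Because the logging wrapper sits outside the retry wrapper, the log records one request and one response even though the stand-in model was called twice.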

In [1]:
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
load_dotenv()

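# load_dotenv() already exported the key from .env; fail early if it is missing
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
model = init_chat_model("groq:qwen/qwen3-32b")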

### Summarization middleware

Summarization middleware automatically summarizes the conversation history as it approaches a token limit, preserving recent messages while compressing older context. It is useful for the following:

* Long-running conversations that exceed context windows
* Multi-turn dialogues with extensive history
* Applications where preserving full conversation context matters
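The trigger-and-keep idea can be sketched in plain Python. This is an illustration of the concept only, not LangChain's implementation: `summarize_history` and its `summarize` callback are hypothetical, messages are plain strings, and the sketch assumes the trigger fires once the history holds at least `trigger` messages.

```python
from typing import Callable


def summarize_history(
    messages: list[str],
    trigger: int,
    keep: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Compress older messages into one summary once the history is long enough.

    Assumption for this sketch: summarization fires when len(messages) >= trigger,
    and the newest `keep` messages are preserved verbatim.
    """
    if keep < 0 or trigger <= keep:
        raise ValueError("need 0 <= keep < trigger")
    if len(messages) < trigger:
        return list(messages)  # below the threshold: leave the history untouched

    split = len(messages) - keep
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


def fake_summarizer(older: list[str]) -> str:
    # Stand-in for a model call that would write a real summary.
    return f"summary of {len(older)} earlier messages"


history = [f"message {i}" for i in range(1, 11)]
compact = summarize_history(history, trigger=10, keep=4, summarize=fake_summarizer)
print(compact)
```

With ten messages, a trigger of 10, and a keep of 4, the six oldest messages collapse into one summary entry and the four newest survive unchanged.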

In [3]:
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langgraph.checkpoint.memory import InMemorySaver
from langchain_core.messages import HumanMessage, SystemMessage

### Message-based summarization
agent = create_agent(
    model=model,
    checkpointer=InMemorySaver(),
    middleware=[
        SummarizationMiddleware(
            model=model,
            trigger=("messages", 10),  ## summarize at a 10-message threshold
            keep=("messages", 4),  ## keep the 4 most recent messages verbatim
        )
    ],
)

In [None]:
### Run with a thread ID
config = {"configurable": {"thread_id": "test-1"}}  ## identifies the conversation thread the checkpointer stores

In [7]:
questions = [
    "what is 2+2?",
    "what is 10*5",
    "what is 100/4",
    "what is 100*5",
    "what is 105/5",
    "what is 10%5",
]

In [8]:
for q in questions:
    response = agent.invoke({"messages": [HumanMessage(content=q)]}, config=config)
    print(f"Messages: {response}")
    print(f"Message count: {len(response['messages'])}")

Messages: {'messages': [HumanMessage(content='what is 2+2?', additional_kwargs={}, response_metadata={}, id='42c177b6-103d-49d0-8df1-b30e4d896be6'), AIMessage(content='<think>\nOkay, so the user is asking "what is 2+2?" Hmm, that seems pretty straightforward, but let me make sure I\'m not missing anything. Let\'s break it down step by step.\n\nFirst, basic arithmetic. 2 plus 2. In elementary math, addition is one of the first operations taught. So, 2 + 2 would be combining two quantities of two. If I have two apples and add two more apples, that\'s four apples. So the answer should be 4. But wait, maybe there\'s a trick here? Sometimes people ask this question as a joke or to check if someone is paying attention. But in standard mathematics, 2+2 is definitely 4.\n\nIs there any context where 2+2 isn\'t 4? Well, in some non-standard arithmetic systems or in certain bases, maybe. For example, in base 3, the number 2 is still 2, but adding 2+2 would be 11 in base 3, which is 4 in decimal.

### Token-based summarization

In [19]:
from langchain.tools import tool
from langchain_core.messages import HumanMessage

@tool
def search_hotels(city: str) -> str:
    """Search hotels - returns a long response to use more tokens."""
    return f"""Hotels in {city}:
    1. Grand Hotel - 5 star ,350rs/night,spa,pool,gym
    2. City Inn - 4 star, 180rs/night,business center
    3. Budget Stay - 3 star, 75rs/night, free wifi"""

agent = create_agent(
    model=model,
    tools=[search_hotels],
    checkpointer=InMemorySaver(),
    middleware=[
        SummarizationMiddleware(
            model=model,
            trigger=("tokens", 550),
            keep=("tokens", 200),
        )
    ]
)

config = {"configurable":{"thread_id":"test-2"}}

def count_tokens(messages):
    total_char = sum(len(str(m.content)) for m in messages)
    return total_char // 4  ## rough heuristic: about 4 characters per token
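The same four-characters-per-token heuristic from `count_tokens` can illustrate what a token-based `keep` budget means: walk backwards from the newest message and keep messages until the budget is spent. This is only a sketch of the idea under that rough heuristic, not how `SummarizationMiddleware` counts or selects tokens; `approx_tokens` and `split_by_token_budget` are hypothetical helpers, messages are plain strings, and the estimate is applied per message rather than to the combined length.

```python
def approx_tokens(text: str) -> int:
    """Estimate tokens with the rough 4-characters-per-token heuristic."""
    return len(text) // 4


def split_by_token_budget(
    messages: list[str], keep_tokens: int
) -> tuple[list[str], list[str]]:
    """Return (older, recent): recent is the longest run of newest messages
    whose estimated token total fits within keep_tokens."""
    used = 0
    split = len(messages)
    # Walk from newest to oldest, stopping at the first message that would overflow.
    for index in range(len(messages) - 1, -1, -1):
        cost = approx_tokens(messages[index])
        if used + cost > keep_tokens:
            break
        used += cost
        split = index
    return messages[:split], messages[split:]


# Estimated sizes: 100, 100, 100, and 50 tokens.
sample = ["a" * 400, "b" * 400, "c" * 400, "d" * 200]
older, recent = split_by_token_budget(sample, keep_tokens=200)
print(len(older), len(recent))
```

In this example the two newest messages (about 150 tokens) fit the 200-token budget, so the two oldest would be the ones handed to a summarizer.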

In [20]:
cities = ["paris","london","delhi","dubai","Hyderabad","singapore"]

for city in cities:
    response = agent.invoke(
        {"messages":[HumanMessage(content=f"find hotels in {city}")]},
        config=config
    )

    tokens = count_tokens(response["messages"])
    print(f"{city}: ~{tokens} tokens,{len(response['messages'])} messages")
    print(response["messages"])

paris: ~164 tokens,4 messages
[HumanMessage(content='find hotels in paris', additional_kwargs={}, response_metadata={}, id='08856868-6f63-4099-ab2b-b55279ebeddb'), AIMessage(content='', additional_kwargs={'reasoning_content': 'Okay, the user wants to find hotels in Paris. Let me check the available tools. There\'s a function called search_hotels that takes a city parameter. The city here is Paris. I need to make sure the parameters are correctly formatted. The required field is city, so I\'ll construct the tool call with "Paris" as the city value. I\'ll return the tool call in the specified JSON format within the XML tags.\n', 'tool_calls': [{'id': 'x1n3a1e4x', 'function': {'arguments': '{"city":"Paris"}', 'name': 'search_hotels'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 112, 'prompt_tokens': 156, 'total_tokens': 268, 'completion_time': 0.172750163, 'completion_tokens_details': {'reasoning_tokens': 87}, 'prompt_time': 0.007639139, 'prompt_tokens_d