# Self-improving agent

Traditional AI chatbots operate with static knowledge: they respond based on their training data but do not evolve from conversations. As a result, they cannot adapt to user preferences, learn from mistakes, or improve their responses over time.

This notebook demonstrates how to build an AI agent that breaks this limitation by implementing a continuous learning loop. The agent doesn't just respond to queries; it actively reflects on its performance, identifies areas for improvement, and incorporates these insights into future interactions. Think of it as creating an AI that learns from experience, similar to how humans refine their communication skills through practice and feedback.

We will build this using LangChain, which provides tools for managing conversation history, chaining operations, and maintaining context across multiple interactions.

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.runnables.history import RunnableWithMessageHistory
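# langchain.memory.ChatMessageHistory is deprecated in recent LangChain releases; alias the langchain_core class instead
from langchain_core.chat_history import InMemoryChatMessageHistory as ChatMessageHistory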
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

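# load_dotenv() has already copied OPENAI_API_KEY into os.environ if it was in .env.
# Fail early with a clear message when it is missing (assigning None to os.environ raises TypeError).
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")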

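- `RunnableWithMessageHistory` wraps a chain so that the message history for a given session is loaded before each call and updated afterwards.
- `ChatMessageHistory` stores the messages of a single chat session in memory.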

## Helper functions
Our self-improving agent relies on several core capabilities that we will implement as modular helper functions.

### Chat history management
The foundation of any conversational AI is the ability to maintain context across multiple exchanges. Without this, each interaction would be isolated, preventing meaningful dialogue. This function creates or retrieves chat history for a session.

In [2]:
def get_chat_history(store, session_id: str):
    # Check if this session already exists in our store
    if session_id not in store:
        # Create new chat history for first-time sessions
        store[session_id] = ChatMessageHistory()
    # Return the existing or newly created chat history
    return store[session_id]

- This function implements a session management system using a dictionary-based store. Each session is identified by a unique ID, allowing multiple users or conversation threads to maintain separate contexts.
- The `ChatMessageHistory` object keeps messages as typed objects (human, AI) in the order they were added, which the language model relies on to follow the conversation flow.
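The get-or-create pattern above can be exercised without LangChain. The sketch below is a minimal illustration, assuming a hypothetical `SimpleHistory` class as a stand-in for `ChatMessageHistory`; it only shows that each session ID maps to its own history object and that repeated lookups return the same one.

```python
class SimpleHistory:
    """Hypothetical stand-in for ChatMessageHistory: an ordered list of messages."""

    def __init__(self):
        self.messages = []

    def add_message(self, role: str, text: str) -> None:
        # Messages are appended, so list order is chronological order
        self.messages.append((role, text))


def get_history(store: dict, session_id: str) -> SimpleHistory:
    # Same get-or-create logic as get_chat_history, using the stand-in class
    if session_id not in store:
        store[session_id] = SimpleHistory()
    return store[session_id]


store = {}
get_history(store, "alice").add_message("human", "Hi")
get_history(store, "bob").add_message("human", "Hello")
get_history(store, "alice").add_message("ai", "Hi, how can I help?")

print(len(get_history(store, "alice").messages))  # 2
print(len(get_history(store, "bob").messages))    # 1
```

Because the store is a plain dictionary, the histories last only as long as the Python process.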

### Response generation
This function orchestrates the actual conversation by combining user input with historical context and learned insights to generate contextually appropriate responses.

In [3]:
def generate_response(chain_with_history, human_input: str, session_id: str, insights: str):
    # Invoke the language model chain with all available context
    response = chain_with_history.invoke(
        {
            "input": human_input,  # Current user query
            "insights": insights  # Accumulated learning from past interactions
        },
        config={"configurable": {"session_id": session_id}}  # Session context configuration
    )
    # Extract and return the text content from the response object
    return response.content

- The function utilizes LangChain's runnable interface to process the input through our configured prompt template and language model.
- The `invoke` method handles the complex orchestration of injecting conversation history, current input, and learned insights into the model's context window.
- The configuration parameter ensures the correct session history is retrieved and used.
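Since `generate_response` only relies on the chain exposing `invoke(inputs, config=...)` and on the result having a `.content` attribute, it can be tried without an API key. The sketch below assumes a hypothetical `FakeChain` and `FakeMessage` (neither is part of LangChain) and repeats the function so the block is self-contained.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the message object a chat model returns."""
    content: str


class FakeChain:
    """Hypothetical stand-in for the history-wrapped chain; records each call."""

    def __init__(self):
        self.calls = []

    def invoke(self, inputs: dict, config: dict) -> FakeMessage:
        self.calls.append((inputs, config))
        session_id = config["configurable"]["session_id"]
        return FakeMessage(content=f"[{session_id}] echo: {inputs['input']}")


def generate_response(chain_with_history, human_input, session_id, insights):
    # Same body as the notebook's function
    response = chain_with_history.invoke(
        {"input": human_input, "insights": insights},
        config={"configurable": {"session_id": session_id}},
    )
    return response.content


chain = FakeChain()
reply = generate_response(chain, "Hello", "user_123", insights="Be concise.")
print(reply)           # [user_123] echo: Hello
print(chain.calls[0])  # the inputs dict and config that were passed through
```

The recorded call shows the two channels at work: the input dictionary fills the prompt variables, while the `configurable` entry selects which session's history is used.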


### Reflection
Reflection is where the "self-improving" aspect begins. The agent analyzes its past interactions to identify patterns, missed opportunities, and potential improvements.

In [4]:
def reflect(llm, store, session_id: str):
    # Define a specialized prompt for reflection analysis
    reflection_prompt = ChatPromptTemplate.from_messages([
        ("system", "Based on the following conversation history, provide insights on how to improve responses:"),
        MessagesPlaceholder(variable_name="history"),
        ("human", "Generate insights for improvement:")
    ])
    # Create a processing chain for reflection
    reflection_chain = reflection_prompt | llm

    # Retrieve the conversation history for analysis
    history = get_chat_history(store, session_id)

    # Generate reflection insights based on conversation patterns
    reflection_response = reflection_chain.invoke({"history": history.messages})
    return reflection_response.content

- This function implements meta-cognition by having the language model analyze its own conversation history.
- The `MessagesPlaceholder` dynamically inserts the entire conversation thread into the prompt, allowing the model to identify patterns in its responses, user reactions, and potential areas for improvement.
- The reflection chain operates as a separate reasoning process focused specifically on performance analysis.
- The result is a freeform text containing insights.
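To make the role of `MessagesPlaceholder` concrete, the sketch below assembles the same three-part prompt by hand. It is a simplified illustration that uses plain `(role, text)` tuples instead of LangChain message objects, and `build_reflection_messages` is a hypothetical helper, not a LangChain function.

```python
def build_reflection_messages(history):
    """Mimic the reflection prompt: system instruction, full history, final request."""
    return [
        ("system", "Based on the following conversation history, "
                   "provide insights on how to improve responses:"),
        *history,  # this is the slot that MessagesPlaceholder fills
        ("human", "Generate insights for improvement:"),
    ]


history = [
    ("human", "What's the capital of France?"),
    ("ai", "The capital of France is Paris."),
]

for role, text in build_reflection_messages(history):
    print(f"{role}: {text}")
```

The whole history is passed in on every reflection, so the cost of this step grows with the length of the conversation.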

### Learning
The learning function takes insights from reflection and transforms them into actionable knowledge that influences future interactions.

In [5]:
def learn(llm, store, session_id: str, insights: str):
    # Create a focused prompt for distilling insights into actionable learning
    learning_prompt = ChatPromptTemplate.from_messages([
        ("system", "Based on these insights, update the agent's knowledge and behavior:"),
        ("human", "{insights}"),
        ("human", "Summarize the key points to remember:")
    ])
    # Process insights through the learning chain
    learning_chain = learning_prompt | llm
    learned_points = learning_chain.invoke({"insights": insights}).content

    # Store the learned points in the session history as an AI message tagged with "[SYSTEM]"
    get_chat_history(store, session_id).add_ai_message(f"[SYSTEM] Agent learned: {learned_points}")
    return learned_points

This function consolidates raw insights into concrete, actionable points. The learned points are appended to the conversation history as an AI message prefixed with `[SYSTEM] Agent learned:` (a text tag, not an actual system message), so they are included in the history sent with every later request in that session. The history lives in an in-memory dictionary, so this knowledge accumulates within a session but is lost when the process exits.
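Learned points are stored as ordinary AI messages, so the `[SYSTEM] Agent learned:` prefix is the only thing that distinguishes them from real replies. The sketch below shows one way to pull them back out of a history. It assumes the messages are available as plain strings, and `extract_learned_points` is a hypothetical helper introduced here for illustration.

```python
LEARNED_PREFIX = "[SYSTEM] Agent learned: "


def extract_learned_points(ai_messages):
    """Return the text of every message that carries the learned-points prefix."""
    return [
        text[len(LEARNED_PREFIX):]
        for text in ai_messages
        if text.startswith(LEARNED_PREFIX)
    ]


ai_messages = [
    "The capital of France is Paris.",
    "[SYSTEM] Agent learned: Offer follow-up questions.",
    "Paris was founded as Lutetia.",
    "[SYSTEM] Agent learned: Keep historical summaries short.",
]

print(extract_learned_points(ai_messages))
```

With a real `ChatMessageHistory`, the input list could be built from the `content` attribute of each entry in `history.messages`.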

## Self-improving agent class
Now we will integrate all our helper functions into a cohesive agent class that orchestrates the complete self-improvement cycle.

In [6]:
class SelfImprovingAgent:
    def __init__(self):
        """Initialize the agent with core components and configurations."""
        # Configure the primary language model with balanced parameters
        self.llm = ChatOpenAI(
            model="gpt-4o-mini-2024-07-18",
            max_tokens=1000,        # Reasonable response length limit
            temperature=0.7         # Balance between creativity and consistency
        )
        # Initialize session storage for maintaining multiple conversation contexts
        self.store = {}
        # Holds the latest reflection output; overwritten on each reflect() call and shared by all sessions
        self.insights = ""

        # Define the main conversation prompt template
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a self-improving AI assistant. Learn from your interactions and improve your performance over time."),
            MessagesPlaceholder(variable_name="history"),  # Dynamic history injection
            ("human", "{input}"),  # Current user input
            ("system", "Recent insights for improvement: {insights}")  # Learning context
        ])

        # Create the processing chain linking prompt to language model
        self.chain = self.prompt | self.llm
        # Wrap the chain with history management capabilities
        self.chain_with_history = RunnableWithMessageHistory(
            self.chain,
            lambda session_id: get_chat_history(self.store, session_id),  # History retrieval function
            input_messages_key="input",  # Key for user input in prompt template
            history_messages_key="history"  # Key for history in prompt template
        )

    def respond(self, human_input: str, session_id: str):
        """Generate a response to user input using current knowledge and insights."""
        return generate_response(self.chain_with_history, human_input, session_id, self.insights)

    def reflect(self, session_id: str):
        """Analyze recent interactions to generate improvement insights."""
        # Update internal insights based on conversation analysis
        self.insights = reflect(self.llm, self.store, session_id)
        return self.insights

    def learn(self, session_id: str):
        """Execute the complete learning cycle: reflect on interactions and integrate insights."""
        # First reflect on recent interactions
        self.reflect(session_id)
        # Then integrate insights into persistent knowledge
        return learn(self.llm, self.store, session_id, self.insights)

- The constructor initializes all necessary resources:
  - A language model for interactive use.
  - Session-aware history tracking for managing multiple users.
  - A prompt template that makes conversation history, current input, and learned insights available to the model during response generation.
  - A `RunnableWithMessageHistory` wrapper that injects the session's history into each call and records the new messages afterwards.
- When `.respond()` is called, the agent generates a message that considers both the ongoing dialogue and what it has "learned" previously.
- The `.reflect()` method allows the agent to critique its own past behavior by asking the LLM to analyze the session history.
- `.learn()` runs a reflection, condenses the result into an actionable summary, and writes that summary back into the session history, which simulates growth.
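The class leaves it to the caller to decide when `.learn()` runs. One possible policy, sketched below, is to trigger it after every few turns. `StubAgent` and `chat_with_periodic_learning` are hypothetical names used so the example runs without an API key; the stub only mirrors the `respond` and `learn` method signatures of `SelfImprovingAgent`.

```python
class StubAgent:
    """Hypothetical stand-in exposing the same respond/learn interface."""

    def __init__(self):
        self.learn_calls = 0

    def respond(self, human_input: str, session_id: str) -> str:
        return f"reply to: {human_input}"

    def learn(self, session_id: str) -> str:
        self.learn_calls += 1
        return f"learned points #{self.learn_calls}"


def chat_with_periodic_learning(agent, questions, session_id, learn_every=2):
    """Answer each question, running the learning cycle after every `learn_every` turns."""
    replies = []
    for turn, question in enumerate(questions, start=1):
        replies.append(agent.respond(question, session_id))
        if turn % learn_every == 0:
            agent.learn(session_id)
    return replies


agent = StubAgent()
replies = chat_with_periodic_learning(agent, ["q1", "q2", "q3", "q4", "q5"], "user_123")
print(len(replies), agent.learn_calls)  # 5 2
```

With the real agent, each learning cycle costs two extra model calls (one for reflection, one for summarizing), so the interval is a trade-off between responsiveness and cost.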


## Example usage
Let's create an instance of our agent and observe how it evolves through a series of interactions, demonstrating the self-improvement mechanism in action.

In [7]:
# Instantiate our self-improving agent
agent = SelfImprovingAgent()
# Create a unique session identifier for this demonstration
session_id = "user_123"

print("=== Initial Interactions ===")

# Interaction 1 - establishing baseline performance
print("Human: What's the capital of France?")
print("\nAI:", agent.respond("What's the capital of France?", session_id))

print()

# Interaction 2 - testing contextual understanding
print("Human: Can you tell me more about its history?")
print("\nAI:", agent.respond("Can you tell me more about its history?", session_id))

print("\n=== Learning Phase ===")

# Learn and improve - trigger the self-improvement cycle
print("Agent is now reflecting and learning from the conversation...")
learned = agent.learn(session_id)
print("Learned:", learned)

print("\n=== Post-Learning Interactions ===")

# Interaction 3 (potentially improved based on learning)
print("Human: What's a famous landmark in this city?")
print("\nAI:", agent.respond("What's a famous landmark in this city?", session_id))

# Interaction 4 (to demonstrate continued improvement)
print("Human: What's another interesting fact about this city?")
print("\nAI:", agent.respond("What's another interesting fact about this city?", session_id))

=== Initial Interactions ===
Human: What's the capital of France?

AI: The capital of France is Paris.

Human: Can you tell me more about its history?

AI: Certainly! Paris has a rich and complex history that spans over 2,000 years. Here are some key points:

1. **Foundation**: Paris was originally a small Gallic settlement called Lutetia, founded around the 3rd century BC by the Parisii tribe. It was situated on the Île de la Cité, an island in the Seine River.

2. **Roman Period**: The city became an important Roman city during the 1st century AD. It was known as Lutetia and featured typical Roman architecture, including baths, temples, and an amphitheater.

3. **Middle Ages**: After the fall of the Roman Empire, Paris grew in importance, becoming the capital of the Frankish kingdom. The construction of Notre-Dame Cathedral began in the 12th century, symbolizing the city's growing power. By the 13th century, Paris was a major cultural and intellectual center in Europe.

4. **Renaissa

This demonstration sequence illustrates the complete self-improvement cycle in practice.
- The initial interactions establish a conversation context about Paris and its history.
- The learning phase triggers reflection on these interactions, identifying patterns and potential improvements.
- The post-learning interactions should demonstrate enhanced performance, such as more detailed responses, better contextual awareness, or improved conversational flow.

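After the learning phase, `agent.insights` holds the latest reflection, which is injected into the prompt for every subsequent response. It is stored on the agent instance rather than per session, and each call to `.reflect()` replaces the previous value.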
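Since `self.insights` is a single string on the agent, a reflection on one session also shapes the prompts of every other session. If per-session isolation is wanted, one option is a small store keyed by session ID. The sketch below is an assumption about how that could look; `SessionInsights` and its `max_items` parameter are hypothetical and not part of the notebook or of LangChain.

```python
from collections import defaultdict


class SessionInsights:
    """Hypothetical per-session store for reflection output."""

    def __init__(self, max_items: int = 3):
        self.max_items = max_items
        self._by_session = defaultdict(list)

    def add(self, session_id: str, insight: str) -> None:
        items = self._by_session[session_id]
        items.append(insight)
        # Keep only the most recent entries so the prompt does not grow without bound
        if len(items) > self.max_items:
            del items[: len(items) - self.max_items]

    def as_prompt_text(self, session_id: str) -> str:
        # .get avoids creating an empty entry when an unknown session is read
        return "\n".join(self._by_session.get(session_id, []))


insights = SessionInsights(max_items=2)
insights.add("user_123", "Be concise.")
insights.add("user_123", "Offer follow-ups.")
insights.add("user_123", "Cite dates.")
insights.add("user_456", "Use simpler words.")

print(insights.as_prompt_text("user_123"))
print(insights.as_prompt_text("user_456"))
```

In the agent, `respond` could then pass `as_prompt_text(session_id)` as the `insights` value instead of the shared string.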