# Building an Agent From Scratch

This notebook demonstrates how to build an AI agent from the ground up, implementing the core components described in Chapter 2:

* Defining agents in code
* Granting agents access to tools
* Implementing short- and long-term memory in agents
* Exploring strategies for task termination
* Function calling (tool calling) in LLMs

Over the course of this notebook, you will progress from an agent that can address simple tasks like "what is the capital of France?" to one that can "act", for example "plot the stock price of Microsoft over the last 5 years" or "generate a weather report".

![Agent Components](https://github.com/victordibia/multiagent-systems-with-autogen/blob/main/docs/images/agent.png?raw=true)

## Installation

We'll need the OpenAI library for the generative AI model and other basic dependencies.

```bash
pip install openai python-dotenv pydantic
```

In [None]:
# !pip install -U openai python-dotenv pydantic

## Setting Up the Environment

Before we build our agent, we need to set up our generative AI model. We'll use OpenAI's GPT models for this example.

**API Key Setup:**
- Create a new file called `.env` in the folder for this notebook
- Add the following line to the file, replacing `YOUR_OPENAI_API_KEY` with your actual OpenAI API key:

```bash
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
```
- Save the file

You can get your API key from [OpenAI](https://platform.openai.com/signup).
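
Before creating a client, it can help to confirm that the key is actually visible to the Python process. The sketch below is one way to do that. `check_api_key` is a hypothetical helper (it is not part of this notebook's `Agent` class or of the OpenAI library), and it only checks that the variable is set, not that the key is valid.

```python
import os
from typing import Mapping, Optional


def check_api_key(env: Optional[Mapping[str, str]] = None,
                  name: str = "OPENAI_API_KEY") -> bool:
    """Return True if `name` is set to a non-empty, non-placeholder value.

    Hypothetical helper for illustration; it does not contact OpenAI.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    # Treat the placeholder from the .env template as "not set"
    return bool(value) and value != "YOUR_OPENAI_API_KEY"


if not check_api_key():
    print("OPENAI_API_KEY is missing; check your .env file and call load_dotenv().")
```

Run the check after `load_dotenv()` so that values from the `.env` file have been loaded into the environment.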

## Defining an Agent in Code

We'll start by defining a basic `Agent` class with core components: name, instructions, model, tools, memory, and message history. The `run` method will serve as the foundation for executing agent logic.

In [10]:
from typing import List, Callable
from pydantic import BaseModel
from openai import OpenAI
from dotenv import load_dotenv
import os
 

# Load environment variables
load_dotenv(override=True)

class Memory(BaseModel):
    """A class representing the memory of an agent."""
    pass

class Agent:
    def __init__(self, name: str, instructions: str, model: str = "gpt-4.1-nano", tools: List[Callable] | None = None, memory: Memory | None = None) -> None:
        """
        Initialize the agent with its name, instructions, model, tools, and memory.
        """
        self.name = name
        self.model = model
        self.model_client = OpenAI()
        self.instructions = instructions
        self.tools = tools or []
        self.memory = memory 
        self.message_history = []

    def run(self, task: str) -> str:
        """
        Logic for a single turn of the agent's behavior
        """
        # For now, this is just a placeholder
        return f"Agent {self.name} received task: {task}"

# Create a basic agent
agent = Agent(
    name="assistant", 
    instructions="You are a helpful assistant.", 
    model="gpt-4.1-nano"
)

# Test the basic structure
response = agent.run(task="What is the capital of France?")
print(response)

Agent assistant received task: What is the capital of France?


## Agent + Generative AI Model

Now let's extend our agent to actually use a generative AI model for reasoning and generating responses. We'll implement the `_call_model` method that communicates with the LLM.

In [11]:
import os
from typing import List, Callable
from pydantic import BaseModel
from openai import OpenAI



class Agent:
    def __init__(self, name: str, instructions: str, model: str = "gpt-4.1-nano", tools: List[Callable] | None = None, memory: Memory | None = None):
        """
        Initialize the agent with its name, instructions, model, tools, and memory.
        """
        self.name = name
        self.model = model
        self.model_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY", ""))
        self.instructions = instructions
        self.tools = tools or []
        self.memory = memory 
        self.message_history = []

    def _call_model(self, messages: list) -> str:
        """
        Call the model with the conversation history and return the generated response.
        """
        response = self.model_client.responses.create(
            model=self.model,
            input=messages,
            temperature=0.7,
            max_output_tokens=150
        )
        return response.output_text.strip()

    def run(self, task: str) -> str:
        """
        Run the agent with the given task and return the response.
        """
        # Order matters: system prompt, then prior turns, then the new task
        messages = [
            {"role": "system", "content": self.instructions},
        ] + self.message_history + [
            {"role": "user", "content": task}
        ]

        response = self._call_model(messages)
        self.message_history.append({"role": "user", "content": task})
        self.message_history.append({"role": "assistant", "content": response})
        return response

# Create and test the improved agent
agent = Agent(
    name="assistant", 
    instructions="You are a helpful assistant.", 
    model="gpt-4.1-nano"
)

# Test with a simple question
response = agent.run(task="What is the capital of France?")
print(f"Agent response: {response}")

Agent response: The capital of France is Paris.


## Agent + Tools

Tools expand the range of tasks that an agent can perform beyond text generation. Let's explore how to give our agent access to tools.

### Understanding Tool Categories

- **Task-specific tools** provide capabilities for specific tasks (e.g., weather API, stock prices)
- **General-purpose tools** offer broad capabilities (e.g., code interpreters, UI drivers)

### Function Calling in LLMs

Modern LLMs can translate natural language requests into structured function calls. This involves:
1. **Function definition**: Define the function with name, description, and parameters
2. **Register with LLM**: Present function definitions to the model
3. **Generate function call**: LLM creates structured function calls based on user requests
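
The three steps above can be sketched without calling a model at all. In the example below, the tool definition uses the same flat function-tool layout that this notebook builds later. `get_weather_stub` is a hypothetical local function, and `model_generated_call` is a hand-written stand-in for what a model might return (an assumption for illustration, not real model output).

```python
import json


# 1. Function definition: an ordinary Python function
def get_weather_stub(location: str, date_range: str) -> str:
    """Hypothetical stand-in for a real weather lookup."""
    return f"Weather for {location} ({date_range}): sunny"


# 2. Register with LLM: describe the function as a JSON Schema tool definition
weather_tool = {
    "type": "function",
    "name": "get_weather_stub",
    "description": "Get the weather for a given location and date range.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
            "date_range": {"type": "string", "description": "Dates of interest"},
        },
        "required": ["location", "date_range"],
        "additionalProperties": False,
    },
}

# 3. Generate function call: hand-written example of a structured call
model_generated_call = {
    "name": "get_weather_stub",
    "arguments": json.dumps(
        {"location": "San Francisco", "date_range": "next 4 days"}
    ),
}

# The application (not the model) parses the arguments and runs the function
registry = {"get_weather_stub": get_weather_stub}
args = json.loads(model_generated_call["arguments"])
result = registry[model_generated_call["name"]](**args)
print(result)
```

Note that the model only produces the name and arguments. Executing the function, and deciding whether it is safe to do so, remains the application's job.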

In [12]:
import json
import inspect

# Define some example tools that we'll use with our agent
def get_weather(location: str, date_range: str) -> str:
    """Get the weather for a given location and date range."""
    return f"The weather in {location} from {date_range} is sunny with a high of 75°F."

def get_stock_price(ticker: str, date_range: str) -> str:
    """Get the stock price for a given ticker and date range."""
    return f"The stock price for {ticker} from {date_range} averaged $150."

print("Example tools defined: get_weather, get_stock_price")

Example tools defined: get_weather, get_stock_price


In [None]:
class Agent:
    def __init__(self, name: str, instructions: str, model: str = "gpt-4.1-nano", tools: List[Callable] | None = None, memory: Memory | None = None):
        """
        Initialize the agent with its name, instructions, model, tools, and memory.
        """
        self.name = name
        self.model = model
        self.model_client = OpenAI()
        self.instructions = instructions
        self.tools = tools or []
        self.memory = memory 
        self.message_history = []
        
        # Create a mapping of tool names to functions
        self.tool_map = {tool.__name__: tool for tool in self.tools}

    def _convert_tools_to_llm_format(self, tools: List[Callable]) -> List[dict]:
        """
        Convert a list of tool functions to a format suitable for the LLM.
        """
        llm_tools = []
        for tool in tools:
            # Get function signature
            sig = inspect.signature(tool)
            parameters = {
                "type": "object", 
                "properties": {}, 
                "required": [],
                "additionalProperties": False
            }
            
            for param_name, param in sig.parameters.items():
                parameters["properties"][param_name] = {
                    "type": "string", 
                    "description": f"The {param_name} parameter"
                }
                parameters["required"].append(param_name)
            
            tool_info = {
                "type": "function",
                "name": tool.__name__,
                "description": tool.__doc__ or f"Function {tool.__name__}",
                "parameters": parameters,
                "strict": True
            }
            llm_tools.append(tool_info)
        return llm_tools

    def _execute_tool(self, tool_call: dict) -> str:
        """
        Execute the tool call and return the result.
        """
        function_name = tool_call.get("name") or tool_call.get("function", {}).get("name")
        function_args = tool_call.get("arguments") or tool_call.get("function", {}).get("arguments")
        
        # Handle both string and dict argument formats
        if isinstance(function_args, str):
            function_args = json.loads(function_args)
        
        if function_name in self.tool_map:
            function_to_call = self.tool_map[function_name]
            try:
                result = function_to_call(**function_args)
                return result
            except Exception as e:
                return f"Error executing {function_name}: {str(e)}"
        else:
            return f"Unknown function: {function_name}"

    def _call_model(self, messages: list) -> str:
        """
        Call the model with the conversation history and return the generated response.
        """
        # Convert tools to LLM format if tools are available
        llm_tools = self._convert_tools_to_llm_format(self.tools) if self.tools else None
        
        # Prepare request parameters
        request_params = {
            "model": self.model,
            "input": messages,
            "temperature": 0.7,
            "max_output_tokens": 150
        }
        
        # Only add tools if they exist
        if llm_tools:
            request_params["tools"] = llm_tools
        
        response = self.model_client.responses.create(**request_params)
        
        # Process response: in the Responses API, tool calls arrive as
        # top-level output items of type "function_call"
        function_calls = [
            item for item in response.output
            if getattr(item, "type", None) == "function_call"
        ]

        if function_calls:
            for call in function_calls:
                # Execute the tool call
                tool_result = self._execute_tool(
                    {"name": call.name, "arguments": call.arguments}
                )

                # Add the tool call and its result to the conversation;
                # the call_id links the output back to the call
                messages.append({
                    "type": "function_call",
                    "call_id": call.call_id,
                    "name": call.name,
                    "arguments": call.arguments
                })
                messages.append({
                    "type": "function_call_output",
                    "call_id": call.call_id,
                    "output": str(tool_result)
                })

            # Get final response from model, which can now see the tool results
            # (request_params["input"] refers to the same `messages` list)
            final_response = self.model_client.responses.create(**request_params)
            return final_response.output_text

        # No tool calls: return the generated text
        return response.output_text

    def run(self, task: str) -> str:
        """
        Run the agent with the given task and return the response.
        """
        messages = [
            {"role": "system", "content": self.instructions},
        ] + self.message_history + [
            {"role": "user", "content": task}
        ]

        response = self._call_model(messages)
        self.message_history.append({"role": "user", "content": task})
        self.message_history.append({"role": "assistant", "content": response})
        return response
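
The `_convert_tools_to_llm_format` method above describes every parameter as a `"string"`, which is enough for the two example tools but loses information for numeric or boolean arguments. A minimal sketch of one possible refinement is shown below: it maps a few Python annotations to JSON Schema type names. `PY_TO_JSON_TYPE`, `build_parameters_schema`, and `get_forecast` are hypothetical names introduced for illustration, and the mapping covers only simple scalar types.

```python
import inspect
from typing import Callable

# Assumed mapping from Python annotations to JSON Schema type names
PY_TO_JSON_TYPE = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_parameters_schema(tool: Callable) -> dict:
    """Build a JSON Schema "parameters" object from a function signature."""
    properties = {}
    for name, param in inspect.signature(tool).parameters.items():
        # Fall back to "string" for missing or unrecognized annotations
        json_type = PY_TO_JSON_TYPE.get(param.annotation, "string")
        properties[name] = {
            "type": json_type,
            "description": f"The {name} parameter",
        }
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
        "additionalProperties": False,
    }


def get_forecast(city: str, days: int, metric: bool) -> str:
    """Hypothetical tool used only to exercise the schema builder."""
    return f"{days}-day forecast for {city}"


schema = build_parameters_schema(get_forecast)
print(schema["properties"]["days"]["type"])
```

With typed schemas, the arguments the model returns are parsed by `json.loads` into matching Python values (for example, an `int` for `days`), so the tool receives what its signature expects.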

In [14]:
# Create agent with tools - now the agent handles tool conversion internally
agent_with_tools = Agent(
    name="assistant", 
    instructions="You are a helpful assistant that can get weather and stock information.", 
    model="gpt-4.1-nano",
    tools=[get_weather, get_stock_price]  # Simply pass the list of tools
)

print(f"Agent created with {len(agent_with_tools.tools)} tools: {[tool.__name__ for tool in agent_with_tools.tools]}")
print(f"Tool mapping: {list(agent_with_tools.tool_map.keys())}")

# Test with weather request
print("\n=== Weather Request ===")
response1 = agent_with_tools.run(task="What is the weather going to be like in San Francisco over the next 4 days?")
print(f"Agent response: {response1}")

print("\n=== Stock Request ===")
response2 = agent_with_tools.run(task="What was the stock price of AAPL last week?")
print(f"Agent response: {response2}")

print("\n=== General Question ===")
response3 = agent_with_tools.run(task="What is the capital of France?")
print(f"Agent response: {response3}")

Agent created with 2 tools: ['get_weather', 'get_stock_price']
Tool mapping: ['get_weather', 'get_stock_price']

=== Weather Request ===


BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for function 'get_weather': In context=(), 'additionalProperties' is required to be supplied and to be false.", 'type': 'invalid_request_error', 'param': 'tools[0].parameters', 'code': 'invalid_function_parameters'}}

## Human Feedback Tools

Agents can request input or clarification from humans when tasks are complex, ambiguous, or require human oversight. This introduces a human-in-the-loop component for enhanced problem-solving.
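
Asking the model, via its instructions, to seek approval relies on the model choosing to comply. A complementary option is to enforce approval in code by wrapping a sensitive tool. The sketch below shows one way this might look; `require_approval` and `delete_file` are hypothetical names, and the `ask` parameter is an assumption added so the human's answer can be simulated instead of read with `input()`.

```python
import functools
from typing import Callable


def require_approval(action: Callable[..., str],
                     ask: Callable[[str], str] = input) -> Callable[..., str]:
    """Wrap `action` so it only runs after a human answers "y" or "yes"."""

    # functools.wraps keeps the name and docstring, and sets __wrapped__ so
    # inspect.signature still reports the original parameters
    @functools.wraps(action)
    def gated(**kwargs) -> str:
        answer = ask(f"Approve {action.__name__} with {kwargs}? [y/N] ")
        if answer.strip().lower() not in {"y", "yes"}:
            return f"{action.__name__} was not approved by the human reviewer."
        return action(**kwargs)

    return gated


def delete_file(path: str) -> str:
    """Hypothetical sensitive action (simulated; nothing is deleted)."""
    return f"Deleted {path}"


# Simulate the human's answers instead of calling input()
approved = require_approval(delete_file, ask=lambda prompt: "yes")
denied = require_approval(delete_file, ask=lambda prompt: "n")
print(approved(path="notes.txt"))
print(denied(path="notes.txt"))
```

Because the wrapper keeps the original name and signature, a wrapped function can be passed in a `tools` list like any other tool, and the refusal message is returned to the model as the tool result.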

In [None]:
def human_feedback_tool(message: str) -> str:
    """
    A tool that requests human feedback when the agent needs clarification or approval.
    """
    print(f"\n🤖 Agent needs human input: {message}")
    response = input("👤 Your response: ")
    return response

def send_email(recipient: str, subject: str, body: str) -> str:
    """
    Send an email (simulated - requires human approval for sensitive actions).
    """
    return f"Email draft created. To: {recipient}, Subject: {subject}, Body: {body[:50]}..."

# Create agent with human feedback capability
agent_with_feedback = Agent(
    name="assistant", 
    instructions="You are a helpful assistant. For sensitive actions like sending emails, always ask for human approval using the human_feedback_tool.", 
    model="gpt-4.1-nano",
    tools=[get_weather, send_email, human_feedback_tool]
)

# Example of human-in-the-loop interaction
print("=== Example with Human Feedback ===")
# Note: This would normally prompt for human input in an interactive environment
# For demonstration, we'll show the structure
response = agent_with_feedback.run(task="Can you help me send an email to john@example.com about tomorrow's meeting?")
print(f"Agent response: {response}")

## Memory and Conversation Context

Our agents maintain conversation history in the `message_history` attribute. This enables:

1. **Short-term memory**: Maintaining context within a conversation
2. **Reference to previous interactions**: Agents can refer back to earlier parts of the conversation
3. **Building on earlier turns**: Agents can build upon previous exchanges (the model itself is not retrained; the history is simply resent with each request)
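
Because `message_history` grows with every turn, a long conversation will eventually exceed the model's context window. A minimal sketch of one short-term memory strategy, a sliding window over recent messages, is shown below. `trim_history` and `max_messages` are hypothetical names and are not part of the `Agent` class in this notebook.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(history: List[Message], max_messages: int = 6) -> List[Message]:
    """Return at most the last `max_messages` entries, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = history[-max_messages:]
    # Drop leading non-user messages whose matching user turn was cut off
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Nice to meet you, Alice."},
    {"role": "user", "content": "I'm building a weather app."},
    {"role": "assistant", "content": "Sounds interesting."},
    {"role": "user", "content": "What's my name?"},
]

window = trim_history(history, max_messages=4)
print([m["role"] for m in window])
```

The trade-off is visible in the example: with a window of four messages, the turn in which Alice gave her name falls outside the window, so an agent using this history could no longer answer the final question.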

Let's demonstrate how memory works in practice:

In [None]:
# Create a new agent for memory demonstration
memory_agent = Agent(
    name="memory_agent", 
    instructions="You are a helpful assistant with good memory. Remember details from our conversation.", 
    model="gpt-4.1-nano"
)

print("=== Memory Demonstration ===")

# First interaction
response1 = memory_agent.run("My name is Alice and I'm working on a Python project about weather data.")
print(f"Agent: {response1}")

# Second interaction - agent should remember Alice's name and project
response2 = memory_agent.run("What programming language am I using for my project?")
print(f"Agent: {response2}")

# Third interaction - testing deeper memory
response3 = memory_agent.run("What's my name again?")
print(f"Agent: {response3}")

# Show the message history
print(f"\n=== Message History (Length: {len(memory_agent.message_history)}) ===")
for i, msg in enumerate(memory_agent.message_history):
    print(f"{i+1}. {msg['role']}: {msg['content'][:100]}...")

## Summary

In this notebook, we've built an AI agent from scratch, implementing the core concepts from Chapter 2:

### Key Components Implemented:

1. **Basic Agent Class**: Defined with name, instructions, model, tools, and memory
2. **Generative AI Integration**: Added `_call_model` method for LLM communication  
3. **Tool System**: 
   - Internal tool conversion to LLM format
   - Function calling capabilities
   - Task-specific tools (weather, stock prices)
   - Human feedback tools for human-in-the-loop interactions
4. **Memory**: Conversation history for context maintenance
5. **Progressive Complexity**: From simple text responses to tool-enabled actions

### From Simple to Complex:

- ✅ **Simple**: "What is the capital of France?" → Text generation
- ✅ **Tools**: "What's the weather in San Francisco?" → Function calling  
- ✅ **Memory**: Multi-turn conversations with context
- ✅ **Human Feedback**: Requesting approval for sensitive actions

This foundation demonstrates how agents can be built from first principles, providing full control over behavior, tool access, and interaction patterns. The concepts here form the basis for more sophisticated multi-agent systems covered in later chapters.

### Next Steps:
- Explore more complex tool integrations
- Implement different memory strategies  
- Add error handling and robustness
- Scale to multiple collaborating agents