This notebook showcases the capabilities of the Hugging Face **smolagents** library, which provides a lightweight framework for creating AI agents.

The following code samples will exemplify how to use this package to build agents capable of searching for data, executing code, and interacting with web pages. Finally, we'll review how to combine multiple agents to create more powerful systems.

# Getting to Know *smolagents*

The *smolagents* package is a framework for building AI agents that give LLMs the agency to interact with the real world, for example by searching the web or generating images.

These are some of the main advantages of using this package for building your AI Agents:

* **Simplicity.** Minimal code complexity and few abstractions make the framework easy to understand and use
* **Flexible LLM Support.** The package is capable of working with any LLM through integration with Hugging Face tools and external APIs
* **Code-First Approach.** It has first-class support for Code Agents that write their actions directly in code, removing the need for parsing and simplifying tool calling
* **HF Hub Integrations.** It also offers seamless integration with the Hugging Face Hub, allowing Gradio Spaces to be used as tools

Another difference from other frameworks is that *smolagents* focuses on tool calls written as code instead of actions written in JSON. This removes the need to parse JSON and translate it into tool calls: the generated code is executed directly.
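To make the contrast concrete, here is a toy sketch in plain Python (not smolagents internals). The two tools are hypothetical. A JSON action names a single tool, so the runtime must parse it and dispatch the call; a code action can chain tools and reuse intermediate results in one step. A real agent runs such code in a restricted interpreter rather than with a bare `exec`.

```python
import json

# Hypothetical tools, for illustration only
def get_temperature_c(city: str) -> float:
    return {"Gotham": 10.0, "Metropolis": 20.0}.get(city, 15.0)

def to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32

TOOLS = {"get_temperature_c": get_temperature_c, "to_fahrenheit": to_fahrenheit}

# JSON-style action: parse the string, look up the tool by name, then call it.
json_action = '{"name": "get_temperature_c", "arguments": {"city": "Gotham"}}'
call = json.loads(json_action)
json_result = TOOLS[call["name"]](**call["arguments"])
# Converting the result would need a second LLM turn and a second JSON action.

# Code-style action: the generated Python chains both tools in a single step.
code_action = "result = to_fahrenheit(get_temperature_c('Gotham'))"
namespace = dict(TOOLS)  # expose the tools as names the code can call
exec(code_action, namespace)
code_result = namespace["result"]

print(json_result, code_result)  # 10.0 50.0
```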

This package is a good fit in the following situations:

* You need a **lightweight and minimal solution**
* You want to **experiment quickly** without complex configurations
* Your **application logic is straightforward**

The package works with **multi-step agents**, where each step performs the following:

* One thought
* One tool call and execution

The primary agent of the package is the **CodeAgent**, although it also supports **ToolCallingAgent**, which writes tool calls in JSON, like other frameworks.

Tools are defined either with the *@tool* decorator or by subclassing the *Tool* class.

The package's model integrations support flexible connections to many LLMs, as long as they meet a few requirements (for instance, accepting chat-formatted messages as input). These are some of the predefined classes for connecting to a model:

* **TransformersModel.** Implements local transformers pipelines for seamless integration
* **HfApiModel.** Supports serverless inference calls through *Hugging Face's infrastructure* or through *third-party inference providers*
* **LiteLLMModel.** Leverages *LiteLLM* for lightweight model interactions
* **OpenAIServerModel.** Connects to any service that offers an OpenAI API interface
* **AzureOpenAIServerModel.** Supports integration with any Azure OpenAI deployment

# Building Agents that Use Code

As mentioned before, the core agent type of *smolagents* is the **Code Agent** that generates Python tool calls to perform actions. This approach reduces the number of required actions, simplifies complex operations, and enables reuse of existing code functions.

The general approach that most frameworks follow is to use a JSON format to specify tool names and arguments as strings, which the system **must then parse to determine which tool to execute**. However, some studies suggest that **tool-calling LLMs work more effectively when writing code directly**, with the following core advantages:

* **Composability.** Easily combine and reuse actions
* **Object Management.** Work directly with complex structures like images
* **Generality.** Express any computationally possible task
* **Natural for LLMs.** High-quality code is already present in LLM training data

A *CodeAgent* performs actions through a cycle of steps, with existing variables and knowledge being incorporated into the agent's context, which is kept in an execution log.

1. The system prompt is stored in a **SystemPromptStep**, and the user query is logged in a **TaskStep**
2. The following loop is then executed:
    1. The method **agent.write_memory_to_messages()** writes the agent's logs into a list of LLM-readable chat messages
    2. These messages are sent to a **Model**, which generates a completion
    3. The completion is parsed to extract the action, which should be a code snippet
    4. The action is executed
    5. The results are logged into memory in an **ActionStep**

At the end of each step, any functions registered in **agent.step_callbacks** are executed.
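The cycle above can be sketched in plain Python. This is a simplified illustration, not smolagents' source code: the step classes are minimal stand-ins named after the ones in the text, `fake_model` is a hypothetical scripted model that always returns a code snippet (so no parsing is needed), and `final_answer` is a stand-in for the agent's final-answer tool. Functions passed in `step_callbacks` run at the end of every step.

```python
import contextlib
import io
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SystemPromptStep:
    system_prompt: str


@dataclass
class TaskStep:
    task: str


@dataclass
class ActionStep:
    step_number: int
    code: str
    observations: str = ""


class FinalAnswer(Exception):
    """Raised by the stand-in final_answer() tool to stop the loop."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value


def write_memory_to_messages(memory) -> List[dict]:
    """Turn the logged steps into LLM-readable chat messages."""
    messages = []
    for step in memory:
        if isinstance(step, SystemPromptStep):
            messages.append({"role": "system", "content": step.system_prompt})
        elif isinstance(step, TaskStep):
            messages.append({"role": "user", "content": step.task})
        elif isinstance(step, ActionStep):
            messages.append({"role": "assistant", "content": step.code})
            messages.append({"role": "user", "content": "Observation: " + step.observations})
    return messages


def fake_model(messages: List[dict]) -> str:
    """Hypothetical scripted model: computes on the first turn, answers on the second."""
    if not any(m["role"] == "assistant" for m in messages):
        return "total = 30 + 60 + 45 + 45\nprint(total)"
    return "final_answer(total)"


def run(task: str, max_steps: int = 5, step_callbacks: Optional[List[Callable]] = None):
    memory = [SystemPromptStep("Solve the task by writing Python code."), TaskStep(task)]

    def final_answer(value):
        raise FinalAnswer(value)

    state = {"final_answer": final_answer}  # variables persist across steps
    for step_number in range(1, max_steps + 1):
        # Build messages from memory and ask the model; its completion is the action code
        code = fake_model(write_memory_to_messages(memory))
        step = ActionStep(step_number, code)
        output, answer, done = io.StringIO(), None, False
        try:
            with contextlib.redirect_stdout(output):  # capture prints as observations
                exec(code, state)                     # execute the action
        except FinalAnswer as result:
            answer, done = result.value, True
        step.observations = output.getvalue().strip()
        memory.append(step)                           # log the ActionStep
        for callback in step_callbacks or []:
            callback(step)                            # end-of-step callbacks
        if done:
            return answer
    raise RuntimeError("max_steps reached without a final answer")


logged_steps = []
answer = run("How many minutes of preparation are needed?",
             step_callbacks=[lambda step: logged_steps.append(step.step_number)])
print(answer, logged_steps)  # 180 [1, 2]
```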

In [3]:
# Install the smolagents package
# pip install smolagents

import numpy as np
import time
import datetime

from huggingface_hub import login
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, tool

In [None]:
# Connect to the HF Serverless Inference API
with open("../hf_token.txt", "r") as f:
    hf_token = f.readline().strip()  # drop the trailing newline

login(token = hf_token)

We'll start by creating an agent capable of searching the web using DuckDuckGo. It uses the default model of *HfApiModel*, *Qwen/Qwen2.5-Coder-32B-Instruct*.

In [None]:
agent = CodeAgent(tools = [DuckDuckGoSearchTool()],
                  model = HfApiModel())
agent.run("Search for the best music recommendations for a party at the Wayne's mansion.")

The next step is to use the *@tool* decorator to define a custom function that acts as a tool.

In [None]:
@tool
def suggest_menu(occasion:str) -> str:
    """Suggests a menu based on the occasion.
    Args:
        occasion: the type of occasion for the party."""
    
    if occasion == "casual":
        return "Pizza, snacks, and drinks."
    elif occasion == "formal":
        return "3-course dinner with wine and dessert."
    elif occasion == "superhero":
        return "Buffet with high-energy and healthy food."
    else:
        return "Custom menu for the butler."

In [None]:
agent = CodeAgent(tools = [suggest_menu],
                  model = HfApiModel())

agent.run("Prepare a formal menu for the party.")

*smolagents* specializes in agents that write and execute Python code snippets, offering sandboxed execution for security.

Code execution has strict security measures: imports outside a predefined safe list are blocked by default. However, you can authorize additional imports by passing them as strings in *additional_authorized_imports*.
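The idea behind this restriction can be illustrated with Python's `ast` module: parse the generated code before running it and reject imports that are not on an allowlist. This is a simplified sketch, not smolagents' executor (a real sandbox has to restrict much more than imports); `SAFE_IMPORTS` is a small hypothetical allowlist, not the one the library uses, and `find_unauthorized_imports` is a hypothetical helper.

```python
import ast
from typing import Iterable, List

# Hypothetical baseline allowlist, not the list smolagents uses
SAFE_IMPORTS = {"math", "random", "re", "statistics", "time"}


def find_unauthorized_imports(code: str, additional_authorized_imports: Iterable[str] = ()) -> List[str]:
    """Return top-level modules imported by `code` that are not on the allowlist."""
    allowed = SAFE_IMPORTS | set(additional_authorized_imports)
    blocked = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            root = module.split(".")[0]  # "os.path" -> "os"
            if root not in allowed:
                blocked.append(root)
    return blocked


snippet = "import pandas\nfrom os import path\nimport math"
print(find_unauthorized_imports(snippet))              # ['pandas', 'os']
print(find_unauthorized_imports(snippet, ["pandas"]))  # ['os']
```

Authorizing an extra module only widens the allowlist; everything else stays blocked.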

In [None]:
agent = CodeAgent(tools = [DuckDuckGoSearchTool(), suggest_menu],
                  model = HfApiModel(),
                  additional_authorized_imports = ["datetime"])

agent.run("""Alfred needs to prepare for the party. Here are the tasks:
          1. Prepare the drinks - 30 minutes
          2. Decorate the mansion - 60 minutes
          3. Set up the menu - 45 minutes
          4. Prepare the music and playlist - 45 minutes
          
          If we start right now, at what time will the party be ready?""")
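For reference, the answer boils down to date arithmetic: the four tasks take 30 + 60 + 45 + 45 = 180 minutes. The snippet below sketches the kind of code the agent is expected to write, assuming the tasks run one after another (the prompt does not say whether they can overlap). The dictionary keys are illustrative names.

```python
import datetime

# Task durations in minutes, taken from the prompt above
task_minutes = {"drinks": 30, "decorations": 60, "menu": 45, "music": 45}
total = datetime.timedelta(minutes=sum(task_minutes.values()))

start = datetime.datetime.now()
ready_at = start + total
print(f"Total prep time: {total}")  # Total prep time: 3:00:00
print(f"Party ready at: {ready_at:%H:%M}")
```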

## Pushing the Agent to the Hugging Face Hub

In [None]:
# Pushing the code to the Hub
hf_username = "germanebr"
agent.push_to_hub(f'{hf_username}/AlfredAgent')

In [None]:
# Download an agent from a HF Hub repo
alfred_agent = agent.from_hub(f"{hf_username}/AlfredAgent",
                              trust_remote_code = True)  # required to load an agent's code from the Hub
alfred_agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme.")

## Full Alfred Agent

The following code defines a more complete agent that can perform multiple tasks beyond the ones shown above. Most of the modules not yet introduced are discussed later.

In [None]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, HfApiModel, Tool, tool, VisitWebpageTool

In [None]:
@tool
def suggest_menu(occasion:str) -> str:
    """Suggests a menu based on the occasion.
    Args:
        occasion: The type of occasion for the party."""
    
    if occasion == "casual":
        return "Pizza, snacks, and drinks."
    elif occasion == "formal":
        return "3-course dinner with wine and dessert."
    elif occasion == "superhero":
        return "Buffet with high-energy and healthy food."
    else:
        return "Custom menu from the butler."

In [None]:
@tool
def catering_service_tool(query:str) -> str:
    """This tool returns the highest-rated catering service in Gotham City.
    Args:
        query: A search term for finding catering services."""
    
    # List of catering services and their ratings
    services = {"Gotham Catering Co.": 4.9,
                "Wayne Manor Catering": 4.8,
                "Gotham City Events": 4.7}
    
    # Find the highest rated catering service (simulating search query filtering)
    best_service = max(services,
                       key = services.get)
    return best_service

In [None]:
class SuperheroPartyThemeTool(Tool):
    name = "superhero_party_theme_generator"
    
    description = """This tool suggests creative superhero-themed party ideas based on a category.
    It returns a unique party theme idea."""

    inputs = {"category": {"type": "string",
                           "description": "The type of superhero party (e.g., 'classic heroes', 'villain masquerade', 'futuristic Gotham')."}}
    output_type = "string"

    def forward(self, category:str):
        # Keys are lowercase because the lookup below lowercases the category
        themes = {"classic heroes": "Justice League Gala: Guests come dressed as their favorite DC heroes with themed cocktails like 'The Kryptonite Punch'.",
                  "villain masquerade": "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains.",
                  "futuristic gotham": "Neo-Gotham Night: A cyberpunk-style party inspired by Batman Beyond, with neon decorations and futuristic gadgets."}
        
        return themes.get(category.lower(), "Themed party idea not found. Try 'classic heroes', 'villain masquerade', or 'futuristic Gotham'.")

In [None]:
agent = CodeAgent(tools = [DuckDuckGoSearchTool(),
                           VisitWebpageTool(),
                           suggest_menu,
                           catering_service_tool,
                           SuperheroPartyThemeTool()],
                  model = HfApiModel(),
                  max_steps = 10,
                  verbosity_level = 2)

agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme")

## Inspecting the agent with OpenTelemetry and Langfuse

*smolagents* can use the **OpenTelemetry** standard to instrument agent runs, allowing seamless inspection and logging. In addition, by using **Langfuse** together with the **SmolagentsInstrumentor**, we can track and analyze the agent's behavior.

In [None]:
# Install the dependencies
# pip install opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents

In [None]:
import os
import base64
import json

from smolagents import CodeAgent, HfApiModel

from opentelemetry.sdk.trace import TracerProvider

from openinference.instrumentation.smolagents import SmolagentsInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

In [None]:
with open("../langfuse_token.json", "r") as f:
    keys = json.load(f)

LANGFUSE_PUBLIC_KEY = keys["public_key"]
LANGFUSE_SECRET_KEY = keys["secret_key"]
LANGFUSE_AUTH = base64.b64encode(f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}".encode()).decode()

# os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel" # EU data region
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://us.cloud.langfuse.com/api/public/otel" # US data region
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"

In [None]:
trace_provider = TracerProvider()
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))

SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)

In [None]:
agent = CodeAgent(tools=[], model=HfApiModel())
alfred_agent = agent.from_hub('sergiopaniego/AlfredAgent', trust_remote_code=True)
alfred_agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme")

# Tool Calling Agents

Tool Calling Agents follow the same multi-step workflow as Code Agents; the only difference is that Tool Calling Agents **generate JSON objects that specify tool names and arguments** instead of writing code. The system then parses the JSON instructions to execute the appropriate tools.

In [None]:
from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel

As with the Code Agent, we pass tools to the Tool Calling Agent using the same structure, but it now generates JSON tool calls instead of code.

In [None]:
agent = ToolCallingAgent(tools = [DuckDuckGoSearchTool()],
                         model = HfApiModel())
agent.run("Search for the best music recommendations for a party at the Wayne's Mansion.")

# Tools

As mentioned before, the tools that agents use to perform actions are treated as **functions that an LLM can call within the agent system**. A tool definition usually consists of the following elements:

* **Name.** What the tool is called
* **Tool description.** What the tool does
* **Input types and descriptions.** What arguments the tool accepts
* **Output type.** What the tool returns

In the case of the *smolagents* framework, tools can be defined in two ways: with the **@tool decorator** or **creating a Tool subclass**.

## The @tool decorator

This is the recommended approach when working with simple tools. The tool definition needs to cover the following:

* A **clear and descriptive function name** that helps the LLM understand its purpose
* **Type hints for both inputs and outputs** to ensure proper usage
* A **detailed description that includes the tool's Args**
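To see why these three requirements matter, here is a toy function that builds a tool schema from a function's name, type hints, and `Args:` docstring section. It is a simplified illustration of the kind of metadata `@tool` needs, not smolagents' implementation; `describe_tool`, the type mapping, and `find_best_caterer` are hypothetical.

```python
import inspect
import re
from typing import get_type_hints

# Hypothetical mapping from Python types to schema type names
PY_TO_SCHEMA = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a tool schema from the function name, type hints, and docstring."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    summary, _, args_block = doc.partition("Args:")
    # Lines of the form "name: description" inside the Args section
    arg_docs = dict(re.findall(r"^\s*(\w+):[ \t]*(.+)$", args_block, flags=re.MULTILINE))
    inputs = {}
    for name in inspect.signature(func).parameters:
        if name not in arg_docs:
            raise ValueError(f"Missing description for argument '{name}'")
        inputs[name] = {"type": PY_TO_SCHEMA.get(hints.get(name), "any"),
                        "description": arg_docs[name].strip()}
    return {"name": func.__name__,
            "description": summary.strip(),
            "inputs": inputs,
            "output_type": PY_TO_SCHEMA.get(hints.get("return"), "any")}


def find_best_caterer(query: str) -> str:
    """Returns the highest-rated catering service in Gotham City.
    Args:
        query: a search term for finding catering services"""
    return "Gotham Catering Co."


schema = describe_tool(find_best_caterer)
print(schema)
```

A missing type hint degrades to `"any"`, and a missing argument description raises an error, which is why the name, hints, and `Args:` section all need to be present.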

In [1]:
from smolagents import CodeAgent, HfApiModel, tool

In [2]:
@tool
def catering_service_tool(query:str) -> str:
    """This tool returns the highest-rated catering service in Gotham City.
    Args:
        query: a search term for finding catering services"""
    
    services = {"Gotham Catering Co.": 4.9,
                "Wayne Manor Catering": 4.8,
                "Gotham City Events": 4.7}
    
    best_service = max(services,
                       key = services.get)
    return best_service

In [None]:
agent = CodeAgent(tools = [catering_service_tool],
                  model = HfApiModel())
agent.run("Can you give me the name of the highest-rated catering service in Gotham City?")

## The Tool Subclass

In the case where the tool is more complex, it is better to create a class instead of a single function. The Tool class will wrap the function with metadata that helps the LLM understand how to use it effectively.

A Tool subclass needs to define the following attributes and method:

* **name**: The tool's name
* **description**: A description used to populate the agent's system prompt
* **inputs**: A dictionary with keys *type* and *description* for every managed input
* **output_type**: Specifies the expected output type
* **forward**: The method containing the inference logic to execute
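The attributes above have to agree with each other, most importantly the `inputs` keys and the arguments of `forward`. The sketch below uses a hypothetical minimal base class (`ToyTool`, not smolagents' `Tool`) to show that kind of consistency check; `GreetingTool` and `BrokenTool` are illustrative.

```python
import inspect


class ToyTool:
    """Hypothetical minimal stand-in for a Tool base class (not smolagents' Tool)."""
    name: str
    description: str
    inputs: dict
    output_type: str

    def __init__(self):
        # forward() is bound here, so its signature no longer includes `self`
        params = list(inspect.signature(self.forward).parameters)
        if set(params) != set(self.inputs):
            raise ValueError(f"forward() args {params} do not match inputs {list(self.inputs)}")
        for key, spec in self.inputs.items():
            if "type" not in spec or "description" not in spec:
                raise ValueError(f"Input '{key}' needs 'type' and 'description'")

    def __call__(self, **kwargs):
        return self.forward(**kwargs)

    def forward(self, **kwargs):
        raise NotImplementedError


class GreetingTool(ToyTool):
    name = "greeter"
    description = "Greets a guest by name."
    inputs = {"guest": {"type": "string", "description": "The guest's name."}}
    output_type = "string"

    def forward(self, guest: str) -> str:
        return f"Welcome to Wayne Manor, {guest}!"


class BrokenTool(GreetingTool):
    inputs = {"name": {"type": "string", "description": "Does not match forward()."}}


print(GreetingTool()(guest="Bruce"))  # Welcome to Wayne Manor, Bruce!

try:
    BrokenTool()
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```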

In [3]:
from smolagents import Tool, CodeAgent, HfApiModel

In [None]:
class SuperheroPartyThemeTool(Tool):
    name = "superhero_party_theme_generator"
    description = """This tool suggests creative superhero-themed party ideas based on a category.
    It returns a unique party theme idea."""

    inputs = {"category": {"type": "string",
                           "description": "The type of superhero party (e.g., 'classic heroes', 'villain masquerade', 'futuristic Gotham')."}}
    output_type = "string"

    def forward(self, category:str):
        themes = {"classic heroes": "Justice League Gala: Guests come dressed as their favorite DC heroes with themed cocktails like 'The Kryptonite Punch'.",
                  "villain masquerade": "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains.",
                  "futuristic gotham": "Neo-Gotham Night: A cyberpunk-style party inspired by Batman Beyond, with neon decorations and futuristic gadgets."}  # lowercase key to match category.lower()
        
        return themes.get(category.lower(), "Themed party idea not found. Try 'classic heroes', 'villain masquerade', or 'futuristic Gotham'.")

In [None]:
party_theme_tool = SuperheroPartyThemeTool()
agent = CodeAgent(tools = [party_theme_tool],
                  model = HfApiModel())

agent.run("What would be a good superhero party idea for a 'villain masquerade' theme?")

## Default Toolbox

The *smolagents* framework comes with some pre-built tools that can be injected directly into any agent:

* **PythonInterpreterTool**
* **FinalAnswerTool**
* **UserInputTool**
* **DuckDuckGoSearchTool**
* **GoogleSearchTool**
* **VisitWebpageTool**

## Sharing and Importing Tools

Another great feature of the framework is that it allows you to **share custom tools on the Hugging Face Hub and import them**, and even to connect with **HF Spaces** and **LangChain tools**.

This is how to share a custom tool on the HF Hub:

In [None]:
hf_username = "germanebr"

with open("../hf_token.txt", "r") as f:
    hf_token = f.readline().strip()  # drop the trailing newline

party_theme_tool.push_to_hub(f"{hf_username}/party_theme_tool",  # f-string so the username is filled in
                             token = hf_token)

And this is how to import a tool from the HF Hub:

In [4]:
from smolagents import load_tool, CodeAgent, HfApiModel

In [None]:
image_generation_tool = load_tool("m-ric/text-to-image",
                                  trust_remote_code = True)

agent = CodeAgent(tools = [image_generation_tool],
                  model = HfApiModel())
agent.run("Generate an image of a luxurious superhero-themed party at Wayne Manor with made-up superheros.")

## Importing a Hugging Face Space as a Tool

You can also import a Hugging Face Space as a tool using *Tool.from_space()*. This makes it possible to integrate thousands of community Spaces, for tasks ranging from image generation to data analysis.

The tool connects to the Space's Gradio backend using *gradio_client*, so make sure to install it via pip if you don’t have it already.

In [5]:
from smolagents import CodeAgent, HfApiModel, Tool

In [None]:
image_generation_tool = Tool.from_space("black-forest-labs/FLUX.1-schnell",
                                        name = "image_generator",
                                        description = "Generate an image from a prompt")

model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")

agent = CodeAgent(tools = [image_generation_tool],
                  model = model)

agent.run("Improve this prompt, then generate an image of it.",
          additional_args = {'user_prompt': 'A grand superhero-themed party at Wayne Manor, with Alfred overseeing a luxurious gala'})

## Importing a LangChain Tool

It is also possible to reuse LangChain tools in the *smolagents* workflow using the *Tool.from_langchain()* method.

In [None]:
from langchain.agents import load_tools
from smolagents import CodeAgent, HfApiModel, Tool

In [None]:
search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])

agent = CodeAgent(tools = [search_tool],
                  model = model)

agent.run("Search for luxury entertainment ideas for a superhero-themed event, such as live performances and interactive experiences.")

# Retrieval Agents

Retrieval-Augmented Generation (RAG) systems combine data retrieval with generation models to provide context-aware responses. Agentic RAG extends traditional RAG systems by **combining autonomous agents with dynamic knowledge retrieval**.

While traditional RAG systems use an LLM to answer queries based on retrieved data, agentic RAG **enables intelligent control of both retrieval and generation processes**, improving efficiency and accuracy.

Traditional RAG systems face key limitations, such as **relying on a single retrieval step** and focusing on direct semantic similarity with the user’s query, which may overlook relevant information.

Agentic RAG addresses these issues by allowing the agent to autonomously formulate search queries, critique retrieved results, and conduct multiple retrieval steps for a more tailored and comprehensive output.
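A toy sketch of the difference: a single retrieval step runs one query, while an agentic loop can issue reformulated queries and keep retrieving until its results cover what it needs. Everything here is hypothetical: the keyword-overlap `retrieve` function, the three-document corpus, and the fixed list of queries that stands in for the reformulations an LLM would write.

```python
from typing import List, Set

# Hypothetical mini knowledge base
CORPUS = {
    "doc1": "superhero masquerade ball with velvet curtains and gold decor",
    "doc2": "hire a DJ to play themed music for the party",
    "doc3": "catering menu with dishes named after superheroes",
}


def retrieve(query: str, k: int = 1) -> List[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    words = set(query.lower().split())

    def overlap(doc_id: str) -> int:
        return len(words & set(CORPUS[doc_id].lower().split()))

    ranked = sorted(CORPUS, key=overlap, reverse=True)
    return [doc_id for doc_id in ranked[:k] if overlap(doc_id) > 0]


def agentic_retrieve(queries: List[str], needed: int) -> Set[str]:
    """Run several retrieval steps, stopping once enough distinct documents are found."""
    found: Set[str] = set()
    for query in queries:  # each query stands in for an LLM-written reformulation
        found.update(retrieve(query))
        if len(found) >= needed:  # crude stand-in for the agent critiquing its results
            break
    return found


single_step = retrieve("superhero party ideas")
multi_step = agentic_retrieve(["superhero decor ideas", "party music DJ", "catering menu"], needed=3)
print(single_step, sorted(multi_step))  # ['doc1'] ['doc1', 'doc2', 'doc3']
```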

## Basic Retrieval with DuckDuckGo

This sample uses a web search tool to retrieve information. In general, the agent follows these steps when receiving a request:

1. **Analyze the request.** The agent identifies the key elements of the query
2. **Perform retrieval.** The agent leverages DuckDuckGo to search for the most relevant and up-to-date information
3. **Synthesize information.** After gathering the results, the agent processes them into a single, cohesive response
4. **Store for future reference.** The agent stores the retrieved information for easy access when planning future, similar requests

In [6]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

In [None]:
search_tool = DuckDuckGoSearchTool()

model = HfApiModel()

agent = CodeAgent(model = model,
                  tools = [search_tool])
agent.run("Search for luxury superhero-themed party ideas, including decorations, entertainment, and catering.")

## Custom Knowledge Base Tool

For more complex and specialized tasks, it's more useful to connect to a custom knowledge base that the agent can search to find the most relevant information for the user's request.

The following example uses a tool that retrieves party planning ideas from a custom knowledge base. It uses a *BM25 retriever* (a keyword-based ranking function rather than embedding-based semantic search) to search the knowledge base and return the top results, and a *RecursiveCharacterTextSplitter* to split the documents into smaller chunks for more efficient search.
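BM25 scores a document by its term overlap with the query, weighting each term by how rare it is across the collection (IDF) and normalizing by document length. The following is a minimal pure-Python sketch of Okapi BM25 with the common defaults `k1=1.5` and `b=0.75` and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))`. It is for intuition only and is not the implementation behind `BM25Retriever`; tokenization here is a naive lowercase whitespace split.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenization
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["Hire a professional DJ who can play themed music",
        "Serve dishes named after superheroes",
        "Decorate with superhero logos and projections of Gotham"]
scores = bm25_scores("themed music", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # Hire a professional DJ who can play themed music
```

Because the score depends only on shared terms, a query phrased with synonyms the documents never use would score zero, which is the main trade-off against embedding-based retrieval.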

In [None]:
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from smolagents import Tool
from langchain_community.retrievers import BM25Retriever
from smolagents import CodeAgent, HfApiModel

In [None]:
class PartyPlanningRetrieverTool(Tool):
    name = "party_planning_retriever"
    description = "Uses semantic search to retrieve relevant party planning ideas for Alfred’s superhero-themed party at Wayne Manor."
    inputs = {"query": {"type": "string",
                        "description": "The query to perform. This should be a query related to party planning or superhero themes."}}
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        self.retriever = BM25Retriever.from_documents(docs,
                                                      k=5)  # Retrieve the top 5 documents

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.retriever.invoke(query)
        return "\nRetrieved ideas:\n" + "".join([f"\n\n===== Idea {str(i)} =====\n" + doc.page_content
                                                 for i, doc in enumerate(docs)])

In [None]:
# Simulate a knowledge base about party planning
party_ideas = [
    {"text": "A superhero-themed masquerade ball with luxury decor, including gold accents and velvet curtains.", "source": "Party Ideas 1"},
    {"text": "Hire a professional DJ who can play themed music for superheroes like Batman and Wonder Woman.", "source": "Entertainment Ideas"},
    {"text": "For catering, serve dishes named after superheroes, like 'The Hulk's Green Smoothie' and 'Iron Man's Power Steak.'", "source": "Catering Ideas"},
    {"text": "Decorate with iconic superhero logos and projections of Gotham and other superhero cities around the venue.", "source": "Decoration Ideas"},
    {"text": "Interactive experiences with VR where guests can engage in superhero simulations or compete in themed games.", "source": "Entertainment Ideas"}
]

source_docs = [Document(page_content=doc["text"], metadata={"source": doc["source"]})
               for doc in party_ideas]

In [None]:
# Split the documents into smaller chunks for more efficient search
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500,
                                               chunk_overlap = 50,
                                               add_start_index = True,
                                               strip_whitespace = True,
                                               separators = ["\n\n", "\n", ".", " ", ""])
docs_processed = text_splitter.split_documents(source_docs)

In [None]:
# Create the retriever tool
party_planning_retriever = PartyPlanningRetrieverTool(docs_processed)

# Initialize the agent
agent = CodeAgent(tools=[party_planning_retriever], model=HfApiModel())
agent.run("Find ideas for a luxury superhero-themed party, including entertainment, catering, and decoration options.")

# Multi-Agent Systems

Some use cases are too complex for a single agent to handle because of the amount of data processing and reasoning they require.

Multi-agent systems enable **specialized agents to collaborate on complex tasks**, improving modularity, scalability, and robustness. Instead of relying on a single agent, tasks are distributed among agents with distinct capabilities.

In [None]:
# Install the necessary packages
# pip install 'smolagents[litellm]' matplotlib geopandas shapely kaleido

The first step is to create a tool that computes the cargo plane transfer time.

In [None]:
import math
from typing import Optional, Tuple

from smolagents import tool

In [None]:
@tool
def calculate_cargo_travel_time(
    origin_coords: Tuple[float, float],
    destination_coords: Tuple[float, float],
    cruising_speed_kmh: Optional[float] = 750.0) -> float:  # Average speed for cargo planes
    """Calculate the travel time for a cargo plane between two points on Earth using great-circle distance.

    Args:
        origin_coords: Tuple of (latitude, longitude) for the starting point
        destination_coords: Tuple of (latitude, longitude) for the destination
        cruising_speed_kmh: Optional cruising speed in km/h (defaults to 750 km/h for typical cargo planes)

    Returns:
        float: The estimated travel time in hours

    Example:
        >>> # Chicago (41.8781° N, 87.6298° W) to Sydney (33.8688° S, 151.2093° E)
        >>> result = calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093))"""

    def to_radians(degrees: float) -> float:
        return degrees * (math.pi / 180)

    # Extract coordinates
    lat1, lon1 = map(to_radians, origin_coords)
    lat2, lon2 = map(to_radians, destination_coords)

    # Earth's radius in kilometers
    EARTH_RADIUS_KM = 6371.0

    # Calculate great-circle distance using the haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = (math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    c = 2 * math.asin(math.sqrt(a))
    distance = EARTH_RADIUS_KM * c

    # Add 10% to account for non-direct routes and air traffic controls
    actual_distance = distance * 1.1

    # Calculate flight time
    # Add 1 hour for takeoff and landing procedures
    flight_time = (actual_distance / cruising_speed_kmh) + 1.0

    # Format the results
    return round(flight_time, 2)

The model provider will be **Together AI**, one of the new inference providers on HF Hub.

The *GoogleSearchTool* relies on a search API such as **Serper**, which requires an API key. As an alternative, the *DuckDuckGoSearchTool* works without a key, but it is rate-limited. The cells below use DuckDuckGo and leave the Serper option commented out.

In [None]:
import os
from PIL import Image
from smolagents import CodeAgent, DuckDuckGoSearchTool, GoogleSearchTool, HfApiModel, VisitWebpageTool

In [None]:
model = HfApiModel(model_id = "Qwen/Qwen2.5-Coder-32B-Instruct",
                   provider = "together")

In [None]:
task = """Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W), and return them to me as a pandas dataframe.
Also give me some supercar factories with the same cargo plane transfer time."""

In [None]:
agent = CodeAgent(model = model,
                  tools = [DuckDuckGoSearchTool(), VisitWebpageTool(), calculate_cargo_travel_time], #GoogleSearchTool("serper")
                  additional_authorized_imports=["pandas"],
                  max_steps=20)
agent.run(task)

We can use additional planning steps to allow the agent to think ahead and plan its next steps, improving the quality of the response.
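The cell below sets `planning_interval = 4`, so a planning step is interleaved periodically rather than before every action. A sketch of the resulting schedule, assuming (as in recent smolagents versions, though this detail could change) that planning runs at step 1 and then every `planning_interval` steps; `planning_steps` is a hypothetical helper.

```python
from typing import List


def planning_steps(max_steps: int, planning_interval: int) -> List[int]:
    """Step numbers (1-based) at which a planning step would run under the assumed rule."""
    return [step for step in range(1, max_steps + 1)
            if step == 1 or (step - 1) % planning_interval == 0]


print(planning_steps(max_steps=20, planning_interval=4))  # [1, 5, 9, 13, 17]
```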

In [None]:
agent.planning_interval = 4

detailed_report = agent.run(f"""You're an expert analyst. You make comprehensive reports after visiting many websites.
Don't hesitate to search for many queries at once in a for loop.
For each data point that you find, visit the source url to confirm numbers.

{task}""")

print(detailed_report)

## Splitting the task between two agents

Multi-agent structures allow memories to be separated between different sub-tasks, with two great benefits:

* Each agent is more focused on its core task, thus more performant
* Separating memories reduces the count of input tokens at each step, thus reducing latency and cost.
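A back-of-the-envelope calculation shows why. All numbers below are hypothetical: a 500-token base prompt, every step resending the full memory as input, each step of a single agent adding about 1,500 tokens, and the manager's own steps adding only about 300 tokens because it sees the web agent's short answers instead of raw web pages.

```python
def total_input_tokens(base: int, per_step: int, n_steps: int) -> int:
    """Input tokens summed over all steps when each step resends every previous step."""
    return sum(base + per_step * i for i in range(n_steps))


# One agent doing 10 browsing steps and 5 analysis steps in a single memory
single_agent = total_input_tokens(base=500, per_step=1500, n_steps=15)

# Split: the web agent browses in its own memory; the manager keeps only short results
web_agent_tokens = total_input_tokens(base=500, per_step=1500, n_steps=10)
manager_tokens = total_input_tokens(base=500, per_step=300, n_steps=5)

print(single_agent, web_agent_tokens + manager_tokens)  # 165000 78000
```

Under these assumptions the split roughly halves the input tokens; the real saving depends on how verbose each step is.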

The *web agent* handles the browsing. Giving it a *name* and a *description* lets the manager agent call it as a managed agent.

In [None]:
model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct",
                   provider = "together",
                   max_tokens = 8096)

web_agent = CodeAgent(model = model,
                      tools = [DuckDuckGoSearchTool(), #GoogleSearchTool(provider="serper"),
                               VisitWebpageTool(),
                               calculate_cargo_travel_time],
                      name = "web_agent",
                      description = "Browses the web to find information",
                      verbosity_level = 0,
                      max_steps = 10)

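The *manager agent* coordinates the work: it delegates web searches to the web agent, re-plans periodically (every `planning_interval` steps), and produces the final plot. Before its final answer is accepted, a multimodal model reviews the reasoning and the saved plot.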

In [7]:
from smolagents.utils import encode_image_base64, make_image_url
from smolagents import OpenAIServerModel

In [None]:
def check_reasoning_and_plot(final_answer, agent_memory):
    multimodal_model = OpenAIServerModel("gpt-4o", max_tokens = 8096)
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    
    # Trailing spaces keep the concatenated sentences separated
    prompt = (f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "To pass, a plot should be made using px.scatter_map and not any other method (scatter_map looks nicer).")
    
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {"url": make_image_url(encode_image_base64(image))},
                },
            ],
        }
    ]
    
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    if "FAIL" in output:
        raise Exception(output)
    return True

In [None]:
manager_agent = CodeAgent(model = HfApiModel("deepseek-ai/DeepSeek-R1",
                                             provider = "together",
                                             max_tokens = 8096),
                          tools = [calculate_cargo_travel_time],
                          managed_agents = [web_agent],
                          additional_authorized_imports = ["geopandas", "plotly", "shapely", "json", "pandas", "numpy"],
                          planning_interval = 5,
                          verbosity_level = 2,
                          final_answer_checks = [check_reasoning_and_plot],
                          max_steps = 15)
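With `planning_interval = 5` and `max_steps = 15`, the manager does not re-plan on every step. A small sketch of which steps trigger a planning phase, assuming the rule used by recent smolagents releases (plan on step 1, then whenever `(step - 1) % planning_interval == 0`); `planning_steps` is a hypothetical helper, and the rule may differ in your installed version:

```python
def planning_steps(planning_interval: int, max_steps: int) -> list[int]:
    """List the step numbers that would start with a planning phase (assumed rule)."""
    return [
        step
        for step in range(1, max_steps + 1)
        if step == 1 or (step - 1) % planning_interval == 0
    ]


print(planning_steps(5, 15))  # [1, 6, 11]
```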

In [None]:
manager_agent.visualize()

In [None]:
manager_agent.run("""Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W).
Also give me some supercar factories with the same cargo plane transfer time. You need at least 6 points in total.
Represent this as a spatial map of the world, with the locations represented as scatter points with a color that depends on the travel time, and save it to saved_map.png!

Here's an example of how to plot and return a map:
import plotly.express as px
df = px.data.carshare()
fig = px.scatter_map(df, lat="centroid_lat", lon="centroid_lon", text="name", color="peak_hour", size=100,
     color_continuous_scale=px.colors.sequential.Magma, size_max=15, zoom=1)
fig.show()
fig.write_image("saved_map.png")
final_answer(fig)

Never try to process strings using code: when you have a string to read, just print it and you'll see it.""")
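The task hinges on turning coordinates into a cargo-plane travel time. The notebook's own `calculate_cargo_travel_time` tool is defined elsewhere and is not reproduced here; the following is an independent sketch using the haversine great-circle distance. The 750 km/h cruise speed and the 1-hour takeoff/landing overhead are illustrative assumptions, and `cargo_hours` is a hypothetical name. Note that the prompt's "74.0060° W" becomes a negative longitude:

```python
import math

# Gotham coordinates from the task prompt (west longitudes are negative)
GOTHAM = (40.7128, -74.0060)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cargo_hours(lat: float, lon: float, cruise_kmh: float = 750.0, overhead_h: float = 1.0) -> float:
    """Hypothetical model: direct flight to Gotham at cruise speed plus fixed overhead."""
    return haversine_km(*GOTHAM, lat, lon) / cruise_kmh + overhead_h


# Example: London (51.5074° N, 0.1278° W)
print(round(cargo_hours(51.5074, -0.1278), 1))
```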

# Vision Agents

Use cases increasingly go beyond text processing, so it is important to give agents visual capabilities as well, through **Vision-Language Models (VLMs)**.

The following samples will use an agent that verifies the identity of people by searching for visual information about their appearance.

## Providing Images at the Start of the Agent's Execution

This approach passes images to the agent **at the start and stores them as *task_images*** alongside the task prompt. The agent then processes the images throughout its execution.

In [None]:
from PIL import Image
import requests
from io import BytesIO

from smolagents import CodeAgent, OpenAIServerModel

In [None]:
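image_urls = ["https://upload.wikimedia.org/wikipedia/commons/e/e8/The_Joker_at_Wax_Museum_Plus.jpg", # Joker image
              "https://upload.wikimedia.org/wikipedia/en/9/98/Joker_%28DC_Comics_character%29.jpg"] # Joker image

# Wikimedia may reject requests that lack a descriptive User-Agent; adjust this string to identify your client
headers = {"User-Agent": "smolagents-vision-demo/0.1"}

images = []
for url in image_urls:
    response = requests.get(url, headers = headers, timeout = 30)
    response.raise_for_status()  # fail early instead of handing an HTML error page to PIL
    image = Image.open(BytesIO(response.content)).convert("RGB")
    images.append(image)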

In [None]:
model = OpenAIServerModel(model_id = "gpt-4o")

# Instantiate the agent
agent = CodeAgent(tools = [],
                  model = model,
                  max_steps = 20,
                  verbosity_level = 2)

response = agent.run("""Describe the costume and makeup that the comic character in these photos is wearing and return the description.
                     Tell me if the guest is The Joker or Wonder Woman.""",
                     images = images)

## Providing Images with Dynamic Retrieval

Another option is to **retrieve images and information dynamically from external sources**. In this case, images are added to the agent's memory during execution.
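With dynamic retrieval, screenshots keep accumulating in memory as the agent browses, and every image costs tokens. A common mitigation is a callback that runs after each step, attaches the newest screenshot to the current step, and clears images from older steps. smolagents agents accept such functions through their `step_callbacks` argument; the sketch below avoids depending on the library by using a hypothetical `Step` dataclass as a stand-in for a memory step. Its `observations_images` field mirrors the name smolagents uses, but treat that as an assumption for your version:

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical stand-in for an agent memory step."""
    step_number: int
    observations_images: list = field(default_factory=list)


def attach_screenshot(steps: list, current: Step, screenshot, keep_last: int = 2) -> None:
    """Attach a new screenshot to `current` and clear images from older steps."""
    current.observations_images = [screenshot]
    for step in steps:
        # Only the last `keep_last` steps (including the current one) keep their images
        if step.step_number <= current.step_number - keep_last:
            step.observations_images = []


steps = [Step(i, [f"img{i}"]) for i in range(1, 5)]
current = Step(5)
steps.append(current)
attach_screenshot(steps, current, "img5")
print([s.observations_images for s in steps])
```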

In [8]:
# Install the necessary packages
# pip install "smolagents[all]" helium selenium python-dotenv

In [None]:
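from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from smolagents import tool

# These tools assume a global Selenium WebDriver named `driver`
# (e.g. the one returned by helium.start_chrome()) exists before they run.
@tool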
def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
    """Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
    Args:
        text: The text to search for
        nth_result: Which occurrence to jump to (default: 1)"""
    
    elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")

    if nth_result > len(elements):
        raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
    
    result = f"Found {len(elements)} matches for '{text}'."
    elem = elements[nth_result - 1]
    driver.execute_script("arguments[0].scrollIntoView(true);", elem)
    result += f" Focused on element {nth_result} of {len(elements)}"
    return result
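`search_item_ctrl_f` wraps `text` in single quotes inside the XPath expression, so a search such as `Gotham's` produces an invalid query. XPath 1.0 has no escape character, but a string containing both quote kinds can be built with `concat()`. A minimal sketch with a hypothetical `xpath_literal` helper:

```python
def xpath_literal(text: str) -> str:
    """Quote `text` as an XPath 1.0 string literal, even if it contains quotes."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds present: split on single quotes and rejoin them via concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


print(xpath_literal("Gotham's"))
```

The tool's query would then read `f"//*[contains(text(), {xpath_literal(text)})]"`.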

In [None]:
@tool
def go_back() -> None:
    """Goes back to previous page."""
    
    driver.back()

In [None]:
@tool
def close_popups() -> str:
    """Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows! This does not work on cookie consent banners."""
    
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()