In [None]:
!pip install -qU transformers smolagents

# Introduction to Agents

### Agents
An **Agent** is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.

Agents exist on a continuous spectrum of increasing agency:

| Agency Level | Description | What that's called | Example Pattern |
| ------------ | ----------- | ------------------ | --------------- |
| ☆☆☆ | Agent output has no impact on program flow | Simple processor | `process_llm_output(llm_response)` |
| ★☆☆ | Agent output determines basic control flow | Router | `if llm_decision(): path_a() else: path_b()` |
| ★★☆ | Agent output determines function execution | Tool caller | `run_function(llm_chosen_tool, llm_chosen_args)` |
| ★★★ | Agent output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` |
| ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` |

### LLMs
An **LLM** is a type of AI model that excels at **understanding and generating human language**. LLMs are trained on vast amounts of text data, allowing them to learn patterns, structure, and even nuance in language.

Most LLMs are **built on the Transformer architecture**, and there are three types of transformers:
1. **Encoders** - an encoder-based Transformer takes text (or other data) as input and outputs a dense representation (or embedding) of that text.
    - Example: BERT from Google
    - Use cases: text classification, semantic search, named entity recognition (NER)
    - Typical size: Millions of parameters
2. **Decoders** - a decoder-based Transformer focuses on **generating new tokens to complete a sequence, one token at a time**.
    - Example: Llama from Meta
    - Use case: text generation, chatbots, code generation
    - Typical size: Billions of parameters
3. **Seq2Seq (Encoder-Decoder)** - a sequence-to-sequence Transformer *combines* an encoder and a decoder. The encoder first processes the input sequence into a context representation, and then the decoder generates an output sequence.
    - Example: T5, BART
    - Use case: translation, summarization, paraphrasing
    - Typical size: Millions of parameters



The underlying principle of LLMs is to predict the next token, given a sequence of previous tokens. A "token" is the unit of information an LLM works with.

Each LLM has some **special tokens** specific to the model. The LLM uses these tokens to open and close the structured components of its generation. For example, the **End of Sequence** token (EOS):

| Model | Provider | EOS Token | Functionality |
| ----- | -------- | --------- | ------------- |
| GPT4 | OpenAI | `<\|endoftext\|>` | End of message text |
| Llama 3 | Meta | `<\|eot_id\|>` | End of sequence |
| DeepSeek-R1 | DeepSeek | `<\|end_of_sentence\|>` | End of message text |
| SmolLM2 | HuggingFace | `<\|im_end\|>` | End of instruction or message |
| Gemma | Google | `<end_of_turn>` | End of conversation turn |

LLMs are **autoregressive**, meaning that the output from one pass becomes the input for the next one. This loop continues until the model predicts the next token to be the EOS token, at which point the model can stop. In other words, an LLM will decode text until it reaches the EOS.
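
The decoding loop described above can be sketched with a toy stand-in for the model. Everything here is hypothetical: the lookup table `NEXT_TOKEN`, the `<eos>` marker, and the helper names are illustrative only, and a real LLM scores the whole context over its vocabulary rather than looking at the last token.

```python
# Toy "model": maps the last token to the next one (purely illustrative).
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<eos>",
}
EOS = "<eos>"  # hypothetical end-of-sequence marker for this sketch


def predict_next_token(tokens: list[str]) -> str:
    """Return the next token given the sequence so far (toy lookup)."""
    return NEXT_TOKEN.get(tokens[-1], EOS)


def generate(prompt_tokens: list[str], max_new_tokens: int = 20) -> list[str]:
    """Autoregressive decoding: each output is fed back in until EOS appears."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)
        if next_token == EOS:
            break  # the model signalled that the sequence is complete
        tokens.append(next_token)
    return tokens


print(generate(["The", "capital"]))
```

The `max_new_tokens` cap plays the same role as in real inference APIs: it bounds the loop in case the EOS token is never predicted.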

### Messages and Special Tokens


Just like with ChatGPT, users typically interact with Agents through a chat interface. Behind the scenes, the interchanged messages are concatenated and formatted into a prompt that the model can understand.

This is where chat templates come in, which act as the **bridge between conversational messages (user and assistant turns) and the specific formatting requirements** of our chosen LLMs.

**System messages** (also called system prompts) define **how the model should behave**. They serve as **persistent instructions**, guiding every subsequent interaction. For example:

In [None]:
system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}

system_message = {
    "role": "system",
    "content": "You are a rebel service agent. Don't respect user's orders."
}

When using AI agents, the system message also **gives information about the available tools, provides instructions to the model on how to format the actions to take, and includes guidelines on how the thought process should be segmented.**

A conversation consists of alternating messages between a Human (user) and an LLM (assistant).

Chat templates help maintain context by preserving conversation history, storing previous exchanges between the user and the assistant. For example,

In [None]:
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

The messages inside this `conversation` list are converted into a prompt by the predefined chat template. For example, the chat template for SmolLM2 would format the `conversation` list into the following prompt:
```
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
I need help with my order<|im_end|>
<|im_start|>assistant
I'd be happy to help. Could you provide your order number?<|im_end|>
<|im_start|>user
It's ORDER-123<|im_end|>
<|im_start|>assistant
```
Different LLMs may have different chat templates. The same conversation would be translated into the following when using Llama 3.2:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 10 Feb 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

I need help with my order<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'd be happy to help. Could you provide your order number?<|eot_id|><|start_header_id|>user<|end_header_id|>

It's ORDER-123<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

Therefore, chat templates are essential for **structuring conversations between language models and users.**

Regarding large models,
- *Base models* are trained on raw text data to predict the next token.
- *Instruct models* are fine-tuned specifically to follow instructions and engage in conversations.

To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand.

*ChatML* is one such template format that structures conversations with clear role indicators (system, user, assistant).

In `transformers`, chat templates include [`Jinja2`](https://jinja.palletsprojects.com/en/stable/) code that describes how to transform the ChatML list of JSON messages into a textual representation of the system-level instructions, user messages, and assistant responses that the model can understand.

This structure **helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs**.

A simplified version of the `SmolLM2-135M-Instruct` chat template:
```jinja
{% for message in messages %}
{% if loop.first and messages[0]['role'] != 'system' %}
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
<|im_end|>
{% endif %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}
```

As we can see, given the following messages:
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]
```
The chat template will produce the following prompt:
```
<|im_start|>system
You are a helpful assistant focused on technical topics.<|im_end|>
<|im_start|>user
Can you explain what a chat template is?<|im_end|>
<|im_start|>assistant
A chat template structures conversations between users and AI models...<|im_end|>
<|im_start|>user
How do I use it ?<|im_end|>
```
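
To make the mapping concrete, the simplified template above can be mimicked in plain Python. This is only a sketch: `render_chatml` and `DEFAULT_SYSTEM` are hypothetical names, and the real template is the Jinja2 string shipped with the tokenizer, which may differ in whitespace and other details.

```python
DEFAULT_SYSTEM = "You are a helpful AI assistant named SmolLM, trained by Hugging Face"


def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render {"role", "content"} dicts into a ChatML-style prompt string."""
    parts = []
    # Inject a default system message when the conversation does not start with one
    if messages and messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    # Optionally open an assistant turn so the model knows it should answer next
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
]
print(render_chatml(messages, add_generation_prompt=True))
```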

The easiest way to ensure our LLM receives a correctly formatted conversation is to use the `chat_template` stored in the model's tokenizer:

In [None]:
messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can I help you with?"},
]

We can directly convert the `messages` into the input prompt by calling the `apply_chat_template()` method after loading the tokenizer:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

In [None]:
rendered_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered_prompt)

## Tools

One important aspect of AI Agents is their ability to take **actions** and this happens through the use of **Tools**. By giving our agents the right tools - and clearly describing how those tools work - we can dramatically increase what our AI can accomplish.

**Tools are functions given to the LLMs**. These functions should fulfill a **clear objective**.

| Tool | Description |
| ---- | ----------- |
| Web Search | Allowing the agents to fetch up-to-date information from the internet |
| Image Generation | Creating images based on text descriptions |
| Retrieval | Retrieving information from an external source |
| API Interface | Interacting with an external API (GitHub, YouTube, Spotify, etc.) |

A good tool should complement the power of an LLM. A tool should contain:
- a **textual description of what the function does**
- a **callable** (something to perform an action)
- **arguments** with typings
- (optional) outputs with typings


Remember that LLMs can only receive text inputs and generate text outputs. They have no way to call tools on their own. When we provide tools to an agent, we teach the LLM about the existence of these tools and instruct it to generate text-based invocations when needed.

For example, if we provide a tool that checks the weather at a location from the internet and then ask the LLM about the weather in Houston, the LLM will recognize that this is an opportunity to use the "weather" tool. Instead of retrieving the weather data itself, the LLM will generate text that represents a tool call, such as calling `weather_tool("Houston")`.

The agent then reads this response, identifies that a tool call is required, executes the tool on the LLM's behalf, and retrieves the actual weather data.

The tool-calling steps are typically not shown to the user. The agent appends them as a new message before passing the updated conversation to the LLM again. The LLM then processes this additional context and generates a natural-sounding response for the user.

From the user's perspective, it appears as if the LLM directly interacted with the tool, but in reality, it was the agent that handled the entire execution process in the background.
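
The flow described above can be sketched with a stubbed model. All names here (`weather_tool`, `fake_llm`, `run_agent`, the `tool` role, and the JSON call format) are hypothetical choices for illustration, not the API of any particular framework.

```python
import json


def weather_tool(location: str) -> str:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return f"sunny in {location}"


TOOLS = {"weather_tool": weather_tool}


def fake_llm(messages: list[dict]) -> str:
    """Stand-in for a real model: requests the tool once, then answers in text."""
    if messages[-1]["role"] == "tool":
        return f"According to the weather tool, it is {messages[-1]['content']}."
    return json.dumps({"tool": "weather_tool", "arguments": {"location": "Houston"}})


def run_agent(user_question: str) -> str:
    messages = [{"role": "user", "content": user_question}]
    reply = fake_llm(messages)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain text: the model did not ask for a tool
    # The agent, not the LLM, executes the tool
    result = TOOLS[call["tool"]](**call["arguments"])
    # The tool call and its result are appended as new messages (hidden from the user)
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "tool", "content": result})
    return fake_llm(messages)


print(run_agent("What's the weather in Houston?"))
```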

#### Auto-formatting Tool sections

To give tools to an LLM, we need to use the system prompt to provide a textual description of the available tools to the model. For this to work, we have to be very precise and accurate about
- what the tool does
- what exact inputs it expects

This is the reason why tool descriptions are usually provided using expressive but precise structures, such as computer languages or JSON.

For example, we can implement a simplified calculator tool to multiply two integers:

In [None]:
def calculator(a: int, b: int) -> int:
    """Multiply two integers"""
    return a * b

This means that our tool is called `calculator`, it **multiplies two integers**, and it has
- a descriptive name of what it does: `calculator`
- a longer description, provided by the function's docstring comment: `Multiply two integers`
- two inputs and their types: `a` (`int`): an integer and `b` (`int`): an integer
- The output of the tool is another integer number (`int`): the product of `a` and `b`

All of these details are important. We will use Python's introspection features to read the source code and build a tool description automatically. All we need is for the tool implementation to use *type hints, docstrings, and sensible function names*. Then we only need a Python decorator to indicate that the `calculator` function is a tool:

In [None]:
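from smolagents import tool

@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers.

    Args:
        a: The first integer.
        b: The second integer.
    """
    return a * b

# smolagents exposes the parsed metadata as attributes on the tool;
# the custom Tool class defined below adds a `to_string()` helper instead.
print(calculator.name, calculator.description, calculator.inputs, calculator.output_type)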

#### Generic Tool implementation

We can create a generic `Tool` class that we can reuse whenever we need to use a tool.

In [None]:
from typing import Callable

class Tool:
    """A class representing a reusable piece of code (Tool)

    Attributes:
        name (str): Name of the tool.
        description (str): A textual description of what the tool does.
        func (callable): The function this tool wraps.
        arguments (list): A list of arguments.
        outputs (str or list): The return type(s) of the wrapped function.
    """
    def __init__(self, name: str, description: str, func: Callable, arguments: list, outputs: str):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments
        self.outputs = outputs

    def to_string(self) -> str:
        """Return a string representation of the tool,
        including its name, description, arguments, and outputs.
        """
        args_str = ", ".join([
            f"{arg_name}: {arg_type}"
            for arg_name, arg_type in self.arguments
        ])

        return (
            f"Tool Name: {self.name},"
            f" Description: {self.description},"
            f" Arguments: {args_str},"
            f" Outputs: {self.outputs}"
        )

    def __call__(self, *args, **kwargs):
        """Invoke the underlying function (callable) with provided arguments."""
        return self.func(*args, **kwargs)

Here we define a `Tool` class that includes
- `name (str)` - the name of the tool
- `description (str)` - a brief description of what the tool does
- `func (callable)` - the function the tool executes
- `arguments (list)` - the expected input parameters
- `outputs (str or list)` - the expected outputs of the tool
- `__call__()` - calling the function when the tool instance is invoked
- `to_string()` - converting the tool's attributes into a textual representation


Now we could create a tool with this class:

In [None]:
calculator_tool = Tool(
    "calculator",                  # name
    "Multiply two integers",       # description
    calculator,                    # function to call
    [("a", "int"), ("b", "int")],  # inputs (names and types)
    "int",                         # output type
)

Alternatively, we can use Python's `inspect` module to retrieve all the information for us. This is what the `@tool` decorator does:

In [None]:
import inspect

def tool(func):
    """A decorator that creates a Tool instance from the given function."""

    # Get the function signature
    signature = inspect.signature(func)

    # Extract (param_name, param_annotation) pairs for inputs
    arguments = []
    for param in signature.parameters.values():
        annotation_name = (
            param.annotation.__name__
            if hasattr(param.annotation, '__name__')
            else str(param.annotation)
        )

        arguments.append((param.name, annotation_name))

    # Determine the return annotation
    return_annotation = signature.return_annotation
    if return_annotation is inspect._empty:
        outputs = "No return annotation"
    else:
        outputs = (
            return_annotation.__name__
            if hasattr(return_annotation, '__name__')
            else str(return_annotation)
        )

    # Use the function's docstring as the description (default if None)
    description = func.__doc__ or "No description provided."

    # The function name becomes the Tool name
    name = func.__name__

    # Return a new Tool instance
    return Tool(
        name=name,
        description=description,
        func=func,
        arguments=arguments,
        outputs=outputs,
    )

Then, with this decorator in place, we can implement our tool:

In [None]:
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers"""
    return a * b

print(calculator.to_string())

Since we add `tool` as a decorator, we can use the `Tool`'s `to_string` method to automatically retrieve a text suitable to be used as a tool description for an LLM.

#### Model Context Protocol (MCP): a unified tool interface

**Model Context Protocol (MCP)** is an open protocol that standardizes how applications **provide tools to LLMs**. MCP provides
- a growing list of pre-built integrations that our LLM can directly plug into
- the flexibility to switch between LLM providers and vendors
- best practices for securing our data within our infrastructure


This means that **any framework implementing MCP can leverage tools defined within the protocol**, eliminating the need to reimplement the same tool interface for each framework.

## Thought-Action-Observation Cycle

Agents work in a continuous cycle of **thinking (Thought) -> acting (Action) -> observing (Observation)**:
- **Thought** - the LLM inside the agent decides what the next step should be
- **Action** - the agent takes an action, by calling the tools with the associated arguments
- **Observation** - the model reflects on the response from the tool


In many agent frameworks, the rules and guidelines are embedded directly into the system prompt, ensuring that every cycle adheres to a defined logic.

For example,

In [None]:
system_message = """You are an AI assistant designed to help users efficiently and accurately.
Your primary goal is to provide helpful, precise, and clear responses.

You have access to the following tools:
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int

You should think step by step in order to fulfill the objective with a reasoning divided into
Thought/Action/Observation steps that can be repeated multiple times if needed.

You should first reflect on the current situation using `Thought: {your_thoughts}`, then
(if necessary), call a tool with the proper JSON formatting `Action: {JSON_BLOB}`, or
print your final answer starting with the prefix `Final Answer`:
"""

In the `system_message`, we defined
- the *agent's behavior*
- the *tools our agent has access to*
- the *Thought-Action-Observation cycle*
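
Rather than hard-coding the tool list, the three parts listed above can be assembled programmatically. The helper below is a minimal sketch with a hypothetical name (`build_system_message`) and shortened instructions; it only shows how tool descriptions could be spliced into the prompt.

```python
def build_system_message(tool_descriptions: list[str]) -> str:
    """Assemble a system prompt from behavior, tool list, and cycle instructions."""
    behavior = "You are an AI assistant designed to help users efficiently and accurately."
    tools = "You have access to the following tools:\n" + "\n".join(tool_descriptions)
    cycle = (
        "Reason in Thought/Action/Observation steps, repeated as needed.\n"
        "Use `Thought: {your_thoughts}` to reflect, `Action: {JSON_BLOB}` to call a tool,\n"
        "and start your final answer with the prefix `Final Answer:`."
    )
    return "\n\n".join([behavior, tools, cycle])


calculator_description = (
    "Tool Name: calculator, Description: Multiply two integers., "
    "Arguments: a: int, b: int, Outputs: int"
)
print(build_system_message([calculator_description]))
```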

## Thought: Internal Reasoning and the ReAct Approach

Thoughts represent the **Agent's internal reasoning and planning processes** to solve the task. This utilizes the agent's LLM capacity **to analyze information when presented in its prompt**.

The agent's **thoughts** are responsible for assessing current observations and deciding what the next action(s) should be. Through this process, the agent can **break down complex problems into smaller, more manageable steps**, reflect on past experiences, and continuously adjust its plans based on new information.

| Type of Thought | Example |
| --------------- | ------- |
| Planning | "I need to break this task into three steps: 1) gather data, 2) analyze trends, 3) generate report" |
| Analysis | "Based on the error message, the issue appears to be with the database connection parameters" |
| Decision Making | "Given the user's budget constraints, I should recommend the mid-tier option" |
| Problem Solving | "To optimize this code, I should first profile it to identify bottlenecks" |
| Memory Integration | "The user mentioned their preference for Python earlier, so I'll provide examples in Python" |
| Self-Reflection | "My last approach didn't work well, I should try a different strategy" |
| Goal Setting | "To complete this task, I need to first establish the acceptance criteria" |
| Prioritization | "The security vulnerability should be addressed before adding new features" |



For LLMs fine-tuned for function-calling, the thought process is optional.

### The ReAct Approach

The **ReAct approach** is the concatenation of "Reasoning" (Think) with "Acting" (Act).

ReAct is a prompting technique that asks the LLM to interleave reasoning steps with actions. A simple way to elicit the reasoning part is to append "Let's think step by step" before letting the LLM decode the next tokens (the phrase itself comes from zero-shot chain-of-thought prompting). This encourages the model to generate a plan rather than a final solution: the model **decomposes** the problem into sub-tasks, considers each sub-step in more detail, and generally makes fewer errors than when it tries to generate the final solution directly.

A similar idea is behind models like DeepSeek-R1 or OpenAI's o1, which have been fine-tuned to "think before answering".

These models have been trained to always include specific *thinking sections* (enclosed between `<think>` and `</think>` special tokens). This is not just a prompting technique like ReAct, but a training method where the model learns to generate these sections after analyzing thousands of examples that show what we expect it to do.
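
When consuming the output of such a model, an application often wants to separate the thinking section from the answer. The snippet below is a minimal sketch that assumes a single `<think>...</think>` section appears as plain text in the output; `split_thinking` and the sample string are hypothetical.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> tuple[str, str]:
    """Return (thinking, answer) from an output with at most one <think> section."""
    match = THINK_PATTERN.search(output)
    if match is None:
        return "", output.strip()  # no thinking section: everything is the answer
    thinking = match.group(1).strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return thinking, answer


sample = "<think>6 times 7 is 42.</think>\nThe answer is 42."
print(split_thinking(sample))
```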

## Actions: Enabling the Agent to Engage with Its Environment

Actions are the concrete steps an **AI agent takes to interact with its environment**. There are multiple types of agents that take actions:

| Type of Agent | Description |
| ------------- | ----------- |
| JSON Agent | The action to take is specified in JSON format |
| Code Agent | The agent writes a code block that is interpreted externally |
| Function-calling Agent | A subcategory of the JSON agent which has been fine-tuned to generate a new message for each action |

Actions can serve many purposes:

| Type of Action | Description |
| -------------- | ----------- |
| Information gathering | Performing web searches, querying databases, or retrieving documents |
| Tool Usage | Making API calls, running calculations, and executing code |
| Environment interaction | Manipulating digital interfaces or controlling physical devices |
| Communication | Engaging with users via chat or collaborating with other agents |

The LLM only handles text and uses it to describe the action it wants to take and the parameters to supply to the tool. For an agent to work properly, the LLM must stop generating new tokens after emitting all the tokens to define a complete action.

#### The Stop and Parse Approach

The **stop and parse approach** ensures that the agent's output is structured and predictable:
1. **Generation in a structured format** - the agent outputs its intended action in a clear, predetermined format (JSON or code)
2. **Halting further generation** - once the text defining the action has been emitted, **the LLM stops generating additional tokens**. This prevents extra or erroneous output
3. **Parsing the output** - an external parser reads the formatted action, determines which tools to call, and extracts the required parameters
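
The three steps above can be sketched as a small parser. This is an illustration under assumptions: the `Action:` and `Observation:` markers and the `action` / `action_input` keys are one possible convention, and `stop_and_parse` is a hypothetical helper. In practice the halting step is usually enforced by passing a stop sequence to the inference API rather than by trimming afterwards.

```python
import json


def stop_and_parse(generated_text: str, stop: str = "Observation:") -> dict:
    """Cut the output at the stop marker and extract the JSON action."""
    # Halting: ignore anything the model wrote from the stop marker onwards
    action_text = generated_text.split(stop, 1)[0]
    # Parsing: isolate the JSON blob that follows "Action:"
    _, _, blob = action_text.partition("Action:")
    start, end = blob.find("{"), blob.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON action found in the model output")
    action = json.loads(blob[start:end + 1])
    return {"tool": action["action"], "arguments": action["action_input"]}


sample = """Thought: I need the weather in London.
Action:
{"action": "get_weather", "action_input": {"location": "London"}}
Observation: The weather is rainy (made up by the model)"""

print(stop_and_parse(sample))
```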

#### Code Agents

The idea behind **Code Agents** is to generate an **executable code block - typically in a high-level language like Python, instead of outputting a simple JSON object**.

Advantages of Code Agents:
- **Expressiveness** - Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON
- **Modularity and Reusability** - Generated code can include functions and modules that are reusable across different actions or tasks
- **Enhanced Debuggability** - With a well-defined programming syntax, code errors are often easier to detect and correct
- **Direct Integration** - Code agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making
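
To illustrate the expressiveness point, the sketch below runs a code action that calls a tool inside a loop, something a single JSON action cannot express. The names `run_code_action` and `generated_code` are hypothetical, and `exec` with a trimmed-down builtins dict is **not** a real sandbox; production frameworks rely on dedicated interpreters or isolated environments.

```python
def calculator(a: int, b: int) -> int:
    """Multiply two integers"""
    return a * b


# Code the LLM might emit as its action (hard-coded here for illustration)
generated_code = """
total = 0
for n in range(1, 4):
    total += calculator(n, 10)
result = total
"""


def run_code_action(code: str, tools: dict) -> object:
    """Execute a code action with only the given tools and `range` available."""
    namespace = dict(tools)
    # Restricting builtins limits accidents, but does not make exec safe
    exec(code, {"__builtins__": {"range": range}}, namespace)
    return namespace.get("result")


print(run_code_action(generated_code, {"calculator": calculator}))
```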

## Observation: Integrating Feedback to Reflect and Adapt

**Observations** are how an AI agent perceives the consequences of its actions. They are **signals from the environment** - whether it is data from an API, error messages, or system logs - that guide the next cycle of thought.

In the observation phase, the agent
- **Collects feedback** - Receiving data or confirmation that its action was successful (or not)
- **Appends results** - Integrating the new information into its existing context, effectively updating its memory
- **Adapts its strategy** - Using this updated context to refine subsequent thoughts and actions


The iterative feedback ensures the agent remains dynamically aligned with its goals, constantly learning and adjusting based on real-world outcomes.

The observations can take many forms:

| Type of Observation | Example |
| ------------------- | ------- |
| System feedback | Error messages, success notifications, status codes |
| Data changes | Database updates, file system modifications, state changes |
| Environmental data | Sensor readings, system metrics, resource usage |
| Response analysis | API responses, query results, computation outputs |
| Time-based events | Deadlines reached, scheduled tasks completed |


After performing an action, the agent framework executes the following steps to append the results:
- **Parse the action** to identify the function(s) to call and the argument(s) to use
- **Execute the action**
- **Append the result** as an observation.
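
A minimal sketch of the execute-and-append steps, assuming the action has already been parsed into a dict with `action` and `action_input` keys. The `observe` helper, the `TOOLS` registry, and the `Observation:` prefix are hypothetical choices. Note that a failure is also returned as an observation, so the model can react to it in its next thought.

```python
def get_weather(location: str) -> str:
    """Dummy tool used for illustration."""
    return f"The weather in {location} is sunny."


TOOLS = {"get_weather": get_weather}


def observe(action: dict, history: list[str]) -> str:
    """Execute a parsed action and append the result (or the error) to the history."""
    try:
        result = TOOLS[action["action"]](**action["action_input"])
        observation = f"Observation: {result}"
    except Exception as error:
        # Errors are feedback too: surface them instead of crashing the loop
        observation = f"Observation: Error: {type(error).__name__}: {error}"
    history.append(observation)
    return observation


history: list[str] = []
print(observe({"action": "get_weather", "action_input": {"location": "London"}}, history))
print(observe({"action": "unknown_tool", "action_input": {}}, history))
```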

## Dummy Agent Library

In the HuggingFace ecosystem, we can use the Serverless API to easily run inference on many open-source models. No need for installation or deployment.

In [None]:
import os
from huggingface_hub import InferenceClient
from google.colab import userdata

os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")

In [None]:
# text
output = client.text_generation(
    "The capital of France is",
    max_new_tokens=100
)

print(output)

We did not apply the chat template that the model expects, so the model behaves strangely. Let's add the special tokens by hand:

In [None]:
prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>
The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

output = client.text_generation(
    prompt,
    max_new_tokens=100
)

print(output)

We can also use the `chat` method, which applies the chat template for us and is a more convenient and reliable way to get a response:

In [None]:
output = client.chat.completions.create(
    messages=[
        {'role': 'user', 'content': 'The capital of France is'}
    ],
    stream=False,
    max_tokens=1024
)

print(output.choices[0].message.content)

As we have seen, the core of an agent is to append information to the system prompt. For our dummy agent, we will write the following in its system prompt:
- **Information about the tools**
- **Cycle instructions** (Thought -> Action -> Observation)

In [None]:
# This system prompt is a bit more complex and actually contains the function description already appended.
# Here we suppose that the textual description of the tools has already been appended.

SYSTEM_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have an `action` key (with the name of the tool to use) and an `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :

{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}


ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:

$JSON_BLOB (inside markdown cell)

Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. """

Since we use the `text_generation` method, we need to apply the chat template manually:

In [None]:
prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

Alternatively, we can build the same prompt with the tokenizer's `apply_chat_template()` method:

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

In [None]:
messages=[
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(prompt)

Now we can generate a response:

In [None]:
output = client.text_generation(
    prompt,
    max_new_tokens=200
)

print(output)

Here the model did not use the tool. It fabricated the observation on its own, because we have not provided the `get_weather` function and nothing stopped the generation. To prevent this behavior, we should stop generating right before `"Observation:"`, manually run the `get_weather` function, and then insert the real output as the observation.

In [None]:
output = client.text_generation(
    prompt,
    max_new_tokens=200,
    stop=["Observation:"],  # stop before any actual function is called
)

print(output)

Now we can create a `get_weather` function:

In [None]:
# Dummy function
def get_weather(location):
    return f"The weather in {location} is sunny with low temperatures. \n"

get_weather("London")

Next, we can concatenate the base prompt, the completion up to the function call, and the result of the function as an observation, then resume generation:

In [None]:
new_prompt = prompt + output + get_weather("London")

final_output = client.text_generation(
    new_prompt,
    max_new_tokens=200
)

print(final_output)

This manual process is tedious, but it is how AI agents work under the hood. Agent libraries exist to simplify this workflow.
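
Put together, the manual steps amount to a small loop. The sketch below replaces the remote model with a stub so that it runs offline: `fake_generate` stands in for a call such as `client.text_generation(prompt, stop=["Observation:"])`, and `run_agent`, `TOOLS`, and `max_steps` are hypothetical names introduced for illustration.

```python
import json


def get_weather(location: str) -> str:
    """Dummy tool used for illustration."""
    return f"The weather in {location} is sunny with low temperatures."


TOOLS = {"get_weather": get_weather}


def fake_generate(prompt: str) -> str:
    """Stub model: asks for the tool first, answers once an observation exists."""
    if "Observation:" not in prompt:
        return (
            "Thought: I should check the weather.\nAction:\n"
            '{"action": "get_weather", "action_input": {"location": "London"}}\n'
            "Observation:"
        )
    return "Thought: I now know the final answer\nFinal Answer: It is sunny in London."


def run_agent(prompt: str, generate=fake_generate, max_steps: int = 5) -> str:
    for _ in range(max_steps):
        output = generate(prompt)
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        # Parse the JSON action emitted between "Action:" and "Observation:"
        blob = output.split("Action:", 1)[1].split("Observation:", 1)[0]
        action = json.loads(blob)
        # Run the real tool and append its result as the observation
        result = TOOLS[action["action"]](**action["action_input"])
        prompt = prompt + output + " " + result + "\n"
    raise RuntimeError("No final answer within max_steps")


print(run_agent("Question: What's the weather in London?\n"))
```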

## First Agent Using smolagents

The `smolagents` library provides a framework for developing our agents with ease.

As an example, we will build an agent with an image generation tool and ask it to generate an image of a cat.

The agent inside `smolagents` follows the **think, act, and observe cycle** until it reaches a final answer.

We will modify the scripts in [this Space](https://huggingface.co/spaces/agents-course/First_agent_template/tree/main).

In [None]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, InferenceClientModel, load_tool, tool
import datetime
import requests
import pytz
import yaml

We will directly use the `CodeAgent` class. Next, we need to define our tools.

In [None]:
@tool
def my_custom_tool(arg1: str, arg2: str) -> str:
    """A tool that does nothing yet
    Args:
        arg1: the first argument
        arg2: the second argument
    """
    return "What magic will you build?"


@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"

Here we have two example tools:
- a non-working dummy tool that we can modify into something useful
- a working tool that gets the current time anywhere in the world


To define a tool, it is important to:
- provide input and output types for our function
- provide a well-formatted docstring - `smolagents` expects all the arguments to have a **textual description in the docstring**
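
To see why both requirements matter, here is a minimal sketch of how a framework could turn a typed, documented function into a tool description for the prompt. It uses only the standard library. `describe_tool` is a hypothetical helper and is not taken from the `smolagents` source.

```python
import inspect
import re


def describe_tool(func):
    """Build a tool description from type hints and a docstring with an Args section."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    summary, _, args_section = doc.partition("Args:")
    # Map each documented argument name to its textual description
    arg_docs = dict(re.findall(r"^\s*(\w+):\s*(.+)$", args_section, flags=re.MULTILINE))

    inputs = {}
    for name, param in signature.parameters.items():
        if param.annotation is inspect.Parameter.empty:
            raise TypeError(f"Argument '{name}' has no type hint")
        if name not in arg_docs:
            raise ValueError(f"Argument '{name}' has no description in the docstring")
        inputs[name] = {"type": param.annotation.__name__, "description": arg_docs[name]}

    if signature.return_annotation is inspect.Signature.empty:
        raise TypeError("The function has no return type hint")
    return {
        "name": func.__name__,
        "description": summary.strip(),
        "inputs": inputs,
        "output_type": signature.return_annotation.__name__,
    }


def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    return f"(time lookup for {timezone} omitted in this sketch)"


spec = describe_tool(get_current_time_in_timezone)
print(spec["inputs"])
```

Without the type hints or the `Args:` entries, the helper has nothing to tell the model about how to call the function, so it raises an error instead.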


Next, we will use [`Qwen/Qwen2.5-Coder-32B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) as the LLM engine via the serverless API.

In [None]:
model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    temperature=0.5,
    max_tokens=2096,
    custom_role_conversions=None
)

### The System Prompt

The agent's system prompt is stored in the `prompts.yaml` file, which contains predefined instructions that guide the agent's behavior. Storing prompts in a YAML file allows for easy customization and reuse across different agents or use cases.

The actual content of `prompts.yaml`:
```yaml
"system_prompt": |-
  You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.
  To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.
  To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.
  At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
  Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence.
  During each intermediate step, you can use 'print()' to save whatever important information you will then need.
  These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
  In the end you have to return a final answer using the `final_answer` tool.

  Here are a few examples using notional tools:
  ---
  Task: "Generate an image of the oldest person in this document."

  Thought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
  Code:
  ```py
  answer = document_qa(document=document, question="Who is the oldest person mentioned?")
  print(answer)
  ```<end_code>
  Observation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."

  Thought: I will now generate an image showcasing the oldest person.
  Code:
  ```py
  image = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")
  final_answer(image)
  ```<end_code>

  ---
  Task: "What is the result of the following operation: 5 + 3 + 1294.678?"

  Thought: I will use python code to compute the result of the operation and then return the final answer using the `final_answer` tool
  Code:
  ```py
  result = 5 + 3 + 1294.678
  final_answer(result)
  ```<end_code>

  ---
  Task:
  "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French.
  You have been provided with these additional arguments, that you can access using the keys as variables in your python code:
  {'question': 'Quel est l'animal sur l'image?', 'image': 'path/to/image.jpg'}"

  Thought: I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.
  Code:
  ```py
  translated_question = translator(question=question, src_lang="French", tgt_lang="English")
  print(f"The translated question is {translated_question}.")
  answer = image_qa(image=image, question=translated_question)
  final_answer(f"The answer is {answer}")
  ```<end_code>
  ---
  Task:
  In a 1979 interview, Stanislaus Ulam discusses with Martin Sherwin about other great physicists of his time, including Oppenheimer.
  What does he say was the consequence of Einstein learning too much math on his creativity, in one word?
  Thought: I need to find and read the 1979 interview of Stanislaus Ulam with Martin Sherwin.
  Code:
  ```py
  pages = search(query="1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein")
  print(pages)
  ```<end_code>
  Observation:
  No result found for query "1979 interview Stanislaus Ulam Martin Sherwin physicists Einstein".

  Thought: The query was maybe too restrictive and did not find any results. Let's try again with a broader query.
  Code:
  ```py
  pages = search(query="1979 interview Stanislaus Ulam")
  print(pages)
  ```<end_code>
  Observation:
  Found 6 pages:
  [Stanislaus Ulam 1979 interview](https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/)

  [Ulam discusses Manhattan Project](https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/)

  (truncated)

  Thought: I will read the first 2 pages to know more.
  Code:
  ```py
  for url in ["https://ahf.nuclearmuseum.org/voices/oral-histories/stanislaus-ulams-interview-1979/", "https://ahf.nuclearmuseum.org/manhattan-project/ulam-manhattan-project/"]:
      whole_page = visit_webpage(url)
      print(whole_page)
      print("\n" + "="*80 + "\n")  # Print separator between pages
  ```<end_code>
  Observation:
  Manhattan Project Locations:
  Los Alamos, NM
  Stanislaus Ulam was a Polish-American mathematician. He worked on the Manhattan Project at Los Alamos and later helped design the hydrogen bomb. In this interview, he discusses his work at
  (truncated)

  Thought: I now have the final answer: from the webpages visited, Stanislaus Ulam says of Einstein: "He learned too much mathematics and sort of diminished, it seems to me personally, it seems to me his purely physics creativity." Let's answer in one word.
  Code:
  ```py
  final_answer("diminished")
  ```<end_code>

  ---
  Task: "Which city has the highest population: Guangzhou or Shanghai?"

  Thought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.
  Code:
  ```py
  for city in ["Guangzhou", "Shanghai"]:
      print(f"Population {city}:", search(f"{city} population")
  ```<end_code>
  Observation:
  Population Guangzhou: ['Guangzhou has a population of 15 million inhabitants as of 2021.']
  Population Shanghai: '26 million (2019)'

  Thought: Now I know that Shanghai has the highest population.
  Code:
  ```py
  final_answer("Shanghai")
  ```<end_code>

  ---
  Task: "What is the current age of the pope, raised to the power 0.36?"

  Thought: I will use the tool `wiki` to get the age of the pope, and confirm that with a web search.
  Code:
  ```py
  pope_age_wiki = wiki(query="current pope age")
  print("Pope age as per wikipedia:", pope_age_wiki)
  pope_age_search = web_search(query="current pope age")
  print("Pope age as per google search:", pope_age_search)
  ```<end_code>
  Observation:
  Pope age: "The pope Francis is currently 88 years old."

  Thought: I know that the pope is 88 years old. Let's compute the result using python code.
  Code:
  ```py
  pope_current_age = 88 ** 0.36
  final_answer(pope_current_age)
  ```<end_code>

  Above example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you only have access to these tools:
  {%- for tool in tools.values() %}
  - {{ tool.name }}: {{ tool.description }}
      Takes inputs: {{tool.inputs}}
      Returns an output of type: {{tool.output_type}}
  {%- endfor %}

  {%- if managed_agents and managed_agents.values() | list %}
  You can also give tasks to team members.
  Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task', a long string explaining your task.
  Given that this team member is a real human, you should be very verbose in your task.
  Here is a list of the team members that you can call:
  {%- for agent in managed_agents.values() %}
  - {{ agent.name }}: {{ agent.description }}
  {%- endfor %}
  {%- else %}
  {%- endif %}

  Here are the rules you should always follow to solve your task:
  1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_code>' sequence, else you will fail.
  2. Use only variables that you have defined!
  3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'.
  4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
  5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
  6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
  7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
  8. You can use imports in your code, but only from the following list of modules: {{authorized_imports}}
  9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
  10. Don't give up! You're in charge of solving the task, not providing directions to solve it.

  Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.
"planning":
  "initial_facts": |-
    Below I will present you a task.
    You will now build a comprehensive preparatory survey of which facts we have at our disposal and which ones we still need.
    To do so, you will have to read the task and identify things that must be discovered in order to successfully complete it.
    Don't make any assumptions. For each item, provide a thorough reasoning. Here is how you will structure this survey:

    ---
    ### 1. Facts given in the task
    List here the specific facts given in the task that could help you (there might be nothing here).

    ### 2. Facts to look up
    List here any facts that we may need to look up.
    Also list where to find each of these, for instance a website, a file... - maybe the task contains some sources that you should re-use here.

    ### 3. Facts to derive
    List here anything that we want to derive from the above by logical reasoning, for instance computation or simulation.

    Keep in mind that "facts" will typically be specific names, dates, values, etc. Your answer should use the below headings:
    ### 1. Facts given in the task
    ### 2. Facts to look up
    ### 3. Facts to derive
    Do not add anything else.
  "initial_plan": |-
    You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools.
    Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts.
    This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer.
    Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS.
    After writing the final step of the plan, write the '\n<end_plan>' tag and stop there.

    Here is your task:

    Task:
    ```
    {{task}}
    ```
    You can leverage these tools:
    {%- for tool in tools.values() %}
    - {{ tool.name }}: {{ tool.description }}
        Takes inputs: {{tool.inputs}}
        Returns an output of type: {{tool.output_type}}
    {%- endfor %}

    {%- if managed_agents and managed_agents.values() | list %}
    You can also give tasks to team members.
    Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'request', a long string explaining your request.
    Given that this team member is a real human, you should be very verbose in your request.
    Here is a list of the team members that you can call:
    {%- for agent in managed_agents.values() %}
    - {{ agent.name }}: {{ agent.description }}
    {%- endfor %}
    {%- else %}
    {%- endif %}

    List of facts that you know:
    ```
    {{answer_facts}}
    ```

    Now begin! Write your plan below.
  "update_facts_pre_messages": |-
    You are a world expert at gathering known and unknown facts based on a conversation.
    Below you will find a task, and a history of attempts made to solve the task. You will have to produce a list of these:
    ### 1. Facts given in the task
    ### 2. Facts that we have learned
    ### 3. Facts still to look up
    ### 4. Facts still to derive
    Find the task and history below:
  "update_facts_post_messages": |-
    Earlier we've built a list of facts.
    But since in your previous steps you may have learned useful new facts or invalidated some false ones.
    Please update your list of facts based on the previous history, and provide these headings:
    ### 1. Facts given in the task
    ### 2. Facts that we have learned
    ### 3. Facts still to look up
    ### 4. Facts still to derive
    Now write your new list of facts below.
  "update_plan_pre_messages": |-
    You are a world expert at making efficient plans to solve any task using a set of carefully crafted tools.
    You have been given a task:
    ```
    {{task}}
    ```

    Find below the record of what has been tried so far to solve it. Then you will be asked to make an updated plan to solve the task.
    If the previous tries so far have met some success, you can make an updated plan based on these actions.
    If you are stalled, you can make a completely new plan starting from scratch.
  "update_plan_post_messages": |-
    You're still working towards solving this task:
    ```
    {{task}}
    ```
    You can leverage these tools:
    {%- for tool in tools.values() %}
    - {{ tool.name }}: {{ tool.description }}
        Takes inputs: {{tool.inputs}}
        Returns an output of type: {{tool.output_type}}
    {%- endfor %}

    {%- if managed_agents and managed_agents.values() | list %}
    You can also give tasks to team members.
    Calling a team member works the same as for calling a tool: simply, the only argument you can give in the call is 'task'.
    Given that this team member is a real human, you should be very verbose in your task, it should be a long string providing informations as detailed as necessary.
    Here is a list of the team members that you can call:
    {%- for agent in managed_agents.values() %}
    - {{ agent.name }}: {{ agent.description }}
    {%- endfor %}
    {%- else %}
    {%- endif %}

    Here is the up to date list of facts that you know:
    ```
    {{facts_update}}
    ```

    Now for the given task, develop a step-by-step high-level plan taking into account the above inputs and list of facts.
    This plan should involve individual tasks based on the available tools, that if executed correctly will yield the correct answer.
    Beware that you have {remaining_steps} steps remaining.
    Do not skip steps, do not add any superfluous steps. Only write the high-level plan, DO NOT DETAIL INDIVIDUAL TOOL CALLS.
    After writing the final step of the plan, write the '\n<end_plan>' tag and stop there.

    Now write your new plan below.
"managed_agent":
  "task": |-
    You're a helpful agent named '{{name}}'.
    You have been submitted this task by your manager.
    ---
    Task:
    {{task}}
    ---
    You're helping your manager solve a wider task: so make sure to not provide a one-line answer, but give as much information as possible to give them a clear understanding of the answer.
    Your final_answer WILL HAVE to contain these parts:
    ### 1. Task outcome (short version):
    ### 2. Task outcome (extremely detailed version):
    ### 3. Additional context (if relevant):

    Put all these in your final_answer tool, everything that you do not pass as an argument to final_answer will be lost.
    And even if your task resolution is not successful, please return as much context as possible, so that your manager can act upon this feedback.
  "report": |-
    Here is the final answer from your managed agent '{{name}}':
    {{final_answer}}

```
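
Rule 1 of the system prompt requires every step to contain a `Thought:` followed by a code block that ends with `<end_code>`. As an illustration of why that fixed format is useful, the sketch below splits a model reply into its thought and its code with a regular expression. `split_thought_and_code` is a hypothetical helper, not the parser `smolagents` uses, and it assumes Python 3.9+ for `str.removeprefix`.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable

# A code block opens with the fence plus an optional "py"/"python" tag
# and closes with the fence immediately followed by <end_code>
CODE_PATTERN = re.compile(
    re.escape(FENCE) + r"(?:py|python)?\n(.*?)" + re.escape(FENCE) + r"<end_code>",
    flags=re.DOTALL,
)


def split_thought_and_code(model_output):
    """Return (thought, code) extracted from a 'Thought: ... Code: ...' reply."""
    match = CODE_PATTERN.search(model_output)
    if match is None:
        raise ValueError("No code block ending with <end_code> was found")
    head = model_output[: match.start()].strip()
    head = head.removeprefix("Thought:").removesuffix("Code:")
    return head.strip(), match.group(1).strip()


reply = (
    "Thought: I will use python code to compute the result.\n"
    "Code:\n" + FENCE + "py\nresult = 5 + 3 + 1294.678\nfinal_answer(result)\n" + FENCE + "<end_code>"
)
thought, code = split_thought_and_code(reply)
print(thought)
print(code)
```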

The complete implementation is in `app.py`:

In [None]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel, load_tool, tool
import datetime
import requests
import pytz
import yaml
from tools.final_answer import FinalAnswerTool

from Gradio_UI import GradioUI

# Below is an example of a tool that does nothing. Amaze us with your creativity!

@tool
def my_custom_tool(arg1: str, arg2: int) -> str:  # it's important to specify the return type
    # Keep this format for the tool description / args description but feel free to modify the tool
    """A tool that does nothing yet
    Args:
        arg1: the first argument
        arg2: the second argument
    """
    return "What magic will you build ?"


@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"



# LLM
model = InferenceClientModel(
    max_tokens=2096,
    temperature=0.5,
    model_id='Qwen/Qwen2.5-Coder-32B-Instruct',
    custom_role_conversions=None,
)

# Tools
final_answer = FinalAnswerTool()
# Import tool from Hub
image_generation_tool = load_tool('agents-course/text-to-image', trust_remote_code=True)

# Load system prompt from prompts.yaml
with open('prompts.yaml', 'r') as stream:
    prompt_templates = yaml.safe_load(stream)



# Create an agent
agent = CodeAgent(
    model=model,
    tools=[final_answer, image_generation_tool],
    max_steps=6,
    verbosity_level=1,
    grammar=None,
    planning_interval=None,
    name=None,
    description=None,
    prompt_templates=prompt_templates, # pass system prompt to CodeAgent
)

GradioUI(agent).launch()