# What is an Agent

An Agent is an AI model capable of reasoning, planning and interacting with its environment.

## General notions


### AI models? 

Mostly LLMs, e.g. GPT-4, Llama, Gemini.

### Interacting with environment?
LLMs can only generate text. To accomplish other tasks, they have to be equipped with tools. The LLM generates the code that runs the tool, and the tool's result is used to fulfill the task.

### Large Language Model?
* AI models that can understand and generate human language
* Most LLMs are built on the Transformer architecture
* Typically, LLMs are decoder-only models with billions of parameters.
* The training objective is to predict the next token, given a sequence of previous tokens. 
* Each LLM has some special tokens specific to the model. The LLM uses these tokens to open and close the structural components of its generation. The most important of these is the **End of sequence** (EOS) token.

### Next token prediction?
* LLMs are autoregressive, meaning the output of one pass becomes the input of the next. This loop continues until the model predicts the EOS token as the next token, at which point generation stops. This loop is called "decoding".

### Decoding
* The first step is **tokenizing** the input text.
* The model then computes a representation of the sequence that captures the meaning and position of each token in the input. This relies on the attention mechanism.
* From this representation, the model outputs scores that rank how likely each token in its vocabulary is to be the next one in the sequence.
* There are many strategies to select the next token from these scores (e.g. greedy search, beam search).
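
The decoding loop can be sketched with a toy example. This is a minimal sketch, not how a real LLM works internally: the vocabulary and the `TOY_LOGITS` score table are made up for illustration, and the "model" only looks at the last token instead of attending over the whole sequence.

```python
import math

# Hypothetical toy "model": maps the last token to scores over a tiny vocabulary.
# A real LLM computes these scores from the whole sequence.
VOCAB = ["the", "cat", "sat", "<eos>"]
TOY_LOGITS = {
    "the": [0.1, 2.0, 0.3, 0.0],  # after "the", "cat" scores highest
    "cat": [0.2, 0.1, 1.5, 0.4],  # after "cat", "sat" scores highest
    "sat": [0.3, 0.1, 0.2, 2.5],  # after "sat", "<eos>" scores highest
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(prompt_tokens, max_new_tokens=10):
    """Repeatedly append the most likely next token until EOS is predicted."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        next_token = VOCAB[probs.index(max(probs))]  # greedy: pick the top score
        if next_token == "<eos>":
            break  # the model predicted the end of the sequence
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens


print(greedy_decode(["the"]))  # ['the', 'cat', 'sat']
```

Greedy search always takes the single best token. Other strategies, such as beam search or sampling, use the same scores but choose differently.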

### Attention
* When predicting the next token, not every word in the input matters equally. Attention lets the model identify the most relevant words and give them more weight.
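
As a rough illustration, attention weights can be computed as a softmax over scaled dot products between a query vector and one key vector per token. The 2-dimensional vectors below are hand-picked assumptions; a real model learns them and works with far larger dimensions and many attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product score between one query and each key, normalized with softmax."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    return softmax(scores)


# Hypothetical vectors, chosen so that "capital" and "France" align with the query
tokens = ["The", "capital", "of", "France", "is"]
keys = [[0.1, 0.0], [1.0, 0.5], [0.0, 0.1], [1.0, 1.0], [0.2, 0.1]]
query = [2.0, 2.0]  # query for the position that predicts the next token

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    print(f"{token:>8}: {weight:.2f}")
```

With these made-up vectors, "France" and "capital" receive most of the weight, which matches the intuition that they are the words that matter for predicting what comes after "The capital of France is".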

### LLMs training and fine-tuning
* LLMs are pretrained on large text datasets with a self-supervised objective (often described as unsupervised learning): they learn to predict the next word in a sequence, so no manual labels are needed.
* After this initial pretraining, LLMs can be fine-tuned with a supervised learning objective to perform specific tasks (e.g. conversational structures, tool usage, classification, code generation).
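
The next-token objective is usually written as a cross-entropy loss over a sequence. The notation is introduced here for illustration only: $x_1, \dots, x_T$ are the tokens of a training sequence and $\theta$ the model parameters.

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
$$

Minimizing this loss pushes the model to assign a high probability to each actual next token given the tokens before it.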

## Components


### Messages
* When users interact with the system, their messages are concatenated and formatted into a prompt that the model can understand. The format differs from model to model.
* The prompt consists of a **System Message** and a **conversation** with User and Assistant messages.
* The System Message serves as a persistent instruction that guides every subsequent interaction. When using Agents, the System Message also describes the available tools, instructs the model on how to format the actions to take, and gives guidelines on how the thought process should be segmented.
* The conversation consists of alternating messages between a User and an Assistant.

In [None]:
system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}

conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

After applying the Chat template, e.g., of SmolLM2, all the messages will be concatenated into a single string.
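
To make the concatenation concrete, here is a hand-written sketch of a ChatML-style renderer. The function `render_chatml` is hypothetical and only mimics the `<|im_start|>` / `<|im_end|>` markers visible in the SmolLM2 output further down; the real template ships with each model's tokenizer and may differ in details.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Concatenate chat messages into one ChatML-style prompt string."""
    prompt = ""
    for message in messages:
        # Each message is wrapped in start/end markers with its role on the first line
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next
        prompt += "<|im_start|>assistant\n"
    return prompt


system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful.",
}
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

print(render_chatml([system_message] + conversation, add_generation_prompt=True))
```

In practice, prefer the tokenizer's own `apply_chat_template`, since a hand-written format can silently diverge from what the model was trained on.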

### Base model vs. Instruct Model
* Base model is trained on raw text to predict the next token
* An Instruct model is fine-tuned to follow instructions and engage in conversation.

### Chat template
* We need to format our prompts in a consistent way that the model can understand.
* The `transformers` library from HuggingFace takes care of chat templates as part of the tokenization process.

In [7]:
messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can help you with ?"},
]

Convert the messages to a prompt:

In [None]:
from transformers import AutoTokenizer 
# prepare the inputs for a model using from_pretrained() method

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct") 
# pass the message to tokenize and/or format them. 
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

In [None]:
import pprint
pprint.pp(rendered_prompt)
# Setting add_generation_prompt=True appends the tokens that mark the start of an assistant message to the prompt, e.g. <|im_start|>assistant\n

('<|im_start|>system\n'
 'You are an AI assistant with access to various tools.<|im_end|>\n'
 '<|im_start|>user\n'
 'Hi !<|im_end|>\n'
 '<|im_start|>assistant\n'
 'Hi human, what can help you with ?<|im_end|>\n'
 '<|im_start|>assistant\n')


In [16]:
# If tokenize=True, it outputs a list/tensor of token_ids that is ready for generation and decoding
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
tokenized_chat

tensor([[    1,  9690,   198,  2683,   359,   354,  5646, 11173,   351,  1594,
           288,  1461,  2549,    30,     2,   198,     1,  4093,   198, 26843,
          5728,     2,   198,     1,   520,  9531,   198, 26843,  1205,    28,
           732,   416,   724,   346,   351,  9148,     2,   198,     1,   520,
          9531,   198]])

In [17]:
# we can convert the tensor of token ids to natural language
tokenizer.decode(tokenized_chat[0])

'<|im_start|>system\nYou are an AI assistant with access to various tools.<|im_end|>\n<|im_start|>user\nHi !<|im_end|>\n<|im_start|>assistant\nHi human, what can help you with ?<|im_end|>\n<|im_start|>assistant\n'

### What are Tools?

* A tool is a function given to the LLM, with a clear objective.
* Some popular tools: web search, image generation, retrieval, API reference.
* LLMs can't call tools on their own. We can provide tools to an agent by **teaching the LLM about the existence of these tools and instructing it to generate text-based invocations when needed**.
* The Agent then reads the response from the LLM, executes the tool, and returns the result to the LLM as a new message appended to the conversation.
* The LLM then processes this additional context and generates a response to the user.


### How do we give tools to an LLM?

* We use the system prompt to provide textual descriptions of available tools to the model. 
* We have to be very precise about: what the tool does and what exact inputs it expects.
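
One way to keep the description precise is to derive it from the function itself. The helper `describe_tool` below is a hypothetical sketch using the standard `inspect` module; it assumes every parameter and the return value are annotated with plain classes such as `int` or `str`.

```python
import inspect


def describe_tool(func):
    """Build a one-line textual description of a function for the system prompt."""
    signature = inspect.signature(func)
    # Assumption: every parameter carries a plain class annotation (int, str, ...)
    arguments = ", ".join(
        f"{name}: {param.annotation.__name__}"
        for name, param in signature.parameters.items()
    )
    return (
        f"Tool Name: {func.__name__}, "
        f"Description: {inspect.getdoc(func)}, "
        f"Arguments: {arguments}, "
        f"Outputs: {signature.return_annotation.__name__}"
    )


def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


print(describe_tool(calculator))
# Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
```

Deriving the text from the signature keeps the description in sync with the implementation, which is the idea behind decorator-based tool definitions.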

In [22]:
# Tool implementation
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b
# What we want the LLM to know about the tool
tools_description = """Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int"""
# This can be done using Python decorator or a Tool class
system_message=f"""You are an AI assistant designed to help users efficiently and accurately. Your primary goal is to provide helpful, precise, and clear responses.

You have access to the following tools:
{tools_description}
"""
system_message

'You are an AI assistant designed to help users efficiently and accurately. Your primary goal is to provide helpful, precise, and clear responses.\n\nYou have access to the following tools:\nTool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int\n'

### The Thought-Action-Observation cycle

* Agents work in a continuous **cycle**: thinking --> acting --> observing
* Thought: The LLM part of the Agent decides what the **next step** should be.
* Action: The Agent calls the tools with the associated arguments.
* Observation: The LLM **reflects** on the response from the tool.
* In many agent frameworks, the **rules and guidelines** for the Thought-Action-Observation cycle are embedded in the System prompt.

In [23]:
system_message="""You are an AI assistant designed to help users efficiently and accurately. Your primary goal is to provide helpful, precise, and clear responses.

You have access to the following tools:
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int

You should think step by step in order to fulfill the objective with a reasoning divided into Thought/Action/Observation steps that can be repeated multiple times if needed.

You should first reflect on the current situation using `Thought: {your_thoughts}`, then (if necessary), call a tool with the proper JSON formatting `Action: {JSON_BLOB}`, or print your final answer starting with the prefix `Final Answer:`
"""

### Thought
* In this step, the LLM analyzes the information in the prompt and decides what the next action should be.
* Common types of thought: planning, analysis, decision making, problem solving, memory integration, self-reflection, goal setting, prioritization.
* **ReAct** ("Reasoning" + "Acting") is a prompting approach in which the LLM interleaves reasoning steps with actions. A simpler related technique, zero-shot chain-of-thought, appends "Let's think step by step" to the prompt before letting the LLM decode the next tokens.
* Models such as **DeepSeek-R1** or OpenAI **o1** are trained to generate a **thinking section** before generating the final answer.

### Actions
* Actions are the concrete steps an AI agent takes to interact with its environment.
* Types of agent actions: JSON Agent, Code Agent, Function-calling Agent
* The LLM only handles text, which it uses to describe the action it wants to take and the parameters to supply to the tool.

#### The Stop and Parse approach
* The LLM must STOP generating new tokens **after emitting all the tokens that define a complete Action**. The output of the LLM should be in a clear, predetermined format (JSON or code).
* An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters.
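
A minimal sketch of such a parser, assuming the model writes its action as a JSON blob after an `Action:` prefix. The keys `name` and `arguments` and the `TOOLS` registry are illustrative choices, not a format required by any particular framework.

```python
import json


def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


# Hypothetical registry mapping tool names to Python functions
TOOLS = {"calculator": calculator}


def parse_action(llm_output: str) -> dict:
    """Extract the JSON blob that follows the 'Action:' prefix."""
    _, _, blob = llm_output.partition("Action:")
    if not blob.strip():
        raise ValueError("No action found in the model output")
    return json.loads(blob)


def execute_action(action: dict):
    """Look up the requested tool and call it with the parsed arguments."""
    tool = TOOLS[action["name"]]
    return tool(**action["arguments"])


# Example model output, cut right after the complete Action
llm_output = (
    "Thought: I need to multiply 6 by 7.\n"
    'Action: {"name": "calculator", "arguments": {"a": 6, "b": 7}}'
)
action = parse_action(llm_output)
print(execute_action(action))  # 42
```

Stopping generation right after the action matters here: any text the model produced after the JSON blob would make `json.loads` fail.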



### Observation
* Observations are signals from the environment (data from an API, error messages, or system logs) that guide the next cycle of thought.
* In this phase, the agent **collects feedback**, **appends results** to its existing context, and **adapts its strategy** (deciding whether it needs additional information or is ready to provide the final answer).
* Types: system feedback (error messages, success notifications...), data changes, environment data, response analysis, time-based events.
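
Putting the three phases together, the cycle can be sketched as a small loop. Everything here is illustrative: `scripted_llm` is a hard-coded stand-in for a real model, and feeding observations back as `user` messages is just one possible convention.

```python
import json


def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


TOOLS = {"calculator": calculator}


def scripted_llm(messages):
    """Hypothetical stand-in for an LLM: acts first, answers once it sees an observation."""
    last = messages[-1]["content"]
    if last.startswith("Observation:"):
        result = last[len("Observation:"):].strip()
        return f"Thought: I have the result.\nFinal Answer: {result}"
    return (
        "Thought: I should use the calculator.\n"
        'Action: {"name": "calculator", "arguments": {"a": 6, "b": 7}}'
    )


def run_agent(task, llm, max_steps=5):
    """Run the Thought-Action-Observation loop until a final answer or the step limit."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        output = llm(messages)  # Thought (and possibly an Action)
        messages.append({"role": "assistant", "content": output})
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        try:
            # Action: parse the JSON blob and call the tool
            action = json.loads(output.split("Action:", 1)[1])
            observation = str(TOOLS[action["name"]](**action["arguments"]))
        except Exception as error:
            # Errors are observations too: the model can react to them
            observation = f"Error: {error}"
        # Observation: append the result to the context for the next cycle
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return None  # no final answer within the step budget


print(run_agent("What is 6 times 7?", scripted_llm))  # 42
```

The `max_steps` limit keeps the loop from running forever if the model never produces a final answer.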



# First Agent with smolagents
* smolagents focuses on the `CodeAgent`. It performs actions through code blocks, then observes the results of executing that code.

In [25]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, InferenceClientModel, load_tool, tool
import datetime
import requests
import pytz
import yaml

In [None]:
from typing import Any  # needed for the type hints in forward()

from smolagents.tools import Tool
class FinalAnswerTool(Tool):
    name = "final_answer"
    description = "Provides a final answer to the given problem."
    inputs = {'answer': {'type': 'any', 'description': 'The final answer to the problem'}}
    output_type = "any"

    def forward(self, answer: Any) -> Any:
        return answer

    def __init__(self, *args, **kwargs):
        self.is_initialized = False
        
final_answer = FinalAnswerTool()

In [26]:
@tool
def my_custom_tool(arg1:str, arg2:int)-> str: # it's important to specify the return type
    # Keep this format for the tool description / args description but feel free to modify the tool
    """A tool that does nothing yet 
    Args:
        arg1: the first argument
        arg2: the second argument
    """
    return "What magic will you build ?"

@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"

In [None]:
from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/qwen2:7b",  # Or try other Ollama-supported models
    api_base="http://127.0.0.1:11434",  # Default Ollama local server
    num_ctx=8192,
)

In [32]:
messages = [
    {"role": "user","content": [{"type": "text","text": "Where is the capital of France?"}]}
]

In [33]:
model(messages)

ChatMessage(role='assistant', content='Paris', tool_calls=None, raw=ModelResponse(id='chatcmpl-4a50c13d-4f91-48b9-a53c-4c959c9ec398', created=1749649506, model='ollama_chat/qwen2:7b', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='Paris', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=2, prompt_tokens=26, total_tokens=28, completion_tokens_details=None, prompt_tokens_details=None)), token_usage=TokenUsage(input_tokens=26, output_tokens=2, total_tokens=28))

In [34]:
agent = CodeAgent(tools=[], model=model)

In [35]:
# Run the agent with a task
result = agent.run("Calculate the sum of numbers from 1 to 10")
print(result)

55


In [38]:
search_agent = CodeAgent(
    tools=[],  # Empty list since we'll use default tools
    model=model,
    add_base_tools=True  # This adds web search and other default tools
)

# Now the agent can search the web!
result = search_agent.run("What is the current weather in Paris?")
print(result)

The current weather conditions for Paris are currently unavailable due to the limitations in extracting web data.


In [41]:
import re
import requests
from markdownify import markdownify
from requests.exceptions import RequestException
from smolagents import tool

@tool
def visit_webpage(url: str) -> str:
    """Visits a webpage at the given URL and returns its content as a markdown string.

    Args:
        url: The URL of the webpage to visit.

    Returns:
        The content of the webpage converted to Markdown, or an error message if the request fails.
    """
    try:
        # Send a GET request to the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad status codes

        # Convert the HTML content to Markdown
        markdown_content = markdownify(response.text).strip()

        # Remove multiple line breaks
        markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content)

        return markdown_content

    except RequestException as e:
        return f"Error fetching the webpage: {str(e)}"
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"


In [42]:
print(visit_webpage("https://en.wikipedia.org/wiki/Hugging_Face")[:500])

Hugging Face - Wikipedia

[Jump to content](#bodyContent)

Main menu

Main menu

move to sidebar
hide

Navigation

* [Main page](/wiki/Main_Page "Visit the main page [z]")
* [Contents](/wiki/Wikipedia:Contents "Guides to browsing Wikipedia")
* [Current events](/wiki/Portal:Current_events "Articles related to current events")
* [Random article](/wiki/Special:Random "Visit a randomly selected article [x]")
* [About Wikipedia](/wiki/Wikipedia:About "Learn about Wikipedia and how it works")
* [Conta


In [44]:
from smolagents import (
    ToolCallingAgent,
    WebSearchTool,
)

web_agent = ToolCallingAgent(
    tools=[WebSearchTool(), visit_webpage],
    model=model,
    max_steps=10,
    name="web_search_agent",
    description="Runs web searches for you.",
)

In [45]:
manager_agent = CodeAgent(
    tools=[],
    model=model,
    managed_agents=[web_agent],
    additional_authorized_imports=["time", "numpy", "pandas"],
)

In [46]:
answer = manager_agent.run("If LLM training continues to scale up at the current rhythm until 2030, what would be the electric power in GW required to power the biggest training runs by 2030? What would that correspond to, compared to some countries? Please provide a source for any numbers used.")

KeyboardInterrupt: 