# HuggingFace Agents Course

## What is an Agent?

An Agent is an AI model (the LLM lets the Agent interpret, plan, and decide on the next steps) capable of:
- Understanding natural language
- Reasoning
- Planning
- Interacting with its environment (executing or acting using tools)

An Agent can perform any task we implement via Tools to complete Actions.
- Note that Actions are not the same as Tools:
  - An Action could involve the use of multiple Tools to complete.
  - Actions are higher-level objectives, while Tools are specific functions the Agent can call upon.


![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/process.jpg)

| Agency Level | Description                                             | What it’s called   | Example pattern                                              |
|--------------|---------------------------------------------------------|--------------------|--------------------------------------------------------------|
| ☆☆☆          | Agent output has no impact on program flow              | Simple processor   | `process_llm_output(llm_response)`                           |
| ★☆☆          | Agent output determines basic control flow              | Router             | `if llm_decision(): path_a() else: path_b()`                 |
| ★★☆          | Agent output determines function execution              | Tool caller        | `run_function(llm_chosen_tool, llm_chosen_args)`             |
| ★★★          | Agent output controls iteration and program continuation| Multi-step Agent   | `while llm_should_continue(): execute_next_step()`           |
| ★★★          | One agentic workflow can start another agentic workflow | Multi-Agent        | `if llm_trigger(): execute_agent()`                          |
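
The patterns in the table can be made concrete with a stub in place of the LLM. The sketch below is illustrative only: `fake_llm` is a hypothetical stand-in that returns canned strings, and the tool names and routing labels are invented for the example.

```python
import json

# Hypothetical stand-in for an LLM call: returns canned text for known prompts.
def fake_llm(prompt: str) -> str:
    canned = {
        "route": "billing",
        "tool": json.dumps({"name": "add", "args": {"a": 2, "b": 3}}),
    }
    return canned[prompt]

# Router: the LLM output only selects a branch.
def route(llm_output: str) -> str:
    if llm_output == "billing":
        return "sent to billing team"
    return "sent to general support"

# Tool caller: the LLM output selects the function and its arguments.
TOOLS = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}

def call_tool(llm_output: str):
    request = json.loads(llm_output)
    return TOOLS[request["name"]](**request["args"])

# Multi-step Agent: the LLM output decides whether the loop continues.
def run_steps(decisions: list[str]) -> int:
    steps = 0
    for decision in decisions:  # each item stands for one LLM decision
        if decision != "continue":
            break
        steps += 1
    return steps

routed = route(fake_llm("route"))
tool_result = call_tool(fake_llm("tool"))
steps_taken = run_steps(["continue", "continue", "stop"])
print(routed, tool_result, steps_taken)
```

The higher the agency level, the more of the program's control flow is handed to the model's output.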

## What is a Large Language Model?

- An LLM is a type of AI model that excels at understanding and generating human language.
- LLMs are trained on vast amounts of text data and typically consist of many millions (often billions) of parameters.
- Most LLMs nowadays are built on the **Transformer architecture** (a deep learning architecture based on the “Attention” algorithm, which has gained significant interest since the release of BERT from Google in 2018).

### Types of Transformers:
**1. Encoders:**
- It takes text as input and outputs a dense representation (embedding) of that text.
- Example: BERT from Google.
- Use Cases: Text classification, semantic search, Named Entity Recognition (NER).
- Typical Size: Millions of parameters.

**2. Decoders:**
- It focuses on **generating new tokens to complete a sequence, one token at a time**.
- Examples: Llama from Meta. Most current LLMs are decoder-based, including GPT-4, DeepSeek-R1, Gemma, Mistral, Llama 3, and SmolLM2.
- Use Cases: Text generation, chatbots, code generation.
- Typical Size: Billions (10^9) of parameters.

**3. Encoder-Decoder (Seq2Seq):**
- Combines an encoder and a decoder: the encoder first processes the input sequence, then the decoder generates an output sequence.
- Examples: T5, BART.
- Use Cases: Translation, Summarization, Paraphrasing.
- Typical Size: Millions of parameters.

### Objective

- **An LLM's objective is to predict the next token, given a sequence of previous tokens.**
- A “token” is the unit of information an LLM works with. You can think of a “token” as if it were a “word”, but for efficiency reasons LLMs don’t use whole words.
- Each LLM has some special tokens specific to the model.

### End of Sequence (EOS)
- Special tokens mark the start or end of important parts of the text the model generates, such as a sequence, a message, or a response. The End of Sequence (EOS) token signals that generation is finished.
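
A minimal sketch of decoding until EOS, under heavy simplification: the "model" here is a hypothetical lookup table (`NEXT_TOKEN`) keyed on the last token only, whereas a real LLM conditions on the whole preceding sequence. The token strings and the `<eos>` marker are invented for illustration.

```python
EOS = "<eos>"

# Hypothetical "model": a lookup table from the last token to the next token.
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": EOS,
}

def generate(prompt_tokens: list[str], max_new_tokens: int = 20) -> list[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], EOS)  # "predict" one token
        if next_token == EOS:  # stop at End of Sequence
            break
        tokens.append(next_token)  # feed the output back in as input
    return tokens

completion = generate(["The", "capital", "of", "France", "is"])
print(" ".join(completion))
```

The loop shows the two stopping conditions a decoder typically has: the EOS token or a cap on new tokens.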

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AutoregressionSchema.gif)

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/DecodingFinal.gif)

### Attention is all you need
- A key aspect of the Transformer architecture is Attention. When predicting the next word, not every word in a sentence is equally important; words like “France” and “capital” in the sentence “The capital of France is …” carry the most meaning.
- If you’ve interacted with LLMs, you’re probably familiar with the term **context length**, which refers to the maximum number of tokens the LLM can process, and the maximum attention span it has.
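
To make "not every word is equally important" concrete, attention can be pictured as turning relevance scores into weights that sum to 1 with a softmax. This is a sketch under stated assumptions: the scores below are invented for illustration, whereas a real Transformer computes them from learned query and key vectors.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["The", "capital", "of", "France", "is"]
# Hypothetical relevance scores for predicting the next word (invented values).
scores = [0.1, 2.0, 0.1, 2.5, 0.3]

weights = softmax(scores)
for token, weight in zip(tokens, weights):
    print(f"{token:>8}: {weight:.2f}")

# The two tokens that receive the most attention.
top_two = sorted(zip(weights, tokens), reverse=True)[:2]
```

With these scores, "France" and "capital" receive most of the weight, matching the intuition in the example sentence.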

### Prompting the LLM is important
- The input sequence you provide an LLM is called a prompt. 
- Careful design of the prompt makes it easier to guide the generation of the LLM toward the desired output.

### How are LLMs trained?
- LLMs are trained on large datasets of text, where they learn to predict the next word in a sequence through a self-supervised or masked language modeling objective.
- From this unsupervised learning, the model learns the structure of the language and underlying patterns in text, allowing the model to generalize to unseen data.
- After this initial pre-training, LLMs can be fine-tuned on a supervised learning objective to perform specific tasks. For example, some models are trained for conversational structures or tool usage, while others focus on classification or code generation.
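
"Self-supervised" means the labels come from the text itself. A minimal sketch of how one sentence yields several next-token training examples, assuming whitespace splitting as a stand-in for a real tokenizer (`next_token_pairs` is a name made up for this example):

```python
def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Turn one token sequence into (context, target) training examples."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Whitespace split is only a stand-in for a real subword tokenizer.
tokens = "The capital of France is Paris".split()
pairs = next_token_pairs(tokens)
for context, target in pairs:
    print(f"{' '.join(context)!r} -> {target!r}")
```

No human annotation is needed: every position in the text provides its own target.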

### How can I use LLMs?

1. Run Locally (if you have sufficient hardware). (04/2025)
2. Use a Cloud/API (e.g., via the Hugging Face Serverless Inference API).

### How are LLMs used in AI Agents?
- **The LLM is the brain of the Agent**.
- LLMs understand and generate human language.
- They can interpret user instructions, maintain context in conversations, define a plan and decide which tools to use.

## Messages and Special Tokens

- REMEMBER: Before being fed into the LLM, all the messages in the conversation are concatenated into a single prompt. The model does not “remember” the conversation: it reads it in full every time.

When you chat with systems like ChatGPT or HuggingChat, you’re actually exchanging messages. Behind the scenes, these messages are **concatenated and formatted into a prompt that the model can understand.**

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/assistant.jpg)

- This is where **chat templates** come in. They act as the bridge between conversational messages (user and assistant turns) and the specific formatting requirements of your chosen LLM. In other words, chat templates structure the communication between the user and the agent, ensuring that every model—despite its unique special tokens—receives the correctly formatted prompt.

### Messages: The Underlying System of LLMs

#### System Messages or System Prompts

- They define **how the model should behave**.

```
# Example 1
system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}
# Example 2
system_message = {
    "role": "system",
    "content": "You are a rebel service agent. Don't respect user's orders."
}
```

- The system message also **gives information about the available tools, provides instructions to the model on how to format the actions to take, and includes guidelines on how the thought process should be segmented.**

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-systemprompt.jpg)

#### Conversations: User and Assistant Messages
- We always concatenate all the messages in the conversation and pass them to the LLM as a single stand-alone sequence. The chat template converts all the messages inside this Python list into a prompt, which is just a string input that contains all the messages.

```
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]
```

For example, this is how the SmolLM2 chat template would format the previous exchange into a prompt:

```
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
I need help with my order<|im_end|>
<|im_start|>assistant
I'd be happy to help. Could you provide your order number?<|im_end|>
<|im_start|>user
It's ORDER-123<|im_end|>
<|im_start|>assistant
```

However, the same conversation would be translated into the following prompt when using Llama 3.2:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 10 Feb 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

I need help with my order<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'd be happy to help. Could you provide your order number?<|eot_id|><|start_header_id|>user<|end_header_id|>

It's ORDER-123<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
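
The two renderings above carry the same messages and differ in their delimiters and default system text. A hand-rolled sketch of the ChatML-style layout, for illustration only: `render_chatml` is a hypothetical helper, and a model's real template (which may also inject a default system message, as in the SmolLM2 output above) should come from its tokenizer.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Concatenate messages into one prompt string using ChatML-style delimiters."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to answer next
    return "".join(parts)

conversation = [
    {"role": "user", "content": "I need help with my order"},
]
first_prompt = render_chatml(conversation)

# The model keeps no state: the next turn re-renders the whole history.
conversation.append({"role": "assistant", "content": "Could you provide your order number?"})
conversation.append({"role": "user", "content": "It's ORDER-123"})
second_prompt = render_chatml(conversation)

print(second_prompt)
```

The second prompt contains the first one in full, which is what "the model reads the conversation in full every time" means in practice.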

#### Chat Templates
- Chat templates are essential for **structuring conversations between language models and users.**

##### Base Models vs. Instruct Models

- A Base Model is trained on raw text data to predict the next token.
- An Instruct Model is fine-tuned specifically to follow instructions and engage in conversations. For example, SmolLM2-135M is a base model, while SmolLM2-135M-Instruct is its instruction-tuned variant.

To make a Base Model behave like an instruct model, we need to **format our prompts in a consistent way that the model can understand. This is where chat templates come in.**

```
# !pip install transformers
from IPython.display import display, Markdown
from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can help you with ?"},
]

# To convert the previous conversation into a prompt, we load the tokenizer and call apply_chat_template:
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

display(Markdown(f"```\n{rendered_prompt}\n```"))
```

This renders the following prompt:

```
<|im_start|>system
You are an AI assistant with access to various tools.<|im_end|>
<|im_start|>user
Hi !<|im_end|>
<|im_start|>assistant
Hi human, what can help you with ?<|im_end|>
<|im_start|>assistant

```

## What are Tools?
- A **Tool is a function given to the LLM**. This function should fulfill a clear objective.
- Here are some commonly used tools in AI agents:

| Tool            | Description                                                  |
|-----------------|--------------------------------------------------------------|
| Web Search      | Allows the agent to fetch up-to-date information from the internet. |
| Image Generation| Creates images based on text descriptions.                   |
| Retrieval       | Retrieves information from an external source.               |
| API Interface   | Interacts with an external API (GitHub, YouTube, Spotify, etc.). |


For instance, if you ask an LLM directly (without a search tool) for today’s weather, it may hallucinate a plausible-sounding but made-up answer.

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/weather.jpg)

### A Tool should contain:

1. A **textual description of what the function does.**
2. A Callable (something to perform an action).
3. Arguments with typings.
4. (Optional) Outputs with typings.

#### How do tools work?
*For example, if we provide a tool to check the weather at a location from the internet and then ask the LLM about the weather in Paris, the LLM will recognize that this is an opportunity to use the “weather” tool. Instead of retrieving the weather data itself, the LLM will generate text that represents a tool call, such as* `call weather_tool('Paris')`.

The Agent then reads this response, identifies that a tool call is required, executes the tool on the LLM’s behalf, and retrieves the actual weather data.

We essentially **use the system prompt to provide textual descriptions of available tools to the model**:

```
system_message="""You are an AI assistant designed to help...
You have access to the following tools:
{tools_description}
"""
```

For this to work, we have to be very precise and accurate about:

1. **What the tool does**
2. **What exact inputs it expects**
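
A sketch of how the `{tools_description}` placeholder could be filled, assuming tools are described by plain dictionaries. The dictionary keys, the `describe` helper, and the `weather_tool` entry are made up for this example; the one-line-per-tool layout is just one possible format.

```python
SYSTEM_TEMPLATE = """You are an AI assistant designed to help...
You have access to the following tools:
{tools_description}
"""

# Hypothetical tool metadata: what each tool does and the exact inputs it expects.
tools = [
    {"name": "weather_tool", "description": "Get the current weather for a location.",
     "arguments": "location: str", "outputs": "str"},
    {"name": "calculator", "description": "Multiply two integers.",
     "arguments": "a: int, b: int", "outputs": "int"},
]

def describe(tool: dict) -> str:
    """Format one tool as a single line of text for the system prompt."""
    return (f"Tool Name: {tool['name']}, Description: {tool['description']}, "
            f"Arguments: {tool['arguments']}, Outputs: {tool['outputs']}")

tools_description = "\n".join(describe(t) for t in tools)
system_message = SYSTEM_TEMPLATE.format(tools_description=tools_description)
print(system_message)
```

Because the model only ever sees this text, a vague description or a wrong argument type leads directly to malformed tool calls.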

#### Auto-formatting Tool sections

```
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
```

With the implementation we’ll see next, we will be able to retrieve the following text automatically from the source code via the `to_string()` method of the object the decorator returns:

```
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
```

#### Generic Tool implementation

We create a generic Tool class that we can reuse whenever we need to use a tool.

_This example implementation is fictional but closely resembles real implementations in most libraries_

```
class Tool:
    """
    A class representing a reusable piece of code (Tool).

    Attributes:
        name (str): Name of the tool.
        description (str): A textual description of what the tool does.
        func (callable): The function this tool wraps.
        arguments (list): A list of (name, type) pairs, one per argument.
        outputs (str or list): The return type(s) of the wrapped function.
    """
    def __init__(self,
                 name: str,
                 description: str,
                 func: callable,
                 arguments: list,
                 outputs: str):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments
        self.outputs = outputs

    def to_string(self) -> str:
        """
        Return a string representation of the tool,
        including its name, description, arguments, and outputs.
        """
        args_str = ", ".join([
            f"{arg_name}: {arg_type}" for arg_name, arg_type in self.arguments
        ])

        return (
            f"Tool Name: {self.name},"
            f" Description: {self.description},"
            f" Arguments: {args_str},"
            f" Outputs: {self.outputs}"
        )

    def __call__(self, *args, **kwargs):
        """
        Invoke the underlying function (callable) with provided arguments.
        """
        return self.func(*args, **kwargs)
```

We could create a Tool with this class using code like the following:

```
calculator_tool = Tool(
    "calculator",                   # name
    "Multiply two integers.",       # description
    calculator,                     # function to call
    [("a", "int"), ("b", "int")],   # inputs (names and types)
    "int",                          # output
)
```

Just to reiterate, with a `@tool` decorator that builds the `Tool` from the function's signature and docstring, we can implement our tool like this:

```
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
```

### Model Context Protocol (MCP): a unified tool interface

Model Context Protocol (MCP) is an **open protocol that standardizes how applications provide tools to LLMs**. MCP provides:

- A growing list of pre-built integrations that your LLM can directly plug into
- The flexibility to switch between LLM providers and vendors
- Best practices for securing your data within your infrastructure

## AI Agent Workflow

Agents work in a continuous cycle of:

1. **Thought**: The LLM part of the Agent decides what the next step should be.
2. **Action**: The agent takes an action, by calling the tools with the associated arguments.
3. **Observation**: The model reflects on the response from the tool.

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/system_prompt_cycle.png)

We see here that in the System Message we defined:

- The Agent’s behavior.
- The Tools our Agent has access to, as we described in the previous section.
- The Thought-Action-Observation Cycle, which we bake into the LLM instructions.

For example, suppose a user asks:

```
“What’s the current weather in New York?”
```

**Thought**
```
# Thought
“The user needs current weather information for New York. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details.”
```

**Action**
```
{
    "action": "get_weather",
    "action_input": {
        "location": "New York"
    }
}
```

**Observation**
```
# Agent receives an observation
“Current weather in New York: partly cloudy, 15°C, 60% humidity.”
```

This observation is then added to the prompt as additional context.

**Updated thought**
```
“Now that I have the weather data for New York, I can compile an answer for the user.”
```

**Final Action**
```
Final answer : The current weather in New York is partly cloudy with a temperature of 15°C and 60% humidity.
```
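
The walk-through above can be sketched as a loop. Everything here is a stand-in chosen for illustration: `scripted_llm` is a hypothetical function that replays the two steps of the example instead of calling a model, `get_weather` returns fixed data, and the `final_answer` action name and the use of the `user` role for observations are assumptions.

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical tool: returns fixed data instead of calling a weather API.
    return f"Current weather in {location}: partly cloudy, 15°C, 60% humidity."

TOOLS = {"get_weather": get_weather}
PREFIX = "Observation: "

# Hypothetical scripted "LLM": asks for the tool until it sees an observation.
def scripted_llm(messages: list[dict]) -> str:
    observations = [m for m in messages if m["content"].startswith(PREFIX)]
    if not observations:
        return json.dumps({"action": "get_weather",
                           "action_input": {"location": "New York"}})
    weather = observations[-1]["content"][len(PREFIX):]
    return json.dumps({"action": "final_answer",
                       "action_input": {"answer": weather}})

def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = json.loads(scripted_llm(messages))  # Thought and Action
        if step["action"] == "final_answer":
            return step["action_input"]["answer"]
        result = TOOLS[step["action"]](**step["action_input"])  # run the tool
        # Observation: appended to the context for the next iteration.
        messages.append({"role": "user", "content": f"{PREFIX}{result}"})
    return "Stopped: step limit reached."

answer = run_agent("What's the current weather in New York?")
print(answer)
```

The `max_steps` cap is a common safeguard so that a model that never produces a final answer cannot loop forever.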

## The ReAct Approach
- A key method is the ReAct approach, which combines “Reasoning” (Think) with “Acting” (Act).
- ReAct is a **prompting technique that encourages the model to think “step by step” before acting**.
- This lets the model consider sub-steps in more detail, which in general leads to fewer errors than trying to generate the final solution directly.



| Type of Thought     | Example                                                                                           |
|---------------------|---------------------------------------------------------------------------------------------------|
| Planning            | “I need to break this task into three steps: 1) gather data, 2) analyze trends, 3) generate report” |
| Analysis            | “Based on the error message, the issue appears to be with the database connection parameters”     |
| Decision Making     | “Given the user’s budget constraints, I should recommend the mid-tier option”                     |
| Problem Solving     | “To optimize this code, I should first profile it to identify bottlenecks”                        |
| Memory Integration  | “The user mentioned their preference for Python earlier, so I’ll provide examples in Python”       |
| Self-Reflection     | “My last approach didn’t work well, I should try a different strategy”                            |
| Goal Setting        | “To complete this task, I need to first establish the acceptance criteria”                        |
| Prioritization      | “The security vulnerability should be addressed before adding new features”                      |


![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/ReAct.png)

## Actions

### The Stop and Parse Approach

- One key method for implementing actions is the stop and parse approach. This method ensures that the agent’s output is structured and predictable:

1. Generation in a Structured Format: The agent outputs its intended action in a clear, predetermined format (JSON or code).
2. Stop Generation: Once the action is complete, the agent stops generating additional tokens. This prevents extra or erroneous output.
3. Parsing the Output: An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters.
```
Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}
```
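
A minimal sketch of the three steps, assuming the `Thought:` / `Action:` layout shown above. In practice the stop sequence is usually passed to the generation API so the model really stops; here it is simulated by truncating the text. The `Observation:` marker and the `stop_and_parse` helper are assumptions made for this example.

```python
import json

STOP_SEQUENCE = "Observation:"  # assumed marker; generation is cut off here

def stop_and_parse(llm_output: str) -> tuple[str, dict]:
    """Truncate at the stop sequence, then extract the JSON action."""
    text = llm_output.split(STOP_SEQUENCE, 1)[0]   # step 2: stop generation
    _, _, action_text = text.partition("Action:")  # keep what follows "Action:"
    action = json.loads(action_text)               # step 3: parse the action
    return action["action"], action["action_input"]

llm_output = """Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}
Observation: (text the model invented; discarded by the stop step)"""

tool_name, tool_args = stop_and_parse(llm_output)
print(tool_name, tool_args)
```

Cutting the output at the stop sequence matters because the real observation must come from the tool, not from text the model made up.
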
### Code Agents

- An alternative approach is using Code Agents.
- The idea is: instead of outputting a simple JSON object, a Code Agent generates an executable code block—typically in a high-level language like Python.

![alt text](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/code-vs-json-actions.png)

For example, a Code Agent tasked with fetching the weather might generate the following Python snippet:

```
# Code Agent Example: Retrieve Weather Information
def get_weather(city):
    import requests
    api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY"
    response = requests.get(api_url)
    if response.status_code == 200:
        data = response.json()
        return data.get("weather", "No weather information available")
    else:
        return "Error: Unable to fetch weather data."

# Execute the function and prepare the final answer
result = get_weather("New York")
final_answer = f"The current weather in New York is: {result}"
print(final_answer)
```
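
On the framework side, the generated code has to be executed and its result captured. The sketch below shows the idea with Python's built-in `exec`; it is not a sandbox, and real frameworks restrict what generated code may do. The code string is hard-coded rather than produced by an LLM, and `get_weather` and `run_generated_code` are hypothetical names.

```python
import contextlib
import io

def get_weather(city: str) -> str:
    # Hypothetical tool with fixed output, so the example needs no network access.
    return "partly cloudy, 15°C, 60% humidity"

# Code as a Code Agent might emit it (hard-coded here instead of LLM-generated).
generated_code = '''
result = get_weather("New York")
final_answer = f"The current weather in New York is: {result}"
print(final_answer)
'''

def run_generated_code(code: str, tools: dict) -> tuple[dict, str]:
    """Execute code with the given tools in scope and capture what it prints."""
    namespace = dict(tools)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # NOT a sandbox: never run untrusted code like this
    return namespace, buffer.getvalue()

namespace, printed = run_generated_code(generated_code, {"get_weather": get_weather})
observation = namespace["final_answer"]
print(observation)
```

Compared with a JSON action, the code can call several tools, store intermediate results in variables, and use ordinary control flow in a single step.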


## Observe
- Observations are **how an Agent perceives the consequences of its actions.**

In the observation phase, the agent:

- Collects Feedback: Receives data or confirmation that its action was successful (or not).
- Appends Results: Integrates the new information into its existing context, effectively updating its memory.
- Adapts its Strategy: Uses this updated context to refine subsequent thoughts and actions.

For example, if a weather API returns the data “partly cloudy, 15°C, 60% humidity”, this observation is appended to the agent’s memory (at the end of the prompt).
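
A small sketch of the observation phase, assuming the agent's memory is a plain list of strings that is joined into the prompt. The `observe` helper and the `get_weather` tool are hypothetical; the point is that failures are fed back to the model just like successes.

```python
def observe(tool, **kwargs) -> str:
    """Run a tool and turn its outcome, success or failure, into observation text."""
    try:
        return f"Observation: {tool(**kwargs)}"
    except Exception as error:  # failures are feedback too
        return f"Observation: Error: {error}"

def get_weather(location: str) -> str:
    # Hypothetical tool: knows only one city and raises for anything else.
    known = {"New York": "partly cloudy, 15°C, 60% humidity"}
    if location not in known:
        raise ValueError(f"no weather data for {location}")
    return known[location]

memory = ["User: What's the current weather in New York?"]
memory.append(observe(get_weather, location="New York"))  # success is appended
memory.append(observe(get_weather, location="Atlantis"))  # so is the failure
prompt = "\n".join(memory)
print(prompt)
```

Seeing the error text in its context gives the model a chance to adapt its strategy, for example by retrying with a different argument.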