In [3]:
# MIT License
#
# @title Copyright (c) 2025 Mauricio Tec { display-mode: "form" }

# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.


# Welcome to the HDSI Winter Workshop on LLMs as Autonomous Agents


<img src="https://drive.google.com/uc?export=view&id=1q4SGPmn6sWQhskt4D-1D09q_6C9FDz_L" alt="drawing" width="400"/>


# **Part I: Introduction to Agentic Frameworks**

<a target="_blank" href="https://colab.research.google.com/github/mauriciogtec/hdsi-winter-workshop/blob/main/llm-agents-part1.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Expected completion time: 1 hour


## March 7, 2025  <br> Mauricio Tec



**TL;DR** This interactive tutorial introduces the key conceptual framework of LLM agentic systems and provides hands-on experience with techniques such as Chain of Thought and ReAct. Through multiple examples, it demonstrates both the use of the full-featured `smolagents` agentic library and implementing agents from scratch using basic LLM completion functionality.

<img src="https://drive.google.com/uc?export=view&id=1Agfj2lsK155vzmvG6vB4dScH7RqUWV9B" alt="drawing" width="400"/>

<!-- <img src="https://drive.google.com/uc?export=view&id=11o2zAv2_Cu8BL-FVdoRY8z5IEruO3ElZ" alt="drawing" width="400"/> -->

<!-- https://drive.google.com/file/d/11o2zAv2_Cu8BL-FVdoRY8z5IEruO3ElZ/view?usp=sharing -->


See also:

* [Next (Part II): Grounding Agents with Fine-tuning and RL](https://colab.research.google.com/github/mauriciogtec/hdsi-winter-workshop/blob/main/llm-agents-part2.ipynb)
* [Previous (Pre-assignment): Setup LLM Access & API Keys](https://colab.research.google.com/github/mauriciogtec/hdsi-winter-workshop/blob/main/pre-assignment.ipynb)



## Getting Started: Software Prerequisites & Setup


### Utility Function: Markdown Printing

Before proceeding, we define a very simple utility function to print Markdown nicely in a Colab notebook environment. It is not strictly needed, but it makes the outputs easier to read.


In [4]:
from IPython.display import Markdown, display

def printmd(string):
    display(Markdown(string))

test = "`This is code`. *This is italics*. **This is bold**."
printmd(test)

`This is code`. *This is italics*. **This is bold**.

### Package Requirements


We will be using the recent `smolagents` library for demonstrating advanced LLM agentic usage. `smolagents` was released only a few months ago by `HuggingFace`. It is designed to be extremely lightweight, yet powerful.

ℹ️ Our goal is not only to use it, but to understand its underpinnings. We will do so by coding our own version of its functionality from scratch using only LLM completion.

📚 We will install `smolagents` with the option `[all]` which also installs `litellm` and other libraries we need.

⚠️ You might receive an error caused by the pandas and Colab versions. You can
safely ignore it.

<img src="https://camo.githubusercontent.com/c6efa99360afde7cf829dff3cad81e56573658c1843464dff1fbb30a8f63b082/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f68756767696e67666163652f646f63756d656e746174696f6e2d696d616765732f7265736f6c76652f6d61696e2f736d6f6c6167656e74732f736d6f6c6167656e74732e706e67" alt="drawing" width="300"/>



In [5]:
%pip install -q -U smolagents[all]

### LLM Setup

Most LLMs today adhere to the OpenAI conversational standard. For this tutorial we will use 🚅 `litellm` with ChatGPT models.


Since `litellm` provides a common interface based on the OpenAI standard for many LLM models and providers, you can easily switch to your preferred LLM provider (e.g., Groq, HFApi, Bedrock).

We will assume that you already have an API key set up correctly in the Jupyter notebook. Make sure the appropriate key is added to the Google Colab secrets, or set as an environment variable if running locally.

I will be using `gpt-4o-mini` for all demos, but feel free to use other LLMs by simply changing the LLM model parameter below.

ℹ️ Groq has a free tier, but its token rate limits will create problems. Running this notebook costs only a few cents, but many of the providers require a pro account.
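
If you run the notebook outside Colab, `google.colab` cannot be imported. The sketch below is one possible way to read a key from the Colab secrets when they are available and fall back to an environment variable otherwise. The helper name `get_api_key` is hypothetical; it is not part of `litellm` or Colab.

```python
import os


def get_api_key(name: str) -> str:
    """Return the secret `name` from Colab secrets, else from the environment."""
    try:
        # Only importable inside Google Colab
        from google.colab import userdata

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not in Colab, or the secret is missing / not shared with this notebook
        pass

    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} not found in Colab secrets or environment variables")
    return value


# Example (assumes the key was configured during the pre-assignment):
# os.environ["OPENAI_API_KEY"] = get_api_key("OPENAI_API_KEY")
```

Failing early with a clear message is a design choice here: a missing key otherwise only surfaces later as an authentication error from the provider.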

In [6]:
# @title API Keys and Test LLM Call
# Ignore if running locally and API keys are in environment

import os
from google.colab import userdata
import litellm

# Retrieve open AI key from Colab secrets
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Example for other providers
# os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')
# os.environ["HUGGINGFACE_API_KEY"] = userdata.get('HUGGINGFACE_API_KEY')

# Default option: gpt
model_id = "gpt-4o-mini" # @param {type: 'string'}

# Other possible models
# model_id = "groq/llama-3.3-70b-versatile"
# model_id = "huggingface/meta-llama/Meta-Llama-3.3-70B-Instruct"

prompt = """
Explain the standard openai chat API for llms based on
a JSON list of messages.  Why is this API so widely used?
Your answer should be short.
Include an example with five entries in a chat history.
Explain the roles (system, user, assistant) and content.
Do we always need the system prompt?
"""

messages=[
    {"role": "system", "content": "You are an assistant that loves emojis in every response."},
    {"role": "user", "content": prompt}
]

response = litellm.completion(messages=messages, model=model_id)

printmd(response.choices[0].message.content)

The OpenAI Chat API for LLMs (Large Language Models) uses a structured format to handle conversations through a JSON list of messages. This format allows for clear management of dialogue by distinguishing roles and providing context for responses. 💬

### Why is this API so widely used?

1. **Simplicity**: Easy to implement and integrate with applications. 🛠️
2. **Clarity**: Clear message structure helps manage dialogue states. 📜
3. **Flexibility**: Supports various roles and interactions. 🎭

### Example of a Chat History with Five Entries

```json
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "It's sunny with a high of 75°F! ☀️"},
    {"role": "user", "content": "Great! Any plans for the weekend?"},
    {"role": "assistant", "content": "I might catch up on some reading. What about you? 📚"}
]
```

### Roles Explained

- **System**: Provides initial instructions and context for the assistant. It sets the behavior and personality. 🏗️
- **User**: Represents the person interacting with the assistant, asking questions or providing information. 👤
- **Assistant**: The AI's responses to user queries, aimed at being informative and helpful. 🤖

### Do We Always Need the System Prompt?

No, it is not always necessary to include the system prompt. However, including it helps establish the assistant's behavior and can improve the relevance of responses. If a more casual or open-ended interaction is desired, it can be omitted. 🙌
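
As a plain-Python illustration of the message format described above, the sketch below builds a chat history turn by turn. The helper `add_message` is hypothetical and only validates the three roles mentioned here; provider APIs accept additional roles and fields.

```python
VALID_ROLES = {"system", "user", "assistant"}


def add_message(history: list, role: str, content: str) -> list:
    """Return a new history with one more message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return history + [{"role": role, "content": content}]


history = []
history = add_message(history, "system", "You are a helpful assistant.")
history = add_message(history, "user", "What is an LLM agent?")
# In a real call, the assistant content would come from the model's response
history = add_message(history, "assistant", "A program where an LLM controls the flow.")
history = add_message(history, "user", "Give me one example.")

print([m["role"] for m in history])
```

The whole list is sent on every call, which is why the history itself can serve as a simple form of memory.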

#### Test Agentic Framework: `smolagents`

Let's test the `smolagents` framework that we will use as an example of a full agentic pipeline.


In [7]:
import smolagents

# Setup LLM model as object using LiteLLM wrapper
model = smolagents.LiteLLMModel(model_id=model_id)
agent = smolagents.CodeAgent(model=model, tools=[], add_base_tools=True)
agent.run(task="What is the Billboard top song right now?")

"The current Billboard Hot 100 top song is 'Luther' by Kendrick Lamar & SZA. Other songs in the top 10 were not fully extracted."

_________

# I. Introduction to LLM Agents


* LLM agents are programs in which an LLM controls the flow of the program to solve a task. You can also think of agentic LLMs as those that can *act* in an interactive environment [(Sumers et al. 2024)](https://arxiv.org/pdf/2309.02427v3).

* Examples include autonomous robots, digital assistants, recommendation systems, video game NPCs, web crawlers, etc.

* Even a simple Q&A task can be approached in an *agentic* way by breaking the solution down into multiple sequential steps, each of which can use tools or produce thoughts.

* Design patterns of LLM agents include:

  * Reasoning
  * Tools
  * Memory
  * Planning

* We will explore these topics and see them in action in this part of the tutorial.


<!--
<img src="https://drive.google.com/uc?export=view&id=1SnyymyuwCdj_kFKTx8EQZXXXAPNYaJ9z" alt="drawing" width="125"/> -->



<img src="https://drive.google.com/uc?export=view&id=1en61QPhrx5TcEbCfK_RxmGqOf6O_qySl" alt="drawing" width="250"/>


### Philosophy of this Tutorial

We aim for a balance between using the current agentic AI stack and understanding its core principles. Sometimes we will use existing abstractions, but we will re-implement some of them using open-source on-device LLMs from HuggingFace.



________________


## Agentic Workflow and Sequential Decision Making

* What is common in practically all agentic frameworks is the notion of autonomous sequential decision making.

* Sequential decision making in a nutshell:
    * We start with an initial observation $O_0$, which would typically include information about the task to solve. The agent must take an action $A_0$ based on $O_0$.
    * In non-agentic frameworks, $A_0$ would be the final answer to the query.
    * In an agentic framework, at each time $t\in\{0,\ldots, H-1\}$, the agent must choose an action $A_t$ based on the observation history $(O_0, A_0, \ldots, O_{t-1}, A_{t-1}, O_t)$. The process continues until a final answer is found or an external environment sends a completion signal.
    * In modern LLM agentic settings, the observation history is sometimes called the *memory*, although the memory can include additional sources as well.

* The agent's actions can be external or internal.
  *  Internal actions typically consist of *thoughts*, aka, *reasoning steps*. For example, an agentic framework may start with a special prompt about *planning* how to tackle a problem over the next few steps. The plan can be revised after every few observations.
  *  External actions can include intermediate steps to solve a problem, such as *calls to tools* like querying a database, searching the web, etc.
  *  Some agents are also embedded in an actual *external* interactive environment which sends a *reward* signal.

* **Main takeaways:**
  * **An agentic approach implies solving a problem in multiple steps involving reasoning, tool calls, environment interactions, etc.**
  * **A non-sequential problem can be turned into a sequential problem by applying the agentic philosophy of breaking it into smaller substeps.**
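
The sequential decision-making loop described above can be sketched without any LLM. In this toy example, the names `run_episode`, `make_env`, and `bisect_policy` are illustrative assumptions: the environment hides a number, and the policy chooses each action from the full history of observations and actions seen so far.

```python
def make_env(secret: int):
    """Toy environment: answers each guess with 'higher', 'lower', or 'correct'."""
    def env_step(action: int):
        if action == secret:
            return "correct", True
        return ("higher" if action < secret else "lower"), False
    return env_step


def bisect_policy(history: list) -> int:
    """Choose the next guess from the whole history [O_0, A_0, O_1, A_1, ...]."""
    low, high = 1, 100
    # Actions sit at odd indices; the observation that follows each one comes next
    for guess, feedback in zip(history[1::2], history[2::2]):
        if feedback == "higher":
            low = guess + 1
        elif feedback == "lower":
            high = guess - 1
    return (low + high) // 2


def run_episode(policy, env_step, first_obs, horizon: int = 10) -> list:
    history = [first_obs]                  # O_0 describes the task
    for _ in range(horizon):
        action = policy(history)           # the action depends on everything seen so far
        observation, done = env_step(action)
        history += [action, observation]   # the memory grows by one action and one observation
        if done:                           # completion signal from the environment
            break
    return history


history = run_episode(bisect_policy, make_env(secret=37), "Guess a number in 1..100")
print(history)
```

In an LLM agent, `policy` would be a completion call whose prompt is built from `history`, and `env_step` would execute a tool call or forward the action to an external environment.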

<!--   
  * For example, consider the following scenario:
    1. A human asks a robot to fetch an object from the kitchen
    2. The robot must process the query. First, it must decide whether it understood the query, or ask a follow up question. It can start an internal reasoning process to decide whether to ask a follow up question or begin retrieving the object. These decision steps are internal. The follow up question would be an external action.
    3. The robot must plan a route to fetch the object. That is a sequence of internal actions.


* The mapping from observation to action, $O_{\leq t} \mapsto A_t$, is called the *agent's policy*. Typically, the goal is to maximize a *reward* $R_t$. For example, the reward may be an indicator of whether a goal has been reached. More generally, it is a measure of how well the agent is performing a task.
 -->


<img src="https://drive.google.com/uc?export=view&id=1cbcIMaEJeXZGXAJL7KRWPtUd87_UDopV" alt="drawing" height="125"/>





## Agentic Examples


Before diving into the building blocks and mechanisms, let us review a few examples of agentic pipelines to get a first feel for the concepts above.

In [8]:
# @title ArXiv Paper Finder
from smolagents import (
    CodeAgent,
    LiteLLMModel,
    UserInputTool,
    DuckDuckGoSearchTool,
    VisitWebpageTool,
    tool
)


task = """
Your task is to help find research papers on arxiv related to a topic.
Start by asking the user the topic. Your final answer must contain the links to
 the papers and an arxiv citation with the authors and year.
Include only the three most relevant papers.
"""


@tool
def arxiv_search(query: str, num_results: int = 10) -> list[dict[str, str]]:
    """
    Search arXiv for research papers and return titles, abstracts, and links.

    Args:
        query: The search term.
        num_results: Number of results to fetch.

    Returns:
        list: A list of dictionaries containing title, abstract, and link.
    """
    import requests
    from xml.etree import ElementTree
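    # Let requests URL-encode the query (topics often contain spaces)
    url = "http://export.arxiv.org/api/query"
    params = {"search_query": query, "max_results": num_results}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    root = ElementTree.fromstring(response.content)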

    results = []
    for entry in root.findall("{http://www.w3.org/2005/Atom}entry"):
        title = entry.find("{http://www.w3.org/2005/Atom}title").text.strip()
        abstract = entry.find("{http://www.w3.org/2005/Atom}summary").text.strip()
        link = entry.find("{http://www.w3.org/2005/Atom}id").text.strip()

        results.append({"title": title, "abstract": abstract, "link": link})

    return results


# == List all the desired tools
tools = [
    UserInputTool(),
    VisitWebpageTool(),
    arxiv_search,
]

# == Initialize OpenAI Model class wrapper
model = LiteLLMModel(
    model_id="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# == Initialize Agent
agent = CodeAgent(model=model, tools=tools, planning_interval=2)


# == Start agent loop
result = agent.run(task)
print(result)

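
To see what the parsing step inside `arxiv_search` does without calling the network, the sketch below runs the same `ElementTree` logic on a small hand-written Atom feed. The feed content is made up for illustration and is not a real arXiv response.

```python
from xml.etree import ElementTree

# Atom XML namespace, written in ElementTree's "{uri}tag" notation
ATOM = "{http://www.w3.org/2005/Atom}"

# Hand-written sample in the Atom format (illustrative, not real arXiv data)
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://example.org/abs/0000.00001</id>
    <title>  A Placeholder Title  </title>
    <summary>A placeholder abstract.</summary>
  </entry>
</feed>
"""

root = ElementTree.fromstring(sample_feed)

results = []
for entry in root.findall(f"{ATOM}entry"):
    results.append({
        "title": entry.find(f"{ATOM}title").text.strip(),
        "abstract": entry.find(f"{ATOM}summary").text.strip(),
        "link": entry.find(f"{ATOM}id").text.strip(),
    })

print(results)
```

Every tag must be qualified with the namespace; searching for a bare `entry` would return nothing.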

In [None]:
# @title Coding Project Template Creator based on Github Trending Repos

from smolagents import (
    CodeAgent,
    LiteLLMModel,
    UserInputTool,
    VisitWebpageTool,
    tool
)

# == Setup model

# model=LiteLLMModel(
#     model_id="groq/llama-3.3-70b-versatile",
#     api_base="https://api.groq.com/openai/v1",
#     api_key=os.getenv("GROQ_API_KEY"),
# )

model = LiteLLMModel(
    model_id="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# == Setup tools

@tool
def github_trending_repos(topic: str, num_results: int=20) -> list[dict[str, str]]:
    """
    Search GitHub for trending repositories based on a topic.

    Args:
        topic: The search topic (e.g., "machine learning").
        num_results: Number of results to fetch.

    Returns:
        list: A list of dictionaries containing repository name, description, stars, and URL.
    """
    import requests
    url = f"https://api.github.com/search/repositories?q={topic}+sort:stars&per_page={num_results}"
    headers = {"Accept": "application/vnd.github.v3+json"}

    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        # Raise instead of returning a string so the return type stays a list
        raise RuntimeError(f"Unable to fetch results (status code {response.status_code})")

    data = response.json()
    results = []

    for repo in data.get("items", []):
        results.append({
            "name": repo["full_name"],
            "description": repo["description"],
            "stars": repo["stargazers_count"],
            "url": repo["html_url"]
        })

    return results


@tool
def write_to_file(path: str, content: str) -> None:
    """
    Write content to a file. Needed for safety since the agent's code is not
    allowed to execute unauthorized functions.

    Args:
        path: The name of the path/file to write to. If the path ends with /
              it is assumed to be a directory.
        content: The content to write to the file.
    """
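    import os
    # A trailing slash means "create this directory" (content is ignored)
    if path.endswith("/"):
        os.makedirs(path, exist_ok=True)
        return
    # os.makedirs("") raises, so only create a parent directory if there is one
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Append mode: repeated calls add to the file rather than overwrite it
    with open(path, "a") as f:
        f.write(content)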


tools=[
    UserInputTool(),
    VisitWebpageTool(),
    github_trending_repos,
    write_to_file,
]

# == Create Agent
agent = CodeAgent(
    model=model,
    tools=tools,
    additional_authorized_imports=["os", "shutil", "random"],
    planning_interval=4,
    max_steps=10,

)

# == Run Task

task = """
Your task is to help design a new coding project template based on existing high-quality
trending repositories.

Here are instructions:
* Ask the user for the overall topic.
* Search the Github trending repositories for the selected topic / requirements
  Choose a repository based on code quality and Github stars to use as blueprint.
* Navigate to the selected repository to inspect the folder and file structure.
  Use it as a reference for the new project template. Create the new project template
  in a directory called `template_xxxx` where xxxx is a random number.
  Use the write_to_file tool instead of the open function for safety.
  The files should contain templates of scripts, notebooks, configs, etc., as needed by the project.
* Include a pip requirements file with packages based on the selected repo and topic.
* Create a README.md that describes all the file structure (use a nice diagram),
  installation instructions, dependencies, example run code, configs, etc.
  The overall readme must look attractive, use emojis.
  At the end of the readme mention the Github repository used as reference, and a statement
  that all the contents have been autogenerated by a `smolagents` LLM agent.
* Your final answer should be a dictionary with keys 'readme' and 'path' as in
  `final_answer({'readme': '...', 'path': '...'})`.
  The value of the readme should be the content of README.md.
  The value of path is the location where project was created.
"""

result = agent.run(task)

# == Print results
printmd("### README")
printmd(result['readme'])

printmd(f"### Verify file structure at {result['path']}")
%ls -lah {result['path']}

In [None]:
# @title The Robo-Wrangler: A Data Wrangler Assistant

import pandas as pd
from smolagents import (
    CodeAgent,
    LiteLLMModel,
)


# Customers DataFrame
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'customer_name': ['John Doe', 'Jane Smith', 'Emily Johnson', 'Michael Brown']
})

# Orders DataFrame
orders = pd.DataFrame({
    'OrderID': [1001, 1002, 1003],
    'CustomerID': [1, 2, 1],
    'OrderAmount': [250, 500, 150]
})


agent = CodeAgent(
    model=model,
    tools=[],
    planning_interval=2,
    additional_authorized_imports=["pandas", "matplotlib"],

)


task = """
Assist with data wrangling and visualizations. Your solution must be based on
the two provided datasets: 'customers' and 'orders'.
You must inspect these datasets to understand how to solve the task.
Task: create a barchart of order amounts by customer name. If a customer has no
orders, include zero in the chart.
Show the plot and save it to demo_barchart.png.
"""

agent.run(
    task,
    additional_args={
        "customers": customers,
        "orders": orders,
    }
)


# II. Understanding the Core

We will now understand the principles behind these amazing capabilities.


## Learning Goals


1. We will explore prompting for reasoning, such as chain-of-thought (CoT) [Wei et al., 2022](https://arxiv.org/abs/2201.11903).
2. We will define a minimal mathematical framework for interactive environments with actions, tools, and reasoning.
3. We will study the `ReAct` framework [(Yao et al. 2022)](https://arxiv.org/abs/2210.03629), which is the most widely used prompting technique to combine actions with reasoning.
4. We will briefly discuss structures for agent memory and their correspondence to retrieval-augmented generation (RAG) [Lewis et al., 2020](https://arxiv.org/abs/2005.11401) techniques.
5. We will compare tool-calling and code-calling agents.

Throughout, we will only assume access to an LLM completion API. The rest we will 'build from scratch' to mimic the behavior of complex systems such as `smolagents`.





## II.1 Reasoning from Chain of Thought

LLMs are trained for next word/token prediction. As a result, they can fail at very simple tasks that require multiple steps to solve, particularly numeric computations. This problem is worse with smaller LLMs.

One simple solution is to use chain of thought (CoT). CoT will be our first example of a *reasoning* technique. It will also be the simplest example of a *memory* model, in which the memory is simply the agent's previous outputs.


<figure>
<img src="https://drive.google.com/uc?export=view&id=16S6PVq2oDmTwCeuOolQ56GPDTTJdtwJ2" alt="drawing" height="300"/>
<figcaption>Fig. Example from <a href='https://arxiv.org/pdf/2201.11903'>Wei et al. (2022)</a></figcaption>
</figure>

**Why bother about CoT at all?**

Well, smaller LLMs can outperform more expensive ones by simply applying this technique.

<figure>
<img src="https://drive.google.com/uc?export=view&id=1AA-uPPxH5wv7emr4tR41pCG7bhfvmljW" alt="drawing" height="200" width="700"/>
<figcaption>Fig. Benchmarks from <a href='https://arxiv.org/pdf/2201.11903'>Wei et al. (2022)</a></figcaption>
</figure>


In [None]:
# @title ChatGPT can't reason

# Modern LLMs are bad at counting. Here is a simple example, which would be easy
# for humans thanks to the visual grouping.
# https://medium.com/@konstantine_45825/gpt-4-cant-reason-2eab795e2523

problem = """
How many times is p negated in the following formula:
~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~ p
"""

# First test pure LLM completion

result = litellm.completion(
    messages=[{"role": "user", "content": problem}], model=model_id
)
printmd("### Completion-only Solution")
printmd(result.choices[0].message.content)


⚠️ The correct answer is 32.
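
For reference, counting the `~` characters in the formula confirms this answer:

```python
formula = "~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~ p"

# Each "~" is one negation; the spaces only group them visually
negations = formula.count("~")
print(negations)  # 32
```

Since the number of negations is even, the formula is logically equivalent to `p`.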

In [None]:
# @title Chain of Thought: ChatGPT Can't Reason in a Counting Problem

# Here is a problem prompt for LLMs from https://medium.com/@konstantine_45825/gpt-4-cant-reason-2eab795e2523


def chain_of_thought_loop(problem, model_id, max_steps: int = 10):
    """Implement a simple chain of thought loop."""

    cot_instructions = problem + (
        "\n\n* Break down the problem in simple steps until you find the solution.\n"
        "* Begin each step with the tag '[Thought]'. Each step should be a strategy or a simple computation.\n"
        "* Once found, indicate the solution with a new line starting with the tag '[Final Answer]'.\n\n"
    )
    step = 0

    memory = [cot_instructions]
    while step < max_steps:
        step += 1

        # Make prompt from instructions and memory
        prompt = "\n".join(memory)

        # Get response
        response = litellm.completion(model_id, [{"role": "user", "content": prompt}])
        obs = response.choices[0].message.content

        # Add obs to memory (here the memory is just the accumulated responses)
        memory.append(obs)

        # Return if complete
        if "Final Answer" in obs:
            return "\n".join(memory)

    # The loop ended without a final answer
    print("Warning: Maximum number of steps reached.")

    return "\n".join(memory)

conversation = chain_of_thought_loop(problem, model_id)

printmd("### Chain-of-thought Solution")
printmd(conversation)

⚠️ In practice, just asking the LLM to solve the problem step by step gives a better implementation of CoT than the actual loop above. However, having implemented the loop will be a good foundation for the more sophisticated agentic frameworks.
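
The single-prompt variant mentioned above can be sketched as follows. The helpers `build_cot_messages` and `extract_final_answer` are hypothetical names introduced here; the tag convention matches the loop above, and the example response is hand-written rather than real model output.

```python
from typing import Optional


def build_cot_messages(problem: str) -> list:
    """Build a one-shot chain-of-thought prompt in the chat message format."""
    instructions = (
        "Solve the problem step by step. "
        "Finish with a line starting with the tag '[Final Answer]'."
    )
    return [{"role": "user", "content": f"{problem.strip()}\n\n{instructions}"}]


def extract_final_answer(response_text: str) -> Optional[str]:
    """Return the text after the last '[Final Answer]' tag, or None if absent."""
    tag = "[Final Answer]"
    if tag not in response_text:
        return None
    return response_text.rsplit(tag, 1)[-1].strip()


messages = build_cot_messages("How many times is p negated in: ~~ ~ p")

# With an API key configured, the call would look like:
# text = litellm.completion(model=model_id, messages=messages).choices[0].message.content
text = "[Thought] There are two groups: 2 + 1 = 3.\n[Final Answer] 3"  # hand-written stand-in

print(extract_final_answer(text))  # 3
```

Parsing the tag explicitly makes it easy to compare the model's answer against a known solution when evaluating many problems.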

## II.2 ReAct: Thoughts + Actions

The ReAct framework is by far the most widely used agentic framework at the moment [(Yao et al. 2022)](https://arxiv.org/abs/2210.03629). The concept and implementation are actually surprisingly simple.

* The idea is to follow a loop very similar to the CoT example above. However, this time *the agent can call actions* at each step of the loop.

* Actions can be calls to tools or, more recently, executable code [(Wang et al., 2024)](https://arxiv.org/abs/2402.01030). The idea of *code as actions* is gaining traction and is a component of the `smolagents` examples we have used.

* Nonetheless, actions as JSON are still popular and easy to implement. In particular, modern LLMs have been specifically trained to call tools correctly as JSON [(Schick et al., 2023)](https://arxiv.org/abs/2302.04761). Therefore we will focus on this format first.
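
To make the two action formats concrete, the sketch below runs one action of each kind. The tool name `sqrt_tool` is hypothetical, the action strings are hand-written rather than produced by an LLM, and calling `exec` with an emptied `__builtins__` is only an illustration, not a secure sandbox.

```python
import json
import math


def sqrt_tool(x: float) -> float:
    """A toy tool the agent is allowed to call."""
    return math.sqrt(x)


# (1) JSON action: the LLM names a tool and its parameters; our code performs the call
json_action = '{"name": "sqrt_tool", "parameters": {"x": 16}}'
call = json.loads(json_action)
available_tools = {"sqrt_tool": sqrt_tool}
json_result = available_tools[call["name"]](**call["parameters"])

# (2) Code action: the LLM writes Python; a single step can combine several operations
code_action = "result = math.sqrt(16) + math.sqrt(9)"
namespace = {"math": math, "__builtins__": {}}  # expose only what the agent may use
exec(code_action, namespace)
code_result = namespace["result"]

print(json_result, code_result)  # 4.0 7.0
```

With JSON actions, the sum in (2) would take three separate tool calls, which is one reason code actions can need fewer steps.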


<figure>
<img src="https://drive.google.com/uc?export=view&id=1M9lMpO6rdbjvGNCfaPJNZugEwnemKoP_" alt="drawing"  width="400"/>
<figcaption>Fig. Flowchart of ReAct loop</figcaption>
</figure>


<figure>
<img src="https://drive.google.com/uc?export=view&id=1vgSZzz3jAFiqTWhMfZ2xmQ14YzB5W6ir" alt="drawing" height="600" width="700"/>
<figcaption>Fig. Example from <a href='https://arxiv.org/abs/2210.03629'>Yao et al. (2022)</a></figcaption>
</figure>

**Summary**

* **ReAct allows agents to use tools, which expands the universe of what is possible!**
* **Coding ReAct agents can perform many actions beyond simple tool calls.**
* **ReAct agents can be seen as the analogues of CoT for interactive environments.**

In [None]:
# @title ChatGPT is bad at arithmetic

# There are trivial arithmetical problems that not even CoT can solve.

import random
import math

num1 = random.randint(100000, 1000000)
num2 = random.randint(100000, 1000000)

problem = f"""
Multiply the square roots of {num1} and {num2}
"""
print(problem)

# First, pure LLM completion

base_solution = litellm.completion(
    model_id, [{"role": "user", "content": problem}]
).choices[0].message.content

printmd("### Completion-only Solution")
printmd(base_solution)

# Then apply the CoT loop
cot_solution = chain_of_thought_loop(problem, model_id)
printmd("### Chain-of-thought Solution")
printmd(cot_solution)

printmd("### Actual Solution")
printmd(f"{math.sqrt(num1) * math.sqrt(num2):.2f}")


### Implementing the ReAct Loop

Let's implement the simple version that can call JSON tools.

<figure>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif" alt="drawing"  width="1400"/>
<figcaption>Fig. Explanation of the ReAct loop in an arithmetic problem. Source: <a href='https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents'>smolagents documentation</a></figcaption>
</figure>




In [None]:
import math
import json

# @title Tool Calling Agent

# == Let's define a single mock calc_tool ==

def calc_tool(a: float, b: float | None = None, op: str = "add") -> float:
    # Try to parse a as float or raise an error
    try:
        a = float(a)
    except (TypeError, ValueError):
        raise ValueError(f"Invalid value for 'a': {a}. Must be float")

    if b is not None:
        try:
            b = float(b)
        except (TypeError, ValueError):
            raise ValueError(f"Invalid value for 'b': {b}. Must be float or null")

    if op in ["add", "subtract", "multiply", "divide"] and b is None:
        raise ValueError(f"Operation '{op}' requires both 'a' and 'b'.")

    if op == "add":
        return a + b
    elif op == "subtract":
        return a - b
    elif op == "multiply":
        return a * b
    elif op == "divide":
        if b == 0:
            raise ValueError("Cannot divide by zero.")
        return a / b
    elif op == "sqrt":
        if a < 0:
            raise ValueError("Cannot compute square root of a negative number.")
        return math.sqrt(a)
    elif op == "square":
        return a ** 2
    else:
        raise ValueError(f"Invalid operation: {op}")


# Define function specifications (Llama 3 compliant, for illustration).
function_definitions = """
### Function Definitions

[
    {
        "name": "calc_tool",
        "description": "Performs basic arithmetic operations such as addition, subtraction, multiplication, division, square, and square root.",
        "parameters": {
            "type": "object",
            "required": ["a", "op"],
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first operand. Always required."
                },
                "b": {
                    "type": "number",
                    "description": "The second operand, required for binary operations (add, subtract, multiply, divide).",
                    "default": null
                },
                "op": {
                    "type": "string",
                    "description": "The operation to perform. One of ['add', 'subtract', 'multiply', 'divide', 'sqrt', 'square'].",
                    "enum": ["add", "subtract", "multiply", "divide", "sqrt", "square"]
                }
            }
        }
    }
]

For example, to use calc_tool you can call the function as
{"name": "calc_tool", "parameters": {"a": 1, "b": 2, "op": "add"}}
"""

def parse_tool(x: str) -> dict:
    """Utility function to parse JSON.

    - Remove the tag [Tool]
    - Removes trailing spaces and lines
    - Removes ``` and ```json
    """
    x = x.split("[Tool]")[-1].strip()
    x = x.replace("```json", "").replace("```", "")
    return json.loads(x)


# ReAct Reasoning Loop
def react_reasoning_with_tools(problem, model_id, function_definitions, max_steps: int = 10):
    """
    Implements a ReAct (Reasoning + Acting) loop using structured function calling.
    """

    react_instructions = (
        "\n\n* Break down the problem in simple steps until you find the solution.\n"
        "* Your response must consist only of a [Thought], [Tool] or [Final Answer] step.\n"
        "* For a thought step, begin your answer with the token '[Thought]' followed by a short statement about the problem.\n"
        "* For a tool step, you may use exactly one tool, starting with the token '[Tool]', "
        "followed by the tool call in the JSON format from the function definitions.\n"
        "* In your answer you may only provide one thought or tool step.\n"
        "* Once found, indicate the solution with a new line starting with the tag '[Final Answer]'.\n\n"
        f"Below are the available tools: {function_definitions}\n\n"
    )

    step = 0
    memory = [problem]

    while step < max_steps:
        step += 1

        # Construct prompt from memory
        prompt = "\n".join(memory)

        # Get response from LLM
        response = litellm.completion(
            model_id,
            [{"role": "system", "content": react_instructions}, {"role": "user", "content": prompt}]
        )
        obs = response.choices[0].message.content

        # Add observation to memory
        memory.append(obs)

        # Check for tool use
        if "[Tool]" in obs:
            try:
                # Parse inside the try block so malformed JSON is reported back to the LLM
                tool_call = parse_tool(obs)
                if tool_call.get("name") == "calc_tool":
                    params = tool_call["parameters"]
                    result = calc_tool(**params)
                    memory.append(f"[Tool Result] {result}")

                else:
                    raise ValueError(f"Unknown tool: {tool_call.get('name')}")

            except Exception as e:
                memory.append(f"[Error] {e}. \nTry correcting it.")

        # Stop if final answer is found
        if "[Final Answer]" in obs:
            return "\n\n".join(memory)

    # The loop ended without a final answer
    memory.append("Warning: Maximum number of steps reached.")

    return "\n\n".join(memory)


printmd("### ReAct Solution with Calculator 📲 😀")

react_tools_solution = react_reasoning_with_tools(
    problem, model_id, function_definitions, max_steps=10
)
printmd(react_tools_solution)


printmd("### Actual Solution 🙏")
printmd(f"{math.sqrt(num1) * math.sqrt(num2):.2f}")

Let's now see if we can simplify the approach with a coding ReAct agent.

In [None]:
# @title Code-as-action Agent

import math
import json
import litellm  # Assuming litellm is used for LLM calls


tool_definitions = """
You must **only** use the `math` module for calculations.
"""

def parse_code(x: str) -> str:
    """Utility function to extract the code from an LLM response.

    - Removes the tag [Code] and anything before it
    - Removes ``` and ```python fences
    - Removes leading colons, spaces and newlines, and trailing whitespace
    """
    x = x.split("[Code]")[-1].strip()
    x = x.replace("```python", "").replace("```", "")
    return x.lstrip(": \n").rstrip()


# Code Execution Agent Loop
def code_execution_agent(problem, model_id, tool_definitions, max_steps=10):
    """
    Implements a Code Execution Agent that generates and runs Python math code.
    """

    code_agent_instructions = (
        "\n\n* Break down the problem in simple easy steps until you find the solution.\n"
        "* Your response must consist only of a thought or code:\n"
        "  - [Thought]: If a thought, start your answer with the tag [Thought] followed by a short reasoning statement about the problem or previous results.\n"
        "  - [Code]: If code, start your answer with the tag [Code] followed by self-contained Python code. You can only use one of the available tools or control flow.\n"
        "* Make sure that your code saves the result in a variable called `final_result` if it is the final answer.\n"
        "* You cannot combine thoughts and code in the same answer.\n\n"
        f"Below are the available tools: {tool_definitions}\n\n"
    )

    step = 0
    memory = [problem]

    while step < max_steps:
        step += 1

        # Construct prompt from memory
        prompt = "\n\n".join(memory)

        # Get response from LLM
        response = litellm.completion(
            model_id,
            [{"role": "system", "content": code_agent_instructions}, {"role": "user", "content": prompt}]
        )
        obs = response.choices[0].message.content.strip()


        # Handle code execution
        if "[Code]" in obs:
            code_block = parse_code(obs)

            # Append code to memory as a fenced block so it renders correctly
            memory.append(f"[Code]\n\n```python\n{code_block}\n```\n\n")

            # Execute code block
            try:
                local_vars = {}

                exec(code_block, {"math": math}, local_vars)

                memory.append(f"[Code Result] {local_vars}")

                # Intermediate steps may not define `final_result`, so avoid a KeyError
                final_result = local_vars.get("final_result")
                if final_result is not None:
                    memory.append(f"[Final Answer] {final_result}")
                    break

            except Exception as e:
                print(f"[Error] {e}")
                memory.append(f"[Error] {e}. \nTry correcting it.")

        else:
            # Append as is
            memory.append(obs)

    else:
        # Runs only when the loop ends without `break`, i.e. no final answer was found
        memory.append("Warning: Maximum number of steps reached.")

    return "\n\n".join(memory)


printmd("### ReAct Solution with Coding Agent 🤖 🦾")

code_agent_solution = code_execution_agent(problem, model_id, tool_definitions, max_steps=5)
printmd(code_agent_solution)


printmd("### Actual Solution 🙏")
printmd(f"{math.sqrt(num1) * math.sqrt(num2):.2f}")
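
The core mechanism of the code-as-action agent above is running a code string and reading back the variables it defined. Below is a minimal standalone sketch of that step. The helper name `run_snippet` is hypothetical and not part of the notebook; it only illustrates how `exec` fills the `local_vars` dictionary and why looking up `final_result` with `.get` is safer for intermediate steps.

```python
import math


def run_snippet(code: str) -> dict:
    """Run a code string with `math` exposed and return the variables it defined.

    Hypothetical helper for illustration only.
    """
    local_vars = {}
    # Globals contain only `math`; assignments in `code` land in `local_vars`
    exec(code, {"math": math}, local_vars)
    return local_vars


final_step = run_snippet("x = math.sqrt(16)\nfinal_result = x * 2")
print(final_step)  # {'x': 4.0, 'final_result': 8.0}

intermediate_step = run_snippet("y = 1")
print(intermediate_step.get("final_result"))  # None, so the agent keeps going
```

Note that `exec` still makes Python's builtins available by default, so passing `{"math": math}` limits the names the code starts with rather than what the code is able to do.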

## We have only scratched the surface...


We have seen only some of the most basic approaches to agentic workflows.


There are many ➡️ next steps ➡️:

* 💾 More sophisticated memory: in the examples, we simply use the history of thoughts, observations and actions as the agent's memory. For long sequences, we can instead retrieve only the relevant parts of the history, for example with a RAG agent. Remember, each token costs money.

* 🌲 Tree of thought: Many workflows employ a type of tree or graph search over thoughts. They can also evaluate self-consistency across replicated thoughts to choose which path to explore. This approach increases the number of LLM calls needed, but often improves performance greatly.

* 👯 So far, we have approached the problem in a single-agent way, but many agentic frameworks support multiple agents. A simple design is an orchestrator agent that uses other agents as tools, but there are many use cases and designs.

* 🤖 In the next part of the tutorial, we will cover how to improve an agent's performance with fine-tuning and reinforcement learning.
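
As an illustration of the self-consistency idea mentioned in the list above, here is a minimal sketch that samples several answers and keeps the most frequent one. The function `self_consistent_answer` and the scripted sampler are hypothetical stand-ins; in practice `sample_answer` would call the LLM with a non-zero temperature and extract its final answer.

```python
from collections import Counter


def self_consistent_answer(sample_answer, n_samples=5):
    """Draw several answers and return the majority answer with its vote share.

    `sample_answer` is any zero-argument callable returning one answer string
    (hypothetical interface, assumed for this sketch).
    """
    answers = [sample_answer() for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples


# Scripted stand-in for five independent LLM samples
scripted = iter(["12", "12", "11", "12", "13"])
best, agreement = self_consistent_answer(lambda: next(scripted), n_samples=5)
print(best, agreement)  # 12 0.6
```

The vote share gives a rough confidence signal: a low agreement suggests the reasoning paths disagree and the branch may deserve more exploration.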


<figure>
<img src="https://drive.google.com/uc?export=view&id=1tU2FRhTOPV0khzcB63dxAAqmHVym_HJ8" alt="drawing" width="800"/>
<figcaption>
<b>Fig</b>. Results of various agentic workflows on OpenAI's HumanEval benchmark, which measures an LLM's ability on coding tasks. In zero-shot mode, GPT-3.5 performs poorly and worse than GPT-4, but when equipped with an agentic framework it becomes much stronger and matches GPT-4. Figure reproduced from <a href='https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/'>this blog post by Andrew Ng</a>.
</figcaption>
</figure>


# Conclusion

### 🤗 What have we learned? 🤔

* 🪜 In an agentic framework, a problem is solved step by step.
* 🆘 LLMs are trained for text completion only. As a result, they struggle with simple operations such as counting or arithmetic, which next-token prediction training does not directly target.
* 🙇‍♂️ They can immediately solve more complex tasks by *thinking step by step*. We can implement this with the chain-of-thought prompting technique.
* 🛠️ By leveraging their ability to call tools (code or JSON), we can fill the gaps in their abilities. We can implement this with a simple ReAct loop, which underlies most agentic frameworks.
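
To recap the last point, the ReAct loop can be condensed into a few lines. The sketch below is not the notebook's implementation: it replaces the LLM with a scripted stand-in so it runs offline, and the names `react_loop`, `calc_sqrt` and `TOOLS` are hypothetical.

```python
import json
import math


def calc_sqrt(x: float) -> float:
    """Hypothetical tool: square root of a number."""
    return math.sqrt(x)


TOOLS = {"calc_sqrt": calc_sqrt}


def react_loop(problem, llm, tools, max_steps=5):
    """Alternate model steps and tool calls until a final answer appears."""
    memory = [problem]
    for _ in range(max_steps):
        obs = llm(memory)  # the model sees the whole history
        memory.append(obs)
        if obs.startswith("[Tool]"):
            try:
                call = json.loads(obs[len("[Tool]"):])
                result = tools[call["name"]](**call["parameters"])
                memory.append(f"[Tool Result] {result}")
            except Exception as e:
                # Feed the error back so the model can correct itself
                memory.append(f"[Error] {e}")
        elif obs.startswith("[Final Answer]"):
            break
    return memory


# Scripted stand-in for the LLM: replays fixed responses in order
script = iter([
    "[Thought] I need the square root of 144.",
    '[Tool] {"name": "calc_sqrt", "parameters": {"x": 144}}',
    "[Final Answer] 12.0",
])
trace = react_loop("What is the square root of 144?", lambda memory: next(script), TOOLS)
print("\n".join(trace))
```

Swapping the scripted callable for a function that sends `memory` to a real model gives the same structure as the agents built in this tutorial.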


<br>
<img src="https://drive.google.com/uc?export=view&id=1gA9lNXqJunfai38RS6DSRenuXKFysHW6" alt="drawing" width="500"/>
