# OpenTools Agents Demo with Retrieval + Image (ReAct, OctoTools, OpenTools)

This notebook shows how **ReAct**, **OctoTools**, and **OpenTools** use tools for a **single image-based task** that can also benefit from web retrieval.

**Task:** Use tools to look at the image in `docs/assets/image.jpg` and answer, in one or two short sentences, what color the dog is, what breed it is, and what it is lying next to.

**Table of contents**

1. Setup
2. ReAct with vs. without Tool Retrieval (All Tools Available)
3. ReAct with retrieval + image
4. OpenTools with retrieval + image


## 1. Setup

Run this once to configure imports and helper utilities.

### How these agents behave (with retrieval + image)

In this retrieval+image demo, each agent wraps the same core ideas as in `2_agent_running.ipynb`, but now they can also use tools like `Visual_AI_Tool` and web / retrieval tools:

- **ReAct**: Uses a Thought → Action (tool) → Observation loop, deciding when to call visual or retrieval tools and when to stop with a Final Answer.
- **OctoTools**: Builds an explicit plan, chooses tools, and writes intermediate notes into memory (you'll see sections like *Query Analysis*, *Action Prediction*, *Command Generation*, *Command Execution*, and *Context Verification* before the final answer).
- **OpenTools**: Breaks the overall question into **sub-problems**, assigns each to a specialized sub-agent (e.g. search-oriented), routes tool calls through a generator/executor, and then verifies and aggregates results via a verifier + global memory.
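
The Thought → Action → Observation loop described for ReAct can be sketched in a few lines. This is a toy illustration only, not the OpenTools implementation: `calculator`, `TOOLS`, `scripted_policy`, and `react_loop` are hypothetical names, and the scripted policy stands in for the LLM call that a real agent would make at each step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tool; the real agents load their tools from the OpenTools toolset.
def calculator(expression: str) -> str:
    """Evaluate a small arithmetic expression made of digits and + - * / ( )."""
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError(f"unsupported expression: {expression!r}")
    # Builtins are disabled and the character whitelist above limits the input.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}

# Each history entry is (thought, action, observation).
Step = Tuple[str, str, str]

def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not history:
        return ("I should compute this with a tool.", "calculator", question)
    last_observation = history[-1][2]
    return ("I have the result.", "final_answer", last_observation)

def react_loop(question: str, max_steps: int = 5) -> str:
    """Alternate Thought -> Action -> Observation until a final answer is produced."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, action_input = scripted_policy(question, history)
        if action == "final_answer":
            return action_input
        observation = TOOLS[action](action_input)  # run the chosen tool
        history.append((thought, action, observation))
    raise RuntimeError("no final answer within max_steps")

print(react_loop("10 * 10 + 5"))  # prints 105
```

The `max_steps` cap is a design choice in this sketch: it keeps a misbehaving policy from looping forever.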

When you scroll through the outputs:
- Treat the long logs as a **trace** of decisions and tool calls.
- The **final answer** is always clearly marked (and also stored in `direct_output` / `final_output` in the returned Python dict), so you can skim the trace or jump straight to the summary depending on what you care about.
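
To jump straight to the summary, a small helper can read the answer out of the returned dict. This is a minimal sketch that assumes only the two keys named above (`direct_output` and `final_output`); `extract_final_answer` is a hypothetical helper, not part of the library, and the real result dict may carry additional fields.

```python
from typing import Any, Dict, Optional

def extract_final_answer(result: Dict[str, Any]) -> Optional[str]:
    """Return the first non-empty answer field, preferring `direct_output`."""
    for key in ("direct_output", "final_output"):
        value = result.get(key)
        if value:
            return str(value)
    return None  # neither key was present or both were empty

# Illustrative dict only; a real one comes from `solver.solve(...)`.
example_result = {"direct_output": "105"}
print(extract_final_answer(example_result))  # prints 105
```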

In [1]:
import os
import json
import sys
sys.path.insert(0, "..")
sys.path.insert(0, "src")
from opentools import UnifiedSolver



## 2. ReAct with vs. without Tool Retrieval (All Tools Available)

ReAct alternates between **Thought → Action (tool) → Observation**. In this section we enable **all tools** and compare two settings:

- **(a) Tool retrieval disabled**: the agent sees the entire toolset context.
- **(b) Tool retrieval enabled**: the agent first retrieves a smaller, relevant subset of tools.

The goal is to demonstrate how **tool retrieval simplifies the agent's context** and reduces overall token usage while preserving correctness.
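
The idea behind tool retrieval can be illustrated with a small ranking function: score each tool description against the query and keep only the top matches. The sketch below deliberately swaps in a plain bag-of-words cosine similarity instead of the FAISS index used in the notebook, so it runs without extra dependencies. The tool descriptions, `Calculator_Tool`, and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative descriptions only; the real toolset ships its own metadata.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "Calculator_Tool": "evaluate arithmetic expressions and calculate numbers",
    "Wiki_Search_Tool": "search wikipedia articles for background facts",
    "Visual_AI_Tool": "describe and answer questions about an image",
    "Video_Processing_Tool": "extract frames and audio from a video file",
}

def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_tools(query: str, descriptions: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    query_vector = tokenize(query)
    ranked = sorted(
        descriptions,
        key=lambda name: cosine(query_vector, tokenize(descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]

print(retrieve_tools("calculate 10 * 10 + 5", TOOL_DESCRIPTIONS, top_k=1))
# prints ['Calculator_Tool']
```

Only the retrieved subset would then be placed in the agent's prompt, which is where the token savings come from.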

### a. Tool Retrieval **disabled**

In [7]:
react_solver = UnifiedSolver(
    agent_name="react",
    llm_engine_name="gpt-5-mini",
    verbose=True,
    enabled_tools=[
        "all"
    ],
    output_types="direct",
    enable_faiss_retrieval=False,
)

react_question = (
    "Use tool to solve this question, calculate 10 * 10 + 5"
)

react_result = react_solver.solve(
    question=react_question,
)

print("ReAct (all tools without any tool retrieval)", react_result)
print("\nDirect output:\n", react_result.get("direct_output"))


[14:37:17][ReAct][INFO] Initializing ReAct reasoning components 🧠...
[14:37:17][ReAct][INFO] ReAct reasoning components initialized successfully 🧠
[14:37:17][ReAct][INFO] Enable FAISS retrieval: False at ReAct
[14:37:17][ReAct][INFO] Enabled tools 🔧: ['all']
[14:37:17][ReAct][INFO] Initializing tool-based agent components...
[14:37:17][ReAct][INFO] Initializing tool capabilities...
Error loading module tool: cannot import name 'BrowserProfile' from 'browser_use' (/Users/hydang/miniconda3/envs/opentools_submit/lib/python3.11/site-packages/browser_use/__init__.py)
Please install ffmpeg to enable full audio processing capabilities.
Error loading module tool: invalid syntax (tool.py, line 192)
Please install ffmpeg to enable full audio processing capabilities.
[14:37:17][ReAct][INFO] Available tools that is successfully loaded 🔧: ['Generalist_Solution_Generator_Tool', 'Video_Processing_Tool', 'Calendar_Calculation_Tool'

### b. Tool Retrieval **enabled**

In [8]:
react_solver = UnifiedSolver(
    agent_name="react",
    llm_engine_name="gpt-5-mini",
    verbose=True,
    enabled_tools=[
        "all"
    ],
    output_types="direct",
    enable_faiss_retrieval=True,  # enable tool retrieval here
)

react_question = (
    "Use tool to solve this question, calculate 10 * 10 + 5"
)

react_result = react_solver.solve(
    question=react_question,
)

print("ReAct (all tools with tool retrieval)", react_result)
print("\nDirect output:\n", react_result.get("direct_output"))


[14:37:59][ReAct][INFO] Initializing ReAct reasoning components 🧠...
[14:37:59][ReAct][INFO] ReAct reasoning components initialized successfully 🧠
[14:37:59][ReAct][INFO] Enable FAISS retrieval: True at ReAct
[14:37:59][ReAct][INFO] Enabled tools 🔧: ['all']
[14:37:59][ReAct][INFO] Initializing tool-based agent components...
[14:37:59][ReAct][INFO] Initializing tool capabilities...
Error loading module tool: cannot import name 'BrowserProfile' from 'browser_use' (/Users/hydang/miniconda3/envs/opentools_submit/lib/python3.11/site-packages/browser_use/__init__.py)
Please install ffmpeg to enable full audio processing capabilities.
Error loading module tool: invalid syntax (tool.py, line 192)
Please install ffmpeg to enable full audio processing capabilities.
[14:38:00][ReAct][INFO] Available tools that is successfully loaded 🔧: ['Generalist_Solution_Generator_Tool', 'Video_Processing_Tool', 'Calendar_Calculation_Tool',

In both runs, ReAct returns the same correct answer (`105`), but the **token cost differs dramatically** depending on whether **tool retrieval** is enabled. With **tool retrieval enabled**, the agent uses **6,133 total tokens** versus **41,111 total tokens** without retrieval, about a **6.7× reduction** in total tokens (≈ **85% fewer**). This illustrates how tool retrieval can reduce prompt bloat when many tools are available, by selecting a smaller, more relevant tool subset for the agent.
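
The reduction figures follow directly from the two token totals reported above. The short check below reproduces the arithmetic; the variable names are illustrative.

```python
# Token totals reported for the two ReAct runs above.
tokens_without_retrieval = 41_111
tokens_with_retrieval = 6_133

# How many times smaller the retrieval-enabled run is.
reduction_factor = tokens_without_retrieval / tokens_with_retrieval

# Share of tokens saved relative to the run without retrieval.
percent_fewer = 100 * (1 - tokens_with_retrieval / tokens_without_retrieval)

print(f"{reduction_factor:.1f}x reduction, {percent_fewer:.0f}% fewer tokens")
# prints: 6.7x reduction, 85% fewer tokens
```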


## 3. ReAct with retrieval + image

ReAct alternates between **Thought → Action (tool) → Observation**. Here we
enable both **visual** and **retrieval** tools, but keep the task to a single,
simple question about the image.

In [2]:
react_solver = UnifiedSolver(
    agent_name="react",
    llm_engine_name="gpt-5-mini",
    verbose=True,
    enabled_tools=[
        "Visual_AI_Tool",
        "Search_Engine_Tool",
        "Wiki_Search_Tool",
        "URL_Text_Extractor_Tool",
    ],
    output_types="direct",
    enable_faiss_retrieval=True,
)

react_question = (
    "Look at the provided image and use tools as needed. "
    "Question: What color is the dog in the image, what breed is it, and what is the dog lying next to? "
    "Answer in one or two short sentences."
)

react_result = react_solver.solve(
    question=react_question,
    image_path=r"../assets/image.jpg",
)

print("ReAct (retrieval + image)", react_result)
print("\nDirect output:\n", react_result.get("direct_output"))


[14:29:00][ReAct][INFO] Initializing ReAct reasoning components 🧠...
[14:29:01][ReAct][INFO] ReAct reasoning components initialized successfully 🧠
[14:29:01][ReAct][INFO] Enable FAISS retrieval: True at ReAct
[14:29:01][ReAct][INFO] Enabled tools 🔧: ['Visual_AI_Tool', 'Search_Engine_Tool', 'Wiki_Search_Tool', 'URL_Text_Extractor_Tool']
[14:29:01][ReAct][INFO] Initializing tool-based agent components...
[14:29:01][ReAct][INFO] Initializing tool capabilities...
[14:29:03][ReAct][INFO] Available tools that is successfully loaded 🔧: ['Wiki_Search_Tool', 'Search_Engine_Tool', 'Visual_AI_Tool', 'URL_Text_Extractor_Tool']
[14:29:03][ReAct][INFO] Tool capabilities initialized successfully
Loaded 37 tools into FAISS index
[14:29:04][ReAct][INFO] FAISS tool retrieval enabled
[14:29:04][ReAct][INFO] Tool-based agent components initialized successfully
UnifiedSolver initialized with agent: ReAct
Agent description: Reasoning and Acting agent - alternates between thinking and tool usage
[14:29:04][ReAct][INFO] Received question: Look at the provided image and use tools as needed. Question: What color is the dog in the image, what breed is it, and what is the dog lying next to? Answer in one or two short sentences.
[14:29:04][ReAct][INFO] Received image: ../assets/image.jpg
Using reasoning model: gpt-5-mini
Expanded query: Look at the provided image and use tools as needed. Question: What color is t

## 4. OpenTools with retrieval + image

OpenTools uses a planner–generator–executor loop over tools. We configure it
with the same visual + retrieval tools and the same question about the image.

In [3]:
opentools_solver = UnifiedSolver(
    agent_name="opentools",
    llm_engine_name="gpt-5-mini",
    verbose=True,
    enabled_tools=[
        "Visual_AI_Tool",
        "Search_Engine_Tool",
        "Wiki_Search_Tool",
        "URL_Text_Extractor_Tool",
    ],
    output_types="direct",
    enable_faiss_retrieval=True,
)

opentools_question = (
    "Look at the provided image and use tools as needed. "
    "Question: What color is the dog in the image, what breed is it, and what is the dog lying next to? "
    "Answer in one or two short sentences."
)

opentools_result = opentools_solver.solve(
    question=opentools_question,
    image_path=r"../assets/image.jpg",
)

print("OpenTools (retrieval + image)", opentools_result)
print("\nDirect output:\n", opentools_result.get("direct_output"))


[14:30:16][OpenTools][INFO] Enabled tools 🔧: ['Visual_AI_Tool', 'Search_Engine_Tool', 'Wiki_Search_Tool', 'URL_Text_Extractor_Tool']
[14:30:16][OpenTools][INFO] Initializing tool-based agent components...
[14:30:16][OpenTools][INFO] Initializing tool capabilities...
[14:30:16][OpenTools][INFO] Available tools that is successfully loaded 🔧: ['Wiki_Search_Tool', 'Search_Engine_Tool', 'Visual_AI_Tool', 'URL_Text_Extractor_Tool']
[14:30:16][OpenTools][INFO] Tool capabilities initialized successfully
Loaded 37 tools into FAISS index
[14:30:16][OpenTools][INFO] FAISS tool retrieval enabled
[14:30:16][OpenTools][INFO] Tool-based agent components initialized successfully
UnifiedSolver initialized with agent: OpenTools
Agent description: OpenTools agent - uses tools to solve problems
[14:30:16][OpenTools][INFO] Received question: Look at the provided image and use tools as needed. Question: What color is the dog in the im