# Notebook 3 (Lecture 2) - Thinking vs Non-Thinking Agent



In this section we define and compare two agent variants that share the same ReAct tool set but differ in how the underlying LLM allocates cognitive effort.

- Non-Thinking (Standard) Model: generates the answer directly, with no separate deliberation phase before the visible output; minimal explicit reasoning; favors speed and cost.
- Thinking (Reasoning) Model: allocates extra internal computation (deliberation steps / structured scratchpad) before emitting the final answer; better for multi-step logic, composition, or fragile numeric reasoning.

### Objectives
1. Understand operational differences between a standard (direct-answer) model and a reasoning-enabled model.  
2. Observe how tool usage (ReAct loop) changes: fewer calls vs planned multi-step invocations.  
3. Learn decision criteria: when the extra latency/cost of reasoning yields higher reliability or clarity.  
4. Build intuition for error modes (hallucination vs arithmetic slip vs tool misuse).  

### Why This Matters
As you scale multi-agent systems, choosing when to pay for deeper reasoning is an optimization problem across latency, budget, and accuracy. A common production pattern is a cascading strategy: default to the fast model; escalate to a reasoning model only when signals (task complexity, multi-step arithmetic, ambiguity, failure retries) exceed a threshold.
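
The cascading strategy described above can be sketched as a small router. This is only an illustration under stated assumptions: the `TaskSignals` fields, the weights, and the threshold are hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Hypothetical signals a router might collect before choosing a model."""
    estimated_steps: int = 1            # rough count of dependent sub-steps
    has_chained_arithmetic: bool = False
    is_ambiguous: bool = False
    failed_attempts: int = 0            # retries already spent on the fast model


def escalation_score(s: TaskSignals) -> int:
    """Combine the signals into one integer score (weights are illustrative)."""
    score = max(s.estimated_steps - 1, 0)
    score += 2 if s.has_chained_arithmetic else 0
    score += 1 if s.is_ambiguous else 0
    score += 2 * s.failed_attempts
    return score


def choose_model(s: TaskSignals, threshold: int = 3) -> str:
    """Return "fast" by default, "reasoning" once the score reaches the threshold."""
    return "reasoning" if escalation_score(s) >= threshold else "fast"


# A simple lookup stays on the fast model.
print(choose_model(TaskSignals()))
# A three-step task with chained arithmetic is escalated.
print(choose_model(TaskSignals(estimated_steps=3, has_chained_arithmetic=True)))
# Two failed attempts on the fast model also trigger escalation.
print(choose_model(TaskSignals(failed_attempts=2)))
```

Keeping the score and the threshold separate makes it easy to log the score for every request and adjust the threshold later without touching the signal logic.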

### Conceptual Contrast
**Non-Thinking (Direct)**  
- Single-shot generation; little or no iterative self-reflection.  
- Lower latency, cheaper, more predictable with temperature=0.  
- Ideal for: factual lookup, formatting, simple arithmetic, routing, high-volume batch tasks.  
- Failure pattern: confidently skips required decomposition.

**Thinking (Reasoning-Enhanced)**  
- Allocates internal tokens to plan, decompose, and verify.  
- Better at multi-step numeric or conditional logic, combining tool outputs, resolving ambiguity.  
- Ideal for: chained calculations, comparisons, lightweight analytical explanation, structured transformation.  
- Failure pattern: unnecessary over-elaboration or marginal extra cost on trivial prompts.

### Evaluation Dimensions
| Dimension | Non-Thinking | Thinking |
|----------|--------------|----------|
| Latency | Faster | Slower |
| Cost (tokens/compute) | Lower | Higher |
| Reliability (simple tasks) | High | Similar |
| Multi-step correctness | Moderate | Higher |
| Interpretability | Low (no rationale) | Moderate (concise rationale) |
| Tool orchestration | Minimal | More deliberate sequencing |

### What You'll Build
You will instantiate two ReAct agents sharing the same tools (e.g., arithmetic + sequence generator) but backed by different model configurations:
- Non-Thinking Agent: concise, direct answers.
- Thinking Agent: brief structured reasoning + final answer.

You will run identical tasks through both and compare:
- Number and order of tool calls
- Latency and token usage
- Output style and correctness on composite tasks
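
The comparison above can be made concrete with a few small helpers. The sketch below assumes each message may expose a `tool_calls` list of dicts with a `"name"` key and a `usage_metadata` dict with a `"total_tokens"` entry, which is how LangChain AI messages usually report them. The helper names are hypothetical, and the demo uses stand-in objects rather than real agent output.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Tuple


def tool_call_order(messages: Iterable[Any]) -> List[str]:
    """Return tool names in the order they were requested."""
    names = []
    for m in messages:
        for call in getattr(m, "tool_calls", None) or []:
            names.append(call["name"])
    return names


def total_tokens(messages: Iterable[Any]) -> int:
    """Sum total_tokens over messages that carry usage metadata."""
    total = 0
    for m in messages:
        usage = getattr(m, "usage_metadata", None) or {}
        total += usage.get("total_tokens", 0)
    return total


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in messages that mimic the shape of an agent transcript.
demo_messages = [
    SimpleNamespace(content="question"),
    SimpleNamespace(content="", tool_calls=[{"name": "fibonacci", "args": {"n": 20}}],
                    usage_metadata={"total_tokens": 120}),
    SimpleNamespace(content="tool output"),
    SimpleNamespace(content="", tool_calls=[{"name": "add", "args": {"a": 1, "b": 2}}],
                    usage_metadata={"total_tokens": 150}),
    SimpleNamespace(content="answer", tool_calls=[], usage_metadata={"total_tokens": 90}),
]

order, elapsed = timed(tool_call_order, demo_messages)
print(order, len(order), total_tokens(demo_messages), f"{elapsed:.6f}s")
```

With a real agent, the same helpers could be applied to the `"messages"` list in the state returned by the agent's `invoke` call, wrapping that call in `timed` to capture latency.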

### Ready to Get Started?
Next we prepare the environment and register the tools that both agents will leverage.

### 0. Set Up the Environment

In [None]:
# Imports and environment setup
from uuid import uuid4
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from utils.stream import display_stream

load_dotenv(override=True)

### 1. Tool Definition

We now define a minimal, deterministic tool set that both agents (non-thinking vs thinking) will share inside the ReAct loop. The goal is to observe *behavioral differences* (planning, tool sequencing, intermediate reasoning) without the noise of complex business logic.

#### Tools Provided
1. add: baseline integer addition.
2. multiply: integer product for composite arithmetic.
3. fibonacci: generates the first n Fibonacci numbers.
4. sum_fibonacci: computes the sum of the first n Fibonacci numbers (defined below, but left out of `TOOLS` by default).

#### Why Fibonacci?
- Produces a known numeric sequence → good for testing whether the reasoning model chooses to call the tool rather than recall values.
- Encourages multi-step decomposition (generate → take subset → sum/compare).

In [None]:
from typing import List

@tool
def add(a: int, b: int) -> int:
    """
    Add two integers.

    Args:
        a (int): First integer.
        b (int): Second integer.

    Returns:
        int: The sum of a and b.
    """
    return a + b

@tool
def multiply(a: int, b: int) -> int:
    """
    Multiply two integers.

    Args:
        a (int): First factor.
        b (int): Second factor.

    Returns:
        int: The product a * b.
    """
    return a * b

@tool
def fibonacci(n: int) -> List[int]:
    """
    Generate the first n Fibonacci numbers.

    Args:
        n (int): Number of terms to generate (n >= 1).

    Returns:
        List[int]: List containing the first n Fibonacci numbers starting at 0.
    """
    out = [0, 1]
    while len(out) < n:
        out.append(out[-1] + out[-2])
    return out[:n]

def sum_fibonacci(n: int) -> int:
    """
    Calculate the sum of the first n Fibonacci numbers.

    Args:
        n (int): Number of terms to sum (n >= 1).

    Returns:
        int: The sum of the first n Fibonacci numbers.
    """
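    # `fibonacci` is a LangChain tool object, so run it via .invoke() with a dict of arguments.
    fib_sequence = fibonacci.invoke({"n": n})
    return sum(fib_sequence)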

# TOOLS = [add, multiply, fibonacci, sum_fibonacci]
TOOLS = [add, multiply, fibonacci]
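
To judge whether an agent's answer is correct, it helps to have ground truth computed outside the agent loop. The sketch below is a plain-Python reference, independent of LangChain. The name `fib_reference` is hypothetical, and it follows the same convention as the `fibonacci` tool (the sequence starts at 0).

```python
from typing import List


def fib_reference(n: int) -> List[int]:
    """Return the first n Fibonacci numbers, starting at 0."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


first_20 = fib_reference(20)
print(first_20[-1])   # 4181, the 20th term
print(sum(first_20))  # 10945

# Cross-check with the identity F(0) + ... + F(n-1) = F(n+1) - 1.
assert sum(first_20) == fib_reference(22)[-1] - 1
```

An agent that reports anything other than 10945 for the sum of the first 20 terms has either recalled the values incorrectly or used a different starting convention (for example, starting at 1).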

### 2. Non-Thinking Agent

This agent uses a fast LLM with no internal “thinking budget”. It aims for minimal latency and cost, producing a direct answer unless a tool call is clearly required.

Workflow:
1. Initialize the lightweight model (temperature=0 for near-deterministic output).
2. Provide a concise instruction prompt focused on clear, short answers.
3. Create the ReAct agent by binding the shared tool set (add, multiply, fibonacci).
4. Run test queries and observe which tools are called and how the final answer is produced.

When to use:
- Simple arithmetic
- Direct factual formatting
- High-throughput / low-latency pipelines

#### 1. Initializing the Non-Thinking LLM

In [None]:
NORMAL_MODEL_NAME = "openai:gpt-4o-mini-2024-07-18"

normal_llm = init_chat_model(NORMAL_MODEL_NAME, temperature=0)

#### 2. Crafting the Agent System Prompt

Define a system prompt for your agent. The prompt should instruct the non-thinking agent to respond in a direct, concise, and practical manner, without unnecessary reasoning steps or explanations. This helps ensure fast, predictable outputs focused on clarity and brevity.

In [None]:
NORMAL_AGENT_PROMPT = """You are an assistant for solving math problems: answer concisely and clearly."""

#### 3. Building the AI Agent

Now that you have the tools and system prompt ready, it’s time to build the AI agent. The agent should have access to all three shared tools and to short-term memory (a checkpointer) so it can keep track of the conversation context.

In [None]:
memory = MemorySaver()

normal_agent = create_react_agent(
    model=normal_llm,
    tools=TOOLS,
    prompt=NORMAL_AGENT_PROMPT,
    checkpointer=memory
)

#### 4. Testing the Non-Thinking Agent

Now that the agent is fully built, it's time to test it! Interact with the agent using a variety of prompts to see how well it can assist users.

In [None]:
normal_config = {"configurable": {"thread_id": str(uuid4())}}

In [None]:
user_task = "Compute the sum of the first 20 Fibonacci numbers."
inputs = {"messages": [("user", user_task)]}


display_stream(normal_agent.stream(inputs, normal_config, stream_mode="values"), thinking=False)

### 3. Thinking Agent

This agent uses a reasoning-capable LLM with an internal “thinking budget” (extra deliberation steps before finalizing the answer). It trades higher latency and cost for improved reliability on multi-step or ambiguous tasks, and tends to orchestrate tools more deliberately (plan → call → verify → answer) while still producing a concise final response.

Workflow:
1. Initialize the reasoning model (set a reasoning/effort parameter, e.g. medium).
2. Provide a system prompt encouraging brief structured reasoning, explicit tool use when needed, and concise final answers.
3. Create the ReAct agent with the same shared tools (add, multiply, fibonacci).
4. Run comparative queries and observe: more frequent multi-step tool calls, clearer decomposition, and a lower risk of arithmetic slips.

When to use:
- Multi-step arithmetic or chained calculations
- Sequence generation plus aggregation (e.g. sum / compare Fibonacci values)
- Conditional or comparative logic (“which is larger and by how much?”)
- Ambiguous or partially specified instructions needing clarification
- Higher accuracy or auditability requirements

#### 1. Initializing the Thinking LLM

In [None]:
THINKING_MODEL_NAME = "openai:o4-mini-2025-04-16"

thinking_llm = init_chat_model(
    THINKING_MODEL_NAME,
    reasoning={"effort": "medium"}
)

#### 2. Crafting the Agent System Prompt

Define a system prompt for your reasoning (thinking) agent. It should explicitly instruct the model to:

- Perform a brief internal plan (not all steps must be exposed).
- Use tools whenever a calculation, sequence generation, comparison, or multi-step numeric operation is required (avoid guessing numbers).
- Keep the visible reasoning summary concise (1–3 short lines) and then give the final answer.
- Prefer correct, tool-verified results over speed.
- Return output in a clean structure: (a) short rationale (optional), (b) final answer.
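
One possible wording that covers the bullets above is sketched here. The text and the name `EXAMPLE_THINKING_PROMPT` are hypothetical, offered as a starting point rather than as the prompt this notebook actually uses.

```python
# Hypothetical prompt text: one way to phrase the requirements listed above.
EXAMPLE_THINKING_PROMPT = """You are an assistant for solving math problems.
Plan briefly before answering; you do not need to show every step.
Use the available tools for any calculation, sequence generation, comparison,
or multi-step numeric operation; never guess numbers.
Prefer correct, tool-verified results over speed.
Format your reply as:
Rationale (optional, 1-3 short lines)
Final answer: <result>"""

print(EXAMPLE_THINKING_PROMPT)
```

Spelling out the output format in the prompt makes the two agents' answers easier to compare side by side, since the final answer always appears on a predictable line.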

In [None]:
THINKING_MODEL_PROMPT = """You are an assistant for solving math problems: answer concisely and clearly."""

#### 3. Building the AI Agent

Now that you have the tools and system prompt ready, it’s time to build the AI agent. The agent should have access to all three shared tools and to short-term memory (a checkpointer) so it can keep track of the conversation context.

In [None]:
memory = MemorySaver()

thinking_agent = create_react_agent(
    model=thinking_llm,
    tools=TOOLS,
    prompt=THINKING_MODEL_PROMPT, 
    checkpointer=memory
)

#### 4. Testing the Thinking Agent

Now that the agent is fully built, it's time to test it! Interact with the agent using a variety of prompts to see how well it can assist users.

In [None]:
thinking_config = {"configurable": {"thread_id": str(uuid4())}}

In [None]:
user_task = "Compute the sum of the first 20 Fibonacci numbers."
inputs = {"messages": [("user", user_task)]}

display_stream(
    thinking_agent.stream(inputs, thinking_config, stream_mode="values"),
    thinking=True
)