# Tool Calling with Guidance and Outlines

This notebook mirrors the original `docs/tutorials/guidance/tool_calling.ipynb` tutorial and extends it with an Outlines-based implementation so that both stacks can be evaluated side by side. The sections follow the agreed outline and are meant to be executed top to bottom.


## 1. Environment Setup

We install or verify the required dependencies and configure shared imports. Guidance and Outlines both rely on a working LLM backend (OpenAI, vLLM, llama.cpp, etc.). The Guidance cells assume an OpenAI-compatible HTTP API configured through environment variables such as `OPENAI_API_KEY`; the Outlines cells default to Gemini unless another provider is selected.
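
Before running the cells below, it can help to confirm which packages are importable and whether credentials are present. The following is a minimal sketch using only the standard library; the helpers `missing_packages` and `has_credentials` are illustrative names that do not appear in the original tutorial, and the package list is an assumption based on the install line.

```python
import importlib.util
import os


def missing_packages(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be imported in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names whose parent package is absent,
            # or for modules left in an unusual state.
            found = False
        if not found:
            missing.append(name)
    return missing


def has_credentials(var_name: str = "OPENAI_API_KEY") -> bool:
    """Report whether the environment variable is set to a non-empty value."""
    return bool(os.environ.get(var_name, "").strip())


print("Missing packages:", missing_packages(["guidance", "outlines", "pydantic", "sympy"]))
print("OPENAI_API_KEY set:", has_credentials())
```

An empty list and `True` suggest the Guidance path has what it needs; the check says nothing about whether the key is valid.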


In [1]:
# Optional: install dependencies into the current environment.
# Uncomment the lines below if you need to install packages.
# %pip install guidance outlines google-genai aisuite llama-cpp-python httpx sympy pydantic pandas


## 2. Replicate Guidance Tool-Calling Workflow

We reuse the original tutorial's tools to confirm the baseline behaviour before porting anything to Outlines. The cells below define the helpers, register them with Guidance, and (optionally) execute a request against a live OpenAI-compatible endpoint. The `RUN_GUIDANCE_DEMO` flag is read from the environment with `os.getenv`, so set the `RUN_GUIDANCE_DEMO` environment variable to `true` (or assign `RUN_GUIDANCE_DEMO = True` after the setup cell) once you have working OpenAI credentials. Otherwise the Guidance demo is skipped by default, while the Outlines path relies on Gemini.
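
Because the flag arrives as a string, only the literal value `true` (in any letter case) enables the demo. The sketch below isolates that parsing; `env_flag` is a hypothetical helper introduced here for illustration, not a function from the notebook.

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    """Interpret an environment variable as a boolean; only 'true' (any case) enables it."""
    return os.getenv(name, default).lower() == "true"


# Values such as "1" or "yes" do NOT enable the flag with this parsing rule.
os.environ["EXAMPLE_FLAG"] = "1"
print(env_flag("EXAMPLE_FLAG"))

os.environ["EXAMPLE_FLAG"] = "True"
print(env_flag("EXAMPLE_FLAG"))
```

The variable must be set before the setup cell runs, since the notebook reads it once at import time.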

In [None]:
import os
import random
import re
import shutil
import subprocess
from ast import literal_eval
from functools import lru_cache
from typing import Literal
from urllib.parse import urlparse

from pydantic import BaseModel, ConfigDict, Field

try:
    from openai import BadRequestError
except ImportError:  # pragma: no cover - optional dependency
    BadRequestError = Exception  # type: ignore[assignment]

try:
    from sympy import sympify
except ImportError:  # pragma: no cover - optional dependency
    sympify = None  # type: ignore[assignment]

RNG = random.SystemRandom()
DOMAIN_PATTERN = re.compile(r"(?:(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,})")
WHOIS_EXECUTABLE = shutil.which("whois")


def get_weather_func(
    city: str, unit: Literal["celsius", "fahrenheit"] = "celsius"
) -> str:
    """Return a mock weather string with a random temperature (no network call)."""
    temp = RNG.randint(-10, 35)
    weather = "snowy" if temp <= 0 else "cloudy" if temp <= 20 else "sunny"
    if unit == "fahrenheit":
        temp = temp * 9 / 5 + 32
    return (
        f"The current temperature in {city} is {temp} degrees "
        f"{unit} and it is {weather}."
    )


def whois_lookup(url: str) -> str:
    """Perform a WHOIS lookup for the provided URL and return the raw output."""
    if WHOIS_EXECUTABLE is None:
        msg = "The 'whois' executable is not available on this system."
        raise RuntimeError(msg)
    domain = urlparse(url).netloc or url
    if not domain or not DOMAIN_PATTERN.fullmatch(domain):
        msg = f"Invalid domain: {domain}"
        raise ValueError(msg)
    result = subprocess.run(  # noqa: S603
        [WHOIS_EXECUTABLE, domain],
        check=False,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout


def evaluate_expression(expr: str) -> str:
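    """Evaluate an arithmetic expression with SymPy and return the number as a string."""
    if sympify is None:
        msg = "Install sympy to enable the calculator tool."
        raise RuntimeError(msg)
    # NOTE: sympify() relies on eval(), so only pass it input you trust.
    sympy_expr = sympify(expr)
    # Reject symbols and non-real values; only plain real numbers are returned.
    if not (sympy_expr.is_number and sympy_expr.is_real):
        msg = f"Invalid expression: {expr}"
        raise ValueError(msg)
    # literal_eval cannot parse SymPy output such as "5/2", so convert numerically instead.
    value = int(sympy_expr) if sympy_expr.is_Integer else float(sympy_expr)
    return str(value)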


class GetWeatherArgs(BaseModel):
    """Structured input expected by the weather tool."""

    city: str = Field(description="The city to inspect.")
    unit: Literal["celsius", "fahrenheit"] = Field(description="Unit selection.")

    model_config = ConfigDict(extra="forbid")


RUN_GUIDANCE_DEMO = os.getenv("RUN_GUIDANCE_DEMO", "false").lower() == "true"
GUIDANCE_MODEL_NAME = os.getenv("GUIDANCE_MODEL_NAME", "gpt-5-mini")
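
The calculator tool above passes model-generated text to `sympify`, which SymPy's documentation warns is built on `eval` and should not receive unsanitized input. As a possible alternative, the sketch below evaluates basic arithmetic by walking Python's `ast` and allowing only a whitelist of node types. `safe_arithmetic` is a hypothetical stand-in, not part of the original tutorial, and it deliberately supports less than SymPy does (no exponentiation, functions, or symbols).

```python
import ast
import operator

# Only these operators are accepted; exponentiation is left out to avoid huge results.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> float:
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_arithmetic(expr: str) -> str:
    """Evaluate +, -, *, / on numeric literals and return the result as a string."""
    try:
        tree = ast.parse(expr.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Invalid expression: {expr}") from exc
    try:
        return str(_eval_node(tree.body))
    except ZeroDivisionError as exc:
        raise ValueError("Division by zero") from exc


print(safe_arithmetic("2 * (3 + 4)"))
```

Every failure surfaces as `ValueError`, which matches the exception types the calculator tool already raises for invalid expressions.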

In [3]:
from guidance import Tool, assistant, gen, user
from guidance.models import OpenAI

get_weather_tool = Tool.from_callable(
    callable=get_weather_func,
    name="get_weather",
    description="Get the current weather for a given city.",
    parameters=GetWeatherArgs,
)
whois_tool = Tool.from_regex(
    pattern=r"https?:\/\/[^\s]+",
    callable=whois_lookup,
    name="whois_lookup",
    description="Perform a WHOIS lookup on a URL.",
)
calculator_tool = Tool.from_callable(
    callable=evaluate_expression,
    name="calculator",
    description="Evaluate a basic arithmetic expression.",
)
guidance_tools = [get_weather_tool, whois_tool, calculator_tool]


@lru_cache(maxsize=1)
def get_guidance_model() -> OpenAI:
    """Instantiate and cache the Guidance model backend."""
    return OpenAI(GUIDANCE_MODEL_NAME, echo=False)

In [4]:
def run_guidance_demo(prompt: str) -> str:
    """Execute a guidance tool call while handling optional API failures gracefully."""
    if not RUN_GUIDANCE_DEMO:
        return "Skipping live Guidance call. Set RUN_GUIDANCE_DEMO = True to execute."
    lm = get_guidance_model()
    try:
        with user():
            lm += prompt
        with assistant():
            lm += gen(tools=guidance_tools, tool_choice="required")
        with assistant():
            lm += gen()
        return str(lm)
    except BadRequestError:
        return (
            "OpenAI rejected the streaming request (often due to unverified org). "
            "Set RUN_GUIDANCE_DEMO = False or switch to a streaming-capable backend."
        )


guidance_demo_prompt = "Who owns the openai.com website?"
run_guidance_demo(guidance_demo_prompt)

'OpenAI rejected the streaming request (often due to unverified org). Set RUN_GUIDANCE_DEMO = False or switch to a streaming-capable backend.'

## 3. Port the Workflow to Outlines

Outlines does not execute callbacks automatically, so we define a JSON schema that constrains the model to name one tool and supply its arguments. The runtime then invokes the matching Python callable, appends the observation to a scratchpad, and loops until the model emits a final answer or `max_turns` is reached. This mirrors the behaviour of Guidance while keeping the decoding constrained.
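
The control flow described above can be exercised without any model backend. The sketch below swaps the constrained generator for a scripted function that returns JSON strings; `scripted_model`, `run_loop`, and the stub `TOOLS` registry are assumptions made for illustration and are simpler than the notebook's own implementation.

```python
import json
from collections.abc import Callable

# Hypothetical registry with a single stub tool; the notebook registers real callables.
TOOLS: dict[str, Callable[[dict], str]] = {
    "calculator": lambda args: str(args["a"] - args["b"]),
}


def scripted_model(prompt: str) -> str:
    """Stand-in for a constrained generator: call the tool once, then answer."""
    if "Observation:" not in prompt:
        return json.dumps({"tool": "calculator", "arguments": {"a": 5, "b": 2}})
    observation = prompt.rsplit("Observation: ", 1)[1].strip()
    return json.dumps({"tool": "final_answer", "answer": f"{observation} apples remain."})


def run_loop(question: str, model: Callable[[str], str], max_turns: int = 5) -> str:
    """Decide, execute, observe; stop on `final_answer` or after `max_turns`."""
    scratchpad = ""
    for turn in range(1, max_turns + 1):
        decision = json.loads(model(f"Question: {question}\nScratchpad:{scratchpad}"))
        if decision["tool"] == "final_answer":
            return decision["answer"]
        executor = TOOLS.get(decision["tool"])
        if executor is None:
            observation = f"Unknown tool: {decision['tool']}"
        else:
            observation = executor(decision["arguments"])
        # The observation is fed back to the model on the next turn.
        scratchpad += f"\nTurn {turn}\nTool: {decision['tool']}\nObservation: {observation}"
    return "Reached max turns without final answer."


print(run_loop("If Johnny has 5 apples and gives 2 to Mary, how many remain?", scripted_model))
```

With a real backend the only moving part that changes is `model`: the schema guarantees that its output parses into one of the known decision shapes.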


In [5]:
import json
from collections.abc import Callable
from enum import Enum
from typing import Annotated, Any

from outlines import Generator, json_schema
from pydantic import TypeAdapter

from gvo.llm import get_outlines_runtime

OUTLINES_PROVIDER = (
    os.getenv("GVO_OUTLINES_PROVIDER") or os.getenv("OUTLINES_PROVIDER") or "gemini"
).lower()
if OUTLINES_PROVIDER not in {"gemini", "openai"}:
    OUTLINES_PROVIDER = "gemini"

_DEFAULT_OUTLINES_MODELS = {
    "gemini": "gemini-1.5-flash",
    "openai": "gpt-5-mini",
}

RUN_OUTLINES_DEMO = True
OUTLINES_MODEL_NAME = os.getenv(
    "OUTLINES_MODEL_NAME", _DEFAULT_OUTLINES_MODELS[OUTLINES_PROVIDER]
)


class ToolName(str, Enum):
    """Enumerate the available tool names exposed to the model."""

    get_weather = "get_weather"
    whois_lookup = "whois_lookup"
    calculator = "calculator"
    final_answer = "final_answer"


class WeatherCall(BaseModel):
    """Payload for the weather lookup tool."""

    tool: Literal["get_weather"] = "get_weather"
    arguments: GetWeatherArgs


class WhoisArgs(BaseModel):
    """Arguments expected by the WHOIS lookup tool."""

    url: str = Field(pattern=r"https?://[^\s]+", description="Absolute URL to inspect.")


class WhoisCall(BaseModel):
    """Payload for the WHOIS lookup tool."""

    tool: Literal["whois_lookup"] = "whois_lookup"
    arguments: WhoisArgs


class CalculatorArgs(BaseModel):
    """Arguments expected by the calculator tool."""

    expression: str = Field(description="Arithmetic expression such as 2+2")


class CalculatorCall(BaseModel):
    """Payload for the calculator tool."""

    tool: Literal["calculator"] = "calculator"
    arguments: CalculatorArgs


class FinalAnswer(BaseModel):
    """Terminal payload emitted when no more tools are required."""

    tool: Literal["final_answer"] = "final_answer"
    answer: str = Field(description="The final natural language response.")


ToolDecisionType = WeatherCall | WhoisCall | CalculatorCall | FinalAnswer

TOOL_DECISION_SCHEMA = json_schema(TypeAdapter(ToolDecisionType).json_schema())

TOOL_REGISTRY: dict[str, Callable[[dict[str, Any]], str]] = {
    "get_weather": lambda payload: get_weather_func(**payload),
    "whois_lookup": lambda payload: whois_lookup(payload["url"]),
    "calculator": lambda payload: str(evaluate_expression(payload["expression"])),
}

In [6]:
OUTLINES_SYSTEM_PROMPT = (
    "You are an assistant that can call tools to answer the user's question.\n"
    "Use the provided tool names exactly. When you have enough information, "
    "return `final_answer`."
)


def render_transcript(question: str, scratchpad: str) -> str:
    """Format the conversation transcript expected by the Outlines runner."""
    base = f"{OUTLINES_SYSTEM_PROMPT}\n\nQuestion: {question}\nScratchpad:"
    if scratchpad:
        return f"{base}\n{scratchpad.strip()}"
    return base


def _coerce_payload(payload: object) -> object:
    """Convert pydantic models or other objects into plain dictionaries."""
    if hasattr(payload, "model_dump"):
        return payload.model_dump()
    return payload


def _normalize_decision(decision: object) -> dict[str, object]:
    """Normalise the model response into a dictionary for downstream handling."""
    if isinstance(decision, str):
        return json.loads(decision)
    if isinstance(decision, dict):
        return {key: _coerce_payload(value) for key, value in decision.items()}
    coerced = _coerce_payload(decision)
    if isinstance(coerced, dict):
        return coerced
    msg = f"Unexpected decision payload type: {type(decision)!r}"
    raise TypeError(msg)


def run_outlines_agent(
    question: str,
    max_turns: int = 5,
    inference_overrides: dict[str, object] | None = None,
) -> str:
    """Loop until the model emits `final_answer` or `max_turns` turns elapse."""
    if not RUN_OUTLINES_DEMO:
        return "Skipping live Outlines call. Set RUN_OUTLINES_DEMO = True to execute."
    try:
        runtime = get_outlines_runtime()
    except RuntimeError as exc:
        return f"Failed to initialize Outlines model: {exc}"
    generator = Generator(runtime.model, TOOL_DECISION_SCHEMA)
    scratchpad = ""
    for turn in range(max_turns):
        prompt = render_transcript(question, scratchpad)
        inference_kwargs = dict(runtime.inference_defaults)
        if inference_overrides:
            inference_kwargs.update(inference_overrides)
        raw_decision = generator(prompt, **inference_kwargs)
        decision = _normalize_decision(raw_decision)
        tool_name = decision.get("tool")
        if tool_name == ToolName.final_answer.value:
            return str(decision.get("answer", ""))
        arguments = _coerce_payload(decision.get("arguments", {})) or {}
        executor = TOOL_REGISTRY.get(tool_name)
        if executor is None:
            observation = f"Unknown tool: {tool_name}"
        else:
            try:
                observation = executor(arguments)
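            # whois_lookup runs a subprocess with a timeout, so report that failure too.
            except (RuntimeError, ValueError, subprocess.TimeoutExpired) as exc: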
                observation = f"Tool error: {exc}"
        scratchpad += (
            f"\nTurn {turn + 1}"
            f"\nTool: {tool_name}"
            f"\nArgs: {json.dumps(arguments)}"
            f"\nObservation: {observation}"
        )
    return "Reached max turns without final answer."


outlines_demo_prompt = (
    "If Johnny has 5 apples and gives 2 to Mary, how many apples remain?"
)
run_outlines_agent(outlines_demo_prompt)

ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': '* GenerateContentRequest.generation_config.response_schema.properties: should be non-empty for OBJECT type\n', 'status': 'INVALID_ARGUMENT'}}

## 4. Side-by-Side Harness

This helper runs the same question through both stacks (respecting the demo flags) so you can diff the traces and assess parity.


In [None]:
def compare_workflows(question: str) -> dict[str, str]:
    """Execute the same query via Guidance and Outlines runners."""
    return {
        "guidance": run_guidance_demo(question),
        "outlines": run_outlines_agent(question),
    }


comparison_prompt = "What is the weather like in San Francisco today?"
compare_workflows(comparison_prompt)

{'guidance': 'OpenAI rejected the streaming request (often due to unverified org). Set RUN_GUIDANCE_DEMO = False or switch to a streaming-capable backend.',
 'outlines': 'Skipping live Outlines call. Set RUN_OUTLINES_DEMO = True to execute.'}
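
To diff the two entries of the returned dictionary, the standard library's `difflib` is usually enough. This is a minimal sketch: `diff_traces` is a hypothetical helper, and the example traces are made up purely to show the output shape.

```python
import difflib


def diff_traces(results: dict[str, str], left: str = "guidance", right: str = "outlines") -> str:
    """Return a unified diff between two runner outputs keyed by stack name."""
    diff = difflib.unified_diff(
        results[left].splitlines(),
        results[right].splitlines(),
        fromfile=left,
        tofile=right,
        lineterm="",
    )
    return "\n".join(diff)


# Made-up traces for illustration only; real ones come from the two runners.
example = {
    "guidance": "Tool: get_weather\nAnswer: It is sunny.",
    "outlines": "Tool: get_weather\nAnswer: It is cloudy.",
}
print(diff_traces(example))
```

An empty string means the two traces are identical line for line, which is unlikely with live models but useful as a sanity check on canned outputs.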

## 5. Where to Go Next

- Toggle `RUN_GUIDANCE_DEMO` and `RUN_OUTLINES_DEMO` to compare real traces once credentials are set up.
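- Swap `GUIDANCE_MODEL_NAME` or set `GVO_OUTLINES_PROVIDER`/`OUTLINES_MODEL_NAME` to experiment with other backends. As written, the provider check accepts only `gemini` or `openai` and falls back to `gemini` for anything else, so targeting a self-hosted backend such as llama.cpp would at least require extending that check.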
- Extend `TOOL_REGISTRY` with more capabilities and update the Outlines discriminated union to benchmark richer agents.
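
As a starting point for the last item, the sketch below shows one way to register an additional executor. It uses a stand-in registry and a decorator, `register_tool`, that is not part of the notebook; the `word_count` tool is likewise hypothetical.

```python
from collections.abc import Callable
from typing import Any

# Stand-in for the notebook's registry: tool name -> callable taking an arguments dict.
TOOL_REGISTRY: dict[str, Callable[[dict[str, Any]], str]] = {}


def register_tool(name: str) -> Callable:
    """Decorator that stores an executor under `name` and rejects duplicates."""

    def decorator(func: Callable[[dict[str, Any]], str]) -> Callable[[dict[str, Any]], str]:
        if name in TOOL_REGISTRY:
            raise ValueError(f"Tool already registered: {name}")
        TOOL_REGISTRY[name] = func
        return func

    return decorator


@register_tool("word_count")  # hypothetical tool, not part of the tutorial
def word_count(payload: dict[str, Any]) -> str:
    """Count whitespace-separated words in payload['text']."""
    return str(len(payload["text"].split()))


print(TOOL_REGISTRY["word_count"]({"text": "structured decoding is neat"}))
```

Registering the executor is only half of the change: the model can select the new tool only after a matching call model, with its own `Literal` tool tag, is added to `ToolDecisionType` and the schema is rebuilt.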
