<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/030_Output_Schema.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A **critical concept** in tool-using agents:
🔑 **LLMs speak natural language**, but tools require **structured input** — usually **JSON**.

---

## Structured Output

When an LLM is used inside an agent, its job isn’t just to generate human-readable text.
Instead, it must **output structured data** that follows a **strict schema**, so the system (you, your code, or OpenAI’s API) can:

* **Recognize**: What tool it wants to call
* **Extract**: The tool name and arguments
* **Execute**: A corresponding Python function or API call

And that’s where the format comes in 👇

---

## 🧰 Two Modes of LLM Interaction

| Type                            | Description                               | Format               |
| ------------------------------- | ----------------------------------------- | -------------------- |
| **Chat/Conversation**           | Normal assistant replies                  | 🗣 Natural language  |
| **Tool Use (Function Calling)** | Structured interaction with external code | ✅ **JSON structure** |

---

## 🔁 So What’s the Challenge?

By default, LLMs love to generate beautiful English text. But when tools are involved, the output **must follow a specific structure**, such as:

```json
{
  "tool_call": {
    "name": "search_file_names",
    "arguments": {
      "keyword": "memory",
      "case_sensitive": false
    }
  }
}
```

If the LLM generated:

> I think we should search for "memory" — let’s use the tool for that!

…it wouldn’t be usable by the tool router.
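
A quick way to see the difference is to hand both kinds of output to a parser. The sketch below uses only the standard library; `try_parse_tool_call` is a hypothetical helper, and the `tool_call` wrapper key simply mirrors the illustrative structure above rather than any particular API format.

```python
import json

def try_parse_tool_call(text):
    """Return (name, arguments) if text is a structured tool call, else None."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None  # free-form prose is not valid JSON
    if not isinstance(payload, dict):
        return None
    call = payload.get("tool_call")
    if not isinstance(call, dict) or "name" not in call:
        return None
    return call["name"], call.get("arguments", {})

structured = '{"tool_call": {"name": "search_file_names", "arguments": {"keyword": "memory", "case_sensitive": false}}}'
prose = 'I think we should search for "memory" and use the tool for that!'

print(try_parse_tool_call(structured))  # a (name, arguments) tuple
print(try_parse_tool_call(prose))       # None: nothing a router could act on
```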

---

## ✅ How OpenAI Helps With This

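When you pass **`tools`** to `chat.completions.create()` with **`tool_choice="auto"`** (also the default whenever tools are supplied), you're telling the LLM:

> “If a tool fits this request, return a tool call in structured JSON instead of an English-language response.”

With `"auto"` the model may still answer in plain text; to *force* a tool call, use `tool_choice="required"` or name a specific function. When the model does call a tool, the response carries a `tool_calls` block, like: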

```json
{
  "tool_calls": [
    {
      "function": {
        "name": "search_file_names",
        "arguments": "{ \"keyword\": \"memory\", \"case_sensitive\": false }"
      }
    }
  ]
}
```

So your code can extract it like this:

```python
tool_calls = response.choices[0].message.tool_calls
```

---

## 🤖 What You Should Learn

### 1. **Tool-enabled LLMs return JSON, not natural language**

* But **only if** you pass `tools`, and only when the model decides a tool is needed. With `tool_choice="auto"` it can still reply in plain text.

### 2. **Your agent code needs to parse the output**

* Typically, you’ll `json.loads(arguments)` to convert the string to a dictionary.

### 3. **Well-structured schemas improve tool usage**

* The better your schema (with descriptions, types, required fields), the better the LLM is at calling tools correctly.





## 🧠 Big Picture: Why This Feature Exists

Language models (like GPT-3.5 / GPT-4) are *masters* of generating natural language — but tools (your code, APIs, databases) require **structured input**.

The **`tools` + `tool_choice="auto"`** setup lets you tell the LLM:

> “Instead of just replying in English, *decide if a tool should be used*, and if so, return the tool call in a special JSON format that I can extract and run.”
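
For reference, `tool_choice` accepts more than `"auto"`. The sketch below maps a simple mode name to the value you would pass as `tool_choice`; `build_tool_choice` and its `"force"` mode name are hypothetical conveniences for illustration, and no request is sent.

```python
def build_tool_choice(mode, tool_name=None):
    """Map a simple mode name to a value for the `tool_choice` parameter."""
    if mode in ("auto", "none", "required"):
        # "auto": the model may answer in text or call a tool
        # "none": the model must answer in text
        # "required": the model must call at least one tool
        return mode
    if mode == "force":
        if not tool_name:
            raise ValueError("mode 'force' needs a tool_name")
        # Force a call to one specific function
        return {"type": "function", "function": {"name": tool_name}}
    raise ValueError(f"unknown mode: {mode}")

print(build_tool_choice("auto"))
print(build_tool_choice("force", "search_file_names"))
```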

---

## 🧰 What You Provide to OpenAI

In your API call, you send:

```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    tools=tools,               # ✅ Tell the LLM what tools exist
    tool_choice="auto",        # ✅ Let it decide whether to use one, and which
)
```

You also provide a list of **tools**, like:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_file_names",
            "description": "Searches filenames for a keyword.",
            "parameters": {
                "type": "object",
                "properties": {
                    "keyword": { "type": "string" },
                    "case_sensitive": { "type": "boolean" }
                },
                "required": ["keyword"]
            }
        }
    }
]
```

This schema defines what the LLM **can** call — the tools, their names, argument names/types, etc.

---

## 🤖 What the LLM Does

When it gets your user message, the model decides:

> “Ah! This request looks like it’s asking for a tool. I know a tool called `search_file_names`. Let me return a tool call.”

Instead of responding with:

> “Sure, I can search the files for ‘memory’...”

…it returns a **special structure** called `tool_calls`:

```json
{
  "tool_calls": [
    {
      "id": "call_xyz",
      "type": "function",
      "function": {
        "name": "search_file_names",
        "arguments": "{ \"keyword\": \"memory\", \"case_sensitive\": false }"
      }
    }
  ]
}
```

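⚠️ **Important Note**: the `arguments` field is a **JSON-encoded string**, not an object, so you must run `json.loads()` on it to use it as a dictionary in Python. The `id` identifies this specific call; you need it (as `tool_call_id`) when sending the tool's result back to the model.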
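
As a minimal illustration, using the same `arguments` string shown in the block above (the variable names here are just for the example):

```python
import json

# The `arguments` value as it appears in the tool call: a JSON string, not a dict
raw_arguments = "{ \"keyword\": \"memory\", \"case_sensitive\": false }"

args = json.loads(raw_arguments)  # str -> dict
print(type(raw_arguments).__name__, "->", type(args).__name__)
print(args["keyword"], args["case_sensitive"])
```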

---

## 🧪 Example in Your Code

If you inspect the response:

```python
choice = response.choices[0]
tool_call = choice.message.tool_calls[0]
tool_name = tool_call.function.name
args = json.loads(tool_call.function.arguments)
```

Then you can route to the right function like:

```python
result = tool_router(tool_name, args)
```
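
One possible shape for `tool_router` is a dictionary lookup. This is only a sketch: the stand-in functions, their sample data, and the `TOOL_REGISTRY` name are assumptions made so the example runs on its own.

```python
def list_files():
    return ["a.txt", "b.txt"]  # stand-in data for illustration

def search_file_names(keyword, case_sensitive=False):
    names = list_files()
    if case_sensitive:
        return [n for n in names if keyword in n]
    return [n for n in names if keyword.lower() in n.lower()]

# Map tool names (as declared in the schema) to Python callables
TOOL_REGISTRY = {
    "list_files": list_files,
    "search_file_names": search_file_names,
}

def tool_router(tool_name, args):
    func = TOOL_REGISTRY.get(tool_name)
    if func is None:
        return f"Unknown tool: {tool_name}"
    # Argument names must match the schema's property names
    return func(**args)

print(tool_router("search_file_names", {"keyword": "A"}))
```

A registry like this keeps the router unchanged as tools are added, at the cost of trusting that the model's argument names match each function's signature.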

---

## ✅ Benefits of This System

* **No custom prompt hacks** needed to extract JSON
* LLM learns how to use tools just from the schema
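* The API returns tool calls in a dedicated `tool_calls` field, which is far more reliable than prompt-only JSON (the argument values still deserve a check)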
* You can build **modular, extensible agent systems** easily

---

## 📎 Summary

| Step                     | Role                                                         |
| ------------------------ | ------------------------------------------------------------ |
| `tools=`                 | Tell the LLM what tools are available and how to call them   |
| `tool_choice="auto"`     | Let the model decide if/when to use a tool                   |
| `tool_calls` in response | Structured output with name and args of the function to call |
| `json.loads(arguments)`  | Parse the `arguments` string into a Python dict              |
| `tool_router()`          | Run the actual function in Python                            |






The `tools` and `tool_choice="auto"` features in the **OpenAI API** are **built-in conveniences** that handle most of the tricky parts of tool calling for you. Here’s how it breaks down:

---

## 🎁 What OpenAI Gives You: A Tool-Calling Convenience Layer

Without this special API support, you’d have to:

### 🛠️ *Manual Method (Before Tool Support)*:

1. **Prompt** the LLM to respond in *strict JSON*:

   > "Respond with JSON only, do not include any extra text..."
2. **Hope** it doesn't slip into natural language or hallucinate field names.
3. **Extract** the JSON with a regex or `json.loads()` and catch formatting errors.
4. **Manually decide** if the LLM response even implies a function call.
5. **Route** the response to a tool yourself.

This is fragile, error-prone, and often frustrating.
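
To make the fragility concrete, here is a rough sketch of the manual approach using only the standard library. The helper name and the sample replies are made up for illustration.

```python
import json
import re

def extract_json_manually(reply):
    """Best-effort extraction of a JSON object from free-form model text."""
    # Grab everything from the first "{" to the last "}" (greedy, so nested objects survive)
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # malformed JSON, e.g. single quotes or trailing commas

good = 'Sure! Here you go: {"tool": "read_file", "args": {"filename": "notes.txt"}}'
bad = "Sure! Here you go: {'tool': 'read_file'}"
none = "I would read the file notes.txt."

for reply in (good, bad, none):
    print(extract_json_manually(reply))
```

Only the first reply yields a usable dictionary; the other two come back as `None`, and the caller still has to decide what that means.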

---

### ✅ *With Tool Calling Support (Modern Way)*:

1. You register your tools using strict JSON Schema (standardized).

2. You send user input and say:

   ```python
   tool_choice="auto"
   ```

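3. **OpenAI handles most of it**:

   * The model decides whether a tool is needed.
   * The model selects the tool.
   * The API returns a clean `tool_calls` block, separate from any text content.
   * The arguments usually match your schema, but they are only guaranteed to when strict mode (`"strict": true` on the function definition) is enabled on a model that supports it.
   * The `arguments` string is normally parseable JSON; without strict mode, still guard `json.loads()` against malformed output.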

4. You just extract and run:

   ```python
   tool_name = tool_call.function.name
   args = json.loads(tool_call.function.arguments)
   ```
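
Parsed arguments are still model output, so a light local check can be worth adding before running anything. The sketch below is a hypothetical helper that covers only `required` fields and primitive `type` values; it is not a full JSON Schema validator.

```python
def check_arguments(args, parameters):
    """Return a list of problems found when comparing args to a tool's `parameters` schema."""
    type_map = {"string": str, "boolean": bool, "integer": int, "number": (int, float)}
    problems = []
    for name in parameters.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    properties = parameters.get("properties", {})
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
            continue
        json_type = spec.get("type")
        expected = type_map.get(json_type)
        if expected is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it explicitly for numeric types
        bool_as_number = isinstance(value, bool) and json_type in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {json_type}")
    return problems

parameters = {
    "type": "object",
    "properties": {"keyword": {"type": "string"}, "case_sensitive": {"type": "boolean"}},
    "required": ["keyword"],
}
print(check_arguments({"keyword": "memory"}, parameters))
print(check_arguments({"case_sensitive": "no"}, parameters))
```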

---

## 💡 Bottom Line:

Tool calling support in the API lets the LLM act like an **intelligent dispatcher** that:

* Decides *what* to call
* Follows strict *input format*
* Returns *structured JSON* reliably

So yes — without this feature, you’d have to **engineer all of that behavior yourself**, which is hard, brittle, and not nearly as elegant.




In [1]:
%pip install -qU python-dotenv openai

In [3]:
from openai import OpenAI
from dotenv import load_dotenv
import os
import json
import re
import textwrap

load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# 🔹 Step 1: Locate the source documents
source_dir = "/content/docs_folder"

# Make sure the directory exists
if not os.path.exists(source_dir):
    raise FileNotFoundError(f"📁 Directory not found: {source_dir}")

# List and build full file paths
file_list = [
    os.path.join(source_dir, f)
    for f in os.listdir(source_dir)
    if os.path.isfile(os.path.join(source_dir, f))
]

# Display the found files
print("📂 Files found:")
for file in file_list:
    print("  -", file)

📂 Files found:
  - /content/docs_folder/004_AGENT_Tools.txt
  - /content/docs_folder/001_PArse_the Response.txt
  - /content/docs_folder/003_gent Feedback and Memory.txt
  - /content/docs_folder/000_Prompting for Agents -GAIL.txt
  - /content/docs_folder/002_Execute_the_Action.txt


You can think of your code in two main **sections** that correspond to stages of building an agent:

---

## 🧰 **Part 1: Tool Creation & Declaration**

This is everything *before* the agent does anything. It includes:

### 1. **Python Tool Functions**

You create Python functions like `list_files()`, `read_file()`, etc. — these are real, working backend tools.

### 2. **Tool Schema Definitions**

You define JSON-like schemas for each tool, describing:

* What the tool is called (`name`)
* What it does (`description`)
* What parameters it needs (`parameters`)
* And the OpenAI-required `"type": "function"` and `"function": { ... }` format.

These schemas are needed **so the LLM knows how to talk to your tools**, but the LLM doesn’t run them — you do.

### 3. **Master Tool List (`tools`)**

This list is passed into `client.chat.completions.create()` so the LLM is aware that tools exist and knows what options it has.

---

## 🤖 **Part 2: Agent Operation & Orchestration**

This is where your LLM-based agent comes alive and begins responding to user input.

### 4. **`generate_agent_response()`**

* Builds the message thread
* Sends it to OpenAI with tools + `tool_choice="auto"`
* Returns either:

  * a **tool call** (`type: "tool"`) with structured arguments, or
  * a **text response** (`type: "text"`)

### 5. **`handle_tool_call()`**

* Inspects the choice
* If it’s a tool, it extracts the name + args and routes the call
* If it’s just text, it prints it

### 6. **`tool_router()`**

* Executes the correct Python function using the arguments
* Returns the result back to `handle_tool_call()` to be printed

---

## ✅ Summary

Your idea to split the notebook into:

* **Notebook 1: Tool Building**
* **Notebook 2: Agent Logic**

...is an excellent learning strategy and mirrors real-world agent development.

You’ve built a fully functioning, modular, easy-to-debug agent architecture — and you now understand:

* The difference between tool logic and tool schema
* The structure and purpose of OpenAI’s function calling interface
* How agents decide between generating text vs. using tools



In [5]:
# ✅ Tool 1: List all .txt files
def list_files():
    return [os.path.basename(f) for f in file_list if f.endswith(".txt")]

# ✅ Tool 2: Read a specific file
def read_file(filename):
    path = os.path.join(source_dir, filename)
    if not os.path.isfile(path):
        return f"⚠️ File not found: {filename}"
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# ✅ Tool 3: Search for keyword in file names
def search_file_names(keyword, case_sensitive=False):
    matches = []
    for f in file_list:
        name = os.path.basename(f)
        haystack = name if case_sensitive else name.lower()
        needle = keyword if case_sensitive else keyword.lower()
        if needle in haystack:
            matches.append(name)
    return matches

# 🔧 Define tools individually
list_files_tool = {
    "name": "list_files",
    "description": "Lists all .txt files in the source directory.",
    "parameters": {
        "type": "object",
        "properties": {},
        "required": []
    }
}

read_file_tool = {
    "name": "read_file",
    "description": "Reads the content of a specified file.",
    "parameters": {
        "type": "object",
        "properties": {
            "filename": {
                "type": "string",
                "description": "The name of the file to read."
            }
        },
        "required": ["filename"]
    }
}

search_file_names_tool = {
    "name": "search_file_names",
    "description": "Searches for files whose names include the given keyword.",
    "parameters": {
        "type": "object",
        "properties": {
            "keyword": {
                "type": "string",
                "description": "The keyword to search for in the file names."
            },
            "case_sensitive": {
                "type": "boolean",
                "description": "Whether the search should be case sensitive.",
                "default": False
            }
        },
        "required": ["keyword"]
    }
}

#=======================================
# 🧰 This is the “Tool Declaration” Step
#=======================================

# You're defining what tools exist — the LLM needs to know:

# What each tool is called
# What it does
# What inputs it expects (parameters, types, required fields, etc.)

# The tools list below is passed to the LLM inside generate_agent_response().

tools = [
    {
        "type": "function",               # ✅ Declares this is a function-style tool
        "function": list_files_tool       # ✅ Embeds the tool definition
    },
    {
        "type": "function",
        "function": read_file_tool
    },
    {
        "type": "function",
        "function": search_file_names_tool
    }
]

#=======================================
# generate_agent_response() Runs
#=======================================

# Prepares a messages list that includes:
# A system prompt (describes the assistant’s behavior)
# The user’s input

# Sends the request to the OpenAI API using client.chat.completions.create(...)

# Includes:

# Your tools
# tool_choice="auto" → LLM chooses if it wants to use a tool

# Returns either:

# A tool_call → structured JSON telling you which tool to run
# A regular text message

def generate_agent_response(user_input):
    messages = [
        {"role": "system", "content": "You are an assistant that helps with managing files. Use tools when needed."},
        {"role": "user", "content": user_input}
    ]

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        tools=tools,            # ✅ Tell the LLM what tools exist
        tool_choice="auto",     # ✅ Let it decide whether to use one, and which
    )

    choice = response.choices[0].message

    if choice.tool_calls:
        # The LLM decided to use a tool.
        # Only the first call is handled here; the model may return several.
        tool_call = choice.tool_calls[0]
        return {"type": "tool", "tool_call": tool_call}
    else:
        # Regular assistant reply
        return {"type": "text", "content": choice.content}

#=========================================
# 🧰 handle_tool_call(choice) Runs
#=========================================

# Receives the LLM’s response (the choice)
# Checks: Is it a tool call?

# ✅ Yes → extract tool name and arguments

# 👉 Send to tool_router() to run the actual Python function
# 💬 Print tool result

# ❌ No → just print the LLM’s response directly

def handle_tool_call(choice):
    if choice["type"] == "tool":
        tool_name = choice["tool_call"].function.name
        args = json.loads(choice["tool_call"].function.arguments)
        result = tool_router(tool_name, args)
        print(f"🛠️ Used Tool: {tool_name}")
        print(f"📤 Args: {args}")
        print(f"📥 Result:\n{result}")
    elif choice["type"] == "text":
        print(f"💬 Assistant Response:\n{choice['content']}")
    else:
        print("❓ Unrecognized response type.")

#=========================================
# tool_router() Executes the Right Tool
#=========================================

# Gets called from inside handle_tool_call()
# Based on the tool_name, it picks and runs the correct Python function
# Returns the result

def tool_router(tool_name, args):
    if tool_name == "list_files":
        return list_files()
    elif tool_name == "read_file":
        return read_file(args["filename"])
    elif tool_name == "search_file_names":
        return search_file_names(
            keyword=args["keyword"],
            case_sensitive=args.get("case_sensitive", False)
        )
    else:
        return f"❌ Unknown tool: {tool_name}"

#================
# User Input
#================

user_input = "List all the files that contain the word 'memory'"
choice = generate_agent_response(user_input)
handle_tool_call(choice)


🛠️ Used Tool: search_file_names
📤 Args: {'keyword': 'memory'}
📥 Result:
['003_gent Feedback and Memory.txt']


In [6]:
user_input = "List all the files that contain the word 'Agent'"
choice = generate_agent_response(user_input)
handle_tool_call(choice)

🛠️ Used Tool: search_file_names
📤 Args: {'keyword': 'Agent'}
📥 Result:
['004_AGENT_Tools.txt', '000_Prompting for Agents -GAIL.txt']



## 🔧 TOOL & AGENT DESIGN LESSONS

### 1. **Tool Design Is an API Contract**

Each tool you define (via the JSON schema) acts like a **mini-API endpoint**:

* You specify the **interface** (what arguments it takes)
* The LLM decides **when to call it** based on the user message
* You write the **backend function** to actually perform the task

👉 Think of tools like Lego blocks: small, reusable, testable units that can be composed by the agent as needed.

---

### 2. **LLM Tool Usage Is Purely Declarative**

The LLM does *not* run your code.

* It merely **calls out what tool it thinks should be used**
* It formats the request like:

  ```json
  {
    "tool_calls": [
      {
        "function": {
          "name": "read_file",
          "arguments": "{ \"filename\": \"example.txt\" }"
        }
      }
    ]
  }
  ```
* It's **your job** to run the Python code based on that request.

---

### 3. **Tool Invocation Format Is Strict**

OpenAI’s API expects a very specific tool structure:

* Outer list of tool objects
* Each tool:

  ```json
  {
    "type": "function",
    "function": { ... }
  }
  ```
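* The inner `function` must include `name`; `description` and `parameters` are technically optional but should almost always be provided, since they are what the model uses to call the tool correctly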

⚠️ If you omit or misname one of these keys, you’ll get cryptic 400 errors like:

> `"Missing required parameter: tools[0].function"`
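
A small pre-flight check can catch shape mistakes like this before a request is sent. The function below is a hypothetical local helper, not part of the OpenAI SDK, and it checks only the outer structure described above.

```python
def find_tool_definition_problems(tools):
    """Check the outer shape expected by the Chat Completions `tools` parameter."""
    problems = []
    for i, tool in enumerate(tools):
        if tool.get("type") != "function":
            problems.append(f"tools[{i}].type should be 'function'")
        function = tool.get("function")
        if not isinstance(function, dict):
            problems.append(f"tools[{i}].function is missing")
            continue
        if not function.get("name"):
            problems.append(f"tools[{i}].function.name is missing")
        params = function.get("parameters")
        if params is not None and (not isinstance(params, dict) or params.get("type") != "object"):
            problems.append(f"tools[{i}].function.parameters.type should be 'object'")
    return problems

good = [{"type": "function", "function": {
    "name": "list_files",
    "description": "Lists files.",
    "parameters": {"type": "object", "properties": {}},
}}]
# A flat definition without the "function" wrapper
bad = [{"type": "function", "name": "list_files"}]

print(find_tool_definition_problems(good))
print(find_tool_definition_problems(bad))
```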

---

### 4. **API Models Must Support Tools**

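Not every model supports tool calling, and the list changes over time, so check OpenAI's model documentation:

* ✅ `gpt-4`, `gpt-4-0613`, `gpt-4-1106-preview`, `gpt-4o`
* ✅ `gpt-3.5-turbo-0613`, `gpt-3.5-turbo-1106`, and the unsuffixed `gpt-3.5-turbo` alias (the notebook above uses it with `tools`)
* ❌ Snapshots that predate function calling (like `gpt-3.5-turbo-0301`) *don’t* support it.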

---

### 5. **You’re Building an Agent Loop**

A full agent doesn’t just use a tool once — it loops:

1. LLM chooses a tool
2. You execute it
3. You give back the result as a message
4. LLM may use another tool… or stop

This can become a full **think-act-learn loop** with memory and planning if you add history or reflection.
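
The four steps above can be sketched as a small loop. Everything here is illustrative: `run_agent_loop`, `call_model`, and `fake_model` are hypothetical names, and the scripted `fake_model` stands in for a real call such as `client.chat.completions.create(...).choices[0].message` so the example runs offline. The message shapes (an assistant turn carrying `tool_calls`, then a `"tool"` message with a matching `tool_call_id`) follow the Chat Completions format.

```python
import json
from types import SimpleNamespace

def run_agent_loop(call_model, tool_functions, messages, max_steps=5):
    """Alternate model turns and tool executions until the model answers in text."""
    for _ in range(max_steps):
        message = call_model(messages)
        if not message.tool_calls:
            return message.content  # a plain-text answer ends the loop
        # Record the assistant turn that requested the tools
        messages.append({
            "role": "assistant",
            "content": message.content,
            "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name, "arguments": c.function.arguments}}
                for c in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            result = tool_functions[call.function.name](**args)
            # Each result goes back as a "tool" message tied to the call id
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return None  # step budget exhausted

# Scripted stand-in for the model: first asks for a tool, then answers in text
def fake_model(messages):
    if messages[-1]["role"] == "tool":
        return SimpleNamespace(content=f"Found: {messages[-1]['content']}", tool_calls=None)
    call = SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="search_file_names", arguments='{"keyword": "memory"}'))
    return SimpleNamespace(content=None, tool_calls=[call])

history = [{"role": "user", "content": "Which files mention memory?"}]
tool_functions = {"search_file_names": lambda keyword: ["notes_memory.txt"]}
answer = run_agent_loop(fake_model, tool_functions, history)
print(answer)
```

The `max_steps` cap is a design choice to keep a confused model from looping forever.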

---

### 6. **Agent System Messages Matter**

Your `"system"` prompt should:

* Tell the model *what role* it's playing
* Give it hints like:

  > “You are a file management assistant. Use tools to retrieve or read files. Only respond in plain language if a tool is not required.”

This helps the LLM reason about *when* to use a tool.

---

## 🧠 Pro Tips

| Tip                                                    | Why It Matters                                              |
| ------------------------------------------------------ | ----------------------------------------------------------- |
| ✅ Always test tool functions standalone                | Confirms your backend works before involving the LLM        |
| 🧪 Use print statements or logging in the router       | Helps trace LLM decisions during development                |
| 🛑 Be defensive: use `.get()` and try/except on `args` | Avoids runtime crashes if the LLM returns unexpected values |
| 📎 Keep your tools modular                             | Easier to test, debug, and evolve your agent                |
| 🧰 Start small: 2–3 tools per agent                    | Keeps behavior predictable while you develop                |
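
The defensive tip in the table could look roughly like this. Both helpers are hypothetical, and `search_func` stands for whatever backend function the router would normally call.

```python
import json

def safe_parse_arguments(raw_arguments):
    """Parse the `arguments` string, returning (args, error) instead of raising."""
    try:
        args = json.loads(raw_arguments or "{}")
    except json.JSONDecodeError as exc:
        return None, f"arguments were not valid JSON: {exc.msg}"
    if not isinstance(args, dict):
        return None, "arguments did not decode to an object"
    return args, None

def safe_search(args, search_func):
    """Call a search function defensively with model-supplied args."""
    keyword = args.get("keyword")
    if not isinstance(keyword, str) or not keyword:
        return "Missing or invalid 'keyword' argument."
    try:
        return search_func(keyword, bool(args.get("case_sensitive", False)))
    except Exception as exc:  # keep the agent alive and report the failure instead
        return f"Tool failed: {exc}"

args, error = safe_parse_arguments('{"keyword": "memory"}')
print(args, error)
print(safe_search(args, lambda keyword, case_sensitive: [keyword.upper()]))
print(safe_parse_arguments("{keyword: memory}"))
```

Returning an error string rather than raising means the message can be passed back to the model, which may then retry with corrected arguments.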



This is **one of the most important things to understand** when working with tools and agents.

---

### ✅ **It is the job of the LLM to generate this `arguments` block.**

When you define your tools like this:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Reads the content of a specified file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "filename": {
                        "type": "string",
                        "description": "The name of the file to read."
                    }
                },
                "required": ["filename"]
            }
        }
    }
]
```

You are telling the LLM:

> "If the user input seems to require reading a file, and you want to call a tool to do that, then use this `read_file` tool, and fill out the `filename` parameter appropriately."

So, when the user says:

> “Can you show me what's inside `tool_notes.txt`?”

The **LLM parses this**, understands that `tool_notes.txt` is the value for `filename`, and **generates** this tool call:

```json
{
  "tool_calls": [
    {
      "function": {
        "name": "read_file",
        "arguments": "{ \"filename\": \"tool_notes.txt\" }"
      }
    }
  ]
}
```

---

### 🚫 You do **not** write this yourself.

If the model is given your `tools` definition and you use:

```python
tool_choice="auto"
```

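then the model may return a `tool_calls` block **instead of normal text** whenever it judges that a tool is needed, with the arguments it parsed from the user’s message. With `"auto"` it can still answer in plain text; to force a tool call, use `tool_choice="required"` or name a specific function.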

---

### ✅ Your Job:

Once you receive the response with:

```python
response.choices[0].message.tool_calls
```

You:

1. Parse the arguments (with `json.loads(...)`)
2. Route them to the right Python function
3. Pass in the arguments
4. Return the output back to the LLM (if you're looping)

---

### 🔁 Recap Flow:

1. **User Input**: `"Show me what's inside 'report.txt'"`
2. **LLM → tool\_call**:

   ```json
   {
     "name": "read_file",
     "arguments": "{ \"filename\": \"report.txt\" }"
   }
   ```
3. **You run**: `read_file("report.txt")`
4. **You return**: The file contents back to the LLM (optional, in looped agents)


