<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/062_Generating_Structured_Responses.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ✨ Generating Structured Responses

Free-text answers are powerful: they're expressive, flexible, and human-readable. But sometimes we need more than just words. We need structure. We need certainty. We need **data**.

Imagine you're validating invoices. You don't want just a paragraph explaining what went wrong; you want a clear, unambiguous signal:

- ✅ Is it compliant? (`true` or `false`)
- 📝 Why or why not?

That's where **structured outputs** come in.

Instead of asking the LLM to just talk, we give it a shape to fill: a well-defined **JSON schema**. By switching to `prompt_llm_for_json`, we get:

- 📦 Clean, predictable outputs
- 🔌 Easy integration with downstream systems
- 🚫 No more parsing paragraphs to find a simple yes/no!

Structured prompting turns reasoning into **actionable data**, and that's the secret sauce behind reliable, production-grade AI systems.


```python
@register_tool(tags=["invoice_processing", "validation"])
def check_purchasing_rules(action_context: ActionContext, invoice_data: dict) -> dict:
    """
    Validate an invoice against company purchasing policies, returning a structured response.

    Args:
        invoice_data: Extracted invoice details, including vendor, amount, and line items.

    Returns:
        A structured JSON response indicating whether the invoice is compliant and why.
    """
    rules_path = "config/purchasing_rules.txt"

    try:
        with open(rules_path, "r") as f:
            purchasing_rules = f.read()
    except FileNotFoundError:
        purchasing_rules = "No rules available. Assume all invoices are compliant."

    validation_schema = {
        "type": "object",
        "properties": {
            "compliant": {"type": "boolean"},
            "issues": {"type": "string"}
        }
    }

    return prompt_llm_for_json(
        action_context=action_context,
        schema=validation_schema,
        prompt=f"""
        Given this invoice data: {invoice_data}, check whether it complies with company purchasing rules.
        The latest purchasing rules are as follows:

        {purchasing_rules}

        Respond with a JSON object containing:
        - `compliant`: true if the invoice follows all policies, false otherwise.
        - `issues`: A brief explanation of any violations or missing requirements.
        """
    )
```

This shows how to combine **structured LLM output** with a **modular, rules-driven agent system**. Here's what stands out and what you should be focusing on:

---

### 🧩 1. **Structured Output with JSON Schema**

**Key concept**: You're not asking the LLM to "just respond"; you're saying *respond in this shape*:

```python
validation_schema = {
    "type": "object",
    "properties": {
        "compliant": {"type": "boolean"},
        "issues": {"type": "string"}
    }
}
```

* ✅ `compliant`: A strict true/false value, easy for automation.
* 🛠️ `issues`: Human-readable explanation, useful for review or escalation.

➡️ **Focus**: This kind of schema makes your system interoperable and robust. No guesswork when parsing the LLM's response.
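
A schema only helps if the reply is actually checked against it. The sketch below uses only the standard library to parse a raw JSON string and verify the two fields from `validation_schema`. The helper name `parse_validation_response` is hypothetical and not part of the course framework; whether `prompt_llm_for_json` already validates its output is not shown in this notebook, so treat this as a defensive extra layer.

```python
import json


def parse_validation_response(raw: str) -> dict:
    """Parse a raw LLM reply and check it against the expected shape.

    Hypothetical helper: verifies that `compliant` is a boolean and
    `issues` is a string, mirroring `validation_schema`.
    """
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("compliant"), bool):
        raise ValueError("`compliant` must be true or false")
    if not isinstance(data.get("issues"), str):
        raise ValueError("`issues` must be a string")
    return data


result = parse_validation_response('{"compliant": false, "issues": "Missing PO number."}')
print(result["compliant"])
```

Downstream code can then branch on `result["compliant"]` without any text parsing.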

---

### 📚 2. **Rules as Dynamic Text, Not Hardcoded Logic**

```python
with open(rules_path, "r") as f:
    purchasing_rules = f.read()
```

* The rules live outside the code (in a file), which means:

  * 🧑‍💼 Non-technical stakeholders can update them.
  * ♻️ The system adapts without redeployment.
  * 🔍 The LLM reads and reasons over *human-readable policy*, not rigid logic.

➡️ **Focus**: This design shows off a key strength of LLMs: they can read and apply policy documents written in plain language, with no translation into code. Their interpretation can still be wrong, so spot-check the verdicts.
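
Because the file is read on every call, an edited policy takes effect the next time the tool runs. A minimal sketch of that loading step as a standalone function follows; `load_purchasing_rules` and `DEFAULT_RULES` are illustrative names, not part of the original tool.

```python
from pathlib import Path

# Same fallback text the tool uses when the rules file is missing.
DEFAULT_RULES = "No rules available. Assume all invoices are compliant."


def load_purchasing_rules(path: str = "config/purchasing_rules.txt") -> str:
    """Read the policy text fresh on every call, falling back to a default."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return DEFAULT_RULES
```

Note that this fallback is fail-open: a missing file means every invoice passes. For a compliance gate, a stricter default (or raising an error) may be the safer design choice.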

---

### 🧠 3. **LLM as a Reasoning Engine, Not a Rules Engine**

```python
prompt_llm_for_json(...)
```

You're using the LLM like a flexible brain that reads a set of rules and interprets them against live data (the invoice).

* No `if`/`else` statements.
* No brittle regex.
* No fragile rule-based systems.

➡️ **Focus**: This is modern AI software engineering. You get reasoning + structure in one clean package.

---

### 🛠️ 4. **The Agent Architecture Is Modular**

This is just one tool in the broader invoice-processing system. You can imagine others:

* `extract_invoice_data`
* `categorize_expenditure`
* `store_invoice`

➡️ **Focus**: Each tool does *one thing well* and plugs into a clean pipeline.

---

### 💡 Final Takeaway

This tool exemplifies the **best practices** of agent design:

* Clear responsibility
* Externalized knowledge (rules)
* Structured, machine-readable output
* Natural language reasoning power of LLMs

It's elegant, practical, and scalable. You're not just building prompts; you're building AI-native infrastructure.




### ✅ A **gatekeeper** tool in your agent pipeline

Just like in traditional software pipelines where you'd validate inputs before processing (e.g., form validation, schema checks, auth checks), this tool:

* **Verifies compliance with policy** before any deeper processing begins.
* **Acts as a filter** so only valid, policy-compliant invoices make it to storage, categorization, or payment systems.
* **Encapsulates business rules** without hardcoding logic into the core agent.

---

### 🔁 How it fits in a modern LLM-based pipeline:

```
         Incoming Invoice
                |
        extract_invoice_data
                |
        🔍 check_purchasing_rules (LLM tool)
                |
        ┌───────┴────────┐
     ❌ Fails        ✅ Passes
     Rejected        ↓
             categorize_expenditure
                    ↓
               store_invoice
```
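The gatekeeper idea can be sketched in plain Python: run the compliance check, stop on a failure, and only then hand the invoice to later steps. Everything here is a stand-in. `process_if_compliant`, `fake_check`, and `add_category` are hypothetical names, and the check is a hardcoded threshold where the real tool would call the LLM.

```python
from typing import Callable


def process_if_compliant(
    invoice_data: dict,
    check: Callable[[dict], dict],
    downstream: list[Callable[[dict], dict]],
) -> dict:
    """Run the compliance check first; only run later steps on a pass."""
    verdict = check(invoice_data)
    if not verdict["compliant"]:
        # Fail fast: stop here and report why.
        return {"status": "rejected", "issues": verdict["issues"]}
    for step in downstream:
        invoice_data = step(invoice_data)
    return {"status": "processed", "invoice": invoice_data}


def fake_check(invoice: dict) -> dict:
    """Stand-in check; a real one would call the LLM-backed tool."""
    ok = invoice["total"] <= 5000
    return {"compliant": ok, "issues": "" if ok else "Exceeds $5,000 without pre-approval."}


def add_category(invoice: dict) -> dict:
    """Stand-in for a categorization step."""
    return {**invoice, "category": "IT Equipment"}


print(process_if_compliant({"total": 1500}, fake_check, [add_category]))
print(process_if_compliant({"total": 6000}, fake_check, [add_category]))
```

Passing the check and the downstream steps as arguments keeps the gate itself free of business rules, which matches the separation the section describes.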

---

### 🧠 Why it's "modern":

Traditional:

* Rules are coded (`if amount > 5000 and not pre_approved: error`)
* Requires software devs to update policy logic

LLM-enhanced:

* Reads and interprets **natural language policy**
* Updates come from editing a `.txt` file, not changing code
* Still gives you **structured, reliable output**

---

### 🏗️ This is software engineering, just evolved.

You're still designing:

* **Input validation**
* **Fail-fast systems**
* **Composable modules**

You're just replacing brittle logic with **language-native reasoning**, which makes your tools more **maintainable**, **transparent**, and **human-aligned**.

This is the kind of modern system design that blends **AI and software best practices**. You're not abandoning your foundations; you're upgrading them.




### 🛠️ Traditional software:

* **Policies are hardcoded** into conditionals and logic trees:

  ```python
  if invoice.total > 5000 and not invoice.has_pre_approval:
      raise PolicyViolation("Approval required for invoices over $5,000")
  ```
* Updating a policy means:

  1. A developer changes the code.
  2. QA tests the new logic.
  3. It's deployed via a code release cycle.
  4. Any errors → debugging, regression tests.

### 🤖 With the LLM-based tool:

* The rules live in a **plain-text file** (`purchasing_rules.txt`).
* Anyone, even a non-technical policy analyst, can change the policy:

  > *"All software purchases over \$2,000 must be reviewed by IT."*
* The **LLM interprets the new policy the next time the tool runs**, guided by your structured prompt and schema.
* ✅ No changes to code
* ✅ No developer time required
* ✅ No re-deploying

---

### 🔄 Who owns what?

| Role             | Responsibility                            |
| ---------------- | ----------------------------------------- |
| **Policy Owner** | Updates the text rule file                |
| **Developer**    | Builds & maintains the rule-checking tool |
| **LLM**          | Interprets and applies the rules          |
| **Agent System** | Integrates the compliance check           |

---

### 🔥 Why this matters:

* **Faster response to change:** Policies can change often. You don't want devs on the hook every time.
* **Fewer translation bugs:** Keeping rules in natural language removes the step where a developer misreads a policy while coding it. The LLM can still misinterpret a rule, so review its verdicts.
* **Empowered stakeholders:** Policy experts don't need devs as intermediaries.
* **More agile systems:** Your software becomes flexible and adaptable without increasing fragility.

**This is real separation of concerns** in the LLM era: policy owners own the rules, and developers own the tool that applies them.




## 🚀 Updating the Invoice Processing Agent

Now that we've equipped ourselves with expert tools, it's time to **level up our agent** into a full-fledged processing powerhouse. This upgraded agent doesn't just follow static logic; it **makes decisions dynamically** by invoking the right expertise at the right time.

### 🧠 The Agent Will Now Know:

* **When to extract** invoice data from raw text
* **When to categorize** the expenditure using a financial expert persona
* **When to validate** the invoice against real-time purchasing policies
* **How to store** compliant invoices into persistent storage

Each step is modular, maintainable, and powered by the appropriate specialist, just like in a real-world workflow.

---

### 🧾 Full Agent Code (Coming Next)

This agent becomes the **orchestrator**: it consults the experts, validates the rules, and persists results, all without hardcoding logic or rules. You've officially entered the realm of **expert-agent architecture**.



```python
def create_invoice_agent():
    # Create action registry with invoice tools
    action_registry = PythonActionRegistry()

    # Define invoice processing goals
    goals = [
        Goal(
            name="Persona",
            description="You are an Invoice Processing Agent, specialized in handling invoices efficiently."
        ),
        Goal(
            name="Process Invoices",
            description="""
            Your goal is to process invoices accurately. For each invoice:
            1. Extract key details such as vendor, amount, and line items.
            2. Generate a one-sentence summary of the expenditure.
            3. Categorize the expenditure using an expert.
            4. Validate the invoice against purchasing policies.
            5. Store the processed invoice with categorization and validation status.
            6. Return a summary of the invoice processing results.
            """
        )
    ]

    # Define agent environment
    environment = PythonEnvironment()

    return Agent(
        goals=goals,
        agent_language=AgentFunctionCallingActionLanguage(),
        action_registry=action_registry,
        generate_response=generate_response,
        environment=environment
    )
```


This code reflects a shift from standalone tools toward a **fully integrated, multi-stage agent workflow**. Here's what stands out as different and important compared to the previous examples:

---

### 🧠 1. **High-Level Orchestration of Expert Tools**

Before, we built **individual tools** (like `check_purchasing_rules` or `categorize_expenditure`). Now, this function creates an **agent** that knows how to **sequence those tools together** as part of its decision-making.

* The agent is no longer just "calling" a tool.
* It now **understands the workflow**: extraction → summarization → categorization → validation → storage → reporting.

---

### 🎯 2. **Declarative Goals, Not Imperative Logic**

Rather than specifying what to do in imperative code (e.g. calling tool functions directly), this agent is driven by **goals**:

```python
goals = [
    Goal(name="Persona", description="..."),
    Goal(name="Process Invoices", description="...")
]
```

These goals are declarative: they tell the LLM "here's what you're trying to accomplish" and let the LLM figure out which tool to call and when.

✅ This is the **Agent Function Calling** paradigm in action.
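
To make "declarative" concrete, here is one plausible way goals could be flattened into instructions for the model. How the framework actually does this is not shown in this notebook; the `Goal` dataclass below is an illustrative stand-in for the framework's type, and `render_goals` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Illustrative stand-in for the framework's Goal type."""
    name: str
    description: str


def render_goals(goals: list[Goal]) -> str:
    """Flatten goals into a single instruction block for the model."""
    sections = [f"{g.name}:\n{g.description.strip()}" for g in goals]
    return "\n\n".join(sections)


goals = [
    Goal(name="Persona", description="You are an Invoice Processing Agent."),
    Goal(name="Process Invoices", description="Extract, categorize, validate, and store each invoice."),
]
print(render_goals(goals))
```

The point is that the goals carry intent only. Nothing in them names a function to call, so tool selection is left to the model.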

---

### 🧰 3. **Modular, Plug-and-Play Environment**

This line:

```python
action_registry = PythonActionRegistry()
```

...gives the agent access to the tools you've registered with `@register_tool` (like `extract_invoice_data`, `store_invoice`, etc.).

This means:

* You can add or swap tools **without rewriting agent code**
* The agent discovers which tools are available from the registry

This pattern is clean, scalable, and decoupled.

---

### 🌍 4. **Dedicated Agent Environment**

```python
environment = PythonEnvironment()
```

This allows the agent to have **state, memory, and persistence**, if needed, which individual tool calls did not support before.

---

### 🔄 5. **Reusable Agent Constructor**

This `create_invoice_agent()` function can be used to:

* Launch the agent repeatedly in different contexts
* Be embedded in a pipeline
* Plug into a larger system

It separates **agent definition** from **agent execution**, much like a factory function or constructor in object-oriented design.

---

### ✅ Summary: What's New and Powerful

| Concept             | Before           | Now                                  |
| ------------------- | ---------------- | ------------------------------------ |
| **Level**           | Tool-level       | Full agent orchestration             |
| **Goal Definition** | Manual prompting | Declarative `Goal()` objects         |
| **Workflow**        | Ad-hoc           | Agent-managed step-by-step pipeline  |
| **Modularity**      | Per function     | Full plug-and-play action registry   |
| **State/Memory**    | None             | Optional environment for persistence |



Let's break down how each of the **six steps in the goal** can (and should) map to **individual tools**, most of which are driven by **specialized personas** using the persona pattern.

---

### 🧩 Goal-to-Tool Mapping

#### **1. Extract key details such as vendor, amount, and line items.**

* ✅ **Tool**: `extract_invoice_data`
* 🧠 **Persona**: *Invoice Data Extraction Specialist*
* 🔧 Method: Uses `prompt_llm_for_json` with a fixed schema for consistency.

---

#### **2. Generate a one-sentence summary of the expenditure.**

* ✅ **Tool**: `summarize_invoice` (you'd define this)
* 🧠 **Persona**: *Financial Communicator / Procurement Analyst*
* 📝 Goal: Convert structured invoice data into a concise description for categorization.

---

#### **3. Categorize the expenditure using an expert.**

* ✅ **Tool**: `categorize_expenditure`
* 🧠 **Persona**: *Senior Financial Analyst*
* 📦 Chooses one of 20 predefined categories based on that one-liner.

---

#### **4. Validate the invoice against purchasing policies.**

* ✅ **Tool**: `check_purchasing_rules`
* 🧠 **Persona**: *Corporate Compliance Officer*
* 📜 Reads a policy file and evaluates adherence dynamically using `prompt_llm_for_json`.

---

#### **5. Store the processed invoice with categorization and validation status.**

* ✅ **Tool**: `store_invoice`
* 🧠 Likely no persona needed here; this is a **procedural task**.
* 🗂️ Logic: Store invoice data keyed by invoice number, for retrieval and persistence.

---

#### **6. Return a summary of the invoice processing results.**

* ✅ **Tool**: `summarize_processing_results` (you'd define this)
* 🧠 **Persona**: *Operations Analyst*
* 🎯 Takes output from all previous steps and assembles a human-readable report or dashboard update.

---

### 🧠 Why Persona-Based Tools Are Powerful Here:

Each step encapsulates:

* A **distinct skill set**
* A **clear contract** for inputs and outputs
* **Human-comprehensible documentation** via the persona description
* The ability to **evolve independently** (e.g., if your company changes how it categorizes expenses, you update that one tool/persona only)





## 🧠 Step-by-Step Agent Execution

### ✅ **Step 1: Extract Invoice Data**

* **Tool Called**: `extract_invoice_data`
* **Input**: Raw invoice text (PDF converted to text, email body, etc.)
* **Output**: Structured JSON (e.g., vendor name, date, total, line items)
* **Passed to**: Step 2

---

### ✅ **Step 2: Generate Summary Description**

* **Tool Called**: `summarize_invoice`
* **Input**: The structured JSON output from Step 1
* **Output**: A single sentence like:

> "Purchase of one high-end workstation for IT department."

**Passed to**: Step 3

---

### ✅ **Step 3: Categorize Expenditure**

* **Tool Called**: `categorize_expenditure`
* **Input**: The one-sentence summary from Step 2
* **Output**: One of 20 predefined categories
* **Passed to**: Step 4

---

### ✅ **Step 4: Validate Against Purchasing Rules**

* **Tool Called**: `check_purchasing_rules`
* **Input**: Full structured invoice JSON from Step 1
* **Also Uses**: `purchasing_rules.txt` (loaded at runtime)
* **Output**:

```json
{
  "compliant": false,
  "issues": "This purchase exceeds $5,000 but lacks required pre-approval."
}
```

**Passed to**: Step 5

---

### ✅ **Step 5: Store Processed Invoice**

* **Tool Called**: `store_invoice`
* **Input**: Full invoice data, plus category and compliance result
* **Output**: Storage status like:

```json
{
  "status": "success",
  "message": "Stored invoice 7890",
  "invoice_number": "7890"
}
```

---

### ✅ **Step 6: Return Final Summary**

* **Tool Called**: `summarize_processing_results`
* **Input**: Outputs from all previous steps
* **Output**: Human-readable message, or dashboard-ready summary like:

> "Invoice 7890 from Tech Solutions Inc. (IT Equipment) is non-compliant due to lack of pre-approval for a \$6,000 workstation. Stored successfully."
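
The text suggests a persona-driven tool for this step. As a simpler baseline, the same report can be assembled deterministically from the earlier outputs. The function name `format_processing_summary` and the input field names are assumptions chosen to match the examples above.

```python
def format_processing_summary(invoice: dict, category: str, validation: dict, receipt: dict) -> str:
    """Assemble a one-line report from the outputs of the earlier steps."""
    if validation["compliant"]:
        verdict = "is compliant"
    else:
        # Drop a trailing period so the sentence punctuation stays clean.
        reason = validation["issues"].rstrip(".")
        verdict = f"is non-compliant: {reason}"
    stored = "Stored successfully." if receipt.get("status") == "success" else "Not stored."
    return (
        f"Invoice {invoice['invoice_number']} from {invoice['vendor']} "
        f"({category}) {verdict}. {stored}"
    )


report = format_processing_summary(
    invoice={"invoice_number": "7890", "vendor": "Tech Solutions Inc."},
    category="IT Equipment",
    validation={"compliant": False, "issues": "No pre-approval for a $6,000 workstation."},
    receipt={"status": "success"},
)
print(report)
```

A template like this is predictable and cheap; an LLM-backed version is worth the cost mainly when the summary needs judgment about what to emphasize.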

---

## 💡 Why This Structure Works So Well

* 🧱 **Composability**: Each step is its own tool and can be reused elsewhere (e.g. validating invoices, validating reimbursements, etc.).
* 🔄 **Error Isolation**: If something breaks (e.g., schema mismatch), you know exactly where.
* 🧠 **Clear Persona Use**: Each tool is paired with a focused, realistic expert that guides the LLM's reasoning.
* 🔧 **Easily Replaceable Logic**: You can upgrade just the categorization logic without touching the rest of the pipeline.






This **agent architecture mirrors traditional software design** in its best practices:

---

### 🧱 **Traditional Software Patterns Still Apply**

| Traditional Concept    | LLM Agent Equivalent                                     |
| ---------------------- | -------------------------------------------------------- |
| Functions              | Tools or tool-registered functions                       |
| Function Composition   | Chained tools (pipeline of persona-driven steps)         |
| Modularity             | Isolated expert tools (e.g., validation, extraction)     |
| Abstraction            | Persona descriptions (encapsulate domain expertise)      |
| Separation of Concerns | One tool = one job (single responsibility principle)     |
| Config-driven behavior | Human-readable files (e.g., `purchasing_rules.txt`)      |
| Testable Units         | Tools can be unit tested independently                   |
| Versioning             | Swap in new expert prompts or rules without code changes |
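
The "Testable Units" row deserves a concrete example. One common approach is to pass the LLM call in as a parameter so a test can substitute a fake. The sketch below assumes that design; `check_rules` and its `ask_llm` parameter are hypothetical and differ from the notebook's tool, which receives an `action_context` instead.

```python
from typing import Callable


def check_rules(invoice: dict, rules: str, ask_llm: Callable[[str], dict]) -> dict:
    """Build the prompt and delegate the judgment to an injected LLM callable."""
    prompt = f"Invoice: {invoice}\nRules:\n{rules}\nReply with `compliant` and `issues`."
    return ask_llm(prompt)


def test_check_rules_passes_rules_to_llm() -> None:
    seen = []

    def fake_llm(prompt: str) -> dict:
        seen.append(prompt)  # capture the prompt for inspection
        return {"compliant": False, "issues": "Needs pre-approval."}

    result = check_rules({"total": 6000}, "Over $5,000 needs pre-approval.", fake_llm)
    assert result["compliant"] is False
    assert "Over $5,000 needs pre-approval." in seen[0]


test_check_rules_passes_rules_to_llm()
print("ok")
```

This tests the plumbing (prompt construction and result handling) without a network call. Checking the quality of the model's judgment is a separate, evaluation-style task.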

---

### 🔥 What's Changed (Dramatically)

* **Interpretation & Judgment are now part of the code path.**

  * LLMs can "understand" the content and make nuanced decisions.
  * This used to require rigid logic, regexes, rulesets, or ML models.

* **Expertise is now dynamic.**

  * You no longer need to encode all knowledge in advance.
  * The LLM can *simulate* any role, on demand, using a prompt.

* **System behavior becomes "configurable via language."**

  * A non-programmer can modify behavior just by updating instructions, descriptions, or rules, with no code changes needed.

---

### 🧠 Why This Matters

You're not replacing software engineering; you're **augmenting it**.
The better your fundamentals (pipelines, modularity, testing, logging), the better your LLM-powered system will be.

This is why **software architecture + systems thinking** are now *superpowers* in AI-native tools and agents.






## 🧙 "One Prompt to Rule Them All" Mentality:

This is the idea that you can solve a complex problem with a **single, massive, magical prompt**, typically using an instruction like:

> "You're a world-class accountant, lawyer, engineer, product manager, and therapist. Do X, Y, Z. Return everything perfectly."

It's a tempting approach because:

* It feels powerful and clever.
* It's fast to prototype.
* It avoids any real system design.

But in practice? It **breaks down quickly**:

| Problem                    | Why It Fails                                                                                 |
| -------------------------- | -------------------------------------------------------------------------------------------- |
| 🔄 Poor Reusability        | Hard to re-use in different workflows. One monolith = no flexibility.                        |
| 🧱 No Modularity           | If one thing breaks, you have to tweak the whole giant prompt.                               |
| 🧪 Unpredictable           | Long, multi-part instructions compete for the model's attention and can reduce reliability. |
| 🤹 Too Much Cognitive Load | Asking the LLM to juggle 10 roles at once weakens performance on all of them.                |
| 🧩 No Intermediates        | No structured intermediate steps = no introspection, no debugging, no logging.               |

---

## 🧠 Modular, Persona-Based Agent Systems (Like You're Learning)

This is the opposite philosophy:
**Break the problem into manageable parts**, and use the LLM's strength, *contextual reasoning*, in focused, specialized roles.

| Strength                       | Why It Works                                                                                        |
| ------------------------------ | --------------------------------------------------------------------------------------------------- |
| ✅ Clear Separation of Concerns | Each persona (tool) does one job well.                                                              |
| 🔧 Swappable Parts             | Update or replace one tool without touching the rest.                                               |
| 🧵 Chainable Reasoning         | Intermediate results allow debugging, analysis, and structured thinking.                            |
| 📈 Scales With Complexity      | As your use case grows, you can add tools/personas instead of bloating one prompt.                  |
| 🤝 Human-Aligned               | This mirrors how real teams work: specialists collaborating, not one "uberhuman" doing everything. |

---

### 🔥 Bottom Line

> **One giant prompt is a prototype.**
> **A modular agent system is a product.**

The single-prompt method might *look* smart, but it quickly becomes brittle, opaque, and hard to scale.
In contrast, **systems like the one you're building are transparent, testable, maintainable, and extensible**, just like real software.





### 🏭 **Your Approach = An Expert Factory (Assembly Line for Thinking)**

Think of it like this:

* **Each tool** = a highly trained specialist with a clear job (extract data, write docs, check compliance...).
* **Each persona** = an expert role with its own methodology and domain-specific thinking.
* **The agent** = a project manager orchestrating the right experts at the right time.

This setup gives you:

* ✅ *Clarity*: Each part does one thing well.
* ✅ *Control*: Easy to swap tools or update individual personas.
* ✅ *Scalability*: Add complexity by composing more tools, not rewriting prompts.
* ✅ *Debuggability*: If something breaks, you can see *which* step failed and why.
* ✅ *Performance*: Experts don't get confused; they're focused.

This is what real systems design looks like in the LLM era.

---

### 🧙‍♂️ **The God Prompt = One Overworked Genius Alone**

That prompt is like saying:

> "Here's a vague problem. You're the architect, lawyer, doctor, designer, QA, and product lead. Figure it out perfectly. Now. In one shot."

Even if the LLM *can* do a lot, you're:

* Stressing the context window,
* Forcing it to remember too much at once,
* Getting shallow performance across roles,
* And leaving no traceable steps if anything goes wrong.

It's like asking a single person to design, build, inspect, and explain a skyscraper *in one breath.*

---

### 🔁 **Systems vs. Prompts**

| God Prompt 🧙    | Modular Agent 🏭       |
| ---------------- | ---------------------- |
| Impressive demo  | Reliable product       |
| One giant brain  | Team of experts        |
| Hard to maintain | Easy to debug & update |
| No structure     | Clear workflows        |
| Fragile          | Extensible & scalable  |

---

If your goal is **research**, the God prompt might be fun.

But if your goal is **production**, **collaboration**, or **automation at scale**,
this factory-of-specialists approach is *the future*.






### 🔄 **Experts On-Demand = Dynamic, Contextual Intelligence**

You're not *predefining* a fixed team of experts.

Instead:

* 🛠️ You've built a **tool** that can *generate* the perfect expert **on the fly**
* 🧠 You provide a description of the job, and the LLM constructs someone with:

  * Relevant background and experience
  * Specialized problem-solving strategies
  * Domain-specific values and focus
* ⚙️ This expert *only exists* for that task: they're disposable, lightweight, and purpose-built
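
In practice, "generating an expert" can be as simple as composing a persona description and a task into one prompt. The sketch below shows that composition only; `build_expert_prompt` and its parameters are hypothetical, and sending the prompt to a model is left out.

```python
def build_expert_prompt(role: str, expertise: str, task: str) -> str:
    """Compose a persona-style prompt for a one-off expert."""
    return (
        f"Act as {role}. {expertise}\n\n"
        f"Task:\n{task}\n\n"
        "Answer from this expert's perspective."
    )


prompt = build_expert_prompt(
    role="a corporate compliance officer",
    expertise="You specialize in purchasing policy and approval thresholds.",
    task="Review this invoice for policy violations: {'total': 6000}",
)
print(prompt)
```

Because the expert is just text, swapping in a different specialist means changing two strings, not the surrounding code.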

---

### Why This Is a Game-Changer:

1. **🔬 Ultra-Specific Expertise**
   You don't just have a "data scientist." You have a *data scientist who specializes in small sample anomaly detection in financial time series*, if that's what the problem calls for.

2. **🧱 Infinite Composability**
   You can build workflows by *chaining* these experts:
   → One defines a strategy
   → One builds it
   → One tests it
   → One explains it to a stakeholder

3. **🧑‍💼 Role Fit = Better Results**
   Experts bring contextually appropriate methods.
   A "compliance officer" looks for risks differently than a "growth marketer", and you get those differences *for free*.

4. **📈 Future-Proof & Adaptable**
   New domain? Just describe the expert. No refactoring. No code rewrite.
   The system evolves as fast as your needs do.

---

### Traditional Analogy:

In the old world of software, this would be like having a **script that writes new microservices** every time a ticket came in: optimized, documented, and ready to plug in.

Now, instead of services, you're spinning up **domain experts**.

---

The core idea:

> Instead of one God prompt doing everything, we've built a system that *hires* the right expert for the job, every time.

And because it's all tool-driven, the logic is composable, traceable, and endlessly flexible.
That's **real systems thinking** for the LLM era.




## 🧪 Step 4: Testing the New Capabilities

With the new expert-driven tools in place, our **Invoice Processing Agent** is now equipped to:

* ✅ Extract structured data from invoice text
* ✅ Generate summaries for categorization
* ✅ Consult an expert to classify the expenditure
* ✅ Validate compliance with up-to-date purchasing policies
* ✅ Store the full, processed invoice for downstream use

Here's how you can test the full workflow end-to-end:

```python
invoice_text = """
    Invoice #4567
    Date: 2025-02-01
    Vendor: Tech Solutions Inc.
    Items:
      - Laptop - $1,200
      - External Monitor - $300
    Total: $1,500
"""

# Create an agent instance
agent = create_invoice_agent()

# Process the invoice
response = agent.run(f"Process this invoice:\n\n{invoice_text}")

print(response)
```

### 🔍 What to Expect:

The agent should:

1. Parse the raw text into structured JSON
2. Generate a one-line description (e.g., "Purchase of IT hardware for employee workspace")
3. Categorize the expense (likely: `IT Equipment`)
4. Check it against purchasing rules (e.g., thresholds, vendor compliance)
5. Store the processed invoice with its category and validation status
6. Return a structured summary with compliance status and any issues

Because the LLM decides which tools to call, the exact order and wording can vary between runs.

