<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/054_SelfPrompting_Confidence_in_Convergence.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧾 Self-Prompting for Structured Data Extraction

## 📦 Automating Accounts Payable with LLMs

Imagine we’re building an agent to automate accounts payable processing. Every day, the agent receives dozens of emails containing invoices from different vendors, each in its own format and layout:
- PDFs
- Scanned images converted to text (OCR)
- Plain text in the email body

Our agent needs to:
- Understand each invoice
- Extract key fields: invoice number, date, amount, line items
- Insert this data into the company’s accounting system

Without automation, this is a **tedious manual task** — reading each invoice and transcribing data by hand.

---

## 🤖 Why LLMs Are Transformative Here

With **self-prompting**, the agent's own code prompts an LLM and uses it as a **universal parser** that can interpret invoice text in many different layouts, without a hand-written parser for each format.

### Key Capabilities:
- Read unstructured text
- Extract structured data via prompting
- Hand off that data to APIs or databases
- Make decisions based on what was extracted

This creates a powerful bridge between:
> 🗒️ **Human-style input** (unstructured text)  
> ➜ 🧾 **Machine-style output** (structured JSON)

---

## 🛠️ Workflow Overview

The agent can now:
1. Receive messy invoice text from any source
2. Use a specialized LLM tool (`prompt_llm_for_json`) to extract structured data
3. Route the structured output into downstream systems
4. Use extracted data to take action or make decisions
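
The four steps above can be sketched as a small driver function. This is a hedged illustration, not the notebook's code: `process_invoice`, `fake_extract`, and the `ledger` list are hypothetical stand-ins, and the extraction step is stubbed out (with made-up placeholder values) so the sketch runs without an LLM. In the real agent, step 2 would call `prompt_llm_for_json`.

```python
from typing import Callable

# Hypothetical stand-in for step 2. The real agent would call the LLM-backed
# extraction tool; this stub returns fixed placeholder values so the sketch runs offline.
def fake_extract(invoice_text: str) -> dict:
    return {"invoice_number": "1234", "date": "2024-01-15", "amount": 250.0, "line_items": []}

def process_invoice(invoice_text: str,
                    extract: Callable[[str], dict],
                    ledger: list) -> str:
    # Step 1: receive raw text (PDF text, OCR output, or an email body)
    text = invoice_text.strip()
    # Step 2: turn unstructured text into structured data
    data = extract(text)
    # Step 3: route the structured output to a downstream system (a list here)
    ledger.append(data)
    # Step 4: make a simple decision based on the extracted fields
    return "needs_review" if data.get("amount") is None else "recorded"

ledger: list = []
status = process_invoice("INVOICE #1234 ...", fake_extract, ledger)
print(status, len(ledger))
```

Passing `extract` in as a parameter keeps the driver independent of how extraction is done, which mirrors the design of the tool itself.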



import json

# `register_tool`, `ActionContext`, and `Prompt` come from the agent framework
# built earlier in this series; they are not defined in this notebook.

@register_tool()
def prompt_llm_for_json(action_context: ActionContext, schema: dict, prompt: str):
    """
    Have the LLM generate JSON in response to a prompt. Always use this tool when you need structured data out of the LLM.
    This function takes a JSON schema that specifies the structure of the expected JSON response.

    Args:
        schema: JSON schema defining the expected structure
        prompt: The prompt to send to the LLM

    Returns:
        The parsed JSON (usually a dict). Only JSON syntax is checked; the schema
        is requested in the prompt but the result is not validated against it.
    """
    generate_response = action_context.get("llm")

    # Make up to 3 attempts to get parseable JSON
    for i in range(3):
        try:
            # Send prompt with schema instruction and get response
            response = generate_response(Prompt(messages=[
                {"role": "system",
                 "content": f"You MUST produce output that adheres to the following JSON schema:\n\n{json.dumps(schema, indent=4)}\n\nOutput your JSON in a ```json markdown block."},
                {"role": "user", "content": prompt}
            ]))

            # Check if the response has JSON inside a markdown code block
            if "```json" in response:
                # Opening fence from the front, closing fence from the back
                start = response.find("```json") + len("```json")
                end = response.rfind("```")
                if end < start:
                    # No closing fence: take everything after the opening one
                    end = len(response)
                response = response[start:end].strip()

            # Parse the JSON (raises json.JSONDecodeError on malformed output)
            return json.loads(response)

        except Exception as e:
            if i == 2:  # On the last attempt, re-raise the error
                raise
            print(f"Error generating response: {e}")
            print("Retrying...")

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"}
                }
            }
        }
    }
}

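# `context` is assumed to be an ActionContext created during agent setup, with the
# LLM function registered under "llm"; it is not defined in this notebook.
extracted_data = prompt_llm_for_json(
    action_context=context,
    schema=invoice_schema,
    prompt="Extract invoice details from this text: 'INVOICE #1234...'"
)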


## ✅ Why This Design Is So Smart

### 1. 🧠 The `prompt` Is Passed In

This lets you use **any prompt you want**, without changing the function. That makes it:

* Flexible
* Reusable
* Composable with different tasks

```python
prompt="Extract invoice details from this text: 'INVOICE #1234...'"
```

You can swap that out for:

```python
prompt="Extract meeting details from this email transcript..."
```

Same function. Different job. Beautiful.

---

### 2. 🧾 The `schema` Is Passed In

You don’t hardcode the output structure either — you provide it per use-case:

```python
invoice_schema = {...}
```

So this same tool could handle:

* Resumes
* Contracts
* Support tickets
* Social media posts

…simply by passing a new schema.

That’s what makes the tool **modular and schema-agnostic**: it doesn’t care *what* data you want. It asks the LLM to follow the format you supplied and checks that the reply is valid JSON, though it does not verify the reply against the schema itself.

---

## 🔄 Function = Prompt + Schema = Structured Output

```python
extracted_data = prompt_llm_for_json(
    action_context=context,
    schema=invoice_schema,
    prompt="Extract invoice details from this text..."
)
```

This is a **parameterized prompt tool** — and you can reuse it across your agent pipeline with different inputs.

---

## 🛠️ Final Thought

This is **the essence of scalable agent tooling**:

* You define the *interface* (prompt + schema)
* The tool does *one thing well* (structured extraction)
* You plug in *different use cases* as needed

That’s why it’s a powerful part of your prompt library.






> ✅ **A deceptively simple Python function becomes a highly intelligent tool — because the LLM does the hard work.**

---

## 🧠 Why This Simplicity Is Revolutionary

This tool does only a few things:

1. Gets the LLM callable from a context (`generate_response`)
2. Sends a carefully structured **prompt**
3. Parses the model’s output (inside a markdown JSON block)
4. Returns the parsed data (only JSON syntax is checked, not conformance to the schema)
5. Makes up to 3 attempts, retrying if the call or the parsing fails

It’s **just a wrapper** — but what it wraps is *intelligence*.

---

## 🚀 What Makes It Powerful

This little function turns a **language model** into a:

* PDF text interpreter (once the text has been extracted from the PDF)
* Invoice extractor
* Resume summarizer
* Email intent classifier
* Bug report normalizer
* Meeting note distiller

…just by **swapping the prompt and schema**.

---

## 🔧 Compare to Traditional Code

### In classic code:

You’d have to write:

* Custom regex or NLP parsers
* XML/JSON extractors
* Schema validation logic
* Special-case handlers for every format

💀 Hours of code for each use-case.

---

### With this LLM-powered tool:

You just pass:

```python
schema = {...}
prompt = "Extract X from Y"
```

…And the LLM does the interpretation, cleaning, formatting, and structuring for you.

🔁 And with retries, it even recovers from occasional malformed output.

---

## 💡 Final Insight

> You’ve just seen how **a few lines of orchestration code + a great prompt** can replace entire categories of traditional logic.

This is why prompt + tool design is becoming a core programming skill, and why it is worth digging into.





## 🔍 What Stands Out in This Code

### 1. 🧠 **LLM as a Structured Tool (Not Just a Chatbot)**

```python
@register_tool()
def prompt_llm_for_json(action_context: ActionContext, schema: dict, prompt: str):
```

**Why it's important:**
This function isn’t just “chatting” — it’s being used as a **component inside a pipeline**, with clear inputs and expected outputs.

* The LLM becomes **more predictable** and **reusable**, closer to a traditional software function.
* This is **language-as-API** in action.

---

### 2. 🛡️ **Schema-Constrained Prompting**

```python
"You MUST produce output that adheres to the following JSON schema..."
```

**Why it's important:**

* Instead of just saying "give me invoice data," we give the LLM a **strict structure** (via a JSON schema).
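* This *reduces formatting errors* and makes missing or unexpected fields easier to detect. It does not stop the model from inventing values, so the content still needs checking.
* Once parsed (and ideally validated), the output can be plugged directly into downstream code.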

🧠 **Lesson:** Good prompt design includes **format enforcement**, not just task description.

---

### 3. 🔁 **Retry Mechanism = Real-World Resilience**

```python
for i in range(3):
    try:
        ...
    except Exception as e:
        if i == 2:
            raise e
        print("Retrying...")
```

**Why it's important:**

* LLMs are probabilistic. Sometimes they return broken or partial outputs.
* A retry loop is a **simple but powerful fault-tolerance pattern**.
* It makes your agent **more robust** against transient failures, one step toward production readiness.

🧠 **Lesson:** Always assume some chance of error when working with LLMs — and design with that in mind.
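
The retry loop can also be pulled out into a reusable helper. A minimal sketch, assuming you want the same "N attempts, then re-raise" behavior as the tool above; `with_retries` and `parse_next_reply` are hypothetical names used only for illustration, and the model replies are simulated.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3) -> T:
    """Call fn up to `attempts` times; re-raise the last error if all attempts fail."""
    for i in range(attempts):
        try:
            return fn()
        except Exception as e:
            if i == attempts - 1:
                raise  # out of attempts: surface the real error
            print(f"Attempt {i + 1} failed ({e}); retrying...")
    raise ValueError("attempts must be at least 1")

# Simulated model replies: two malformed ones, then valid JSON.
replies = iter(['{"amount": 12', "not json", '{"amount": 12.5}'])

def parse_next_reply() -> dict:
    return json.loads(next(replies))

result = with_retries(parse_next_reply)
print(result)  # {'amount': 12.5}
```

Using a bare `raise` on the final attempt keeps the original traceback, which makes the underlying failure easier to debug.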

---

### 4. 🧾 **Markdown JSON Parsing (LLM-Friendly Convention)**

````python
if "```json" in response:
    start = response.find("```json")
    end = response.rfind("```")
    response = response[start+7:end].strip()
````

**Why it's important:**

* The system prompt explicitly asks for a markdown `json` code block, and many chat models format code this way by default anyway, so replies typically look like this:

  ```json
  { "key": "value" }
  ```
* This code strips that formatting cleanly so we can parse it.

🧠 **Lesson:** Understand common **LLM output formatting conventions** and build around them.
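
Here is the same fence-stripping logic as a standalone helper, with one extra guard. In the snippet above, if the model opens a `json` block but never closes it, `rfind` lands on the opening fence, the slice comes back empty, `json.loads` fails, and an attempt is wasted; this sketch falls back to the end of the string instead. `extract_json_text` is a hypothetical name, not part of the framework.

````python
import json

FENCE_OPEN = "```json"
FENCE = "```"

def extract_json_text(response: str) -> str:
    """Return the JSON payload, stripping a ```json fence if one is present."""
    if FENCE_OPEN not in response:
        return response.strip()           # assume the whole reply is raw JSON
    start = response.find(FENCE_OPEN) + len(FENCE_OPEN)
    end = response.rfind(FENCE)           # closing fence, searched from the back
    if end < start:                       # opening fence but no closing fence
        end = len(response)
    return response[start:end].strip()

fenced = 'Here you go:\n```json\n{"invoice_number": "1234"}\n```'
unclosed = '```json\n{"invoice_number": "1234"}'
print(json.loads(extract_json_text(fenced)))
print(json.loads(extract_json_text(unclosed)))
````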

---

### 5. ✅ **Single Responsibility Tool Design**

The function does one thing well:

> Prompt the LLM ➜ Parse the response ➜ Return structured data

It doesn’t:

* Postprocess results
* Store them
* Trigger follow-up actions

That **separation of concerns** makes this tool **composable**, **testable**, and **reusable** across many agents or workflows.

---

## 🧭 Where to Focus as a Learner

| Focus Area                  | Why It Matters                                                        |
| --------------------------- | --------------------------------------------------------------------- |
| **Schema-guided prompting** | Teaches you how to constrain and format LLM outputs predictably       |
| **Retries and validation**  | Helps you build robust AI systems that don’t break on first error     |
| **Tool modularity**         | Shows how to make small, composable LLM tools you can reuse in agents |
| **Prompt clarity**          | System prompt + user prompt design is clear, purpose-driven           |
| **I/O patterns**            | Markdown → JSON → Python dict flow is a key real-world pattern        |






This code is a **living example** of what we meant earlier when we talked about **prompt crafting as the new coding** — but with the twist that now, **prompts are embedded in actual software functions**.

---

## 🔧 Let's Break It Down Further:

### ✅ 1. **The Prompt Is Treated as a Tool**

```python
@register_tool()
def prompt_llm_for_json(action_context: ActionContext, schema: dict, prompt: str):
```

This is **not**:

* A prompt you copy/paste into ChatGPT manually
* A throwaway string in a one-off script

This *is*:

* A **named, documented, reusable** function
* Something you can call from inside an agent pipeline, API, or UI
* A unit of behavior — like a software module, but powered by natural language and a model

This is exactly what we mean by:

> ✍️ **Prompt = Code Unit**

You're wrapping a **natural language instruction** into a **callable function** — this *is* the evolution of programming.

---

### 🔁 2. **Modular, Two-Part Prompting**

The system prompt defines *rules and structure*:

```python
"You MUST produce output that adheres to the following JSON schema..."
```

Then the user prompt supplies *context and intent*:

```python
"Extract invoice details from this text: 'INVOICE #1234...'"
```

Together, they form a **prompt pair** — a reusable **template + task-specific data** combo.

This is modularity in action:

* You could **reuse this exact prompt function** with 100 different schemas and 1000 different inputs.
* The *structure* stays the same — only the content changes.
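
The "template + task-specific data" split can be made explicit with a small helper that builds the two messages. A sketch that mirrors the system prompt used in `prompt_llm_for_json`; `build_messages` and `ticket_schema` are hypothetical, and the messages are plain dicts in the common chat format rather than the framework's `Prompt` object.

````python
import json

# Fixed rules: only the schema slot changes between uses.
SYSTEM_TEMPLATE = (
    "You MUST produce output that adheres to the following JSON schema:\n\n"
    "{schema}\n\nOutput your JSON in a ```json markdown block."
)

def build_messages(schema: dict, prompt: str) -> list[dict]:
    # System message = reusable template; user message = task and data for this call
    system = SYSTEM_TEMPLATE.format(schema=json.dumps(schema, indent=4))
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]

ticket_schema = {"type": "object", "properties": {"priority": {"type": "string"}}}
messages = build_messages(ticket_schema, "Classify this support ticket: 'Login page is down'")
print(messages[0]["content"])
````

Swapping in a different schema or input changes only the arguments; the template stays fixed.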

---

### 🛠️ 3. **Composable Prompts = Prompt Libraries**

You could build a whole toolkit of functions like this:

```python
extract_from_resume()
normalize_address()
summarize_contract()
generate_meeting_notes()
```

Each one is just:

* A well-crafted system prompt
* An input string
* A JSON schema or expected output format

And together they form your **Prompt API** — your **LLM toolkit** — the same way we used to build **class libraries or function modules** in classic coding.

> 📦 Just like you import `math.sqrt()`, you now might import `prompt_llm_for_json()`.
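
One way to build such a toolkit is a factory that binds a schema and an instruction to a generic structured-output function. A minimal sketch under stated assumptions: `make_extractor`, `fake_structured_call`, and `resume_schema` are hypothetical, and `structured_call` stands in for something like `prompt_llm_for_json` with the context already supplied. The fake version lets the example run offline.

```python
from typing import Callable

def make_extractor(structured_call: Callable[[dict, str], dict],
                   schema: dict, instruction: str) -> Callable[[str], dict]:
    """Bind a schema and an instruction to a generic structured-output function."""
    def extractor(text: str) -> dict:
        return structured_call(schema, f"{instruction}\n\n{text}")
    return extractor

# Fake structured call for offline use: returns each schema property set to None.
def fake_structured_call(schema: dict, prompt: str) -> dict:
    return {key: None for key in schema.get("properties", {})}

resume_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "skills": {"type": "array"}},
}
extract_from_resume = make_extractor(
    fake_structured_call, resume_schema, "Extract the candidate's details from this resume:"
)
print(extract_from_resume("Jane Doe. Skills: Python, SQL"))
```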

---

### 🎯 4. **Prompt Design Is Now About Interface Design**

This function is successful because:

* It defines clear **input contracts** (`schema`, `prompt`)
* It defines a clear **output expectation** (valid JSON)
* It has **error handling**
* It’s easy to **compose into larger workflows**

Which is exactly what we aim for in good software interface design.

So prompt engineering isn’t just about “wordsmithing.”
It’s about building **stable interfaces to intelligence**.

---

## 🧭 TL;DR — What This Teaches Us

| Classic Programming | LLM Programming         |
| ------------------- | ----------------------- |
| Functions and APIs  | Prompt-wrapped tools    |
| Code reuse          | Prompt reuse            |
| Type contracts      | JSON schema constraints |
| Libraries           | Prompt toolkits         |
| Modular design      | Modular instructions    |

You're now thinking **not just in code**, but in **reasoning components**.
That’s the key to scaling LLM systems.





## 🧠 Why Predictable Format = Power

When you know a prompt will return:

```json
{
  "invoice_number": "string",
  "date": "string",
  "amount": "number",
  "line_items": [...]
}
```

Then downstream tools can:

* **Parse and validate** the output with confidence
* Feed it into a **database insert function**
* Trigger **follow-up prompts** (e.g., “Approve if amount < \$5,000”)
* **Log, visualize, or report** without guessing
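
Because `json.loads` only checks syntax, a downstream step still has to confirm the fields are present before acting on them. A hedged sketch using only the standard library (a library such as `jsonschema` could validate against the full schema instead); `check_invoice` and `route_invoice` are hypothetical, and the \$5,000 threshold is just the example rule from the list above.

```python
# Expected Python types for each required invoice field.
REQUIRED_TYPES = {
    "invoice_number": str,
    "date": str,
    "amount": (int, float),
    "line_items": list,
}

def check_invoice(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is OK."""
    problems = []
    for field, expected in REQUIRED_TYPES.items():
        if field not in data:
            problems.append(f"missing {field}")
        # bool is a subclass of int, so reject it explicitly
        elif not isinstance(data[field], expected) or isinstance(data[field], bool):
            problems.append(f"{field} has type {type(data[field]).__name__}")
    return problems

def route_invoice(data: dict, approval_limit: float = 5000) -> str:
    if check_invoice(data):
        return "manual_review"  # malformed extraction: send to a human
    return "auto_approve" if data["amount"] < approval_limit else "needs_approval"

print(route_invoice({"invoice_number": "1234", "date": "2024-03-01", "amount": 980.0, "line_items": []}))
print(route_invoice({"invoice_number": "1234", "amount": "980"}))
```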

---

## 🧩 This Is the Foundation of “Prompt as Interface”

Each tool becomes like a **Lego block** — with:

* **Defined input shape** (e.g., a schema or structured string)
* **Predictable output shape**
* Clear “plugs” for upstream/downstream steps

So you can now:

* Build pipelines like **Input ➜ Extract ➜ Transform ➜ Route ➜ Decide ➜ Store**
* Swap out parts (e.g., different extractors for different doc types)
* Add agents or logic dynamically without breaking the system
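
Swapping parts usually means choosing a block at runtime rather than hardcoding it. A sketch, assuming each extractor is a plain function from text to dict; the document types and extractor functions are hypothetical stubs standing in for LLM-backed tools.

```python
from typing import Callable

Extractor = Callable[[str], dict]

# Hypothetical stub extractors; each real one would wrap prompt_llm_for_json
# with its own schema and prompt.
def extract_invoice(text: str) -> dict:
    return {"invoice_number": None}

def extract_contract(text: str) -> dict:
    return {"parties": []}

# Registry of swappable blocks: supporting a new document type means adding an entry.
EXTRACTORS: dict[str, Extractor] = {
    "invoice": extract_invoice,
    "contract": extract_contract,
}

def run_pipeline(doc_type: str, text: str, store: list) -> dict:
    extractor = EXTRACTORS.get(doc_type)   # pick the block for this input
    if extractor is None:
        raise ValueError(f"no extractor registered for {doc_type!r}")
    record = extractor(text.strip())       # Extract
    record["doc_type"] = doc_type          # Transform: tag the record with its type
    store.append(record)                   # Store
    return record

store: list = []
run_pipeline("invoice", "  INVOICE #1234 ...  ", store)
run_pipeline("contract", "This agreement ...", store)
print(store)
```

The pipeline function never changes when a new document type is added; only the registry grows.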

---

## 🛠️ In Classic Dev Terms…

| Classic Dev Concept    | LLM Equivalent                                       |
| ---------------------- | ---------------------------------------------------- |
| Interfaces / contracts | Prompt schema + format constraints                   |
| Dependency injection   | Pass different prompts or schemas into same function |
| Middleware / pipes     | Prompt-chain steps (e.g., LangChain chains or ReAct-style loops) |
| Plug-and-play modules  | Prompt tools with reusable logic                     |
| API chaining           | Tool-call → JSON → next tool → etc.                  |

---

## 💡 Why This Matters

In early LLM experiments, everything was **one-off and fragile**.
Now, we’re moving into **reliable, modular AI software design**:

* You don’t just write prompts — you write **promptable modules**
* You don’t just process text — you build **structured pipelines**
* You don’t just hope it works — you **design around contracts and expectations**

This mindset shift turns prompting into **true engineering**.





## 🧠 Why System Thinking Matters in the Age of LLMs

We’ve moved beyond:

* One-shot prompts
* Isolated chat completions
* Magic tricks with clever wording

Now we’re building:

* **Agents**
* **Toolchains**
* **Multistep logic**
* **Autonomous workflows**
* **Interacting roles and modules**

This is **system design** — and it’s how you go from "interesting demo" to "real-world product".

---

## 🛠️ The LLM is Just One Part of the System

> Think of the LLM like the brain in a much larger nervous system.

To build something meaningful, you also need:

* **Memory** (vector DBs, structured state)
* **Tools** (APIs, code functions, search)
* **Input/output interfaces** (chat UIs, webhooks, emails, sensors)
* **Reasoning flows** (step-by-step logic, plans, sub-goals)
* **Control loops** (self-evaluation, retries, agent decisions)

These are the building blocks of a **thinking machine** — and your job is to be its **architect**.

---

## 🎯 What System Thinking Looks Like in Practice

| Without Systems Thinking | With Systems Thinking           |
| ------------------------ | ------------------------------- |
| Write a prompt           | Build a tool with schema        |
| Get an answer            | Validate and route the answer   |
| Rely on one model        | Use multiple tools and LLMs     |
| Hardcode workflow        | Make reusable agents and chains |
| One-off magic            | Scalable infrastructure         |

---

## 🧩 You're Now a Systems Designer of Intelligence

What you're really building are:

* 🧠 *Cognitive flows* (reasoning + action)
* ⚙️ *Functional pipelines* (transformations + routing)
* 🔄 *Feedback loops* (reflection, retry, revise)
* 🗺️ *Modular ecosystems* (tools + memory + agents)

**This is the future of programming.**
You don’t just code logic — you **design thought.**





> 🧠 **Keep tools simple and composable.
> Let the LLM handle complexity — not the code.**

---

## 🔧 Why Simple Tools = Smart Design

### ✅ 1. **Easier to Debug**

* If something breaks, you know exactly where to look.
* You can test the tool in isolation with a single prompt and schema.
* You don’t bury LLM behavior inside deep application logic.
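
Testing in isolation is easiest when the model call can be replaced by a fake. The sketch below is a simplified, framework-free variant of `prompt_llm_for_json` (an assumption: it takes the model as a plain `llm(system, user) -> str` function instead of reading it from an `ActionContext`), exercised with a scripted fake model that gives one unusable reply before answering. `prompt_llm_for_json_simple` and `make_fake_llm` are hypothetical names.

````python
import json
from typing import Callable

def prompt_llm_for_json_simple(llm: Callable[[str, str], str], schema: dict,
                               prompt: str, attempts: int = 3) -> dict:
    """Framework-free variant of prompt_llm_for_json: llm(system, user) -> reply text."""
    system = ("You MUST produce output that adheres to the following JSON schema:\n\n"
              + json.dumps(schema, indent=4)
              + "\n\nOutput your JSON in a ```json markdown block.")
    for i in range(attempts):
        try:
            response = llm(system, prompt)
            if "```json" in response:
                start = response.find("```json") + len("```json")
                end = response.rfind("```")
                if end < start:  # unclosed fence: take the rest of the reply
                    end = len(response)
                response = response[start:end].strip()
            return json.loads(response)
        except json.JSONDecodeError:
            if i == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")

def make_fake_llm(replies: list[str]):
    """Return a fake llm that plays back `replies` in order and records each call."""
    calls: list[tuple[str, str]] = []
    def fake(system: str, user: str) -> str:
        calls.append((system, user))
        return replies[len(calls) - 1]
    fake.calls = calls  # expose recorded calls for assertions
    return fake

# Scripted model: one unusable reply, then a fenced JSON answer.
fake = make_fake_llm(["Sorry, I can't help with that.",
                      '```json\n{"invoice_number": "1234"}\n```'])
data = prompt_llm_for_json_simple(fake, {"type": "object"}, "Extract invoice details: INVOICE #1234")
print(data, len(fake.calls))
````

This variant catches only `json.JSONDecodeError`, so a bug in the fake or the calling code surfaces immediately instead of being silently retried.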

---

### ✅ 2. **Easier to Maintain**

* If your schema changes, you don’t rewrite the function — you pass a new one in.
* If your prompt needs improvement, you update the string — not the function logic.
* This supports **rapid iteration** — tweak and retry in minutes.

---

### ✅ 3. **Composable in Pipelines**

* Each tool does **one thing well**:

  * Extract
  * Classify
  * Transform
  * Validate
* You can **chain** tools together like LEGO blocks.
* This makes your system **modular**, **testable**, and **scalable**.

---

## 🤖 Let the LLM Handle the "Cognitive Load"

LLMs are trained to:

* Parse messy human text
* Infer meaning
* Structure data
* Handle many edge cases without special-case code
* Generalize from examples

That’s **what they're best at** — so let them handle the **complex interpretation** and **transformation logic**, while you focus on:

* Tool boundaries
* Prompt clarity
* Output structure
* Workflow orchestration

---

## 🧭 The Ideal Pattern

| Role             | Responsibility                                       |
| ---------------- | ---------------------------------------------------- |
| 🧰 Tool Function | Keep it tight, reusable, predictable                 |
| 🧠 Prompt        | Provide task logic and structure                     |
| 🤖 LLM           | Handle flexible, fuzzy reasoning and content shaping |
| 🧱 You (the dev) | Design the system — not the micromanagement          |

---

## TL;DR

> Write simple functions.
> Write smart prompts.
> Let the LLM do the thinking.
> Let the system do the scaling.

