<a href="https://colab.research.google.com/github/micah-shull/AI_Agents/blob/main/023_ParseResponse_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



> “Should we use an LLM for something we can do easily with Python?”

Short answer: **No — not if it’s deterministic and cheap to do in code.**
But let’s unpack the bigger picture. Knowing where to draw this line is a **crucial point** in designing intelligent systems with agents, and it is where **agent design maturity** really starts to show.

---

## 🧠 When to Use an LLM vs. Traditional Code

| Task Type                                   | Use Python                        | Use LLM                              |
| ------------------------------------------- | --------------------------------- | ------------------------------------ |
| File listing, renaming, matching extensions | ✅ Yes — fast, deterministic, free | ❌ No — overkill, costly, error-prone |
| Sorting files by type/date                  | ✅ Yes                             | ❌ No                                 |
| Inferring *intent* from natural language    | ❌ No                              | ✅ Yes                                |
| Reasoning across vague instructions         | ❌ No                              | ✅ Yes                                |
| Multi-step planning                         | ❌ No                              | ✅ Yes                                |
| Responding conversationally to user         | ❌ No                              | ✅ Yes                                |
| Matching user instruction to a tool         | ⚠️ Possible but complex           | ✅ Easy with prompting                |
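
As a minimal sketch of the deterministic side of this table, sorting files by type and date needs nothing beyond the standard library. The helper name `sort_files_by_type_and_date` is hypothetical and not part of the notebook.

```python
import os

def sort_files_by_type_and_date(directory):
    """Return file names sorted by extension, then by modification time."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1].lower()
            entries.append((ext, os.path.getmtime(path), name))
    # Tuples sort by extension first, then modification time, then name
    return [name for _, _, name in sorted(entries)]
```

No prompt, no tokens, and the same input always gives the same order.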

---

## 💡 Key distinction:

> **Use LLMs for language understanding and planning.**
> **Use code for everything deterministic and efficient.**

So:

* ✅ Use Python to **list files**, **filter extensions**, **move files**
* ✅ Use LLM to **interpret user intent** from:

  > “Hey assistant, could you move all the PDFs and Word docs into the project folder?”

---

## 🧩 Real-world hybrid approach

Let’s say the user types:

> “Can you move all lecture notes and RAG files to the organized folder?”

That’s vague. What are:

* “Lecture notes”? `.txt`? `.pdf`?
* “RAG files”? Specific filenames? `.json`?

✅ **LLM interprets that**

```json
{
  "tool_name": "move_files",
  "args": {
    "file_type": ["pdf", "json"],
    "source_dir": "docs_folder",
    "target_dir": "organized_folder"
  }
}
```

Then 🐍 **Python executes it efficiently**:

* `os.listdir()`, `shutil.move()`, etc.
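
A minimal sketch of how Python might execute the JSON above. The `TOOLS` registry and the `run_tool_call` helper are assumptions for illustration, and this `move_files` body is one possible implementation of the tool named in the example, not the notebook's own.

```python
import json
import os
import shutil

def move_files(file_type, source_dir, target_dir):
    """Move files whose extension is listed in file_type; return moved names."""
    wanted = {ext.lower().lstrip(".") for ext in file_type}
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        ext = os.path.splitext(name)[1].lower().lstrip(".")
        if os.path.isfile(src) and ext in wanted:
            shutil.move(src, os.path.join(target_dir, name))
            moved.append(name)
    return moved

# Hypothetical registry mapping tool names to Python functions
TOOLS = {"move_files": move_files}

def run_tool_call(llm_json):
    """Parse the LLM's JSON and dispatch to the matching Python tool."""
    call = json.loads(llm_json)
    tool = TOOLS.get(call.get("tool_name"))
    if tool is None:
        raise ValueError(f"Unknown tool: {call.get('tool_name')!r}")
    return tool(**call.get("args", {}))
```

The LLM only chooses the tool and its arguments; everything that touches the file system stays in ordinary, testable Python.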

---

## ⚡️ Rule of Thumb for Agent Design

> “LLMs should tell you *what* to do.
> Python should handle *how* to do it.”

---

## ✅ So in our notebook:

* Use Python to list the files ✅
* Use the LLM to turn vague language into structured commands ✅
* Use code to execute the plan ✅




In [1]:
%pip install -qU python-dotenv openai


In [2]:
from openai import OpenAI
from dotenv import load_dotenv
import os
import json
import re

load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

source_dir = "/content/docs_folder"

# Make sure the directory exists
if not os.path.exists(source_dir):
    raise FileNotFoundError(f"📁 Directory not found: {source_dir}")

# List and build full file paths
file_list = [
    os.path.join(source_dir, f)
    for f in os.listdir(source_dir)
    if os.path.isfile(os.path.join(source_dir, f))
]

# Display the found files
print("📂 Files found:")
for file in file_list:
    print("  -", file)

📂 Files found:
  - /content/docs_folder/001_PArse_the Response.txt
  - /content/docs_folder/006_Agent Loop with Function Calling.txt
  - /content/docs_folder/000_Prompting for Agents -GAIL.txt
  - /content/docs_folder/002_Execute_the_Action.txt
  - /content/docs_folder/005_Using Function Calling Capabilities with LLMs.txt
  - /content/docs_folder/003_gent Feedback and Memory.txt
  - /content/docs_folder/004_AGENT_Tools.txt


💥 You’re now stepping into the **hybrid agent zone**, where Python handles mechanics (like reading files), and the LLM handles higher-order tasks like **summarization, classification, and reasoning.**

---

## ✅ Extract filename + summarize contents

Let’s break it down using the rule:

| Step                        | Who should handle it         | Why                             |
| --------------------------- | ---------------------------- | ------------------------------- |
| List files                  | ✅ Python                     | Fast, local, predictable        |
| Read file contents          | ✅ Python                     | Straightforward, cheap          |
| Summarize contents          | ✅ LLM                        | Needs understanding of language |
| Group or tag files by theme | ✅ LLM                        | Requires semantic inference     |
| Move based on theme/summary | 🧩 LLM (plan) → Python (act) | Hybrid action loop              |

---

## 🧠 Example Flow

User says:

> “Summarize all the files in `docs_folder` and group them by topic.”

Your agent does:

1. **List the files** → Python
2. **Read contents (first N characters)** → Python
3. **For each file, send to LLM**:

```text
Summarize this file:
Filename: lecture_01.txt
Content:
"Today we covered memory management for LLMs..."
```

4. LLM replies:

```json
{
  "summary": "Introduction to memory handling in agents.",
  "topic": "Memory / LLM internals"
}
```

5. Your agent stores or displays those summaries, or moves the files into folders by topic.
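
Once the replies are parsed, step 5 is plain Python again. Here is a minimal sketch of grouping files by topic; the `group_by_topic` helper and the sample records are hypothetical, shaped like the JSON reply in step 4.

```python
from collections import defaultdict

def group_by_topic(results):
    """Map each topic label to the list of filenames tagged with it."""
    groups = defaultdict(list)
    for item in results:
        # Fall back to a placeholder when the reply carried no topic
        topic = item.get("topic") or "Uncategorized"
        groups[topic].append(item["filename"])
    return dict(groups)

# Made-up parsed replies for illustration
results = [
    {"filename": "lecture_01.txt", "topic": "Memory / LLM internals"},
    {"filename": "lecture_02.txt", "topic": "Memory / LLM internals"},
    {"filename": "notes.txt", "topic": None},
]
groups = group_by_topic(results)
```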

---

## 🧠 Why this is a smart LLM use

LLMs shine when:

* You need **semantic understanding** (“What is this about?”)
* You want **language tasks** like summarization, classification, insight extraction
* The logic is **subjective or fuzzy**, not strict rules

And they are a **poor fit** when:

* A simple loop would do the job
* You can `if`/`else` your way through it
* You need predictable performance or cost-efficiency





Let’s build a **File Summarizer Agent** that:

1. Lists files from `/content/docs_folder`
2. Reads the content of each file
3. Sends it to the LLM with a prompt like:

   > "Summarize and identify the topic of this file."
4. Stores the results in a structured format

---

## ✅ Step 1: List files

```python
import os

SOURCE_DIR = "/content/docs_folder"

# List files
file_paths = [
    os.path.join(SOURCE_DIR, f)
    for f in os.listdir(SOURCE_DIR)
    if os.path.isfile(os.path.join(SOURCE_DIR, f))
]

print(f"📁 Found {len(file_paths)} files:")
for file in file_paths:
    print("  -", os.path.basename(file))
```

---

## ✅ Step 2: Read file content safely

Let’s read a *limited* number of characters to keep the LLM input short and cost-efficient:

```python
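def read_file_preview(file_path, max_chars=1500):
    """Return at most the first max_chars characters of a text file."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            # Read only what we need instead of loading the whole file
            return f.read(max_chars)
    except Exception as e:
        # Note: the error text is returned in place of the file content
        return f"Error reading file: {e}"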
```
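
One thing to note about `read_file_preview`: on failure it returns the error message as if it were file content, so that message would be passed along to the LLM for summarizing. A possible variant, sketched here with hypothetical helper names, returns `None` instead so the caller can skip unreadable files.

```python
def read_preview_or_none(file_path, max_chars=1500):
    """Return the first max_chars characters, or None if the file is unreadable."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read(max_chars)
    except (OSError, UnicodeDecodeError):
        return None

def collect_previews(paths, max_chars=1500):
    """Return {path: preview} for the readable files only."""
    previews = {}
    for path in paths:
        preview = read_preview_or_none(path, max_chars)
        if preview is not None:
            previews[path] = preview
    return previews
```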

---

## ✅ Step 3: Create the summarization prompt

```python
def build_summary_prompt(filename, content):
    return [
        {"role": "system", "content": "You are a file summarization assistant. For each file, return a short summary and a topic label."},
        {"role": "user", "content": f"Filename: {filename}\n\nContent:\n{content}"}
    ]
```

---

## ✅ Step 4: Call the LLM for each file

```python
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_response(messages):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content
```

---

## ✅ Step 5: Process all files and collect summaries

```python
summaries = []

for path in file_paths:
    filename = os.path.basename(path)
    content = read_file_preview(path)
    prompt = build_summary_prompt(filename, content)

    print(f"\n📄 Summarizing: {filename}...")
    summary_response = generate_response(prompt)
    print("🧠 LLM Summary:\n", summary_response)

    summaries.append({
        "filename": filename,
        "summary": summary_response
    })
```

---

## ✅ Step 6: Review or save summaries

To view as a table (optional, for Colab):

```python
import pandas as pd
df = pd.DataFrame(summaries)
df.head()
```
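
To save rather than just view the results, a minimal sketch using the standard `csv` module could look like this. The function name `save_summaries_csv` and the output path in the comment are assumptions, not part of the notebook.

```python
import csv

def save_summaries_csv(summaries, csv_path):
    """Write summary dicts to a CSV file with filename and summary columns."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" skips any keys beyond the two columns
        writer = csv.DictWriter(
            f, fieldnames=["filename", "summary"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(summaries)

# Example call (the output path is an assumption):
# save_summaries_csv(summaries, "/content/summaries.csv")
```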




In [5]:
import os
from openai import OpenAI
from dotenv import load_dotenv
import textwrap

# ✅ Step 1: List files
SOURCE_DIR = "/content/docs_folder"

# List files
file_paths = [
    os.path.join(SOURCE_DIR, f)
    for f in os.listdir(SOURCE_DIR)
    if os.path.isfile(os.path.join(SOURCE_DIR, f))
]

print(f"📁 Found {len(file_paths)} files:")
for file in file_paths:
    print("  -", os.path.basename(file))

# ✅ Step 2: Read file content safely
def read_file_preview(file_path, max_chars=1500):
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            # Read only what we need instead of loading the whole file
            return f.read(max_chars)
    except Exception as e:
        # Note: the error text is returned in place of the file content
        return f"Error reading file: {e}"

# ✅ Step 3: Create the summarization prompt
def build_summary_prompt(filename, content):
    return [
        {"role": "system", "content": "You are a file summarization assistant. For each file, return a short summary and a topic label."},
        {"role": "user", "content": f"Filename: {filename}\n\nContent:\n{content}"}
    ]

# ✅ Step 4: Call the LLM for each file
load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_response(messages):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content

# ✅ Step 5: Process all files and collect summaries

summaries = []

for path in file_paths:
    filename = os.path.basename(path)
    content = read_file_preview(path)
    prompt = build_summary_prompt(filename, content)

    print(f"\n📄 Summarizing: {filename}...")

    summary_response = generate_response(prompt)

    print("🧠 LLM Summary:")
    print("-" * 60)
    print(textwrap.fill(summary_response, width=80))
    print("-" * 60)

    summaries.append({
        "filename": filename,
        "summary": summary_response
    })

📁 Found 7 files:
  - 001_PArse_the Response.txt
  - 006_Agent Loop with Function Calling.txt
  - 000_Prompting for Agents -GAIL.txt
  - 002_Execute_the_Action.txt
  - 005_Using Function Calling Capabilities with LLMs.txt
  - 003_gent Feedback and Memory.txt
  - 004_AGENT_Tools.txt

📄 Summarizing: 001_PArse_the Response.txt...
🧠 LLM Summary:
------------------------------------------------------------
**Summary:** The document explains the process of parsing responses generated by
a language model (LLM). It outlines how to extract an action and its parameters
from the output, specifically expecting a JSON format. The parsing is done using
a provided function that checks for the presence and validity of an action block
and handles errors accordingly.  **Topic Label:** LLM Response Parsing
------------------------------------------------------------

📄 Summarizing: 006_Agent Loop with Function Calling.txt...
🧠 LLM Summary:
------------------------------------------------------------
**Sum

In [7]:
# === 📦 Imports ===
import os
import json
import re
import textwrap
from openai import OpenAI
from dotenv import load_dotenv

# === 🔑 Load API Key ===
load_dotenv("/content/API_KEYS.env")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# === 📂 Step 1: List files ===
SOURCE_DIR = "/content/docs_folder"

file_paths = [
    os.path.join(SOURCE_DIR, f)
    for f in os.listdir(SOURCE_DIR)
    if os.path.isfile(os.path.join(SOURCE_DIR, f))
]

# === 📄 Step 2: Read preview of each file ===
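def read_file_preview(file_path, max_chars=1500):
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            # Read only what we need instead of loading the whole file
            return f.read(max_chars)
    except Exception as e:
        # Note: the error text is returned in place of the file content
        return f"Error reading file: {e}"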

# === 🤖 Step 3: Build summarization prompt ===
def build_summary_prompt(filename, content):
    return [
        {
            "role": "system",
            "content": "You are a file summarization assistant. For each file, return a short summary AND a topic label in JSON format inside a markdown code block like this:\n\n```action\n{\n  \"summary\": \"...\",\n  \"topic\": \"...\"\n}\n```"
        },
        {
            "role": "user",
            "content": f"Filename: {filename}\n\nContent:\n{content}"
        }
    ]

# === 📤 Step 4: Send to OpenAI ===
def generate_response(messages):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content

# === 🔍 Step 5: Extract markdown block ===
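def extract_markdown_block(text: str, tag: str = "action") -> str:
    """Return the body of the first fenced block labeled with `tag`."""
    # re.escape guards against tags that contain regex metacharacters
    pattern = rf"```{re.escape(tag)}\s*(.*?)```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1).strip()
    raise ValueError(f"Missing markdown block with tag '{tag}'")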

# === 🧠 Step 6: Parse the structured response ===
def parse_summary_response(response_text):
    try:
        block = extract_markdown_block(response_text, tag="action")
        return json.loads(block)
    except Exception as e:
        return {
            "summary": response_text.strip(),
            "topic": "Unknown (parse failed)",
            "error": str(e)
        }

# === 🔁 Step 7: Process all files ===
summaries = []

for path in file_paths:
    filename = os.path.basename(path)
    content = read_file_preview(path)
    prompt = build_summary_prompt(filename, content)

    print(f"\n📄 Summarizing: {filename}...")
    llm_response = generate_response(prompt)

    print("🧠 Raw LLM Output:")
    print(textwrap.fill(llm_response, width=80))

    parsed = parse_summary_response(llm_response)

    print("\n✅ Parsed Result:")
    print(f"  Summary: {parsed.get('summary')}")
    print(f"  Topic:   {parsed.get('topic')}")

    summaries.append({
        "filename": filename,
        "summary": parsed.get("summary"),
        "topic": parsed.get("topic"),
        "raw": llm_response
    })



📄 Summarizing: 001_PArse_the Response.txt...
🧠 Raw LLM Output:
```action {   "summary": "The document outlines a method for parsing responses
from a language model to extract actionable items formatted in JSON. It
emphasizes the importance of a structured output for execution, detailing a
function to handle valid and invalid responses.",   "topic": "Response Parsing"
} ```

✅ Parsed Result:
  Summary: The document outlines a method for parsing responses from a language model to extract actionable items formatted in JSON. It emphasizes the importance of a structured output for execution, detailing a function to handle valid and invalid responses.
  Topic:   Response Parsing

📄 Summarizing: 006_Agent Loop with Function Calling.txt...
🧠 Raw LLM Output:
```action {   "summary": "The document discusses how function calling can
simplify the AI agent loop by allowing for native support of structured
responses from large language models (LLMs). This innovation reduces complexity
in parsing re

Let’s compare **what we did before (no parsing)** with **what we’re doing now (structured parsing)** — and why this change matters so much in the context of building agents.

---

## 🧱 **Previous version (no parsing):**

### 🧠 LLM returned natural language like:

```
"This file introduces concepts of agent memory. It covers how LLMs are stateless..."
```

### ✅ Code behavior:

* Printed the LLM's response directly with `print()`
* Stored it as a `"summary"` in a dictionary or list
* Easy for **humans to read**
* ❌ But difficult for the agent to do anything structured with

---

## 🧩 **Current version (with parsing):**

### 🧠 LLM returns structured JSON *inside* a markdown code block:

````markdown
```action
{
  "summary": "Overview of agent memory handling.",
  "topic": "Memory Management"
}
```
````

---

### ✅ Code behavior:
- Uses `extract_markdown_block()` to isolate the JSON
- Uses `json.loads()` to turn it into a Python dictionary
- Now we get **structured fields**:
  - `summary = parsed["summary"]`
  - `topic = parsed["topic"]`
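
To see both steps on a single reply, here is a self-contained sketch. The two functions follow the notebook's `extract_markdown_block()` and `parse_summary_response()`; the `FENCE` constant and the sample `reply` are assumptions added so the example runs without calling the API.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def extract_markdown_block(text, tag="action"):
    """Return the body of the first fenced block labeled with tag."""
    pattern = rf"{FENCE}{re.escape(tag)}\s*(.*?){FENCE}"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1).strip()
    raise ValueError(f"Missing markdown block with tag '{tag}'")

def parse_summary_response(response_text):
    """Parse the JSON block, falling back to the raw text on failure."""
    try:
        return json.loads(extract_markdown_block(response_text))
    except Exception as e:
        return {
            "summary": response_text.strip(),
            "topic": "Unknown (parse failed)",
            "error": str(e),
        }

# Simulated LLM reply (made up for illustration)
payload = {"summary": "Overview of agent memory handling.", "topic": "Memory Management"}
reply = f"{FENCE}action\n{json.dumps(payload, indent=2)}\n{FENCE}"

parsed = parse_summary_response(reply)
fallback = parse_summary_response("Sorry, here is a plain-text answer.")
```

Here `parsed["topic"]` holds the topic string, while `fallback` keeps the raw text under `"summary"` and records why parsing failed under `"error"`.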

---

## 🔍 **Key Differences**

| Feature | No Parsing (Old) | With Parsing (New) |
|--------|------------------|---------------------|
| LLM Output Format | Free-form text | Markdown-wrapped JSON |
| Parsing | None | Extracted with regex, parsed with `json.loads()` |
| Data Structure | Plain string | Dictionary with `"summary"` and `"topic"` |
| Machine-readability | ❌ Not actionable | ✅ Agent-ready |
| Flexibility | Good for reading | Good for automation |
| Error handling | Minimal | Can detect and recover from malformed output |

---

## ✅ Why Parsing Matters for Agents

When you want your agent to:
- **Group files by topic**
- **Move or rename them**
- **Search summaries**
- **Build a dashboard**
- **Chain summaries into downstream tasks**

… then **structured data is everything**.

Parsing gives you **clear keys** (`"summary"`, `"topic"`, etc.) that your system can work with.
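
As a small sketch of what those keys make possible, searching and filtering become one-liners. Both helper names and the sample records below are made up for illustration.

```python
# Illustrative records, shaped like the entries in `summaries`
records = [
    {"filename": "parsing_notes.txt",
     "summary": "A method for parsing LLM responses into JSON actions.",
     "topic": "Response Parsing"},
    {"filename": "memory_notes.txt",
     "summary": "Overview of agent memory handling.",
     "topic": "Memory Management"},
]

def search_summaries(records, keyword):
    """Return filenames whose summary mentions keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r["filename"] for r in records
            if needle in (r.get("summary") or "").lower()]

def files_for_topic(records, topic):
    """Return filenames whose topic matches exactly."""
    return [r["filename"] for r in records if r.get("topic") == topic]

print(search_summaries(records, "memory"))           # ['memory_notes.txt']
print(files_for_topic(records, "Response Parsing"))  # ['parsing_notes.txt']
```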

---

## 🧠 Final takeaway

> **Unparsed text is great for humans. Parsed JSON is great for agents.**

You’ve just leveled up from “print the answer” to “use the answer.” That’s agent territory. 👏

Natural next steps from here:
- Export these summaries to a CSV
- Organize files by topic (the next cell does this)
- Chain this into a vector index for RAG



In [8]:
import shutil

# 📂 Base folder where we’ll organize by topic
TARGET_ROOT = "/content/organized_files"

# Create root folder if not exists
os.makedirs(TARGET_ROOT, exist_ok=True)

for summary in summaries:
    filename = summary["filename"]
    topic = summary["topic"] or "Uncategorized"

    # Build a folder name from the topic. Only spaces are replaced here, so a
    # topic containing "/" (e.g. "AI/Automation") becomes nested folders.
    topic_folder = os.path.join(TARGET_ROOT, topic.replace(" ", "_"))
    os.makedirs(topic_folder, exist_ok=True)

    # Build source and destination paths
    source_path = os.path.join(SOURCE_DIR, filename)
    target_path = os.path.join(topic_folder, filename)

    try:
        shutil.move(source_path, target_path)
        print(f"✅ Moved '{filename}' → {topic_folder}")
    except Exception as e:
        print(f"🚫 Failed to move '{filename}': {e}")


✅ Moved '001_PArse_the Response.txt' → /content/organized_files/Response_Parsing
✅ Moved '006_Agent Loop with Function Calling.txt' → /content/organized_files/AI_Development
✅ Moved '000_Prompting for Agents -GAIL.txt' → /content/organized_files/AI/Automation
✅ Moved '002_Execute_the_Action.txt' → /content/organized_files/Agent_Action_Execution
✅ Moved '005_Using Function Calling Capabilities with LLMs.txt' → /content/organized_files/AI_Function_Calling
✅ Moved '003_gent Feedback and Memory.txt' → /content/organized_files/Agent_Memory_and_Feedback
✅ Moved '004_AGENT_Tools.txt' → /content/organized_files/AI_Tool_Development


In [9]:
def print_directory_tree(start_path, indent=""):
    for item in sorted(os.listdir(start_path)):
        item_path = os.path.join(start_path, item)
        if os.path.isdir(item_path):
            print(f"{indent}📁 {item}/")
            print_directory_tree(item_path, indent + "    ")
        else:
            print(f"{indent}📄 {item}")

print("\n📦 Organized Folder Structure:\n")
print_directory_tree("/content/organized_files")


📦 Organized Folder Structure:

📁 AI/
    📁 Automation/
        📄 000_Prompting for Agents -GAIL.txt
📁 AI_Development/
    📄 006_Agent Loop with Function Calling.txt
📁 AI_Function_Calling/
    📄 005_Using Function Calling Capabilities with LLMs.txt
📁 AI_Tool_Development/
    📄 004_AGENT_Tools.txt
📁 Agent_Action_Execution/
    📄 002_Execute_the_Action.txt
📁 Agent_Memory_and_Feedback/
    📄 003_gent Feedback and Memory.txt
📁 Response_Parsing/
    📄 001_PArse_the Response.txt
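
Notice in the tree above that the topic `AI/Automation` produced a nested `AI/Automation/` folder, because the organizer only replaces spaces and the `/` is treated as a path separator. One possible fix, sketched here with a hypothetical `safe_folder_name` helper, is to sanitize the LLM's topic label before using it in a path. This version keeps only ASCII letters, digits, dashes, and underscores, which is an assumption you may want to relax.

```python
import re

def safe_folder_name(topic, fallback="Uncategorized"):
    """Turn a free-form topic label into a single, flat folder name."""
    # Replace every run of other characters (spaces, slashes, dots, ...) with "_"
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", topic or "").strip("_")
    return cleaned or fallback

print(safe_folder_name("AI/Automation"))           # AI_Automation
print(safe_folder_name("Unknown (parse failed)"))  # Unknown_parse_failed
```

Since the topic comes from an LLM, treating it as untrusted input also guards against labels like `../..` escaping the target folder.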




## 🧠 **Concepts & Principles Learned**

### 1. **LLMs are Stateless by Default**

* LLMs don’t retain memory across calls.
* Developers must manage the **conversation history** manually to provide context.
* You learned to alternate `{"role": "user"}` and `{"role": "assistant"}` entries to simulate memory.
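
A minimal sketch of that pattern, with no API call involved. The `add_turn` helper is a hypothetical name, and the messages are made up.

```python
def add_turn(history, role, content):
    """Append one message to the running conversation history."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"Unexpected role: {role!r}")
    history.append({"role": role, "content": content})
    return history

history = [{"role": "system", "content": "You are a helpful assistant."}]
add_turn(history, "user", "My name is Ada.")
add_turn(history, "assistant", "Nice to meet you, Ada.")
add_turn(history, "user", "What is my name?")
# `history` is what would be sent as `messages` on the next call,
# which is the only reason the model can "remember" the name
```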

---

### 2. **Stateful Memory in Agent Loops**

* Built a manual memory buffer to retain past interactions.
* Saw how memory growth hits **token limits**, requiring summarization or truncation strategies.
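
The simplest truncation strategy can be sketched as follows. The `truncate_history` helper is hypothetical, and it counts messages rather than tokens, which is only a rough stand-in for a real token budget.

```python
def truncate_history(history, max_messages):
    """Keep system messages plus the most recent max_messages other messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    # Guard against rest[-0:], which would return the whole list
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent
```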

---

### 3. **Parsing: The Core of Agent Execution**

* LLMs must return structured responses so agents can act.
* Markdown-wrapped JSON (a code block tagged `action` that wraps `{ ... }`) is a common and powerful pattern.
* You implemented:

  * `extract_markdown_block()`
  * `parse_action()` with fallback error handling

---

### 4. **Rule of Thumb for Agents**

> ⚡️ “LLMs tell you *what* to do. Python handles *how* to do it.”

* This principle guided your design choices.
* Python handled reading, moving, listing files.
* LLM handled intent extraction, summarization, and classification.

---

## 🛠️ **Hands-On Agent Components Built**

### ✅ Calculator Agent

* Used structured LLM output to perform arithmetic
* Parsed tool name + arguments
* Demonstrated the need for markdown + JSON parsing

---

### ✅ File Summarization Agent

* Read files from `/content/docs_folder`
* Sent contents to an LLM with a summarization prompt
* Parsed responses into:

  * `"summary"`
  * `"topic"`

---

### ✅ File Organizer Agent

* Took the parsed topic from each summary
* Automatically created subfolders by topic
* Moved files into their topic folders using `shutil.move()`
* Printed out the final folder tree for verification

---

## 🧠 What You Now Understand Deeply

| Skill                       | Description                                                |
| --------------------------- | ---------------------------------------------------------- |
| ✅ Prompt Engineering        | You structured prompts for structured responses            |
| ✅ LLM-Oriented Parsing      | Extracting actionable JSON from LLM responses              |
| ✅ Memory Management         | Simulating stateful context across chat calls              |
| ✅ Language vs Code Boundary | Using LLMs for fuzzy logic, Python for precise execution   |
| ✅ File I/O + Agent Actions  | Integrated real-world Python file ops into an agent loop   |
| ✅ Agent Loops               | Built pipelines where the agent perceives → reasons → acts |

