# AI-Orchestrated PowerPoint Report Generator (v2 - Simplified)

This notebook uses **your LLM API's agentic capabilities** to automatically generate comprehensive PowerPoint reports from warpage data.

## What's New in v2

- **90% shorter prompts** - Simplified from ~500 lines to ~50 lines total
- **Conversation context reuse** - AI remembers previous phase findings
- **Files as fallback** - Data files only for verification, not re-processing
- **Clearer structure** - Each phase builds on previous work

## How It Works

1. **Phase 1:** Analyze data → finds outliers, calculates statistics
2. **Phase 2:** Generate charts → uses Phase 1 findings (not raw files)
3. **Phase 3:** Build PowerPoint → uses Phase 1 & 2 findings (not raw files)

**Key Advantage:** Each phase reuses conversation memory, avoiding redundant file processing.
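
The three-phase handoff comes down to one `chat_new` call followed by `chat_continue` calls on the same session. Below is a minimal sketch of that loop. It works with any object that exposes the notebook's `chat_new`/`chat_continue` methods. `run_phases` is a hypothetical helper, not part of the API.

```python
def run_phases(client, model, prompts, files=None):
    """Send prompts in order inside one server-side session.

    The first prompt opens the session with chat_new(). Each later prompt goes
    through chat_continue() with that same session_id, which is what lets a later
    phase refer back to earlier findings. Returns (replies, session_id).
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    reply, session_id = client.chat_new(model, prompts[0], agent_type="auto", files=files)
    replies = [reply]
    for prompt in prompts[1:]:
        # Like the notebook, keep the id from chat_new and ignore the echoed one
        reply, _ = client.chat_continue(model, session_id, prompt,
                                        agent_type="auto", files=files)
        replies.append(reply)
    return replies, session_id
```

With the client defined in this notebook, the call would look like `run_phases(client, MODEL, [analysis_prompt, viz_prompt, pptx_prompt], files=file_paths_str)`. The notebook keeps each phase in its own cell instead, so each result can be inspected before the next phase runs.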

## 1. Setup & Configuration

In [None]:
import httpx
import json
from pathlib import Path
from datetime import datetime
import time
from IPython.display import display, Latex

# LLM API Client
class LLMApiClient:
    def __init__(self, base_url: str, timeout: float = 3600.0):
        self.base_url = base_url.rstrip("/")
        self.token = None
        self.timeout = httpx.Timeout(50.0, read=timeout, write=timeout, pool=timeout)

    def _headers(self):
        return {"Authorization": f"Bearer {self.token}"} if self.token else {}

    def login(self, username: str, password: str):
        r = httpx.post(f"{self.base_url}/api/auth/login", 
                      json={"username": username, "password": password}, timeout=10.0)
        r.raise_for_status()
        self.token = r.json()["access_token"]
        return r.json()

    def list_models(self):
        r = httpx.get(f"{self.base_url}/v1/models", headers=self._headers(), timeout=10.0)
        r.raise_for_status()
        return r.json()

    def _post_chat(self, data: dict, files: list = None):
        """POST one chat turn (multipart form plus optional file uploads).

        Returns (reply_text, session_id).
        """
        files_to_upload = []
        try:
            # Open inside the try so handles opened before a failure are still closed
            for file_path in files or []:
                files_to_upload.append(("files", (Path(file_path).name, open(file_path, "rb"))))
            r = httpx.post(f"{self.base_url}/v1/chat/completions", data=data,
                           files=files_to_upload or None,
                           headers=self._headers(), timeout=self.timeout)
            r.raise_for_status()
            result = r.json()
            return result["choices"][0]["message"]["content"], result["x_session_id"]
        finally:
            for _, (_, f) in files_to_upload:
                f.close()

    def chat_new(self, model: str, user_message: str, agent_type: str = "auto", files: list = None):
        """Start a new server-side session; returns (reply_text, session_id)."""
        messages = [{"role": "user", "content": user_message}]
        data = {"model": model, "messages": json.dumps(messages), "agent_type": agent_type}
        return self._post_chat(data, files)

    def chat_continue(self, model: str, session_id: str, user_message: str,
                      agent_type: str = "auto", files: list = None):
        """Add a turn to an existing session so earlier turns stay in context."""
        messages = [{"role": "user", "content": user_message}]
        data = {"model": model, "messages": json.dumps(messages),
                "session_id": session_id, "agent_type": agent_type}
        return self._post_chat(data, files)

    def get_session_artifacts(self, session_id: str):
        """Get list of files generated during the session"""
        r = httpx.get(f"{self.base_url}/api/chat/sessions/{session_id}/artifacts",
                     headers=self._headers(), timeout=10.0)
        r.raise_for_status()
        return r.json()

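# Configuration
# Credentials come from environment variables (names chosen for this notebook),
# so no password is stored in the notebook; an interactive prompt is the fallback.
import os
from getpass import getpass

API_BASE_URL = os.environ.get("LLM_API_BASE_URL", "http://localhost:1007")
USERNAME = os.environ.get("LLM_API_USERNAME", "leesihun")
PASSWORD = os.environ.get("LLM_API_PASSWORD") or getpass(f"Password for {USERNAME}: ")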

# Initialize and login
client = LLMApiClient(API_BASE_URL, timeout=3600.0)
client.login(USERNAME, PASSWORD)
models = client.list_models()
MODEL = models["data"][0]["id"]

print(f"✓ Logged in as: {USERNAME}")
print(f"✓ Using model: {MODEL}")

## 2. Configure Data Files
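
The Phase 1 prompt tells the model that measurement filenames embed a 14-digit acquisition timestamp (e.g. `20251027121034` = 2025.10.27 12:10:34). Extracting that timestamp locally lets you check how the model groups measurements by date. Below is a minimal sketch. It assumes the stamp appears as a standalone run of exactly 14 digits. `acquisition_time` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Optional

# 14 consecutive digits that are not part of a longer digit run
_STAMP = re.compile(r"(?<!\d)(\d{14})(?!\d)")

def acquisition_time(filename: str) -> Optional[datetime]:
    """Return the first valid YYYYMMDDHHMMSS stamp in filename, or None."""
    for match in _STAMP.finditer(filename):
        try:
            return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
        except ValueError:
            continue  # 14 digits that do not form a valid date/time
    return None
```

Calling `.date()` on the result gives a key for grouping measurements by production day.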

In [None]:
# Define your data files
stats_paths = [
    Path("B8_1021_stats.json"),
    Path("B8_1027_stats.json"),
]

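# Verify files exist; stop early instead of failing later during the upload
print(f"Configured {len(stats_paths)} data file(s):\n")
missing = []
for i, path in enumerate(stats_paths, 1):
    if path.exists():
        size_kb = path.stat().st_size / 1024
        print(f"  [{i}] {path.name} ({size_kb:.1f} KB) - ✓")
    else:
        print(f"  [{i}] {path.name} - ✗ NOT FOUND")
        missing.append(path.name)

if missing:
    raise FileNotFoundError(f"Missing data file(s): {', '.join(missing)}")

file_paths_str = [str(p) for p in stats_paths]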

## 3. Phase 1: Data Analysis

The AI will analyze your data and identify key patterns.
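
The sketch below gives you a local reference list to compare against the AI's PCA-based outliers. It flags points whose squared Mahalanobis distance in the (pc1, pc2) plane exceeds the chi-square cutoff with 2 degrees of freedom. This technique was chosen for the sketch; the prompt does not say which method the model should use. The code assumes you have already pulled numeric `pc1`/`pc2` pairs out of the stats JSON, whose exact layout is an assumption here. `flag_pca_outliers` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

def flag_pca_outliers(points: Sequence[Tuple[float, float]], alpha: float = 0.025) -> List[int]:
    """Return indices of (pc1, pc2) points whose squared Mahalanobis distance
    from the sample mean exceeds the chi-square(2) cutoff for tail probability alpha."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("points are collinear; covariance is singular")
    # For 2 degrees of freedom, P(chi2 > t) = exp(-t / 2), so t = -2 ln(alpha)
    cutoff = -2.0 * math.log(alpha)
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mx, y - my
        # d^T S^-1 d using the closed-form inverse of a 2x2 matrix
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > cutoff:
            flagged.append(i)
    return flagged
```

The data description says the PCA values are calculated within each `source_pdf`, so run the check separately for each source rather than pooling all points. Also note that the plain sample covariance is inflated by the outliers themselves; a robust covariance estimate would be stricter.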

In [None]:
# Simplified prompt - 90% shorter than v1
analysis_prompt = f"""
Analyze the {len(stats_paths)} attached warpage measurement JSON files.

**Data Structure:**
- Each file has warpage statistics per PCB board
- Fields: min, max, range (warpage value), mean, median, std, skewness, kurtosis
- PCA values (pc1, pc2) calculated within each source_pdf
- Filenames contain acquisition date/time (e.g., 20251027121034 = 2025.10.27 12:10:34)

**Tasks:**
1. Calculate overall statistics (mean, std, min, max across all files)
2. Identify PCA-based outliers using pc1, pc2 values
3. Compare production dates - which is better quality and why?
4. List specific outlier filenames with reasons
5. Save the results locally as a NumPy array (.npy file)

**Required Output:**
- Total measurements count
- Outlier list with full filenames
- Production date comparison (winner + reason)
- Key concerns or patterns

THINK HARD
"""
print("=" * 80)
print("PHASE 1: DATA ANALYSIS")
print("=" * 80)

start = time.time()
analysis_result, session_id = client.chat_new(
    MODEL, analysis_prompt, agent_type="auto", files=file_paths_str
)

print(f"\n✓ Analysis completed in {time.time() - start:.1f}s\n")
print("=" * 80)
display(Latex(analysis_result))
print("=" * 80)

## 4. Phase 2: Generate Visualizations

**Key:** AI reuses Phase 1 findings from conversation memory (not raw files).
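
The Phase 2 prompt leaves the BAD / GOOD / Normal split to the model's judgment. To spot-check it, you can apply an explicit rule. In the hypothetical rule below, a PCA outlier counts as BAD if any of its mean, std, or range exceeds the population average plus `k` standard deviations of that metric, and as GOOD otherwise. The threshold rule, `k = 2`, and `classify_outliers` are all assumptions made for this sketch.

```python
from statistics import mean, stdev

METRICS = ("mean", "std", "range")

def classify_outliers(records, outlier_ids, k=2.0):
    """Label each record 'normal', 'good' or 'bad'.

    records: {record_id: {"mean": float, "std": float, "range": float}}
    outlier_ids: ids already flagged as PCA outliers
    """
    if len(records) < 2:
        raise ValueError("need at least two records to estimate spread")
    # Per-metric upper limit: population mean + k * sample standard deviation
    limits = {}
    for m in METRICS:
        values = [r[m] for r in records.values()]
        limits[m] = mean(values) + k * stdev(values)
    labels = {}
    for rid, r in records.items():
        if rid not in outlier_ids:
            labels[rid] = "normal"
        elif any(r[m] > limits[m] for m in METRICS):
            labels[rid] = "bad"   # unusual PCA position and a high metric
        else:
            labels[rid] = "good"  # unusual PCA position, acceptable metrics
    return labels
```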

In [None]:
# Simplified prompt using phase handoff pattern
viz_prompt = f"""
**PRIORITY: Use your Phase 1 analysis from conversation memory.**

In Phase 1, you already:
- Analyzed {len(stats_paths)} datasets and loaded all data
- Identified PCA outliers with pc1, pc2 values
- Compared production dates
- Listed specific outlier filenames

**DO NOT re-analyze raw files. Use your Phase 1 findings.**
Files attached are ONLY for verification if needed.

**Task:** Create visualizations and classify outliers

**Outlier Classification:**
- **BAD outliers:** High mean/std/range (critical quality issues)
- **GOOD outliers:** Unusual PCA position but acceptable metrics
- **Normal:** Within PCA cluster, standard metrics

**Required Charts** (save to temp_charts/):
1. `pca_outliers_classified.png` - PC1 vs PC2 scatter (Blue=normal, Orange=good outlier, RED=bad outlier)
2. `bad_outliers_detail.png` - Bar chart comparing bad outliers vs average
3. `production_comparison.png` - Production date quality comparison
4. Additional charts as appropriate (distributions, trends, control charts, etc.)

**Style:** 300 DPI, seaborn whitegrid, professional colors

**Required Output:**
- List of generated chart files
- Bad outlier summary (file IDs + reasons)
- Production date insights

THINK HARD
"""

print("=" * 80)
print("PHASE 2: VISUALIZATION GENERATION")
print("=" * 80)

start = time.time()
viz_result, _ = client.chat_continue(
    MODEL, session_id, viz_prompt, agent_type="auto", files=file_paths_str
)

print(f"\n✓ Visualizations completed in {time.time() - start:.1f}s\n")
print("=" * 80)
display(Latex(viz_result))
print("=" * 80)

## 5. Phase 3: PowerPoint Assembly

**Key:** AI uses Phase 1 & 2 findings from conversation memory.
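
The Phase 3 prompt asks the model to place three named charts first and then auto-discover every remaining PNG in `temp_charts/`. If that directory is reachable from this notebook, the expected chart order can be reproduced locally and compared against the deck. Whether the notebook can reach it depends on where the server writes its files, which is an assumption here. `plan_chart_slides` is a hypothetical helper. The prompt does not specify an order for the remaining charts, so sorting them alphabetically is a choice made for this sketch.

```python
from pathlib import Path

FIXED_CHARTS = (
    "pca_outliers_classified.png",
    "bad_outliers_detail.png",
    "production_comparison.png",
)

def plan_chart_slides(chart_dir, fixed=FIXED_CHARTS):
    """Return chart paths in slide order: the fixed charts first (when present),
    then every other PNG in the directory, sorted by name."""
    available = {p.name: p for p in Path(chart_dir).glob("*.png")}
    ordered = [available[name] for name in fixed if name in available]
    extras = sorted(p for name, p in available.items() if name not in fixed)
    return ordered + extras
```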

In [None]:
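# Count individual measurements (the entries under each stats file's 'files' key);
# the total is used in the presentation subtitle
total_files = 0
for path in stats_paths:
    with open(path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    total_files += len(data.get('files', []))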

# Simplified prompt using phase handoff
pptx_prompt = f"""
**PRIORITY: Use Phase 1 & 2 findings from conversation memory.**

You have:
- Phase 1: Statistics, outlier IDs, production date comparison
- Phase 2: Charts in temp_charts/, bad outlier classifications

**DO NOT re-analyze raw files. Use conversation context.**
Files attached are ONLY for verification if needed.

**Task:** Create PowerPoint using python-pptx

**Presentation:**
- Title: "Warpage Analysis Report"
- Subtitle: "Analysis of {total_files} Measurements ({len(stats_paths)} Production Dates)"
- Filename: `Warpage_Report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.pptx`
- Size: 10 x 7.5 inches

**Slide Structure:**

1. **Title Slide** - Report title, subtitle, date

2. **Executive Summary (CRITICAL)** - RED ALERT BOX with:
   - Bad outliers list from Phase 2 (file IDs + reasons)
   - 4 metric cards: Total measurements, Better production date, Bad outlier count, Avg mean
   - Production date analysis from Phase 1
   - Key findings

3. **PCA Outlier Classification** - pca_outliers_classified.png + color legend

4. **Bad Outlier Details** - bad_outliers_detail.png + explanations from Phase 2

5. **Production Comparison** - production_comparison.png + insights from Phase 1

6-N. **Additional Charts** - Auto-discover ALL remaining PNG files in temp_charts/

N+1. **Recommendations** - Immediate actions, quality improvements, monitoring plan

**Style:** Use blank layout (index 6), professional colors (Blue #1f77b4, Red #d62728, Orange #ff7f0e, Green #2ca02c)

**CRITICAL:** Extract bad outliers and metrics from Phase 1 & 2 conversation context (not from files)


THINK HARD
"""

print("=" * 80)
print("PHASE 3: POWERPOINT ASSEMBLY")
print("=" * 80)

start = time.time()
pptx_result, _ = client.chat_continue(
    MODEL, session_id, pptx_prompt, agent_type="auto", files=file_paths_str
)

print(f"\n✓ PowerPoint completed in {time.time() - start:.1f}s\n")
print("=" * 80)
display(Latex(pptx_result))
print("=" * 80)

## 6. Summary & Results

View generated artifacts using the new session artifacts API.
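
A `.pptx` file is a ZIP package containing `ppt/presentation.xml`, and PowerPoint and python-pptx conventionally store slides as `ppt/slides/slideN.xml`. That makes a quick sanity check on the generated deck possible with only the standard library, for example in the manual-discovery fallback. This is a minimal sketch: it counts slide parts by that naming convention rather than by following the package relationships. `count_slides` is a hypothetical helper.

```python
import re
import zipfile

_SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")

def count_slides(pptx_path) -> int:
    """Count slide parts in a .pptx package without python-pptx.

    Raises zipfile.BadZipFile if the file is not a ZIP archive, and
    ValueError if it lacks the main presentation part.
    """
    with zipfile.ZipFile(pptx_path) as zf:
        names = zf.namelist()
    if "ppt/presentation.xml" not in names:
        raise ValueError(f"{pptx_path} does not look like a PowerPoint package")
    return sum(1 for name in names if _SLIDE_PART.match(name))
```

For example, `count_slides(latest)` in the fallback branch should roughly match the slide structure requested in the Phase 3 prompt.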

In [None]:
import glob

print("=" * 80)
print("REPORT GENERATION COMPLETE!")
print("=" * 80)

# Option 1: Use new artifacts API (recommended)
try:
    artifacts_data = client.get_session_artifacts(session_id)
    artifacts = artifacts_data.get("artifacts", [])
    
    print(f"\n✓ Found {len(artifacts)} generated files via API:\n")
    
    # .get() keeps one malformed entry from sending the whole cell to the fallback
    pptx_files = [a for a in artifacts if a.get('extension') == '.pptx']
    chart_files = [a for a in artifacts if a.get('extension') == '.png']
    
    if pptx_files:
        print("PowerPoint Presentations:")
        for pptx in pptx_files:
            print(f"  - {pptx.get('filename', '?')} ({pptx.get('size_kb', 0):.2f} KB)")
    
    if chart_files:
        print(f"\nCharts ({len(chart_files)}):")
        for chart in chart_files:
            print(f"  - {chart.get('filename', '?')}")
            
except Exception as e:
    print(f"\n⚠ Artifacts API unavailable: {e}")
    print("Falling back to manual file discovery...\n")
    
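    # Option 2: Manual discovery (fallback). This searches the notebook's own
    # working directory, so it only finds files the server wrote there.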
    pptx_files = sorted(glob.glob("Warpage_Report_*.pptx"), reverse=True)
    if pptx_files:
        latest = pptx_files[0]
        size_kb = Path(latest).stat().st_size / 1024
        print(f"PowerPoint: {latest} ({size_kb:.2f} KB)")
    
    temp_charts = Path("temp_charts")
    if temp_charts.exists():
        charts = list(temp_charts.glob("*.png"))
        print(f"\nCharts ({len(charts)}):")
        for chart in sorted(charts):
            print(f"  - {chart.name}")

print("\n" + "=" * 80)
print("WHAT CHANGED IN V2")
print("=" * 80)
print("""
✓ Prompts 90% shorter (500 lines → 50 lines)
✓ Conversation context reuse (phases reference prior work)
✓ Files as fallback only (no redundant re-processing)
✓ New artifacts API (programmatic file discovery)
✓ Clearer phase handoff instructions

Same output quality, much simpler orchestration!
""")
print("=" * 80)

---

## About This Notebook (v2)

**Key Improvements:**
- **Conversation context reuse:** Each phase explicitly references previous work
- **Files as fallback:** Data files only for verification, not re-processing  
- **90% shorter prompts:** From ~500 lines to ~50 lines total
- **Artifacts API:** Programmatic discovery of generated files

**Architecture Pattern:**
```
Phase 1: Analyze(files) → findings stored in conversation
           ↓
Phase 2: Visualize(Phase 1 findings) → charts + classification
           ↓
Phase 3: Build PPTX(Phase 1 + Phase 2 findings) → presentation
```

Each phase prioritizes **conversation memory** over raw file re-processing.

---

**Version:** 2.0.0 (Simplified)  
**Last Updated:** January 2025