# Warpage Analysis - PowerPoint Report Generator

This notebook generates a comprehensive PowerPoint presentation from warpage statistics data.

**Features:**
- **Multi-file support**: Load and compare multiple datasets
- Loads statistics from local JSON files and sends them to an LLM API for an optional AI summary (follows the API_examples.ipynb pattern)
- Creates beautiful visualizations using matplotlib/seaborn
- Generates professional PPTX with python-pptx
- Includes statistical analysis, trends, distributions, PCA, outliers, and recommendations
- Automatic dataset comparison when multiple files are provided

**Required Packages:**
```bash
pip install python-pptx matplotlib seaborn pandas numpy scipy scikit-learn httpx
```
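
Some pip distribution names differ from the names used in `import` statements (`python-pptx` is imported as `pptx`, `scikit-learn` as `sklearn`). A minimal sketch for checking which packages are missing before running the rest of the notebook; the helper name `missing_packages` and the mapping are illustrative and not part of the original notebook:

```python
from importlib.util import find_spec

# pip distribution name -> top-level import name (illustrative mapping)
REQUIRED = {
    "python-pptx": "pptx",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "httpx": "httpx",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]

print("Missing packages:", missing_packages() or "none")
```

Anything reported as missing can be installed with the `pip install` command above.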

## 1. Install Required Packages

In [None]:
import sys
!{sys.executable} -m pip install python-pptx matplotlib seaborn pandas numpy scipy scikit-learn httpx pillow -q

## 2. Setup API Client (Following API_examples.ipynb Pattern)

In [None]:
import httpx
import json
from pathlib import Path
from typing import Iterator

class LLMApiClient:
    def __init__(self, base_url: str, timeout: float = 3600.0):
        self.base_url = base_url.rstrip("/")
        self.token = None
        self.timeout = httpx.Timeout(50.0, read=timeout, write=timeout, pool=timeout)

    def _headers(self):
        h = {}
        if self.token:
            h["Authorization"] = f"Bearer {self.token}"
        return h

    def login(self, username: str, password: str):
        r = httpx.post(f"{self.base_url}/api/auth/login", json={
            "username": username, "password": password
        }, timeout=10.0)
        r.raise_for_status()
        data = r.json()
        self.token = data["access_token"]
        return data

    def list_models(self):
        r = httpx.get(f"{self.base_url}/v1/models", headers=self._headers(), timeout=10.0)
        r.raise_for_status()
        return r.json()

    def chat_new(self, model: str, user_message: str, agent_type: str = "auto", files: list = None):
        messages = [{"role": "user", "content": user_message}]
        data = {
            "model": model,
            "messages": json.dumps(messages),
            "agent_type": agent_type
        }
        
        files_to_upload = []
        try:
            # Open files inside the try block so every handle is closed,
            # even if a later open() or the request itself fails
            for file_path in files or []:
                f = open(file_path, "rb")
                files_to_upload.append(("files", (Path(file_path).name, f)))

            r = httpx.post(
                f"{self.base_url}/v1/chat/completions",
                data=data,
                files=files_to_upload if files_to_upload else None,
                headers=self._headers(),
                timeout=self.timeout
            )
            r.raise_for_status()
            result = r.json()
            return result["choices"][0]["message"]["content"], result["x_session_id"]
        finally:
            for _, (_, f) in files_to_upload:
                f.close()

# Configuration
API_BASE_URL = 'http://localhost:1007'
USERNAME = "leesihun"
PASSWORD = "s.hun.lee"

# Initialize client
client = LLMApiClient(API_BASE_URL, timeout=3600.0)
print(f"Client ready: {API_BASE_URL}")

## 3. Login and Get Model

In [None]:
# Login
login_result = client.login(USERNAME, PASSWORD)
print(f"Logged in as: {USERNAME}")

# Get available models
models = client.list_models()
MODEL = models["data"][0]["id"]
print(f"Using model: {MODEL}")

## 4. Load Warpage Statistics Data via API

**Configure your data files below** - supports single or multiple JSON files for comparative analysis.
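
The loading cell below reads each JSON file's `files` list, and later cells use per-file fields such as `mean`, `std`, and `pca`. A minimal sketch of a structure check, assuming the schema implied by those column names; the schema is inferred from this notebook's code rather than from any documentation, and `validate_stats` is a hypothetical helper:

```python
# Fields that later cells read from each entry in "files" (inferred from this notebook)
REQUIRED_FIELDS = {"file_id", "min", "max", "range", "mean", "median",
                   "std", "skewness", "kurtosis", "pca"}

def validate_stats(data: dict) -> list:
    """Return a list of problems; an empty list means the structure looks usable."""
    files = data.get("files")
    if not isinstance(files, list) or not files:
        return ["top-level 'files' must be a non-empty list"]
    problems = []
    for i, entry in enumerate(files):
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"files[{i}]: missing {sorted(missing)}")
        pca = entry.get("pca")
        if isinstance(pca, dict) and not {"pc1", "pc2"} <= set(pca):
            problems.append(f"files[{i}]: 'pca' needs 'pc1' and 'pc2'")
    return problems

# Example with made-up values
sample = {"source_pdf": "example.pdf", "files": [{
    "file_id": "F001", "min": -1.0, "max": 2.0, "range": 3.0, "mean": 0.4,
    "median": 0.3, "std": 0.5, "skewness": 0.1, "kurtosis": 3.2,
    "pca": {"pc1": 0.2, "pc2": -0.1}}]}
print(validate_stats(sample))  # → []
```

Running a check like this on each loaded file before building the DataFrame gives a clearer message than a `KeyError` raised several cells later.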

In [None]:
import pandas as pd
import numpy as np

# ========================================
# CONFIGURATION: Define your data files
# ========================================
# Option 1: Single file
# stats_paths = [Path(f"data/uploads/{USERNAME}/20251013_stats.json")]

# Option 2: Multiple files for comparison
stats_paths = [
    Path(f"data/uploads/{USERNAME}/20251013_stats.json"),
    # Path(f"data/uploads/{USERNAME}/20251014_stats.json"),
    # Path(f"data/uploads/{USERNAME}/20251015_stats.json"),
]

print(f"Loading {len(stats_paths)} data file(s)...\n")

# ========================================
# Load and combine all data files
# ========================================
all_dataframes = []
dataset_metadata = []

for idx, stats_path in enumerate(stats_paths, 1):
    print(f"[{idx}/{len(stats_paths)}] Loading: {stats_path.name}")
    
    # Load JSON
    with open(stats_path, 'r') as f:
        warpage_data = json.load(f)
    
    # Convert to DataFrame
    temp_df = pd.DataFrame(warpage_data['files'])
    
    # Extract PCA components
    temp_df['pc1'] = temp_df['pca'].apply(lambda x: x['pc1'])
    temp_df['pc2'] = temp_df['pca'].apply(lambda x: x['pc2'])
    
    # Add metadata columns
    temp_df['dataset_name'] = stats_path.stem  # Filename without extension
    temp_df['dataset_index'] = idx
    temp_df['source_file'] = str(stats_path)
    
    all_dataframes.append(temp_df)
    
    dataset_metadata.append({
        'name': stats_path.stem,
        'file_count': len(temp_df),
        'source_pdf': warpage_data.get('source_pdf', 'N/A')
    })
    
    print(f"    ✓ Loaded {len(temp_df)} measurement files from {stats_path.stem}")

# Combine all datasets
df = pd.concat(all_dataframes, ignore_index=True)

# Add global file index for temporal analysis (across all datasets)
df['file_index'] = range(1, len(df) + 1)

print(f"\n{'='*80}")
print(f"Total measurement files loaded: {len(df)}")
print(f"Number of datasets: {len(stats_paths)}")
print(f"Columns: {list(df.columns)}")
print(f"{'='*80}\n")

# Display dataset summary
print("Dataset Summary:")
for meta in dataset_metadata:
    print(f"  • {meta['name']}: {meta['file_count']} files (from {meta['source_pdf']})")

print("\nFirst few rows:")
df.head()

## 5. Statistical Analysis via API (Optional - for AI insights)

In [None]:
# Ask the AI to analyze the data (following example #13 pattern)
analysis_query = """
The attached files contain warpage measurements, one file per day.
Based on this warpage analysis data, provide a concise executive summary:

1. Overall data quality assessment
2. Key findings (2-3 bullet points)
3. Files with potential quality issues (if any)
4. Recommended actions (2-3 points)

Keep the response under 200 words.
"""

# Pass all data files to the API
ai_analysis, session_id = client.chat_new(
    MODEL, 
    analysis_query, 
    agent_type="auto",
    files=[str(path) for path in stats_paths]  # Send all files
)

print("=== AI Executive Summary ===")
from IPython.display import display, Math, Latex
display(Latex(ai_analysis))
print("\n" + "="*80)

## 6. Setup Visualization Style

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
import warnings
warnings.filterwarnings('ignore')

# Professional color palette
COLORS = {
    'primary': '#1f77b4',      # Blue
    'secondary': '#ff7f0e',    # Orange
    'accent': '#2ca02c',       # Green
    'warning': '#d62728',      # Red
    'neutral': '#7f7f7f'       # Gray
}

# Set style
sns.set_style("whitegrid")
sns.set_palette("deep")

# High DPI for crisp images
rcParams['figure.dpi'] = 300
rcParams['savefig.dpi'] = 300
rcParams['font.size'] = 10
rcParams['axes.titlesize'] = 12
rcParams['axes.labelsize'] = 10
rcParams['xtick.labelsize'] = 9
rcParams['ytick.labelsize'] = 9

print("Visualization style configured")

## 7. Generate All Visualizations

In [None]:
# Create output directory for charts
output_dir = Path("temp_charts")
output_dir.mkdir(exist_ok=True)

print(f"Saving charts to: {output_dir}")

### 7.1 Temporal Trends

In [None]:
# Chart 1: Temporal trends (Mean, Median, Std over time)
fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(df['file_index'], df['mean'], label='Mean', linewidth=2, color=COLORS['primary'])
ax.plot(df['file_index'], df['median'], label='Median', linewidth=2, color=COLORS['secondary'], linestyle='--')
ax.fill_between(df['file_index'], 
                df['mean'] - df['std'], 
                df['mean'] + df['std'], 
                alpha=0.2, color=COLORS['primary'], label='±1 Std Dev')

ax.set_xlabel('File Index (Temporal Sequence)')
ax.set_ylabel('Warpage Value')
ax.set_title('Temporal Trends: Mean, Median, and Variability', fontweight='bold', fontsize=14)
ax.legend(loc='best')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(output_dir / 'temporal_trends.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: temporal_trends.png")

### 7.1b Dataset Comparison (if multiple files loaded)

In [None]:
# Chart 1b: Dataset comparison (only if multiple datasets loaded)
if len(stats_paths) > 1:
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Group by dataset for comparison
    dataset_summary = df.groupby('dataset_name').agg({
        'mean': 'mean',
        'std': 'mean',
        'range': 'mean',
        'kurtosis': 'mean'
    }).reset_index()
    
    # Plot 1: Mean comparison
    axes[0, 0].bar(dataset_summary['dataset_name'], dataset_summary['mean'], 
                   color=COLORS['primary'], edgecolor='black', alpha=0.7)
    axes[0, 0].set_title('Average Mean Warpage by Dataset', fontweight='bold')
    axes[0, 0].set_ylabel('Mean Warpage')
    axes[0, 0].tick_params(axis='x', rotation=45)
    axes[0, 0].grid(True, alpha=0.3, axis='y')
    
    # Plot 2: Std comparison
    axes[0, 1].bar(dataset_summary['dataset_name'], dataset_summary['std'], 
                   color=COLORS['secondary'], edgecolor='black', alpha=0.7)
    axes[0, 1].set_title('Average Std Dev by Dataset', fontweight='bold')
    axes[0, 1].set_ylabel('Standard Deviation')
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(True, alpha=0.3, axis='y')
    
    # Plot 3: Range comparison
    axes[1, 0].bar(dataset_summary['dataset_name'], dataset_summary['range'], 
                   color=COLORS['accent'], edgecolor='black', alpha=0.7)
    axes[1, 0].set_title('Average Range by Dataset', fontweight='bold')
    axes[1, 0].set_ylabel('Range')
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].grid(True, alpha=0.3, axis='y')
    
    # Plot 4: Box plot comparison of means across datasets
    dataset_groups = [df[df['dataset_name'] == name]['mean'].values for name in dataset_summary['dataset_name']]
    bp = axes[1, 1].boxplot(dataset_groups, labels=dataset_summary['dataset_name'], patch_artist=True)
    
    # Color each box differently
    palette = sns.color_palette("deep", len(dataset_groups))
    for patch, color in zip(bp['boxes'], palette):
        patch.set_facecolor(color)
        patch.set_alpha(0.7)
    
    axes[1, 1].set_title('Mean Distribution by Dataset', fontweight='bold')
    axes[1, 1].set_ylabel('Mean Warpage')
    axes[1, 1].tick_params(axis='x', rotation=45)
    axes[1, 1].grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.savefig(output_dir / 'dataset_comparison.png', bbox_inches='tight', dpi=300)
    plt.show()
    
    print("✓ Saved: dataset_comparison.png")
else:
    print("⊘ Skipped: dataset_comparison.png (single dataset mode)")

### 7.2 Distribution Analysis

In [None]:
# Chart 2: Distribution of key metrics (2x2 grid)
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Mean distribution
axes[0, 0].hist(df['mean'], bins=15, color=COLORS['primary'], edgecolor='black', alpha=0.7)
axes[0, 0].axvline(df['mean'].mean(), color=COLORS['warning'], linestyle='--', linewidth=2, label=f"Avg: {df['mean'].mean():.2f}")
axes[0, 0].set_title('Distribution of Mean Values', fontweight='bold')
axes[0, 0].set_xlabel('Mean Warpage')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()

# Std distribution
axes[0, 1].hist(df['std'], bins=15, color=COLORS['secondary'], edgecolor='black', alpha=0.7)
axes[0, 1].axvline(df['std'].mean(), color=COLORS['warning'], linestyle='--', linewidth=2, label=f"Avg: {df['std'].mean():.2f}")
axes[0, 1].set_title('Distribution of Standard Deviation', fontweight='bold')
axes[0, 1].set_xlabel('Std Dev')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].legend()

# Skewness distribution
axes[1, 0].hist(df['skewness'], bins=15, color=COLORS['accent'], edgecolor='black', alpha=0.7)
axes[1, 0].axvline(df['skewness'].mean(), color=COLORS['warning'], linestyle='--', linewidth=2, label=f"Avg: {df['skewness'].mean():.2f}")
axes[1, 0].set_title('Distribution of Skewness', fontweight='bold')
axes[1, 0].set_xlabel('Skewness')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].legend()

# Kurtosis distribution
axes[1, 1].hist(df['kurtosis'], bins=15, color=COLORS['neutral'], edgecolor='black', alpha=0.7)
axes[1, 1].axvline(df['kurtosis'].mean(), color=COLORS['warning'], linestyle='--', linewidth=2, label=f"Avg: {df['kurtosis'].mean():.2f}")
axes[1, 1].axvline(47, color='red', linestyle=':', linewidth=2, label="Outlier Threshold (47)")
axes[1, 1].set_title('Distribution of Kurtosis', fontweight='bold')
axes[1, 1].set_xlabel('Kurtosis')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].legend()

plt.tight_layout()
plt.savefig(output_dir / 'distributions.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: distributions.png")

### 7.3 Box Plot Analysis

In [None]:
# Chart 3: Box plots for variability
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Min/Max/Range
data_minmax = [df['min'], df['max'], df['range']]
bp1 = axes[0].boxplot(data_minmax, labels=['Min', 'Max', 'Range'], patch_artist=True)
for patch, color in zip(bp1['boxes'], [COLORS['primary'], COLORS['secondary'], COLORS['accent']]):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
axes[0].set_title('Min/Max/Range Analysis', fontweight='bold')
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

# Mean/Median
data_central = [df['mean'], df['median']]
bp2 = axes[1].boxplot(data_central, labels=['Mean', 'Median'], patch_artist=True)
for patch, color in zip(bp2['boxes'], [COLORS['primary'], COLORS['secondary']]):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
axes[1].set_title('Central Tendency', fontweight='bold')
axes[1].set_ylabel('Value')
axes[1].grid(True, alpha=0.3)

# Std/Skewness/Kurtosis (normalized)
data_dist = [df['std'], df['skewness']*10, df['kurtosis']/10]  # Scale for visibility
bp3 = axes[2].boxplot(data_dist, labels=['Std', 'Skewness×10', 'Kurtosis÷10'], patch_artist=True)
for patch, color in zip(bp3['boxes'], [COLORS['accent'], COLORS['warning'], COLORS['neutral']]):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
axes[2].set_title('Distribution Metrics (Scaled)', fontweight='bold')
axes[2].set_ylabel('Scaled Value')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(output_dir / 'boxplots.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: boxplots.png")

### 7.4 PCA Scatter Plot
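
The cell below labels points whose Euclidean distance from the PC1/PC2 centroid exceeds the 95th percentile of all distances. A dependency-free sketch of that rule; `pca_outliers` is a hypothetical helper, and the percentile uses linear interpolation, which is intended to mirror the pandas `quantile` default:

```python
from math import hypot
from statistics import mean, quantiles

def pca_outliers(pc1, pc2, percentile=95):
    """Return indices of points farther from the centroid than the given percentile distance."""
    cx, cy = mean(pc1), mean(pc2)
    distances = [hypot(x - cx, y - cy) for x, y in zip(pc1, pc2)]
    # quantiles(..., n=100) returns the 1st..99th percentile cut points
    threshold = quantiles(distances, n=100, method="inclusive")[percentile - 1]
    return [i for i, d in enumerate(distances) if d > threshold]

# Made-up data: 19 points at the origin and one far away
pc1 = [0] * 19 + [10]
pc2 = [0] * 20
print(pca_outliers(pc1, pc2))  # → [19]
```

Because the threshold is a percentile of the data itself, roughly 5% of the points are labeled even when nothing is unusual, so the labels are best read as "most distant points" rather than confirmed anomalies.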

In [None]:
# Chart 4: PCA scatter plot
fig, ax = plt.subplots(figsize=(10, 7))

scatter = ax.scatter(df['pc1'], df['pc2'], 
                     c=df['file_index'], 
                     cmap='viridis', 
                     s=100, 
                     edgecolors='black', 
                     linewidth=0.5,
                     alpha=0.8)

# Annotate outliers (based on distance from center)
pc1_mean, pc2_mean = df['pc1'].mean(), df['pc2'].mean()
distances = np.sqrt((df['pc1'] - pc1_mean)**2 + (df['pc2'] - pc2_mean)**2)
outlier_threshold = distances.quantile(0.95)
outliers = df[distances > outlier_threshold]

for _, row in outliers.iterrows():
    ax.annotate(row['file_id'], 
                (row['pc1'], row['pc2']), 
                textcoords="offset points", 
                xytext=(5, 5), 
                ha='left',
                fontsize=7,
                bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.5))

cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('File Index (Time)', rotation=270, labelpad=20)

ax.set_xlabel('PC1 (Principal Component 1)')
ax.set_ylabel('PC2 (Principal Component 2)')
ax.set_title('PCA Analysis: PC1 vs PC2 (Color: Temporal Sequence)', fontweight='bold', fontsize=14)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(output_dir / 'pca_scatter.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: pca_scatter.png")

### 7.5 Correlation Heatmap

In [None]:
# Chart 5: Correlation heatmap
numeric_cols = ['min', 'max', 'range', 'mean', 'median', 'std', 'skewness', 'kurtosis', 'pc1', 'pc2']
corr_matrix = df[numeric_cols].corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(corr_matrix, 
            annot=True, 
            fmt='.2f', 
            cmap='coolwarm', 
            center=0,
            square=True,
            linewidths=0.5,
            cbar_kws={"shrink": 0.8},
            ax=ax)

ax.set_title('Correlation Matrix: All Metrics', fontweight='bold', fontsize=14)

plt.tight_layout()
plt.savefig(output_dir / 'correlation_heatmap.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: correlation_heatmap.png")

### 7.6 Outlier Detection (Control Chart)
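
The control chart flags files whose mean warpage lies outside the center line ± 3 standard deviations. A small stdlib-only sketch of that calculation; `control_limits` is an illustrative name, and `statistics.stdev` is the sample standard deviation (n − 1 denominator), like the pandas `.std()` default:

```python
from statistics import mean, stdev

def control_limits(values, k=3.0):
    """Return (center, lcl, ucl, out_of_control_indices) for a simple ±kσ rule."""
    center = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    lcl, ucl = center - k * sigma, center + k * sigma
    flagged = [i for i, v in enumerate(values) if v < lcl or v > ucl]
    return center, lcl, ucl, flagged

# Made-up data: the last value is far from the rest
values = [1.0] * 19 + [3.0]
center, lcl, ucl, flagged = control_limits(values)
print(f"center={center:.2f}, LCL={lcl:.2f}, UCL={ucl:.2f}, flagged={flagged}")
```

Note that the limits are computed from the same data they judge, so a single extreme value inflates σ and widens the limits; with few files, an extreme point can fall inside them.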

In [None]:
# Chart 6: Control chart for outlier detection
fig, ax = plt.subplots(figsize=(12, 6))

mean_avg = df['mean'].mean()
mean_std = df['mean'].std()

# Plot mean values
ax.plot(df['file_index'], df['mean'], marker='o', linewidth=1.5, markersize=4, color=COLORS['primary'], label='Mean Warpage')

# Control limits (±3σ)
ax.axhline(mean_avg, color='green', linestyle='-', linewidth=2, label=f'Center Line: {mean_avg:.2f}')
ax.axhline(mean_avg + 3*mean_std, color='red', linestyle='--', linewidth=2, label=f'UCL (+3σ): {mean_avg + 3*mean_std:.2f}')
ax.axhline(mean_avg - 3*mean_std, color='red', linestyle='--', linewidth=2, label=f'LCL (-3σ): {mean_avg - 3*mean_std:.2f}')

# Shade warning zones (±2σ)
ax.fill_between(df['file_index'], mean_avg - 2*mean_std, mean_avg + 2*mean_std, alpha=0.1, color='yellow')

# Highlight out-of-control points
outliers_upper = df[df['mean'] > mean_avg + 3*mean_std]
outliers_lower = df[df['mean'] < mean_avg - 3*mean_std]
ax.scatter(outliers_upper['file_index'], outliers_upper['mean'], color='red', s=100, zorder=5, marker='x', linewidths=3, label='Out of Control')
ax.scatter(outliers_lower['file_index'], outliers_lower['mean'], color='red', s=100, zorder=5, marker='x', linewidths=3)

ax.set_xlabel('File Index (Temporal Sequence)')
ax.set_ylabel('Mean Warpage')
ax.set_title('Control Chart: Mean Warpage with ±3σ Limits', fontweight='bold', fontsize=14)
ax.legend(loc='best', fontsize=9)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(output_dir / 'control_chart.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: control_chart.png")

### 7.7 Top vs Bottom Performers (Radar Chart)
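
The ranking in this section uses a heuristic score: a file scores higher when its standard deviation is lower and its mean is closer to the median of all means. The radar axes are then min-max scaled to the range 0 to 1. A dependency-free sketch of both steps with made-up numbers (the function names are illustrative):

```python
from statistics import median

def quality_scores(means, stds):
    """Score = -std - |mean - median(means)|; higher is better."""
    center = median(means)
    return [-s - abs(m - center) for m, s in zip(means, stds)]

def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

means = [0.0, 1.0, 2.0]
stds = [0.5, 0.1, 0.5]
scores = quality_scores(means, stds)
best = max(range(len(scores)), key=scores.__getitem__)
print("best file index:", best)        # → best file index: 1
print(min_max([2, 4, 6]))              # → [0.0, 0.5, 1.0]
```

The score adds `std` and a mean offset without weighting, so it implicitly assumes both are on comparable scales.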

In [None]:
# Chart 7: Radar chart comparing best vs worst performers
from math import pi

# Define "quality" score (lower std = better, closer to median mean = better)
df['quality_score'] = -df['std'] - abs(df['mean'] - df['mean'].median())

# Get top 5 and bottom 5
top5 = df.nlargest(5, 'quality_score')
bottom5 = df.nsmallest(5, 'quality_score')

# Metrics for radar chart (normalized to 0-1)
metrics = ['mean', 'std', 'range', 'skewness', 'kurtosis']

# Normalize: fit the scaler on all files, then average the scaled values per group
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler().fit(df[metrics])
top5_norm = scaler.transform(top5[metrics]).mean(axis=0)
bottom5_norm = scaler.transform(bottom5[metrics]).mean(axis=0)

# Radar chart setup
angles = [n / float(len(metrics)) * 2 * pi for n in range(len(metrics))]
top5_norm = np.concatenate((top5_norm, [top5_norm[0]]))  # Close the circle
bottom5_norm = np.concatenate((bottom5_norm, [bottom5_norm[0]]))
angles += angles[:1]

fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))

ax.plot(angles, top5_norm, 'o-', linewidth=2, label='Top 5 Files (Best Quality)', color=COLORS['accent'])
ax.fill(angles, top5_norm, alpha=0.25, color=COLORS['accent'])

ax.plot(angles, bottom5_norm, 'o-', linewidth=2, label='Bottom 5 Files (Worst Quality)', color=COLORS['warning'])
ax.fill(angles, bottom5_norm, alpha=0.25, color=COLORS['warning'])

ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics, fontsize=10)
ax.set_ylim(0, 1)
ax.set_title('Radar Chart: Top 5 vs Bottom 5 Files\n(Normalized Metrics)', fontweight='bold', fontsize=14, pad=20)
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
ax.grid(True)

plt.tight_layout()
plt.savefig(output_dir / 'radar_chart.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: radar_chart.png")

### 7.8 Summary Statistics Table (as image)

In [None]:
# Chart 8: Summary statistics table
summary_stats = df[numeric_cols].describe().round(2)

fig, ax = plt.subplots(figsize=(12, 6))
ax.axis('tight')
ax.axis('off')

table = ax.table(cellText=summary_stats.values,
                 rowLabels=summary_stats.index,
                 colLabels=summary_stats.columns,
                 cellLoc='center',
                 loc='center',
                 colWidths=[0.1]*len(summary_stats.columns))

table.auto_set_font_size(False)
table.set_fontsize(9)
table.scale(1, 2)

# Color header row
for i in range(len(summary_stats.columns)):
    table[(0, i)].set_facecolor(COLORS['primary'])
    table[(0, i)].set_text_props(weight='bold', color='white')

# Color row labels
for i in range(len(summary_stats.index)):
    table[(i+1, -1)].set_facecolor('#E8E8E8')
    table[(i+1, -1)].set_text_props(weight='bold')

ax.set_title('Summary Statistics: All Metrics', fontweight='bold', fontsize=14, pad=20)

plt.tight_layout()
plt.savefig(output_dir / 'summary_table.png', bbox_inches='tight', dpi=300)
plt.show()

print("✓ Saved: summary_table.png")

## 8. Generate PowerPoint Presentation

In [None]:
from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.enum.text import PP_ALIGN
from pptx.dml.color import RGBColor
from datetime import datetime

# Create presentation
prs = Presentation()
prs.slide_width = Inches(10)
prs.slide_height = Inches(7.5)

print("Creating PowerPoint presentation...")

### 8.1 Title Slide

In [None]:
# Slide 1: Title
slide = prs.slides.add_slide(prs.slide_layouts[6])  # Blank layout

# Title
title_box = slide.shapes.add_textbox(Inches(1), Inches(2.5), Inches(8), Inches(1))
title_frame = title_box.text_frame
title_frame.text = "Warpage Analysis Report"
title_frame.paragraphs[0].font.size = Pt(44)
title_frame.paragraphs[0].font.bold = True
title_frame.paragraphs[0].font.color.rgb = RGBColor(31, 119, 180)
title_frame.paragraphs[0].alignment = PP_ALIGN.CENTER

# Subtitle (show dataset count if multiple)
subtitle_text = f"Statistical Analysis of {len(df)} Measurement Files"
if len(stats_paths) > 1:
    subtitle_text += f" ({len(stats_paths)} Datasets)"

subtitle_box = slide.shapes.add_textbox(Inches(1), Inches(3.8), Inches(8), Inches(0.6))
subtitle_frame = subtitle_box.text_frame
subtitle_frame.text = subtitle_text
subtitle_frame.paragraphs[0].font.size = Pt(24)
subtitle_frame.paragraphs[0].alignment = PP_ALIGN.CENTER

# Date
date_box = slide.shapes.add_textbox(Inches(1), Inches(4.6), Inches(8), Inches(0.4))
date_frame = date_box.text_frame
date_frame.text = f"Report Date: {datetime.now().strftime('%Y-%m-%d')}"
date_frame.paragraphs[0].font.size = Pt(14)
date_frame.paragraphs[0].alignment = PP_ALIGN.CENTER
date_frame.paragraphs[0].font.italic = True

print("✓ Slide 1: Title")

### 8.2 Executive Summary Slide

In [None]:
# Slide 2: Executive Summary
slide = prs.slides.add_slide(prs.slide_layouts[6])

# Title
title = slide.shapes.add_textbox(Inches(0.5), Inches(0.3), Inches(9), Inches(0.6))
title.text_frame.text = "Executive Summary"
title.text_frame.paragraphs[0].font.size = Pt(32)
title.text_frame.paragraphs[0].font.bold = True
title.text_frame.paragraphs[0].font.color.rgb = RGBColor(31, 119, 180)

# Key metrics boxes
metrics_data = [
    ("Total Files", str(len(df)), COLORS['primary']),
    ("Avg Mean Warpage", f"{df['mean'].mean():.2f}", COLORS['secondary']),
    ("Avg Std Dev", f"{df['std'].mean():.2f}", COLORS['accent']),
    ("Outliers (Kurtosis>47)", str(len(df[df['kurtosis'] > 47])), COLORS['warning'])
]

x_start = 0.5
for i, (label, value, color) in enumerate(metrics_data):
    # Box background
    box = slide.shapes.add_shape(
        1,  # Rectangle
        Inches(x_start + i*2.3), Inches(1.2),
        Inches(2), Inches(1.2)
    )
    box.fill.solid()
    box.fill.fore_color.rgb = RGBColor.from_string(color.lstrip('#'))  # '#1f77b4' -> RGB
    box.line.color.rgb = RGBColor(255, 255, 255)
    
    # Value text
    value_box = slide.shapes.add_textbox(
        Inches(x_start + i*2.3), Inches(1.4),
        Inches(2), Inches(0.5)
    )
    value_box.text_frame.text = value
    value_box.text_frame.paragraphs[0].font.size = Pt(28)
    value_box.text_frame.paragraphs[0].font.bold = True
    value_box.text_frame.paragraphs[0].font.color.rgb = RGBColor(255, 255, 255)
    value_box.text_frame.paragraphs[0].alignment = PP_ALIGN.CENTER
    
    # Label text
    label_box = slide.shapes.add_textbox(
        Inches(x_start + i*2.3), Inches(1.95),
        Inches(2), Inches(0.3)
    )
    label_box.text_frame.text = label
    label_box.text_frame.paragraphs[0].font.size = Pt(11)
    label_box.text_frame.paragraphs[0].font.color.rgb = RGBColor(255, 255, 255)
    label_box.text_frame.paragraphs[0].alignment = PP_ALIGN.CENTER

# AI Summary (if available)
summary_box = slide.shapes.add_textbox(Inches(0.5), Inches(2.8), Inches(9), Inches(3.5))
text_frame = summary_box.text_frame
text_frame.word_wrap = True
p = text_frame.paragraphs[0]
p.text = "Key Findings:\n\n" + (ai_analysis if 'ai_analysis' in locals() else "Data loaded successfully. Statistical analysis complete.")
p.font.size = Pt(14)
p.line_spacing = 1.3

print("✓ Slide 2: Executive Summary")

### 8.3 Chart Slides
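
Charts saved with different figure sizes have different aspect ratios, so forcing both width and height when placing a picture stretches some of them. A minimal sketch of the usual fit-inside calculation in plain numbers (inches here); `fit_within` is a hypothetical helper and does not depend on python-pptx:

```python
def fit_within(img_w, img_h, box_w, box_h):
    """Return (width, height, x_offset, y_offset) that fits the image in the box, centered."""
    scale = min(box_w / img_w, box_h / img_h)
    w, h = img_w * scale, img_h * scale
    return w, h, (box_w - w) / 2, (box_h - h) / 2

# A square image (such as a radar chart) placed in a 9 x 4.5 inch area
print(fit_within(8, 8, 9, 4.5))  # → (4.5, 4.5, 2.25, 0.0)
```

The offsets center the scaled image inside the box, which keeps narrow charts from hugging the left edge of the slide.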

In [None]:
# Helper function to add chart slide
def add_chart_slide(prs, title_text, image_path, description=""):
    slide = prs.slides.add_slide(prs.slide_layouts[6])
    
    # Title
    title = slide.shapes.add_textbox(Inches(0.5), Inches(0.3), Inches(9), Inches(0.5))
    title.text_frame.text = title_text
    title.text_frame.paragraphs[0].font.size = Pt(28)
    title.text_frame.paragraphs[0].font.bold = True
    title.text_frame.paragraphs[0].font.color.rgb = RGBColor(31, 119, 180)
    
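    # Image: scale to fit the available area while preserving the aspect ratio
    max_w = Inches(9)
    max_h = Inches(5.5 if not description else 4.5)
    pic = slide.shapes.add_picture(str(image_path), Inches(0.5), Inches(1), height=max_h)
    if pic.width > max_w:
        pic.height = int(pic.height * max_w / pic.width)
        pic.width = max_w
    pic.left = Inches(0.5) + (max_w - pic.width) // 2  # center horizontally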
    
    # Description (if provided)
    if description:
        desc_box = slide.shapes.add_textbox(Inches(0.5), Inches(5.7), Inches(9), Inches(1.2))
        desc_box.text_frame.text = description
        desc_box.text_frame.paragraphs[0].font.size = Pt(11)
        desc_box.text_frame.word_wrap = True

# Slide 3: Temporal Trends
add_chart_slide(
    prs,
    "Temporal Trends",
    output_dir / 'temporal_trends.png',
    "This chart shows how mean warpage values change over time (file sequence). The shaded area represents ±1 standard deviation, indicating variability around the mean."
)
print("✓ Slide 3: Temporal Trends")

# Slide 3b: Dataset Comparison (only if multiple datasets)
if len(stats_paths) > 1 and (output_dir / 'dataset_comparison.png').exists():
    add_chart_slide(
        prs,
        "Dataset Comparison",
        output_dir / 'dataset_comparison.png',
        f"Comparative analysis across {len(stats_paths)} datasets. Bar charts show average values by dataset, while box plots reveal distribution patterns and outliers within each dataset."
    )
    print("✓ Slide 3b: Dataset Comparison")

# Slide 4: Distributions
add_chart_slide(
    prs,
    "Distribution Analysis",
    output_dir / 'distributions.png',
    f"Distribution of key statistical metrics across all {len(df)} files. Red dashed lines indicate mean values. The kurtosis chart shows the outlier threshold at 47."
)
print("✓ Slide 4: Distribution Analysis")

# Slide 5: Box Plots
add_chart_slide(
    prs,
    "Variability Analysis",
    output_dir / 'boxplots.png',
    "Box plots show the spread and quartiles of key metrics. The boxes represent the interquartile range (IQR), with whiskers extending to the most extreme data points within 1.5×IQR of the box edges."
)
print("✓ Slide 5: Box Plots")

# Slide 6: PCA
add_chart_slide(
    prs,
    "PCA Analysis",
    output_dir / 'pca_scatter.png',
    "Principal Component Analysis (PCA) reduces dimensionality to visualize clustering patterns. Color gradient represents temporal sequence. Outliers are labeled."
)
print("✓ Slide 6: PCA Analysis")

# Slide 7: Correlation
add_chart_slide(
    prs,
    "Correlation Matrix",
    output_dir / 'correlation_heatmap.png',
    "Correlation coefficients between all metrics. Values close to ±1 indicate strong relationships (blue=negative, red=positive)."
)
print("✓ Slide 7: Correlation Heatmap")

# Slide 8: Control Chart
add_chart_slide(
    prs,
    "Outlier Detection (Control Chart)",
    output_dir / 'control_chart.png',
    "Statistical process control chart with ±3σ limits. Points marked with 'X' are out of control and require investigation."
)
print("✓ Slide 8: Control Chart")

# Slide 9: Radar Chart
add_chart_slide(
    prs,
    "Top vs Bottom Performers",
    output_dir / 'radar_chart.png',
    "Comparison of top 5 files (best quality) vs bottom 5 (worst quality) across normalized metrics. Larger area indicates higher values."
)
print("✓ Slide 9: Radar Chart")

# Slide 10: Summary Table
add_chart_slide(
    prs,
    "Summary Statistics",
    output_dir / 'summary_table.png',
    "Descriptive statistics for all metrics: count, mean, std, min, quartiles (25%, 50%, 75%), and max values."
)
print("✓ Slide 10: Summary Table")

### 8.4 Recommendations Slide

In [None]:
# Slide 11: Recommendations
slide = prs.slides.add_slide(prs.slide_layouts[6])

# Title
title = slide.shapes.add_textbox(Inches(0.5), Inches(0.3), Inches(9), Inches(0.6))
title.text_frame.text = "Recommendations & Next Steps"
title.text_frame.paragraphs[0].font.size = Pt(32)
title.text_frame.paragraphs[0].font.bold = True
title.text_frame.paragraphs[0].font.color.rgb = RGBColor(31, 119, 180)

# Recommendations
recommendations = [
    ("Quality Control", [
        f"Investigate {len(df[df['kurtosis'] > 47])} files with extreme kurtosis (>47)",
        "Review files marked as 'out of control' in the control chart",
        "Establish tighter control limits based on current process capability"
    ]),
    ("Process Improvement", [
        f"Current mean warpage: {df['mean'].mean():.2f} (Target: closer to 0)",
        f"Reduce variability: Avg Std Dev = {df['std'].mean():.2f}",
        "Implement corrective actions for bottom 5 performers"
    ]),
    ("Monitoring", [
        "Continue tracking PCA trends for early anomaly detection",
        "Set up automated alerts for files exceeding ±3σ limits",
        "Conduct root cause analysis on temporal patterns"
    ])
]

y_pos = 1.2
for section, items in recommendations:
    # Section header
    header = slide.shapes.add_textbox(Inches(0.8), Inches(y_pos), Inches(8.5), Inches(0.4))
    header.text_frame.text = section
    header.text_frame.paragraphs[0].font.size = Pt(18)
    header.text_frame.paragraphs[0].font.bold = True
    header.text_frame.paragraphs[0].font.color.rgb = RGBColor(255, 127, 14)
    
    y_pos += 0.5
    
    # Bullet points
    for item in items:
        bullet = slide.shapes.add_textbox(Inches(1.2), Inches(y_pos), Inches(8), Inches(0.3))
        bullet.text_frame.text = f"• {item}"
        bullet.text_frame.paragraphs[0].font.size = Pt(12)
        y_pos += 0.35
    
    y_pos += 0.2

print("✓ Slide 11: Recommendations")

### 8.5 Save PowerPoint File

In [None]:
# Save the presentation
output_pptx = Path(f"Warpage_Analysis_Report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.pptx")
prs.save(str(output_pptx))

print(f"\n{'='*80}")
print("PowerPoint report generated successfully!")
print(f"File: {output_pptx}")
print(f"Size: {output_pptx.stat().st_size / 1024:.2f} KB")
print(f"Total slides: {len(prs.slides)}")
print(f"{'='*80}")

## 9. Cleanup Temporary Files

In [None]:
# Optional: Clean up temporary chart files
import shutil

cleanup = input("Delete temporary chart files? (y/n): ")
if cleanup.lower() == 'y':
    shutil.rmtree(output_dir)
    print(f"✓ Cleaned up {output_dir}")
else:
    print(f"Temporary charts preserved in: {output_dir}")

## 10. Summary

**Report Generation Complete!**

This notebook has:
1. ✅ Connected to the LLM API (following API_examples.ipynb pattern)
2. ✅ Loaded warpage statistics data from JSON (supports **multiple files**)
3. ✅ Generated 8-9 professional visualizations (9 if multiple datasets)
4. ✅ Created an 11-12 slide PowerPoint presentation with:
   - Title slide (with dataset count)
   - Executive summary with key metrics
   - Temporal trends analysis
   - **Dataset comparison** (if multiple files provided)
   - Distribution analysis (4 metrics)
   - Box plot variability analysis
   - PCA scatter plot with outlier detection
   - Correlation heatmap
   - Control chart for quality control
   - Radar chart comparing top/bottom performers
   - Summary statistics table
   - Recommendations and next steps

**Multi-File Support:**
- Configure `stats_paths` in Section 4 to load multiple JSON files
- Automatic dataset comparison charts when >1 file is loaded
- All analysis combines data from all datasets
- Dataset metadata tracked and displayed

**Next Steps:**
- Open the generated .pptx file
- Customize branding/colors as needed
- Add company logo
- Present to stakeholders

**To run this notebook again:**
```bash
jupyter notebook PPTX_Report_Generator.ipynb
```