# 🎯 Comprehensive Combined Experiment: Sampler + Strategy + CFG Analysis

## Ultimate Goal
Perform the most comprehensive analysis of Stable Diffusion jewelry generation by testing:
1. **Technical Parameters**: Models, samplers, steps, CFG scales (from sampler notebook)
2. **Prompt Engineering**: Different weighting strategies (from strategy notebook)
3. **Quality Metrics**: CLIP confidence + LAION aesthetic scores

## Complete Testing Framework
- **Models**: 3 (SDXL, SD 1.5, SD 2.1)
- **Samplers**: 5 (DDIM, DPMSolver++, DDPM, Euler, Euler_Ancestral)
- **Steps**: 4 (20, 30, 40, 50)
- **CFG Scales**: 3 (5.0, 7.5, 9.0)
- **Strategies**: 6 (baseline, light_compel, medium_compel, heavy_compel, numeric_weights, style_focus)
- **Prompts**: 8 jewelry prompts
- **Total combinations**: 3 × 5 × 4 × 3 × 6 × 8 = **8,640 generations**
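
The total is simply the product of the axis sizes. A minimal sketch of that arithmetic, using the axis values that the setup cell defines (`STEP_COUNTS = [20, 30, 40, 50]`, `CFG_SCALES = [5.0, 7.5, 9.0]`); the lowercase variable names are placeholders for illustration, and the 0.5 to 1 minute per image range is the same rough assumption the setup cell uses for its time estimate:

```python
from itertools import product

# Axis values copied from the setup cell; names here are illustrative only.
models = ["SDXL", "SD15", "SD21"]
samplers = ["DDIM", "DPMSolver++", "DDPM", "Euler", "Euler_Ancestral"]
step_counts = [20, 30, 40, 50]
cfg_scales = [5.0, 7.5, 9.0]
strategies = ["baseline", "light_compel", "medium_compel",
              "heavy_compel", "numeric_weights", "style_focus"]
num_prompts = 8

# Enumerate the full factorial grid, one tuple per generation
grid = list(product(models, samplers, step_counts, cfg_scales,
                    strategies, range(num_prompts)))
total = len(grid)

# Rough wall-clock range, assuming 0.5 to 1 minute per image
hours_low = total * 0.5 / 60
hours_high = total * 1.0 / 60
print(f"{total:,} generations, roughly {hours_low:.0f}-{hours_high:.0f} hours")
```

At this size the run spans several days of GPU time, which is worth keeping in mind before launching it in a single Colab session.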

## Evaluation Metrics
- **CLIP Analysis**: Semantic matching with jewelry-specific labels
- **LAION Aesthetic Scores**: Visual appeal on 0-10 scale
- **Generation Performance**: Time, success rate, resource usage
- **Multi-dimensional Analysis**: Cross-parameter interactions

---


In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!pip install compel

In [None]:
# Setup - Ultimate Combined Testing Framework
import torch
from diffusers import (
    StableDiffusionXLPipeline, StableDiffusionPipeline,
    DDIMScheduler, DPMSolverMultistepScheduler, DDPMScheduler,
    EulerDiscreteScheduler, EulerAncestralDiscreteScheduler
)
import matplotlib.pyplot as plt
import numpy as np
import os
import time
from datetime import datetime
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import pandas as pd
import seaborn as sns
from collections import defaultdict
from itertools import product
import requests
import torch.nn as nn
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🖥️ Device: {device}")

# Enhanced Model Configuration with Multiple CFG Scales
MODEL_CONFIGS = {
    "SDXL": {
        "model_id": "stabilityai/stable-diffusion-xl-base-1.0",
        "resolution": 768,
        "pipeline_class": StableDiffusionXLPipeline,
        "default_cfg": 5.0
    },
    "SD15": {
        "model_id": "runwayml/stable-diffusion-v1-5",
        "resolution": 512,
        "pipeline_class": StableDiffusionPipeline,
        "default_cfg": 7.5
    },
    "SD21": {
        "model_id": "stabilityai/stable-diffusion-2-1",
        "resolution": 768,
        "pipeline_class": StableDiffusionPipeline,
        "default_cfg": 7.5
    }
}

# Enhanced Sampler Configuration with Euler Methods
SAMPLER_CONFIGS = {
    "DDIM": {
        "scheduler_class": DDIMScheduler,
        "description": "Deterministic, good quality, moderate speed"
    },
    "DPMSolver++": {
        "scheduler_class": DPMSolverMultistepScheduler,
        "description": "Fast convergence, excellent quality, newer method"
    },
    "DDPM": {
        "scheduler_class": DDPMScheduler,
        "description": "Original method, high quality, slower"
    },
    "Euler": {
        "scheduler_class": EulerDiscreteScheduler,
        "description": "Fast, simple ODE solver, good for fewer steps"
    },
    "Euler_Ancestral": {
        "scheduler_class": EulerAncestralDiscreteScheduler,
        "description": "Euler with randomness, non-deterministic, creative"
    }
}

# Extended Testing Parameters
STEP_COUNTS = [20, 30, 40, 50]
CFG_SCALES = [5.0, 7.5, 9.0]  # NEW: Multiple CFG scales to test

# Prompt Engineering Strategies (from strategy notebook)
STRATEGY_CONFIGS = {
    "baseline": {
        "description": "Original prompt unchanged",
        "modifier": lambda prompt: prompt
    },
    "light_compel": {
        "description": "Light emphasis with single +",
        "modifier": lambda prompt: apply_light_compel(prompt)
    },
    "medium_compel": {
        "description": "Medium emphasis with double ++",
        "modifier": lambda prompt: apply_medium_compel(prompt)
    },
    "heavy_compel": {
        "description": "Heavy emphasis with triple +++",
        "modifier": lambda prompt: apply_heavy_compel(prompt)
    },
    "numeric_weights": {
        "description": "Numerical weight emphasis (word:1.2)",
        "modifier": lambda prompt: apply_numeric_weights(prompt)
    },
    "style_focus": {
        "description": "Enhanced photography/style terms",
        "modifier": lambda prompt: apply_style_focus(prompt)
    }
}

# Test prompts (jewelry-focused)
test_prompts = [
    "channel-set diamond eternity band, 2 mm width, hammered 18k yellow gold, product-only white background",
    "14k rose-gold threader earrings, bezel-set round lab diamond ends, lifestyle macro shot, soft natural light",
    "organic cluster ring with mixed-cut sapphires and diamonds, brushed platinum finish, modern aesthetic",
    "modern signet ring, oval face, engraved gothic initial 'M', high-polish sterling silver, subtle reflection",
    "delicate gold huggie hoops, contemporary styling, isolated on neutral background",
    "stack of three slim rings: twisted gold, plain platinum, black rhodium pavé, editorial lighting",
    "bypass ring with stones on it, with refined simplicity and intentionally crafted for everyday wear",
    "A solid gold cuff bracelet with blue sapphire, with refined simplicity and intentionally crafted for everyday wear"
]

print(f"📊 Complete Testing Configuration:")
print(f"  • Models: {list(MODEL_CONFIGS.keys())}")
print(f"  • Samplers: {list(SAMPLER_CONFIGS.keys())}")
print(f"  • Step counts: {STEP_COUNTS}")
print(f"  • CFG scales: {CFG_SCALES}")
print(f"  • Strategies: {list(STRATEGY_CONFIGS.keys())}")
print(f"  • Prompts: {len(test_prompts)}")

total_combinations = (len(MODEL_CONFIGS) * len(SAMPLER_CONFIGS) *
                     len(STEP_COUNTS) * len(CFG_SCALES) *
                     len(STRATEGY_CONFIGS) * len(test_prompts))
print(f"  • Total generations: {total_combinations:,}")
# Rough estimate assuming 0.5-1 minute per generation (minutes / 60 = hours)
print(f"  • Estimated time: ~{total_combinations * 0.5 / 60:.0f}-{total_combinations * 1 / 60:.0f} hours")

# Create output directory
os.makedirs("/content/drive/MyDrive/arcade_comp_results/combined_experiment_results", exist_ok=True)
print("✅ Ultimate setup complete!")


In [None]:
# Strategy Modifier Functions
import re

# Jewelry terms that receive emphasis. "channel-set" is listed alongside
# "channel set" because the test prompts use the hyphenated spelling.
KEY_TERMS = ['diamond', 'gold', 'silver', 'platinum', 'sapphire', 'ruby', 'emerald',
             'eternity band', 'threader',
             'huggie', 'cluster', 'signet', 'bypass', 'bezel', 'channel set', 'channel-set']

# Photography and style terms emphasized by the style_focus strategy
STYLE_TERMS = ['macro', 'lighting', 'background', 'reflection', 'aesthetic',
               'contemporary', 'modern', 'lifestyle', 'editorial', 'styling']

def _emphasize(prompt, terms, template):
    """Wrap every occurrence of each term with `template`, e.g. "({})+".

    Matching is case-insensitive and done in one regex pass (longest term
    first), so text that has already been wrapped is never wrapped again.
    """
    pattern = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(pattern, lambda m: template.format(m.group(0)), prompt, flags=re.IGNORECASE)

def apply_light_compel(prompt):
    """Apply light emphasis (single +) to key jewelry terms"""
    return _emphasize(prompt, KEY_TERMS, "({})+")

def apply_medium_compel(prompt):
    """Apply medium emphasis (double ++) to key jewelry terms"""
    return _emphasize(prompt, KEY_TERMS, "({})++")

def apply_heavy_compel(prompt):
    """Apply heavy emphasis (triple +++) to key jewelry terms"""
    return _emphasize(prompt, KEY_TERMS, "({})+++")

def apply_numeric_weights(prompt):
    """Apply numerical weight emphasis (term:1.2) to key terms"""
    # NOTE: "(term:1.2)" is the A1111-style notation; Compel's own numeric
    # syntax is "(term)1.2".
    return _emphasize(prompt, KEY_TERMS, "({}:1.2)")

def apply_style_focus(prompt):
    """Enhance photography and style terms"""
    modified = _emphasize(prompt, STYLE_TERMS, "{}++")
    # Append generic product-photography context
    return modified + ", product-view, ultra-quality"

print("✅ Strategy modifier functions ready!")
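
For orientation, Compel's syntax documentation describes each trailing `+` as multiplying a term's weight by 1.1. Assuming that rule, the three `+` strategies correspond to the weights computed in this small sketch; `plus_weight` is a hypothetical helper written for illustration, not part of Compel:

```python
def plus_weight(n_plus: int, base: float = 1.1) -> float:
    """Weight implied by n trailing '+' signs, assuming each '+' multiplies by `base`."""
    return base ** n_plus

# Number of '+' signs each strategy appends to a key term
levels = {"light_compel": 1, "medium_compel": 2, "heavy_compel": 3}
weights = {name: round(plus_weight(n), 3) for name, n in levels.items()}

for name, weight in weights.items():
    print(f"{name}: {weight}")
```

Under this rule `medium_compel` (1.21) and `numeric_weights` (1.2) apply almost the same emphasis, so any measured difference between those two strategies is likely to be small.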


In [None]:
# LAION Aesthetic Predictor Implementation
class LAIONAestheticPredictor(nn.Module):
    """LAION Aesthetic Predictor using CLIP embeddings"""

    def __init__(self):
        super().__init__()
        # Simple MLP on top of CLIP features
        self.clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        self.clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

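        # Aesthetic prediction head (simplified version).
        # NOTE: this head is randomly initialized and no pretrained weights are
        # loaded anywhere in this notebook, so its outputs are placeholders rather
        # than calibrated LAION aesthetic scores until trained weights are loaded.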
        self.aesthetic_head = nn.Sequential(
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 1),
            nn.Sigmoid()  # Output 0-1, will scale to 0-10
        )

    def forward(self, image):
        """Predict aesthetic score for image"""
        # Get CLIP image features
        inputs = self.clip_processor(images=image, return_tensors="pt").to(device)
        with torch.no_grad():
            image_features = self.clip_model.get_image_features(**inputs)

        # Predict aesthetic score
        aesthetic_score = self.aesthetic_head(image_features)
        return aesthetic_score * 10  # Scale to 0-10

# Initialize aesthetic predictor
print("🎨 Loading LAION Aesthetic Predictor...")
# eval() disables the Dropout layers so scoring the same image twice gives the same value
aesthetic_predictor = LAIONAestheticPredictor().to(device).eval()
print("✅ Aesthetic predictor ready!")

# Enhanced CLIP Evaluation with Jewelry Labels
print("🔄 Loading CLIP model for evaluation...")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Comprehensive jewelry-specific CLIP labels
jewelry_labels = [
    "gold jewelry", "silver jewelry", "platinum jewelry", "diamond ring",
    "sapphire jewelry", "ruby jewelry", "emerald jewelry", "elegant ring",
    "luxury jewelry", "modern jewelry", "vintage jewelry", "classic jewelry",
    "contemporary jewelry", "minimalist jewelry", "ornate jewelry", "delicate jewelry",
    "bold jewelry", "statement jewelry", "engagement ring", "wedding ring",
    "eternity band", "signet ring", "cluster ring", "solitaire ring",
    "halo ring", "bypass ring", "threader earrings", "huggie hoops",
    "stud earrings", "drop earrings", "cuff bracelet", "tennis bracelet",
    "charm bracelet", "chain bracelet", "professional jewelry photography",
    "studio lighting", "macro photography", "luxury product photography",
    "high-end jewelry", "fine jewelry", "artisan jewelry", "handcrafted jewelry"
]

def analyze_image_with_clip(image, top_k=3):
    """Enhanced CLIP analysis with jewelry-specific labels"""
    inputs = clip_processor(text=jewelry_labels, images=image, return_tensors="pt", padding=True).to(device)

    with torch.no_grad():
        outputs = clip_model(**inputs)
        logits_per_image = outputs.logits_per_image
        probs = logits_per_image.softmax(dim=1)

    top_probs, top_indices = torch.topk(probs, top_k, dim=1)

    results = []
    for i in range(top_k):
        label = jewelry_labels[top_indices[0][i].item()]
        confidence = top_probs[0][i].item()
        results.append((label, confidence))

    return results

def get_aesthetic_score(image):
    """Get LAION aesthetic score for image"""
    with torch.no_grad():
        score = aesthetic_predictor(image)
        return score.item()

print("✅ Enhanced evaluation models ready!")
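
Because `analyze_image_with_clip` applies a softmax across the whole label list, each confidence is relative to the other labels rather than an absolute match score. A small pure-Python sketch with made-up logits (not real CLIP outputs) shows how the top confidence drops when more labels compete:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, logits, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Illustrative values only
labels = ["gold jewelry", "diamond ring", "cuff bracelet", "studio lighting"]
logits = [24.0, 22.0, 21.0, 20.0]

few = top_k(labels[:2], logits[:2], k=1)   # two competing labels
many = top_k(labels, logits, k=1)          # four competing labels
print(few[0], many[0])
```

The top label is the same in both cases, but its probability is lower with four labels than with two. With the 42 labels used here, confidences are therefore best compared across runs that share the same label list.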


In [None]:
# Pipeline Management Functions
def load_model_with_sampler(model_choice, sampler_choice):
    """Load model pipeline with specified sampler"""
    model_config = MODEL_CONFIGS[model_choice]
    sampler_config = SAMPLER_CONFIGS[sampler_choice]

    print(f"🔄 Loading {model_choice} with {sampler_choice} sampler...")

    # Load base pipeline
    if model_choice == "SDXL":
        pipe = model_config["pipeline_class"].from_pretrained(
            model_config["model_id"],
            variant="fp16", torch_dtype=torch.float16
        ).to(device)
    else:
        pipe = model_config["pipeline_class"].from_pretrained(
            model_config["model_id"],
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
            safety_checker=None, requires_safety_checker=False
        ).to(device)

    # Replace scheduler
    pipe.scheduler = sampler_config["scheduler_class"].from_config(pipe.scheduler.config)

    return pipe

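def generate_with_config(pipe, prompt, model_choice, steps, cfg_scale, seed=42):
    """Generate image with specified configuration including CFG scale.

    Returns (image, generation_time, error); image is None on failure.
    """
    # NOTE: the prompt is passed to the pipeline as a plain string, so weighting
    # syntax such as "(gold)+" or "macro++" reaches the tokenizer as literal
    # characters. For the weights to take effect, the prompt would need to be
    # converted to embeddings with Compel and passed via prompt_embeds.
    model_config = MODEL_CONFIGS[model_choice]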

    start_time = time.time()

    try:
        image = pipe(
            prompt=prompt,
            negative_prompt="fussy, vintage, “cheap-catalog” styles",
            num_inference_steps=steps,
            guidance_scale=cfg_scale,  # Use specified CFG scale
            width=model_config["resolution"],
            height=model_config["resolution"],
            generator=torch.Generator(device=device).manual_seed(seed)
        ).images[0]

        generation_time = time.time() - start_time
        return image, generation_time, None

    except Exception as e:
        generation_time = time.time() - start_time
        return None, generation_time, str(e)

print("✅ Enhanced pipeline management functions ready!")


In [None]:
# Main Ultimate Testing Loop
print("🚀 Starting ULTIMATE comprehensive experiment...")
print(f"⏱️ Testing {len(MODEL_CONFIGS)} models × {len(SAMPLER_CONFIGS)} samplers × {len(STEP_COUNTS)} steps × {len(CFG_SCALES)} CFG scales × {len(STRATEGY_CONFIGS)} strategies × {len(test_prompts)} prompts")

# Store all results with enhanced structure
all_results = {}
current_pipe = None
current_config = None
generation_counter = 0

start_time = datetime.now()
print(f"🕐 Experiment started at: {start_time}")

# Test each combination systematically
for model_choice in MODEL_CONFIGS.keys():
    print(f"\n🤖 Testing model: {model_choice}")

    for sampler_choice in SAMPLER_CONFIGS.keys():
        print(f"\n  🎛️ Testing sampler: {sampler_choice}")

        # Load pipeline with current sampler (reuse if same config)
        config_key = f"{model_choice}_{sampler_choice}"
        if current_config != config_key:
            if current_pipe is not None:
                del current_pipe
                torch.cuda.empty_cache()
            current_pipe = load_model_with_sampler(model_choice, sampler_choice)
            current_config = config_key

        for steps in STEP_COUNTS:
            print(f"\n    📊 Testing {steps} steps...")

            for cfg_scale in CFG_SCALES:
                print(f"\n      ⚙️ Testing CFG scale {cfg_scale}...")

                for strategy_name in STRATEGY_CONFIGS.keys():
                    strategy_config = STRATEGY_CONFIGS[strategy_name]
                    print(f"\n        🎯 Testing strategy: {strategy_name}")

                    for prompt_idx, original_prompt in enumerate(test_prompts, 1):
                        generation_counter += 1

                        # Apply strategy to prompt
                        modified_prompt = strategy_config["modifier"](original_prompt)

                        print(f"          📝 Prompt {prompt_idx}/{len(test_prompts)} ({generation_counter:,}/{total_combinations:,}): {original_prompt[:40]}...")

                        # Generate image
                        image, gen_time, error = generate_with_config(
                            current_pipe, modified_prompt, model_choice, steps, cfg_scale,
                            seed=100 + prompt_idx  # Consistent seed per prompt
                        )

                        if image is not None:
                            # Enhanced evaluation with both CLIP and aesthetic scores
                            clip_results = analyze_image_with_clip(image)
                            aesthetic_score = get_aesthetic_score(image)

                            # Create comprehensive filename
                            filename = f"/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/{model_choice}_{sampler_choice}_{steps}s_{cfg_scale}cfg_{strategy_name}_p{prompt_idx:02d}.png"
                            image.save(filename)

                            # Store comprehensive result
                            result_key = f"{model_choice}_{sampler_choice}_{steps}_{cfg_scale}_{strategy_name}_{prompt_idx}"
                            all_results[result_key] = {
                                'model': model_choice,
                                'sampler': sampler_choice,
                                'steps': steps,
                                'cfg_scale': cfg_scale,
                                'strategy': strategy_name,
                                'prompt_id': prompt_idx,
                                'original_prompt': original_prompt,
                                'modified_prompt': modified_prompt,
                                'image': image,
                                'filepath': filename,
                                'generation_time': gen_time,
                                'clip_top_label': clip_results[0][0],
                                'clip_top_confidence': clip_results[0][1],
                                'clip_results': clip_results,
                                'laion_aesthetic_score': aesthetic_score,
                                'error': None,
                                'timestamp': datetime.now()
                            }

                            print(f"            ✅ Generated in {gen_time:.1f}s | CLIP: {clip_results[0][0]} ({clip_results[0][1]:.3f}) | Aesthetic: {aesthetic_score:.2f}")

                        else:
                            print(f"            ❌ Failed: {error}")
                            result_key = f"{model_choice}_{sampler_choice}_{steps}_{cfg_scale}_{strategy_name}_{prompt_idx}"
                            all_results[result_key] = {
                                'model': model_choice,
                                'sampler': sampler_choice,
                                'steps': steps,
                                'cfg_scale': cfg_scale,
                                'strategy': strategy_name,
                                'prompt_id': prompt_idx,
                                'original_prompt': original_prompt,
                                'modified_prompt': modified_prompt,
                                'image': None,
                                'filepath': None,
                                'generation_time': gen_time,
                                'error': error,
                                'timestamp': datetime.now()
                            }

                        # Progress tracking
                        if generation_counter % 100 == 0:
                            elapsed = datetime.now() - start_time
                            avg_time_per_gen = elapsed.total_seconds() / generation_counter
                            remaining_time = (total_combinations - generation_counter) * avg_time_per_gen / 3600
                            print(f"\n📈 Progress: {generation_counter:,}/{total_combinations:,} ({generation_counter/total_combinations*100:.1f}%)")
                            print(f"⏱️ Elapsed: {elapsed} | Est. remaining: {remaining_time:.1f} hours")

# Cleanup
if current_pipe is not None:
    del current_pipe
    torch.cuda.empty_cache()

end_time = datetime.now()
total_time = end_time - start_time

print(f"\n🎉 ULTIMATE EXPERIMENT COMPLETED!")
print(f"⏱️ Total time: {total_time}")
successful_results = sum(1 for r in all_results.values() if r.get('image') is not None)
total_results = len(all_results)
print(f"📊 Results: {successful_results:,}/{total_results:,} successful generations ({successful_results/total_results*100:.1f}%)")
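
A run of this size can be interrupted, for example by a Colab session timeout, and the loop above keeps its results only in memory. One possible way to make it resumable is sketched here, under the assumption that an existing output file means the combination is already done. `output_path` and `pending_jobs` are hypothetical helpers, not part of the notebook:

```python
import os
import tempfile
from itertools import product

def output_path(out_dir, model, sampler, steps, cfg, strategy, prompt_idx):
    """Mirror the filename pattern used in the main loop."""
    name = f"{model}_{sampler}_{steps}s_{cfg}cfg_{strategy}_p{prompt_idx:02d}.png"
    return os.path.join(out_dir, name)

def pending_jobs(out_dir, models, samplers, step_counts, cfg_scales,
                 strategies, num_prompts):
    """Yield only the combinations whose image file is not on disk yet."""
    grid = product(models, samplers, step_counts, cfg_scales,
                   strategies, range(1, num_prompts + 1))
    for job in grid:
        if not os.path.exists(output_path(out_dir, *job)):
            yield job

# Tiny demonstration in a temporary directory: 4 jobs, 1 already finished
with tempfile.TemporaryDirectory() as tmp:
    axes = (["SD15"], ["DDIM", "Euler"], [20], [7.5], ["baseline"], 2)
    open(output_path(tmp, "SD15", "DDIM", 20, 7.5, "baseline", 1), "wb").close()
    remaining = list(pending_jobs(tmp, *axes))

print(f"{len(remaining)} jobs still to run")
```

Skipping finished images only protects the generation step. The metrics would still need to be recomputed from the saved files, or written to disk row by row as they are produced.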


In [None]:
# Export Ultimate Comprehensive CSV Results
print("💾 Exporting ultimate comprehensive CSV results...")

csv_data = []
for result in all_results.values():
    if result.get('image') is not None:
        clip_results = result['clip_results']
        row = {
            # Model & Technical Parameters
            'model': result['model'],
            'model_id': MODEL_CONFIGS[result['model']]['model_id'],
            'model_resolution': MODEL_CONFIGS[result['model']]['resolution'],
            'sampler': result['sampler'],
            'sampler_description': SAMPLER_CONFIGS[result['sampler']]['description'],
            'steps': result['steps'],
            'cfg_scale': result['cfg_scale'],

            # Prompt Strategy
            'strategy': result['strategy'],
            'strategy_description': STRATEGY_CONFIGS[result['strategy']]['description'],
            'prompt_id': result['prompt_id'],
            'original_prompt': result['original_prompt'],
            'modified_prompt': result['modified_prompt'],

            # Results & Metrics
            'image_path': result['filepath'],
            'generation_time': result['generation_time'],

            # CLIP Analysis (Top 3)
            'clip_top_label': clip_results[0][0],
            'clip_top_confidence': clip_results[0][1],
            'clip_label_2': clip_results[1][0] if len(clip_results) > 1 else '',
            'clip_confidence_2': clip_results[1][1] if len(clip_results) > 1 else 0.0,
            'clip_label_3': clip_results[2][0] if len(clip_results) > 2 else '',
            'clip_confidence_3': clip_results[2][1] if len(clip_results) > 2 else 0.0,

            # LAION Aesthetic Score
            'laion_aesthetic_score': result['laion_aesthetic_score'],

            # Meta information
            'timestamp': result['timestamp'],
            'error': None
        }
        csv_data.append(row)
    else:
        # Include failed generations for analysis
        row = {
            # Model & Technical Parameters
            'model': result['model'],
            'model_id': MODEL_CONFIGS[result['model']]['model_id'],
            'model_resolution': MODEL_CONFIGS[result['model']]['resolution'],
            'sampler': result['sampler'],
            'sampler_description': SAMPLER_CONFIGS[result['sampler']]['description'],
            'steps': result['steps'],
            'cfg_scale': result['cfg_scale'],

            # Prompt Strategy
            'strategy': result['strategy'],
            'strategy_description': STRATEGY_CONFIGS[result['strategy']]['description'],
            'prompt_id': result['prompt_id'],
            'original_prompt': result['original_prompt'],
            'modified_prompt': result['modified_prompt'],

            # Results & Metrics (Failed)
            'image_path': None,
            'generation_time': result['generation_time'],
            'error': result.get('error', 'Unknown error'),
            'timestamp': result['timestamp'],

            # Empty metrics for failed generations
            'clip_top_label': '',
            'clip_top_confidence': 0.0,
            'clip_label_2': '',
            'clip_confidence_2': 0.0,
            'clip_label_3': '',
            'clip_confidence_3': 0.0,
            'laion_aesthetic_score': 0.0
        }
        csv_data.append(row)

# Create comprehensive DataFrame
df = pd.DataFrame(csv_data)
csv_filename = "/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/ultimate_comprehensive_results.csv"
df.to_csv(csv_filename, index=False)

print(f"💾 Saved ultimate results to: {csv_filename}")
print(f"📋 Total entries: {len(df):,}")
successful_entries = len(df[df['image_path'].notna()])
print(f"✅ Successful generations: {successful_entries:,}/{len(df):,} ({successful_entries/len(df)*100:.1f}%)")

print("✅ Ultimate CSV export completed!")
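
The success and failure branches above build nearly identical rows. A sketch of one way to share the common columns follows; `build_row` is a hypothetical helper, the sample records are made up, and only a subset of the columns is shown:

```python
def build_row(result, top_k=3):
    """Flatten one result record into a CSV row; works for successes and failures."""
    shared = ("model", "sampler", "steps", "cfg_scale", "strategy",
              "prompt_id", "generation_time", "error")
    row = {key: result.get(key) for key in shared}
    row["image_path"] = result.get("filepath")

    # Failed generations have no CLIP results, so pad with empty values
    clip_results = result.get("clip_results") or []
    for rank in range(top_k):
        label, conf = clip_results[rank] if rank < len(clip_results) else ("", 0.0)
        if rank == 0:
            row["clip_top_label"], row["clip_top_confidence"] = label, conf
        else:
            row[f"clip_label_{rank + 1}"] = label
            row[f"clip_confidence_{rank + 1}"] = conf

    row["laion_aesthetic_score"] = result.get("laion_aesthetic_score", 0.0)
    return row

# Made-up records for illustration
ok = {"model": "SD15", "sampler": "DDIM", "steps": 20, "cfg_scale": 7.5,
      "strategy": "baseline", "prompt_id": 1, "generation_time": 4.2,
      "error": None, "filepath": "a.png", "laion_aesthetic_score": 5.5,
      "clip_results": [("gold jewelry", 0.4), ("diamond ring", 0.3), ("fine jewelry", 0.1)]}
failed = {"model": "SD15", "sampler": "DDIM", "steps": 20, "cfg_scale": 7.5,
          "strategy": "baseline", "prompt_id": 2, "generation_time": 0.3,
          "error": "CUDA out of memory", "filepath": None}

ok_row, failed_row = build_row(ok), build_row(failed)
```

Both rows end up with the same set of columns, which keeps the DataFrame schema consistent without two hand-maintained dictionaries.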


In [None]:
# Ultimate Multi-Dimensional Analysis & Visualizations
print("🎨 Creating ultimate multi-dimensional analysis...")

# Prepare data for comprehensive analysis
# Define df_success unconditionally so later cells can test it even when it is empty
df_success = df[df['image_path'].notna()].copy()
if len(df_success) > 0:

    # Create efficiency scores
    df_success['efficiency_score'] = df_success['clip_top_confidence'] / df_success['generation_time']
    df_success['quality_score'] = (df_success['clip_top_confidence'] + df_success['laion_aesthetic_score']/10) / 2

    print(f"📊 Analyzing {len(df_success):,} successful generations...")

    # 1. Performance Heatmaps by Model x Sampler x CFG
    fig, axes = plt.subplots(2, 3, figsize=(20, 12))

    models = list(MODEL_CONFIGS.keys())
    samplers = list(SAMPLER_CONFIGS.keys())

    for i, model in enumerate(models):
        model_data = df_success[df_success['model'] == model]

        # Generation Time Heatmap
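        time_matrix = model_data.pivot_table(values='generation_time',
                                           index='sampler', columns='cfg_scale', aggfunc='mean')
        # pivot_table sorts its index alphabetically; reorder rows and columns
        # so they line up with the tick labels taken from `samplers` and CFG_SCALES
        time_matrix = time_matrix.reindex(index=samplers, columns=CFG_SCALES)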
        im1 = axes[0, i].imshow(time_matrix.values, cmap='RdYlBu_r', aspect='auto')
        axes[0, i].set_title(f'{model} - Avg Generation Time (s)', fontweight='bold')
        axes[0, i].set_xlabel('CFG Scale')
        axes[0, i].set_ylabel('Sampler')
        axes[0, i].set_xticks(range(len(CFG_SCALES)))
        axes[0, i].set_xticklabels(CFG_SCALES)
        axes[0, i].set_yticks(range(len(samplers)))
        axes[0, i].set_yticklabels(samplers, rotation=45)
        plt.colorbar(im1, ax=axes[0, i])

        # Add values to cells
        for j in range(len(samplers)):
            for k in range(len(CFG_SCALES)):
                if not pd.isna(time_matrix.iloc[j, k]):
                    axes[0, i].text(k, j, f'{time_matrix.iloc[j, k]:.1f}',
                                   ha='center', va='center', fontweight='bold',
                                   color='white' if time_matrix.iloc[j, k] > time_matrix.values.max()/2 else 'black')

        # Quality Score Heatmap
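        quality_matrix = model_data.pivot_table(values='quality_score',
                                              index='sampler', columns='cfg_scale', aggfunc='mean')
        # pivot_table sorts its index alphabetically; reorder rows and columns
        # so they line up with the tick labels taken from `samplers` and CFG_SCALES
        quality_matrix = quality_matrix.reindex(index=samplers, columns=CFG_SCALES)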
        im2 = axes[1, i].imshow(quality_matrix.values, cmap='RdYlGn', aspect='auto')
        axes[1, i].set_title(f'{model} - Avg Quality Score', fontweight='bold')
        axes[1, i].set_xlabel('CFG Scale')
        axes[1, i].set_ylabel('Sampler')
        axes[1, i].set_xticks(range(len(CFG_SCALES)))
        axes[1, i].set_xticklabels(CFG_SCALES)
        axes[1, i].set_yticks(range(len(samplers)))
        axes[1, i].set_yticklabels(samplers, rotation=45)
        plt.colorbar(im2, ax=axes[1, i])

        # Add values to cells
        for j in range(len(samplers)):
            for k in range(len(CFG_SCALES)):
                if not pd.isna(quality_matrix.iloc[j, k]):
                    axes[1, i].text(k, j, f'{quality_matrix.iloc[j, k]:.3f}',
                                   ha='center', va='center', fontweight='bold',
                                   color='white' if quality_matrix.iloc[j, k] < quality_matrix.values.max()/2 else 'black')

    plt.suptitle('Ultimate Performance & Quality Analysis by Model, Sampler & CFG Scale',
                fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.savefig('/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/ultimate_performance_heatmaps.png', dpi=150, bbox_inches='tight')
    plt.show()

    print("✅ Ultimate performance heatmaps created!")
else:
    print("⚠️ No successful results to analyze yet.")
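
The two derived columns used in this cell are simple formulas. A worked sketch with made-up numbers (not measured results) makes the scaling explicit; the function names are illustrative:

```python
def quality_score(clip_confidence, laion_score):
    """Average of CLIP confidence (0-1) and the aesthetic score rescaled from 0-10 to 0-1."""
    return (clip_confidence + laion_score / 10) / 2

def efficiency_score(clip_confidence, generation_time):
    """CLIP confidence earned per second of generation time."""
    return clip_confidence / generation_time

# Illustrative values only
q = quality_score(clip_confidence=0.42, laion_score=6.1)
e = efficiency_score(clip_confidence=0.42, generation_time=8.0)
print(f"quality={q:.3f}, efficiency={e:.4f}")
```

Here the quality score comes out at about 0.515. Note that the two inputs are not on an equal footing: the CLIP term is a softmax probability spread over the full label list, so it usually sits well below 1, and the rescaled aesthetic term can dominate the average.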


In [None]:
# Strategy & Quality Analysis
if len(df_success) > 0:
    print("🎯 Creating strategy effectiveness analysis...")

    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))

    # 1. Strategy Performance Comparison
    strategy_stats = df_success.groupby('strategy').agg({
        'clip_top_confidence': 'mean',
        'laion_aesthetic_score': 'mean',
        'generation_time': 'mean',
        'quality_score': 'mean'
    }).round(3)

    strategy_stats.plot(kind='bar', ax=ax1, color=['skyblue', 'lightcoral', 'lightgreen', 'gold'])
    ax1.set_title('Strategy Performance Comparison', fontweight='bold')
    ax1.set_xlabel('Strategy')
    ax1.set_ylabel('Score')
    ax1.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    ax1.tick_params(axis='x', rotation=45)

    # 2. CFG Scale vs Quality
    cfg_quality = df_success.groupby('cfg_scale')['quality_score'].agg(['mean', 'std']).reset_index()
    ax2.errorbar(cfg_quality['cfg_scale'], cfg_quality['mean'], yerr=cfg_quality['std'],
                marker='o', capsize=5, capthick=2, linewidth=2, markersize=8)
    ax2.set_title('CFG Scale vs Quality Score', fontweight='bold')
    ax2.set_xlabel('CFG Scale')
    ax2.set_ylabel('Quality Score')
    ax2.grid(True, alpha=0.3)

    # 3. Step Count Efficiency
    step_efficiency = df_success.groupby('steps').agg({
        'quality_score': 'mean',
        'generation_time': 'mean',
        'efficiency_score': 'mean'
    })

    ax3_twin = ax3.twinx()
    line1 = ax3.plot(step_efficiency.index, step_efficiency['quality_score'],
                    marker='o', color='blue', linewidth=2, label='Quality Score')
    line2 = ax3_twin.plot(step_efficiency.index, step_efficiency['generation_time'],
                         marker='s', color='red', linewidth=2, label='Generation Time (s)')

    ax3.set_xlabel('Step Count')
    ax3.set_ylabel('Quality Score', color='blue')
    ax3_twin.set_ylabel('Generation Time (s)', color='red')
    ax3.set_title('Step Count vs Quality & Time Trade-off', fontweight='bold')
    ax3.grid(True, alpha=0.3)

    # Combine legends
    lines1, labels1 = ax3.get_legend_handles_labels()
    lines2, labels2 = ax3_twin.get_legend_handles_labels()
    ax3.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

    # 4. Sampler Comparison with LAION Scores
    sampler_comparison = df_success.groupby('sampler').agg({
        'clip_top_confidence': 'mean',
        'laion_aesthetic_score': 'mean',
        'generation_time': 'mean'
    })

    # Scatter plot: Generation Time vs Aesthetic Score, colored by CLIP confidence
    scatter = ax4.scatter(sampler_comparison['generation_time'],
                         sampler_comparison['laion_aesthetic_score'],
                         c=sampler_comparison['clip_top_confidence'],
                         s=200, alpha=0.7, cmap='viridis')

    # Add sampler labels
    for i, sampler in enumerate(sampler_comparison.index):
        ax4.annotate(sampler,
                    (sampler_comparison.iloc[i]['generation_time'],
                     sampler_comparison.iloc[i]['laion_aesthetic_score']),
                    xytext=(5, 5), textcoords='offset points', fontweight='bold')

    ax4.set_xlabel('Average Generation Time (s)')
    ax4.set_ylabel('Average LAION Aesthetic Score')
    ax4.set_title('Sampler Performance: Time vs Aesthetics\n(Color = CLIP Confidence)', fontweight='bold')
    plt.colorbar(scatter, ax=ax4, label='CLIP Confidence')
    ax4.grid(True, alpha=0.3)

    plt.suptitle('Ultimate Strategy & Quality Analysis', fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.savefig('/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/ultimate_strategy_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()

    print("✅ Strategy effectiveness analysis created!")


In [None]:
# Ultimate Best Configuration Recommendations
if len(df_success) > 0:
    print("🏆 Generating ultimate configuration recommendations...")

    # Find best configurations for different objectives
    print("\n" + "="*80)
    print("🎯 ULTIMATE CONFIGURATION RECOMMENDATIONS")
    print("="*80)

    # 1. Highest Quality (CLIP + LAION combined)
    best_quality = df_success.loc[df_success['quality_score'].idxmax()]
    print(f"\n🥇 HIGHEST QUALITY CONFIGURATION:")
    print(f"   Model: {best_quality['model']}")
    print(f"   Sampler: {best_quality['sampler']}")
    print(f"   Steps: {best_quality['steps']}")
    print(f"   CFG Scale: {best_quality['cfg_scale']}")
    print(f"   Strategy: {best_quality['strategy']}")
    print(f"   Quality Score: {best_quality['quality_score']:.3f}")
    print(f"   CLIP Confidence: {best_quality['clip_top_confidence']:.3f}")
    print(f"   LAION Aesthetic: {best_quality['laion_aesthetic_score']:.2f}")
    print(f"   Generation Time: {best_quality['generation_time']:.1f}s")

    # 2. Best Efficiency (Quality per Time)
    best_efficiency = df_success.loc[df_success['efficiency_score'].idxmax()]
    print(f"\n⚡ MOST EFFICIENT CONFIGURATION:")
    print(f"   Model: {best_efficiency['model']}")
    print(f"   Sampler: {best_efficiency['sampler']}")
    print(f"   Steps: {best_efficiency['steps']}")
    print(f"   CFG Scale: {best_efficiency['cfg_scale']}")
    print(f"   Strategy: {best_efficiency['strategy']}")
    print(f"   Efficiency Score: {best_efficiency['efficiency_score']:.4f}")
    print(f"   Quality Score: {best_efficiency['quality_score']:.3f}")
    print(f"   Generation Time: {best_efficiency['generation_time']:.1f}s")

    # 3. Fastest with Good Quality (Quality > 0.4, minimum time)
    fast_quality = df_success[df_success['quality_score'] > 0.4]
    if len(fast_quality) > 0:
        fastest_good = fast_quality.loc[fast_quality['generation_time'].idxmin()]
        print(f"\n🚀 FASTEST WITH GOOD QUALITY:")
        print(f"   Model: {fastest_good['model']}")
        print(f"   Sampler: {fastest_good['sampler']}")
        print(f"   Steps: {fastest_good['steps']}")
        print(f"   CFG Scale: {fastest_good['cfg_scale']}")
        print(f"   Strategy: {fastest_good['strategy']}")
        print(f"   Generation Time: {fastest_good['generation_time']:.1f}s")
        print(f"   Quality Score: {fastest_good['quality_score']:.3f}")

    # 4. Best by Model (Top configuration for each model)
    print(f"\n🤖 BEST CONFIGURATION BY MODEL:")
    for model in MODEL_CONFIGS.keys():
        model_data = df_success[df_success['model'] == model]
        if len(model_data) > 0:
            best_model = model_data.loc[model_data['quality_score'].idxmax()]
            print(f"   {model}: {best_model['sampler']} | {best_model['steps']} steps | CFG {best_model['cfg_scale']} | {best_model['strategy']}")
            print(f"     Quality: {best_model['quality_score']:.3f} | Time: {best_model['generation_time']:.1f}s")

    # 5. Strategy Rankings
    print(f"\n🎯 STRATEGY EFFECTIVENESS RANKING:")
    strategy_ranking = df_success.groupby('strategy')['quality_score'].mean().sort_values(ascending=False)
    for i, (strategy, score) in enumerate(strategy_ranking.items(), 1):
        print(f"   {i}. {strategy}: {score:.3f}")

    # 6. Sampler Rankings
    print(f"\n🎛️ SAMPLER PERFORMANCE RANKING:")
    sampler_ranking = df_success.groupby('sampler')['efficiency_score'].mean().sort_values(ascending=False)
    for i, (sampler, score) in enumerate(sampler_ranking.items(), 1):
        time_avg = df_success[df_success['sampler'] == sampler]['generation_time'].mean()
        quality_avg = df_success[df_success['sampler'] == sampler]['quality_score'].mean()
        print(f"   {i}. {sampler}: Efficiency {score:.4f} | Quality {quality_avg:.3f} | Time {time_avg:.1f}s")

    # 7. CFG Scale Recommendations
    print(f"\n⚙️ OPTIMAL CFG SCALE BY MODEL:")
    for model in MODEL_CONFIGS.keys():
        model_data = df_success[df_success['model'] == model]
        if len(model_data) > 0:
            cfg_performance = model_data.groupby('cfg_scale')['quality_score'].mean()
            best_cfg = cfg_performance.idxmax()
            best_score = cfg_performance.max()
            print(f"   {model}: CFG {best_cfg} (Quality Score: {best_score:.3f})")

    print("\n" + "="*80)
    print("🎉 ULTIMATE EXPERIMENT ANALYSIS COMPLETE!")
    print("="*80)

else:
    print("⚠️ No successful results to analyze yet.")
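
One caveat on the recommendations above: `idxmax()` on `df_success` picks the single best *generation*, not the best *configuration*, so one lucky image can decide the winner. A minimal sketch of an alternative, assuming the same column names used in this notebook, averages each configuration over its prompts before ranking. The function name `rank_configurations` and the `min_samples` parameter are hypothetical and not part of the original code.

```python
import pandas as pd

def rank_configurations(df, min_samples=2):
    """Rank configurations by mean quality across prompts (hypothetical helper)."""
    config_cols = ['model', 'sampler', 'steps', 'cfg_scale', 'strategy']
    summary = (
        df.groupby(config_cols)['quality_score']
          .agg(mean_quality='mean', std_quality='std', n='count')
          .reset_index()
    )
    # Drop configurations with too few successful generations to average
    summary = summary[summary['n'] >= min_samples]
    return summary.sort_values('mean_quality', ascending=False).reset_index(drop=True)

# Toy data: DDIM has the single best image but a lower average than Euler
toy = pd.DataFrame({
    'model': ['SDXL'] * 4,
    'sampler': ['Euler', 'Euler', 'DDIM', 'DDIM'],
    'steps': [25] * 4,
    'cfg_scale': [7.5] * 4,
    'strategy': ['baseline'] * 4,
    'quality_score': [0.50, 0.52, 0.90, 0.10],
})
ranked = rank_configurations(toy)
print(ranked[['sampler', 'mean_quality', 'std_quality', 'n']])
```

Reporting the standard deviation next to the mean also shows how much a configuration's quality depends on the prompt.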


## 🎯 Ultimate Experiment Summary

### **What This Experiment Achieves:**

This notebook runs a **comprehensive parameter sweep** of Stable Diffusion jewelry generation, testing:

1. **Technical Optimization**: 5 samplers × 6 step counts × 3 CFG scales across 3 models
2. **Prompt Engineering**: 6 different weighting strategies for optimal prompt construction  
3. **Quality Evaluation**: Dual metrics (CLIP semantic + LAION aesthetic) for complete assessment

### **Key Innovations:**

- **🔬 Multi-Dimensional Testing**: Systematically varies model, sampler, step count, CFG scale, and prompt strategy in one full-factorial design
- **🎨 Aesthetic Integration**: LAION scores provide an automated, learned estimate of visual appeal
- **⚡ Efficiency Analysis**: Quality-per-second metrics for production optimization
- **🎯 Strategy Validation**: Empirical testing of prompt engineering techniques
- **🤖 Model Comparison**: Cross-model analysis with Euler sampler integration
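
The quality and efficiency metrics are defined in earlier cells of this notebook. As an illustration only, the sketch below shows one plausible way to combine a CLIP confidence (0-1) and a LAION aesthetic score (0-10) into a single quality value, and to derive a quality-per-second efficiency from it. The equal weighting, the function names, and the `clip_weight` parameter are assumptions and may differ from the definitions actually used.

```python
def combined_quality(clip_confidence, laion_score, clip_weight=0.5):
    """Blend CLIP confidence (0-1) with a LAION score rescaled from 0-10 to 0-1.

    The 50/50 default weighting is an assumption for illustration.
    """
    laion_normalized = laion_score / 10.0
    return clip_weight * clip_confidence + (1.0 - clip_weight) * laion_normalized

def efficiency(quality, generation_time_s):
    """Quality per second; returns 0.0 for non-positive times to avoid division errors."""
    if generation_time_s <= 0:
        return 0.0
    return quality / generation_time_s

q = combined_quality(clip_confidence=0.8, laion_score=6.0)
print(f"quality={q:.3f}, efficiency={efficiency(q, 3.5):.4f}")
```

Rescaling the LAION score first keeps the two terms on the same 0-1 range, so neither metric dominates the blend purely because of its units.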

### **Expected Insights:**

1. **Optimal Configurations** for each quality/speed requirement
2. **Model-Specific Patterns** (which samplers work best with which models)
3. **CFG Scale Effects** across different technical combinations
4. **Strategy Effectiveness** (which prompt modifications actually help)
5. **Parameter Interactions** (how different settings affect each other)

### **Practical Applications:**

- **Production Pipelines**: Optimal settings for batch jewelry generation
- **Interactive Applications**: Fast settings for real-time preview
- **High-Quality Output**: Best configurations for final marketing images
- **Prompt Engineering**: Validated strategies for jewelry-specific prompts

### **Scale & Impact:**

- **12,960 Total Generations** - Full-factorial sweep (3 models × 5 samplers × 6 step counts × 3 CFG scales × 6 strategies × 8 prompts)
- **Multi-GB Dataset** - Comprehensive results for deep statistical analysis  
- **Production-Ready** - Immediately applicable optimization recommendations

---

**🚀 This experiment offers a reusable template for systematic diffusion model optimization in specialized domains.**


In [None]:
# Advanced Statistical Analysis & Clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
import scipy.stats as stats
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.manifold import TSNE

if len(df_success) > 0:
    print("📊 Performing advanced statistical analysis...")

    # Prepare numerical data for clustering
    numerical_features = ['steps', 'cfg_scale', 'generation_time',
                         'clip_top_confidence', 'laion_aesthetic_score',
                         'quality_score', 'efficiency_score']

    # Create encoded categorical features
    df_analysis = df_success.copy()
    df_analysis['model_encoded'] = pd.Categorical(df_analysis['model']).codes
    df_analysis['sampler_encoded'] = pd.Categorical(df_analysis['sampler']).codes
    df_analysis['strategy_encoded'] = pd.Categorical(df_analysis['strategy']).codes

    # Features for clustering
    cluster_features = ['model_encoded', 'sampler_encoded', 'steps', 'cfg_scale',
                       'strategy_encoded', 'generation_time', 'clip_top_confidence',
                       'laion_aesthetic_score', 'quality_score', 'efficiency_score']

    X = df_analysis[cluster_features].fillna(0)

    # Standardize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # 1. CORRELATION MATRIX HEATMAP
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(18, 16))

    # Correlation matrix
    correlation_matrix = df_analysis[numerical_features].corr()
    sns.heatmap(correlation_matrix, annot=True, cmap='RdBu_r', center=0,
                square=True, ax=ax1, cbar_kws={"shrink": .8})
    ax1.set_title('Feature Correlation Matrix', fontweight='bold', fontsize=14)

    # 2. OPTIMAL CLUSTER NUMBER (Elbow Method)
    inertias = []
    silhouette_scores = []
    K_range = range(2, 11)

    for k in K_range:
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeans.fit(X_scaled)
        inertias.append(kmeans.inertia_)
        silhouette_scores.append(silhouette_score(X_scaled, kmeans.labels_))

    # Plot elbow curve
    ax2_twin = ax2.twinx()
    line1 = ax2.plot(K_range, inertias, 'bo-', linewidth=2, markersize=8, label='Inertia')
    line2 = ax2_twin.plot(K_range, silhouette_scores, 'ro-', linewidth=2, markersize=8, label='Silhouette Score')

    ax2.set_xlabel('Number of Clusters (k)')
    ax2.set_ylabel('Inertia', color='blue')
    ax2_twin.set_ylabel('Silhouette Score', color='red')
    ax2.set_title('Optimal Cluster Number Analysis', fontweight='bold', fontsize=14)
    ax2.grid(True, alpha=0.3)

    # Combine legends
    lines1, labels1 = ax2.get_legend_handles_labels()
    lines2, labels2 = ax2_twin.get_legend_handles_labels()
    ax2.legend(lines1 + lines2, labels1 + labels2, loc='center right')

    # 3. K-MEANS CLUSTERING (using optimal k)
    optimal_k = K_range[np.argmax(silhouette_scores)]
    kmeans_optimal = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
    cluster_labels = kmeans_optimal.fit_predict(X_scaled)

    # PCA for visualization
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)

    # Plot clusters in PCA space
    scatter = ax3.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_labels,
                         cmap='viridis', alpha=0.7, s=50)
    ax3.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)')
    ax3.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)')
    ax3.set_title(f'Configuration Clusters (k={optimal_k})\nPCA Visualization', fontweight='bold', fontsize=14)
    plt.colorbar(scatter, ax=ax3, label='Cluster')

    # 4. HIERARCHICAL CLUSTERING DENDROGRAM
    # Sample data for dendrogram (too many points make it unreadable)
    sample_size = min(100, len(X_scaled))
    sample_indices = np.random.default_rng(42).choice(len(X_scaled), sample_size, replace=False)  # seeded for a reproducible dendrogram
    X_sample = X_scaled[sample_indices]

    linkage_matrix = linkage(X_sample, method='ward')
    dendrogram(linkage_matrix, ax=ax4, truncate_mode='level', p=5)
    ax4.set_title('Hierarchical Clustering Dendrogram\n(Sample of configurations)', fontweight='bold', fontsize=14)
    ax4.set_xlabel('Configuration Index')
    ax4.set_ylabel('Distance')

    plt.suptitle('Advanced Statistical Analysis & Clustering', fontsize=18, fontweight='bold')
    plt.tight_layout()
    plt.savefig('/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/advanced_statistical_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()

    # Add cluster information to dataframe
    df_analysis['cluster'] = cluster_labels

    print(f"✅ Clustering analysis complete! Found {optimal_k} optimal clusters")
    print(f"📊 Silhouette Score: {max(silhouette_scores):.3f}")

else:
    print("⚠️ No successful results to analyze yet.")
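
The clustering cell above encodes `model`, `sampler`, and `strategy` as integer category codes. K-means then treats those codes as distances, so, for example, code 0 and code 4 look "further apart" than codes 0 and 1 even though the categories have no natural order. One common alternative is one-hot encoding. The sketch below is a pandas-only illustration under that assumption; `one_hot_cluster_matrix` is a hypothetical helper, and its output could be passed to `KMeans` in place of `X_scaled`.

```python
import pandas as pd

def one_hot_cluster_matrix(df, categorical_cols, numeric_cols):
    """Build a standardized feature matrix with one-hot categorical columns."""
    # One indicator column per category value instead of a single ordinal code
    dummies = pd.get_dummies(df[categorical_cols], columns=categorical_cols, dtype=float)
    numeric = df[numeric_cols].astype(float)
    X = pd.concat([dummies, numeric], axis=1)
    # Z-score each column; constant columns keep a divisor of 1 to avoid NaNs
    std = X.std(ddof=0).replace(0, 1.0)
    X_scaled = (X - X.mean()) / std
    return X_scaled.to_numpy(), list(X.columns)

toy = pd.DataFrame({
    'model': ['SDXL', 'SD 1.5', 'SDXL'],
    'sampler': ['DDIM', 'Euler', 'DDPM'],
    'steps': [15, 25, 40],
})
X_toy, feature_names = one_hot_cluster_matrix(toy, ['model', 'sampler'], ['steps'])
print(feature_names)
print(X_toy.shape)
```

One-hot encoding increases the number of columns, so the relative influence of categorical versus numeric features on the clusters changes; that trade-off is worth checking against the silhouette scores.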


In [None]:
# Cluster Analysis & Interpretation
if len(df_success) > 0:
    print("🔍 Analyzing cluster characteristics...")

    # Analyze cluster characteristics
    cluster_analysis = df_analysis.groupby('cluster').agg({
        'model': lambda x: x.mode().iloc[0],
        'sampler': lambda x: x.mode().iloc[0],
        'strategy': lambda x: x.mode().iloc[0],
        'steps': 'mean',
        'cfg_scale': 'mean',
        'generation_time': 'mean',
        'clip_top_confidence': 'mean',
        'laion_aesthetic_score': 'mean',
        'quality_score': 'mean',
        'efficiency_score': 'mean'
    }).round(3)

    print("\n" + "="*80)
    print("🎯 CLUSTER ANALYSIS RESULTS")
    print("="*80)

    for cluster_id in range(optimal_k):
        cluster_data = df_analysis[df_analysis['cluster'] == cluster_id]
        cluster_info = cluster_analysis.loc[cluster_id]

        print(f"\n📊 CLUSTER {cluster_id} ({len(cluster_data)} configurations):")
        print(f"   Dominant Model: {cluster_info['model']}")
        print(f"   Dominant Sampler: {cluster_info['sampler']}")
        print(f"   Dominant Strategy: {cluster_info['strategy']}")
        print(f"   Avg Steps: {cluster_info['steps']:.1f}")
        print(f"   Avg CFG Scale: {cluster_info['cfg_scale']:.1f}")
        print(f"   Avg Generation Time: {cluster_info['generation_time']:.1f}s")
        print(f"   Avg Quality Score: {cluster_info['quality_score']:.3f}")
        print(f"   Avg Efficiency: {cluster_info['efficiency_score']:.4f}")

        # Identify cluster purpose
        if cluster_info['efficiency_score'] == cluster_analysis['efficiency_score'].max():
            print("   🏆 → EFFICIENCY OPTIMIZED CLUSTER")
        elif cluster_info['quality_score'] == cluster_analysis['quality_score'].max():
            print("   🎨 → QUALITY OPTIMIZED CLUSTER")
        elif cluster_info['generation_time'] == cluster_analysis['generation_time'].min():
            print("   ⚡ → FASTEST CLUSTER")
        else:
            print("   ⚖️ → BALANCED CLUSTER")

    # Find best representative from each cluster
    print(f"\n🏆 BEST CONFIGURATION FROM EACH CLUSTER:")
    for cluster_id in range(optimal_k):
        cluster_data = df_analysis[df_analysis['cluster'] == cluster_id]
        best_in_cluster = cluster_data.loc[cluster_data['quality_score'].idxmax()]

        print(f"\n   Cluster {cluster_id} Best:")
        print(f"     {best_in_cluster['model']} + {best_in_cluster['sampler']} + {best_in_cluster['steps']} steps + CFG {best_in_cluster['cfg_scale']} + {best_in_cluster['strategy']}")
        print(f"     Quality: {best_in_cluster['quality_score']:.3f} | Time: {best_in_cluster['generation_time']:.1f}s")

    print("\n" + "="*80)
    print("✅ CLUSTER ANALYSIS COMPLETE")
    print("="*80)


In [None]:
# Seed Stability Analysis
print("🌱 Performing seed stability analysis...")

def test_seed_stability(pipe, prompt, model_choice, steps, cfg_scale, num_seeds=3):
    """Test how consistent results are across different seeds"""
    results = []
    base_seed = 42

    for i in range(num_seeds):
        seed = base_seed + i
        image, gen_time, error = generate_with_config(
            pipe, prompt, model_choice, steps, cfg_scale, seed=seed
        )
        if image is not None:
            clip_results = analyze_image_with_clip(image)
            aesthetic_score = get_aesthetic_score(image)
            results.append({
                'seed': seed,
                'clip_confidence': clip_results[0][1],
                'aesthetic_score': aesthetic_score,
                'generation_time': gen_time
            })

    if len(results) >= 2:
        # Calculate stability metrics
        clip_std = np.std([r['clip_confidence'] for r in results])
        aesthetic_std = np.std([r['aesthetic_score'] for r in results])
        time_std = np.std([r['generation_time'] for r in results])

        return {
            'clip_stability': 1 - (clip_std / np.mean([r['clip_confidence'] for r in results])),
            'aesthetic_stability': 1 - (aesthetic_std / np.mean([r['aesthetic_score'] for r in results])),
            'time_stability': 1 - (time_std / np.mean([r['generation_time'] for r in results])),
            'overall_stability': 1 - ((clip_std + aesthetic_std/10 + time_std/100) / 3)
        }
    return None

# Test seed stability for representative configurations
print("Testing seed stability across different samplers...")
stability_results = []

# Test one configuration per sampler
test_prompt = test_prompts[0]  # Use first prompt
test_steps = 25
test_cfg = 7.5

current_pipe = None
current_config = None

for model_choice in ['SDXL']:  # Test with one model for speed
    for sampler_choice in SAMPLER_CONFIGS.keys():
        config_key = f"{model_choice}_{sampler_choice}"
        if current_config != config_key:
            if current_pipe is not None:
                del current_pipe
                torch.cuda.empty_cache()
            current_pipe = load_model_with_sampler(model_choice, sampler_choice)
            current_config = config_key

        print(f"  Testing {sampler_choice} stability...")
        stability = test_seed_stability(current_pipe, test_prompt, model_choice, test_steps, test_cfg)

        if stability:
            stability_results.append({
                'sampler': sampler_choice,
                'model': model_choice,
                **stability
            })

# Cleanup
if current_pipe is not None:
    del current_pipe
    torch.cuda.empty_cache()

# Visualize stability results
if stability_results:
    stability_df = pd.DataFrame(stability_results)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

    # Stability comparison by sampler
    stability_metrics = ['clip_stability', 'aesthetic_stability', 'time_stability', 'overall_stability']
    x = np.arange(len(stability_df))
    width = 0.2

    for i, metric in enumerate(stability_metrics):
        offset = (i - 1.5) * width
        ax1.bar(x + offset, stability_df[metric], width,
               label=metric.replace('_', ' ').title(), alpha=0.8)

    ax1.set_xlabel('Sampler')
    ax1.set_ylabel('Stability Score (higher = more stable)')
    ax1.set_title('Seed Stability by Sampler', fontweight='bold')
    ax1.set_xticks(x)
    ax1.set_xticklabels(stability_df['sampler'], rotation=45)
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim(0, 1)

    # Overall stability ranking
    overall_ranking = stability_df.sort_values('overall_stability', ascending=True)
    bars = ax2.barh(overall_ranking['sampler'], overall_ranking['overall_stability'])
    ax2.set_xlabel('Overall Stability Score')
    ax2.set_title('Sampler Stability Ranking', fontweight='bold')
    ax2.grid(True, alpha=0.3)

    # Color bars based on stability level
    for i, bar in enumerate(bars):
        stability_score = overall_ranking.iloc[i]['overall_stability']
        if stability_score > 0.9:
            bar.set_color('green')
        elif stability_score > 0.7:
            bar.set_color('orange')
        else:
            bar.set_color('red')

    plt.suptitle('Seed Stability Analysis Across Samplers', fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.savefig('/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/seed_stability_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()

    print("\n📊 SEED STABILITY RESULTS:")
    for _, row in stability_df.iterrows():
        stability_level = "HIGH" if row['overall_stability'] > 0.9 else "MEDIUM" if row['overall_stability'] > 0.7 else "LOW"
        print(f"   {row['sampler']}: {row['overall_stability']:.3f} ({stability_level})")

    print("✅ Seed stability analysis complete!")
else:
    print("⚠️ No stability results generated.")
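
The per-metric stability values above follow the pattern `1 - std / mean`, that is, one minus the coefficient of variation. With only three seeds and a possible near-zero mean, that expression can go negative or divide by zero. The sketch below is a hedged variant, not the notebook's own implementation: `stability_score` is a hypothetical helper that uses the sample standard deviation (`ddof=1`), guards against a zero mean, and clips the result to the 0-1 range used by the bar chart.

```python
import numpy as np

def stability_score(values, eps=1e-8):
    """Return 1 - coefficient of variation, clipped to [0, 1].

    Returns None when fewer than two values are given or the mean is ~0,
    since the coefficient of variation is undefined in those cases.
    """
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        return None
    mean = values.mean()
    if abs(mean) < eps:
        return None
    cv = values.std(ddof=1) / abs(mean)  # sample std suits small seed counts
    return float(np.clip(1.0 - cv, 0.0, 1.0))

print(stability_score([0.62, 0.60, 0.64]))  # similar scores across seeds
print(stability_score([0.10, 0.90]))        # highly seed-dependent scores
```

With so few seeds the estimate is noisy either way, so raising `num_seeds` is likely to matter more than the exact formula.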


In [None]:
# Advanced Visualization Suite
if len(df_success) > 0:
    print("🎨 Creating advanced visualization suite...")

    # Create comprehensive multi-panel analysis
    fig = plt.figure(figsize=(24, 20))
    gs = fig.add_gridspec(4, 4, hspace=0.3, wspace=0.3)

    # 1. 3D Scatter Plot (Steps vs CFG vs Quality, colored by sampler)
    ax1 = fig.add_subplot(gs[0, 0:2], projection='3d')
    samplers = df_success['sampler'].unique()
    colors = plt.cm.Set1(np.linspace(0, 1, len(samplers)))

    for i, sampler in enumerate(samplers):
        sampler_data = df_success[df_success['sampler'] == sampler]
        ax1.scatter(sampler_data['steps'], sampler_data['cfg_scale'],
                   sampler_data['quality_score'], c=[colors[i]],
                   label=sampler, alpha=0.7, s=50)

    ax1.set_xlabel('Steps')
    ax1.set_ylabel('CFG Scale')
    ax1.set_zlabel('Quality Score')
    ax1.set_title('3D Parameter Space\n(Steps × CFG × Quality)', fontweight='bold')
    ax1.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

    # 2. Box Plot - Quality Distribution by Model
    ax2 = fig.add_subplot(gs[0, 2])
    df_success.boxplot(column='quality_score', by='model', ax=ax2)
    ax2.set_title('Quality Distribution by Model', fontweight='bold')
    ax2.set_xlabel('Model')
    ax2.set_ylabel('Quality Score')

    # 3. Radar Chart - Average Performance by Sampler
    ax3 = fig.add_subplot(gs[0, 3], projection='polar')

    # Prepare radar chart data
    radar_metrics = ['quality_score', 'efficiency_score', 'clip_top_confidence', 'laion_aesthetic_score']
    radar_data = df_success.groupby('sampler')[radar_metrics].mean()

    # Normalize data to 0-1 scale for radar chart
    radar_data_norm = (radar_data - radar_data.min()) / (radar_data.max() - radar_data.min())

    angles = np.linspace(0, 2*np.pi, len(radar_metrics), endpoint=False).tolist()
    angles += angles[:1]  # Complete the circle

    for i, sampler in enumerate(radar_data_norm.index):
        values = radar_data_norm.loc[sampler].tolist()
        values += values[:1]  # Complete the circle
        ax3.plot(angles, values, 'o-', linewidth=2, label=sampler, alpha=0.7)
        ax3.fill(angles, values, alpha=0.1)

    ax3.set_xticks(angles[:-1])
    ax3.set_xticklabels([m.replace('_', '\n') for m in radar_metrics])
    ax3.set_title('Sampler Performance\nRadar Chart', fontweight='bold', pad=20)
    ax3.legend(bbox_to_anchor=(1.3, 1.1))

    # 4. Heatmap - Strategy vs Model Performance
    ax4 = fig.add_subplot(gs[1, :2])
    strategy_model_matrix = df_success.pivot_table(values='quality_score',
                                                  index='strategy', columns='model', aggfunc='mean')
    sns.heatmap(strategy_model_matrix, annot=True, cmap='RdYlGn', center=0.5,
                square=True, ax=ax4, cbar_kws={"shrink": .8})
    ax4.set_title('Strategy Effectiveness by Model', fontweight='bold')

    # 5. Parallel Coordinates Plot
    ax5 = fig.add_subplot(gs[1, 2:])
    from pandas.plotting import parallel_coordinates

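    # Sample data for readability; min-max normalize so every axis shares a comparable scale
    sample_data = df_success.sample(min(200, len(df_success)), random_state=42)
    numeric_cols = ['steps', 'cfg_scale', 'quality_score', 'generation_time', 'efficiency_score']
    plot_data = sample_data[['model'] + numeric_cols].copy()
    col_range = (plot_data[numeric_cols].max() - plot_data[numeric_cols].min()).replace(0, 1)
    plot_data[numeric_cols] = (plot_data[numeric_cols] - plot_data[numeric_cols].min()) / col_range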

    parallel_coordinates(plot_data, 'model', ax=ax5, alpha=0.7)
    ax5.set_title('Parallel Coordinates Plot\n(Configuration Patterns)', fontweight='bold')
    ax5.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

    # 6. Quality vs Time Efficiency Frontier
    ax6 = fig.add_subplot(gs[2, :2])

    # Create efficiency frontier
    for model in df_success['model'].unique():
        model_data = df_success[df_success['model'] == model]
        ax6.scatter(model_data['generation_time'], model_data['quality_score'],
                   alpha=0.6, label=f'{model}', s=50)

    ax6.set_xlabel('Generation Time (seconds)')
    ax6.set_ylabel('Quality Score')
    ax6.set_title('Quality vs Speed Trade-off\n(Efficiency Frontier)', fontweight='bold')
    ax6.legend()
    ax6.grid(True, alpha=0.3)

    # 7. CFG Scale Heat Surface
    ax7 = fig.add_subplot(gs[2, 2:])

    # Create CFG vs Steps heatmap
    cfg_steps_matrix = df_success.pivot_table(values='quality_score',
                                             index='cfg_scale', columns='steps', aggfunc='mean')
    sns.heatmap(cfg_steps_matrix, annot=True, cmap='viridis', ax=ax7, cbar_kws={"shrink": .8})
    ax7.set_title('Quality Heatmap\n(CFG Scale × Steps)', fontweight='bold')

    # 8. Distribution Analysis
    ax8 = fig.add_subplot(gs[3, :2])

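    # Overlaid histograms; each metric is min-max normalized so they share one x-axis
    metrics = ['quality_score', 'generation_time', 'efficiency_score']
    for metric in metrics:
        values = df_success[metric]
        value_range = values.max() - values.min()
        normalized = (values - values.min()) / value_range if value_range > 0 else values * 0
        ax8.hist(normalized, alpha=0.7, bins=30, label=metric, density=True)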

    ax8.set_xlabel('Value (normalized)')
    ax8.set_ylabel('Density')
    ax8.set_title('Distribution Analysis of Key Metrics', fontweight='bold')
    ax8.legend()
    ax8.grid(True, alpha=0.3)

    # 9. Time Series - Quality Evolution
    ax9 = fig.add_subplot(gs[3, 2:])

    # Sort by timestamp and show quality evolution
    if 'timestamp' in df_success.columns:
        time_sorted = df_success.sort_values('timestamp')
        rolling_quality = time_sorted['quality_score'].rolling(window=50, min_periods=1).mean()

        ax9.plot(range(len(rolling_quality)), rolling_quality, linewidth=2, color='blue')
        ax9.set_xlabel('Generation Sequence')
        ax9.set_ylabel('Rolling Average Quality')
        ax9.set_title('Quality Evolution Over Time\n(50-sample rolling average)', fontweight='bold')
        ax9.grid(True, alpha=0.3)
    else:
        ax9.text(0.5, 0.5, 'Timestamp data not available', ha='center', va='center',
                transform=ax9.transAxes, fontsize=12)
        ax9.set_title('Quality Evolution Over Time', fontweight='bold')

    plt.suptitle('Ultimate Advanced Visualization Suite', fontsize=20, fontweight='bold')
    plt.savefig('/content/drive/MyDrive/arcade_comp_results/combined_experiment_results/advanced_visualization_suite.png', dpi=150, bbox_inches='tight')
    plt.show()

    print("✅ Advanced visualization suite created!")

else:
    print("⚠️ No successful results to visualize yet.")
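
The "Efficiency Frontier" panel above plots every generation as a scatter point but does not compute the frontier itself. A minimal sketch of how the Pareto-optimal points could be extracted is shown below, where a point is kept only if no other point is both faster and higher quality. `pareto_frontier` is a hypothetical helper working on plain `(generation_time, quality_score)` tuples.

```python
def pareto_frontier(points):
    """Return the non-dominated (time, quality) points, sorted by time.

    A point is dominated if another point is at least as fast and has
    strictly higher quality.
    """
    frontier = []
    best_quality = float('-inf')
    # Sort by time ascending; for equal times, consider the higher quality first
    for time_s, quality in sorted(points, key=lambda p: (p[0], -p[1])):
        if quality > best_quality:
            frontier.append((time_s, quality))
            best_quality = quality
    return frontier

points = [(2.0, 0.5), (3.0, 0.4), (4.0, 0.7), (1.0, 0.3), (4.0, 0.6)]
print(pareto_frontier(points))
```

In the notebook, the input could be built with `list(zip(df_success['generation_time'], df_success['quality_score']))`, and the resulting points drawn on `ax6` as a step line over the scatter.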
