# 💎 Jewelry Generation: SDXL vs SD1.5 Model Comparison

## Comprehensive Experiment Design

This notebook tests **three prompting methods** (a baseline plus two enhancements) across **two models** for jewelry image generation:

### 🔬 **Enhancement Methods:**
1. **Baseline** - No enhancements
2. **Compel Weighting** - Using `++` syntax for term emphasis  
3. **Word Replacement** - Jewelry terms expanded with a short gloss (e.g. `channel-set` → `channel-set (parallel groove gemstones)`)

### 🤖 **Models Tested:**
- **SDXL** (`stabilityai/stable-diffusion-xl-base-1.0`)
- **SD1.5** (`runwayml/stable-diffusion-v1-5`)

### 📊 **Evaluation:**
- **Visual comparison** - Side-by-side image analysis
- **CLIP similarity** - Quantitative prompt adherence scoring
- **Aesthetic score** - Quantitative visual-quality scoring
- **CSV export** - All prompt variations for analysis

### 🎯 **Research Questions:**
1. Which enhancement method works best?
2. Which model responds better to enhancements?
3. Can we improve specific issues like engraved letter visibility?
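
The design above amounts to a full grid over models, prompts, and methods. A minimal sketch of enumerating that grid, assuming the short labels `SDXL`/`SD15` and the method names used later in this notebook (the `build_grid` helper is hypothetical and only illustrates the bookkeeping):

```python
from itertools import product

MODELS = ["SDXL", "SD15"]
METHODS = ["baseline", "compel", "word_replacement"]

def build_grid(n_prompts, models=MODELS, methods=METHODS):
    """Return one (model, prompt_id, method) tuple per planned generation."""
    prompt_ids = range(1, n_prompts + 1)
    return list(product(models, prompt_ids, methods))

grid = build_grid(n_prompts=8)
print(len(grid))  # 2 models x 8 prompts x 3 methods
print(grid[0])    # first planned generation
```

Listing the grid up front makes it easy to check afterwards that every combination actually produced an image.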

---


## 🔧 Setup & Dependencies


In [None]:
!pip install compel
!pip install open-clip-torch

In [None]:
# Install dependencies (uncomment for Colab)
# %pip install torch torchvision diffusers transformers accelerate compel pillow matplotlib open-clip-torch pandas

import torch
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import os
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Device setup
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🖥️  Device: {device}")
if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Create output directories
os.makedirs("model_comparison_results", exist_ok=True)
os.makedirs("model_comparison_results/sdxl", exist_ok=True)
os.makedirs("model_comparison_results/sd15", exist_ok=True)
print("✅ Setup complete!")


## 📝 Test Prompts & Enhancement Functions


In [None]:
# Define the 8 test prompts from the assignment
test_prompts = [
    "channel-set diamond eternity band, 2 mm width, hammered 18k yellow gold, product-only white background",
    "14k rose-gold threader earrings, bezel-set round lab diamond ends, lifestyle macro shot, soft natural light",
    "organic cluster ring with mixed-cut sapphires and diamonds, brushed platinum finish, modern aesthetic",
    "A solid gold cuff bracelet with blue sapphire, with refined simplicity and intentionally crafted for everyday wear",
    "modern signet ring, oval face, engraved gothic initial 'M', high-polish sterling silver, subtle reflection",
    "delicate gold huggie hoops, contemporary styling, isolated on neutral background",
    "stack of three slim rings: twisted gold, plain platinum, black rhodium pavé, editorial lighting",
    "bypass ring with stones on it, with refined simplicity and intentionally crafted for everyday wear"
]

import re

def create_compel_enhanced_prompt(prompt):
    """Add ++ weighting to critical jewelry terms for Compel"""
    critical_terms = [
        "channel-set", "threader", "bezel-set", "eternity band", "huggie",
        "bypass", "pavé", "signet", "cuff", "cluster", "diamond", "sapphire",
        "gold", "platinum", "engraved", "initial", "'M'",
    ]

    def add_weight(match):
        text = match.group(0)
        # ++ applies to the preceding word only, so multi-word phrases need parentheses
        return f"({text})++" if " " in text else f"{text}++"

    enhanced = prompt
    for term in critical_terms:
        # Match whole terms (plus an optional plural "s") so that "diamonds"
        # becomes "diamonds++" rather than "diamond++s"
        pattern = rf"(?<!\w){re.escape(term)}s?(?![\w+])"
        enhanced = re.sub(pattern, add_weight, enhanced, flags=re.IGNORECASE)
    return enhanced

def create_word_replacement_enhanced_prompt(prompt):
    """Expand jewelry terms with a short parenthetical description"""
    jewelry_terms = {
        "channel-set": "channel-set (parallel groove gemstones)",
        "threader": "threader (thread-through earring)",
        "bezel-set": "bezel-set (rim-enclosed gemstone)",
        "eternity band": "eternity band (full-band gemstones)",
        "huggie": "huggie (small close hoop)",
        "bypass": "bypass (overlapping band ring)",
        "pavé": "pavé (small set stones)",
        "signet": "signet (flat engraved ring)",
        "cuff": "cuff (open bracelet)",
        "cluster": "cluster (grouped gemstones)"
    }
    
    enhanced = prompt
    for term, description in jewelry_terms.items():
        if term in prompt.lower():
            enhanced = enhanced.replace(term, description)
    
    # Append general quality descriptors
    enhanced += ", high-end jewelry, luxury craftsmanship, premium materials"
    return enhanced

# Create all prompt variations
compel_prompts = [create_compel_enhanced_prompt(p) for p in test_prompts]
word_replacement_prompts = [create_word_replacement_enhanced_prompt(p) for p in test_prompts]

# Common negative prompt
negative_prompt = "vintage, ornate, fussy, cheap, low quality, blurry, deformed, ugly"

print(f"✅ {len(test_prompts)} test prompts prepared with 3 enhancement variations each")
print(f"📊 Total prompt combinations: {len(test_prompts)} × 3 methods × 2 models = {len(test_prompts) * 3 * 2} generations")

# Display first prompt as example
print(f"\n📝 Example (Prompt 1):")
print(f"Original:        {test_prompts[0]}")
print(f"Compel:          {compel_prompts[0]}")
print(f"Word Enhanced:   {word_replacement_prompts[0][:100]}...")
print(f"Negative:        {negative_prompt}")
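
In Compel's syntax each `+` multiplies a term's weight by 1.1, so `++` corresponds to roughly 1.21. The sketch below works out the weights implied by a weighted prompt string. It is an illustration only: `effective_weight` and `weighted_terms` are hypothetical helpers that handle plain `+` runs on single words or parenthesised phrases, not the full Compel grammar.

```python
import re

PLUS_FACTOR = 1.1  # multiplier applied once per '+'

def effective_weight(plus_run):
    """Weight implied by a run of '+' characters, e.g. '++' -> 1.1 ** 2."""
    if not plus_run or set(plus_run) != {"+"}:
        raise ValueError(f"expected a run of '+', got {plus_run!r}")
    return PLUS_FACTOR ** len(plus_run)

def weighted_terms(prompt):
    """List (term, weight) pairs for words or (grouped phrases) followed by '+'."""
    pattern = re.compile(r"(\([^()]+\)|[^\s,()+]+)(\++)")
    return [
        (term.strip("()"), round(effective_weight(plus), 3))
        for term, plus in pattern.findall(prompt)
    ]

example = "channel-set++ diamond++ (eternity band)++, 2 mm width"
for term, weight in weighted_terms(example):
    print(f"{term}: x{weight}")
```

Running a check like this over `compel_prompts` is a quick way to see which terms end up emphasised and whether any prompt has so many weighted terms that the emphasis becomes uniform.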


## 🤖 Model Loading & Generation Functions


In [None]:
from compel import Compel, ReturnedEmbeddingsType
from diffusers import StableDiffusionXLPipeline, StableDiffusionPipeline

def load_model(model_name):
    """Load either SDXL or SD1.5 with corresponding Compel instance"""
    if model_name == "SDXL":
        model_id = "stabilityai/stable-diffusion-xl-base-1.0"
        pipe = StableDiffusionXLPipeline.from_pretrained(
            model_id, variant="fp16", use_safetensors=True, torch_dtype=torch.float16
        ).to(device)
        compel_inst = Compel(
            tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
            text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
            returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
            requires_pooled=[False, True],
        )
        return pipe, compel_inst, True  # True = is_sdxl
    else:  # SD1.5
        model_id = "runwayml/stable-diffusion-v1-5"
        pipe = StableDiffusionPipeline.from_pretrained(
            model_id, torch_dtype=torch.float16 if device=="cuda" else torch.float32
        ).to(device)
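        compel_inst = Compel(
            tokenizer=pipe.tokenizer,
            text_encoder=pipe.text_encoder,
            # SD1.5 conditions on the final, normalized hidden states, so use the same
            # layer here to keep the Compel path comparable with the baseline pipeline
            returned_embeddings_type=ReturnedEmbeddingsType.LAST_HIDDEN_STATES_NORMALIZED,
            requires_pooled=False,
        )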
        return pipe, compel_inst, False  # False = not_sdxl

def generate_image(pipe, compel_inst, is_sdxl, prompt, method="baseline", seed=42):
    """Generate image with specified method: baseline, compel, or word_replacement"""
    if method not in ("baseline", "compel", "word_replacement"):
        raise ValueError(f"Unknown method: {method!r}")

    # Per-model settings. Note: SD1.5 was trained at 512x512, so 768x768 output
    # can show duplicated or distorted subjects.
    steps = 30
    cfg = 5.0 if is_sdxl else 7.5
    w, h = (1024, 1024) if is_sdxl else (768, 768)

    generator = torch.Generator(device=device).manual_seed(seed)

    # Arguments shared by every pipeline call
    common = dict(
        num_inference_steps=steps,
        guidance_scale=cfg,
        width=w, height=h,
        generator=generator,
    )

    if method == "compel":
        # Compel-enhanced generation: pass pre-computed embeddings instead of text
        if is_sdxl:
            # SDXL: dual encoders, so pooled embeddings are required as well
            cond, pooled = compel_inst([prompt, negative_prompt])
            image = pipe(
                prompt_embeds=cond[0:1],
                pooled_prompt_embeds=pooled[0:1],
                negative_prompt_embeds=cond[1:2],
                negative_pooled_prompt_embeds=pooled[1:2],
                **common,
            ).images[0]
        else:
            # SD1.5: single encoder
            pos_cond = compel_inst.build_conditioning_tensor(prompt)
            neg_cond = compel_inst.build_conditioning_tensor(negative_prompt)
            image = pipe(
                prompt_embeds=pos_cond,
                negative_prompt_embeds=neg_cond,
                **common,
            ).images[0]
    else:
        # baseline and word_replacement differ only in the prompt text they receive
        image = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            **common,
        ).images[0]

    return image

print("✅ Model loading and generation functions ready!")
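
The per-model settings chosen inside `generate_image` can also be kept in one lookup table, which makes them easier to log alongside the results. A minimal sketch, assuming the same values as above; `GenerationSettings` and `settings_for` are hypothetical names, not part of diffusers or Compel:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenerationSettings:
    steps: int
    guidance_scale: float
    width: int
    height: int

# Values mirror the ones hard-coded in generate_image
SETTINGS = {
    "SDXL": GenerationSettings(steps=30, guidance_scale=5.0, width=1024, height=1024),
    "SD15": GenerationSettings(steps=30, guidance_scale=7.5, width=768, height=768),
}

def settings_for(model_name):
    """Look up the settings for a model label, failing loudly on typos."""
    try:
        return SETTINGS[model_name]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name!r}") from None

print(settings_for("SDXL"))
print(settings_for("SD15"))
```

Because the two models run at different guidance scales and resolutions, any SDXL vs SD1.5 difference in the scores reflects those settings as well as the models themselves.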


## 🧪 Run Full Experiment

This will generate images for all combinations:
- **2 models** × **8 prompts** × **3 methods** = **48 total images**
- Estimated time: 20-40 minutes depending on GPU
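
As a rough check on that range: 48 images in 20-40 minutes works out to about 25-50 seconds per image. The sketch below does the arithmetic. The per-image timings are assumptions derived from the stated range rather than measurements, and model loading time is not included.

```python
def estimate_minutes(n_models, n_prompts, n_methods, seconds_per_image):
    """Return (image count, total minutes) for a full run of the grid."""
    n_images = n_models * n_prompts * n_methods
    return n_images, n_images * seconds_per_image / 60

for secs in (25, 50):  # assumed per-image timings, not measured
    n_images, minutes = estimate_minutes(2, 8, 3, secs)
    print(f"{n_images} images at {secs}s each: {minutes:.0f} min")
```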


In [None]:
# Run comprehensive experiment
models_to_test = ["SDXL", "SD15"]
methods = ["baseline", "compel", "word_replacement"]

# Store all results for analysis
all_results = []

print("🚀 Starting comprehensive jewelry generation experiment...")
n_images = len(models_to_test) * len(test_prompts) * len(methods)
print(f"⏱️  Generating {n_images} images (roughly 20-40 minutes depending on GPU)")

for model_name in models_to_test:
    print(f"\n🤖 Loading {model_name}...")
    pipe, compel_inst, is_sdxl = load_model(model_name)
    
    for prompt_idx, base_prompt in enumerate(test_prompts, 1):
        print(f"\n📝 Prompt {prompt_idx}/{len(test_prompts)}: {base_prompt[:50]}...")
        
        # Get prompt variations
        prompts = {
            "baseline": base_prompt,
            "compel": compel_prompts[prompt_idx-1],
            "word_replacement": word_replacement_prompts[prompt_idx-1]
        }
        
        for method in methods:
            try:
                print(f"  🎨 Generating {method}...")
                
                # Generate image
                image = generate_image(
                    pipe, compel_inst, is_sdxl, 
                    prompts[method], method, 
                    seed=100 + prompt_idx
                )
                
                # Save image
                filename = f"{model_name.lower()}/p{prompt_idx:02d}_{method}.png"
                filepath = f"model_comparison_results/{filename}"
                image.save(filepath)
                
                # Store result for analysis
                result = {
                    'model': model_name,
                    'prompt_id': prompt_idx,
                    'method': method,
                    'original_prompt': base_prompt,
                    'used_prompt': prompts[method],
                    'image_path': filepath,
                    'image': image
                }
                all_results.append(result)
                
                print(f"    ✅ Saved: {filename}")
                
            except Exception as e:
                print(f"    ❌ Error in {method}: {e}")
    
    # Clear GPU memory
    del pipe, compel_inst
    torch.cuda.empty_cache()
    print(f"✅ {model_name} completed!")

print(f"\n🎉 Experiment completed! Generated {len(all_results)} images")
print(f"📁 Results saved in: model_comparison_results/")
print(f"📊 Ready for analysis and comparison!")
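
Since the run takes a while and any failed generation is only logged, it can help to work out which outputs are still missing before re-running. A minimal sketch, assuming the `model_comparison_results/<model>/pNN_<method>.png` naming used above; `output_path` and `pending_jobs` are hypothetical helpers:

```python
import tempfile
from pathlib import Path

def output_path(root, model_name, prompt_idx, method):
    """Mirror the notebook's naming scheme: <root>/<model>/pNN_<method>.png."""
    return Path(root) / model_name.lower() / f"p{prompt_idx:02d}_{method}.png"

def pending_jobs(root, models, n_prompts, methods):
    """Return (model, prompt_idx, method) tuples whose output file is missing."""
    return [
        (model, idx, method)
        for model in models
        for idx in range(1, n_prompts + 1)
        for method in methods
        if not output_path(root, model, idx, method).exists()
    ]

# Demo in a throwaway directory: pretend one image was already generated
with tempfile.TemporaryDirectory() as root:
    done = output_path(root, "SDXL", 1, "baseline")
    done.parent.mkdir(parents=True)
    done.touch()
    todo = pending_jobs(root, ["SDXL", "SD15"], 8,
                        ["baseline", "compel", "word_replacement"])
    print(f"{len(todo)} generations still pending")
```

Pointing `pending_jobs` at `model_comparison_results` after a run shows at a glance whether any of the 48 combinations failed.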


## 📊 Export Results to CSV


In [None]:
# Create comprehensive CSV with all prompt variations
def export_results_to_csv():
    """Export all prompt variations and results to CSV for analysis"""
    
    # Create data for CSV
    csv_data = []
    
    for prompt_idx, base_prompt in enumerate(test_prompts, 1):
        compel_prompt = compel_prompts[prompt_idx-1]
        word_prompt = word_replacement_prompts[prompt_idx-1]
        
        # Find generated results for this prompt
        prompt_results = [r for r in all_results if r['prompt_id'] == prompt_idx]
        
        # Group by model and method
        sdxl_results = {r['method']: r['image_path'] for r in prompt_results if r['model'] == 'SDXL'}
        sd15_results = {r['method']: r['image_path'] for r in prompt_results if r['model'] == 'SD15'}
        
        row = {
            'prompt_id': prompt_idx,
            'original_prompt': base_prompt,
            'compel_enhanced_prompt': compel_prompt,
            'word_replacement_enhanced_prompt': word_prompt,
            'negative_prompt': negative_prompt,
            
            # SDXL file paths
            'sdxl_baseline_path': sdxl_results.get('baseline', ''),
            'sdxl_compel_path': sdxl_results.get('compel', ''),
            'sdxl_word_replacement_path': sdxl_results.get('word_replacement', ''),
            
            # SD15 file paths  
            'sd15_baseline_path': sd15_results.get('baseline', ''),
            'sd15_compel_path': sd15_results.get('compel', ''),
            'sd15_word_replacement_path': sd15_results.get('word_replacement', ''),
            
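            # Enhancement analysis: terms in this prompt that each method targets
            'compel_changes': ', '.join([k for k in ['channel-set', 'threader', 'bezel-set', 'eternity band', 'huggie', 'bypass', 'pavé', 'signet', 'cuff', 'cluster', 'diamond', 'sapphire', 'gold', 'platinum', 'engraved', 'initial', "'M'"] if k.lower() in base_prompt.lower()]),
            'word_replacements': ', '.join([k for k in ['channel-set', 'threader', 'bezel-set', 'eternity band', 'huggie', 'bypass', 'pavé', 'signet', 'cuff', 'cluster'] if k in base_prompt.lower()])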
        }
        csv_data.append(row)
    
    # Create DataFrame and save
    df = pd.DataFrame(csv_data)
    csv_path = "model_comparison_results/comprehensive_results.csv"
    df.to_csv(csv_path, index=False)
    
    # Display summary
    print("📊 CSV Export Summary:")
    print("=" * 60)
    print(f"💾 Saved to: {csv_path}")
    print(f"📋 Total prompts: {len(df)}")
    print(f"🏛️ Columns: {len(df.columns)}")
    
    print(f"\n📝 Column breakdown:")
    prompt_cols = [c for c in df.columns if 'prompt' in c]
    path_cols = [c for c in df.columns if 'path' in c]
    analysis_cols = [c for c in df.columns if c in ['compel_changes', 'word_replacements']]
    
    print(f"  Prompts: {len(prompt_cols)} ({', '.join(prompt_cols)})")
    print(f"  Paths: {len(path_cols)} (sdxl/sd15 × baseline/compel/word_replacement)")
    print(f"  Analysis: {len(analysis_cols)} (enhancement tracking)")
    
    # Show sample data
    print(f"\n📋 Sample data (first 2 rows):")
    display_cols = ['prompt_id', 'original_prompt', 'compel_enhanced_prompt', 'sdxl_baseline_path']
    print(df[display_cols].head(2).to_string(max_colwidth=50))
    
    return df

# Export to CSV
if 'all_results' in locals() and all_results:
    results_df = export_results_to_csv()
    print(f"\n✅ CSV export completed with {len(results_df)} prompt rows!")
else:
    print("⚠️  No results to export - run the experiment first!")
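
The exported CSV is wide: one row per prompt and one path column per model/method pair. For filtering or grouping, a long layout with one row per image is often easier to work with. A minimal sketch using only the standard library, assuming the `<model>_<method>_path` column names written above; the `wide_to_long` helper and the two-line sample are illustrative:

```python
import csv
import io

MODELS = ("sdxl", "sd15")
METHODS = ("baseline", "compel", "word_replacement")

def wide_to_long(rows):
    """Turn one-row-per-prompt records into one-row-per-image records."""
    long_rows = []
    for row in rows:
        for model in MODELS:
            for method in METHODS:
                path = row.get(f"{model}_{method}_path", "")
                if path:  # skip combinations that were never generated
                    long_rows.append({
                        "prompt_id": int(row["prompt_id"]),
                        "model": model,
                        "method": method,
                        "image_path": path,
                    })
    return long_rows

# Tiny in-memory sample with the same column naming as the export
sample = io.StringIO(
    "prompt_id,sdxl_baseline_path,sdxl_compel_path,sd15_baseline_path\n"
    "1,model_comparison_results/sdxl/p01_baseline.png,,"
    "model_comparison_results/sd15/p01_baseline.png\n"
)
long_rows = wide_to_long(csv.DictReader(sample))
for r in long_rows:
    print(r)
```

The same function can be fed `csv.DictReader(open(csv_path, newline=""))` to reshape the real export.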


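## 📈 Comprehensive Evaluation: CLIP + Aesthetic Scoring

This section provides quantitative evaluation using two complementary metrics:
- **CLIP Similarity** - Measures prompt adherence: the cosine similarity between the CLIP embeddings of the image and of the original prompt
- **Aesthetic Score** - Measures visual appeal: the `aesthetic` class probability from the `cafeai/cafe_aesthetic` classifier, used here in place of the LAION aesthetic predictor

Together they assess both prompt adherence and visual quality. If the aesthetic model fails to load, evaluation continues with CLIP only.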
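
The CLIP score used below is a cosine similarity: both embeddings are scaled to unit length and then multiplied together. A small worked sketch with toy 2-D vectors standing in for the real embeddings (the vectors are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Dot product of two vectors after scaling each to unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

print(cosine_similarity([3, 4], [4, 3]))  # similar direction
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal
```

Because the score depends only on direction, it is bounded between -1 and 1. In practice, differences between methods tend to be small, so they are best read alongside the per-group standard deviation.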

In [None]:
# Comprehensive evaluation: CLIP similarity + LAION aesthetic scoring
try:
    import open_clip
    from transformers import pipeline
    
    print("📊 Loading evaluation models...")
    
    # CLIP model for prompt adherence
    print("  🔍 Loading CLIP model...")
    clip_model, _, clip_preprocess = open_clip.create_model_and_transforms('ViT-B-32', pretrained='openai')
    clip_model = clip_model.to(device).eval()
    clip_tokenizer = open_clip.get_tokenizer('ViT-B-32')
    
    # Aesthetic classifier for image quality/aesthetics
    # (cafeai/cafe_aesthetic, used here in place of the LAION aesthetic predictor)
    print("  🎨 Loading aesthetic model...")
    try:
        aesthetic_model = pipeline(
            "image-classification", 
            model="cafeai/cafe_aesthetic", 
            device=0 if device == "cuda" else -1
        )
        aesthetic_available = True
        print("    ✅ Aesthetic model loaded")
    except Exception as e:
        print(f"    ⚠️ Aesthetic model failed to load: {e}")
        print("    📝 Continuing with CLIP evaluation only")
        aesthetic_available = False
    
    def calculate_clip_similarity(image, text):
        """Calculate CLIP similarity between image and text (prompt adherence)"""
        with torch.no_grad():
            image_input = clip_preprocess(image).unsqueeze(0).to(device)
            text_input = clip_tokenizer([text]).to(device)
            
            image_features = clip_model.encode_image(image_input)
            text_features = clip_model.encode_text(text_input)
            
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)
            text_features = text_features / text_features.norm(dim=-1, keepdim=True)
            
            similarity = (image_features @ text_features.T).squeeze().item()
            return float(similarity)
    
    def calculate_aesthetic_score(image):
        """Calculate aesthetic score: probability of the 'aesthetic' class (higher = more appealing)"""
        if not aesthetic_available:
            return None
            
        try:
            # Get aesthetic predictions
            predictions = aesthetic_model(image)
            
            # Extract aesthetic score (higher = better aesthetics)
            # The cafe_aesthetic model typically returns aesthetic/not_aesthetic scores
            for pred in predictions:
                if 'aesthetic' in pred['label'].lower() and 'not' not in pred['label'].lower():
                    return float(pred['score'])
            
            # Fallback: return highest score if structure is different
            return float(max(predictions, key=lambda x: x['score'])['score'])
            
        except Exception as e:
            print(f"    ⚠️ Aesthetic scoring failed: {e}")
            return None
    
    # Evaluate all results
    if 'all_results' in locals() and all_results:
        print("\n📊 Comprehensive Evaluation Results:")
        print("=" * 80)
        
        # Group results by model and method
        evaluation_summary = {}
        aesthetic_summary = {}
        
        for result in all_results:
            model = result['model']
            method = result['method']
            key = f"{model}_{method}"
            
            if key not in evaluation_summary:
                evaluation_summary[key] = []
                aesthetic_summary[key] = []
            
            # Calculate CLIP similarity with original prompt
            similarity = calculate_clip_similarity(result['image'], result['original_prompt'])
            evaluation_summary[key].append(similarity)
            
            # Calculate aesthetic score
            aesthetic_score = calculate_aesthetic_score(result['image'])
            if aesthetic_score is not None:
                aesthetic_summary[key].append(aesthetic_score)
        
        # Display CLIP results
        print(f"\n🔍 CLIP Similarity (Prompt Adherence):")
        print(f"{'Model':<8} {'Method':<15} {'Avg CLIP':<10} {'Std':<8} {'Samples':<8}")
        print("-" * 60)
        
        for key, scores in evaluation_summary.items():
            model, method = key.split('_', 1)
            avg_score = np.mean(scores)
            std_score = np.std(scores)
            
            print(f"{model:<8} {method:<15} {avg_score:.3f}     {std_score:.3f}   {len(scores)}")
        
        # Display aesthetic results if available
        if aesthetic_available and any(aesthetic_summary.values()):
            print(f"\n🎨 Aesthetic Scores (Visual Quality):")
            print(f"{'Model':<8} {'Method':<15} {'Avg Aesthetic':<13} {'Std':<8} {'Samples':<8}")
            print("-" * 65)
            
            for key, scores in aesthetic_summary.items():
                if scores:  # Only show if we have scores
                    model, method = key.split('_', 1)
                    avg_score = np.mean(scores)
                    std_score = np.std(scores)
                    
                    print(f"{model:<8} {method:<15} {avg_score:.3f}        {std_score:.3f}   {len(scores)}")
        
        # Compare methods within each model
        print(f"\n📈 Method Comparison (per model):")
        for model in ['SDXL', 'SD15']:
            print(f"\n{model}:")
            baseline_scores = evaluation_summary.get(f'{model}_baseline', [])
            compel_scores = evaluation_summary.get(f'{model}_compel', [])
            word_scores = evaluation_summary.get(f'{model}_word_replacement', [])
            
            # CLIP improvements
            if baseline_scores and compel_scores:
                compel_improvement = np.mean(compel_scores) - np.mean(baseline_scores)
                print(f"  📊 CLIP - Compel vs Baseline: {compel_improvement:+.3f}")
            
            if baseline_scores and word_scores:
                word_improvement = np.mean(word_scores) - np.mean(baseline_scores)
                print(f"  📊 CLIP - Word Enhancement vs Baseline: {word_improvement:+.3f}")
            
            # Aesthetic improvements if available
            if aesthetic_available:
                baseline_aes = aesthetic_summary.get(f'{model}_baseline', [])
                compel_aes = aesthetic_summary.get(f'{model}_compel', [])
                word_aes = aesthetic_summary.get(f'{model}_word_replacement', [])
                
                if baseline_aes and compel_aes:
                    aes_improvement = np.mean(compel_aes) - np.mean(baseline_aes)
                    print(f"  🎨 Aesthetic - Compel vs Baseline: {aes_improvement:+.3f}")
                
                if baseline_aes and word_aes:
                    aes_improvement = np.mean(word_aes) - np.mean(baseline_aes)
                    print(f"  🎨 Aesthetic - Word Enhancement vs Baseline: {aes_improvement:+.3f}")
        
        print("\n✅ Comprehensive evaluation completed!")
        
    else:
        print("⚠️  No results to evaluate - run the experiment first!")
        
except ImportError:
    print("⚠️  Required packages not installed - skipping evaluation")
    print("   Install with: pip install open-clip-torch transformers")
except Exception as e:
    print(f"❌ Error in evaluation: {e}")
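
Each prompt uses the same seed for all three methods, so scores can also be compared prompt by prompt rather than only through group averages. A minimal sketch of such a paired comparison; `paired_deltas` is a hypothetical helper and the numbers are placeholders, not results from this experiment:

```python
from statistics import mean

def paired_deltas(scores, method, baseline="baseline"):
    """Per-prompt difference (method minus baseline) for prompts scored in both."""
    base = scores[baseline]
    other = scores[method]
    return {pid: other[pid] - base[pid] for pid in sorted(base.keys() & other.keys())}

# Placeholder numbers for illustration only; NOT results from this experiment
example_scores = {
    "baseline": {1: 0.30, 2: 0.28, 3: 0.31},
    "compel":   {1: 0.32, 2: 0.27, 3: 0.31},
}

deltas = paired_deltas(example_scores, "compel")
wins = sum(d > 0 for d in deltas.values())
print(f"mean delta: {mean(deltas.values()):+.3f}")
print(f"prompts improved: {wins}/{len(deltas)}")
```

With only eight prompts, the count of prompts that improved is often more informative than the mean, which a single outlier can dominate.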

## 📋 Summary & Next Steps

### 📁 **Generated Files:**
- `model_comparison_results/sdxl/pNN_baseline.png` - SDXL baseline images (`NN` = prompt number, 01-08)
- `model_comparison_results/sdxl/pNN_compel.png` - SDXL Compel-enhanced images
- `model_comparison_results/sdxl/pNN_word_replacement.png` - SDXL word-enhanced images
- `model_comparison_results/sd15/pNN_baseline.png` - SD1.5 baseline images
- `model_comparison_results/sd15/pNN_compel.png` - SD1.5 Compel-enhanced images
- `model_comparison_results/sd15/pNN_word_replacement.png` - SD1.5 word-enhanced images
- `model_comparison_results/comprehensive_results.csv` - **Complete analysis CSV**

### 🔍 **Analysis Questions:**
1. **Which model performs better overall?** (SDXL vs SD1.5)
2. **Which enhancement method is most effective?** (Compel vs Word Replacement)
3. **Are there specific jewelry terms that benefit more from certain enhancements?**
4. **Does the engraved 'M' visibility improve with any method?**

### 🎯 **Key Findings to Look For:**
- **Visual quality differences** between models and methods
- **Aesthetic scores** measuring visual quality and appeal
- **CLIP similarity improvements** indicating better prompt adherence
- **Specific jewelry features** rendered more accurately
- **Style consistency** (modern vs vintage aesthetic)

### 💡 **Next Steps:**
1. **Visual inspection** - Compare generated images side by side
2. **CSV analysis** - Use spreadsheet tools for systematic comparison  
3. **CLIP and aesthetic scores** - Review quantitative improvements
4. **Method selection** - Choose best-performing combination for production
5. **Integration** - Implement winning approach in main pipeline

---

**🏆 Goal**: Determine the optimal model + enhancement combination for high-quality jewelry image generation!
