# Image Quality Control Demo - PROPER WORKFLOW

This notebook demonstrates the **correct workflow** for the MorphSeq image quality control system:

## 📋 Proper QC Workflow

### 1. **Initialize** - Populate QC CSV from Metadata
First, we create entries for ALL images from the experiment metadata with empty QC flags.

### 2. **Manual QC** - Human Review First
Human experts review and flag problematic images they can identify.

### 3. **Automatic QC** - Algorithmic Processing  
Computational methods flag remaining images using blur detection, brightness checks, etc.

### 4. **Done** - Complete QC Dataset
All images now have quality assessments from either human review or automated analysis.

## 🎯 Key Functions

- **`initialize_qc_file()`** - Creates QC CSV from experiment metadata
- **`manual_qc()`** - For human review and flagging
- **`auto_qc()`** - For algorithmic quality checks
- **`flag_qc()`** - Low-level flagging function

## Why Does This Order Matter?
1. **Initialize** ensures all images are tracked in the QC system
2. **Manual first** captures expert knowledge for complex quality issues
3. **Automatic second** efficiently processes remaining images at scale
4. **Complete coverage** ensures no image is missed in the pipeline
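The ordering rule can be expressed directly on the QC table. After manual review, automatic checks only need to visit rows whose `qc_flag` is still empty. Below is a minimal pandas sketch that assumes the column names listed in the QC Data Structure section. `images_needing_auto_qc` is a hypothetical helper and is not part of the QC utilities.

```python
import pandas as pd


def images_needing_auto_qc(qc_df: pd.DataFrame) -> list[str]:
    """Return image_ids that nobody (human or algorithm) has flagged yet.

    Hypothetical helper: manual flags are never touched, so automatic QC
    only has to process the rows that are still empty.
    """
    unreviewed = qc_df["qc_flag"].isna()
    return qc_df.loc[unreviewed, "image_id"].tolist()


qc_df = pd.DataFrame({
    "image_id": ["img_a", "img_b", "img_c"],
    "qc_flag": ["BLUR", None, None],      # img_a was flagged during manual review
    "annotator": ["mcolon", None, None],
})
todo = images_needing_auto_qc(qc_df)
print(todo)  # ['img_b', 'img_c']
```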

## QC Data Structure
All QC data is stored in: `data/quality_control/image_quality_qc.csv`
- `experiment_id`: Date or experiment identifier
- `video_id`: Full video identifier (experiment_id + well)  
- `image_id`: Unique image identifier
- `qc_flag`: Quality control flag (PASS, BLUR, DARK, etc.)
- `notes`: Optional description of the QC decision
- `annotator`: Who made the QC decision (person name or "automatic")
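The sketch below expresses that schema in pandas. It can be used to check that a CSV loaded from disk has the expected columns. `QC_COLUMNS` and `validate_qc_columns` are hypothetical names, and the real utilities may define their own. The sketch reads `experiment_id` as a string because pandas would otherwise parse a date-like ID such as `20250101` as an integer.

```python
import io

import pandas as pd

# Column order as documented in the QC Data Structure section
QC_COLUMNS = ["experiment_id", "video_id", "image_id", "qc_flag", "notes", "annotator"]


def validate_qc_columns(qc_df: pd.DataFrame) -> None:
    """Raise ValueError if any documented QC column is missing."""
    missing = [col for col in QC_COLUMNS if col not in qc_df.columns]
    if missing:
        raise ValueError(f"QC table is missing columns: {missing}")


csv_text = (
    "experiment_id,video_id,image_id,qc_flag,notes,annotator\n"
    "20250101,20250101_A01,20250101_A01_t001,BLUR,out of focus,mcolon\n"
    "20250101,20250101_A01,20250101_A01_t002,,,\n"
)
# Keep experiment IDs as strings instead of letting pandas infer int64
qc_df = pd.read_csv(io.StringIO(csv_text), dtype={"experiment_id": str})
validate_qc_columns(qc_df)
print(qc_df["qc_flag"].isna().sum())  # 1 (one unreviewed image)
```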

In [None]:
# Setup - Import QC functions 🎯
import sys
import os
import pandas as pd
import json
from pathlib import Path

# Robust path detection for imports (works from any project directory)
current_dir = Path.cwd()
print(f"Current directory: {current_dir}")

utils_qc_path = None
if current_dir.name == "segmentation_sandbox":
    utils_qc_path = current_dir / "utils" / "image_quality_qc_utils"
elif current_dir.name == "test":
    utils_qc_path = current_dir.parent
else:
    for parent in current_dir.parents:
        if parent.name == "segmentation_sandbox":
            utils_qc_path = parent / "utils" / "image_quality_qc_utils"
            break

if utils_qc_path and utils_qc_path.exists():
    sys.path.insert(0, str(utils_qc_path))
    print(f"Added to path: {utils_qc_path}")
else:
    print("Warning: Could not find utils path. You may need to adjust the path manually.")

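# Import all QC functions
from image_quality_qc_utils import (
    initialize_qc_file,  # Step 1: Initialize from metadata
    load_qc_data,
    save_qc_data,
    flag_qc,             # Core flagging function
    get_qc_csv_path,
    QC_FLAGS,
    # Analysis helpers used in Demos 1, 3 and the final summary.
    # Assumed to be exported by image_quality_qc_utils; adjust the import if they live elsewhere.
    get_flagged_images,
    get_images_by_flag,
    get_images_by_annotator,
    get_qc_summary,
)

from image_quality_qc import manual_qc, auto_qc  # Steps 2 & 3: convenience functions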

print("✅ Successfully imported QC functions")
print(f"Available QC flags: {list(QC_FLAGS.keys())}")

Current directory: /net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox
✓ Added to path: /net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/utils/image_quality_qc_utils
🎉 Successfully imported QC convenience functions!
📋 Available QC flags: ['PASS', 'BLUR', 'DARK', 'BRIGHT', 'LOW_CONTRAST', 'CORRUPT', 'ARTIFACT', 'OUT_OF_FOCUS', 'DEBRIS', 'FAIL']
🎯 FOCUS: This demo highlights manual_qc() and auto_qc()


## 🚀 Step 1: Initialize QC System from Metadata

**This is the CRITICAL first step** - we populate the QC CSV with ALL images from the experiment metadata file. This ensures every image in the dataset is tracked in the QC system.

Without this step, the QC system doesn't know what images exist!
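Conceptually, initialization turns every known image into a row with an empty flag and leaves rows that already carry annotations untouched. The sketch below does not parse `experiment_metadata.json`, whose layout is not shown here. It assumes the image IDs have already been extracted from that file. It derives `experiment_id` and `video_id` from the naming pattern seen in this notebook (`<experiment>_<well>_<timepoint>`), and that pattern is also an assumption. `init_qc_rows` is a hypothetical helper, not `initialize_qc_file()` itself.

```python
from typing import Optional

import pandas as pd

QC_COLUMNS = ["experiment_id", "video_id", "image_id", "qc_flag", "notes", "annotator"]


def init_qc_rows(image_ids: list[str], existing: Optional[pd.DataFrame] = None) -> pd.DataFrame:
    """Add one unflagged row per new image_id; rows already in `existing` are kept as-is.

    Assumes IDs look like '20250101_A01_t001' (experiment_well_timepoint).
    """
    known = set() if existing is None else set(existing["image_id"])
    # dict.fromkeys removes duplicate IDs while preserving their order
    new_ids = [img for img in dict.fromkeys(image_ids) if img not in known]
    new_rows = pd.DataFrame({
        "experiment_id": [img.split("_")[0] for img in new_ids],   # '20250101'
        "video_id": [img.rsplit("_", 1)[0] for img in new_ids],    # '20250101_A01'
        "image_id": new_ids,
        "qc_flag": [None] * len(new_ids),                           # not reviewed yet
        "notes": [None] * len(new_ids),
        "annotator": [None] * len(new_ids),
    }, columns=QC_COLUMNS)
    if existing is None or existing.empty:
        return new_rows
    if new_rows.empty:
        return existing.reset_index(drop=True)
    return pd.concat([existing, new_rows], ignore_index=True)


existing = pd.DataFrame([{
    "experiment_id": "20250101", "video_id": "20250101_A01",
    "image_id": "20250101_A01_t001", "qc_flag": "BLUR",
    "notes": "Manual inspection", "annotator": "mcolon",
}])
qc_df = init_qc_rows(
    ["20250101_A01_t001", "20250101_A01_t002", "20250102_B01_t001"], existing
)
print(qc_df[["image_id", "video_id", "qc_flag"]])
```

Because existing rows are skipped rather than rebuilt, re-running initialization after new images arrive does not erase earlier QC decisions.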

In [None]:
# Step 1: Initialize QC CSV from experiment metadata 📋

# Set up paths (adjust these for your actual data)
data_dir = Path("/net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/raw_data_organized")
metadata_path = data_dir / "experiment_metadata.json"

print(f"Data directory: {data_dir}")
print(f"Metadata path: {metadata_path}")
print(f"Metadata exists: {metadata_path.exists()}")

# Initialize QC file from metadata
# This creates entries for ALL images with empty QC flags
if metadata_path.exists():
    print("\n🔄 Initializing QC file from experiment metadata...")
    qc_df = initialize_qc_file(
        data_dir=data_dir,
        experiment_metadata_path=metadata_path,
        overwrite=False  # Set to True to recreate the file
    )
    print(f"✅ QC file initialized with {len(qc_df)} images")
    print(f"📊 Sample of initialized data:")
    print(qc_df.head())
else:
    print("❌ Metadata file not found. Run 01_prepare_videos.py first!")
    
# Show QC file location
qc_csv_path = get_qc_csv_path(data_dir)
print(f"\n📁 QC file location: {qc_csv_path}")

## 📋 Setup: Create Test Data

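Before running the demos, we create a separate test directory with a handful of sample QC entries. This keeps the examples below from modifying the real QC file.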

In [None]:
# SETUP: Create test environment and sample data
print("📋 SETUP: Creating test environment...")

# Setup test directory
test_data_dir = Path('annotation_demo_test')
test_data_dir.mkdir(exist_ok=True)
print(f"✓ Created test directory: {test_data_dir}")

# Load existing QC data or create new
qc_df = load_qc_data(test_data_dir)

# Create sample images if they don't exist
sample_images = [
    {"experiment_id": "20250101", "video_id": "20250101_A01", "image_id": "20250101_A01_t001"},
    {"experiment_id": "20250101", "video_id": "20250101_A01", "image_id": "20250101_A01_t002"},
    {"experiment_id": "20250101", "video_id": "20250101_B01", "image_id": "20250101_B01_t001"},
    {"experiment_id": "20250102", "video_id": "20250102_A01", "image_id": "20250102_A01_t001"},
    {"experiment_id": "20250102", "video_id": "20250102_A01", "image_id": "20250102_A01_t002"},
    {"experiment_id": "20250102", "video_id": "20250102_B01", "image_id": "20250102_B01_t001"},
]

for img in sample_images:
    if len(qc_df) == 0 or img["image_id"] not in qc_df["image_id"].values:
        new_row = pd.DataFrame([{
            "experiment_id": img["experiment_id"],
            "video_id": img["video_id"], 
            "image_id": img["image_id"],
            "qc_flag": None,
            "notes": None,
            "annotator": None
        }])
        qc_df = pd.concat([qc_df, new_row], ignore_index=True)

# Save the initial data
save_qc_data(qc_df, test_data_dir)

print(f"✅ Setup complete: {len(qc_df)} images ready for QC")
print(f"📂 QC file location: {get_qc_csv_path(test_data_dir)}")
print("\n🔍 Initial data (all images unflagged):")
display(qc_df)

📋 SETUP: Creating test environment...
✓ Created test directory: annotation_demo_test
Created new QC DataFrame
Saved QC data to: quality_control/image_quality_qc.csv
✅ Setup complete: 6 images ready for QC
📂 QC file location: annotation_demo_test/image_quality_qc.csv

🔍 Initial data (all images unflagged):


|   | experiment_id | video_id | image_id | qc_flag | notes | annotator |
|---|---|---|---|---|---|---|
| 0 | 20250101 | 20250101_A01 | 20250101_A01_t001 | | | |
| 1 | 20250101 | 20250101_A01 | 20250101_A01_t002 | | | |
| 2 | 20250101 | 20250101_B01 | 20250101_B01_t001 | | | |
| 3 | 20250102 | 20250102_A01 | 20250102_A01_t001 | | | |
| 4 | 20250102 | 20250102_A01 | 20250102_A01_t002 | | | |
| 5 | 20250102 | 20250102_B01 | 20250102_B01_t001 | | | |


## ⭐ Demo 1: Manual QC with `manual_qc()`

The `manual_qc()` function is the **recommended way** to flag images during human review. It records the annotator name you pass with each flag, and it wraps the low-level flagging step in a simpler interface.
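The sketch below shows what a `manual_qc()`-style call has to do. It rejects unknown flags, requires a non-empty annotator, and stamps each selected row with the flag, notes and annotator. It works on an in-memory DataFrame rather than the CSV on disk. The flag set is copied from the `QC_FLAGS` keys printed by the setup cell. `manual_qc_sketch` is hypothetical and is not the library's actual implementation.

```python
import pandas as pd

# Flag names as printed by the setup cell (QC_FLAGS keys)
KNOWN_FLAGS = {"PASS", "BLUR", "DARK", "BRIGHT", "LOW_CONTRAST", "CORRUPT",
               "ARTIFACT", "OUT_OF_FOCUS", "DEBRIS", "FAIL"}
QC_FIELDS = ["qc_flag", "notes", "annotator"]


def manual_qc_sketch(qc_df, annotator, image_ids, qc_flag, notes=None):
    """Return a copy of qc_df with image_ids flagged by a human annotator."""
    if qc_flag not in KNOWN_FLAGS:
        raise ValueError(f"Unknown QC flag: {qc_flag!r}")
    if not annotator or not annotator.strip():
        raise ValueError("manual QC requires an annotator name")
    missing = set(image_ids) - set(qc_df["image_id"])
    if missing:
        raise KeyError(f"Images not in QC table: {sorted(missing)}")
    out = qc_df.copy()
    # Columns read from an empty CSV are float NaN; cast so strings can be stored
    out[QC_FIELDS] = out[QC_FIELDS].astype(object)
    rows = out["image_id"].isin(image_ids)
    out.loc[rows, "qc_flag"] = qc_flag
    out.loc[rows, "notes"] = notes
    out.loc[rows, "annotator"] = annotator
    return out


qc_df = pd.DataFrame({
    "image_id": ["20250101_A01_t001", "20250101_A01_t002"],
    "qc_flag": [None, None], "notes": [None, None], "annotator": [None, None],
})
qc_df = manual_qc_sketch(qc_df, "mcolon", ["20250101_A01_t001"], "BLUR",
                         notes="Manual inspection - image appears out of focus")
print(qc_df)
```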

In [None]:
# DEMO 1: Manual QC with manual_qc() ⭐
print("⭐ DEMO 1: Manual QC with manual_qc() function")
print("\n📋 Available QC flags:")
for flag, description in QC_FLAGS.items():
    print(f"  {flag}: {description}")

# Example 1a: Flag a single image as BLUR
print("\n🔍 Example 1a: Flagging a single image as BLUR")
qc_df = manual_qc(
    data_dir=test_data_dir,
    annotator="mcolon",  # Your name - manual_qc() handles the rest!
    image_ids=["20250101_A01_t001"],
    qc_flag="BLUR",
    notes="Manual inspection - image appears out of focus"
)

print("✅ Flagged image 20250101_A01_t001 as BLUR")

# Example 1b: Flag multiple images in a batch  
print("\n🔍 Example 1b: Flagging multiple images as DARK")
qc_df = manual_qc(
    data_dir=test_data_dir,
    annotator="mcolon",
    image_ids=["20250101_A01_t002", "20250101_B01_t001"],
    qc_flag="DARK", 
    notes="Manual review - insufficient illumination"
)

print("✅ Flagged 2 images as DARK")

# Check our progress
n_flagged = len(get_flagged_images(qc_df))
print(f"\n📊 QC Progress: {n_flagged} flagged, {len(qc_df) - n_flagged} remaining")
display(qc_df[qc_df['qc_flag'].notna()])


## 🤖 Demo 2: Automatic QC with `auto_qc()`

The `auto_qc()` function is intended for algorithmic quality control. It sets the annotator to "automatic" for you, so it can be called directly from image-analysis code without extra bookkeeping.
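One way to picture the difference between the two entry points is as a thin wrapper. An automatic call is a manual-style call with the annotator fixed to `"automatic"`. The self-contained sketch below is built on that assumption. `apply_flag` and `auto_qc_sketch` are hypothetical names, and the real `auto_qc()` may validate or log differently.

```python
import pandas as pd

AUTO_ANNOTATOR = "automatic"  # the value Demo 2 filters on


def apply_flag(qc_df, image_ids, qc_flag, annotator, notes=None):
    """Return a copy of qc_df with qc_flag/notes/annotator set for image_ids."""
    out = qc_df.copy()
    cols = ["qc_flag", "notes", "annotator"]
    out[cols] = out[cols].astype(object)  # allow strings in all-NaN columns
    rows = out["image_id"].isin(image_ids)
    out.loc[rows, "qc_flag"] = qc_flag
    out.loc[rows, "notes"] = notes
    out.loc[rows, "annotator"] = annotator
    return out


def auto_qc_sketch(qc_df, image_ids, qc_flag, notes=None):
    """Automatic variant: callers never pass an annotator."""
    return apply_flag(qc_df, image_ids, qc_flag, AUTO_ANNOTATOR, notes)


qc_df = pd.DataFrame({"image_id": ["img_a", "img_b"], "qc_flag": [None, None],
                      "notes": [None, None], "annotator": [None, None]})
qc_df = auto_qc_sketch(qc_df, ["img_b"], "DARK", notes="Automatic: mean brightness < 50")
print(qc_df[qc_df["annotator"] == AUTO_ANNOTATOR])
```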

In [None]:
# DEMO 2: Automatic QC with auto_qc() 🤖  
print("🤖 DEMO 2: Automatic QC with auto_qc() function")

# Example 2a: Simulate automatic blur detection
print("\n🔍 Example 2a: Automatic blur detection")
qc_df = auto_qc(
    data_dir=test_data_dir,
    image_ids=["20250102_A01_t001"],
    qc_flag="BLUR",
    notes="Automatic: Laplacian variance < threshold (blur_threshold=100)"
    # Note: auto_qc() automatically sets annotator="automatic"
)

print("✅ Automatically flagged image 20250102_A01_t001 as BLUR")

# Example 2b: Simulate automatic brightness detection
print("\n🔍 Example 2b: Automatic brightness detection") 
qc_df = auto_qc(
    data_dir=test_data_dir,
    image_ids=["20250102_A01_t002"],
    qc_flag="DARK",
    notes="Automatic: Mean brightness < threshold (brightness_threshold=50)"
)

print("✅ Automatically flagged image 20250102_A01_t002 as DARK")

# Example 2c: Automatic focus check
print("\n🔍 Example 2c: Automatic focus check")
qc_df = auto_qc(
    data_dir=test_data_dir,
    image_ids=["20250102_B01_t001"],
    qc_flag="OUT_OF_FOCUS", 
    notes="Automatic: Focus metric below threshold"
)

print("✅ Automatically flagged 1 more image")

# Show automatic annotations
auto_flags = qc_df[qc_df['annotator'] == 'automatic']
print(f"\n📊 Automatic QC Results: {len(auto_flags)} images flagged automatically")
display(auto_flags)
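Demo 2 only simulates the detectors. Its notes mention a Laplacian-variance blur test (threshold 100) and a mean-brightness test (threshold 50). The NumPy-only sketch below implements those two metrics and assumes 8-bit grayscale images. `classify_image` is a hypothetical helper. The thresholds are copied from the demo notes and have not been tuned on real data. The darkness check runs first because a very dark frame also has little Laplacian variance and would otherwise be reported as blur.

```python
import numpy as np


def laplacian_variance(img: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian; low values suggest blur."""
    img = img.astype(np.float64)
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    return float(lap.var())


def classify_image(img, blur_threshold=100.0, brightness_threshold=50.0):
    """Return 'DARK', 'BLUR', or 'PASS' for an 8-bit grayscale image."""
    if img.mean() < brightness_threshold:
        return "DARK"
    if laplacian_variance(img) < blur_threshold:
        return "BLUR"
    return "PASS"


checkerboard = (np.indices((64, 64)).sum(axis=0) % 2 * 255).astype(np.uint8)
print(classify_image(np.full((64, 64), 10, np.uint8)))   # DARK
print(classify_image(np.full((64, 64), 200, np.uint8)))  # BLUR (flat, no edges)
print(classify_image(checkerboard))                       # PASS
```

The returned string could then be passed as `qc_flag` to `auto_qc()`, with the measured value recorded in `notes`.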

## 📊 Demo 3: Analyzing QC Results

Now let's use the utility functions to analyze our QC annotations from both manual and automatic sources.
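The plain-pandas sketch below shows what these helpers compute. The semantics are inferred from how the notebook uses them, where "flagged" means any flag other than `PASS`. The `_sketch` functions are hypothetical stand-ins, and the real helpers may return DataFrames rather than lists.

```python
import pandas as pd


def flagged_images_sketch(qc_df):
    """image_ids carrying a failing flag (anything except PASS)."""
    mask = qc_df["qc_flag"].notna() & (qc_df["qc_flag"] != "PASS")
    return qc_df.loc[mask, "image_id"].tolist()


def images_by_flag_sketch(qc_df, flag):
    return qc_df.loc[qc_df["qc_flag"] == flag, "image_id"].tolist()


def images_by_annotator_sketch(qc_df, annotator):
    return qc_df.loc[qc_df["annotator"] == annotator, "image_id"].tolist()


qc_df = pd.DataFrame({
    "image_id": ["a", "b", "c", "d"],
    "qc_flag": ["BLUR", "PASS", None, "DARK"],
    "annotator": ["mcolon", "automatic", None, "automatic"],
})
print(flagged_images_sketch(qc_df))                    # ['a', 'd']
print(images_by_annotator_sketch(qc_df, "automatic"))  # ['b', 'd']
print(qc_df["qc_flag"].value_counts(dropna=False))     # per-flag counts; NaN = unreviewed
```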

In [None]:
# DEMO 3: Analyzing QC Results 📊
print("📊 DEMO 3: Analyzing QC Results")

# Reload current data
qc_df = load_qc_data(test_data_dir)

# Get flagged images  
flagged_images = get_flagged_images(qc_df)
print(f"\n🚩 Total flagged images (excluding PASS): {len(flagged_images)}")

# Everything not flagged as failing: PASS or not yet reviewed
unflagged_count = len(qc_df) - len(flagged_images)
print(f"✅ Total unflagged images (PASS or unreviewed): {unflagged_count}")

# Analyze by QC flag type
print("\n📋 Images by QC flag:")
for flag in ['BLUR', 'DARK', 'OUT_OF_FOCUS']:
    flag_images = get_images_by_flag(qc_df, flag)
    if len(flag_images) > 0:
        print(f"  {flag}: {len(flag_images)} images")
        for img in flag_images[:3]:  # Show first 3
            print(f"    - {img}")

# Analyze by annotator
print("\n👤 Images by annotator:")
manual_images = get_images_by_annotator(qc_df, 'mcolon')
auto_images = get_images_by_annotator(qc_df, 'automatic')
print(f"  mcolon (manual): {len(manual_images)} images")
print(f"  automatic: {len(auto_images)} images")

# Overall summary
print("\n📈 QC Summary:")
summary = get_qc_summary(qc_df)
display(summary)

print("\n🎯 Current QC data (all annotations):")
display(qc_df)

## 🔄 Demo 4: Advanced QC Operations

Learn how to update and override existing QC flags with the convenience functions, including marking a previously flagged image as PASS.
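All three examples below pass `overwrite=True`. This suggests that, by default, an existing flag may be protected from being replaced. The minimal sketch below shows such a guard, assuming that is how `overwrite` behaves; the notebook does not demonstrate the default case on an already-flagged image. `set_flags` is hypothetical. It also returns the IDs it skipped so the caller can report them.

```python
import pandas as pd


def set_flags(qc_df, image_ids, qc_flag, annotator, notes=None, overwrite=False):
    """Flag image_ids; unless overwrite=True, already-flagged rows are skipped.

    Returns (updated copy, list of skipped image_ids).
    """
    out = qc_df.copy()
    cols = ["qc_flag", "notes", "annotator"]
    out[cols] = out[cols].astype(object)
    target = out["image_id"].isin(image_ids)
    already = target & out["qc_flag"].notna()
    skipped = [] if overwrite else out.loc[already, "image_id"].tolist()
    rows = target if overwrite else target & ~already
    out.loc[rows, "qc_flag"] = qc_flag
    out.loc[rows, "notes"] = notes
    out.loc[rows, "annotator"] = annotator
    return out, skipped


qc_df = pd.DataFrame({"image_id": ["t001", "t002"], "qc_flag": ["BLUR", None],
                      "notes": [None, None], "annotator": ["mcolon", None]})
qc_df, skipped_default = set_flags(qc_df, ["t001", "t002"], "OUT_OF_FOCUS", "mcolon")
print(skipped_default)  # ['t001'] keeps its BLUR flag
qc_df, skipped_overwrite = set_flags(qc_df, ["t001"], "OUT_OF_FOCUS", "mcolon", overwrite=True)
print(qc_df.loc[0, "qc_flag"])  # OUT_OF_FOCUS
```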

In [None]:
# DEMO 4: Advanced QC Operations 🔄
print("🔄 DEMO 4: Advanced QC Operations")

print("\n📋 Current flagged images before updates:")
qc_df = load_qc_data(test_data_dir)
flagged_before = qc_df[qc_df['qc_flag'].notna()]
display(flagged_before[['image_id', 'qc_flag', 'annotator', 'notes']])

# Example 4a: Override/update an existing flag using manual_qc
print("\n🔍 Example 4a: Updating an existing flag")
qc_df = manual_qc(
    data_dir=test_data_dir,
    annotator="mcolon",
    image_ids=["20250101_A01_t001"], 
    qc_flag="OUT_OF_FOCUS",
    notes="Manual review - changed from BLUR to OUT_OF_FOCUS after closer inspection",
    overwrite=True  # Allow overwriting existing flags
)

print("✅ Updated image 20250101_A01_t001 from BLUR to OUT_OF_FOCUS")

# Example 4b: Mark an image as PASS (good quality)
print("\n🔍 Example 4b: Marking an image as PASS")
qc_df = manual_qc(
    data_dir=test_data_dir,
    annotator="mcolon",
    image_ids=["20250101_A01_t002"],
    qc_flag="PASS", 
    notes="Manual review - actually good quality upon second look",
    overwrite=True
)

print("✅ Updated image 20250101_A01_t002 from DARK to PASS")

# Example 4c: Re-run auto_qc after an algorithm update (e.g., a new blur detector)
print("\n🔍 Example 4c: Reprocessing with an improved algorithm")
qc_df = auto_qc(
    data_dir=test_data_dir, 
    image_ids=["20250102_A01_t001"],
    qc_flag="PASS",
    notes="Automatic v2.0: Passed improved blur detection algorithm",
    overwrite=True
)

print("✅ Reprocessed image 20250102_A01_t001 with improved algorithm")

print("\n📋 Updated QC data after changes:")
qc_df = load_qc_data(test_data_dir)
display(qc_df)

## 🎯 Final QC Report

This cell summarizes the QC table after running the **recommended workflow** above.

In [None]:
# SUMMARY: Final QC Report 🎯
print("🎯 FINAL SUMMARY: QC Workflow Complete!")

# Load final data
qc_df = load_qc_data(test_data_dir)

print("\n📊 Final QC Statistics:")
print(f"  Total images processed: {len(qc_df)}")
print(f"  Images with QC flags: {len(qc_df[qc_df['qc_flag'].notna()])}")
print(f"  Images passing QC: {len(get_images_by_flag(qc_df, 'PASS'))}")
print(f"  Images failing QC: {len(get_flagged_images(qc_df))}")

print("\n👥 Annotator breakdown:")
for annotator in qc_df['annotator'].dropna().unique():
    count = len(get_images_by_annotator(qc_df, annotator))
    print(f"  {annotator}: {count} annotations")

print("\n🚩 Flag distribution:")
flag_summary = qc_df['qc_flag'].value_counts()
for flag, count in flag_summary.items():
    if pd.notna(flag):
        print(f"  {flag}: {count} images")

print(f"\n📁 Final QC data saved to: {get_qc_csv_path(test_data_dir)}")

print("\n📋 Complete final dataset:")
display(qc_df)

print("\n" + "="*60)
print("🎉 DEMO COMPLETE!")
print("="*60)
print("\n⭐ KEY CONVENIENCE FUNCTIONS DEMONSTRATED:")
print("  • manual_qc() - For human QC review")
print("  • auto_qc() - For algorithmic QC detection")
print("\n💡 BENEFITS:")
print("  • Automatic annotator tracking")
print("  • Simpler syntax")
print("  • Built-in validation")
print("  • Consistent workflow")
print("\n🚀 Ready to integrate into your MorphSeq pipeline!")
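The annotator and flag loops in this cell can also be collapsed into a single table with `pandas.crosstab`. It counts each annotator/flag combination and, by default, skips rows with NaN in either column, so unreviewed images drop out. The sketch below uses toy data. `margins=True` adds row and column totals under the label `All`.

```python
import pandas as pd

qc_df = pd.DataFrame({
    "image_id": ["a", "b", "c", "d", "e"],
    "qc_flag": ["OUT_OF_FOCUS", "PASS", "PASS", "DARK", None],
    "annotator": ["mcolon", "mcolon", "automatic", "automatic", None],
})
# Rows = who decided, columns = what they decided; image 'e' is unreviewed and excluded
report = pd.crosstab(qc_df["annotator"], qc_df["qc_flag"], margins=True)
print(report)
```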

## 🎉 Summary: MorphSeq QC Convenience Functions

This notebook demonstrated the **recommended workflow** for image quality control in MorphSeq using the convenient wrapper functions.

### ⭐ Key Functions Highlighted:

#### `manual_qc()` - Human Review Interface
```python
# Simple manual flagging
qc_df = manual_qc(
    data_dir=data_directory,
    annotator="your_name",
    image_ids=["image1", "image2"],
    qc_flag="BLUR",
    notes="Manual inspection notes"
)
```

#### `auto_qc()` - Algorithmic QC Interface  
```python
# Automatic flagging (annotator set to "automatic")
qc_df = auto_qc(
    data_dir=data_directory,
    image_ids=["image1", "image2"], 
    qc_flag="DARK",
    notes="Automatic: brightness < threshold"
)
```

### 💡 Why Use These Functions?

1. **Automatic annotator tracking** - `auto_qc()` records `"automatic"` for you, and `manual_qc()` stores the name you pass
2. **Simpler syntax** - Less typing, fewer parameters to remember
3. **Consistent workflow** - Same interface for both manual and automatic QC
4. **Built-in validation** - Error checking and sensible defaults
5. **Easy integration** - Both return the updated QC DataFrame, so they can be called directly from existing analysis code

### 🚀 Next Steps:

- Use `manual_qc()` in your manual review notebooks
- Integrate `auto_qc()` into your image analysis pipelines
- Combine both approaches for comprehensive quality control
- Use analysis functions like `get_qc_summary()`, `get_flagged_images()` for reporting

### 📖 More Resources:

- See `convenience_functions_demo.ipynb` for focused examples
- Check `simple_qc_demo.ipynb` for minimal working examples
- Run `qc_demo_blocks.py` for a complete script-based workflow
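
### 📊 Analysis Helpers:

- `get_flagged_images()`: List images with a failing flag (anything other than PASS)
- `get_images_by_flag()`: Filter images by specific QC flag
- `get_images_by_annotator()`: Filter images by annotator
- Direct DataFrame manipulation for adding/removing flags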

### 🔧 Future Development:

- Integrate with actual image processing pipeline
- Add automatic blur detection algorithms
- Create visualization tools for QC data
- Set up batch processing workflows