# 🎯 MorphSeq Quality Control System Demo

This notebook demonstrates the comprehensive QC flagging system for hierarchical experiment data.

## What You'll Learn:
- 🏗️ **Initialize** the QC system from experiment metadata
- 🏷️ **Flag entities** at all levels (experiment → video → image → embryo)
- 📦 **Batch operations** for efficient large-scale QC
- 🔍 **Validate and summarize** QC data
- 💾 **Save and manage** QC flags

## Key Features:
- **Author designation**: Set once, used automatically
- **Flexible batching**: Accumulate or apply directly 
- **Flag validation**: Ensures only valid flags are used
- **In-memory operations**: Changes stay in memory (fast) until you explicitly save

## 📋 Setup and Imports

In [1]:
# Import the QC utilities
import sys
sys.path.append('/net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/scripts/utils')  # Update this path

from experiment_data_qc_utils import (
    ExperimentDataQC,
    initialize_qc_structure_from_metadata,
    show_valid_qc_categories,
    gen_flag_batch,
    add_flags_to_qc_data,
    load_qc_data,
    save_qc_data
)

import json
from pathlib import Path
from pprint import pprint

print("✅ Imports successful!")

✅ Imports successful!


## 🏗️ Step 1: Initialize the QC System

First, we'll set up the QC system from experiment metadata. For this demo we generate a small mock metadata file instead of using your real one.

In [2]:
# Define paths (update these for your setup)
quality_control_dir = "/net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/quality_control"  # Update this
experiment_metadata_path = "/net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/raw_data_organized/experiment_metadata.json"  # Update this (the demo replaces it with a mock file below)

# Work in a separate demo subdirectory so real QC files are not touched
demo_dir = Path(quality_control_dir) / "demo_qc"
demo_dir.mkdir(parents=True, exist_ok=True)

quality_control_dir = demo_dir
print(f"Demo QC directory: {quality_control_dir}")

Demo QC directory: /net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/quality_control/demo_qc


In [3]:
# Create a mock experiment metadata file for this demo
mock_metadata = {
    "experiments": {
        "20241215": {
            "videos": {
                "20241215_A01": {
                    "image_ids": ["20241215_A01_t001", "20241215_A01_t002", "20241215_A01_t003"]
                },
                "20241215_A02": {
                    "image_ids": ["20241215_A02_t001", "20241215_A02_t002"]
                }
            }
        },
        "20241216": {
            "videos": {
                "20241216_B01": {
                    "image_ids": ["20241216_B01_t001", "20241216_B01_t002"]
                }
            }
        }
    }
}

# Save mock metadata
experiment_metadata_path = demo_dir / "experiment_metadata.json"
with open(experiment_metadata_path, 'w') as f:
    json.dump(mock_metadata, f, indent=2)

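# Derive the counts from the mock data instead of hardcoding them
mock_experiments = mock_metadata["experiments"]
n_videos = sum(len(exp["videos"]) for exp in mock_experiments.values())
n_images = sum(len(vid["image_ids"]) for exp in mock_experiments.values() for vid in exp["videos"].values())

print("📋 Created mock experiment metadata")
print(f"   - {len(mock_experiments)} experiments")
print(f"   - {n_videos} videos total")
print(f"   - {n_images} images total")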

📋 Created mock experiment metadata
   - 2 experiments
   - 3 videos total
   - 7 images total


In [4]:
# Initialize the QC system from metadata
print("🏗️ Initializing QC system...")

qc_data = initialize_qc_structure_from_metadata(
    quality_control_dir=quality_control_dir,
    experiment_metadata_path=experiment_metadata_path,
    overwrite=True  # Start fresh for demo
)

print("\n✅ QC system initialized!")

🏗️ Initializing QC system...
🔍 Checking flag integrity...
✅ All flags are valid!
QC structure initialized: 2 experiments, 3 videos, 7 images added

✅ QC system initialized!
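Under the hood, initialization has to turn the nested metadata into a matching QC skeleton. The sketch below shows one way to do that; it is an illustration, not the code in `experiment_data_qc_utils`. The record layout (parallel `flags`/`authors`/`notes` lists) is an assumption based on the keys returned by `get_image_flags` in Step 7 and the `videos`/`images` keys read in Step 10, and `build_qc_skeleton`/`count_entities` are hypothetical names.

```python
def build_qc_skeleton(metadata: dict) -> dict:
    """Mirror the experiment -> video -> image hierarchy with empty flag records."""
    def empty_record() -> dict:
        # Assumed record layout: parallel lists of flags, authors, and notes
        return {"flags": [], "authors": [], "notes": []}

    experiments = {}
    for exp_id, exp_meta in metadata.get("experiments", {}).items():
        videos = {}
        for vid_id, vid_meta in exp_meta.get("videos", {}).items():
            images = {img_id: empty_record() for img_id in vid_meta.get("image_ids", [])}
            videos[vid_id] = {**empty_record(), "images": images}
        experiments[exp_id] = {**empty_record(), "videos": videos}
    return {"experiments": experiments}


def count_entities(qc_json: dict) -> tuple:
    """Return (n_experiments, n_videos, n_images) for a skeleton."""
    exps = qc_json["experiments"]
    n_videos = sum(len(e["videos"]) for e in exps.values())
    n_images = sum(len(v["images"]) for e in exps.values() for v in e["videos"].values())
    return len(exps), n_videos, n_images


mock = {
    "experiments": {
        "20241215": {"videos": {
            "20241215_A01": {"image_ids": ["20241215_A01_t001", "20241215_A01_t002", "20241215_A01_t003"]},
            "20241215_A02": {"image_ids": ["20241215_A02_t001", "20241215_A02_t002"]},
        }},
        "20241216": {"videos": {
            "20241216_B01": {"image_ids": ["20241216_B01_t001", "20241216_B01_t002"]},
        }},
    }
}
skeleton = build_qc_skeleton(mock)
print(count_entities(skeleton))  # (2, 3, 7)
```

The counts match the "2 experiments, 3 videos, 7 images" line printed by the real initializer.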


## 🏷️ Step 2: View Available QC Flag Categories

Let's see what QC flags are available at each level.

In [5]:
# Show all valid QC flag categories
show_valid_qc_categories(quality_control_dir)

📋 Valid QC Flag Categories (from JSON file):

🏷️  Experiment Level:
   • PROTOCOL_DEVIATION: Deviation from standard imaging protocol
   • INCOMPLETE: Experiment was not completed

🏷️  Video Level:
   • TECHNICAL_ISSUE: Technical problems during video acquisition

🏷️  Image Level:
   • BLUR: Image is blurry (low variance of Laplacian)
   • DRY_WELL: Well dried out during imaging

🏷️  Embryo Level:
   • DEAD_EMBRYO: Embryo appears dead
   • EMBRYO_NOT_DETECTED: No embryo detected in expected location
   • ABNORMAL_DEVELOPMENT: Embryo shows abnormal development patterns

💡 To add new flag categories, edit the 'valid_qc_flag_categories'
   section in the experiment_data_qc.json file manually.
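Validation against these categories amounts to a lookup keyed by `"<level>_level"`, the same keys used in the saved JSON. A minimal sketch, assuming that key convention; `validate_flag` is a hypothetical helper whose error message is modeled on the `ValueError` the notebook raises in Step 4.

```python
# Copy of the categories printed above
VALID_QC_FLAG_CATEGORIES = {
    "experiment_level": {
        "PROTOCOL_DEVIATION": "Deviation from standard imaging protocol",
        "INCOMPLETE": "Experiment was not completed",
    },
    "video_level": {
        "TECHNICAL_ISSUE": "Technical problems during video acquisition",
    },
    "image_level": {
        "BLUR": "Image is blurry (low variance of Laplacian)",
        "DRY_WELL": "Well dried out during imaging",
    },
    "embryo_level": {
        "DEAD_EMBRYO": "Embryo appears dead",
        "EMBRYO_NOT_DETECTED": "No embryo detected in expected location",
        "ABNORMAL_DEVELOPMENT": "Embryo shows abnormal development patterns",
    },
}


def validate_flag(level: str, qc_flag: str, categories: dict = VALID_QC_FLAG_CATEGORIES) -> None:
    """Raise ValueError unless qc_flag is a known category for the given level."""
    valid = categories.get(f"{level}_level")
    if valid is None:
        known = sorted(key.removesuffix("_level") for key in categories)
        raise ValueError(f"Unknown level '{level}'. Known levels: {known}")
    if qc_flag not in valid:
        raise ValueError(f"Invalid QC flag '{qc_flag}' for level '{level}'. Valid flags: {list(valid)}")


validate_flag("image", "BLUR")  # passes silently
try:
    validate_flag("image", "CORRUPT")
except ValueError as err:
    print(err)  # Invalid QC flag 'CORRUPT' for level 'image'. Valid flags: ['BLUR', 'DRY_WELL']
```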


## 🎭 Step 3: Create QC Manager with Author Designation

Now we'll create our main QC interface with a default author.

In [6]:
# Create QC manager with author designation
qc = ExperimentDataQC(
    quality_control_dir=quality_control_dir,
    author_designation="demo_analyst"  # This will be used for all flags by default
)

print(f"🎭 Created QC manager: {qc}")
print(f"📝 Default author: {qc.author_designation}")
print(f"💾 Unsaved changes: {qc.has_unsaved_changes}")

🎭 Created QC manager: ExperimentDataQC(author='demo_analyst', experiments=2, categories=4, status=✅ saved)
📝 Default author: demo_analyst
💾 Unsaved changes: False
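The default-author and unsaved-changes behavior above boils down to two small ideas: fall back to a stored author when none is passed, and set a dirty flag on every write that saving clears. The hypothetical `FlagRecorder` class below (not the real `ExperimentDataQC`) makes that concrete; the parallel-list record layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlagRecorder:
    """Hypothetical stand-in illustrating default authors and a dirty flag."""
    author_designation: str
    records: dict = field(default_factory=dict)
    has_unsaved_changes: bool = False

    def flag(self, entity_id: str, qc_flag: str, author: Optional[str] = None, notes: str = "") -> None:
        rec = self.records.setdefault(entity_id, {"flags": [], "authors": [], "notes": []})
        rec["flags"].append(qc_flag)
        # Use the default author only when no explicit override is given
        rec["authors"].append(author if author is not None else self.author_designation)
        rec["notes"].append(notes)
        self.has_unsaved_changes = True

    def mark_saved(self) -> None:
        # A real save() would write to disk first; here we only clear the flag
        self.has_unsaved_changes = False


recorder = FlagRecorder(author_designation="demo_analyst")
recorder.flag("20241215_A01_t001", "BLUR", notes="Variance of Laplacian: 42")
recorder.flag("20241215_A02_t001", "DRY_WELL", author="expert_reviewer")
print(recorder.records["20241215_A02_t001"]["authors"])  # ['expert_reviewer']
print(recorder.has_unsaved_changes)  # True
```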


## 🚩 Step 4: Individual Flag Operations

Let's flag some issues at different levels. Notice how we don't need to specify the author each time!

In [10]:
# Flag an experiment-level issue
print("🔬 Flagging experiment-level issue...")
qc.flag_experiment(
    experiment_id="20241215",
    qc_flag="PROTOCOL_DEVIATION",
    notes="Temperature control failed during imaging"
)

# Flag a video-level issue  
print("📹 Flagging video-level issue...")
qc.flag_video(
    video_id="20241215_A01",
    qc_flag="TECHNICAL_ISSUE",
    notes="Camera focus drifted during acquisition"
)

# Flag multiple image-level issues
print("🖼️ Flagging image-level issues...")
qc.flag_image(
    image_id="20241215_A01_t001",
    qc_flag="BLUR",
    notes="Variance of Laplacian: 42 (below threshold of 100)"
)

# Flag with override author for special case
qc.flag_image(
    image_id="20241215_A02_t001",
    qc_flag="DRY_WELL",
    author="expert_reviewer",  # Override default author
    notes="Manual review: well appears dried out"
)

print(f"\n✅ Individual flags added!")
print(f"💾 Unsaved changes: {qc.has_unsaved_changes}")

🔬 Flagging experiment-level issue...
📹 Flagging video-level issue...
🖼️ Flagging image-level issues...

✅ Individual flags added!
💾 Unsaved changes: True
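`flag_image` only receives an image ID, yet the record lives under an experiment and a video. One plausible approach is a reverse index built once from the QC structure, giving constant-time lookup of an image's parents. This is an assumption (the library might instead parse the ID string); `build_image_index`, `flag_image_in_store`, and the record layout are illustrative names.

```python
def build_image_index(qc_json: dict) -> dict:
    """Map each image_id to its (experiment_id, video_id) parents."""
    index = {}
    for exp_id, exp in qc_json["experiments"].items():
        for vid_id, vid in exp["videos"].items():
            for img_id in vid["images"]:
                index[img_id] = (exp_id, vid_id)
    return index


def flag_image_in_store(qc_json: dict, index: dict, image_id: str, qc_flag: str,
                        author: str, notes: str = "") -> None:
    """Append one flag to an image record, locating it through the index."""
    if image_id not in index:
        raise KeyError(f"Image '{image_id}' not found in QC structure")
    exp_id, vid_id = index[image_id]
    record = qc_json["experiments"][exp_id]["videos"][vid_id]["images"][image_id]
    record["flags"].append(qc_flag)
    record["authors"].append(author)
    record["notes"].append(notes)


# Tiny hand-built structure for illustration
demo_store = {
    "experiments": {
        "20241215": {
            "videos": {
                "20241215_A01": {
                    "images": {
                        "20241215_A01_t001": {"flags": [], "authors": [], "notes": []},
                    }
                }
            }
        }
    }
}
index = build_image_index(demo_store)
flag_image_in_store(demo_store, index, "20241215_A01_t001", "BLUR", "demo_analyst",
                    notes="Variance of Laplacian: 42")
print(index["20241215_A01_t001"])  # ('20241215', '20241215_A01')
```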


In [11]:
# Flag validation in action: CORRUPT is not one of the image-level categories,
# so flag_image() raises a ValueError.
# As the error below shows, adding the category through the private `_qc_data`
# dict was not enough to make it valid. To add a category, edit the
# 'valid_qc_flag_categories' section of experiment_data_qc.json instead.
qc._qc_data["valid_qc_flag_categories"]["image_level"]["CORRUPT"] = "Cannot read/process image"

# Raises ValueError: CORRUPT is still rejected
qc.flag_image(
    image_id="20241215_A02_t001",
    qc_flag="CORRUPT",
    notes="Manual review: file corruption detected"
)


ValueError: Invalid QC flag 'CORRUPT' for level 'image'. Valid flags: ['BLUR', 'DRY_WELL']
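In an automated pipeline you may not want one unknown category to stop the whole run. A common pattern is to catch the `ValueError` per flag and collect the rejects for later review. The sketch uses a hypothetical `strict_flag` stand-in that raises the same kind of error as `flag_image`; swap in the real call when adapting it.

```python
VALID_IMAGE_FLAGS = ("BLUR", "DRY_WELL")


def strict_flag(image_id: str, qc_flag: str) -> dict:
    """Hypothetical stand-in for flag_image: raises ValueError on unknown flags."""
    if qc_flag not in VALID_IMAGE_FLAGS:
        raise ValueError(
            f"Invalid QC flag '{qc_flag}' for level 'image'. Valid flags: {list(VALID_IMAGE_FLAGS)}"
        )
    return {"image_id": image_id, "qc_flag": qc_flag}


def flag_many(requests):
    """Apply every (image_id, qc_flag) pair, collecting rejects instead of stopping."""
    accepted, rejected = [], []
    for image_id, qc_flag in requests:
        try:
            accepted.append(strict_flag(image_id, qc_flag))
        except ValueError as err:
            rejected.append((image_id, str(err)))
    return accepted, rejected


accepted, rejected = flag_many([
    ("20241215_A02_t001", "CORRUPT"),
    ("20241215_A02_t002", "BLUR"),
])
print(len(accepted), len(rejected))  # 1 1
```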

## 📦 Step 5: Batch Operations - Images

For large-scale QC, batch operations let you flag many images in a single call instead of one `flag_image` call per image.

In [14]:
# Demonstrate batch image flagging with different formats
print("📦 Demonstrating batch image flagging...")

# Prepare batch data in different formats
image_flags = {
    # Single flag per image
    "20241215_A01_t003": "BLUR",
    
    # Multiple flags per image (list of strings)
    "20241215_A02_t002": ["BLUR", "DRY_WELL"],
    
    # Detailed flags with custom authors/notes (list of dicts)
    "20241216_B01_t001": [
        {
            "qc_flag": "DRY_WELL", 
            "author": "automated_detector", 
            "notes": "Pixel intensity analysis detected drying"
        },
        {
            "qc_flag": "BLUR",
            "notes": "Confirmed by automated blur detection"
            # Uses default author since not specified
        }
    ]
}

# Apply batch flags
qc.flag_images_batch(image_flags)

print("✅ Batch image flags applied!")
print(f"📊 Flagged {len(image_flags)} images with various issues")

📦 Demonstrating batch image flagging...
✅ Batch image flags applied!
📊 Flagged 3 images with various issues
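The three input formats above (a string, a list of strings, a list of dicts) can all be expanded into one uniform entry shape before applying. A minimal sketch of that normalization, assuming an entry format with `level`, `entity_id`, `qc_flag`, `author`, and `notes` keys; `normalize_image_flags` is a hypothetical helper, not part of the library.

```python
def normalize_image_flags(flag_spec: dict, default_author: str) -> list:
    """Expand str / list[str] / list[dict] specs into uniform flag entries."""
    entries = []
    for image_id, spec in flag_spec.items():
        # Wrap a single string (or dict) so every spec can be iterated the same way
        items = [spec] if isinstance(spec, (str, dict)) else list(spec)
        for item in items:
            if isinstance(item, str):
                item = {"qc_flag": item}
            entries.append({
                "level": "image",
                "entity_id": image_id,
                "qc_flag": item["qc_flag"],
                # A per-flag author overrides the manager-wide default
                "author": item.get("author", default_author),
                "notes": item.get("notes", ""),
            })
    return entries


example_flags = {
    "20241215_A01_t003": "BLUR",
    "20241215_A02_t002": ["BLUR", "DRY_WELL"],
    "20241216_B01_t001": [
        {"qc_flag": "DRY_WELL", "author": "automated_detector",
         "notes": "Pixel intensity analysis detected drying"},
        {"qc_flag": "BLUR", "notes": "Confirmed by automated blur detection"},
    ],
}
entries = normalize_image_flags(example_flags, default_author="demo_analyst")
print(len(entries))  # 5
```

Note that `len(image_flags)` in the cell above counts images (3), while the batch actually contains 5 individual flags.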


## ⚡ Step 6: Advanced Batch Operations with gen_flag_batch

The most flexible approach: `gen_flag_batch` works across all entity levels.

In [16]:
# Method 1: Build batches for accumulation (DEFAULT)
print("⚡ Method 1: Building accumulated batch...")

batch = []
batch += qc.gen_flag_batch("experiment", "20241216", "INCOMPLETE", qc.author_designation, notes="Missing final time points")
batch += qc.gen_flag_batch("video", "20241216_B01", "TECHNICAL_ISSUE", qc.author_designation, notes="Lighting fluctuations")
batch += qc.gen_flag_batch("image", "20241216_B01_t002", "BLUR", qc.author_designation, notes="Motion blur detected")

print(f"📋 Built batch with {len(batch)} flag entries")

# Apply the accumulated batch
qc.add_flag_batch(batch)
print("✅ Accumulated batch applied!")

⚡ Method 1: Building accumulated batch...
📋 Built batch with 3 flag entries
✅ Accumulated batch applied!
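The `batch += qc.gen_flag_batch(...)` idiom works because each call returns a list. A minimal sketch of that contract plus an applier, assuming the entry keys shown and a flat `{level: {entity_id: record}}` store (a simplification of the nested QC file); `gen_flag_entry` and `apply_entries` are hypothetical.

```python
def gen_flag_entry(level: str, entity_id: str, qc_flag: str, author: str, notes: str = "") -> list:
    """Return a one-element list so `pending += gen_flag_entry(...)` concatenates cleanly."""
    return [{"level": level, "entity_id": entity_id, "qc_flag": qc_flag,
             "author": author, "notes": notes}]


def apply_entries(store: dict, entries: list) -> dict:
    """Append each entry to a flat {level: {entity_id: record}} store."""
    for entry in entries:
        record = store.setdefault(entry["level"], {}).setdefault(
            entry["entity_id"], {"flags": [], "authors": [], "notes": []}
        )
        record["flags"].append(entry["qc_flag"])
        record["authors"].append(entry["author"])
        record["notes"].append(entry["notes"])
    return store


pending = []
pending += gen_flag_entry("experiment", "20241216", "INCOMPLETE", "demo_analyst", notes="Missing final time points")
pending += gen_flag_entry("video", "20241216_B01", "TECHNICAL_ISSUE", "demo_analyst", notes="Lighting fluctuations")
flat_store = apply_entries({}, pending)
print(flat_store["video"]["20241216_B01"]["flags"])  # ['TECHNICAL_ISSUE']
```

Returning a list rather than a bare dict matters: `list += dict` is legal Python but extends the list with the dict's keys, which would silently corrupt the batch.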


In [19]:
# Method 2: Apply directly to memory (no batch needed)
print("⚡ Method 2: Direct application...")

# These flags are applied immediately
qc.gen_flag_batch(
    level="experiment", 
    entity_id="20241215", 
    qc_flag="INCOMPLETE",
    author=qc.author_designation,
    notes="Missing metadata file",
    apply_directly=True  # Applied immediately!
)

qc.gen_flag_batch(
    level="image",
    entity_id="20241215_A02_t002",
    qc_flag="BLUR",
    author=qc.author_designation,
    notes="Manual review: image out of focus",
    apply_directly=True
)

print("✅ Direct flags applied immediately!")

⚡ Method 2: Direct application...
✅ Direct flags applied immediately!
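The notebook does not show what `gen_flag_batch(..., apply_directly=True)` returns. The sketch below assumes it applies the entry and returns an empty list, so the same call can sit inside `batch += ...` without double-applying; treat that return value as a design suggestion rather than documented behavior. `gen_flag_batch_sketch` and the flat store are hypothetical.

```python
def gen_flag_batch_sketch(store: dict, level: str, entity_id: str, qc_flag: str,
                          author: str, notes: str = "", apply_directly: bool = False) -> list:
    """Either return a one-entry batch or apply the entry to `store` immediately."""
    entry = {"level": level, "entity_id": entity_id, "qc_flag": qc_flag,
             "author": author, "notes": notes}
    if not apply_directly:
        return [entry]
    record = store.setdefault(level, {}).setdefault(
        entity_id, {"flags": [], "authors": [], "notes": []}
    )
    record["flags"].append(qc_flag)
    record["authors"].append(author)
    record["notes"].append(notes)
    # Empty list: harmless if the caller still writes `queued += ...`
    return []


direct_store = {}
queued = []
queued += gen_flag_batch_sketch(direct_store, "image", "20241215_A02_t002", "BLUR",
                                "demo_analyst", apply_directly=True)
queued += gen_flag_batch_sketch(direct_store, "image", "20241215_A01_t001", "BLUR", "demo_analyst")
print(len(queued), direct_store["image"]["20241215_A02_t002"]["flags"])  # 1 ['BLUR']
```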


In [21]:
# Method 3: Mixed mode - conditional logic
print("⚡ Method 3: Mixed mode with conditional logic...")

# Simulate automated analysis results
automated_analysis = {
    "20241215_A01_t001": {"blur_score": 35, "confidence": 0.95, "needs_review": False},
    "20241215_A01_t002": {"blur_score": 65, "confidence": 0.70, "needs_review": True},
    "20241215_A01_t003": {"blur_score": 25, "confidence": 0.98, "needs_review": False}
}

batch = []
for image_id, analysis in automated_analysis.items():
    if analysis['needs_review']:
        # Low confidence -> collect in a batch, noted as needing manual review
        batch += qc.gen_flag_batch(
            "image", image_id, "BLUR", qc.author_designation,
            notes=f"Needs review: blur_score={analysis['blur_score']}, confidence={analysis['confidence']}"
        )
    else:
        # High confidence -> apply the BLUR flag directly
        qc.gen_flag_batch(
            "image", image_id, "BLUR", qc.author_designation,
            notes=f"Auto-confirmed: blur_score={analysis['blur_score']}",
            apply_directly=True
        )

# Apply batched flags that need review
if batch:
    qc.add_flag_batch(batch)
    print(f"📋 Applied {len(batch)} flags needing review")

print("✅ Mixed mode processing complete!")

⚡ Method 3: Mixed mode with conditional logic...
📋 Applied 1 flags needing review
✅ Mixed mode processing complete!
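The loop above flags every image and relies on a precomputed `needs_review` field. A sketch that derives both decisions from the scores themselves, assuming `blur_score` is on the same scale as the variance-of-Laplacian value quoted in Step 4 (threshold 100) and using a hypothetical 0.9 confidence cut-off; `route_blur` is an illustrative helper.

```python
from typing import Optional

BLUR_THRESHOLD = 100   # from the Step 4 note "below threshold of 100" (assumed same scale)
MIN_CONFIDENCE = 0.9   # hypothetical auto-accept cut-off


def route_blur(analysis: dict) -> Optional[str]:
    """Return 'direct', 'review', or None (image is sharp, no flag)."""
    if analysis["blur_score"] >= BLUR_THRESHOLD:
        return None
    return "direct" if analysis["confidence"] >= MIN_CONFIDENCE else "review"


analysis_results = {
    "20241215_A01_t001": {"blur_score": 35, "confidence": 0.95},
    "20241215_A01_t002": {"blur_score": 65, "confidence": 0.70},
    "20241215_A01_t003": {"blur_score": 25, "confidence": 0.98},
}
routes = {image_id: route_blur(a) for image_id, a in analysis_results.items()}
print(routes)
```

On the demo data this reproduces the hand-set `needs_review` values: t001 and t003 go direct, t002 goes to review.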


## 📊 Step 7: Check QC Summary and Status

Let's see what flags we've added and get a summary.

In [22]:
# Get QC summary
print("📊 QC Summary:")
print("=" * 50)

summary = qc.get_summary()
for level, flag_counts in summary.items():
    if flag_counts:  # Only show levels with flags
        print(f"\n🏷️ {level.replace('_', ' ').title()}:")
        for flag, count in flag_counts.items():
            print(f"   • {flag}: {count} occurrences")

# Check specific entity flags
print("\n🔍 Detailed Flag Examples:")
print("=" * 50)

# Check flags for a specific image (new name so the Step 5 `image_flags` dict is not overwritten)
flag_info = qc.get_image_flags("20241215_A01_t002")
print(f"\n📸 Image '20241215_A01_t002':")
print(f"   Flags: {flag_info['flags']}")
print(f"   Authors: {flag_info['authors']}")
print(f"   Notes: {flag_info['notes']}")

# Check manager status
print(f"\n💾 Manager Status: {qc}")

📊 QC Summary:

🔍 Detailed Flag Examples:

📸 Image '20241215_A01_t002':
   Flags: []
   Authors: []
   Notes: []

💾 Manager Status: ExperimentDataQC(author='demo_analyst', experiments=2, categories=4, status=⚠️  unsaved changes)
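The summary above printed no counts even though flags had been added, so it can help to cross-check by counting flags straight from a QC dict (for example the JSON written in Step 9). A minimal sketch with `collections.Counter`, assuming the nested `flags`/`videos`/`images` layout read in Step 10; embryo-level records are omitted because their location in the file isn't shown here. `summarize_flags` is a hypothetical helper.

```python
from collections import Counter


def summarize_flags(qc_json: dict) -> dict:
    """Count flag occurrences per level by walking experiments -> videos -> images."""
    counts = {"experiment_level": Counter(), "video_level": Counter(), "image_level": Counter()}
    for exp in qc_json.get("experiments", {}).values():
        counts["experiment_level"].update(exp.get("flags", []))
        for vid in exp.get("videos", {}).values():
            counts["video_level"].update(vid.get("flags", []))
            for img in vid.get("images", {}).values():
                counts["image_level"].update(img.get("flags", []))
    # Drop empty levels and convert Counters to plain dicts for printing
    return {level: dict(c) for level, c in counts.items() if c}


summary_example = {"experiments": {"20241215": {
    "flags": ["PROTOCOL_DEVIATION"],
    "videos": {"20241215_A01": {
        "flags": ["TECHNICAL_ISSUE"],
        "images": {
            "20241215_A01_t001": {"flags": ["BLUR"]},
            "20241215_A01_t002": {"flags": ["BLUR", "DRY_WELL"]},
        },
    }},
}}}
print(summarize_flags(summary_example))
```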


## ✅ Step 8: Validation and Integrity

Let's validate our flags and check data integrity.

In [23]:
# Check flag integrity
print("🔍 Checking flag integrity...")
integrity_result = qc.check_flag_integrity()

if integrity_result['valid']:
    print("\n✅ All flags are valid!")
else:
    print(f"\n⚠️ Found {len(integrity_result['issues'])} integrity issues")
    
# Show valid flags available
print("\n📋 Available flag categories:")
qc.show_valid_flags()

🔍 Checking flag integrity...
✅ All flags are valid!

✅ All flags are valid!

📋 Available flag categories:
📋 Valid QC Flag Categories (from JSON file):

🏷️  Experiment Level:
   • PROTOCOL_DEVIATION: Deviation from standard imaging protocol
   • INCOMPLETE: Experiment was not completed

🏷️  Video Level:
   • TECHNICAL_ISSUE: Technical problems during video acquisition

🏷️  Image Level:
   • BLUR: Image is blurry (low variance of Laplacian)
   • DRY_WELL: Well dried out during imaging

🏷️  Embryo Level:
   • DEAD_EMBRYO: Embryo appears dead
   • EMBRYO_NOT_DETECTED: No embryo detected in expected location
   • ABNORMAL_DEVELOPMENT: Embryo shows abnormal development patterns
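An integrity check is a post-hoc pass over every stored flag, which also catches flags written into the JSON by hand or by another script. A minimal sketch that returns the same `valid`/`issues` keys this notebook reads, assuming the nested layout from Step 10; `check_integrity` and the issue dict shape are hypothetical.

```python
def check_integrity(qc_json: dict, categories: dict) -> dict:
    """Return {'valid': bool, 'issues': [...]} listing flags absent from their level's categories."""
    issues = []

    def check(level: str, entity_id: str, record: dict) -> None:
        allowed = categories.get(f"{level}_level", {})
        for flag in record.get("flags", []):
            if flag not in allowed:
                issues.append({"level": level, "entity_id": entity_id, "flag": flag})

    for exp_id, exp in qc_json.get("experiments", {}).items():
        check("experiment", exp_id, exp)
        for vid_id, vid in exp.get("videos", {}).items():
            check("video", vid_id, vid)
            for img_id, img in vid.get("images", {}).items():
                check("image", img_id, img)
    return {"valid": not issues, "issues": issues}


example_categories = {
    "experiment_level": {"INCOMPLETE": "Experiment was not completed"},
    "video_level": {"TECHNICAL_ISSUE": "Technical problems during video acquisition"},
    "image_level": {"BLUR": "Image is blurry", "DRY_WELL": "Well dried out during imaging"},
}
integrity_example = {"experiments": {"20241216": {
    "flags": ["INCOMPLETE"],
    "videos": {"20241216_B01": {
        "flags": [],
        "images": {"20241216_B01_t002": {"flags": ["BLUR", "CORRUPT"]}},
    }},
}}}
result = check_integrity(integrity_example, example_categories)
print(result["valid"], result["issues"])
```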


## 💾 Step 9: Save QC Data

Finally, let's save our QC flags to disk.

In [24]:
# Save with backup
print("💾 Saving QC data...")
qc.save(backup=True)

print(f"\n📁 QC data saved to: {qc.qc_json_path}")
print(f"💾 Unsaved changes: {qc.has_unsaved_changes}")

# Show final status
print(f"\n🎯 Final Status: {qc}")

💾 Saving QC data...
📋 Created backup: /net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/quality_control/demo_qc/experiment_data_qc.backup_20250702_153713.json
💾 Saved QC data to /net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/quality_control/demo_qc/experiment_data_qc.json

📁 QC data saved to: /net/trapnell/vol1/home/mdcolon/proj/morphseq/segmentation_sandbox/data/quality_control/demo_qc/experiment_data_qc.json
💾 Unsaved changes: False

🎯 Final Status: ExperimentDataQC(author='demo_analyst', experiments=2, categories=4, status=✅ saved)
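The save output shows a timestamped backup written before the main file. One way to get the same effect, plus protection against a half-written file if the kernel dies mid-save, is to copy the old file and then write the new one atomically through a temp file and `os.replace`. This is a sketch; whether `qc.save()` writes atomically is not shown here, and `save_with_backup` is a hypothetical name.

```python
import json
import os
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_with_backup(data: dict, json_path, backup: bool = True) -> Optional[Path]:
    """Write `data` as JSON, first copying any existing file to a timestamped backup."""
    json_path = Path(json_path)
    backup_path = None
    if backup and json_path.exists():
        # Mirrors the backup name seen above: <stem>.backup_YYYYMMDD_HHMMSS.json
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup_path = json_path.with_name(f"{json_path.stem}.backup_{stamp}.json")
        shutil.copy2(json_path, backup_path)

    # Write to a temp file in the same directory, then swap it in with os.replace,
    # so a crash mid-write never leaves a half-written QC file behind
    fd, tmp_name = tempfile.mkstemp(dir=json_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp_name, json_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return backup_path
```

The temp file must live in the same directory as the target, since `os.replace` is only atomic within a single filesystem.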


## 🔍 Step 10: Inspect the Saved QC File

Let's look at the structure of our saved QC data.

In [25]:
# Load and display the QC JSON structure
with open(qc.qc_json_path, 'r') as f:
    saved_qc_data = json.load(f)

print("📋 QC JSON Structure:")
print("=" * 50)

# Show valid categories
print("\n🏷️ Valid QC Flag Categories:")
pprint(saved_qc_data['valid_qc_flag_categories'])

# Show experiment structure (just keys for overview)
print("\n🔬 Experiments with QC Data:")
for exp_id, exp_data in saved_qc_data['experiments'].items():
    print(f"\n📊 Experiment {exp_id}:")
    if exp_data.get('flags'):
        print(f"   Flags: {exp_data['flags']}")
    
    print(f"   Videos: {list(exp_data.get('videos', {}).keys())}")
    
    # Show video details
    for vid_id, vid_data in exp_data.get('videos', {}).items():
        if vid_data.get('flags') or vid_data.get('images', {}):
            print(f"     📹 {vid_id}:")
            if vid_data.get('flags'):
                print(f"       Flags: {vid_data['flags']}")
            
            flagged_images = {img_id: img_data['flags'] 
                            for img_id, img_data in vid_data.get('images', {}).items() 
                            if img_data.get('flags')}
            if flagged_images:
                print(f"       Flagged images: {len(flagged_images)}")
                for img_id, flags in list(flagged_images.items())[:3]:  # Show first 3
                    print(f"         📸 {img_id}: {flags}")

📋 QC JSON Structure:

🏷️ Valid QC Flag Categories:
{'embryo_level': {'ABNORMAL_DEVELOPMENT': 'Embryo shows abnormal development '
                                          'patterns',
                  'DEAD_EMBRYO': 'Embryo appears dead',
                  'EMBRYO_NOT_DETECTED': 'No embryo detected in expected '
                                         'location'},
 'experiment_level': {'INCOMPLETE': 'Experiment was not completed',
                      'PROTOCOL_DEVIATION': 'Deviation from standard imaging '
                                            'protocol'},
 'image_level': {'BLUR': 'Image is blurry (low variance of Laplacian)',
                 'DRY_WELL': 'Well dried out during imaging'},
 'video_level': {'TECHNICAL_ISSUE': 'Technical problems during video '
                                    'acquisition'}}

🔬 Experiments with QC Data:

📊 Experiment 20241215:
   Flags: ['PROTOCOL_DEVIATION', 'PROTOCOL_DEVIATION', 'PROTOCOL_DEVIATION', 'PROTOCOL_DEVIATION', 'INCOMPLETE', 'IN
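The (truncated) output above lists the same experiment flag several times. The jump in cell numbers (In [6] to In [10]) suggests cells were re-run, each run appending the flags again. A sketch of a cleanup pass, assuming each record stores parallel `flags`/`authors`/`notes` lists; a record holding only a `flags` list could use `list(dict.fromkeys(flags))` instead. `dedupe_record` is hypothetical.

```python
def dedupe_record(record: dict) -> dict:
    """Drop repeated (flag, author, notes) triples, keeping the first occurrence."""
    seen = set()
    cleaned = {"flags": [], "authors": [], "notes": []}
    for triple in zip(record["flags"], record["authors"], record["notes"]):
        if triple in seen:
            continue
        seen.add(triple)
        flag, author, notes = triple
        cleaned["flags"].append(flag)
        cleaned["authors"].append(author)
        cleaned["notes"].append(notes)
    return cleaned


repeated = {
    "flags": ["PROTOCOL_DEVIATION", "PROTOCOL_DEVIATION", "INCOMPLETE"],
    "authors": ["demo_analyst", "demo_analyst", "demo_analyst"],
    "notes": ["Temperature control failed during imaging"] * 2 + ["Missing metadata file"],
}
print(dedupe_record(repeated)["flags"])  # ['PROTOCOL_DEVIATION', 'INCOMPLETE']
```

Deduplicating on the full triple keeps two genuinely different observations of the same flag (e.g. from different authors) while removing exact repeats.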

## 🚀 Step 11: Advanced Usage - Standalone Functions

For scripts and pipelines, you can also use the standalone functions.

In [None]:
# Demonstrate standalone functions for pipeline use
print("🚀 Standalone function examples for pipelines...")

# Load QC data directly
qc_data = load_qc_data(quality_control_dir)
print(f"📂 Loaded QC data with {len(qc_data['experiments'])} experiments")

# Build batch with standalone function (use only categories listed in valid_qc_flag_categories)
batch = []
batch += gen_flag_batch("image", "20241216_B01_t001", "BLUR", "pipeline_script", "Auto-detected blur")
batch += gen_flag_batch("image", "20241216_B01_t002", "DRY_WELL", "pipeline_script", "Auto-detected dry well")

print(f"📋 Built batch with {len(batch)} entries")

# Apply batch to QC data
qc_data = add_flags_to_qc_data(qc_data, batch)
print("✅ Applied batch to QC data")

# Save directly
save_qc_data(qc_data, quality_control_dir)
print("💾 Saved QC data")

print("\n🎯 Perfect for automated QC pipelines!")
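The `qc_data = add_flags_to_qc_data(qc_data, batch)` pattern reads naturally if the function returns a new dict. Whether the real function mutates its input is not shown; the sketch below takes the non-mutating route with `copy.deepcopy`, so a failed batch (for example an unknown entity ID raising `KeyError`) leaves the loaded data untouched. `locate` and `add_flags_pure` are hypothetical names, and the entry keys (`level`, `entity_id`, `qc_flag`, `author`, `notes`) are an assumed batch format.

```python
import copy


def locate(qc_json: dict, level: str, entity_id: str) -> dict:
    """Find the record for an experiment, video, or image by linear search."""
    for exp_id, exp in qc_json["experiments"].items():
        if level == "experiment" and exp_id == entity_id:
            return exp
        for vid_id, vid in exp.get("videos", {}).items():
            if level == "video" and vid_id == entity_id:
                return vid
            if level == "image" and entity_id in vid.get("images", {}):
                return vid["images"][entity_id]
    raise KeyError(f"{level} '{entity_id}' not found")


def add_flags_pure(qc_json: dict, entries: list) -> dict:
    """Return an updated deep copy; the caller's dict is left untouched."""
    updated = copy.deepcopy(qc_json)
    for entry in entries:
        record = locate(updated, entry["level"], entry["entity_id"])
        record.setdefault("flags", []).append(entry["qc_flag"])
        record.setdefault("authors", []).append(entry["author"])
        record.setdefault("notes", []).append(entry["notes"])
    return updated


original = {
    "experiments": {
        "20241216": {
            "videos": {
                "20241216_B01": {
                    "images": {
                        "20241216_B01_t001": {"flags": [], "authors": [], "notes": []},
                    }
                }
            }
        }
    }
}
new_entries = [{"level": "image", "entity_id": "20241216_B01_t001", "qc_flag": "BLUR",
                "author": "pipeline_script", "notes": "Auto-detected blur"}]
updated = add_flags_pure(original, new_entries)
img_path = ("experiments", "20241216", "videos", "20241216_B01", "images", "20241216_B01_t001")
print(original["experiments"]["20241216"]["videos"]["20241216_B01"]["images"]["20241216_B01_t001"]["flags"])  # []
print(updated["experiments"]["20241216"]["videos"]["20241216_B01"]["images"]["20241216_B01_t001"]["flags"])  # ['BLUR']
```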

## 🎉 Demo Complete!

### What We Accomplished:

✅ **Initialized** QC system from experiment metadata  
✅ **Created** QC manager with author designation  
✅ **Flagged** entities at the experiment, video, and image levels (embryo-level categories are also available)  
✅ **Used batch operations** for efficient large-scale QC  
✅ **Demonstrated** flexible `gen_flag_batch` functionality  
✅ **Validated** flag integrity and viewed summaries  
✅ **Saved** QC data to persistent storage  
✅ **Explored** standalone functions for pipelines  

### Key Benefits:

🎭 **Author designation** eliminates repetitive author specification  
⚡ **Flexible batching** supports both accumulation and direct application  
🔍 **Automatic validation** ensures only valid flags are used  
💾 **In-memory operations** keep edits fast; nothing is written to disk until you call `save()`  
📦 **Multiple interfaces** support both interactive and pipeline use  

### Next Steps:

1. **Integrate** with your image analysis pipelines
2. **Customize** QC flag categories in the JSON file
3. **Build** automated QC detection algorithms
4. **Create** QC reports and visualizations
5. **Scale** to your full dataset!

In [None]:
# Clean up demo files (optional)
import shutil

# Uncomment to remove demo directory
# shutil.rmtree(demo_dir)
# print("🧹 Demo files cleaned up")

print("🎉 Demo complete! Ready to implement in your MorphSeq pipeline.")