# ML Engineer SageMaker Pipeline for YOLOv11 Object Detection

This notebook builds a SageMaker Pipeline that trains, evaluates, registers, and deploys a YOLOv11 object detection model. It replaces manually managed individual training jobs with a single orchestrated MLOps pipeline.

## Pipeline Architecture Overview

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Data Validation │───▶│ Model Training  │───▶│ Model Evaluation│
│ ProcessingStep  │    │ TrainingStep    │    │ ProcessingStep  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Serverless      │◀───│ Model Creation  │◀───│ Performance     │
│ Endpoint Deploy │    │ CreateModelStep │    │ Condition Check │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                ▲                       │
                                │                       ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │ Approval        │◀───│ Model Registry  │
                       │ Condition Check │    │ RegisterModel   │
                       └─────────────────┘    └─────────────────┘
```
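
The ordering in the diagram can also be written down as a dependency map. The following is a minimal sketch in plain Python with no SageMaker calls; the step labels are shorthand for the boxes above and are assumptions for illustration, not names defined by the SageMaker SDK.

```python
from graphlib import TopologicalSorter

# Each key lists the steps that must finish before it can start.
# Labels are illustrative shorthand for the boxes in the diagram.
PIPELINE_DEPENDENCIES = {
    "DataValidation": set(),
    "ModelTraining": {"DataValidation"},
    "ModelEvaluation": {"ModelTraining"},
    "PerformanceCheck": {"ModelEvaluation"},
    "ModelRegistry": {"PerformanceCheck"},
    "ApprovalCheck": {"ModelRegistry"},
    "ModelCreation": {"ApprovalCheck"},
    "ServerlessDeploy": {"ModelCreation"},
}


def execution_order(dependencies):
    """Return one valid execution order for the dependency map."""
    return list(TopologicalSorter(dependencies).static_order())


print(" -> ".join(execution_order(PIPELINE_DEPENDENCIES)))
```

Because the flow is a single chain, there is only one valid order, running from data validation to the serverless deployment.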

## Key Features

- **Complete Pipeline Orchestration**: End-to-end automation from data to deployment
- **Conditional Logic**: Performance-based deployment decisions
- **Model Registry Integration**: Centralized model management and versioning
- **Serverless Endpoints**: Cost-effective inference with auto-scaling
- **MLflow Integration**: Comprehensive experiment tracking and lineage
- **Error Recovery**: Robust error handling and retry mechanisms
- **Performance Monitoring**: Real-time pipeline execution monitoring

## Prerequisites

- AWS account with appropriate permissions
- AWS CLI configured with "ab" profile
- SageMaker Studio access with ML Engineer role
- Access to drone imagery dataset in S3: `lucaskle-ab3-project-pv`
- YOLOv11 training and inference containers in ECR
- SageMaker managed MLflow tracking server

Let's start by setting up our environment and importing the necessary libraries.

In [None]:
# Install required packages for SageMaker Pipelines.
# Each requirement is quoted so the shell does not treat ">=" as output redirection.
!pip install --quiet "sagemaker>=2.190.0" "boto3>=1.28.0" "pandas>=2.0.0" "matplotlib>=3.7.0" \
    "numpy>=1.24.0" "PyYAML>=6.0" "mlflow>=3.0.0" "requests-auth-aws-sigv4>=0.7" "ipywidgets>=8.0.0"

print("✅ Required packages installed successfully!")

In [None]:
# Core imports
import os
import json
import time
import boto3
import sagemaker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from IPython.display import display, HTML, Markdown
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual

# SageMaker Pipeline imports
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep, CreateModelStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo, ConditionEquals
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.parameters import ParameterString, ParameterFloat, ParameterInteger
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.properties import PropertyFile

# SageMaker components
from sagemaker.estimator import Estimator
from sagemaker.processing import ProcessingInput, ProcessingOutput, Processor
from sagemaker.inputs import TrainingInput
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.serverless import ServerlessInferenceConfig

# MLflow imports
import mlflow
import mlflow.sagemaker

print("✅ All imports successful!")

In [None]:
# Set up AWS session with "ab" profile
session = boto3.Session(profile_name='ab')
sagemaker_session = sagemaker.Session(boto_session=session)
pipeline_session = PipelineSession(boto_session=session)
sagemaker_client = session.client('sagemaker')
region = session.region_name
account_id = session.client('sts').get_caller_identity()['Account']

# Configuration
BUCKET_NAME = 'lucaskle-ab3-project-pv'
ROLE_ARN = sagemaker_session.get_caller_identity_arn()
MODEL_PACKAGE_GROUP_NAME = "yolov11-drone-detection-models"
PIPELINE_NAME = "yolov11-training-pipeline"

# Set up MLflow tracking with SageMaker managed server
try:
    tracking_server_arn = "arn:aws:sagemaker:us-east-1:192771711075:mlflow-tracking-server/sagemaker-core-setup-mlflow-server"
    mlflow.set_tracking_uri(tracking_server_arn)
    
    # set_experiment creates the experiment if it does not exist and makes it
    # the active one, so a separate create_experiment call is not needed
    experiment_name = "yolov11-pipeline-experiments"
    mlflow.set_experiment(experiment_name)
    print(f"Using MLflow experiment: {experiment_name}")
    
    print(f"✅ Connected to SageMaker managed MLflow server")
    print(f"Tracking Server ARN: {tracking_server_arn}")
    
except Exception as e:
    print(f"⚠️  Could not connect to SageMaker managed MLflow: {e}")
    print("Using basic MLflow setup as fallback")
    experiment_name = "yolov11-pipeline-experiments"
    mlflow.set_experiment(experiment_name)

# Display configuration
print(f"\n📋 Configuration Summary:")
print(f"   Data Bucket: {BUCKET_NAME}")
print(f"   Region: {region}")
print(f"   Account ID: {account_id}")
print(f"   Role ARN: {ROLE_ARN}")
print(f"   Model Package Group: {MODEL_PACKAGE_GROUP_NAME}")
print(f"   Pipeline Name: {PIPELINE_NAME}")
print(f"   MLflow Experiment: {experiment_name}")

print("\n✅ Environment setup complete!")

## 1. Dataset Discovery and Validation

Before creating our pipeline, let's discover and validate available YOLOv11 datasets. The pipeline will use parameterized dataset paths for flexibility.
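
The layout check performed in the next cell can be illustrated without touching S3. This is a minimal sketch that works on a flat list of object keys; the dataset name `demo` and the example keys are hypothetical, while the required components are the ones this notebook validates.

```python
# Components the validation cell in this notebook checks for.
REQUIRED_COMPONENTS = [
    "train/images/",
    "train/labels/",
    "validation/images/",
    "validation/labels/",
    "data.yaml",
    "dataset_info.json",
]


def missing_components(keys, dataset_prefix):
    """Return required components absent from a flat list of S3 object keys."""
    missing = []
    for component in REQUIRED_COMPONENTS:
        target = dataset_prefix + component
        if component.endswith("/"):
            # A "directory" counts only if at least one real object sits under it.
            found = any(k.startswith(target) and not k.endswith("/") for k in keys)
        else:
            found = target in keys
        if not found:
            missing.append(component)
    return missing


# Hypothetical key listing for a dataset named "demo".
example_keys = [
    "datasets/demo/train/images/0001.jpg",
    "datasets/demo/train/labels/0001.txt",
    "datasets/demo/validation/images/0002.jpg",
    "datasets/demo/data.yaml",
]
print(missing_components(example_keys, "datasets/demo/"))
```

For the example keys, the function reports `validation/labels/` and `dataset_info.json` as missing, so this dataset would be marked invalid.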

In [None]:
def discover_yolo_datasets(bucket, prefix="datasets/"):
    """Discover available YOLOv11 datasets with comprehensive validation"""
    s3_client = session.client('s3')
    
    try:
        response = s3_client.list_objects_v2(
            Bucket=bucket,
            Prefix=prefix,
            Delimiter='/'
        )
        
        datasets = []
        if 'CommonPrefixes' in response:
            for obj in response['CommonPrefixes']:
                dataset_prefix = obj['Prefix']
                dataset_name = dataset_prefix.split('/')[-2]
                
                # Validate dataset structure
                validation_result = validate_yolo_dataset_structure(bucket, dataset_prefix)
                
                datasets.append({
                    'name': dataset_name,
                    'prefix': dataset_prefix,
                    'full_path': f's3://{bucket}/{dataset_prefix}',
                    'valid': validation_result['valid'],
                    'validation_details': validation_result
                })
        
        return datasets
        
    except Exception as e:
        print(f"❌ Error discovering datasets: {e}")
        return []

def validate_yolo_dataset_structure(bucket, dataset_prefix):
    """Validate YOLOv11 dataset structure for pipeline compatibility"""
    s3_client = session.client('s3')
    
    # Required structure for YOLOv11 pipeline
    required_structure = {
        'train/images/': False,
        'train/labels/': False,
        'validation/images/': False,  # Note: validation/ not val/
        'validation/labels/': False,
        'data.yaml': False,
        'dataset_info.json': False
    }
    
    validation_details = {
        'valid': False,
        'missing_components': [],
        'found_components': [],
        'train_image_count': 0,
        'val_image_count': 0,
        'train_label_count': 0,
        'val_label_count': 0,
        'pipeline_ready': False
    }
    
    try:
        print(f"🔍 Validating dataset: {dataset_prefix}")
        
        for required_path in required_structure.keys():
            full_path = dataset_prefix + required_path
            
            if required_path.endswith('/'):
                # Check directories with pagination for large datasets
                paginator = s3_client.get_paginator('list_objects_v2')
                page_iterator = paginator.paginate(
                    Bucket=bucket,
                    Prefix=full_path,
                    PaginationConfig={'PageSize': 1000}
                )
                
                file_count = 0
                has_files = False
                
                for page in page_iterator:
                    if 'Contents' in page:
                        has_files = True
                        page_files = [obj for obj in page['Contents'] 
                                    if not obj['Key'].endswith('/') and 
                                       obj['Key'] != full_path]
                        file_count += len(page_files)
                
                if has_files and file_count > 0:
                    required_structure[required_path] = True
                    validation_details['found_components'].append(required_path)
                    
                    # Store counts
                    if 'train/images/' in required_path:
                        validation_details['train_image_count'] = file_count
                    elif 'validation/images/' in required_path:
                        validation_details['val_image_count'] = file_count
                    elif 'train/labels/' in required_path:
                        validation_details['train_label_count'] = file_count
                    elif 'validation/labels/' in required_path:
                        validation_details['val_label_count'] = file_count
                    
                    print(f"   ✅ {required_path}: {file_count:,} files")
                else:
                    validation_details['missing_components'].append(required_path)
                    print(f"   ❌ {required_path}: Not found or empty")
            else:
                # Check individual files
                try:
                    s3_client.head_object(Bucket=bucket, Key=full_path)
                    required_structure[required_path] = True
                    validation_details['found_components'].append(required_path)
                    print(f"   ✅ {required_path}: Found")
                except Exception:  # avoid a bare except, which also traps KeyboardInterrupt
                    validation_details['missing_components'].append(required_path)
                    print(f"   ❌ {required_path}: Not found")
        
        # Dataset is valid if all required components are found
        validation_details['valid'] = all(required_structure.values())
        
        # Additional pipeline readiness checks
        if validation_details['valid']:
            # Check for reasonable dataset sizes
            min_images = 100  # Minimum for meaningful training
            train_images = validation_details['train_image_count']
            val_images = validation_details['val_image_count']
            
            pipeline_ready = (
                train_images >= min_images and
                val_images >= min_images // 4 and  # At least 25% of the minimum training count
                validation_details['train_image_count'] == validation_details['train_label_count'] and
                validation_details['val_image_count'] == validation_details['val_label_count']
            )
            
            validation_details['pipeline_ready'] = pipeline_ready
            
            if pipeline_ready:
                print(f"   🚀 Dataset is PIPELINE READY")
            else:
                print(f"   ⚠️  Dataset valid but may have issues for pipeline training")
        
        # Summary
        total_images = validation_details['train_image_count'] + validation_details['val_image_count']
        total_labels = validation_details['train_label_count'] + validation_details['val_label_count']
        
        print(f"   📊 Summary: {total_images:,} images, {total_labels:,} labels")
        print(f"   Status: {'✅ VALID' if validation_details['valid'] else '❌ INVALID'}")
        
    except Exception as e:
        print(f"❌ Error during validation: {e}")
    
    return validation_details

# Discover and validate datasets
print("🔍 Discovering YOLOv11 datasets for pipeline...")
available_datasets = discover_yolo_datasets(BUCKET_NAME)

if available_datasets:
    print(f"\n📊 Found {len(available_datasets)} datasets:")
    print("=" * 80)
    
    pipeline_ready_datasets = []
    for i, dataset in enumerate(available_datasets):
        status_icon = "🚀" if dataset['validation_details'].get('pipeline_ready', False) else "✅" if dataset['valid'] else "❌"
        print(f"{i+1}. {status_icon} {dataset['name']}")
        print(f"   Path: {dataset['full_path']}")
        
        # Bind details before branching: the else branch below reads it too
        details = dataset['validation_details']
        if dataset['valid']:
            print(f"   📈 Training: {details['train_image_count']:,} images, {details['train_label_count']:,} labels")
            print(f"   📊 Validation: {details['val_image_count']:,} images, {details['val_label_count']:,} labels")
            
            if details.get('pipeline_ready', False):
                pipeline_ready_datasets.append(dataset)
        else:
            print(f"   ⚠️  Missing: {', '.join(details['missing_components'])}")
        
        print("-" * 80)
    
    print(f"\n🚀 {len(pipeline_ready_datasets)} dataset(s) ready for pipeline execution")
    
    # Store for pipeline configuration (already module-level, so no `global` statement is needed)
    validated_datasets = available_datasets
    pipeline_ready_datasets_global = pipeline_ready_datasets
    
else:
    print("❌ No datasets found. Please prepare datasets using the Data Scientist notebook first.")
    validated_datasets = []
    pipeline_ready_datasets_global = []

## 2. Pipeline Parameters Configuration

Define parameterized pipeline configuration for flexibility and reusability.

In [None]:
# Interactive dataset selection widget
def create_dataset_selector():
    """Create interactive widget for dataset selection"""
    if not pipeline_ready_datasets_global:
        print("❌ No pipeline-ready datasets available")
        return None
    
    dataset_options = [(f"{ds['name']} ({ds['validation_details']['train_image_count'] + ds['validation_details']['val_image_count']:,} images)", 
                       ds) for ds in pipeline_ready_datasets_global]
    
    dataset_selector = widgets.Dropdown(
        options=dataset_options,
        description='Dataset:',
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='600px')
    )
    
    return dataset_selector

# Create parameter configuration widgets
def create_pipeline_config_widgets():
    """Create interactive widgets for pipeline configuration"""
    
    # Dataset selection
    dataset_selector = create_dataset_selector()
    if not dataset_selector:
        return None
    
    # Model configuration
    model_variant = widgets.Dropdown(
        options=['yolov11n', 'yolov11s', 'yolov11m', 'yolov11l', 'yolov11x'],
        value='yolov11n',
        description='Model Variant:',
        style={'description_width': 'initial'}
    )
    
    image_size = widgets.IntSlider(
        value=640,
        min=320,
        max=1280,
        step=32,
        description='Image Size:',
        style={'description_width': 'initial'}
    )
    
    # Training configuration
    batch_size = widgets.IntSlider(
        value=16,
        min=4,
        max=64,
        step=4,
        description='Batch Size:',
        style={'description_width': 'initial'}
    )
    
    epochs = widgets.IntSlider(
        value=50,
        min=10,
        max=200,
        step=10,
        description='Epochs:',
        style={'description_width': 'initial'}
    )
    
    learning_rate = widgets.FloatLogSlider(
        value=0.001,
        base=10,
        min=-5,
        max=-1,
        step=0.1,
        description='Learning Rate:',
        style={'description_width': 'initial'}
    )
    
    # Infrastructure configuration
    instance_type = widgets.Dropdown(
        options=['ml.g4dn.xlarge', 'ml.g4dn.2xlarge', 'ml.g4dn.4xlarge', 'ml.p3.2xlarge', 'ml.p3.8xlarge'],
        value='ml.g4dn.xlarge',
        description='Instance Type:',
        style={'description_width': 'initial'}
    )
    
    use_spot = widgets.Checkbox(
        value=True,
        description='Use Spot Instances',
        style={'description_width': 'initial'}
    )
    
    # Performance thresholds
    performance_threshold = widgets.FloatSlider(
        value=0.3,
        min=0.1,
        max=0.9,
        step=0.05,
        description='mAP@0.5 Threshold:',
        style={'description_width': 'initial'}
    )
    
    # Deployment configuration
    auto_deploy = widgets.Checkbox(
        value=False,
        description='Auto-deploy if approved',
        style={'description_width': 'initial'}
    )
    
    endpoint_name = widgets.Text(
        value=f"yolov11-endpoint-{datetime.now().strftime('%Y-%m-%d-%H-%M')}",
        description='Endpoint Name:',
        style={'description_width': 'initial'},
        layout=widgets.Layout(width='400px')
    )
    
    return {
        'dataset_selector': dataset_selector,
        'model_variant': model_variant,
        'image_size': image_size,
        'batch_size': batch_size,
        'epochs': epochs,
        'learning_rate': learning_rate,
        'instance_type': instance_type,
        'use_spot': use_spot,
        'performance_threshold': performance_threshold,
        'auto_deploy': auto_deploy,
        'endpoint_name': endpoint_name
    }

# Create configuration widgets
config_widgets = create_pipeline_config_widgets()

if config_widgets:
    print("🎛️ Pipeline Configuration Interface")
    print("=" * 50)
    
    # Display widgets in organized groups
    dataset_box = widgets.VBox([
        widgets.HTML("<h3>📊 Dataset Configuration</h3>"),
        config_widgets['dataset_selector']
    ])
    
    model_box = widgets.VBox([
        widgets.HTML("<h3>🤖 Model Configuration</h3>"),
        config_widgets['model_variant'],
        config_widgets['image_size']
    ])
    
    training_box = widgets.VBox([
        widgets.HTML("<h3>🏋️ Training Configuration</h3>"),
        config_widgets['batch_size'],
        config_widgets['epochs'],
        config_widgets['learning_rate']
    ])
    
    infrastructure_box = widgets.VBox([
        widgets.HTML("<h3>🏗️ Infrastructure Configuration</h3>"),
        config_widgets['instance_type'],
        config_widgets['use_spot']
    ])
    
    deployment_box = widgets.VBox([
        widgets.HTML("<h3>🚀 Deployment Configuration</h3>"),
        config_widgets['performance_threshold'],
        config_widgets['auto_deploy'],
        config_widgets['endpoint_name']
    ])
    
    # Display all configuration widgets
    display(widgets.VBox([
        dataset_box,
        model_box,
        training_box,
        infrastructure_box,
        deployment_box
    ]))
    
    print("\n💡 Configure the parameters above, then run the next cell to create the pipeline.")
    
else:
    print("❌ Cannot create configuration interface - no pipeline-ready datasets available")
   
def extract_config_values():
    """Extract current values from configuration widgets"""
    if not config_widgets:
        return None
    
    selected_dataset = config_widgets['dataset_selector'].value
    
    return {
        'dataset_name': selected_dataset['name'],
        'dataset_prefix': selected_dataset['prefix'],
        'dataset_path': selected_dataset['full_path'],
        'model_variant': config_widgets['model_variant'].value,
        'image_size': config_widgets['image_size'].value,
        'batch_size': config_widgets['batch_size'].value,
        'epochs': config_widgets['epochs'].value,
        'learning_rate': config_widgets['learning_rate'].value,
        'instance_type': config_widgets['instance_type'].value,
        'use_spot': config_widgets['use_spot'].value,
        'performance_threshold': config_widgets['performance_threshold'].value,
        'auto_deploy': config_widgets['auto_deploy'].value,
        'endpoint_name': config_widgets['endpoint_name'].value
    }
   
print("✅ Configuration interface ready!")


## 3. Pipeline Parameters Definition

Define SageMaker Pipeline parameters for dynamic configuration and reusability.
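
Parameters declared this way can be overridden per execution by name. The sketch below is a hypothetical helper, not part of the SageMaker SDK, that checks an override dictionary against the parameter names and types declared in the next cell before it is handed to the pipeline.

```python
# Parameter names and Python types as declared in this notebook's pipeline.
DECLARED_PARAMETERS = {
    "DatasetPath": str,
    "DatasetName": str,
    "ModelVariant": str,
    "ImageSize": int,
    "BatchSize": int,
    "Epochs": int,
    "LearningRate": float,
    "InstanceType": str,
    "UseSpot": str,  # "true" / "false", stored as a string
    "PerformanceThreshold": float,
    "EndpointName": str,
    "ModelOutputPath": str,
    "EvaluationOutputPath": str,
}


def check_overrides(overrides):
    """Validate execution-time overrides; raise on unknown names or wrong types."""
    for name, value in overrides.items():
        if name not in DECLARED_PARAMETERS:
            raise KeyError(f"Unknown pipeline parameter: {name}")
        expected = DECLARED_PARAMETERS[name]
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(
                f"{name} expects {expected.__name__}, got {type(value).__name__}"
            )
    return overrides


overrides = check_overrides({"Epochs": 100, "PerformanceThreshold": 0.5})
# With a created pipeline object, these would be passed as:
# pipeline.start(parameters=overrides)
```

The check is deliberately strict: an integer passed for a float parameter is rejected, so a threshold has to be written as `0.5` or `1.0` rather than `1`.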

In [None]:
def create_pipeline_parameters():
    """Create SageMaker Pipeline parameters"""
    
    # Extract current configuration
    config = extract_config_values()
    if not config:
        print("❌ No configuration available")
        return None, None  # callers unpack two values
    
    # Define pipeline parameters with current values as defaults
    parameters = {
        # Dataset parameters
        'dataset_path': ParameterString(
            name="DatasetPath",
            default_value=config['dataset_path']
        ),
        'dataset_name': ParameterString(
            name="DatasetName",
            default_value=config['dataset_name']
        ),
        
        # Model parameters
        'model_variant': ParameterString(
            name="ModelVariant",
            default_value=config['model_variant']
        ),
        'image_size': ParameterInteger(
            name="ImageSize",
            default_value=config['image_size']
        ),
        
        # Training parameters
        'batch_size': ParameterInteger(
            name="BatchSize",
            default_value=config['batch_size']
        ),
        'epochs': ParameterInteger(
            name="Epochs",
            default_value=config['epochs']
        ),
        'learning_rate': ParameterFloat(
            name="LearningRate",
            default_value=config['learning_rate']
        ),
        
        # Infrastructure parameters
        'instance_type': ParameterString(
            name="InstanceType",
            default_value=config['instance_type']
        ),
        'use_spot': ParameterString(
            name="UseSpot",
            default_value="true" if config['use_spot'] else "false"
        ),
        
        # Performance threshold
        'performance_threshold': ParameterFloat(
            name="PerformanceThreshold",
            default_value=config['performance_threshold']
        ),
        
        # Deployment parameters
        'endpoint_name': ParameterString(
            name="EndpointName",
            default_value=config['endpoint_name']
        ),
        
        # Output paths
        'model_output_path': ParameterString(
            name="ModelOutputPath",
            default_value=f"s3://{BUCKET_NAME}/pipeline-artifacts/models"
        ),
        'evaluation_output_path': ParameterString(
            name="EvaluationOutputPath",
            default_value=f"s3://{BUCKET_NAME}/pipeline-artifacts/evaluation"
        )
    }
    
    return parameters, config

# Create pipeline parameters
pipeline_parameters, current_config = create_pipeline_parameters()

if pipeline_parameters:
    print("✅ Pipeline parameters created successfully!")
    print("\n📋 Parameter Summary:")
    for name, param in pipeline_parameters.items():
        print(f"   {name}: {param.default_value}")
else:
    print("❌ Failed to create pipeline parameters")

## 4. Pipeline Step Definitions

Define each step of the SageMaker Pipeline with proper dependencies and data flow.
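
The performance condition defined below reads `metrics.mAP_50` from an `evaluation.json` file produced by the evaluation step. The following sketch shows a report layout that would satisfy that path; the contents of `scripts/evaluate_model.py` are not shown in this notebook, so the writer function and the metric values here are assumptions for illustration.

```python
import json
import os
import tempfile


def write_evaluation_report(output_dir, map_50, map_50_95):
    """Write evaluation.json in the layout the condition step's JSON path expects."""
    report = {"metrics": {"mAP_50": map_50, "mAP_50_95": map_50_95}}
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "evaluation.json")
    with open(path, "w") as f:
        json.dump(report, f, indent=2)
    return path


def resolve_json_path(document, json_path):
    """Resolve a dotted path such as 'metrics.mAP_50' against a parsed JSON document."""
    value = document
    for key in json_path.split("."):
        value = value[key]
    return value


# Hypothetical metric values, written to a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp:
    report_path = write_evaluation_report(tmp, map_50=0.42, map_50_95=0.27)
    with open(report_path) as f:
        report = json.load(f)

print(resolve_json_path(report, "metrics.mAP_50"))
```

Inside the processing container, the output directory would be `/opt/ml/processing/evaluation`, which is the source path of the `evaluation` output declared on the evaluation step.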

In [None]:
def create_data_validation_step(parameters):
    """Create data validation processing step"""
    
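    # Data validation processor.
    # ScriptProcessor is used because the base Processor ignores the step's
    # `code` argument; this assumes the image provides a python3 interpreter.
    validation_processor = sagemaker.processing.ScriptProcessor(
        image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-preprocessing:latest",
        command=["python3"],
        role=ROLE_ARN,
        instance_count=1,
        instance_type="ml.m5.large",
        sagemaker_session=pipeline_session
    )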
    
    # Define processing step
    validation_step = ProcessingStep(
        name="DataValidation",
        processor=validation_processor,
        inputs=[
            ProcessingInput(
                source=parameters['dataset_path'],
                destination="/opt/ml/processing/input",
                input_name="dataset"
            )
        ],
        outputs=[
            ProcessingOutput(
                output_name="validation_report",
                source="/opt/ml/processing/output",
                destination=f"s3://{BUCKET_NAME}/pipeline-artifacts/validation"
            )
        ],
        code="scripts/validate_dataset.py",
        job_arguments=[
            "--dataset-name", parameters['dataset_name'],
            "--model-variant", parameters['model_variant']
        ]
    )
    
    return validation_step

def create_training_step(parameters, validation_step):
    """Create YOLOv11 training step"""
    
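    # Spot usage must be a plain Python bool when the pipeline is defined.
    # Comparing the ParameterString object itself to "true" is evaluated once at
    # definition time and does not reflect the parameter's value, so read the
    # default value instead. Changing UseSpot per execution has no effect here.
    use_spot = parameters['use_spot'].default_value == "true"

    # Training estimator
    estimator = Estimator(
        image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-training:latest",
        role=ROLE_ARN,
        instance_count=1,
        instance_type=parameters['instance_type'],
        output_path=parameters['model_output_path'],
        sagemaker_session=pipeline_session,
        use_spot_instances=use_spot,
        max_wait=3600 if use_spot else None,  # must be >= max_run when spot is used
        max_run=3600,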
        hyperparameters={
            "model_variant": parameters['model_variant'],
            "image_size": parameters['image_size'],
            "batch_size": parameters['batch_size'],
            "epochs": parameters['epochs'],
            "learning_rate": parameters['learning_rate'],
            "dataset_name": parameters['dataset_name']
        }
    )
    
    # Training step
    training_step = TrainingStep(
        name="YOLOv11Training",
        estimator=estimator,
        inputs={
            # The channel holds images and YOLO label files, so no Parquet
            # content type is set
            "training": TrainingInput(
                s3_data=parameters['dataset_path']
            )
        },
        depends_on=[validation_step.name]  # Wait for validation to complete
    )
    
    return training_step

def create_evaluation_step(parameters, training_step):
    """Create model evaluation processing step"""
    
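    # Evaluation processor.
    # ScriptProcessor is used because the base Processor ignores the step's
    # `code` argument; this assumes the image provides a python3 interpreter.
    evaluation_processor = sagemaker.processing.ScriptProcessor(
        image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-evaluation:latest",
        command=["python3"],
        role=ROLE_ARN,
        instance_count=1,
        instance_type="ml.g4dn.xlarge",  # GPU for faster evaluation
        sagemaker_session=pipeline_session
    )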
    
    # Property file for evaluation metrics
    evaluation_report = PropertyFile(
        name="EvaluationReport",
        output_name="evaluation",
        path="evaluation.json"
    )
    
    # Evaluation step
    evaluation_step = ProcessingStep(
        name="ModelEvaluation",
        processor=evaluation_processor,
        inputs=[
            ProcessingInput(
                source=training_step.properties.ModelArtifacts.S3ModelArtifacts,
                destination="/opt/ml/processing/model",
                input_name="model"
            ),
            ProcessingInput(
                source=parameters['dataset_path'],
                destination="/opt/ml/processing/test",
                input_name="test_data"
            )
        ],
        outputs=[
            ProcessingOutput(
                output_name="evaluation",
                source="/opt/ml/processing/evaluation",
                destination=parameters['evaluation_output_path']
            )
        ],
        property_files=[evaluation_report],
        code="scripts/evaluate_model.py",
        job_arguments=[
            "--model-variant", parameters['model_variant'],
            "--dataset-name", parameters['dataset_name']
        ]
    )
    
    return evaluation_step, evaluation_report

def create_performance_condition(evaluation_step, evaluation_report, parameters):
    """Create condition step for performance threshold checking"""
    
    # Condition: mAP@0.5 >= threshold
    performance_condition = ConditionGreaterThanOrEqualTo(
        left=JsonGet(
            step_name=evaluation_step.name,
            property_file=evaluation_report,
            json_path="metrics.mAP_50"
        ),
        right=parameters['performance_threshold']
    )
    
    return performance_condition

def create_model_registration_step(parameters, training_step, evaluation_step):
    """Create model registration step for Model Registry"""
    
    # Local imports keep this function self-contained
    from sagemaker.model_metrics import MetricsSource, ModelMetrics
    from sagemaker.workflow.functions import Join

    # RegisterModel expects a ModelMetrics object that points at a metrics file
    # in S3, not a list of name/value dictionaries. The evaluation step writes
    # evaluation.json (mAP@0.5 and mAP@0.5:0.95) to its "evaluation" output.
    evaluation_s3_uri = evaluation_step.properties.ProcessingOutputConfig.Outputs["evaluation"].S3Output.S3Uri
    model_metrics = ModelMetrics(
        model_statistics=MetricsSource(
            s3_uri=Join(on="/", values=[evaluation_s3_uri, "evaluation.json"]),
            content_type="application/json"
        )
    )

    # Model registration
    register_model_step = RegisterModel(
        name="RegisterYOLOv11Model",
        estimator=training_step.estimator,
        model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
        content_types=["application/json", "image/jpeg", "image/png"],
        response_types=["application/json"],
        inference_instances=["ml.t2.medium", "ml.m5.large", "ml.g4dn.xlarge"],
        transform_instances=["ml.m5.large", "ml.g4dn.xlarge"],
        model_package_group_name=MODEL_PACKAGE_GROUP_NAME,
        approval_status="PendingManualApproval",
        model_metrics=model_metrics
    )
    
    return register_model_step

def create_model_creation_step(parameters, register_model_step):
    """Create model creation step for deployment"""
    
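    # Model creation for deployment.
    # A RegisterModel step exposes the DescribeModelPackage response, which has
    # no ModelArtifacts field; the artifact URI is read from the first
    # inference container of the registered package instead.
    model = Model(
        image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-inference:latest",
        model_data=register_model_step.properties.InferenceSpecification.Containers[0].ModelDataUrl,
        role=ROLE_ARN,
        sagemaker_session=pipeline_session
    )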
    
    create_model_step = CreateModelStep(
        name="CreateYOLOv11Model",
        model=model
    )
    
    return create_model_step

def create_serverless_endpoint_step(parameters, create_model_step):
    """Create serverless endpoint deployment step"""
    
    # Serverless inference settings, forwarded to the deployment script below
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=20,
        provisioned_concurrency=1  # Keep warm for faster response
    )
    
    # Endpoint deployment processor.
    # ScriptProcessor is used because the base Processor ignores the step's
    # `code` argument; this assumes the image provides a python3 interpreter.
    deployment_processor = sagemaker.processing.ScriptProcessor(
        image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/sagemaker-deployment:latest",
        command=["python3"],
        role=ROLE_ARN,
        instance_count=1,
        instance_type="ml.t3.medium",
        sagemaker_session=pipeline_session
    )
    
    # Deployment step. The model name is a string, not an S3 location, so it is
    # passed as a job argument rather than as a ProcessingInput. This assumes
    # the deployment script accepts a --model-name argument.
    deployment_step = ProcessingStep(
        name="DeployServerlessEndpoint",
        processor=deployment_processor,
        outputs=[
            ProcessingOutput(
                output_name="deployment_status",
                source="/opt/ml/processing/output",
                destination=f"s3://{BUCKET_NAME}/pipeline-artifacts/deployment"
            )
        ],
        code="scripts/deploy_serverless_endpoint.py",
        job_arguments=[
            "--model-name", create_model_step.properties.ModelName,
            "--endpoint-name", parameters['endpoint_name'],
            "--memory-size", str(serverless_config.memory_size_in_mb),
            "--max-concurrency", str(serverless_config.max_concurrency),
            "--provisioned-concurrency", str(serverless_config.provisioned_concurrency)
        ]
    )
    
    return deployment_step

print("✅ Pipeline step definition functions created!")
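
The serverless settings above appear twice, once in `ServerlessInferenceConfig` and once as string literals in `job_arguments`, so they can drift apart. The sketch below shows one way to validate them once and derive the script arguments from a single source. `build_serverless_args` and `ALLOWED_MEMORY_MB` are hypothetical names, not part of the SageMaker SDK, and the memory sizes are an assumption based on the 1 GB steps serverless inference is understood to accept; check them against current AWS limits.

```python
from typing import List, Optional

# Assumed valid serverless memory sizes (1 GB steps); verify against AWS limits.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_args(
    memory_mb: int = 4096,
    max_concurrency: int = 20,
    provisioned_concurrency: Optional[int] = 1,
) -> List[str]:
    """Validate serverless settings and return them as CLI-style arguments."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}, got {memory_mb}")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    args = [
        "--memory-size", str(memory_mb),
        "--max-concurrency", str(max_concurrency),
    ]
    if provisioned_concurrency is not None:
        # Provisioned capacity cannot exceed the concurrency ceiling.
        if not 1 <= provisioned_concurrency <= max_concurrency:
            raise ValueError("provisioned_concurrency must be between 1 and max_concurrency")
        args += ["--provisioned-concurrency", str(provisioned_concurrency)]
    return args


print(build_serverless_args())
```

The returned list could be appended to `job_arguments` after the `--endpoint-name` pair, with the same three values passed to `ServerlessInferenceConfig`.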

## 5. Pipeline Assembly and Creation

Assemble all pipeline steps with proper conditional logic and dependencies.

In [None]:
def create_complete_pipeline(parameters):
    """Create the complete SageMaker Pipeline with all steps and conditions"""
    
    print("🔧 Creating pipeline steps...")
    
    # Step 1: Data Validation
    validation_step = create_data_validation_step(parameters)
    print("   ✅ Data validation step created")
    
    # Step 2: Model Training
    training_step = create_training_step(parameters, validation_step)
    print("   ✅ Training step created")
    
    # Step 3: Model Evaluation
    evaluation_step, evaluation_report = create_evaluation_step(parameters, training_step)
    print("   ✅ Evaluation step created")
    
    # Step 4: Performance Condition Check
    performance_condition = create_performance_condition(evaluation_step, evaluation_report, parameters)
    print("   ✅ Performance condition created")
    
    # Step 5: Model Registration (conditional on performance)
    register_model_step = create_model_registration_step(parameters, training_step, evaluation_step)
    print("   ✅ Model registration step created")
    
    # Step 6: Model Creation for Deployment
    create_model_step = create_model_creation_step(parameters, register_model_step)
    print("   ✅ Model creation step created")
    
    # Step 7: Serverless Endpoint Deployment
    deployment_step = create_serverless_endpoint_step(parameters, create_model_step)
    print("   ✅ Serverless deployment step created")
    
    # Create conditional step for performance-based registration
    performance_condition_step = ConditionStep(
        name="CheckPerformanceThreshold",
        conditions=[performance_condition],
        if_steps=[register_model_step],
        else_steps=[]  # No registration if performance is below threshold
    )
    
    # Create approval condition for deployment
    # Note: In practice, this would check Model Registry approval status
    # For this demo, we'll use auto-deploy configuration
    auto_deploy_condition = ConditionEquals(
        left="true",  # This would be replaced with actual approval status check
        right="true"
    )
    
    deployment_condition_step = ConditionStep(
        name="CheckDeploymentApproval",
        conditions=[auto_deploy_condition],
        if_steps=[create_model_step, deployment_step],
        else_steps=[]
    )
    
    # Assemble pipeline steps
    pipeline_steps = [
        validation_step,
        training_step,
        evaluation_step,
        performance_condition_step,
        deployment_condition_step
    ]
    
    # Create the pipeline
    pipeline = Pipeline(
        name=PIPELINE_NAME,
        parameters=list(parameters.values()),
        steps=pipeline_steps,
        sagemaker_session=pipeline_session
    )
    
    print("   ✅ Pipeline assembly completed")
    
    return pipeline

# Create the complete pipeline
if pipeline_parameters:
    print("🚀 Creating YOLOv11 SageMaker Pipeline...")
    yolov11_pipeline = create_complete_pipeline(pipeline_parameters)
    
    print(f"\n✅ Pipeline '{PIPELINE_NAME}' created successfully!")
    print(f"\n📋 Pipeline Summary:")
    print(f"   Name: {yolov11_pipeline.name}")
    print(f"   Steps: {len(yolov11_pipeline.steps)}")
    print(f"   Parameters: {len(yolov11_pipeline.parameters)}")
    
    # Display pipeline structure
    print(f"\n🔗 Pipeline Flow:")
    print("   1. DataValidation (ProcessingStep)")
    print("   2. YOLOv11Training (TrainingStep)")
    print("   3. ModelEvaluation (ProcessingStep)")
    print("   4. CheckPerformanceThreshold (ConditionStep)")
    print("      ├─ IF mAP@0.5 >= threshold: RegisterYOLOv11Model")
    print("      └─ ELSE: Skip registration")
    print("   5. CheckDeploymentApproval (ConditionStep)")
    print("      ├─ IF approved: CreateYOLOv11Model → DeployServerlessEndpoint")
    print("      └─ ELSE: Skip deployment")
    
else:
    print("❌ Cannot create pipeline - parameters not available")
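
The branching in the pipeline flow above can be traced with a plain-Python sketch. It does not call SageMaker. `steps_that_run` is a hypothetical helper, and the mAP and threshold values are illustrative. It also encodes one assumption: because the model creation step reads properties of the registration step, deployment is treated as skipped whenever registration is skipped. The order of the returned list is illustrative, since SageMaker schedules steps by dependency.

```python
from typing import List


def steps_that_run(map50: float, threshold: float, approved: bool = True) -> List[str]:
    """Return the step names expected to run for one pipeline execution."""
    ran = [
        "DataValidation",
        "YOLOv11Training",
        "ModelEvaluation",
        "CheckPerformanceThreshold",
    ]
    registered = map50 >= threshold
    if registered:
        ran.append("RegisterYOLOv11Model")
    ran.append("CheckDeploymentApproval")
    # Assumption: model creation depends on the registration step's output,
    # so it cannot run when registration was skipped.
    if registered and approved:
        ran += ["CreateYOLOv11Model", "DeployServerlessEndpoint"]
    return ran


for score in (0.62, 0.31):  # illustrative mAP@0.5 values
    print(score, steps_that_run(score, threshold=0.5))
```

Note that a below-threshold model ends the execution without registration or deployment, yet the execution itself can still report `Succeeded`.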


## 6. Pipeline Execution with MLflow Integration

Execute the pipeline with comprehensive MLflow tracking and monitoring.

In [None]:
def execute_pipeline_with_mlflow(pipeline, parameters, config):
    """Execute pipeline with comprehensive MLflow tracking"""
    
    # Start MLflow run for pipeline execution
    run_name = f"pipeline-{config['dataset_name']}-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
    
    with mlflow.start_run(run_name=run_name) as run:
        print(f"🚀 Starting pipeline execution with MLflow tracking")
        print(f"   MLflow Run ID: {run.info.run_id}")
        print(f"   Run Name: {run_name}")
        
        # Log pipeline parameters to MLflow
        mlflow.log_params({
            "pipeline_name": pipeline.name,
            "dataset_name": config['dataset_name'],
            "dataset_path": config['dataset_path'],
            "model_variant": config['model_variant'],
            "image_size": config['image_size'],
            "batch_size": config['batch_size'],
            "epochs": config['epochs'],
            "learning_rate": config['learning_rate'],
            "instance_type": config['instance_type'],
            "use_spot": config['use_spot'],
            "performance_threshold": config['performance_threshold'],
            "auto_deploy": config['auto_deploy']
        })
        
        # Set MLflow tags
        mlflow.set_tags({
            "pipeline_type": "sagemaker_pipeline",
            "model_type": "YOLOv11",
            "task_type": "object_detection",
            "execution_type": "automated_pipeline",
            "dataset": config['dataset_name'],
            "infrastructure": "sagemaker"
        })
        
        try:
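            # Create the pipeline, or update its definition if it already exists.
            # upsert() is used because create() fails when the pipeline is already defined.
            print("\n📝 Creating/updating pipeline definition...")
            pipeline.upsert(role_arn=ROLE_ARN)
            print("   ✅ Pipeline definition created/updated")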
            
            # Start pipeline execution
            print("\n🎬 Starting pipeline execution...")
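            # Key overrides by each parameter's declared name, which may differ
            # from the dictionary key used in this notebook, and skip unset defaults
            execution = pipeline.start(
                parameters={
                    param.name: param.default_value
                    for param in parameters.values()
                    if param.default_value is not None
                }
            )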
            
            execution_arn = execution.arn
            print(f"   ✅ Pipeline execution started")
            print(f"   Execution ARN: {execution_arn}")
            
            # Log execution details to MLflow
            mlflow.log_param("pipeline_execution_arn", execution_arn)
            mlflow.log_param("execution_start_time", datetime.now().isoformat())
            mlflow.set_tag("pipeline_status", "running")
            
            return execution, run.info.run_id
            
        except Exception as e:
            print(f"❌ Pipeline execution failed: {str(e)}")
            
            # Log error to MLflow
            mlflow.set_tag("pipeline_status", "failed")
            mlflow.set_tag("error_message", str(e))
            
            return None, run.info.run_id

# Execute the pipeline
if 'yolov11_pipeline' in locals() and yolov11_pipeline and current_config:
    print("🎯 Ready to execute YOLOv11 Pipeline")
    print("\n⚠️  Note: This will start a complete pipeline execution which may take 1-2 hours")
    print("   and incur AWS costs for training instances and storage.")
    
    # Confirmation widget
    execute_button = widgets.Button(
        description="🚀 Execute Pipeline",
        button_style='success',
        layout=widgets.Layout(width='200px', height='40px')
    )
    
    output_widget = widgets.Output()
    
    def on_execute_clicked(b):
        with output_widget:
            output_widget.clear_output()
            print("🚀 Executing pipeline...")
            
            global pipeline_execution, mlflow_run_id
            pipeline_execution, mlflow_run_id = execute_pipeline_with_mlflow(
                yolov11_pipeline, 
                pipeline_parameters, 
                current_config
            )
            
            if pipeline_execution:
                print(f"\n✅ Pipeline execution started successfully!")
                print(f"   You can monitor progress in the next section.")
            else:
                print(f"\n❌ Pipeline execution failed. Check the error messages above.")
    
    execute_button.on_click(on_execute_clicked)
    
    display(widgets.VBox([
        widgets.HTML("<h3>🚀 Pipeline Execution</h3>"),
        widgets.HTML("<p>Click the button below to start the complete YOLOv11 training pipeline:</p>"),
        execute_button,
        output_widget
    ]))
    
else:
    print("❌ Pipeline not ready for execution")
    print("   Please ensure the pipeline was created successfully in the previous steps.")
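
The execution ARN logged to MLflow above also carries the pipeline name and execution ID, which can be handy as separate tags when filtering runs. A minimal sketch, assuming the ARN follows the `...:pipeline/<name>/execution/<id>` layout; `parse_execution_arn` is a hypothetical helper, and the ARN in the example is a made-up placeholder.

```python
from typing import Dict


def parse_execution_arn(execution_arn: str) -> Dict[str, str]:
    """Split a pipeline execution ARN into region, account, pipeline name and execution ID."""
    prefix, sep, resource = execution_arn.partition(":pipeline/")
    if not sep:
        raise ValueError(f"Not a pipeline ARN: {execution_arn}")
    pipeline_name, sep, execution_id = resource.partition("/execution/")
    if not sep or not execution_id:
        raise ValueError(f"Not a pipeline execution ARN: {execution_arn}")
    parts = prefix.split(":")  # arn, partition, service, region, account
    if len(parts) != 5:
        raise ValueError(f"Unexpected ARN prefix: {prefix}")
    return {
        "region": parts[3],
        "account_id": parts[4],
        "pipeline_name": pipeline_name,
        "execution_id": execution_id,
    }


# Placeholder ARN for illustration only.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:pipeline/yolov11-pipeline/execution/abc123def456"
print(parse_execution_arn(example_arn))
```

The resulting dictionary could be passed to `mlflow.set_tags` alongside the tags already set in `execute_pipeline_with_mlflow`.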

## 7. Real-time Pipeline Monitoring

Monitor pipeline execution with on-demand status updates and step-by-step progress tracking.

In [None]:
def create_pipeline_monitor():
    """Create interactive pipeline monitoring interface"""
    
    # Status display widgets
    status_html = widgets.HTML(value="<h3>📊 Pipeline Status: Not Started</h3>")
    progress_bar = widgets.IntProgress(
        value=0,
        min=0,
        max=100,
        description='Progress:',
        bar_style='info',
        style={'bar_color': '#1f77b4'},
        layout=widgets.Layout(width='500px')
    )
    
    steps_output = widgets.Output()
    metrics_output = widgets.Output()
    
    # Control buttons
    refresh_button = widgets.Button(
        description="🔄 Refresh Status",
        button_style='info',
        layout=widgets.Layout(width='150px')
    )
    
    stop_button = widgets.Button(
        description="⏹️ Stop Pipeline",
        button_style='danger',
        layout=widgets.Layout(width='150px')
    )
    
    def update_pipeline_status():
        """Update pipeline status and progress"""
        if 'pipeline_execution' not in globals() or not pipeline_execution:
            status_html.value = "<h3>📊 Pipeline Status: No Active Execution</h3>"
            return
        
        try:
            # Get execution status
            execution_status = pipeline_execution.describe()
            
            # Update status display
            status = execution_status['PipelineExecutionStatus']
            status_color = {
                'Executing': 'orange',
                'Succeeded': 'green',
                'Failed': 'red',
                'Stopped': 'gray'
            }.get(status, 'blue')
            
            status_html.value = f"<h3 style='color: {status_color}'>📊 Pipeline Status: {status}</h3>"
            
            # Update progress bar
            if status == 'Executing':
                progress_bar.bar_style = 'info'
                progress_bar.value = 50  # Approximate progress
            elif status == 'Succeeded':
                progress_bar.bar_style = 'success'
                progress_bar.value = 100
            elif status == 'Failed':
                progress_bar.bar_style = 'danger'
                progress_bar.value = 0
            
            # Update steps status
            with steps_output:
                steps_output.clear_output()
                print("🔗 Pipeline Steps Status:")
                print("=" * 50)
                
                # Get step executions
                steps = pipeline_execution.list_steps()
                
                for i, step in enumerate(steps):
                    step_name = step['StepName']
                    step_status = step['StepStatus']
                    
                    status_icon = {
                        'Executing': '🔄',
                        'Succeeded': '✅',
                        'Failed': '❌',
                        'Stopped': '⏹️'
                    }.get(step_status, '⏳')
                    
                    print(f"{i+1}. {status_icon} {step_name}: {step_status}")
                    
                    # Show additional details for active/failed steps
                    if step_status in ['Executing', 'Failed']:
                        if 'FailureReason' in step:
                            print(f"   ⚠️  Reason: {step['FailureReason']}")
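                        if 'StartTime' in step:
                            # StartTime is timezone-aware; take "now" in the same zone
                            # instead of discarding the zone information
                            start_time = step['StartTime']
                            elapsed = datetime.now(start_time.tzinfo) - start_time
                            print(f"   ⏱️  Elapsed: {str(elapsed).split('.')[0]}")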
            
            # Update MLflow with current status
            if 'mlflow_run_id' in globals() and mlflow_run_id:
                with mlflow.start_run(run_id=mlflow_run_id):
                    mlflow.set_tag("pipeline_status", status.lower())
                    mlflow.log_metric("pipeline_progress", progress_bar.value)
                    
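                    if status == 'Succeeded':
                        # Use a tag rather than a param: MLflow params cannot be
                        # overwritten, so a second refresh would otherwise raise
                        end_time = execution_status.get('LastModifiedTime') or datetime.now()
                        mlflow.set_tag("execution_end_time", end_time.isoformat())
                        mlflow.set_tag("pipeline_completed", "true")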
            
            # Update metrics if available
            with metrics_output:
                metrics_output.clear_output()
                if status == 'Succeeded':
                    print("📈 Pipeline Metrics:")
                    print("=" * 30)
                    print("✅ Pipeline completed successfully!")
                    print("📊 Check Model Registry for registered model")
                    print("🚀 Check endpoints for deployed model (if auto-deploy enabled)")
                elif status == 'Failed':
                    print("❌ Pipeline Execution Failed")
                    print("=" * 30)
                    print("Check the steps status above for failure details.")
                    print("Review CloudWatch logs for detailed error information.")
                
        except Exception as e:
            status_html.value = f"<h3 style='color: red'>❌ Error monitoring pipeline: {str(e)}</h3>"
    
    def on_refresh_clicked(b):
        update_pipeline_status()
    
    def on_stop_clicked(b):
        if 'pipeline_execution' in globals() and pipeline_execution:
            try:
                pipeline_execution.stop()
                status_html.value = "<h3 style='color: orange'>⏹️ Pipeline Stop Requested</h3>"
            except Exception as e:
                status_html.value = f"<h3 style='color: red'>❌ Error stopping pipeline: {str(e)}</h3>"
    
    refresh_button.on_click(on_refresh_clicked)
    stop_button.on_click(on_stop_clicked)
    
    # Initial status update
    update_pipeline_status()
    
    return widgets.VBox([
        status_html,
        progress_bar,
        widgets.HBox([refresh_button, stop_button]),
        widgets.HTML("<h4>📋 Step Details:</h4>"),
        steps_output,
        widgets.HTML("<h4>📊 Execution Metrics:</h4>"),
        metrics_output
    ])

# Create and display monitoring interface
print("📊 Pipeline Monitoring Interface")
print("=" * 40)

monitoring_interface = create_pipeline_monitor()
display(monitoring_interface)

print("\n💡 Use the 'Refresh Status' button to update the pipeline status.")
print("   Each refresh shows the latest status of every pipeline step.")
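
The monitor above sets the progress bar to a fixed 50% while the pipeline is executing. A rough alternative is to derive the percentage from the step list. The sketch below assumes step dictionaries shaped like the ones returned by `list_steps()` (with a `StepStatus` key) and takes the expected step count as an input, since conditional branches mean some steps may never run. `estimate_progress` is a hypothetical helper.

```python
from typing import Dict, List


def estimate_progress(steps: List[Dict], expected_steps: int) -> int:
    """Return the share of expected steps that have succeeded, as an integer 0-100."""
    if expected_steps <= 0:
        raise ValueError("expected_steps must be positive")
    succeeded = sum(1 for step in steps if step.get("StepStatus") == "Succeeded")
    # Cap at 100 in case more steps ran than were expected.
    return min(100, (100 * succeeded) // expected_steps)


# Illustrative step list, not real execution output.
sample_steps = [
    {"StepName": "DataValidation", "StepStatus": "Succeeded"},
    {"StepName": "YOLOv11Training", "StepStatus": "Succeeded"},
    {"StepName": "ModelEvaluation", "StepStatus": "Executing"},
]
print(estimate_progress(sample_steps, expected_steps=8))
```

Inside `update_pipeline_status`, the result could replace the hard-coded value assigned to `progress_bar.value` for the `Executing` state.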

## 8. Pipeline Results Analysis

Analyze pipeline execution results, model performance, and deployment status.

In [None]:
def analyze_pipeline_results():
    """Analyze and display comprehensive pipeline results"""
    
    if 'pipeline_execution' not in globals() or not pipeline_execution:
        print("❌ No pipeline execution to analyze")
        return
    
    try:
        # Get execution details
        execution_status = pipeline_execution.describe()
        steps = pipeline_execution.list_steps()
        
        print("📊 PIPELINE EXECUTION ANALYSIS")
        print("=" * 60)
        
        # Overall execution summary
        status = execution_status['PipelineExecutionStatus']
        start_time = execution_status.get('CreationTime')
        end_time = execution_status.get('LastModifiedTime')
        
        print(f"\n🎯 Execution Summary:")
        print(f"   Status: {status}")
        print(f"   Start Time: {start_time}")
        # LastModifiedTime equals the end time only once the execution has finished
        print(f"   Last Modified: {end_time}")
        
        if start_time and end_time:
            duration = end_time - start_time
            print(f"   Duration: {str(duration).split('.')[0]}")
        
        # Step-by-step analysis
        print(f"\n🔗 Step Analysis:")
        successful_steps = 0
        failed_steps = 0
        
        for step in steps:
            step_name = step['StepName']
            step_status = step['StepStatus']
            
            if step_status == 'Succeeded':
                successful_steps += 1
                print(f"   ✅ {step_name}: {step_status}")
            elif step_status == 'Failed':
                failed_steps += 1
                print(f"   ❌ {step_name}: {step_status}")
                if 'FailureReason' in step:
                    print(f"      Reason: {step['FailureReason']}")
            else:
                print(f"   🔄 {step_name}: {step_status}")
        
        print(f"\n📈 Step Summary: {successful_steps} succeeded, {failed_steps} failed")
        
        # Model Registry analysis
        if status == 'Succeeded':
            print(f"\n🏆 SUCCESS ANALYSIS:")
            
            # Check Model Registry for new models
            try:
                models = sagemaker_client.list_model_packages(
                    ModelPackageGroupName=MODEL_PACKAGE_GROUP_NAME,
                    SortBy='CreationTime',
                    SortOrder='Descending',
                    MaxResults=5
                )
                
                recent_models = models.get('ModelPackageSummaryList', [])
                if recent_models:
                    latest_model = recent_models[0]
                    print(f"   📦 Latest Model Registered:")
                    print(f"      ARN: {latest_model['ModelPackageArn']}")
                    print(f"      Status: {latest_model['ModelPackageStatus']}")
                    print(f"      Approval: {latest_model['ModelApprovalStatus']}")
                    print(f"      Created: {latest_model['CreationTime']}")
                
            except Exception as e:
                print(f"   ⚠️  Could not retrieve model registry info: {e}")
            
            # Check for deployed endpoints
            try:
                endpoints = sagemaker_client.list_endpoints(
                    SortBy='CreationTime',
                    SortOrder='Descending',
                    MaxResults=10
                )
                
                recent_endpoints = [
                    ep for ep in endpoints.get('Endpoints', [])
                    if current_config['endpoint_name'] in ep['EndpointName']
                ]
                
                if recent_endpoints:
                    endpoint = recent_endpoints[0]
                    print(f"   🚀 Endpoint Deployed:")
                    print(f"      Name: {endpoint['EndpointName']}")
                    print(f"      Status: {endpoint['EndpointStatus']}")
                    print(f"      Created: {endpoint['CreationTime']}")
                else:
                    print(f"   ℹ️  No matching endpoints found (auto-deploy may be disabled)")
                
            except Exception as e:
                print(f"   ⚠️  Could not retrieve endpoint info: {e}")
        
        elif status == 'Failed':
            print(f"\n❌ FAILURE ANALYSIS:")
            print(f"   Pipeline execution failed. Common causes:")
            print(f"   • Data validation issues")
            print(f"   • Training job failures (resource limits, code errors)")
            print(f"   • Condition step errors (for example, a missing metric in the evaluation report)")
            print(f"   • Infrastructure or permission issues")
            print(f"\n   💡 Check CloudWatch logs for detailed error information.")
        
        # MLflow integration summary
        if 'mlflow_run_id' in globals() and mlflow_run_id:
            print(f"\n📊 MLflow Integration:")
            print(f"   Run ID: {mlflow_run_id}")
            print(f"   Experiment: {experiment_name}")
            print(f"   All pipeline parameters and metrics logged to MLflow")
        
        # Recommendations
        print(f"\n💡 RECOMMENDATIONS:")
        if status == 'Succeeded':
            print(f"   ✅ Pipeline completed successfully!")
            print(f"   • Review model performance in Model Registry")
            print(f"   • Consider approving model for production deployment")
            print(f"   • Set up monitoring for deployed endpoints")
            print(f"   • Compare results with previous pipeline runs")
        else:
            print(f"   🔧 Pipeline needs attention:")
            print(f"   • Review failed step details above")
            print(f"   • Check CloudWatch logs for detailed errors")
            print(f"   • Verify dataset quality and format")
            print(f"   • Consider adjusting hyperparameters")
            print(f"   • Ensure sufficient compute resources")
        
    except Exception as e:
        print(f"❌ Error analyzing pipeline results: {str(e)}")

# Create analysis button
analysis_button = widgets.Button(
    description="📊 Analyze Results",
    button_style='info',
    layout=widgets.Layout(width='200px', height='40px')
)

analysis_output = widgets.Output()

def on_analysis_clicked(b):
    with analysis_output:
        analysis_output.clear_output()
        analyze_pipeline_results()

analysis_button.on_click(on_analysis_clicked)

display(widgets.VBox([
    widgets.HTML("<h3>📊 Pipeline Results Analysis</h3>"),
    widgets.HTML("<p>Click below to analyze the pipeline execution results:</p>"),
    analysis_button,
    analysis_output
]))
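
Beyond counting succeeded and failed steps, it often helps to see where the time went. A minimal sketch, assuming each step dictionary carries timezone-aware `StartTime` and `EndTime` values as in the `list_steps()` output; `step_durations` is a hypothetical helper and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def step_durations(steps: List[Dict]) -> List[Tuple[str, timedelta]]:
    """Return (step name, duration) pairs for finished steps, longest first."""
    rows = []
    for step in steps:
        start, end = step.get("StartTime"), step.get("EndTime")
        if start and end:  # steps still running have no EndTime yet
            rows.append((step["StepName"], end - start))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented timestamps for illustration only.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"StepName": "DataValidation", "StartTime": t0, "EndTime": t0 + timedelta(minutes=4)},
    {"StepName": "YOLOv11Training", "StartTime": t0 + timedelta(minutes=4),
     "EndTime": t0 + timedelta(minutes=64)},
    {"StepName": "ModelEvaluation", "StartTime": t0 + timedelta(minutes=64)},
]
for name, duration in step_durations(sample):
    print(f"{name}: {duration}")
```

In `analyze_pipeline_results`, the same function could be applied to the `steps` list to print the slowest steps first.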

## 9. Model Management and Approval Workflow

Manage models in the Model Registry and handle approval workflows for production deployment.

In [None]:
def create_model_management_interface():
    """Create interface for model management and approval"""
    
    # Model list display
    models_output = widgets.Output()
    
    # Control buttons
    refresh_models_button = widgets.Button(
        description="🔄 Refresh Models",
        button_style='info',
        layout=widgets.Layout(width='150px')
    )
    
    approve_button = widgets.Button(
        description="✅ Approve Latest",
        button_style='success',
        layout=widgets.Layout(width='150px')
    )
    
    deploy_button = widgets.Button(
        description="🚀 Deploy Approved",
        button_style='warning',
        layout=widgets.Layout(width='150px')
    )
    
    def refresh_models():
        """Refresh and display models from Model Registry"""
        with models_output:
            models_output.clear_output()
            
            try:
                # Get models from registry
                response = sagemaker_client.list_model_packages(
                    ModelPackageGroupName=MODEL_PACKAGE_GROUP_NAME,
                    SortBy='CreationTime',
                    SortOrder='Descending',
                    MaxResults=10
                )
                
                models = response.get('ModelPackageSummaryList', [])
                
                if not models:
                    print("📦 No models found in Model Registry")
                    print("   Run a successful pipeline to register models.")
                    return
                
                print(f"📦 Models in Registry ({len(models)} found):")
                print("=" * 80)
                
                for i, model in enumerate(models):
                    model_arn = model['ModelPackageArn']
                    model_id = model_arn.split('/')[-1]
                    status = model['ModelPackageStatus']
                    approval = model['ModelApprovalStatus']
                    created = model['CreationTime']
                    
                    # Status icons
                    status_icon = "✅" if status == "Completed" else "🔄"
                    approval_icon = {
                        'Approved': '✅',
                        'PendingManualApproval': '⏳',
                        'Rejected': '❌'
                    }.get(approval, '❓')
                    
                    print(f"{i+1}. {status_icon} Model: {model_id}")
                    print(f"   Status: {status} | Approval: {approval_icon} {approval}")
                    print(f"   Created: {created.strftime('%Y-%m-%d %H:%M:%S')}")
                    
                    # Get additional details for latest model
                    if i == 0:
                        try:
                            details = sagemaker_client.describe_model_package(
                                ModelPackageName=model_arn
                            )
                            
                            if 'ModelMetrics' in details:
                                print(f"   📊 Metrics: Available")
                            
                            if 'Tags' in details:
                                pipeline_tag = next(
                                    (tag['Value'] for tag in details['Tags'] 
                                     if tag['Key'] == 'TrainingJob'), 
                                    'Unknown'
                                )
                                print(f"   🏷️  Source: {pipeline_tag}")
                                
                        except Exception as e:
                            print(f"   ⚠️  Could not get details: {e}")
                    
                    print("-" * 80)
                
                # Store latest model for actions
                global latest_model_arn
                latest_model_arn = models[0]['ModelPackageArn'] if models else None
                
            except Exception as e:
                print(f"❌ Error retrieving models: {str(e)}")
    
    def approve_latest_model():
        """Approve the latest model for deployment"""
        if 'latest_model_arn' not in globals() or not latest_model_arn:
            print("❌ No model available for approval")
            return
        
        try:
            sagemaker_client.update_model_package(
                ModelPackageArn=latest_model_arn,
                ModelApprovalStatus='Approved',
                ApprovalDescription='Approved via ML Engineer Pipeline notebook'
            )
            
            print(f"✅ Model approved successfully!")
            print(f"   Model ARN: {latest_model_arn}")
            
            # Update MLflow if available
            if 'mlflow_run_id' in globals() and mlflow_run_id:
                with mlflow.start_run(run_id=mlflow_run_id):
                    mlflow.set_tag("model_approved", "true")
                    # A tag can be overwritten if another model is approved later in this run
                    mlflow.set_tag("approved_model_arn", latest_model_arn)
            
            # Refresh display
            refresh_models()
            
        except Exception as e:
            print(f"❌ Error approving model: {str(e)}")
    
    def deploy_approved_model():
        """Deploy the latest approved model to a serverless endpoint"""
        try:
            # Find latest approved model
            response = sagemaker_client.list_model_packages(
                ModelPackageGroupName=MODEL_PACKAGE_GROUP_NAME,
                ModelApprovalStatus='Approved',
                SortBy='CreationTime',
                SortOrder='Descending',
                MaxResults=1
            )
            
            approved_models = response.get('ModelPackageSummaryList', [])
            if not approved_models:
                print("❌ No approved models found for deployment")
                print("   Please approve a model first.")
                return
            
            approved_model = approved_models[0]
            model_arn = approved_model['ModelPackageArn']
            
            print(f"🚀 Deploying approved model...")
            print(f"   Model: {model_arn.split('/')[-1]}")
            
            # Create endpoint name
            endpoint_name = f"yolov11-serverless-{datetime.now().strftime('%Y-%m-%d-%H-%M')}"
            
            # Get model details
            model_details = sagemaker_client.describe_model_package(
                ModelPackageName=model_arn
            )
            
            # Create model
            model_name = f"yolov11-model-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
            
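            # Reference the registered model package directly. The container entries
            # from describe_model_package include fields that create_model rejects.
            sagemaker_client.create_model(
                ModelName=model_name,
                Containers=[{'ModelPackageName': model_arn}],
                ExecutionRoleArn=ROLE_ARN
            )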
            
            print(f"   ✅ Model created: {model_name}")
            
            # Create endpoint configuration
            config_name = f"yolov11-serverless-config-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
            
            sagemaker_client.create_endpoint_config(
                EndpointConfigName=config_name,
                ProductionVariants=[
                    {
                        'VariantName': 'primary',
                        'ModelName': model_name,
                        'ServerlessConfig': {
                            'MemorySizeInMB': 4096,
                            'MaxConcurrency': 20,
                            'ProvisionedConcurrency': 1
                        }
                    }
                ]
            )
            
            print(f"   ✅ Endpoint configuration created: {config_name}")
            
            # Create endpoint
            sagemaker_client.create_endpoint(
                EndpointName=endpoint_name,
                EndpointConfigName=config_name
            )
            
            print(f"   ✅ Serverless endpoint deployment started: {endpoint_name}")
            print(f"   ⏳ Endpoint will be ready in 5-10 minutes")
            
            # Update MLflow
            if 'mlflow_run_id' in globals() and mlflow_run_id:
                with mlflow.start_run(run_id=mlflow_run_id):
                    mlflow.set_tag("model_deployed", "true")
                    # Tags can be overwritten, so a second deployment in the same run does not raise
                    mlflow.set_tag("deployed_endpoint_name", endpoint_name)
                    mlflow.set_tag("deployed_model_arn", model_arn)
            
        except Exception as e:
            print(f"❌ Error deploying model: {str(e)}")
    
    # Button event handlers
    def on_refresh_clicked(b):
        refresh_models()
    
    def on_approve_clicked(b):
        with models_output:
            approve_latest_model()
    
    def on_deploy_clicked(b):
        with models_output:
            deploy_approved_model()
    
    refresh_models_button.on_click(on_refresh_clicked)
    approve_button.on_click(on_approve_clicked)
    deploy_button.on_click(on_deploy_clicked)
    
    # Initial load
    refresh_models()
    
    return widgets.VBox([
        widgets.HTML("<h3>📦 Model Registry Management</h3>"),
        widgets.HBox([refresh_models_button, approve_button, deploy_button]),
        models_output
    ])

# Display model management interface
model_management_interface = create_model_management_interface()
display(model_management_interface)
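
The endpoint name built in `deploy_approved_model` uses a minute-resolution timestamp, so two deployments in the same minute would collide. The sketch below is one way to generate names that are unlikely to clash. It assumes the SageMaker naming rule of at most 63 characters made of alphanumerics and hyphens; `unique_endpoint_name` is a hypothetical helper.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional

MAX_NAME_LENGTH = 63  # assumed SageMaker endpoint name limit
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def unique_endpoint_name(prefix: str = "yolov11-serverless",
                         now: Optional[datetime] = None) -> str:
    """Build an endpoint name from a prefix, a UTC timestamp and a short random suffix."""
    now = now or datetime.now(timezone.utc)
    suffix = f"{now.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:6]}"
    # Trim the prefix so that prefix + "-" + suffix fits within the limit.
    room = MAX_NAME_LENGTH - len(suffix) - 1
    name = f"{prefix[:room].rstrip('-')}-{suffix}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"Invalid endpoint name: {name}")
    return name


print(unique_endpoint_name())
```

The same pattern could be reused for the model and endpoint configuration names, which are also timestamp-based in this notebook.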

## 10. Troubleshooting and Best Practices

Common issues and solutions for SageMaker Pipeline execution.

In [None]:
def display_troubleshooting_guide():
    """Display comprehensive troubleshooting guide"""
    
    troubleshooting_html = """
    <div style="background-color: #f8f9fa; padding: 20px; border-radius: 10px; border-left: 5px solid #007bff;">
        <h3>🔧 Troubleshooting Guide</h3>
        
        <h4>📊 Common Pipeline Issues</h4>
        <ul>
            <li><strong>Data Validation Failures:</strong>
                <ul>
                    <li>Verify dataset structure matches YOLOv11 requirements</li>
                    <li>Check that validation/ directory exists (not val/)</li>
                    <li>Ensure image and label counts match</li>
                    <li>Validate data.yaml file format</li>
                </ul>
            </li>
            
            <li><strong>Training Step Failures:</strong>
                <ul>
                    <li>Check instance type availability in your region</li>
                    <li>Verify ECR container image exists and is accessible</li>
                    <li>Review hyperparameters for reasonable values</li>
                    <li>Check CloudWatch logs for detailed error messages</li>
                    <li>Ensure sufficient disk space for dataset size</li>
                </ul>
            </li>
            
            <li><strong>Performance Threshold Issues:</strong>
                <ul>
                    <li>Lower the performance threshold only when it is unrealistic for the dataset, not simply because models fail it</li>
                    <li>Increase training epochs for better convergence</li>
                    <li>Adjust learning rate and batch size</li>
                    <li>Verify dataset quality and annotation accuracy</li>
                </ul>
            </li>
            
            <li><strong>Deployment Failures:</strong>
                <ul>
                    <li>Verify inference container image exists</li>
                    <li>Check endpoint name uniqueness</li>
                    <li>Ensure model approval status is correct</li>
                    <li>Review serverless configuration limits</li>
                </ul>
            </li>
        </ul>
        
        <h4>🚀 Performance Optimization</h4>
        <ul>
            <li><strong>Training Optimization:</strong>
                <ul>
                    <li>Use GPU instances (ml.g4dn.xlarge or larger) for training</li>
                    <li>Enable spot instances for cost savings</li>
                    <li>Optimize batch size based on GPU memory</li>
                    <li>Use mixed precision training for faster training steps and lower GPU memory use</li>
                </ul>
            </li>
            
            <li><strong>Pipeline Efficiency:</strong>
                <ul>
                    <li>Cache pipeline steps when possible</li>
                    <li>Use parallel execution for independent steps</li>
                    <li>Optimize data preprocessing steps</li>
                    <li>Monitor step execution times</li>
                </ul>
            </li>
            
            <li><strong>Cost Optimization:</strong>
                <ul>
                    <li>Use managed spot training (AWS cites savings of up to 90% over on-demand instances)</li>
                    <li>Right-size instance types for workload</li>
                    <li>Use serverless endpoints for variable traffic</li>
                    <li>Clean up unused resources regularly</li>
                </ul>
            </li>
        </ul>
        
        <h4>📋 Best Practices</h4>
        <ul>
            <li><strong>Pipeline Design:</strong>
                <ul>
                    <li>Use parameterized pipelines for flexibility</li>
                    <li>Implement proper error handling and retries</li>
                    <li>Add comprehensive logging and monitoring</li>
                    <li>Use conditional steps for complex workflows</li>
                </ul>
            </li>
            
            <li><strong>Model Management:</strong>
                <ul>
                    <li>Always use Model Registry for model versioning</li>
                    <li>Implement approval workflows for production</li>
                    <li>Tag models with relevant metadata</li>
                    <li>Track model lineage and performance</li>
                </ul>
            </li>
            
            <li><strong>Security:</strong>
                <ul>
                    <li>Use least privilege IAM roles</li>
                    <li>Encrypt data at rest and in transit</li>
                    <li>Use VPC endpoints for secure communication</li>
                    <li>Regularly audit access and permissions</li>
                </ul>
            </li>
        </ul>
        
        <h4>🔍 Debugging Resources</h4>
        <ul>
            <li><strong>CloudWatch Logs:</strong> /aws/sagemaker/TrainingJobs and /aws/sagemaker/ProcessingJobs</li>
            <li><strong>SageMaker Console:</strong> Pipeline executions and step details</li>
            <li><strong>MLflow UI:</strong> Experiment tracking and model comparison</li>
            <li><strong>Model Registry:</strong> Model versions and approval status</li>
        </ul>
    </div>
    """
    
    display(HTML(troubleshooting_html))

# Display troubleshooting guide
display_troubleshooting_guide()
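
The data validation checks listed in the guide (a `validation/` split rather than `val/`, matching image and label files, a `data.yaml` at the dataset root) can be run locally before the pipeline is started. The sketch below is a minimal standard-library version. The `images/` and `labels/` subdirectory layout, the accepted image extensions, and the helper name `check_yolo_dataset` are assumptions made for illustration; this is not the notebook's actual validation step.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def check_yolo_dataset(root, splits=("train", "validation")):
    """Return a list of human-readable problems found under `root`.

    Assumed (hypothetical) layout: <root>/data.yaml plus
    <root>/<split>/images/* and <root>/<split>/labels/*.txt.
    """
    root = Path(root)
    problems = []

    if not (root / "data.yaml").is_file():
        problems.append("data.yaml is missing")

    # A val/ directory without validation/ suggests a naming mismatch.
    if (root / "val").is_dir() and not (root / "validation").is_dir():
        problems.append("found val/ but the pipeline expects validation/")

    for split in splits:
        images_dir = root / split / "images"
        labels_dir = root / split / "labels"
        if not images_dir.is_dir() or not labels_dir.is_dir():
            problems.append(f"{split}/ must contain images/ and labels/")
            continue

        # Compare file stems so that a.jpg pairs with a.txt.
        image_stems = {p.stem for p in images_dir.iterdir()
                       if p.suffix.lower() in IMAGE_SUFFIXES}
        label_stems = {p.stem for p in labels_dir.glob("*.txt")}

        for stem in sorted(image_stems - label_stems):
            problems.append(f"{split}: image '{stem}' has no label file")
        for stem in sorted(label_stems - image_stems):
            problems.append(f"{split}: label '{stem}' has no image")

    return problems
```

An empty list means every check passed. Comparing stems instead of raw counts also shows which file is unpaired, which is usually the first thing needed when a validation step fails.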

## 11. Summary and Next Steps

This notebook implements a complete MLOps workflow for YOLOv11 object detection models as a SageMaker Pipeline.

### 🎯 What We've Accomplished

1. **Complete Pipeline Architecture**: Built a full SageMaker Pipeline with:
   - Data validation and preprocessing
   - YOLOv11 model training with GPU optimization
   - Automated model evaluation and performance checking
   - Conditional model registration based on performance thresholds
   - Automated serverless endpoint deployment for approved models

2. **Advanced MLOps Features**:
   - **Parameterized Pipelines**: Flexible configuration for different datasets and hyperparameters
   - **Conditional Logic**: Performance-based decision making for model registration and deployment
   - **Model Registry Integration**: Centralized model management with approval workflows
   - **Serverless Endpoints**: Cost-effective inference with auto-scaling capabilities
   - **Comprehensive Monitoring**: Real-time pipeline execution tracking

3. **Production-Ready Capabilities**:
   - **Error Recovery**: Robust error handling and retry mechanisms
   - **Cost Optimization**: Spot instances and serverless inference
   - **Security**: IAM role-based access control
   - **Observability**: MLflow integration for experiment tracking and lineage

4. **User Experience Enhancements**:
   - **Interactive Configuration**: Widget-based parameter configuration
   - **Real-time Monitoring**: Live pipeline status updates
   - **Model Management**: Easy model approval and deployment workflows
   - **Comprehensive Analysis**: Detailed pipeline results and recommendations
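
The performance-based gate for registration and deployment can be illustrated in plain Python. This is only a sketch of the decision a condition step makes. The report layout (`detection_metrics` → `mAP50` → `value`), the default threshold of 0.5, and the function name `passes_threshold` are assumptions, not values taken from this notebook.

```python
import json


def passes_threshold(report_json, threshold=0.5,
                     path=("detection_metrics", "mAP50", "value")):
    """Return True when the metric found at `path` is >= `threshold`."""
    node = json.loads(report_json)
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"metric path {'.'.join(path)} not found in report")
        node = node[key]
    return float(node) >= threshold


# Hypothetical evaluation report; the number is made up.
example_report = json.dumps(
    {"detection_metrics": {"mAP50": {"value": 0.62}}}
)
print(passes_threshold(example_report))                 # True
print(passes_threshold(example_report, threshold=0.7))  # False
```

Inside a SageMaker Pipeline the same comparison is normally expressed declaratively, for example with `ConditionGreaterThanOrEqualTo` and `JsonGet` in a `ConditionStep`. The function above only mirrors that logic so the threshold can be tested locally against an evaluation report.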

### 🚀 Key Advantages Over Individual Training Jobs

| Aspect | Individual Jobs | SageMaker Pipeline |
|--------|----------------|--------------------|
| **Automation** | Manual execution | Fully automated workflow |
| **Reproducibility** | Manual tracking | Built-in versioning and lineage |
| **Error Handling** | Manual intervention | Automatic retry and recovery |
| **Deployment** | Manual process | Conditional automated deployment |
| **Monitoring** | Basic job status | Comprehensive step-by-step tracking |
| **Governance** | Limited | Full approval workflows and audit trails |
| **Scalability** | Single job focus | End-to-end workflow orchestration |
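
The "automatic retry" entry in the table can be illustrated with a generic exponential backoff helper. SageMaker Pipelines configures retries declaratively on each step, so the function below is not the pipeline's own mechanism. It is a hypothetical notebook-side equivalent, for example for wrapping API calls that may be throttled, and every name in it is an assumption of this sketch.

```python
import time


def call_with_backoff(func, max_attempts=3, base_delay=1.0,
                      backoff_rate=2.0, retry_on=(Exception,),
                      sleep=time.sleep):
    """Call `func()` and retry on `retry_on` with exponentially growing delays.

    Delays between attempts are base_delay, base_delay * backoff_rate, ...
    The last exception is re-raised once `max_attempts` is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= backoff_rate
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.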

### 📈 Business Impact

- **Reduced Time-to-Market**: Automated workflows eliminate manual steps
- **Improved Quality**: Consistent validation and performance thresholds
- **Cost Efficiency**: Spot instances and serverless inference reduce costs
- **Risk Mitigation**: Approval workflows prevent poor models from reaching production
- **Operational Excellence**: Step-level monitoring of every pipeline execution
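
The cost effect of spot training can be checked per training job. SageMaker reports both `TrainingTimeInSeconds` and `BillableTimeInSeconds` for a job, and the managed spot savings are the share of training time that was not billed. A minimal sketch follows; the function name is hypothetical and the numbers are made up rather than measured in this notebook.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage of training time not billed thanks to managed spot training."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical job: 3600 s of training, of which 1260 s were billed.
print(round(spot_savings_percent(3600, 1260), 1))  # 65.0
```

Both inputs are available in the response of the `describe_training_job` API call, so the same calculation can be applied to any completed training job.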

### 🔮 Next Steps and Enhancements

1. **Advanced Monitoring**:
   - Implement data drift detection
   - Set up model performance monitoring in production
   - Create automated retraining triggers

2. **Multi-Model Support**:
   - Extend pipeline for different YOLO variants
   - Support for ensemble models
   - A/B testing framework for model comparison

3. **Advanced Deployment Patterns**:
   - Blue/green deployment strategies
   - Canary deployments with traffic splitting
   - Multi-region deployment automation

4. **Integration Enhancements**:
   - CI/CD integration with Git workflows
   - Integration with external monitoring tools
   - Custom metrics and alerting

5. **Performance Optimization**:
   - Pipeline caching for faster iterations
   - Distributed training for larger datasets
   - Advanced hyperparameter optimization

### 💡 Key Takeaways

- **Pipeline-First Approach**: Always design ML workflows as pipelines for production readiness
- **Automation is Key**: Reduce manual intervention through comprehensive automation
- **Governance Matters**: Implement proper approval workflows and audit trails
- **Monitor Everything**: Comprehensive monitoring enables proactive issue resolution
- **Cost Consciousness**: Use spot instances and serverless where appropriate

This notebook demonstrates the transformation from individual training job management to complete MLOps pipeline orchestration, providing a foundation for production-ready machine learning workflows with YOLOv11 object detection models.

---

**🎉 Congratulations!** You've successfully implemented a comprehensive SageMaker Pipeline for YOLOv11 object detection with full MLOps capabilities, automated deployment, and production-ready governance features.