# ML Engineer Core Workflow with MLflow and Model Registry

This notebook lets ML Engineers configure, launch, and monitor YOLOv11 training jobs on SageMaker, track each run in MLflow, and register the resulting models in the SageMaker Model Registry.

## Workflow Overview

1. **Pipeline Configuration**: Set up YOLOv11 training pipeline parameters
2. **Pipeline Execution**: Execute the training pipeline with MLflow tracking
3. **Pipeline Monitoring**: Monitor training progress and results
4. **Model Registration**: Register trained models in SageMaker Model Registry
5. **Model Management**: Manage model versions and approval workflows

## Prerequisites

- AWS account with appropriate permissions
- AWS CLI configured with "ab" profile
- SageMaker Studio access with ML Engineer role
- Access to the drone imagery dataset in S3 bucket: `lucaskle-ab3-project-pv`
- Labeled data in YOLOv11 format
- A SageMaker managed MLflow tracking server

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
# Install required packages (specifiers are quoted so the shell does not treat ">" as a redirect)
!pip install --quiet "mlflow>=3.0.0" "requests-auth-aws-sigv4>=0.7" "boto3>=1.28.0" "sagemaker>=2.190.0" "pandas>=2.0.0" "matplotlib>=3.7.0" "numpy>=1.24.0" "PyYAML>=6.0"

print("✅ Required packages installed successfully!")
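A quiet `pip install` prints nothing on success, so it can help to confirm what actually ended up in the kernel's environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of this notebook's original code.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the install cell above
    for name, version in installed_versions(["mlflow", "boto3", "sagemaker", "PyYAML"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests the install cell needs to be re-run, possibly after restarting the kernel.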

In [None]:
import os
import boto3
import sagemaker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import json
import time
from IPython.display import display, HTML
import mlflow
import mlflow.sagemaker

# Correct imports for SageMaker Model Registry
from sagemaker import ModelPackage
from sagemaker.model import Model
from sagemaker.predictor import Predictor

# Set up AWS session with "ab" profile
session = boto3.Session(profile_name='ab')
sagemaker_session = sagemaker.Session(boto_session=session)
sagemaker_client = session.client('sagemaker')
region = session.region_name
account_id = session.client('sts').get_caller_identity()['Account']

# Set up MLFlow tracking with SageMaker managed server
try:
    # Use the correct tracking server ARN format for SageMaker managed MLflow
    tracking_server_arn = "arn:aws:sagemaker:us-east-1:192771711075:mlflow-tracking-server/sagemaker-core-setup-mlflow-server"
    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow_tracking_uri = tracking_server_arn
    
    print(f"✅ Connected to SageMaker managed MLflow server")
    print(f"Tracking Server ARN: {tracking_server_arn}")
    
    # Select the experiment; set_experiment creates it if it does not exist yet,
    # so new runs never land in the default experiment by accident
    experiment_name = "yolov11-drone-detection"
    mlflow.set_experiment(experiment_name)
    print(f"Using experiment: {experiment_name}")
    
except Exception as e:
    print(f"⚠️  Could not connect to SageMaker managed MLflow: {e}")
    print("Using a local file-based MLflow store as fallback")
    # Point MLflow at the local store before selecting the experiment
    mlflow_tracking_uri = "file:///tmp/mlruns"
    mlflow.set_tracking_uri(mlflow_tracking_uri)
    experiment_name = "yolov11-drone-detection"
    mlflow.set_experiment(experiment_name)

# Set up visualization
plt.rcParams["figure.figsize"] = (12, 6)

# Define bucket name and role
BUCKET_NAME = 'lucaskle-ab3-project-pv'
ROLE_ARN = sagemaker_session.get_caller_identity_arn()

# Model Registry configuration
MODEL_PACKAGE_GROUP_NAME = "yolov11-drone-detection-models"

print(f"Data Bucket: {BUCKET_NAME}")
print(f"Region: {region}")
print(f"Account ID: {account_id}")
print(f"Role ARN: {ROLE_ARN}")
print(f"MLFlow Experiment: {experiment_name}")
print(f"MLFlow Tracking URI: {mlflow_tracking_uri}")
print(f"Model Package Group: {MODEL_PACKAGE_GROUP_NAME}")

# Helper functions for MLflow logging
def log_params(params_dict):
    """Log parameters using MLflow"""
    mlflow.log_params(params_dict)

def log_metrics(metrics_dict, step=None):
    """Log metrics using MLflow"""
    for key, value in metrics_dict.items():
        mlflow.log_metric(key, value, step=step)

def log_artifact(local_path, artifact_path=None):
    """Log artifact using MLflow"""
    mlflow.log_artifact(local_path, artifact_path)

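def start_run(run_name=None, experiment_name=None, tags=None):
    """Start an MLflow run, optionally switching to the given experiment first"""
    if experiment_name:
        mlflow.set_experiment(experiment_name)
    return mlflow.start_run(run_name=run_name, tags=tags)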

# Model Registry helper functions
def create_model_package_group(group_name, description="YOLOv11 drone detection models"):
    """Create a model package group for organizing model versions"""
    try:
        response = sagemaker_client.create_model_package_group(
            ModelPackageGroupName=group_name,
            ModelPackageGroupDescription=description
        )
        print(f"✅ Created model package group: {group_name}")
        return response
    except sagemaker_client.exceptions.ClientError as e:
        # A duplicate group name surfaces as a generic ClientError, so inspect the message
        if "already exists" in str(e):
            print(f"ℹ️  Model package group '{group_name}' already exists")
            return None
        raise

def register_model_version(model_package_group_name, model_data_url, image_uri, 
                          model_approval_status="PendingManualApproval"):
    """Register a new model version in the model package group"""
    try:
        response = sagemaker_client.create_model_package(
            ModelPackageGroupName=model_package_group_name,
            ModelPackageDescription=f"YOLOv11 model version created at {datetime.now().isoformat()}",
            ModelApprovalStatus=model_approval_status,
            InferenceSpecification={
                'Containers': [{
                    'Image': image_uri,
                    'ModelDataUrl': model_data_url,
                    'Framework': 'PYTORCH',
                    'FrameworkVersion': '2.0'
                }],
                'SupportedContentTypes': ['image/jpeg', 'image/png'],
                'SupportedResponseMIMETypes': ['application/json']
            }
        )
        print(f"✅ Registered model version: {response['ModelPackageArn']}")
        return response
    except Exception as e:
        print(f"❌ Failed to register model: {str(e)}")
        return None

print("✅ MLflow helper functions loaded")
print("✅ Model Registry helper functions loaded")

def get_model_artifacts_from_registry(model_package_arn):
    """Retrieve model artifacts location from Model Registry (proper way to access artifacts)"""
    try:
        # Get model package details from registry
        response = sagemaker_client.describe_model_package(
            ModelPackageName=model_package_arn
        )
        
        # Extract artifact location from inference specification
        containers = response['InferenceSpecification']['Containers']
        if containers:
            model_data_url = containers[0]['ModelDataUrl']
            print(f"📦 Model artifacts (via Model Registry): {model_data_url}")
            
            # Additional metadata
            approval_status = response['ModelApprovalStatus']
            creation_time = response['CreationTime']
            
            artifact_info = {
                'model_package_arn': model_package_arn,
                'model_data_url': model_data_url,
                'approval_status': approval_status,
                'creation_time': creation_time,
                'registry_managed': True
            }
            
            return artifact_info
        else:
            print("❌ No containers found in model package")
            return None
            
    except Exception as e:
        print(f"❌ Error retrieving model artifacts from registry: {str(e)}")
        return None

def list_model_artifacts_in_registry(model_package_group_name):
    """List all model artifacts managed by the Model Registry"""
    try:
        response = sagemaker_client.list_model_packages(
            ModelPackageGroupName=model_package_group_name,
            SortBy='CreationTime',
            SortOrder='Descending'
        )
        
        models = response.get('ModelPackageSummaryList', [])
        
        print(f"📋 Models in Registry: {model_package_group_name}")
        print("=" * 80)
        
        for i, model in enumerate(models):
            model_arn = model['ModelPackageArn']
            artifact_info = get_model_artifacts_from_registry(model_arn)
            
            if artifact_info:
                print(f"{i+1}. Model Package: {model_arn.split('/')[-1]}")
                print(f"   Status: {artifact_info['approval_status']}")
                print(f"   Artifacts: {artifact_info['model_data_url']}")
                print(f"   Created: {artifact_info['creation_time'].strftime('%Y-%m-%d %H:%M:%S')}")
                print("-" * 80)
        
        return models
        
    except Exception as e:
        print(f"❌ Error listing models in registry: {str(e)}")
        return []

print("✅ Model Registry artifact management functions loaded")

## 1. Setup Model Registry

First, let's create the Model Package Group in SageMaker Model Registry if it doesn't exist.

In [None]:
# Function to create model package group
def create_model_package_group(group_name, description="YOLOv11 drone detection models"):
    """Create a model package group in SageMaker Model Registry"""
    try:
        # Check if group already exists
        response = sagemaker_client.describe_model_package_group(
            ModelPackageGroupName=group_name
        )
        print(f"Model package group '{group_name}' already exists.")
        print(f"Status: {response['ModelPackageGroupStatus']}")
        return response
    except sagemaker_client.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ValidationException':
            # Group doesn't exist, create it
            print(f"Creating model package group: {group_name}")
            response = sagemaker_client.create_model_package_group(
                ModelPackageGroupName=group_name,
                ModelPackageGroupDescription=description
            )
            print(f"Created model package group: {response['ModelPackageGroupArn']}")
            return response
        else:
            raise e

# Create model package group
model_package_group = create_model_package_group(MODEL_PACKAGE_GROUP_NAME)

## 2. Pipeline Configuration

Let's discover and validate the available datasets, then configure the YOLOv11 training parameters that will be tracked in MLflow.

In [None]:
# Enhanced function to list and validate datasets
def list_and_validate_datasets(bucket, prefix="datasets/"):
    """List available datasets with validation and selection interface"""
    s3_client = session.client('s3')
    response = s3_client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        Delimiter='/'
    )
    
    datasets = []
    if 'CommonPrefixes' in response:
        for obj in response['CommonPrefixes']:
            dataset_prefix = obj['Prefix']
            dataset_name = dataset_prefix.split('/')[-2]
            
            # Validate dataset structure
            validation_result = validate_yolo_dataset_structure(bucket, dataset_prefix)
            
            datasets.append({
                'name': dataset_name,
                'prefix': dataset_prefix,
                'full_path': f's3://{bucket}/{dataset_prefix}',
                'valid': validation_result['valid'],
                'validation_details': validation_result
            })
    
    return datasets

def validate_yolo_dataset_structure(bucket, dataset_prefix):
    """Validate YOLOv11 dataset structure"""
    s3_client = session.client('s3')
    
    required_structure = {
        'train/images/': False,
        'train/labels/': False,
        'val/images/': False,
        'val/labels/': False,
        'data.yaml': False,
        'dataset_info.json': False
    }
    
    validation_details = {
        'valid': False,
        'missing_components': [],
        'found_components': [],
        'train_image_count': 0,
        'val_image_count': 0,
        'train_label_count': 0,
        'val_label_count': 0
    }
    
    try:
        # Check for required directories and files
        for required_path in required_structure.keys():
            full_path = dataset_prefix + required_path
            
            if required_path.endswith('/'):
                # Check directory exists and has content
                response = s3_client.list_objects_v2(
                    Bucket=bucket,
                    Prefix=full_path,
                    MaxKeys=1
                )
                if 'Contents' in response:
                    required_structure[required_path] = True
                    validation_details['found_components'].append(required_path)
                    
                    # Count every object under the prefix. A paginator is used because
                    # a single list_objects_v2 call returns at most 1,000 keys.
                    paginator = s3_client.get_paginator('list_objects_v2')
                    count = sum(
                        1
                        for page in paginator.paginate(Bucket=bucket, Prefix=full_path)
                        for obj in page.get('Contents', [])
                        if not obj['Key'].endswith('/')  # skip folder placeholder objects
                    )
                    split = 'train' if required_path.startswith('train/') else 'val'
                    kind = 'image' if 'images/' in required_path else 'label'
                    validation_details[f'{split}_{kind}_count'] = count
                else:
                    validation_details['missing_components'].append(required_path)
            else:
                # Check file exists
                try:
                    s3_client.head_object(Bucket=bucket, Key=full_path)
                    required_structure[required_path] = True
                    validation_details['found_components'].append(required_path)
                except s3_client.exceptions.ClientError:
                    validation_details['missing_components'].append(required_path)
        
        # Dataset is valid if all required components are found
        validation_details['valid'] = all(required_structure.values())
        
    except Exception as e:
        print(f"Error validating dataset structure: {e}")
    
    return validation_details

def display_dataset_selection_interface(datasets):
    """Display interactive dataset selection interface"""
    if not datasets:
        print(f"❌ No datasets found in s3://{BUCKET_NAME}/datasets/")
        print("\nExpected dataset structure:")
        print(f"s3://{BUCKET_NAME}/datasets/your_dataset_name/")
        print("├── train/")
        print("│   ├── images/     ✅ Training images")
        print("│   └── labels/     ✅ Training labels (.txt files)")
        print("├── val/")
        print("│   ├── images/     ✅ Validation images")
        print("│   └── labels/     ✅ Validation labels (.txt files)")
        print("├── data.yaml       ✅ Dataset configuration")
        print("└── dataset_info.json ✅ Complete metadata")
        return None
    
    print(f"📊 Found {len(datasets)} datasets in s3://{BUCKET_NAME}/datasets/")
    print("=" * 100)
    
    valid_datasets = []
    for i, dataset in enumerate(datasets):
        status_icon = "✅" if dataset['valid'] else "❌"
        print(f"{i+1}. {status_icon} {dataset['name']}")
        print(f"   Path: {dataset['full_path']}")
        
        if dataset['valid']:
            details = dataset['validation_details']
            print(f"   📈 Training: {details['train_image_count']} images, {details['train_label_count']} labels")
            print(f"   📊 Validation: {details['val_image_count']} images, {details['val_label_count']} labels")
            valid_datasets.append((i, dataset))
        else:
            print(f"   ⚠️  Missing: {', '.join(dataset['validation_details']['missing_components'])}")
        
        print("-" * 100)
    
    if not valid_datasets:
        print("❌ No valid datasets found. Please ensure datasets follow the required structure.")
        return None
    
    print(f"\n✅ {len(valid_datasets)} valid dataset(s) available for training")
    return datasets

# List and validate all datasets
print("🔍 Discovering and validating datasets...")
available_datasets = list_and_validate_datasets(BUCKET_NAME)
validated_datasets = display_dataset_selection_interface(available_datasets)
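The validation above compares only the number of images and labels, which can match even when individual files are unpaired. As a rough sketch, the helper below pairs files by name stem; `find_unpaired` and the suffix set are assumptions for illustration, and the key lists would come from listing the `images/` and `labels/` prefixes.

```python
from pathlib import PurePosixPath

# Image extensions assumed for this sketch; extend as needed
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unpaired(image_keys, label_keys):
    """Return (images_without_labels, labels_without_images) as sorted lists of stems."""
    image_stems = {
        PurePosixPath(key).stem
        for key in image_keys
        if PurePosixPath(key).suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {
        PurePosixPath(key).stem
        for key in label_keys
        if PurePosixPath(key).suffix.lower() == ".txt"
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


if __name__ == "__main__":
    images = ["datasets/demo/train/images/a.jpg", "datasets/demo/train/images/b.png"]
    labels = ["datasets/demo/train/labels/a.txt", "datasets/demo/train/labels/c.txt"]
    missing_labels, orphan_labels = find_unpaired(images, labels)
    print("Images without labels:", missing_labels)
    print("Labels without images:", orphan_labels)
```

An image without a label file is not necessarily an error, since it may be an intentional background example, but a label without an image usually points to an incomplete upload.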

In [None]:
# Interactive dataset selection
def select_dataset_for_training(datasets):
    """Allow user to select dataset for training"""
    if not datasets:
        return None
    
    valid_datasets = [d for d in datasets if d['valid']]
    
    if not valid_datasets:
        print("No valid datasets available for selection")
        return None
    
    if len(valid_datasets) == 1:
        selected = valid_datasets[0]
        print(f"🎯 Auto-selecting the only valid dataset: {selected['name']}")
        return selected
    
    print("\n📋 Select a dataset for training:")
    for i, dataset in enumerate(valid_datasets):
        details = dataset['validation_details']
        total_images = details['train_image_count'] + details['val_image_count']
        print(f"  {i+1}. {dataset['name']} ({total_images} total images)")
    
    print(f"\n💡 Recommendation: Choose the dataset with the most recent timestamp")
    print(f"   or the one specifically prepared for your training task.")
    
    # For notebook execution, default to the first valid dataset
    # In interactive mode, user would input their choice
    selected_dataset = valid_datasets[0]  # Default to first valid dataset
    
    print(f"\n✅ Selected dataset: {selected_dataset['name']}")
    print(f"   Path: {selected_dataset['full_path']}")
    
    details = selected_dataset['validation_details']
    print(f"   📊 Training data: {details['train_image_count']} images, {details['train_label_count']} labels")
    print(f"   📊 Validation data: {details['val_image_count']} images, {details['val_label_count']} labels")
    
    return selected_dataset

# Select dataset for training
if validated_datasets:
    selected_dataset = select_dataset_for_training(validated_datasets)
    
    if selected_dataset:
        # Update training parameters with selected dataset
        SELECTED_DATASET_NAME = selected_dataset['name']
        SELECTED_DATASET_PREFIX = selected_dataset['prefix']
        SELECTED_DATASET_PATH = selected_dataset['full_path']
        
        print(f"\n🎯 Dataset ready for training:")
        print(f"   Name: {SELECTED_DATASET_NAME}")
        print(f"   S3 Path: {SELECTED_DATASET_PATH}")
    else:
        print("❌ No dataset selected. Cannot proceed with training.")
else:
    print("❌ No datasets available. Please prepare a dataset first using the Data Scientist notebook.")
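The selection cell always falls back to the first valid dataset. One possible way to make the choice explicit without an interactive prompt is to pass a preferred name, as in this sketch; `choose_dataset` and its `preferred_name` parameter are hypothetical and operate on the same dictionaries that `list_and_validate_datasets` returns.

```python
def choose_dataset(valid_datasets, preferred_name=None):
    """Pick a dataset by name, or fall back to the first valid one as the notebook does."""
    if not valid_datasets:
        return None
    if preferred_name is None:
        return valid_datasets[0]
    for dataset in valid_datasets:
        if dataset["name"] == preferred_name:
            return dataset
    available = ", ".join(d["name"] for d in valid_datasets)
    raise ValueError(f"Dataset '{preferred_name}' not found. Available: {available}")


if __name__ == "__main__":
    candidates = [{"name": "drones-v1"}, {"name": "drones-v2"}]
    print(choose_dataset(candidates)["name"])
    print(choose_dataset(candidates, preferred_name="drones-v2")["name"])
```

Raising on an unknown name, rather than silently defaulting, keeps a typo from launching a training job on the wrong data.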

In [None]:
# Define training parameters with selected dataset
if 'selected_dataset' in locals() and selected_dataset:
    training_params = {
        # Dataset parameters (using selected dataset)
        'dataset_name': SELECTED_DATASET_NAME,
        'dataset_prefix': SELECTED_DATASET_PREFIX,
        'dataset_path': SELECTED_DATASET_PATH,
        
        # Model parameters
        'model_variant': 'yolov11n',  # Options: yolov11n, yolov11s, yolov11m, yolov11l, yolov11x
        'image_size': 640,  # Input image size (px)
        
        # Training parameters
        'batch_size': 16,
        'epochs': 50,
        'learning_rate': 0.001,
        
        # Infrastructure parameters
        'instance_type': 'ml.g4dn.xlarge',
        'instance_count': 1,
        'use_spot': True,
        'max_wait': 36000,  # Max wait time for spot instances (seconds)
        'max_run': 3600,    # Max run time (seconds)
        
        # Output parameters
        'output_path': f"s3://{BUCKET_NAME}/model-artifacts/",
        'job_name': f"yolov11-training-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}",
        
        # MLFlow parameters
        'experiment_name': experiment_name,
        'run_name': f"yolov11-run-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
    }
    
    # Display training parameters
    print("🚀 YOLOv11 Training Parameters:")
    print("=" * 60)
    for key, value in training_params.items():
        print(f"  {key}: {value}")
        
    # Display dataset validation summary
    if selected_dataset['validation_details']:
        details = selected_dataset['validation_details']
        print(f"\n📊 Selected Dataset Summary:")
        print(f"  Training Images: {details['train_image_count']}")
        print(f"  Training Labels: {details['train_label_count']}")
        print(f"  Validation Images: {details['val_image_count']}")
        print(f"  Validation Labels: {details['val_label_count']}")
        print(f"  Total Images: {details['train_image_count'] + details['val_image_count']}")
        
        # Validation warnings
        if details['train_image_count'] != details['train_label_count']:
            print(f"  ⚠️  Warning: Training images/labels count mismatch")
        if details['val_image_count'] != details['val_label_count']:
            print(f"  ⚠️  Warning: Validation images/labels count mismatch")
            
else:
    print("❌ Cannot configure training parameters - no dataset selected")
    print("Please run the dataset selection cells above first.")
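Some infrastructure settings can be checked before any job is submitted. In particular, SageMaker expects `max_wait` to be at least `max_run` when managed spot training is enabled. The sketch below checks a few such constraints; `check_infrastructure_params` is a hypothetical helper that reads the same keys as `training_params`.

```python
def check_infrastructure_params(params):
    """Return a list of problems found in the infrastructure settings (empty if none)."""
    problems = []
    if params["instance_count"] < 1:
        problems.append("instance_count must be at least 1")
    if params["max_run"] <= 0:
        problems.append("max_run must be a positive number of seconds")
    # With spot instances, the total wait budget has to cover the run time itself
    if params["use_spot"] and params["max_wait"] < params["max_run"]:
        problems.append("max_wait must be greater than or equal to max_run when use_spot is True")
    return problems


if __name__ == "__main__":
    example = {"instance_count": 1, "use_spot": True, "max_wait": 36000, "max_run": 3600}
    issues = check_infrastructure_params(example)
    print("OK" if not issues else "\n".join(issues))
```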

## 3. Pipeline Execution with MLflow Tracking

Now let's launch the YOLOv11 training job and record its parameters, job name, and tags in an MLflow run.

In [None]:
# Function to create and execute training job with MLFlow tracking
def execute_training_job_with_mlflow(params):
    """Create and execute SageMaker training job for YOLOv11 with MLFlow tracking"""
    
    # Start MLFlow run
    with mlflow.start_run(run_name=params['run_name']) as run:
        # Log parameters to MLFlow
        mlflow.log_param("model_variant", params['model_variant'])
        mlflow.log_param("image_size", params['image_size'])
        mlflow.log_param("batch_size", params['batch_size'])
        mlflow.log_param("epochs", params['epochs'])
        mlflow.log_param("learning_rate", params['learning_rate'])
        mlflow.log_param("instance_type", params['instance_type'])
        mlflow.log_param("instance_count", params['instance_count'])
        mlflow.log_param("use_spot", params['use_spot'])
        mlflow.log_param("dataset_name", params['dataset_name'])
        mlflow.log_param("dataset_prefix", params['dataset_prefix'])
        
        # Define hyperparameters for SageMaker
        hyperparameters = {
            "model_variant": params['model_variant'],
            "image_size": str(params['image_size']),
            "batch_size": str(params['batch_size']),
            "epochs": str(params['epochs']),
            "learning_rate": str(params['learning_rate']),
            "mlflow_run_id": run.info.run_id,
            "mlflow_experiment_id": run.info.experiment_id
        }
        
        # Define input data channels
        input_data = {
            'training': f"s3://{BUCKET_NAME}/{params['dataset_prefix']}"
        }
        
        # Create SageMaker estimator
        estimator = sagemaker.estimator.Estimator(
            image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-training:latest",
            role=ROLE_ARN,
            instance_count=params['instance_count'],
            instance_type=params['instance_type'],
            hyperparameters=hyperparameters,
            output_path=params['output_path'],
            sagemaker_session=sagemaker_session,
            use_spot_instances=params['use_spot'],
            max_wait=params['max_wait'] if params['use_spot'] else None,
            max_run=params['max_run']
        )
        
        # Start training job
        print(f"Starting training job: {params['job_name']}")
        print(f"MLFlow Run ID: {run.info.run_id}")
        
        # Log training job details to MLFlow
        mlflow.log_param("sagemaker_job_name", params['job_name'])
        mlflow.log_param("output_path", params['output_path'])
        
        # Start the training job
        estimator.fit(input_data, job_name=params['job_name'], wait=False)
        
        # Log additional metadata
        mlflow.set_tag("sagemaker_job_name", params['job_name'])
        mlflow.set_tag("model_type", "YOLOv11")
        mlflow.set_tag("task_type", "object_detection")
        mlflow.set_tag("dataset", params['dataset_name'])
        
        return params['job_name'], run.info.run_id

# Execute training job with MLFlow tracking
try:
    job_name, mlflow_run_id = execute_training_job_with_mlflow(training_params)
    print(f"\nTraining job started: {job_name}")
    print(f"MLFlow Run ID: {mlflow_run_id}")
    print(f"You can monitor the job in the SageMaker console or using the cell below.")
except Exception as e:
    print(f"Error starting training job: {str(e)}")
    print("\nPossible causes:")
    print("1. The dataset doesn't exist or has incorrect structure")
    print("2. The YOLOv11 training container doesn't exist in ECR")
    print("3. Insufficient permissions to start training job")
    print("\nPlease check the error message and try again.")
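The first two failure causes listed above can be tested before submitting a job. The following sketch takes already-created boto3 S3 and ECR clients as arguments; `preflight_checks` is a hypothetical helper, and the broad `except` is a simplification for illustration rather than recommended error handling.

```python
def preflight_checks(s3_client, ecr_client, bucket, dataset_prefix, repository, tag="latest"):
    """Return a list of problems that would likely make the training job fail."""
    problems = []

    # Cause 1: no objects exist under the dataset prefix
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=dataset_prefix, MaxKeys=1)
    if listing.get("KeyCount", 0) == 0:
        problems.append(f"No objects found under s3://{bucket}/{dataset_prefix}")

    # Cause 2: the training image (repository or tag) is missing from ECR
    try:
        ecr_client.describe_images(repositoryName=repository, imageIds=[{"imageTag": tag}])
    except Exception as exc:  # broad on purpose in this sketch; inspect the error code in real code
        problems.append(f"Cannot find ECR image {repository}:{tag} ({exc})")

    return problems


# Possible usage in this notebook, assuming the names defined in earlier cells:
# issues = preflight_checks(
#     session.client("s3"), session.client("ecr"),
#     BUCKET_NAME, training_params["dataset_prefix"], "yolov11-training",
# )
# print(issues or "Preflight checks passed")
```

Permission problems (the third cause) are harder to test in advance and are usually easiest to diagnose from the error message itself.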

## 4. Pipeline Monitoring with Enhanced Metrics

Let's check the status of our training job and record its status and duration in MLflow.

In [None]:
# Function to monitor training job and update MLFlow
def monitor_training_job_with_mlflow(job_name, mlflow_run_id):
    """Monitor SageMaker training job status and update MLFlow"""
    # Get job description
    response = sagemaker_client.describe_training_job(
        TrainingJobName=job_name
    )
    
    # Extract job status
    status = response['TrainingJobStatus']
    creation_time = response['CreationTime']
    last_modified_time = response.get('LastModifiedTime', creation_time)
    
    # Calculate duration
    duration = last_modified_time - creation_time
    duration_minutes = duration.total_seconds() / 60
    
    # Display job information
    print(f"Job Name: {job_name}")
    print(f"Status: {status}")
    print(f"Creation Time: {creation_time.strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Last Modified: {last_modified_time.strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Duration: {duration_minutes:.2f} minutes")
    print(f"MLFlow Run ID: {mlflow_run_id}")
    
    # Update MLFlow with job status
    with mlflow.start_run(run_id=mlflow_run_id):
        mlflow.log_metric("training_duration_minutes", duration_minutes)
        mlflow.set_tag("job_status", status)
        mlflow.set_tag("last_updated", last_modified_time.isoformat())
    
    # Display additional information based on status
    if status == 'InProgress':
        print("\nJob is still running. Check back later for results.")
    elif status == 'Completed':
        print("\nJob completed successfully!")
        model_artifacts = response['ModelArtifacts']['S3ModelArtifacts']
        print(f"Model artifacts: {model_artifacts}")
        
        # Update MLFlow with completion details
        with mlflow.start_run(run_id=mlflow_run_id):
            mlflow.log_param("model_artifacts_path", model_artifacts)
            mlflow.set_tag("training_completed", "true")
            
    elif status == 'Failed':
        print("\nJob failed!")
        failure_reason = response.get('FailureReason', 'Unknown')
        print(f"Failure reason: {failure_reason}")
        
        # Update MLFlow with failure details
        with mlflow.start_run(run_id=mlflow_run_id):
            mlflow.set_tag("failure_reason", failure_reason)
            mlflow.set_tag("training_failed", "true")
            
    elif status == 'Stopped':
        print("\nJob was stopped.")
        
        # Update MLFlow with stopped status
        with mlflow.start_run(run_id=mlflow_run_id):
            mlflow.set_tag("training_stopped", "true")
    
    return response

# Monitor the training job
try:
    if 'job_name' in locals() and 'mlflow_run_id' in locals():
        job_response = monitor_training_job_with_mlflow(job_name, mlflow_run_id)
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error monitoring training job: {str(e)}")

In [None]:
# Refresh job status (run this cell to update status)
try:
    if 'job_name' in locals() and 'mlflow_run_id' in locals():
        job_response = monitor_training_job_with_mlflow(job_name, mlflow_run_id)
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error monitoring training job: {str(e)}")
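Instead of re-running the refresh cell by hand, the status check can be wrapped in a polling loop that stops at a terminal state. This is a minimal sketch; `wait_for_training_job`, its default intervals, and the injectable `sleep` argument are assumptions made for illustration, and `client` is expected to be a SageMaker boto3 client such as `sagemaker_client`.

```python
import time

# Statuses after which a training job will not change any more
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=60, timeout_seconds=4 * 3600, sleep=time.sleep):
    """Poll describe_training_job until the job reaches a terminal status, then return it."""
    waited = 0
    while True:
        status = client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job {job_name} still '{status}' after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Possible usage in this notebook, assuming the names defined in earlier cells:
# final_status = wait_for_training_job(sagemaker_client, job_name)
# job_response = monitor_training_job_with_mlflow(job_name, mlflow_run_id)
```

Calling `monitor_training_job_with_mlflow` once after the loop returns records the final status in MLflow without logging a metric on every poll.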

## 5. Automated Model Validation Before Registration

Before registering models, let's implement automated validation to ensure model quality and performance thresholds.

In [None]:
# Function to register model in Model Registry
def register_model_in_registry(job_name, mlflow_run_id, model_package_group_name):
    """Register trained model in SageMaker Model Registry with proper artifact management"""
    
    # Get training job details
    response = sagemaker_client.describe_training_job(
        TrainingJobName=job_name
    )
    
    # Check if job is completed
    if response['TrainingJobStatus'] != 'Completed':
        print(f"Training job is not completed yet. Status: {response['TrainingJobStatus']}")
        return None
    
    # Get model artifacts from training job (SageMaker automatically stores these)
    model_artifacts_s3_uri = response['ModelArtifacts']['S3ModelArtifacts']
    
    print(f"📦 Model artifacts location: {model_artifacts_s3_uri}")
    
    # Model packages in a group are versioned automatically, so no explicit package name is set
    
    # Define inference specification with proper artifact reference
    inference_specification = {
        'Containers': [
            {
                'Image': f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-inference:latest",
                'ModelDataUrl': model_artifacts_s3_uri,  # Reference to S3 artifacts
                'Framework': 'PYTORCH',
                'FrameworkVersion': '2.0',
                'Environment': {
                    'SAGEMAKER_PROGRAM': 'inference.py',
                    'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/code'
                }
            }
        ],
        'SupportedContentTypes': ['application/json', 'image/jpeg', 'image/png'],
        'SupportedResponseMIMETypes': ['application/json'],
        'SupportedRealtimeInferenceInstanceTypes': ['ml.t2.medium', 'ml.m5.large', 'ml.m5.xlarge']
    }
    
    # Create model package in registry
    try:
        create_response = sagemaker_client.create_model_package(
            ModelPackageGroupName=model_package_group_name,
            ModelPackageDescription=f"YOLOv11 drone detection model trained from job {job_name}",
            InferenceSpecification=inference_specification,
            ModelApprovalStatus='PendingManualApproval',
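            MetadataProperties={
                'GeneratedBy': f'sagemaker-training-job-{job_name}',
                'ProjectId': 'yolov11-drone-detection',
                'Repository': 'mlops-sagemaker-demo'
            },
            # MetadataProperties accepts only CommitId, Repository, GeneratedBy, and ProjectId;
            # free-form key/value pairs go in CustomerMetadataProperties instead
            CustomerMetadataProperties={
                'ModelArtifactsS3Uri': model_artifacts_s3_uri,
                'TrainingJobArn': response['TrainingJobArn']
            },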
            Tags=[
                {'Key': 'Project', 'Value': 'MLOps-SageMaker-Demo'},
                {'Key': 'Model', 'Value': 'YOLOv11'},
                {'Key': 'Task', 'Value': 'ObjectDetection'},
                {'Key': 'TrainingJob', 'Value': job_name},
                {'Key': 'MLFlowRunId', 'Value': mlflow_run_id},
                {'Key': 'ModelArtifactsS3Uri', 'Value': model_artifacts_s3_uri}
            ]
        )
        
        model_package_arn = create_response['ModelPackageArn']
        print(f"✅ Model registered successfully in Model Registry!")
        print(f"📋 Model Package ARN: {model_package_arn}")
        print(f"📦 Artifacts managed through Model Registry (stored in S3: {model_artifacts_s3_uri})")
        
        # Update MLFlow with model registration details
        with mlflow.start_run(run_id=mlflow_run_id):
            mlflow.log_param("model_package_arn", model_package_arn)
            mlflow.log_param("model_package_group", model_package_group_name)
            mlflow.log_param("model_artifacts_s3_uri", model_artifacts_s3_uri)
            mlflow.set_tag("model_registered", "true")
            mlflow.set_tag("model_approval_status", "PendingManualApproval")
            mlflow.set_tag("model_registry_managed", "true")
            
            # Log model reference (not the actual artifacts)
            mlflow.log_param("model_registry_reference", model_package_arn)
        
        return model_package_arn
        
    except Exception as e:
        print(f"❌ Error registering model: {str(e)}")
        return None

# Register the model
try:
    if 'job_name' in locals() and 'mlflow_run_id' in locals():
        model_package_arn = register_model_in_registry(job_name, mlflow_run_id, MODEL_PACKAGE_GROUP_NAME)
        if model_package_arn:
            print(f"\nModel registration completed!")
            print(f"You can view the model in the SageMaker Model Registry console.")
    else:
        print("No completed training job to register.")
        print("Please complete a training job first.")
except Exception as e:
    print(f"Error during model registration: {str(e)}")
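
The registration call above attaches lineage details (training job, MLFlow run ID, artifact location) to the model package. If such details are passed as `CustomerMetadataProperties`, every key and value has to be a string. The following is a minimal sketch of a helper that prepares them; `build_customer_metadata` is a hypothetical name, and the 256-character value limit is an assumption that should be checked against the `CreateModelPackage` API reference.

```python
def build_customer_metadata(**fields):
    """Return a flat str -> str dict, skipping empty values (hypothetical helper)."""
    max_value_len = 256  # assumed service limit; verify in the API reference
    metadata = {}
    for key, value in fields.items():
        if value is None or value == "":
            continue  # skip fields that carry no information
        text = str(value)
        if len(text) > max_value_len:
            raise ValueError(f"Value for '{key}' exceeds {max_value_len} characters")
        metadata[key] = text
    return metadata


# Example usage with placeholder values
print(build_customer_metadata(TrainingJob="job-1", Epochs=50, Notes=None))
```

Raising on an over-long value, instead of truncating it, keeps a shortened S3 URI from being stored silently.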

In [None]:
# Function to validate model performance before registration
def validate_model_performance(job_name, mlflow_run_id, min_map50=0.3, min_map50_95=0.2):
    """
    Validate model performance against minimum thresholds before registration
    
    Args:
        job_name: SageMaker training job name
        mlflow_run_id: MLFlow run ID
        min_map50: Minimum mAP@0.5 threshold
        min_map50_95: Minimum mAP@0.5:0.95 threshold
    
    Returns:
        dict: Validation results with pass/fail status and metrics
    """
    
    # Get training job details
    try:
        response = sagemaker_client.describe_training_job(TrainingJobName=job_name)
        
        if response['TrainingJobStatus'] != 'Completed':
            return {
                'validation_passed': False,
                'reason': f"Training job not completed. Status: {response['TrainingJobStatus']}",
                'metrics': {}
            }
        
        # Get final metrics from CloudWatch
        cloudwatch = session.client('cloudwatch')
        
        # Retrieve validation metrics
        validation_metrics = {}
        metric_names = ['val:mAP50', 'val:mAP50-95', 'val:precision', 'val:recall']
        
        for metric_name in metric_names:
            try:
                cw_response = cloudwatch.get_metric_statistics(
                    Namespace='/aws/sagemaker/TrainingJobs',  # namespace for training job metrics
                    MetricName=metric_name,
                    Dimensions=[{'Name': 'TrainingJobName', 'Value': job_name}],
                    StartTime=response['CreationTime'],
                    EndTime=response['LastModifiedTime'],
                    Period=300,  # 5-minute periods
                    Statistics=['Maximum']  # Get best performance
                )
                
                datapoints = cw_response.get('Datapoints', [])
                if datapoints:
                    # Get the maximum value (best performance)
                    max_value = max(dp['Maximum'] for dp in datapoints)
                    validation_metrics[metric_name] = max_value
                    
            except Exception as e:
                print(f"Warning: Could not retrieve metric {metric_name}: {str(e)}")
        
        # Perform validation checks
        validation_results = {
            'validation_passed': True,
            'reason': 'All validation checks passed',
            'metrics': validation_metrics,
            'thresholds': {
                'min_map50': min_map50,
                'min_map50_95': min_map50_95
            },
            'checks': []
        }
        
        # Check mAP@0.5 threshold
        if 'val:mAP50' in validation_metrics:
            map50_value = validation_metrics['val:mAP50']
            if map50_value >= min_map50:
                validation_results['checks'].append({
                    'metric': 'mAP@0.5',
                    'value': map50_value,
                    'threshold': min_map50,
                    'passed': True
                })
            else:
                validation_results['validation_passed'] = False
                validation_results['reason'] = f"mAP@0.5 ({map50_value:.3f}) below threshold ({min_map50})"
                validation_results['checks'].append({
                    'metric': 'mAP@0.5',
                    'value': map50_value,
                    'threshold': min_map50,
                    'passed': False
                })
        
        # Check mAP@0.5:0.95 threshold
        if 'val:mAP50-95' in validation_metrics:
            map50_95_value = validation_metrics['val:mAP50-95']
            if map50_95_value >= min_map50_95:
                validation_results['checks'].append({
                    'metric': 'mAP@0.5:0.95',
                    'value': map50_95_value,
                    'threshold': min_map50_95,
                    'passed': True
                })
            else:
                validation_results['validation_passed'] = False
                validation_results['reason'] = f"mAP@0.5:0.95 ({map50_95_value:.3f}) below threshold ({min_map50_95})"
                validation_results['checks'].append({
                    'metric': 'mAP@0.5:0.95',
                    'value': map50_95_value,
                    'threshold': min_map50_95,
                    'passed': False
                })
        
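        # Fail closed: without any threshold metric the model cannot be validated
        if not validation_results['checks']:
            validation_results['validation_passed'] = False
            validation_results['reason'] = "No validation metrics were retrieved from CloudWatch"
        
        # Log validation results to MLFlow. Tags are used for the outcome and the
        # thresholds because MLFlow params cannot be overwritten when the cell is re-run
        with mlflow.start_run(run_id=mlflow_run_id):
            mlflow.set_tag("validation_passed", str(validation_results['validation_passed']))
            mlflow.set_tag("validation_reason", validation_results['reason'])
            
            for check in validation_results['checks']:
                metric_key = check['metric'].replace('@', '_').replace(':', '_')
                mlflow.log_metric(f"validation_{metric_key}", check['value'])
                mlflow.set_tag(f"threshold_{metric_key}", str(check['threshold']))
            
            mlflow.set_tag("model_validation_status", "passed" if validation_results['validation_passed'] else "failed")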
        
        # Display validation results
        print("🔍 Model Validation Results")
        print("=" * 50)
        print(f"Overall Status: {'✅ PASSED' if validation_results['validation_passed'] else '❌ FAILED'}")
        print(f"Reason: {validation_results['reason']}")
        print("\nDetailed Checks:")
        
        for check in validation_results['checks']:
            status = "✅ PASS" if check['passed'] else "❌ FAIL"
            print(f"  {check['metric']}: {check['value']:.3f} (threshold: {check['threshold']}) - {status}")
        
        if validation_metrics:
            print("\nAll Retrieved Metrics:")
            for metric, value in validation_metrics.items():
                print(f"  {metric}: {value:.3f}")
        
        return validation_results
        
    except Exception as e:
        error_result = {
            'validation_passed': False,
            'reason': f"Validation error: {str(e)}",
            'metrics': {}
        }
        
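        # Log error to MLFlow without letting a logging failure mask the original error
        try:
            with mlflow.start_run(run_id=mlflow_run_id):
                mlflow.set_tag("model_validation_status", "error")
                mlflow.set_tag("validation_error", str(e))
        except Exception as log_error:
            print(f"Warning: Could not log validation error to MLFlow: {str(log_error)}")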
        
        return error_result

# Validate the model before registration
try:
    if 'job_name' in locals() and 'mlflow_run_id' in locals():
        print("Starting automated model validation...")
        validation_results = validate_model_performance(
            job_name, 
            mlflow_run_id,
            min_map50=0.3,      # Minimum 30% mAP@0.5
            min_map50_95=0.2    # Minimum 20% mAP@0.5:0.95
        )
        
        if validation_results['validation_passed']:
            print("\n🎉 Model validation passed! The model is ready for approval in the Model Registry.")
        else:
            print(f"\n⚠️  Model validation failed: {validation_results['reason']}")
            print("Consider retraining with different parameters or adjusting thresholds.")
    else:
        print("No completed training job to validate.")
        print("Please complete a training job first.")
except Exception as e:
    print(f"Error during model validation: {str(e)}")
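
The two threshold checks in `validate_model_performance` follow the same pattern, so they can be expressed as one data-driven loop. The sketch below is illustrative only: `check_thresholds` is a hypothetical helper, and it treats a missing metric as a failed check, which is an assumption and differs from the function above.

```python
def check_thresholds(metrics, thresholds):
    """Compare each metric with its minimum value; return (passed, checks)."""
    checks = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        checks.append({
            'metric': name,
            'value': value,
            'threshold': minimum,
            # Assumption: a metric that could not be retrieved counts as a failure
            'passed': value is not None and value >= minimum,
        })
    passed = all(check['passed'] for check in checks)
    return passed, checks


# Example usage with made-up metric values
passed, checks = check_thresholds(
    {'val:mAP50': 0.42, 'val:mAP50-95': 0.15},
    {'val:mAP50': 0.3, 'val:mAP50-95': 0.2},
)
print(passed, [c['metric'] for c in checks if not c['passed']])
```

Adding another gate, for example a minimum recall, then only requires one more entry in the thresholds dictionary.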

## 6. Model Management and Approval Workflow

Let's manage the registered models and handle approval workflows.

In [None]:
# Function to list models in the registry
def list_models_in_registry(model_package_group_name):
    """List all models in the Model Registry"""
    try:
        response = sagemaker_client.list_model_packages(
            ModelPackageGroupName=model_package_group_name,
            SortBy='CreationTime',
            SortOrder='Descending'
        )
        
        models = response.get('ModelPackageSummaryList', [])
        
        if not models:
            print(f"No models found in group: {model_package_group_name}")
            return []
        
        print(f"Found {len(models)} models in group: {model_package_group_name}")
        print("\nModel List:")
        print("-" * 80)
        
        for i, model in enumerate(models):
            print(f"{i+1}. Model Package ARN: {model['ModelPackageArn']}")
            print(f"   Status: {model['ModelPackageStatus']}")
            print(f"   Approval Status: {model['ModelApprovalStatus']}")
            print(f"   Creation Time: {model['CreationTime'].strftime('%Y-%m-%d %H:%M:%S')}")
            if 'ModelPackageDescription' in model:
                print(f"   Description: {model['ModelPackageDescription']}")
            print("-" * 80)
        
        return models
        
    except Exception as e:
        print(f"Error listing models: {str(e)}")
        return []

# List models in registry
models = list_models_in_registry(MODEL_PACKAGE_GROUP_NAME)
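
`list_model_packages` returns results one page at a time, so a group with many versions may not be fully covered by a single call. A minimal sketch of iterating over all pages with a boto3 paginator follows; `iter_model_packages` is a hypothetical helper, and the client is passed in so the function can be exercised without AWS access.

```python
def iter_model_packages(client, group_name):
    """Yield every model package summary in a group, newest first."""
    paginator = client.get_paginator('list_model_packages')
    pages = paginator.paginate(
        ModelPackageGroupName=group_name,
        SortBy='CreationTime',
        SortOrder='Descending',
    )
    for page in pages:
        # Each page carries its own slice of the summary list
        yield from page.get('ModelPackageSummaryList', [])


# Example usage inside this notebook (requires AWS credentials):
# all_models = list(iter_model_packages(sagemaker_client, MODEL_PACKAGE_GROUP_NAME))
```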

In [None]:
# Function to approve a model
def approve_model(model_package_arn, approval_description="Model approved for deployment"):
    """Approve a model in the Model Registry"""
    try:
        response = sagemaker_client.update_model_package(
            ModelPackageArn=model_package_arn,
            ModelApprovalStatus='Approved',
            ApprovalDescription=approval_description
        )
        
        print(f"Model approved successfully!")
        print(f"Model Package ARN: {model_package_arn}")
        print(f"Approval Description: {approval_description}")
        
        return True
        
    except Exception as e:
        print(f"Error approving model: {str(e)}")
        return False

# Example: Approve the latest model (uncomment to use)
# if models:
#     latest_model_arn = models[0]['ModelPackageArn']
#     approve_model(latest_model_arn, "Model approved after validation")
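
Approval is one of three states a model package can be in: `Approved`, `Rejected` or `PendingManualApproval`. The sketch below generalizes `approve_model` to any of them, which also covers rejecting a model; `set_approval_status` is a hypothetical helper that takes the client as an argument.

```python
VALID_APPROVAL_STATUSES = {'Approved', 'Rejected', 'PendingManualApproval'}


def set_approval_status(client, model_package_arn, status, description):
    """Set the approval status of a model package after a local sanity check."""
    if status not in VALID_APPROVAL_STATUSES:
        raise ValueError(f"Unsupported approval status: {status}")
    client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus=status,
        ApprovalDescription=description,
    )
    return status


# Example usage inside this notebook (requires AWS credentials):
# set_approval_status(sagemaker_client, latest_model_arn, 'Rejected', 'mAP below threshold')
```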

In [None]:
# Function to get model details
def get_model_details(model_package_arn):
    """Get detailed information about a model"""
    try:
        response = sagemaker_client.describe_model_package(
            ModelPackageName=model_package_arn
        )
        
        print(f"Model Package Details:")
        print(f"ARN: {response['ModelPackageArn']}")
        print(f"Status: {response['ModelPackageStatus']}")
        print(f"Approval Status: {response['ModelApprovalStatus']}")
        print(f"Creation Time: {response['CreationTime'].strftime('%Y-%m-%d %H:%M:%S')}")
        
        if 'ModelPackageDescription' in response:
            print(f"Description: {response['ModelPackageDescription']}")
        
        if 'InferenceSpecification' in response:
            containers = response['InferenceSpecification']['Containers']
            print(f"\nInference Specification:")
            for i, container in enumerate(containers):
                print(f"  Container {i+1}:")
                print(f"    Image: {container['Image']}")
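                print(f"    Model Data: {container.get('ModelDataUrl', 'N/A')}")
        
        # describe_model_package does not return tags; custom key/value pairs
        # are exposed as CustomerMetadataProperties
        if 'CustomerMetadataProperties' in response:
            print("\nCustom Metadata:")
            for key, value in response['CustomerMetadataProperties'].items():
                print(f"  {key}: {value}")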
        
        return response
        
    except Exception as e:
        print(f"Error getting model details: {str(e)}")
        return None

# Example: Get details of the latest model (uncomment to use)
# if models:
#     latest_model_arn = models[0]['ModelPackageArn']
#     model_details = get_model_details(latest_model_arn)

## 7. Automated Deployment for Approved Models

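The functions in this section deploy a model package to a SageMaker endpoint once it has been approved in the Model Registry. The first cells review MLFlow experiments and runs, which helps decide which model version to deploy.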

In [None]:
# Function to list MLFlow experiments
def list_mlflow_experiments():
    """List all MLFlow experiments"""
    try:
        experiments = mlflow.search_experiments()
        
        print(f"Found {len(experiments)} experiments:")
        print("-" * 80)
        
        for exp in experiments:
            print(f"Experiment ID: {exp.experiment_id}")
            print(f"Name: {exp.name}")
            print(f"Lifecycle Stage: {exp.lifecycle_stage}")
            if exp.tags:
                print(f"Tags: {exp.tags}")
            print("-" * 80)
        
        return experiments
        
    except Exception as e:
        print(f"Error listing experiments: {str(e)}")
        return []

# List experiments
experiments = list_mlflow_experiments()

In [None]:
# Function to search MLFlow runs
def search_mlflow_runs(experiment_name, max_results=10):
    """Search MLFlow runs in an experiment"""
    try:
        # Get experiment by name
        experiment = mlflow.get_experiment_by_name(experiment_name)
        if not experiment:
            print(f"Experiment '{experiment_name}' not found")
            return pd.DataFrame()  # keep the return type consistent for callers
        
        # Search runs
        runs = mlflow.search_runs(
            experiment_ids=[experiment.experiment_id],
            max_results=max_results,
            order_by=["start_time DESC"]
        )
        
        if runs.empty:
            print(f"No runs found in experiment '{experiment_name}'")
            return pd.DataFrame()  # keep the return type consistent for callers
        
        print(f"Found {len(runs)} runs in experiment '{experiment_name}':")
        print("-" * 100)
        
        # Display run information
        for idx, run in runs.iterrows():
            print(f"Run ID: {run['run_id']}")
            print(f"Status: {run['status']}")
            print(f"Start Time: {run['start_time']}")
            
            # Display parameters
            param_cols = [col for col in runs.columns if col.startswith('params.')]
            if param_cols:
                print("Parameters:")
                for param_col in param_cols:
                    param_name = param_col.replace('params.', '')
                    param_value = run[param_col]
                    if pd.notna(param_value):
                        print(f"  {param_name}: {param_value}")
            
            # Display metrics
            metric_cols = [col for col in runs.columns if col.startswith('metrics.')]
            if metric_cols:
                print("Metrics:")
                for metric_col in metric_cols:
                    metric_name = metric_col.replace('metrics.', '')
                    metric_value = run[metric_col]
                    if pd.notna(metric_value):
                        print(f"  {metric_name}: {metric_value}")
            
            # Display tags
            tag_cols = [col for col in runs.columns if col.startswith('tags.')]
            if tag_cols:
                print("Tags:")
                for tag_col in tag_cols:
                    tag_name = tag_col.replace('tags.', '')
                    tag_value = run[tag_col]
                    if pd.notna(tag_value):
                        print(f"  {tag_name}: {tag_value}")
            
            print("-" * 100)
        
        return runs
        
    except Exception as e:
        print(f"Error searching runs: {str(e)}")
        return pd.DataFrame()  # keep the return type consistent for callers

# Search runs in the current experiment
runs_df = search_mlflow_runs(experiment_name)

In [None]:
# Function to compare runs
def compare_runs(runs_df, metrics_to_compare=('training_duration_minutes',)):
    """Compare MLFlow runs"""
    if runs_df.empty:
        print("No runs to compare")
        return
    
    print("Run Comparison:")
    print("=" * 120)
    
    # Select relevant columns for comparison
    comparison_cols = ['run_id', 'status', 'start_time']
    
    # Add parameter columns
    param_cols = [col for col in runs_df.columns if col.startswith('params.')]
    comparison_cols.extend(param_cols)
    
    # Add metric columns
    for metric in metrics_to_compare:
        metric_col = f'metrics.{metric}'
        if metric_col in runs_df.columns:
            comparison_cols.append(metric_col)
    
    # Display comparison table
    comparison_df = runs_df[comparison_cols].copy()
    
    # Rename columns for better display
    column_mapping = {}
    for col in comparison_df.columns:
        if col.startswith('params.'):
            column_mapping[col] = col.replace('params.', 'param_')
        elif col.startswith('metrics.'):
            column_mapping[col] = col.replace('metrics.', 'metric_')
    
    comparison_df = comparison_df.rename(columns=column_mapping)
    
    # Display the comparison
    display(comparison_df)
    
    return comparison_df

# Compare runs if available
if len(runs_df) > 0:  # works for both an empty list and an empty DataFrame
    comparison_df = compare_runs(runs_df)
else:
    print("No runs available for comparison")

In [None]:
# Function to automatically deploy approved models
def deploy_approved_model(model_package_arn, endpoint_name=None, instance_type='ml.m5.large', 
                         initial_instance_count=1, enable_autoscaling=True):
    """
    Automatically deploy an approved model to a SageMaker endpoint
    
    Args:
        model_package_arn: ARN of the approved model package
        endpoint_name: Name for the endpoint (auto-generated if None)
        instance_type: EC2 instance type for the endpoint
        initial_instance_count: Initial number of instances
        enable_autoscaling: Whether to enable auto-scaling
    
    Returns:
        dict: Deployment results with endpoint details
    """
    
    try:
        # Check model approval status
        model_details = sagemaker_client.describe_model_package(
            ModelPackageName=model_package_arn
        )
        
        if model_details['ModelApprovalStatus'] != 'Approved':
            return {
                'deployment_success': False,
                'reason': f"Model not approved. Status: {model_details['ModelApprovalStatus']}",
                'endpoint_name': None
            }
        
        # Generate endpoint name if not provided
        if endpoint_name is None:
            timestamp = datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
            endpoint_name = f"yolov11-endpoint-{timestamp}"
        
        # Create SageMaker model from model package
        model_name = f"yolov11-model-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
        
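        # Reference the model package directly: the container dicts returned by
        # describe_model_package include fields (e.g. Framework) that create_model rejects
        create_model_response = sagemaker_client.create_model(
            ModelName=model_name,
            Containers=[{'ModelPackageName': model_package_arn}],
            ExecutionRoleArn=ROLE_ARN,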
            Tags=[
                {'Key': 'Project', 'Value': 'MLOps-SageMaker-Demo'},
                {'Key': 'ModelPackageArn', 'Value': model_package_arn},
                {'Key': 'AutoDeployed', 'Value': 'true'}
            ]
        )
        
        print(f"✅ Created SageMaker model: {model_name}")
        
        # Create endpoint configuration
        endpoint_config_name = f"yolov11-config-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
        
        create_endpoint_config_response = sagemaker_client.create_endpoint_config(
            EndpointConfigName=endpoint_config_name,
            ProductionVariants=[
                {
                    'VariantName': 'primary',
                    'ModelName': model_name,
                    'InitialInstanceCount': initial_instance_count,
                    'InstanceType': instance_type,
                    'InitialVariantWeight': 1.0
                }
            ],
            Tags=[
                {'Key': 'Project', 'Value': 'MLOps-SageMaker-Demo'},
                {'Key': 'ModelPackageArn', 'Value': model_package_arn},
                {'Key': 'AutoDeployed', 'Value': 'true'}
            ]
        )
        
        print(f"✅ Created endpoint configuration: {endpoint_config_name}")
        
        # Create endpoint
        create_endpoint_response = sagemaker_client.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=endpoint_config_name,
            Tags=[
                {'Key': 'Project', 'Value': 'MLOps-SageMaker-Demo'},
                {'Key': 'ModelPackageArn', 'Value': model_package_arn},
                {'Key': 'AutoDeployed', 'Value': 'true'}
            ]
        )
        
        print(f"✅ Creating endpoint: {endpoint_name}")
        print("⏳ Endpoint creation in progress... This may take 5-10 minutes.")
        
        # Set up auto-scaling if enabled
        autoscaling_configured = False
        if enable_autoscaling:
            try:
                # A variant can only be registered as a scalable target once the
                # endpoint is in service, so block until creation has finished
                print("⏳ Waiting for endpoint to be in service before configuring auto-scaling...")
                sagemaker_client.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
                
                autoscaling_client = session.client('application-autoscaling')
                
                # Register scalable target (no RoleArn: a service-linked role is used for SageMaker)
                autoscaling_client.register_scalable_target(
                    ServiceNamespace='sagemaker',
                    ResourceId=f'endpoint/{endpoint_name}/variant/primary',
                    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
                    MinCapacity=1,
                    MaxCapacity=5
                )
                
                # Create scaling policy
                autoscaling_client.put_scaling_policy(
                    PolicyName=f'{endpoint_name}-scaling-policy',
                    ServiceNamespace='sagemaker',
                    ResourceId=f'endpoint/{endpoint_name}/variant/primary',
                    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
                    PolicyType='TargetTrackingScaling',
                    TargetTrackingScalingPolicyConfiguration={
                        'TargetValue': 70.0,  # invocations per instance per minute
                        'PredefinedMetricSpecification': {
                            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
                        },
                        'ScaleOutCooldown': 300,
                        'ScaleInCooldown': 300
                    }
                )
                
                autoscaling_configured = True
                print("✅ Auto-scaling configured for endpoint")
                
            except Exception as e:
                print(f"⚠️  Auto-scaling setup failed: {str(e)}")
        
        deployment_result = {
            'deployment_success': True,
            'reason': 'Model deployed successfully',
            'endpoint_name': endpoint_name,
            'model_name': model_name,
            'endpoint_config_name': endpoint_config_name,
            'model_package_arn': model_package_arn,
            'instance_type': instance_type,
            'initial_instance_count': initial_instance_count,
            'autoscaling_enabled': autoscaling_configured  # True only if setup succeeded
        }
        
        return deployment_result
        
    except Exception as e:
        return {
            'deployment_success': False,
            'reason': f"Deployment error: {str(e)}",
            'endpoint_name': endpoint_name
        }

# Function to monitor endpoint deployment
def monitor_endpoint_deployment(endpoint_name, max_wait_minutes=15):
    """Monitor endpoint deployment status"""
    
    start_time = datetime.now()
    max_wait_seconds = max_wait_minutes * 60
    
    print(f"🔍 Monitoring endpoint deployment: {endpoint_name}")
    print(f"Maximum wait time: {max_wait_minutes} minutes")
    print("-" * 60)
    
    while True:
        try:
            response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
            status = response['EndpointStatus']
            
            elapsed_time = datetime.now() - start_time
            elapsed_minutes = elapsed_time.total_seconds() / 60
            
            print(f"Status: {status} | Elapsed: {elapsed_minutes:.1f} minutes")
            
            if status == 'InService':
                print("✅ Endpoint is now in service!")
                return True
            elif status == 'Failed':
                failure_reason = response.get('FailureReason', 'Unknown')
                print(f"❌ Endpoint deployment failed: {failure_reason}")
                return False
            elif elapsed_time.total_seconds() > max_wait_seconds:
                print(f"⏰ Timeout reached ({max_wait_minutes} minutes)")
                return False
            
            time.sleep(30)  # Wait 30 seconds before checking again
            
        except Exception as e:
            print(f"Error monitoring endpoint: {str(e)}")
            return False

# Example: Deploy the latest approved model (uncomment to use)
# if models:
#     # Find the first approved model
#     approved_models = [m for m in models if m['ModelApprovalStatus'] == 'Approved']
#     
#     if approved_models:
#         latest_approved = approved_models[0]
#         print(f"Deploying approved model: {latest_approved['ModelPackageArn']}")
#         
#         deployment_result = deploy_approved_model(
#             model_package_arn=latest_approved['ModelPackageArn'],
#             instance_type='ml.m5.large',
#             initial_instance_count=1,
#             enable_autoscaling=True
#         )
#         
#         if deployment_result['deployment_success']:
#             print(f"\n🚀 Deployment initiated successfully!")
#             print(f"Endpoint name: {deployment_result['endpoint_name']}")
#             
#             # Monitor deployment
#             success = monitor_endpoint_deployment(deployment_result['endpoint_name'])
#             
#             if success:
#                 print(f"\n🎉 Model successfully deployed to endpoint: {deployment_result['endpoint_name']}")
#             else:
#                 print(f"\n⚠️  Deployment monitoring ended. Check SageMaker console for details.")
#         else:
#             print(f"\n❌ Deployment failed: {deployment_result['reason']}")
#     else:
#         print("No approved models found for deployment.")
# else:
#     print("No models available for deployment.")

print("✅ Automated deployment functions loaded")
print("Uncomment the example code above to deploy an approved model")
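
The polling loop in `monitor_endpoint_deployment` can be separated from the SageMaker call, which makes the timeout logic easy to test without creating an endpoint. This is a minimal sketch under that assumption; `wait_for_status` and its parameters are hypothetical names, and the status source, sleep function and clock are injected.

```python
import time


def wait_for_status(get_status, success='InService', failure='Failed',
                    timeout_seconds=900, interval_seconds=30,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports success or failure, or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == success:
            return True
        if status == failure:
            return False
        if clock() >= deadline:
            return False  # timed out while still in a transitional state
        sleep(interval_seconds)


# Example usage inside this notebook (requires AWS credentials):
# wait_for_status(
#     lambda: sagemaker_client.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
# )
```

boto3 also ships an `endpoint_in_service` waiter for the SageMaker client, which covers the same use case when custom progress output is not needed.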

## 8. Training Metrics Visualization

Let's visualize training metrics from completed jobs.

In [None]:
# Function to get training metrics from CloudWatch
def get_training_metrics_with_mlflow(job_name, mlflow_run_id):
    """Get training metrics from CloudWatch and log to MLFlow"""
    # Get job description
    response = sagemaker_client.describe_training_job(
        TrainingJobName=job_name
    )
    
    # Check if job is complete
    if response['TrainingJobStatus'] != 'Completed':
        print(f"Job is not yet complete. Current status: {response['TrainingJobStatus']}")
        return {}, {}  # callers unpack two values, so return empty results
    
    # Get CloudWatch metrics
    cloudwatch = session.client('cloudwatch')
    
    # Define metrics to retrieve
    metrics = [
        'train:loss',
        'val:loss',
        'val:mAP50',
        'val:mAP50-95'
    ]
    
    # Get metrics data
    metrics_data = {}
    final_metrics = {}
    
    for metric_name in metrics:
        try:
            cw_response = cloudwatch.get_metric_statistics(
                Namespace='/aws/sagemaker/TrainingJobs',  # namespace for training job metrics
                MetricName=metric_name,
                Dimensions=[
                    {
                        'Name': 'TrainingJobName',
                        'Value': job_name
                    }
                ],
                StartTime=response['CreationTime'],
                EndTime=response['LastModifiedTime'],
                Period=60,  # 1-minute periods
                Statistics=['Average']
            )
            
            # Extract datapoints
            datapoints = cw_response.get('Datapoints', [])
            if datapoints:
                # Sort by timestamp
                datapoints.sort(key=lambda x: x['Timestamp'])
                
                # Extract values
                timestamps = [dp['Timestamp'] for dp in datapoints]
                values = [dp['Average'] for dp in datapoints]
                
                metrics_data[metric_name] = {
                    'timestamps': timestamps,
                    'values': values
                }
                
                # Store final metric value
                if values:
                    final_metrics[metric_name] = values[-1]
                    
        except Exception as e:
            print(f"Error retrieving metric {metric_name}: {str(e)}")
    
    # Log final metrics to MLFlow
    if final_metrics:
        with mlflow.start_run(run_id=mlflow_run_id):
            for metric_name, value in final_metrics.items():
                # Clean metric name for MLFlow
                clean_name = metric_name.replace(':', '_')
                mlflow.log_metric(clean_name, value)
    
    return metrics_data, final_metrics

# Get and visualize training metrics
try:
    if 'job_name' in locals() and 'mlflow_run_id' in locals():
        metrics_data, final_metrics = get_training_metrics_with_mlflow(job_name, mlflow_run_id)
        
        if metrics_data:
            # Plot metrics
            fig, axes = plt.subplots(2, 1, figsize=(12, 10))
            
            # Plot loss
            if 'train:loss' in metrics_data:
                axes[0].plot(
                    metrics_data['train:loss']['timestamps'],
                    metrics_data['train:loss']['values'],
                    label='Train Loss',
                    marker='o'
                )
            
            if 'val:loss' in metrics_data:
                axes[0].plot(
                    metrics_data['val:loss']['timestamps'],
                    metrics_data['val:loss']['values'],
                    label='Validation Loss',
                    marker='s'
                )
            
            axes[0].set_title('Training and Validation Loss')
            axes[0].set_xlabel('Time')
            axes[0].set_ylabel('Loss')
            axes[0].legend()
            axes[0].grid(True, alpha=0.3)
            
            # Plot mAP
            if 'val:mAP50' in metrics_data:
                axes[1].plot(
                    metrics_data['val:mAP50']['timestamps'],
                    metrics_data['val:mAP50']['values'],
                    label='mAP@0.5',
                    marker='o'
                )
            
            if 'val:mAP50-95' in metrics_data:
                axes[1].plot(
                    metrics_data['val:mAP50-95']['timestamps'],
                    metrics_data['val:mAP50-95']['values'],
                    label='mAP@0.5:0.95',
                    marker='s'
                )
            
            axes[1].set_title('Validation mAP')
            axes[1].set_xlabel('Time')
            axes[1].set_ylabel('mAP')
            axes[1].legend()
            axes[1].grid(True, alpha=0.3)
            
            plt.tight_layout()
            plt.show()
            
            # Display final metrics
            if final_metrics:
                print("\nFinal Training Metrics:")
                print("=" * 40)
                for metric_name, value in final_metrics.items():
                    print(f"{metric_name}: {value:.4f}")
        else:
            print("No metrics available yet. The job may still be running or may have failed.")
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error retrieving training metrics: {str(e)}")

## 9. Advanced Model Performance Comparison

Let's implement comprehensive model performance comparison utilities to analyze and compare different model versions across multiple dimensions.
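
As a rough illustration of what comparing "across multiple dimensions" can mean, the sketch below ranks a few model versions by accuracy and by accuracy per training minute. The records, field names, and values are hypothetical placeholders rather than results from this project, and the helper `rank_models` is not part of the notebook.

```python
# Hypothetical records; in the notebook these values would come from the
# Model Registry, CloudWatch, and MLFlow rather than being hard-coded.
records = [
    {"version": "v1", "map50": 0.62, "duration_min": 40.0},
    {"version": "v2", "map50": 0.71, "duration_min": 90.0},
    {"version": "v3", "map50": 0.69, "duration_min": 45.0},
]


def rank_models(rows, key):
    """Return version names ordered from best to worst for the given key function."""
    return [r["version"] for r in sorted(rows, key=key, reverse=True)]


# Dimension 1: raw accuracy (mAP@0.5).
by_accuracy = rank_models(records, key=lambda r: r["map50"])

# Dimension 2: accuracy per minute of training (a crude efficiency proxy).
by_efficiency = rank_models(records, key=lambda r: r["map50"] / r["duration_min"])

print("By accuracy:  ", by_accuracy)
print("By efficiency:", by_efficiency)
```

The two rankings can disagree, which is why the utilities in this section report accuracy, training duration, and hyperparameters side by side instead of a single score.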

### 9.1: Performance Comparison Framework

In [None]:
# Advanced model performance comparison utilities
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
# scipy is not in the pip install cell at the top of this notebook;
# install it if this import fails.
from scipy import stats

def get_model_performance_metrics(model_package_arn):
    """Extract performance metrics from a registered model (using Model Registry as source)"""
    try:
        # Get model package details from Model Registry (primary source)
        response = sagemaker_client.describe_model_package(
            ModelPackageName=model_package_arn
        )
        
        # Get model artifacts location from Model Registry
        artifact_info = get_model_artifacts_from_registry(model_package_arn)
        
        # Extract training job name from tags or metadata
        training_job_name = None
        mlflow_run_id = None
        
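        # describe_model_package does not return tags, so look them up separately
        try:
            tags = sagemaker_client.list_tags(ResourceArn=model_package_arn).get('Tags', [])
        except Exception as e:
            print(f"Could not list tags for {model_package_arn}: {e}")
            tags = []
        for tag in tags:
            if tag['Key'] == 'TrainingJob':
                training_job_name = tag['Value']
            elif tag['Key'] == 'MLFlowRunId':
                mlflow_run_id = tag['Value']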
        
        # Also check metadata properties
        if 'MetadataProperties' in response:
            metadata = response['MetadataProperties']
            if not training_job_name and 'GeneratedBy' in metadata:
                # Extract from GeneratedBy field
                generated_by = metadata['GeneratedBy']
                if 'sagemaker-training-job-' in generated_by:
                    training_job_name = generated_by.replace('sagemaker-training-job-', '')
        
        if not training_job_name:
            print(f"⚠️  No training job found for model: {model_package_arn}")
            return None
        
        # Get training job details
        training_response = sagemaker_client.describe_training_job(
            TrainingJobName=training_job_name
        )
        
        # Extract hyperparameters
        hyperparameters = training_response.get('HyperParameters', {})
        
        # Get MLFlow metrics if available
        mlflow_metrics = {}
        if mlflow_run_id:
            try:
                run = mlflow.get_run(mlflow_run_id)
                mlflow_metrics = run.data.metrics
            except Exception as e:
                print(f"Could not retrieve MLFlow metrics: {e}")
        
        # Get CloudWatch metrics
        cloudwatch_metrics = get_final_training_metrics(training_job_name)
        
        # Combine all metrics with Model Registry information
        performance_data = {
            'model_package_arn': model_package_arn,
            'training_job_name': training_job_name,
            'mlflow_run_id': mlflow_run_id,
            'creation_time': response['CreationTime'],
            'approval_status': response['ModelApprovalStatus'],
            'model_artifacts_s3_uri': artifact_info['model_data_url'] if artifact_info else None,
            'registry_managed': True,  # Indicates artifacts are managed through Model Registry
            'hyperparameters': hyperparameters,
            'mlflow_metrics': mlflow_metrics,
            'cloudwatch_metrics': cloudwatch_metrics,
            'training_duration': training_response.get('TrainingTimeInSeconds', 0),
            'billable_duration': training_response.get('BillableTimeInSeconds', 0)
        }
        
        return performance_data
        
    except Exception as e:
        print(f"Error getting performance metrics for {model_package_arn}: {e}")
        return None

def get_final_training_metrics(job_name):
    """Get final training metrics from CloudWatch"""
    cloudwatch = session.client('cloudwatch')
    
    # Get job details for time range
    try:
        job_response = sagemaker_client.describe_training_job(TrainingJobName=job_name)
        start_time = job_response['CreationTime']
        end_time = job_response.get('TrainingEndTime', job_response['LastModifiedTime'])
    except Exception as e:
        print(f"Could not describe training job {job_name}: {e}")
        return {}
    
    metrics = ['train:loss', 'val:loss', 'val:mAP50', 'val:mAP50-95']
    final_metrics = {}
    
    for metric_name in metrics:
        try:
            response = cloudwatch.get_metric_statistics(
                Namespace='/aws/sagemaker/TrainingJobs',  # namespace for training job algorithm metrics
                MetricName=metric_name,
                Dimensions=[{'Name': 'TrainingJobName', 'Value': job_name}],
                StartTime=start_time,
                EndTime=end_time,
                Period=300,
                Statistics=['Average']
            )
            
            datapoints = response.get('Datapoints', [])
            if datapoints:
                # Get the last (most recent) value
                datapoints.sort(key=lambda x: x['Timestamp'])
                final_metrics[metric_name] = datapoints[-1]['Average']
                
        except Exception as e:
            print(f"Could not retrieve {metric_name}: {e}")
    
    return final_metrics

print("✅ Performance comparison framework loaded")

### 9.2: Model Comparison Analysis

In [None]:
def compare_model_versions(model_package_group_name, max_models=10):
    """Compare multiple model versions with comprehensive analysis"""
    
    # Get all models in the group
    models = list_models_in_registry(model_package_group_name)
    
    if not models:
        print("No models found for comparison")
        return None
    
    # Limit to recent models
    models = models[:max_models]
    
    print(f"Analyzing {len(models)} model versions...")
    
    # Collect performance data for all models
    performance_data = []
    for model in models:
        model_arn = model['ModelPackageArn']
        perf_data = get_model_performance_metrics(model_arn)
        if perf_data:
            performance_data.append(perf_data)
    
    if not performance_data:
        print("No performance data available for comparison")
        return None
    
    # Create comparison DataFrame
    comparison_rows = []
    for data in performance_data:
        row = {
            'model_arn': data['model_package_arn'].split('/')[-1],  # Short version
            'training_job': data['training_job_name'],
            'creation_time': data['creation_time'],
            'approval_status': data['approval_status'],
            'training_duration_min': data['training_duration'] / 60,
            'billable_duration_min': data['billable_duration'] / 60
        }
        
        # Add hyperparameters
        hyperparams = data['hyperparameters']
        row.update({
            'model_variant': hyperparams.get('model_variant', 'unknown'),
            'batch_size': int(hyperparams.get('batch_size', 0)),
            'epochs': int(hyperparams.get('epochs', 0)),
            'learning_rate': float(hyperparams.get('learning_rate', 0))
        })
        
        # Add performance metrics (prefer CloudWatch over MLFlow)
        cw_metrics = data['cloudwatch_metrics']
        mlflow_metrics = data['mlflow_metrics']
        
        row.update({
            'final_train_loss': cw_metrics.get('train:loss', mlflow_metrics.get('train_loss')),
            'final_val_loss': cw_metrics.get('val:loss', mlflow_metrics.get('val_loss')),
            'final_map50': cw_metrics.get('val:mAP50', mlflow_metrics.get('val_mAP50')),
            'final_map50_95': cw_metrics.get('val:mAP50-95', mlflow_metrics.get('val_mAP50_95'))
        })
        
        comparison_rows.append(row)
    
    # Create DataFrame
    comparison_df = pd.DataFrame(comparison_rows)
    comparison_df = comparison_df.sort_values('creation_time', ascending=False)
    
    return comparison_df

# Perform model comparison
if models:
    comparison_df = compare_model_versions(MODEL_PACKAGE_GROUP_NAME)
    
    if comparison_df is not None:
        print("\n" + "="*100)
        print("MODEL PERFORMANCE COMPARISON")
        print("="*100)
        
        # Display key metrics
        display_cols = [
            'training_job', 'approval_status', 'model_variant', 
            'final_map50', 'final_map50_95', 'final_val_loss',
            'training_duration_min', 'epochs', 'learning_rate'
        ]
        
        available_cols = [col for col in display_cols if col in comparison_df.columns]
        display_df = comparison_df[available_cols].copy()
        
        # Format numeric columns
        numeric_cols = ['final_map50', 'final_map50_95', 'final_val_loss', 'training_duration_min', 'learning_rate']
        for col in numeric_cols:
            if col in display_df.columns:
                display_df[col] = pd.to_numeric(display_df[col], errors='coerce')
                display_df[col] = display_df[col].round(4)
        
        display(display_df)
        
        # Store for visualization (already a module-level name, so no `global` is needed)
        model_comparison_df = comparison_df
        
    else:
        print("No comparison data available")
else:
    print("No models available for comparison")

### 9.3: Performance Visualization and Statistical Analysis

In [None]:
def visualize_model_performance(comparison_df):
    """Create comprehensive performance visualizations"""
    
    if comparison_df is None or comparison_df.empty:
        print("No data available for visualization")
        return
    
    # Set up the plotting style
    plt.style.use('default')
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Model Performance Comparison Analysis', fontsize=16, fontweight='bold')
    
    # 1. mAP Performance Comparison
    ax1 = axes[0, 0]
    if 'final_map50' in comparison_df.columns and comparison_df['final_map50'].notna().any():
        x_pos = range(len(comparison_df))
        bars1 = ax1.bar([x - 0.2 for x in x_pos], comparison_df['final_map50'].fillna(0), 
                       width=0.4, label='mAP@0.5', alpha=0.8, color='skyblue')
        
        if 'final_map50_95' in comparison_df.columns:
            bars2 = ax1.bar([x + 0.2 for x in x_pos], comparison_df['final_map50_95'].fillna(0), 
                           width=0.4, label='mAP@0.5:0.95', alpha=0.8, color='lightcoral')
        
        ax1.set_xlabel('Model Version')
        ax1.set_ylabel('mAP Score')
        ax1.set_title('Model Accuracy Comparison (mAP)')
        ax1.set_xticks(x_pos)
        ax1.set_xticklabels([f"v{i+1}" for i in range(len(comparison_df))], rotation=45)
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Add value labels on bars
        for i, (bar1, bar2) in enumerate(zip(bars1, bars2 if 'final_map50_95' in comparison_df.columns else bars1)):
            height1 = bar1.get_height()
            height2 = bar2.get_height() if 'final_map50_95' in comparison_df.columns else 0
            if height1 > 0:
                ax1.text(bar1.get_x() + bar1.get_width()/2., height1 + 0.01,
                        f'{height1:.3f}', ha='center', va='bottom', fontsize=8)
            if height2 > 0:
                ax1.text(bar2.get_x() + bar2.get_width()/2., height2 + 0.01,
                        f'{height2:.3f}', ha='center', va='bottom', fontsize=8)
    else:
        ax1.text(0.5, 0.5, 'No mAP data available', ha='center', va='center', transform=ax1.transAxes)
        ax1.set_title('Model Accuracy Comparison (mAP) - No Data')
    
    # 2. Training Efficiency Analysis
    ax2 = axes[0, 1]
    if 'training_duration_min' in comparison_df.columns and 'final_map50' in comparison_df.columns:
        valid_data = comparison_df.dropna(subset=['training_duration_min', 'final_map50'])
        if not valid_data.empty:
            scatter = ax2.scatter(valid_data['training_duration_min'], valid_data['final_map50'], 
                                 c=range(len(valid_data)), cmap='viridis', s=100, alpha=0.7)
            
            # Add trend line
            if len(valid_data) > 1:
                z = np.polyfit(valid_data['training_duration_min'], valid_data['final_map50'], 1)
                p = np.poly1d(z)
                ax2.plot(valid_data['training_duration_min'], p(valid_data['training_duration_min']), 
                        "r--", alpha=0.8, linewidth=2)
            
            ax2.set_xlabel('Training Duration (minutes)')
            ax2.set_ylabel('Final mAP@0.5')
            ax2.set_title('Training Efficiency Analysis')
            ax2.grid(True, alpha=0.3)
            
            # Add colorbar
            cbar = plt.colorbar(scatter, ax=ax2)
            cbar.set_label('Model Version (newer → older)')
            
            # Annotate points
            for i, row in valid_data.iterrows():
                ax2.annotate(f'v{list(valid_data.index).index(i)+1}', 
                           (row['training_duration_min'], row['final_map50']),
                           xytext=(5, 5), textcoords='offset points', fontsize=8)
        else:
            ax2.text(0.5, 0.5, 'Insufficient data for efficiency analysis', 
                    ha='center', va='center', transform=ax2.transAxes)
    else:
        ax2.text(0.5, 0.5, 'No efficiency data available', 
                ha='center', va='center', transform=ax2.transAxes)
        ax2.set_title('Training Efficiency Analysis - No Data')
    
    # 3. Hyperparameter Impact Analysis
    ax3 = axes[1, 0]
    if 'learning_rate' in comparison_df.columns and 'final_map50' in comparison_df.columns:
        valid_data = comparison_df.dropna(subset=['learning_rate', 'final_map50'])
        if not valid_data.empty and len(valid_data) > 1:
            # Group by learning rate and show performance
            lr_performance = valid_data.groupby('learning_rate')['final_map50'].agg(['mean', 'std', 'count'])
            
            bars = ax3.bar(range(len(lr_performance)), lr_performance['mean'], 
                          yerr=lr_performance['std'], capsize=5, alpha=0.7, color='lightgreen')
            
            ax3.set_xlabel('Learning Rate')
            ax3.set_ylabel('Average mAP@0.5')
            ax3.set_title('Learning Rate Impact on Performance')
            ax3.set_xticks(range(len(lr_performance)))
            ax3.set_xticklabels([f'{lr:.4f}' for lr in lr_performance.index], rotation=45)
            ax3.grid(True, alpha=0.3)
            
            # Add value labels
            for i, (bar, mean_val, count) in enumerate(zip(bars, lr_performance['mean'], lr_performance['count'])):
                ax3.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.01,
                        f'{mean_val:.3f}\n(n={count})', ha='center', va='bottom', fontsize=8)
        else:
            ax3.text(0.5, 0.5, 'Insufficient hyperparameter data', 
                    ha='center', va='center', transform=ax3.transAxes)
    else:
        ax3.text(0.5, 0.5, 'No hyperparameter data available', 
                ha='center', va='center', transform=ax3.transAxes)
        ax3.set_title('Hyperparameter Impact Analysis - No Data')
    
    # 4. Model Approval Status Distribution
    ax4 = axes[1, 1]
    if 'approval_status' in comparison_df.columns:
        status_counts = comparison_df['approval_status'].value_counts()
        colors = {'Approved': 'lightgreen', 'PendingManualApproval': 'orange', 
                 'Rejected': 'lightcoral', 'InProgress': 'lightblue'}
        
        wedges, texts, autotexts = ax4.pie(status_counts.values, 
                                          labels=status_counts.index,
                                          autopct='%1.1f%%',
                                          colors=[colors.get(status, 'gray') for status in status_counts.index],
                                          startangle=90)
        
        ax4.set_title('Model Approval Status Distribution')
        
        # Enhance text
        for autotext in autotexts:
            autotext.set_color('white')
            autotext.set_fontweight('bold')
    else:
        ax4.text(0.5, 0.5, 'No approval status data', 
                ha='center', va='center', transform=ax4.transAxes)
        ax4.set_title('Model Approval Status - No Data')
    
    plt.tight_layout()
    plt.show()
    
    return fig

# Generate performance visualizations
if 'model_comparison_df' in locals() and model_comparison_df is not None:
    performance_fig = visualize_model_performance(model_comparison_df)
else:
    print("No comparison data available for visualization")

### 9.4: Statistical Performance Analysis

In [None]:
def perform_statistical_analysis(comparison_df):
    """Perform statistical analysis on model performance"""
    
    if comparison_df is None or comparison_df.empty:
        print("No data available for statistical analysis")
        return
    
    print("STATISTICAL PERFORMANCE ANALYSIS")
    print("=" * 60)
    
    # Basic statistics
    numeric_cols = ['final_map50', 'final_map50_95', 'final_val_loss', 'training_duration_min']
    available_cols = [col for col in numeric_cols if col in comparison_df.columns]
    
    if available_cols:
        stats_df = comparison_df[available_cols].describe()
        print("\nDescriptive Statistics:")
        print("-" * 40)
        display(stats_df.round(4))
    
    # Performance trends
    if 'final_map50' in comparison_df.columns and len(comparison_df) > 2:
        print("\nPerformance Trend Analysis:")
        print("-" * 40)
        
        # Sort by creation time for trend analysis
        trend_df = comparison_df.sort_values('creation_time')
        map50_values = trend_df['final_map50'].dropna()
        
        if len(map50_values) > 1:
            # Calculate trend
            x = np.arange(len(map50_values))
            slope, intercept, r_value, p_value, std_err = stats.linregress(x, map50_values)
            
            print(f"mAP@0.5 Trend:")
            print(f"  Slope: {slope:.6f} (per model version)")
            print(f"  R-squared: {r_value**2:.4f}")
            print(f"  P-value: {p_value:.4f}")
            
            if p_value < 0.05:
                trend_direction = "improving" if slope > 0 else "declining"
                print(f"  Interpretation: Statistically significant {trend_direction} trend")
            else:
                print(f"  Interpretation: No statistically significant trend")
            
            # Best and worst performing models
            best_idx = map50_values.idxmax()
            worst_idx = map50_values.idxmin()
            
            print(f"\nBest performing model:")
            print(f"  Job: {trend_df.loc[best_idx, 'training_job']}")
            print(f"  mAP@0.5: {map50_values[best_idx]:.4f}")
            
            print(f"\nWorst performing model:")
            print(f"  Job: {trend_df.loc[worst_idx, 'training_job']}")
            print(f"  mAP@0.5: {map50_values[worst_idx]:.4f}")
            
            # Performance improvement
            improvement = map50_values[best_idx] - map50_values[worst_idx]
            print(f"\nPerformance range: {improvement:.4f} mAP@0.5 points")
    
    # Hyperparameter correlation analysis
    if len(comparison_df) > 3:
        print("\nHyperparameter Correlation Analysis:")
        print("-" * 40)
        
        correlation_cols = ['final_map50', 'learning_rate', 'batch_size', 'epochs', 'training_duration_min']
        available_corr_cols = [col for col in correlation_cols if col in comparison_df.columns]
        
        if len(available_corr_cols) > 2:
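            # Coerce to numeric first so columns holding None do not break corr()
            corr_df = comparison_df[available_corr_cols].apply(pd.to_numeric, errors='coerce').corr()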
            
            # Focus on correlations with performance metrics
            if 'final_map50' in corr_df.columns:
                map50_corr = corr_df['final_map50'].drop('final_map50').sort_values(key=abs, ascending=False)
                
                print("Correlation with mAP@0.5:")
                for param, corr_val in map50_corr.items():
                    if not pd.isna(corr_val):
                        strength = "strong" if abs(corr_val) > 0.7 else "moderate" if abs(corr_val) > 0.4 else "weak"
                        direction = "positive" if corr_val > 0 else "negative"
                        print(f"  {param}: {corr_val:.4f} ({strength} {direction})")
    
    # Model recommendation
    print("\nMODEL RECOMMENDATIONS:")
    print("-" * 40)
    
    if 'final_map50' in comparison_df.columns:
        # Find best performing approved model
        approved_models = comparison_df[comparison_df['approval_status'] == 'Approved']
        
        if not approved_models.empty:
            best_approved = approved_models.loc[approved_models['final_map50'].idxmax()]
            print(f"✅ Best approved model for deployment:")
            print(f"   Job: {best_approved['training_job']}")
            print(f"   mAP@0.5: {best_approved['final_map50']:.4f}")
        
        # Find best pending model
        pending_models = comparison_df[comparison_df['approval_status'] == 'PendingManualApproval']
        
        if not pending_models.empty:
            best_pending = pending_models.loc[pending_models['final_map50'].idxmax()]
            print(f"⏳ Best pending model for approval:")
            print(f"   Job: {best_pending['training_job']}")
            print(f"   mAP@0.5: {best_pending['final_map50']:.4f}")
            
            # Check if it's better than approved models
            if not approved_models.empty:
                improvement = best_pending['final_map50'] - approved_models['final_map50'].max()
                if improvement > 0.01:  # absolute threshold of 0.01 mAP@0.5, not a relative 1%
                    print(f"   💡 Recommendation: Consider approving (improvement: +{improvement:.4f})")

# Perform statistical analysis
if 'model_comparison_df' in locals() and model_comparison_df is not None:
    perform_statistical_analysis(model_comparison_df)
else:
    print("No comparison data available for statistical analysis")

## 10. Summary and Next Steps

In this enhanced notebook, we've executed and monitored a YOLOv11 training pipeline with comprehensive MLFlow tracking, SageMaker Model Registry integration, and advanced performance analysis capabilities. Here's a summary of what we've accomplished:

### Completed Tasks:

1. **Model Registry Setup**:
   - Created Model Package Group for organizing YOLOv11 models
   - Configured model registration workflow

2. **Pipeline Configuration with MLFlow**:
   - Listed available datasets
   - Configured training parameters with MLFlow experiment tracking

3. **Enhanced Pipeline Execution**:
   - Created and executed SageMaker training job with MLFlow integration
   - Logged all parameters, metrics, and metadata to MLFlow

4. **Comprehensive Monitoring**:
   - Monitored training job status with real-time updates to MLFlow
   - Tracked training duration and job status

5. **Automated Model Validation** ⭐ NEW:
   - Implemented automated validation before model registration
   - Performance threshold checking and quality gates
   - Automated model evaluation against baseline metrics

6. **Model Registration**:
   - Registered trained models in SageMaker Model Registry
   - Configured approval workflows for production deployment
   - Linked MLFlow runs with registered models

7. **Model Management**:
   - Listed and managed models in the registry
   - Implemented model approval workflows
   - Retrieved detailed model information

8. **Automated Deployment for Approved Models** ⭐ NEW:
   - Implemented automated deployment triggers for approved models
   - SageMaker endpoint creation with auto-scaling configuration
   - Blue/green deployment strategy for zero-downtime updates

9. **Experiment Management**:
   - Listed and compared MLFlow experiments
   - Searched and analyzed training runs
   - Compared model performance across runs

10. **Training Metrics Visualization**:
    - Retrieved training metrics from CloudWatch
    - Logged final metrics to MLFlow
    - Visualized training progress and model performance

11. **Advanced Model Performance Comparison** ⭐ NEW:
    - Comprehensive multi-dimensional model comparison
    - Statistical analysis of performance trends
    - Hyperparameter impact analysis
    - Automated model recommendations

### Key Enhanced Features:

- **Automated Validation Pipeline**: Models are automatically validated against performance thresholds before registration
- **Intelligent Deployment Automation**: Approved models are automatically deployed with proper infrastructure setup
- **Advanced Performance Analytics**: Statistical analysis and visualization of model performance across versions
- **Production-Ready Workflows**: Complete automation from training to deployment with proper governance
- **Comprehensive Model Comparison**: Multi-dimensional analysis including accuracy, efficiency, and hyperparameter impact
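
To make the validation gate idea concrete, here is a minimal sketch of a threshold check that could run before registration. The threshold values and the `validate_metrics` helper are assumptions made for illustration; only the metric names (`val:mAP50`, `val:mAP50-95`, `val:loss`) are taken from this notebook.

```python
# Hypothetical thresholds; the real values would come from the project's
# validation configuration, which is not shown in this section.
THRESHOLDS = {
    "val:mAP50": ("min", 0.50),
    "val:mAP50-95": ("min", 0.30),
    "val:loss": ("max", 1.50),
}


def validate_metrics(metrics, thresholds=THRESHOLDS):
    """Return (passed, failures) for a dict of final metric values."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value:.3f} is below the minimum {limit:.3f}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value:.3f} is above the maximum {limit:.3f}")
    return (not failures, failures)


passed, failures = validate_metrics(
    {"val:mAP50": 0.64, "val:mAP50-95": 0.27, "val:loss": 0.9}
)
print("register model" if passed else f"block registration: {failures}")
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a model whose metrics could not be retrieved is blocked rather than passed by default.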

### Next Steps:

1. **A/B Testing Framework**: Implement automated A/B testing for model comparison in production
2. **Model Monitoring**: Set up comprehensive data drift and model performance monitoring
3. **Automated Retraining**: Implement automated retraining based on performance degradation
4. **Cost Optimization**: Add cost analysis and optimization recommendations
5. **Multi-Region Deployment**: Extend deployment automation to multiple regions
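
As one possible starting point for the automated retraining step, the sketch below flags retraining when recent accuracy drops too far below a baseline. The function name, the `max_drop` and `window` defaults, and the monitoring history are all hypothetical; how production mAP would actually be measured is outside the scope of this notebook.

```python
from statistics import mean


def needs_retraining(baseline_map50, recent_map50, max_drop=0.05, window=3):
    """Flag retraining when the mean of the latest `window` scores falls more
    than `max_drop` (absolute mAP points) below the baseline."""
    if len(recent_map50) < window:
        return False  # not enough evidence yet
    recent_mean = mean(recent_map50[-window:])
    return (baseline_map50 - recent_mean) > max_drop


# Hypothetical monitoring history for a deployed model (oldest first).
history = [0.70, 0.69, 0.66, 0.63, 0.61]
print(needs_retraining(baseline_map50=0.70, recent_map50=history))
```

Averaging over a window rather than reacting to a single score is meant to reduce false alarms from one noisy evaluation.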

### Production Readiness Enhancements:

- **Validation Gates**: Automated quality gates prevent poor-performing models from reaching production
- **Zero-Downtime Deployment**: Blue/green deployment strategy ensures continuous service availability
- **Performance Tracking**: Comprehensive tracking and comparison of model versions over time
- **Statistical Analysis**: Data-driven insights for model selection and hyperparameter optimization
- **Automated Recommendations**: Rule-based recommendations for model approval and deployment decisions, such as flagging a pending model that beats the best approved model by more than 0.01 mAP@0.5

This enhanced workflow provides a complete, production-ready foundation for YOLOv11 model development and deployment with advanced analytics, automated validation, and intelligent deployment capabilities.