# Lab 3: Model Evaluation and Registry with Governance

## Overview

In this lab, you'll complete the governance lifecycle by evaluating your fine-tuned model and registering it in the **SageMaker Model Registry**. This establishes a centralized catalog for model versioning, approval workflows, and deployment tracking.

### What You'll Accomplish
- Compare fine-tuned vs. base model performance
- Create model cards with governance metadata
- Register models in SageMaker Model Registry
- Establish approval workflows for production deployment

### Why Model Registry Matters for Governance
- **Version Control**: Track all model versions with complete lineage
- **Approval Workflows**: Require manual approval before production deployment
- **Centralized Catalog**: Single source of truth for all models
- **Deployment Tracking**: Know which model version is deployed where
- **Model Cards**: Document model purpose, risks, and business context

## Set-Up

### Setup and Dependencies

Install required libraries and restart the kernel to ensure all packages are properly loaded.

In [None]:
!pip install -U sagemaker==2.253.1 datasets==4.4.1 mlflow==3.5.1 tiktoken evaluate==0.4.0 rouge_score --quiet
# restart kernel
import IPython
IPython.Application.instance().kernel.do_shutdown(True) #automatically restarts kernel

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role

# Initialize SageMaker session
sess = sagemaker.Session()
role = get_execution_role()
region = sess.boto_region_name
account_id = role.split(':')[4]

# Replace the placeholder with your bucket name: the DataBucketName value in the CloudFormation stack outputs
bucket = "{replace-with-your-DataBucketName-from-cloudformation-output}"

sm_client = boto3.client('sagemaker', region_name=region)

print(f"Amazon SageMaker role: {role}")
print(f"Amazon S3 bucket: {bucket}")
print(f"AWS Region: {region}")

### Connect to MLflow App

Retrieve the details of the MLflow App that stores the experiment metadata from the previous lab. This connection gives us access to the run's model artifacts, parameters, and metrics.

In [None]:
try:
    response = sm_client.list_mlflow_apps(MaxResults=10)
    mlflow_apps = response.get('Summaries', [])
    
    if mlflow_apps:
        active_apps = [s for s in mlflow_apps if s['Status'] == 'Created']
        
        if active_apps:
            mlflow_app_arn = active_apps[0]['Arn']
            mlflow_app_name = active_apps[0]['Name']
            print("✓ Found active MLflow App:")
            print(f"  Name: {mlflow_app_name}")
            print(f"  ARN: {mlflow_app_arn}")
        else:
            print("⚠ No active MLflow Apps found.")
            mlflow_app_arn = None
    else:
        print("⚠ No MLflow Apps found in this region.")
        mlflow_app_arn = None
except Exception as e:
    print(f"Error checking for MLflow Apps: {e}")
    mlflow_app_arn = None

### Load data from the MLflow run

We use the `%store` magic command to retrieve variables saved from the previous notebook.

In [None]:
%store -r experiment_name
%store -r fine_tuned_model_endpoint_name
%store -r base_model_endpoint_name

print(experiment_name)
print(fine_tuned_model_endpoint_name)
print(base_model_endpoint_name)

In [None]:
import mlflow
mlflow.set_tracking_uri(mlflow_app_arn)

runs = mlflow.search_runs(
    experiment_names=[experiment_name],
    filter_string="tags.mlflow.runName LIKE '%fine-tuning%'",
    order_by=["start_time DESC"],
    max_results=1
)
if runs.empty:
    raise ValueError(f"No fine-tuning run found in experiment '{experiment_name}'")
run_id = runs.iloc[0].run_id
print(run_id)

## Step 1: Model Evaluation

### Why Evaluate Models?

Before registering a model, we need to **quantify its performance** to:
- Validate that fine-tuning improved the model
- Establish baseline metrics for future comparisons
- Document performance for governance and compliance
- Make data-driven decisions about model deployment

### Evaluation Metrics for Summarization

We'll use standard NLP metrics:
- **BLEU**: Precision of generated n-grams (typically up to 4-grams) against the reference text, with a penalty for outputs shorter than the reference
- **ROUGE-1**: Unigram overlap (individual word matches)
- **ROUGE-2**: Bigram overlap (two-word phrase matches)
- **ROUGE-L**: Longest common subsequence (captures sentence structure)

Higher scores indicate better alignment with ground truth summaries.
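The metric definitions above can be made concrete with a small, self-contained sketch. It computes ROUGE-1, ROUGE-2, and ROUGE-L as F1 scores under simplifying assumptions: lowercase whitespace tokenization and no stemming. The helper names are illustrative; the lab's own `metrics` module and the `rouge_score` package may tokenize differently, so their numbers will not match this sketch exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision (overlap/pred_total) and recall (overlap/ref_total)."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    """ROUGE-N F1 with clipped n-gram counts on lowercase whitespace tokens."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # & keeps the minimum count of each shared n-gram
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def rouge_l(prediction, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


reference = "the cat sat on the mat"
prediction = "the cat lay on the mat"
print(round(rouge_n(prediction, reference, 1), 3))  # 0.833
print(round(rouge_n(prediction, reference, 2), 3))  # 0.6
print(round(rouge_l(prediction, reference), 3))     # 0.833
```

In the toy example, one substituted word ("sat" replaced by "lay") leaves five of six unigrams matching but breaks two of the five bigrams, which is why ROUGE-2 drops more than ROUGE-1.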

### Load Evaluation Metrics

Import custom metric functions that calculate BLEU and ROUGE scores.

In [None]:
from metrics import rouge1, rouge2, rougeL, bleu

### Retrieve Evaluation Dataset

We retrieve the evaluation dataset that was logged to MLflow during the previous lab. This demonstrates **data lineage**: we can trace exactly which data was used to evaluate this model.

In [None]:
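run = mlflow.get_run(run_id)
dataset_inputs = run.inputs.dataset_inputs

# Find the dataset that was logged with the "evaluation" context tag
dataset_info = next(
    (d.dataset for d in dataset_inputs if any(tag.value == "evaluation" for tag in d.tags)),
    None
)

if dataset_info is None:
    raise ValueError("No dataset with context 'evaluation' found in this run")

# Download the dataset from its recorded source and get the local file path
source = mlflow.data.get_source(dataset_info)
jsonl_path = source.load()
print(f"Evaluation dataset: {jsonl_path}")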

In [None]:
import pandas as pd
evaluation_dataset = pd.read_json(jsonl_path, orient='records', lines=True)
evaluation_dataset.head()

### Define Endpoint Invocation Function

This helper function invokes SageMaker endpoints to get predictions from both the base model and fine-tuned model.

In [None]:
import boto3
import json

# Create the runtime client once and reuse it for every invocation
runtime_client = boto3.client('sagemaker-runtime', region_name=region)

def invoke_endpoint(endpoint_name, payload):
    """
    Invoke a SageMaker endpoint with a JSON payload.
    
    Args:
        endpoint_name (str): Name of the SageMaker endpoint
        payload: A dict or list (serialized to JSON here) or an already-serialized JSON string or bytes
    
    Returns:
        The parsed JSON response, or the raw response text if it is not valid JSON
    """
    try:
        # Call the endpoint
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',  # Adjust based on your endpoint's requirements
            Body=json.dumps(payload) if isinstance(payload, (dict, list)) else payload
        )
        
        # Get the response body
        response_body = response['Body'].read().decode('utf-8')
        
        # Parse the response if it's JSON
        try:
            return json.loads(response_body)
        except json.JSONDecodeError:
            return response_body
            
    except Exception as e:
        print(f"Error invoking endpoint: {str(e)}")
        raise

### Create Prompt Template

Define the instruction format used during fine-tuning. This ensures consistent prompt formatting when evaluating both models.

In [None]:
import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}

with open("template.json", "w") as f:
    json.dump(template, f)
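
To see exactly what text each endpoint receives, you can render one prompt locally. This sketch copies the template from the cell above and the `"\n\n### Response:\n"` marker that the evaluation loop appends; `build_prompt` and the sample record are hypothetical. Because the template already ends with two newlines, the rendered prompt has four newlines between the input and `### Response:`.

```python
# Prompt template copied from the cell above
template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}

# Marker appended by the evaluation loop so the model knows where its answer starts
RESPONSE_MARKER = "\n\n### Response:\n"


def build_prompt(instruction, context):
    """Fill the template and append the response marker, as the evaluation loop does."""
    return template["prompt"].format(instruction=instruction, context=context) + RESPONSE_MARKER


# Hypothetical record with the same keys as the evaluation dataset
sample = {"instruction": "Summarize the text.", "context": "SageMaker hosts models."}
print(build_prompt(sample["instruction"], sample["context"]))
```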

### Generate Predictions from Both Models

This cell performs the actual evaluation by:
1. Sending each test example to both the base model and fine-tuned model
2. Collecting predictions from both models
3. Storing the ground truth responses alongside them for comparison

**Note**: This may take several minutes as we're invoking endpoints 20 times for each model (40 total invocations).

In [None]:
from datasets import Dataset

test_dataset = Dataset.from_pandas(evaluation_dataset)

import pandas as pd

(
    inputs,
    ground_truth_responses,
    responses_before_finetuning,
    responses_after_finetuning,
) = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])
    pretrained_response =invoke_endpoint(base_model_endpoint_name, payload)
    responses_before_finetuning.append(pretrained_response.get("generated_text"))
    finetuned_response = invoke_endpoint(fine_tuned_model_endpoint_name, payload)
    responses_after_finetuning.append(finetuned_response.get("generated_text"))


try:
    for i, datapoint in enumerate(test_dataset.select(range(20))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
except Exception as e:
    print(e)

### Prepare Data for Evaluation

Create one DataFrame per model with a `predictions` column and a `targets` column (the ground truth). `mlflow.evaluate` is pointed at these columns by name in the next step.

In [None]:
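# Guard against unequal list lengths if some endpoint invocations failed
min_len = min(len(responses_before_finetuning), len(responses_after_finetuning), len(ground_truth_responses))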

In [None]:
df_before = pd.DataFrame({
    "predictions": responses_before_finetuning[:min_len],
    "targets": ground_truth_responses[:min_len]
})

df_after = pd.DataFrame({
    "predictions": responses_after_finetuning[:min_len],
    "targets": ground_truth_responses[:min_len]
})

### Calculate and Compare Metrics

Use MLflow's evaluate function to calculate metrics for both models. This:
- Automatically logs metrics to MLflow for tracking
- Creates separate runs for base and fine-tuned model evaluation
- Enables side-by-side comparison in the MLflow UI

**Expected Outcome**: The fine-tuned model should show higher scores across all metrics.

In [None]:
from datetime import datetime
import logging

logging.getLogger("mlflow").setLevel(logging.ERROR)

timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name=f"base-model-eval-{timestamp}"):
    result_before = mlflow.evaluate(
        data=df_before,
        targets="targets",
        predictions="predictions",
        extra_metrics=[bleu, rouge1, rouge2, rougeL]
    )

with mlflow.start_run(run_name=f"fine-tuned-model-eval-{timestamp}"):
    result_after = mlflow.evaluate(
        data=df_after,
        targets="targets",
        predictions="predictions",
        extra_metrics=[bleu, rouge1, rouge2, rougeL]
    )

print("\n=== Base Model ===")
print(f"BLEU:    {result_before.metrics['bleu']:.4f}")
print(f"ROUGE-1: {result_before.metrics['rouge1']:.4f}")
print(f"ROUGE-2: {result_before.metrics['rouge2']:.4f}")
print(f"ROUGE-L: {result_before.metrics['rougeL']:.4f}")

print("\n=== Fine-tuned Model ===")
print(f"BLEU:    {result_after.metrics['bleu']:.4f}")
print(f"ROUGE-1: {result_after.metrics['rouge1']:.4f}")
print(f"ROUGE-2: {result_after.metrics['rouge2']:.4f}")
print(f"ROUGE-L: {result_after.metrics['rougeL']:.4f}")

### Evaluation Results

✅ **In a typical run, the fine-tuned model outperforms the base model** across all metrics. Check the scores printed above to confirm this for your run.

This quantitative evidence:
- Validates that fine-tuning was successful
- Provides metrics for governance documentation
- Justifies model registration and potential deployment
- Creates a baseline for future model versions

These metrics are now logged in MLflow and can be viewed in the tracking UI.
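Rather than taking the improvement for granted, you can check it directly. The sketch below assumes `result_before.metrics` and `result_after.metrics` contain the `bleu`, `rouge1`, `rouge2`, and `rougeL` keys printed above; `compare_metrics` and `all_improved` are hypothetical helper names.

```python
def compare_metrics(before, after, keys=("bleu", "rouge1", "rouge2", "rougeL")):
    """Return {metric: (before, after, absolute_change)} for the selected keys."""
    comparison = {}
    for key in keys:
        b, a = before[key], after[key]
        comparison[key] = (b, a, a - b)
    return comparison


def all_improved(comparison):
    """True only if every compared metric increased after fine-tuning."""
    return all(delta > 0 for _, _, delta in comparison.values())


# Usage in the notebook, with the dictionaries produced by mlflow.evaluate:
# comparison = compare_metrics(result_before.metrics, result_after.metrics)
# for name, (b, a, d) in comparison.items():
#     print(f"{name:8s} {b:.4f} -> {a:.4f} ({d:+.4f})")
# print("Fine-tuned model better on all metrics:", all_improved(comparison))
```

With only 20 evaluation examples, small differences may be noise, so treat a tiny positive change with caution.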

## Step 2: Model Registration

### Understanding SageMaker Model Registry

The SageMaker Model Registry is a **centralized repository** for managing ML models throughout their lifecycle. It provides:

**Key Features:**
- **Model Package Groups**: Logical grouping of related model versions
- **Versioning**: Automatic version tracking for each registered model
- **Approval Status**: Workflow states (`PendingManualApproval`, `Approved`, `Rejected`)
- **Model Cards**: Embedded documentation with governance metadata
- **Lineage**: Links to training jobs, datasets, and experiments

**Governance Benefits:**
- **Audit Trail**: Complete history of model versions and approvals
- **Access Control**: IAM-based permissions for model deployment
- **Compliance**: Documentation that supports regulatory and audit requirements
- **Deployment Tracking**: Know which version is deployed where

### Model Registration Workflow
1. Retrieve model artifacts from MLflow
2. Register model with approval status
3. Create model card with business context
4. Set up lifecycle stages (Development → Staging → Production)


### Get Container Image

For model registration, we need the **inference container image** used by JumpStart. This ensures the model can be deployed with the correct runtime environment.

In [None]:
params = run.data.params
container_image = params['image_uri']
print(container_image)

### Retrieve model artifact path

We need to extract the S3 location of the trained model from MLflow. This demonstrates **lineage tracking** - connecting MLflow experiments to Model Registry entries.

In [None]:
# Load model_info.json, which the previous lab logged as an artifact of the fine-tuning run
model_data = mlflow.artifacts.load_dict(run.info.artifact_uri + "/model_info.json")
model_path = model_data['model_artifact']
print("=======")
print(f"Model artifact location in Amazon S3: {model_path}")

### Create Model Package Group

A **Model Package Group** is a logical container for related model versions. Think of it as a "model family" where:
- All versions of the same model type are grouped together
- Each registration creates a new version (1, 2, 3, etc.)
- You can compare versions and track evolution over time

Example: `llama-summarization-models` might contain v1 (initial), v2 (improved), v3 (production).

In [None]:
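# Create model package group
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")

model_package_group_name = f"llama-summarization-models-{timestamp}"

try:
    group_response = sm_client.create_model_package_group(
        ModelPackageGroupName=model_package_group_name,
        ModelPackageGroupDescription="Fine-tuned Llama models for text summarization"
    )
    # The API returns a response dict; keep only the ARN
    model_package_group_arn = group_response["ModelPackageGroupArn"]
    print(f"Created model package group: {model_package_group_name}")
except sm_client.exceptions.ResourceInUse:
    # The group already exists: look up its ARN instead
    model_package_group_arn = sm_client.describe_model_package_group(
        ModelPackageGroupName=model_package_group_name
    )["ModelPackageGroupArn"]
    print(f"Model package group already exists: {model_package_group_name}")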


**To view the Model Package Group you just created:**
1. Navigate to **SageMaker AI Studio**
2. In the left sidebar, scroll down and click **Models**
3. You'll see your model package group listed
   
![Model package group in SageMaker AI Studio](../../images/model-package-group.png)

### Register Model Package

Now we create the model package registration request with:
- **Model Package Group**: Logical grouping (e.g., `llama-summarization-models-<timestamp>`)
- **Container Image**: Inference runtime environment
- **Model Data URL**: S3 location of model artifacts
- **Model Metrics**: Evaluation scores stored in S3
- **Approval Status**: PendingManualApproval (requires explicit approval)

The model card with governance metadata is created and linked to this version in a later step.

This creates a **versioned model entry** in the registry with complete lineage.

### Upload Metrics and Register Model

This cell performs the actual model registration:

**Step 1: Upload Metrics to S3**
- Package evaluation metrics in the required JSON format
- Upload to S3 so they can be referenced by the model package

**Step 2: Create Model Package**
- Links the model artifacts (model.tar.gz) from the previous lab
- Associates the container image for inference
- Attaches evaluation metrics
- Sets approval status to `PendingManualApproval`

**Approval Status Options:**
- `PendingManualApproval`: Requires explicit approval (recommended for production)
- `Approved`: Registers the version as already approved (use for development/testing)
- `Rejected`: Explicitly rejected
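
A version registered as `PendingManualApproval` keeps that status until someone changes it. A minimal sketch of that later approval step follows, assuming the boto3 SageMaker client's `update_model_package` call with its `ModelApprovalStatus` and `ApprovalDescription` parameters. `set_approval_status` is a hypothetical helper name; the client is passed in so the function can be exercised without AWS access.

```python
ALLOWED_STATUSES = {"PendingManualApproval", "Approved", "Rejected"}


def set_approval_status(sm_client, model_package_arn, status, description=""):
    """Change the approval status of one registered model package version."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUSES)}, got {status!r}")
    request = {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}
    if description:
        # ApprovalDescription records why the reviewer made this decision
        request["ApprovalDescription"] = description
    return sm_client.update_model_package(**request)


# Example (run only after the model package below has been registered):
# set_approval_status(sm_client, model_package_arn, "Approved",
#                     "Evaluation metrics reviewed and accepted")
```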


In [None]:
import boto3
import json

# Upload metrics first.
# The summarization scores are stored under the "multiclass_classification_metrics"
# key of the metrics report format used in this lab.
metrics_report = {
    "multiclass_classification_metrics": {
        "bleu": {"value": result_after.metrics['bleu'], "standard_deviation": 0.0},
        "rouge1": {"value": result_after.metrics['rouge1'], "standard_deviation": 0.0},
        "rouge2": {"value": result_after.metrics['rouge2'], "standard_deviation": 0.0},
        "rougeL": {"value": result_after.metrics['rougeL'], "standard_deviation": 0.0}
    }
}

with open('evaluation.json', 'w') as f:
    json.dump(metrics_report, f)

# Store metrics under the group name so later runs do not overwrite them
metrics_key = f"model-metrics/{model_package_group_name}/evaluation.json"
s3 = boto3.client('s3')
s3.upload_file('evaluation.json', bucket, metrics_key)

# Register the model version with its metrics (the model card is linked in a later step)
model_package = sm_client.create_model_package(
    ModelPackageGroupName=model_package_group_name,
    ModelPackageDescription="Fine-tuned Llama 3.2 for summarization",
    ModelApprovalStatus="PendingManualApproval",
    ModelMetrics={
        "ModelQuality": {
            "Statistics": {
                "ContentType": "application/json",
                "S3Uri": f"s3://{bucket}/{metrics_key}"
            }
        }
    },
    InferenceSpecification={
        "Containers": [{
            "Image": container_image,
            "ModelDataUrl": model_path
        }],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"]
    }
)
print(f"Registered model package: {model_package['ModelPackageArn']}")


**To view it:**
1. Switch back to SageMaker AI Studio.
2. In the Model Registry, click the name of the model package group you created earlier.
3. You will see the latest version: Version 1.
4. Click it to view its details, as in the following screenshot:

![Model package version details in SageMaker AI Studio](../../images/model-package.png)

### Create Model Card

A **Model Card** is a structured document that provides transparency about a machine learning model. It's essential for:
- **Governance**: Documents model purpose, risks, and limitations
- **Compliance**: Supports documentation obligations under regulations (e.g., EU AI Act, GDPR)
- **Transparency**: Helps stakeholders understand model behavior
- **Risk Management**: Identifies potential issues and mitigation strategies

### Model Card Sections

1. **Model Overview**: Creator, artifacts, version information
2. **Intended Uses**: Purpose, use cases, risk rating
3. **Business Details**: Problem statement, stakeholders, business unit
4. **Training Details**: Methodology, datasets, performance metrics
5. **Additional Information**: Ethical considerations, caveats, custom metadata

This model card will be linked to the Model Registry entry, creating a **persistent governance record**.

### Create Model Card Content

Build the model card content as a Python dictionary whose top-level keys correspond to the five sections listed above, from the model overview through additional information. This information becomes part of the model's governance record.

In [None]:
import mlflow
import json

model_card_content = {
    "model_overview": {
        "model_creator": "Data Science Team",
        "model_artifact": [model_path]  # S3 URI retrieved from MLflow above
    },
    "intended_uses": {
        "purpose_of_model": "Text summarization",
        "intended_uses": "Answering summarization requests",
        "factors_affecting_model_efficiency": "Question complexity, technical domain coverage, input length",
        "risk_rating": "Low",
        "explanations_for_risk_rating": "Model provides informational summaries without making critical decisions"
    },
    "business_details": {
        "business_problem": "Improve efficiency of technical support and documentation access",
        "business_stakeholders": "Technical support team, Documentation team, End users",
        "line_of_business": "Customer Support & Knowledge Management"
    },
    "training_details": {
        "objective_function": {
            "function": "Instruction fine-tuning",
            "notes": "Fine-tuned Llama 3.2 3B on summarization tasks using the Dolly dataset"
        },
        "training_observations": "Model trained to generate concise, accurate summaries of technical content"
    },
    "additional_information": {
        "ethical_considerations": "Summaries can omit or misstate details from the source text; review outputs before relying on them",
        "caveats_and_recommendations": "Regular monitoring for model drift, periodic retraining with updated data",
        "custom_details": {
            "UseCaseId": "002",
            "UseCaseName": "Summarization",
            "UseCaseStage": "Development"
        }
    }
}

# Save the model card
with open('model_card.json', 'w') as f:
    json.dump(model_card_content, f, indent=2)
print(model_card_content)
print("Model card has been created and saved.")
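
Model card content is validated against a JSON schema when the card is created, so a quick local check can catch typos in section names before the API call. This is a hypothetical sketch: `check_model_card` only verifies that the five top-level sections used in `model_card_content` are present and that `risk_rating` is one of `High`, `Medium`, `Low`, or `Unknown`, which we assume are the values the schema accepts. It is not a full schema validation.

```python
# Section names used in model_card_content above (an assumption based on that
# dictionary, not the complete SageMaker model card schema)
EXPECTED_SECTIONS = {
    "model_overview", "intended_uses", "business_details",
    "training_details", "additional_information",
}
# Risk rating values assumed to be accepted by SageMaker model cards
RISK_RATINGS = {"High", "Medium", "Low", "Unknown"}


def check_model_card(card):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    missing = EXPECTED_SECTIONS - set(card)
    unexpected = set(card) - EXPECTED_SECTIONS
    if missing:
        problems.append(f"missing sections: {sorted(missing)}")
    if unexpected:
        problems.append(f"unexpected sections: {sorted(unexpected)}")
    rating = card.get("intended_uses", {}).get("risk_rating")
    if rating not in RISK_RATINGS:
        problems.append(f"risk_rating {rating!r} not in {sorted(RISK_RATINGS)}")
    return problems


# Usage in the notebook:
# print(check_model_card(model_card_content) or "No problems found")
```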


In [None]:
model_package_arn = model_package['ModelPackageArn']
model_package_arn

In [None]:
sm_client.update_model_package(
    ModelPackageArn=model_package_arn,
    CustomerMetadataProperties={
        "creator": "Data Science Team",
        "use_case": "Text Summarization",
        "business_problem": "Improve efficiency of technical support",
        "risk_rating": "Low",
        "model_type": "Fine-tuned Llama 3.2 3B"
    }
)


### Attach Model Card to the Model Package

Create the model card in SageMaker (as a draft), then link it to the registered model package version by storing the card's ARN in the package's custom metadata.

In [None]:
model_package_arn = model_package['ModelPackageArn']

response = sm_client.create_model_card(
    ModelCardName=f"model-card-{model_package_group_name}".replace("_", "-"),
    Content=json.dumps(model_card_content),
    ModelCardStatus="Draft"
)

In [None]:
model_card_arn = response['ModelCardArn']

In [None]:
# Link it to the model package via metadata (must be string)
sm_client.update_model_package(
    ModelPackageArn=model_package_arn,
    CustomerMetadataProperties={
        "model_card": model_card_arn
    }
)

## Step 3: Model Lifecycle Management

### Understanding Model Lifecycle Stages

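SageMaker Model Registry supports **lifecycle stages** to track model progression through your ML workflow. Stage and status names are strings that your organization defines; the values below are common conventions.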

**Typical Stages:**
- **Development**: Model is being developed and tested
- **Staging**: Model is ready for pre-production testing
- **Production**: Model is approved for production deployment
- **Archived**: Model is retired from active use

**Stage Status:**
- **PendingApproval**: Awaiting review
- **Approved**: Cleared for use in this stage
- **Rejected**: Not approved for this stage

This creates a **formal approval workflow** ensuring only validated models reach production.

### Setting Lifecycle Stage

We'll mark this model as `Development/Approved` since it has passed evaluation but isn't ready for production yet.

### Update Model Lifecycle

This function updates the lifecycle stage of the registered model. You can modify the `Stage` and `StageStatus` values based on your organization's workflow.

**Customization Options:**
- Change `Stage` to a value such as Development, Staging, Production, or Archived
- Change `StageStatus` to a value such as PendingApproval, Approved, or Rejected
- Add `StageDescription` to document why the model is in this stage
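
The lifecycle update sent by the next cell is just a dictionary, so promoting a version through stages can be expressed with two small helpers. This sketch assumes a Development → Staging → Production order for illustration (your organization may use different names); `next_stage` and `build_lifecycle_update` are hypothetical helpers, and the returned dictionary uses the same `ModelPackageArn` and `ModelLifeCycle` keys as the next cell.

```python
# Stage order assumed for illustration; "Archived" is treated as a terminal
# stage that you set directly rather than promote into
STAGE_ORDER = ["Development", "Staging", "Production"]


def next_stage(current):
    """Return the stage that follows `current` in STAGE_ORDER."""
    position = STAGE_ORDER.index(current)  # raises ValueError for unknown stages
    if position + 1 == len(STAGE_ORDER):
        raise ValueError(f"{current!r} is already the final stage")
    return STAGE_ORDER[position + 1]


def build_lifecycle_update(model_package_arn, stage, status, description=None):
    """Build keyword arguments for update_model_package(...) with a ModelLifeCycle block."""
    lifecycle = {"Stage": stage, "StageStatus": status}
    if description:
        lifecycle["StageDescription"] = description
    return {"ModelPackageArn": model_package_arn, "ModelLifeCycle": lifecycle}


# Example: request promotion from Development to Staging, pending review
# request = build_lifecycle_update(model_package_arn, next_stage("Development"),
#                                  "PendingApproval", "Ready for pre-production tests")
# sm_client.update_model_package(**request)
```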

In [None]:
# Update Model Lifecycle
import boto3
import json

print("Boto3 version:", boto3.__version__)

def update_model_lifecycle(model_package_info, model_package_update_input_dict):
    """Apply a ModelLifeCycle update to the model package identified in model_package_info."""
    sagemaker_client = boto3.client('sagemaker', region_name=region)
    try:
        # Extract the ARN from the model_package_info
        model_package_arn = model_package_info.get('ModelPackageArn')

        if not model_package_arn:
            raise ValueError("ModelPackageArn not found in the provided information")

        # Ensure ModelPackageArn is in the input dictionary
        model_package_update_input_dict['ModelPackageArn'] = model_package_arn
        response = sagemaker_client.update_model_package(**model_package_update_input_dict)

        print(f"Model lifecycle updated successfully for {model_package_arn}")
        return response
    except Exception as e:
        print(f"Error updating model lifecycle: {str(e)}")
        return None

# Identify the model package version to update
model_package_info = {
    'ModelPackageGroupName': model_package_group_name,
    'ModelPackageArn': model_package_arn,
}

# Update the stage values as needed for your projects.
# See https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-staging-construct-set-up.html

model_package_update_input_dict = {
    'ModelLifeCycle': {
        'Stage': 'Development',
        'StageDescription': 'Model trained and evaluated in development environment',
        'StageStatus': 'Approved'  # PendingApproval/Approved/Rejected
    },
}

result = update_model_lifecycle(model_package_info, model_package_update_input_dict)

if result:
    print("Update successful")
    print(json.dumps(result, indent=2, default=str))
else:
    print("Update failed")

## Step 4: Delete SageMaker AI Real-Time Endpoints

Now that you have completed all your evaluations, delete the SageMaker real-time endpoints used in this lab so they stop incurring charges.

In [None]:
import boto3

def delete_endpoints(endpoint_names):
    """Delete the given SageMaker endpoints, reporting any that fail."""

    # Create a SageMaker client
    sagemaker_client = boto3.client('sagemaker', region_name=region)

    for endpoint_name in endpoint_names:
        try:
            print(f"Deleting endpoint: {endpoint_name}")
            sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
            print(f"Successfully deleted endpoint: {endpoint_name}")
        except Exception as e:
            print(f"Error deleting endpoint {endpoint_name}: {str(e)}")


In [None]:
# Delete only the two endpoints used in this lab, not every endpoint in the account
delete_endpoints([base_model_endpoint_name, fine_tuned_model_endpoint_name])

## 🎉 Congratulations! Lab 3 Complete

### What You Accomplished

In this lab, you successfully:

1. ✅ **Evaluated model performance** using BLEU and ROUGE metrics
2. ✅ **Compared base vs. fine-tuned models** with quantitative evidence
3. ✅ **Created comprehensive model cards** with governance metadata
4. ✅ **Registered models** in SageMaker Model Registry with versioning
5. ✅ **Established approval workflows** with lifecycle management
6. ✅ **Linked evaluation metrics** to model registry entries

### Complete Governance Workflow

**The previous lab + this lab = End-to-End ML Governance**

Data Preparation → Fine-Tuning → MLflow Tracking → Model Evaluation → Model Registry → Approval Workflow

### Key Governance Capabilities

- **Lineage Tracking**: Trace models back to training data
- **Auditability**: Complete audit trail for compliance
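- **Reproducibility**: Recover the data, parameters, and artifacts behind any model version
- **Compliance**: Model cards document purpose, risks, and business context for audits and regulatory reviews
- **Version Control**: Compare model versions and roll back when needed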
- **Approval Workflows**: Prevent unauthorized deployments

### Viewing Your Work

- **SageMaker Console**: Navigate to Model Registry to view versions and model cards
- **MLflow UI**: Compare evaluation runs side-by-side
- **Programmatic Access**: Use boto3 to list model packages
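
For programmatic access, a sketch like the following lists every version in a model package group with its approval status. It assumes the boto3 SageMaker client's `list_model_packages` call and follows `NextToken` until all pages are read; `list_model_versions` is a hypothetical helper name, and the client is passed in as an argument.

```python
def list_model_versions(sm_client, group_name):
    """Return (version, approval status, ARN) tuples for every package in a group."""
    versions = []
    kwargs = {
        "ModelPackageGroupName": group_name,
        "SortBy": "CreationTime",
        "SortOrder": "Ascending",
    }
    while True:
        page = sm_client.list_model_packages(**kwargs)
        for summary in page.get("ModelPackageSummaryList", []):
            versions.append((
                summary.get("ModelPackageVersion"),
                summary.get("ModelApprovalStatus"),
                summary["ModelPackageArn"],
            ))
        token = page.get("NextToken")
        if not token:
            return versions
        kwargs["NextToken"] = token  # request the next page


# Usage in the notebook:
# for version, status, arn in list_model_versions(sm_client, model_package_group_name):
#     print(version, status, arn)
```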



### Next Steps

- Approve models for staging/production environments
- Implement CI/CD pipelines for automated deployment
- Set up model monitoring for drift detection
- Create governance dashboards

**Thank you for completing Lab 3!**