# ML Engineer Core Workflow

This notebook walks ML Engineers through configuring, executing, and monitoring YOLOv11 training on Amazon SageMaker. It covers only the core workflow to keep things simple.

## Workflow Overview

1. **Pipeline Configuration**: Set up YOLOv11 training pipeline parameters
2. **Pipeline Execution**: Execute the training pipeline
3. **Pipeline Monitoring**: Monitor training progress and results

## Prerequisites

- AWS account with appropriate permissions
- AWS CLI configured with "ab" profile
- SageMaker Studio access with ML Engineer role
- Access to the drone imagery dataset in S3 bucket: `lucaskle-ab3-project-pv`
- Labeled data in YOLOv11 format

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
import os
import boto3
import sagemaker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import json
import time
from IPython.display import display, HTML

# Set up AWS session with "ab" profile
session = boto3.Session(profile_name='ab')
sagemaker_session = sagemaker.Session(boto_session=session)
sagemaker_client = session.client('sagemaker')
region = session.region_name
account_id = session.client('sts').get_caller_identity()['Account']

# Set up visualization
plt.rcParams["figure.figsize"] = (12, 6)

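# Define the data bucket and the IAM role used for training jobs
BUCKET_NAME = 'lucaskle-ab3-project-pv'
# Resolves the IAM role of the current caller; SageMaker must be able to assume it
ROLE_ARN = sagemaker_session.get_caller_identity_arn()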

print(f"Data Bucket: {BUCKET_NAME}")
print(f"Region: {region}")
print(f"Account ID: {account_id}")
print(f"Role ARN: {ROLE_ARN}")

## 0. Core SageMaker Pipeline (NEW)

The core setup now includes a simplified SageMaker Pipeline for YOLOv11 training. This section shows how to use the pipeline for streamlined model training.

### 0.1 List Available Pipelines

First, let's check what pipelines are available in our account.

In [None]:
# Function to list SageMaker pipelines
def list_core_pipelines():
    """List available core SageMaker pipelines"""
    try:
        response = sagemaker_client.list_pipelines(
            SortBy='CreationTime',
            SortOrder='Descending',
            MaxResults=50
        )
        
        pipelines = response.get('PipelineSummaries', [])
        
        # Filter for core setup pipelines
        core_pipelines = [
            p for p in pipelines 
            if 'sagemaker-core-setup' in p['PipelineName'] or 'yolov11' in p['PipelineName'].lower()
        ]
        
        if core_pipelines:
            print(f"Found {len(core_pipelines)} core pipeline(s):")
            print("=" * 60)
            
            for i, pipeline in enumerate(core_pipelines, 1):
                print(f"{i}. {pipeline['PipelineName']}")
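                # list_pipelines summaries may not include a status field,
                # so use .get() to avoid a KeyError that would hide every pipeline
                print(f"   Status: {pipeline.get('PipelineStatus', 'n/a')}")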
                print(f"   Created: {pipeline['CreationTime'].strftime('%Y-%m-%d %H:%M:%S')}")
                if 'PipelineDescription' in pipeline:
                    print(f"   Description: {pipeline['PipelineDescription']}")
                print()
        else:
            print("No core pipelines found.")
            print("\nTo create a core pipeline, run:")
            print("!cd ../scripts/setup && ./setup_core_pipeline.sh --profile ab")
        
        return core_pipelines
        
    except Exception as e:
        print(f"Error listing pipelines: {str(e)}")
        return []

# List available pipelines
available_pipelines = list_core_pipelines()
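
`list_pipelines` returns at most `MaxResults` summaries per call, so an account with more than 50 pipelines would need to follow `NextToken` to see them all. The sketch below shows one way to do that. `list_all_pipelines` and `filter_core_pipelines` are hypothetical helper names, and `client` is assumed to be any object exposing `list_pipelines`, such as the `sagemaker_client` created above.

```python
def list_all_pipelines(client, page_size=50):
    """Collect every pipeline summary by following NextToken across pages."""
    summaries = []
    kwargs = {
        "SortBy": "CreationTime",
        "SortOrder": "Descending",
        "MaxResults": page_size,
    }
    while True:
        response = client.list_pipelines(**kwargs)
        summaries.extend(response.get("PipelineSummaries", []))
        token = response.get("NextToken")
        if not token:
            # No further pages to fetch
            return summaries
        kwargs["NextToken"] = token


def filter_core_pipelines(summaries):
    """Keep pipelines matching the name filter used in this notebook."""
    return [
        s for s in summaries
        if "sagemaker-core-setup" in s["PipelineName"]
        or "yolov11" in s["PipelineName"].lower()
    ]
```

In the notebook this could be called as `filter_core_pipelines(list_all_pipelines(sagemaker_client))`.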

### 0.2 Execute Core Pipeline

Now let's execute a pipeline with our dataset. Make sure you have a dataset prepared using the Data Scientist notebook.

In [None]:
# Function to execute a SageMaker pipeline
def execute_core_pipeline(pipeline_name, parameters=None):
    """Execute a SageMaker pipeline with optional parameters"""
    
    # Prepare execution parameters
    execution_params = []
    if parameters:
        for key, value in parameters.items():
            execution_params.append({
                'Name': key,
                'Value': str(value)
            })
    
    # Generate execution name
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    execution_name = f"{pipeline_name}-execution-{timestamp}"
    
    try:
        # Start pipeline execution
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=pipeline_name,
            PipelineExecutionDisplayName=execution_name,
            PipelineParameters=execution_params
        )
        
        execution_arn = response['PipelineExecutionArn']
        
        print(f"✅ Pipeline execution started!")
        print(f"Pipeline: {pipeline_name}")
        print(f"Execution ARN: {execution_arn}")
        
        if parameters:
            print(f"\nParameters:")
            for key, value in parameters.items():
                print(f"  {key}: {value}")
        
        return execution_arn
        
    except Exception as e:
        print(f"❌ Failed to execute pipeline: {str(e)}")
        return None

# Configure pipeline execution parameters
pipeline_parameters = {
    # Update this with your dataset path from the Data Scientist notebook
    'InputData': f"s3://{BUCKET_NAME}/datasets/",  # Update with specific dataset path
    'Epochs': 10,
    'BatchSize': 16,
    'LearningRate': 0.001,
    'TrainingInstanceType': 'ml.g4dn.xlarge',
    'ModelVariant': 'yolov11n'  # Options: yolov11n, yolov11s, yolov11m, yolov11l, yolov11x
}

print("Pipeline Parameters:")
for key, value in pipeline_parameters.items():
    print(f"  {key}: {value}")

# Execute pipeline if available
if available_pipelines:
    # Use the first available pipeline
    selected_pipeline = available_pipelines[0]['PipelineName']
    print(f"\nExecuting pipeline: {selected_pipeline}")
    
    # Uncomment the line below to execute the pipeline
    # execution_arn = execute_core_pipeline(selected_pipeline, pipeline_parameters)
    print("\n⚠️  Pipeline execution is commented out for safety.")
    print("Uncomment the execution line above to run the pipeline.")
else:
    print("\nNo pipelines available for execution.")
    print("Please create a pipeline first using the setup script.")

### 0.3 Monitor Pipeline Execution

Let's create functions to monitor pipeline execution progress.

In [None]:
# Function to monitor pipeline execution
def monitor_pipeline_execution(execution_arn):
    """Monitor pipeline execution status"""
    try:
        response = sagemaker_client.describe_pipeline_execution(
            PipelineExecutionArn=execution_arn
        )
        
        status = response['PipelineExecutionStatus']
        creation_time = response['CreationTime']
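        # The response identifies the pipeline by ARN, so take the name from its last segment
        pipeline_name = response['PipelineArn'].split('/')[-1]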
        
        print(f"Pipeline: {pipeline_name}")
        print(f"Status: {status}")
        print(f"Started: {creation_time.strftime('%Y-%m-%d %H:%M:%S')}")
        
        if 'LastModifiedTime' in response:
            last_modified = response['LastModifiedTime']
            duration = last_modified - creation_time
            print(f"Last Modified: {last_modified.strftime('%Y-%m-%d %H:%M:%S')}")
            print(f"Duration: {duration}")
        
        if 'FailureReason' in response:
            print(f"❌ Failure Reason: {response['FailureReason']}")
        
        return response
        
    except Exception as e:
        print(f"Error monitoring execution: {str(e)}")
        return None

# Function to list recent executions
def list_pipeline_executions(pipeline_name, max_results=10):
    """List recent pipeline executions"""
    try:
        response = sagemaker_client.list_pipeline_executions(
            PipelineName=pipeline_name,
            SortBy='CreationTime',
            SortOrder='Descending',
            MaxResults=max_results
        )
        
        executions = response.get('PipelineExecutionSummaries', [])
        
        if executions:
            print(f"Recent executions for {pipeline_name}:")
            print("=" * 60)
            
            for i, execution in enumerate(executions, 1):
                # The display name is optional, so fall back to a placeholder
                print(f"{i}. {execution.get('PipelineExecutionDisplayName', '(no display name)')}")
                print(f"   Status: {execution['PipelineExecutionStatus']}")
                print(f"   Started: {execution['StartTime'].strftime('%Y-%m-%d %H:%M:%S')}")
                if 'EndTime' in execution:
                    print(f"   Ended: {execution['EndTime'].strftime('%Y-%m-%d %H:%M:%S')}")
                print(f"   ARN: {execution['PipelineExecutionArn']}")
                print()
        else:
            print(f"No executions found for pipeline: {pipeline_name}")
        
        return executions
        
    except Exception as e:
        print(f"Error listing executions: {str(e)}")
        return []

# Example usage (uncomment to use with actual execution ARN)
# if 'execution_arn' in locals():
#     monitor_pipeline_execution(execution_arn)

# List recent executions for available pipelines
if available_pipelines:
    pipeline_name = available_pipelines[0]['PipelineName']
    recent_executions = list_pipeline_executions(pipeline_name)
else:
    print("No pipelines available to check executions.")
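
The monitoring functions above report a single snapshot of the execution. To wait for a result, the status can be polled until it reaches a terminal value. The following is a minimal sketch, not part of the project scripts: `wait_for_pipeline_execution` is a hypothetical helper, the polling interval and timeout are arbitrary defaults, and the `sleep` argument exists only so the loop can be exercised without real delays.

```python
import time

# Statuses after which a pipeline execution no longer changes
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_pipeline_execution(client, execution_arn, poll_seconds=30,
                                timeout_seconds=3600, sleep=time.sleep):
    """Poll describe_pipeline_execution until a terminal status or timeout."""
    waited = 0
    while True:
        response = client.describe_pipeline_execution(
            PipelineExecutionArn=execution_arn
        )
        status = response["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Execution still {status} after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

A call such as `wait_for_pipeline_execution(sagemaker_client, execution_arn)` blocks the cell until the execution finishes, so a generous timeout is advisable for long training runs.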

### 0.4 Pipeline Management Commands

Here are some useful commands for managing pipelines from the command line:

In [None]:
# Display useful pipeline management commands
print("🔧 Pipeline Management Commands")
print("=" * 40)

print("\n1. Create a new pipeline:")
print("   !cd ../scripts/setup && ./setup_core_pipeline.sh --profile ab")

print("\n2. List available pipelines:")
print("   !cd ../scripts/setup && ./execute_core_pipeline.py --list-pipelines --profile ab")

print("\n3. Execute a pipeline:")
print("   !cd ../scripts/setup && ./execute_core_pipeline.py \\")
print("       --pipeline-name PIPELINE_NAME \\")
print("       --input-data s3://lucaskle-ab3-project-pv/datasets/your_dataset/ \\")
print("       --epochs 20 \\")
print("       --batch-size 32 \\")
print("       --profile ab")

print("\n4. Monitor execution:")
print("   !cd ../scripts/setup && ./execute_core_pipeline.py --monitor EXECUTION_ARN --profile ab")

print("\n5. List recent executions:")
print("   !cd ../scripts/setup && ./execute_core_pipeline.py --list-executions PIPELINE_NAME --profile ab")

print("\n📚 Documentation:")
print("   - Pipeline setup guide: ../scripts/setup/CORE_PIPELINE_README.md")
print(f"   - SageMaker Console: https://{region}.console.aws.amazon.com/sagemaker/home?region={region}#/pipelines")

print("\n💡 Tips:")
print("   - Use Data Scientist notebook to prepare datasets first")
print("   - Start with small datasets and short training for testing")
print("   - Monitor costs when using GPU instances")
print("   - Check CloudWatch logs for detailed training information")

## 1. Pipeline Configuration

Let's configure our YOLOv11 training pipeline parameters.

In [None]:
# Function to list available datasets
def list_datasets(bucket, prefix="datasets/"):
    """List available datasets in S3"""
    s3_client = session.client('s3')
    response = s3_client.list_objects_v2(
        Bucket=bucket,
        Prefix=prefix,
        Delimiter='/'
    )
    
    datasets = []
    if 'CommonPrefixes' in response:
        for obj in response['CommonPrefixes']:
            dataset_prefix = obj['Prefix']
            dataset_name = dataset_prefix.split('/')[-2]
            datasets.append({
                'name': dataset_name,
                'prefix': dataset_prefix
            })
    
    return datasets

# List available datasets
datasets = list_datasets(BUCKET_NAME)

print(f"Found {len(datasets)} datasets:")
for i, dataset in enumerate(datasets):
    print(f"  {i+1}. {dataset['name']} - s3://{BUCKET_NAME}/{dataset['prefix']}")

# If no datasets found, provide instructions
if not datasets:
    print("\nNo datasets found. Please prepare a dataset using the Data Scientist notebook first.")
    print("The dataset should be organized in the following structure:")
    print("s3://lucaskle-ab3-project-pv/datasets/your_dataset_name/")
    print("├── train/")
    print("│   ├── images/")
    print("│   └── labels/")
    print("└── val/")
    print("    ├── images/")
    print("    └── labels/")
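
Before starting a training job it can help to confirm that a dataset actually follows the layout printed above. The sketch below checks a list of S3 object keys against the four expected sub-prefixes. `missing_dataset_dirs` is a hypothetical helper, and the keys are assumed to come from a listing of the dataset prefix (for example via the S3 `list_objects_v2` paginator).

```python
# Sub-prefixes expected under every dataset, per the layout shown above
REQUIRED_SUBDIRS = ("train/images/", "train/labels/", "val/images/", "val/labels/")


def missing_dataset_dirs(keys, dataset_prefix):
    """Return the required sub-prefixes that contain no objects."""
    prefix = dataset_prefix if dataset_prefix.endswith("/") else dataset_prefix + "/"
    missing = []
    for sub in REQUIRED_SUBDIRS:
        target = prefix + sub
        # A key equal to the prefix itself is only a folder placeholder,
        # so it does not count as dataset content
        has_content = any(k.startswith(target) and k != target for k in keys)
        if not has_content:
            missing.append(sub)
    return missing
```

An empty result means every expected folder holds at least one object; it does not verify that images and labels are paired.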

In [None]:
# Define training parameters
# You can modify these parameters based on your requirements
training_params = {
    # Dataset parameters
    'dataset_name': datasets[0]['name'] if datasets else 'your_dataset_name',
    'dataset_prefix': datasets[0]['prefix'] if datasets else 'datasets/your_dataset_name/',
    
    # Model parameters
    'model_variant': 'yolov11n',  # Options: yolov11n, yolov11s, yolov11m, yolov11l, yolov11x
    'image_size': 640,  # Input image size (px)
    
    # Training parameters
    'batch_size': 16,
    'epochs': 50,
    'learning_rate': 0.001,
    
    # Infrastructure parameters
    'instance_type': 'ml.g4dn.xlarge',
    'instance_count': 1,
    'use_spot': True,
    'max_wait': 36000,  # Max wait time for spot instances (seconds)
    'max_run': 3600,    # Max run time (seconds)
    
    # Output parameters
    'output_path': f"s3://{BUCKET_NAME}/model-artifacts/",
    'job_name': f"yolov11-training-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
}

# Display training parameters
print("YOLOv11 Training Parameters:")
for key, value in training_params.items():
    print(f"  {key}: {value}")

### 1.1 Customize Training Parameters

You can modify the training parameters above based on your requirements. Here's a guide to the parameters:

- **Dataset Parameters**:
  - `dataset_name`: Name of your dataset
  - `dataset_prefix`: S3 prefix where your dataset is stored

- **Model Parameters**:
  - `model_variant`: YOLOv11 model variant (yolov11n, yolov11s, yolov11m, yolov11l, yolov11x)
  - `image_size`: Input image size in pixels

- **Training Parameters**:
  - `batch_size`: Batch size for training
  - `epochs`: Number of training epochs
  - `learning_rate`: Learning rate for optimizer

- **Infrastructure Parameters**:
  - `instance_type`: SageMaker instance type for training
  - `instance_count`: Number of instances to use
  - `use_spot`: Whether to use spot instances (cheaper but can be interrupted)
  - `max_wait`: Maximum wait time for spot instances (seconds)
  - `max_run`: Maximum run time for training job (seconds)

- **Output Parameters**:
  - `output_path`: S3 path for model artifacts
  - `job_name`: Name for the training job
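
Some of these parameters constrain each other. In particular, for managed spot training SageMaker requires the maximum wait time to be at least as long as the maximum run time. A small check run before submitting a job can catch such mistakes early. The sketch below assumes a dictionary shaped like `training_params`; `validate_training_params` is a hypothetical helper, and the set of rules is illustrative rather than exhaustive.

```python
def validate_training_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Counts and sizes must be positive integers
    for key in ("epochs", "batch_size", "image_size", "instance_count", "max_run"):
        value = params[key]
        if not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer, got {value!r}")

    if params["learning_rate"] <= 0:
        problems.append(
            f"learning_rate must be positive, got {params['learning_rate']!r}"
        )

    # Spot training: the wait budget includes the run time itself
    if params["use_spot"] and params["max_wait"] < params["max_run"]:
        problems.append("max_wait must be >= max_run when use_spot is True")

    return problems
```

Calling `validate_training_params(training_params)` before the execution cell gives a chance to fix the configuration without waiting for the API to reject the job.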

## 2. Pipeline Execution

Now let's execute the YOLOv11 training pipeline.

In [None]:
# Function to create and execute training job
def execute_training_job(params):
    """Create and execute SageMaker training job for YOLOv11"""
    # Define hyperparameters
    hyperparameters = {
        "model_variant": params['model_variant'],
        "image_size": str(params['image_size']),
        "batch_size": str(params['batch_size']),
        "epochs": str(params['epochs']),
        "learning_rate": str(params['learning_rate'])
    }
    
    # Define input data channels
    input_data = {
        'training': f"s3://{BUCKET_NAME}/{params['dataset_prefix']}"
    }
    
    # Create SageMaker estimator
    estimator = sagemaker.estimator.Estimator(
        image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/yolov11-training:latest",
        role=ROLE_ARN,
        instance_count=params['instance_count'],
        instance_type=params['instance_type'],
        hyperparameters=hyperparameters,
        output_path=params['output_path'],
        sagemaker_session=sagemaker_session,
        use_spot_instances=params['use_spot'],
        max_wait=params['max_wait'] if params['use_spot'] else None,
        max_run=params['max_run']
    )
    
    # Start training job
    print(f"Starting training job: {params['job_name']}")
    estimator.fit(input_data, job_name=params['job_name'], wait=False)
    
    return params['job_name']

# Execute training job
try:
    job_name = execute_training_job(training_params)
    print(f"\nTraining job started: {job_name}")
    print(f"You can monitor the job in the SageMaker console or using the cell below.")
except Exception as e:
    print(f"Error starting training job: {str(e)}")
    print("\nPossible causes:")
    print("1. The dataset doesn't exist or has incorrect structure")
    print("2. The YOLOv11 training container doesn't exist in ECR")
    print("3. Insufficient permissions to start training job")
    print("\nPlease check the error message and try again.")
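
With a custom training image, SageMaker only publishes training metrics to CloudWatch if the job defines regular expressions that extract them from the container logs. The `Estimator` constructor accepts these through its `metric_definitions` argument. The sketch below shows what such definitions could look like for the metric names used later in this notebook. The log line format (`train_loss=...`, `val_loss=...`, `mAP50=...`, `mAP50-95=...`) is purely an assumption; the regexes must be adapted to whatever the training container actually prints.

```python
import re

# Assumed log format, for illustration only:
#   "epoch 3 train_loss=0.5120 val_loss=0.6010 mAP50=0.7200 mAP50-95=0.4100"
METRIC_DEFINITIONS = [
    {"Name": "train:loss", "Regex": r"train_loss=([0-9.]+)"},
    {"Name": "val:loss", "Regex": r"val_loss=([0-9.]+)"},
    {"Name": "val:mAP50", "Regex": r"mAP50=([0-9.]+)"},
    {"Name": "val:mAP50-95", "Regex": r"mAP50-95=([0-9.]+)"},
]


def extract_metrics(log_line, definitions=METRIC_DEFINITIONS):
    """Apply each regex to a log line, as a local check of the definitions."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found
```

The list could then be passed as `metric_definitions=METRIC_DEFINITIONS` when constructing the estimator, and `extract_metrics` offers a quick way to test the patterns against a sample log line first.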

## 3. Pipeline Monitoring

Let's monitor the progress of our training job.

In [None]:
# Function to monitor training job
def monitor_training_job(job_name):
    """Monitor SageMaker training job status"""
    # Get job description
    response = sagemaker_client.describe_training_job(
        TrainingJobName=job_name
    )
    
    # Extract job status
    status = response['TrainingJobStatus']
    creation_time = response['CreationTime']
    last_modified_time = response.get('LastModifiedTime', creation_time)
    
    # Calculate duration
    duration = last_modified_time - creation_time
    duration_minutes = duration.total_seconds() / 60
    
    # Display job information
    print(f"Job Name: {job_name}")
    print(f"Status: {status}")
    print(f"Creation Time: {creation_time.strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Last Modified: {last_modified_time.strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Duration: {duration_minutes:.2f} minutes")
    
    # Display additional information based on status
    if status == 'InProgress':
        print("\nJob is still running. Check back later for results.")
    elif status == 'Completed':
        print("\nJob completed successfully!")
        print(f"Model artifacts: {response['ModelArtifacts']['S3ModelArtifacts']}")
    elif status == 'Failed':
        print("\nJob failed!")
        print(f"Failure reason: {response.get('FailureReason', 'Unknown')}")
    elif status == 'Stopped':
        print("\nJob was stopped.")
    
    return response

# Monitor the training job
try:
    if 'job_name' in locals():
        job_response = monitor_training_job(job_name)
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error monitoring training job: {str(e)}")
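
When `use_spot` is enabled, the job description returned by `describe_training_job` also reports `TrainingTimeInSeconds` and `BillableTimeInSeconds` once they are available. Comparing the two gives the savings from managed spot training. A minimal sketch, where `spot_savings_percent` is a hypothetical helper operating on the response dictionary:

```python
def spot_savings_percent(job_description):
    """Return the spot savings as a percentage, or None if not yet reported."""
    training = job_description.get("TrainingTimeInSeconds")
    billable = job_description.get("BillableTimeInSeconds")
    if not training or billable is None:
        # Fields are absent while the job is still starting up
        return None
    return (1 - billable / training) * 100
```

For example, `spot_savings_percent(job_response)` can be called after the monitoring cell has populated `job_response`.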

### 3.1 Refresh Job Status

You can run the cell below to refresh the job status.

In [None]:
# Refresh job status
try:
    if 'job_name' in locals():
        job_response = monitor_training_job(job_name)
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error monitoring training job: {str(e)}")

### 3.2 View Training Metrics

Once the training job is complete, you can view the training metrics.

In [None]:
# Function to get training metrics
def get_training_metrics(job_name):
    """Get training metrics from CloudWatch"""
    # Get job description
    response = sagemaker_client.describe_training_job(
        TrainingJobName=job_name
    )
    
    # Check if job is complete
    if response['TrainingJobStatus'] != 'Completed':
        print(f"Job is not yet complete. Current status: {response['TrainingJobStatus']}")
        return None
    
    # Get CloudWatch metrics
    cloudwatch = session.client('cloudwatch')
    
    # Define metrics to retrieve
    metrics = [
        'train:loss',
        'val:loss',
        'val:mAP50',
        'val:mAP50-95'
    ]
    
    # Get metrics data
    metrics_data = {}
    for metric_name in metrics:
        try:
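            # Store the result under a separate name so the job description
            # in `response` stays available for the time range on every iteration
            stats = cloudwatch.get_metric_statistics(
                # Training job metrics are published under this namespace
                Namespace='/aws/sagemaker/TrainingJobs',
                MetricName=metric_name,
                Dimensions=[
                    {
                        'Name': 'TrainingJobName',
                        'Value': job_name
                    }
                ],
                StartTime=response['CreationTime'],
                EndTime=response['LastModifiedTime'],
                Period=60,  # 1-minute periods
                Statistics=['Average']
            )
            
            # Extract datapoints
            datapoints = stats.get('Datapoints', [])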
            if datapoints:
                # Sort by timestamp
                datapoints.sort(key=lambda x: x['Timestamp'])
                
                # Extract values
                timestamps = [dp['Timestamp'] for dp in datapoints]
                values = [dp['Average'] for dp in datapoints]
                
                metrics_data[metric_name] = {
                    'timestamps': timestamps,
                    'values': values
                }
        except Exception as e:
            print(f"Error retrieving metric {metric_name}: {str(e)}")
    
    return metrics_data

# Get training metrics
try:
    if 'job_name' in locals():
        metrics_data = get_training_metrics(job_name)
        
        if metrics_data:
            # Plot metrics
            fig, axes = plt.subplots(2, 1, figsize=(12, 10))
            
            # Plot loss
            if 'train:loss' in metrics_data:
                axes[0].plot(
                    metrics_data['train:loss']['timestamps'],
                    metrics_data['train:loss']['values'],
                    label='Train Loss'
                )
            
            if 'val:loss' in metrics_data:
                axes[0].plot(
                    metrics_data['val:loss']['timestamps'],
                    metrics_data['val:loss']['values'],
                    label='Validation Loss'
                )
            
            axes[0].set_title('Training and Validation Loss')
            axes[0].set_xlabel('Time')
            axes[0].set_ylabel('Loss')
            axes[0].legend()
            axes[0].grid(True, alpha=0.3)
            
            # Plot mAP
            if 'val:mAP50' in metrics_data:
                axes[1].plot(
                    metrics_data['val:mAP50']['timestamps'],
                    metrics_data['val:mAP50']['values'],
                    label='mAP@0.5'
                )
            
            if 'val:mAP50-95' in metrics_data:
                axes[1].plot(
                    metrics_data['val:mAP50-95']['timestamps'],
                    metrics_data['val:mAP50-95']['values'],
                    label='mAP@0.5:0.95'
                )
            
            axes[1].set_title('Validation mAP')
            axes[1].set_xlabel('Time')
            axes[1].set_ylabel('mAP')
            axes[1].legend()
            axes[1].grid(True, alpha=0.3)
            
            plt.tight_layout()
            plt.show()
            
            # Display final metrics
            print("Final Metrics:")
            for metric_name, data in metrics_data.items():
                if data['values']:
                    print(f"  {metric_name}: {data['values'][-1]:.4f}")
        else:
            print("No metrics available. The job may still be running, may have failed, or may not publish these metrics.")
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error retrieving training metrics: {str(e)}")

## 4. Model Artifacts

Once the training job is complete, you can access the model artifacts.

In [None]:
# Function to get model artifacts
def get_model_artifacts(job_name):
    """Get model artifacts from training job"""
    # Get job description
    response = sagemaker_client.describe_training_job(
        TrainingJobName=job_name
    )
    
    # Check if job is complete
    if response['TrainingJobStatus'] != 'Completed':
        print(f"Job is not yet complete. Current status: {response['TrainingJobStatus']}")
        return None
    
    # Get model artifacts
    model_artifacts = response['ModelArtifacts']['S3ModelArtifacts']
    
    print(f"Model artifacts: {model_artifacts}")
    print("\nThe model artifacts contain:")
    print("1. model.tar.gz - The compressed model files")
    print("2. Inside model.tar.gz:")
    print("   - best.pt - The best model weights")
    print("   - last.pt - The last model weights")
    print("   - results.csv - Training results")
    print("   - args.yaml - Training arguments")
    
    return model_artifacts

# Get model artifacts
try:
    if 'job_name' in locals():
        model_artifacts = get_model_artifacts(job_name)
    else:
        print("No active training job to monitor.")
        print("Please execute a training job first.")
except Exception as e:
    print(f"Error retrieving model artifacts: {str(e)}")
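
To inspect what a finished job actually produced, the artifact URI can be split into bucket and key, downloaded, and opened as a gzip-compressed tar archive. The sketch below covers the two local steps. `split_s3_uri` and `list_archive_members` are hypothetical helpers, and the archive bytes are assumed to have been fetched separately, for example with the S3 client's `get_object`.

```python
import io
import tarfile
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_archive_members(archive_bytes):
    """Return the sorted member names of a .tar.gz archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return sorted(archive.getnames())
```

Listing the members is a quick way to confirm that files such as `best.pt` are present before moving on to evaluation or deployment.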

## 5. Summary and Next Steps

In this notebook, we've executed and monitored YOLOv11 training pipelines using both the new core SageMaker Pipeline and traditional training jobs. Here's a summary of what we've accomplished:

0. **Core SageMaker Pipeline** (NEW):
   - Listed available simplified pipelines for YOLOv11 training
   - Configured pipeline parameters for training execution
   - Monitored pipeline execution progress and results
   - Learned command-line tools for pipeline management

1. **Pipeline Configuration**:
   - Listed available datasets
   - Configured training parameters

2. **Pipeline Execution**:
   - Created and executed SageMaker training jobs

3. **Pipeline Monitoring**:
   - Monitored training job status
   - Viewed training metrics

4. **Model Artifacts**:
   - Accessed model artifacts from training jobs

### Key Features of the Core Pipeline:

- **Simplified Setup**: Easy-to-use SageMaker Pipeline for YOLOv11 training
- **Configurable Parameters**: Customizable training parameters (epochs, batch size, learning rate, etc.)
- **Multiple Model Variants**: Support for all YOLOv11 variants (n, s, m, l, x)
- **GPU Training**: Optimized for GPU instances with cost-effective options
- **Automated Evaluation**: Built-in model evaluation step
- **Command-Line Tools**: Scripts for pipeline creation, execution, and monitoring

### Next Steps

1. **Create Core Pipeline**: Use the setup script to create a simplified pipeline
2. **Prepare Dataset**: Use Data Scientist notebook to create YOLOv11-formatted datasets
3. **Execute Pipeline**: Run training with your prepared dataset
4. **Monitor Progress**: Track training progress and results
5. **Model Evaluation**: Review evaluation metrics and model performance
6. **Model Deployment**: Deploy trained models to endpoints (advanced)
7. **Iterative Improvement**: Refine models based on evaluation results

### Pipeline Management Workflow:

1. **Setup**: `./setup_core_pipeline.sh --profile ab`
2. **Execute**: `./execute_core_pipeline.py --pipeline-name NAME --input-data PATH --profile ab`
3. **Monitor**: `./execute_core_pipeline.py --monitor ARN --profile ab`
4. **Review**: Check results in SageMaker Console and S3

### Cost Optimization Tips:

- Start with `yolov11n` (nano) for initial testing
- Use `ml.g4dn.xlarge` for cost-effective GPU training
- Set appropriate `max_run` times to prevent runaway costs
- Use small datasets for development and testing
- Monitor training progress and stop early if needed
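
Because `max_run` caps how long a job may run, it also gives an upper bound on the compute cost of a single job. The sketch below computes that bound. `max_training_cost` is a hypothetical helper, and the hourly rate is deliberately left as an input to be looked up on the SageMaker pricing page for the chosen instance type and region.

```python
def max_training_cost(hourly_rate_usd, max_run_seconds, instance_count=1):
    """Upper bound on compute cost for one job, ignoring storage and data transfer."""
    if hourly_rate_usd < 0 or max_run_seconds < 0 or instance_count < 1:
        raise ValueError("rate and run time must be non-negative, instance_count >= 1")
    return hourly_rate_usd * instance_count * max_run_seconds / 3600
```

With spot instances the actual charge is typically lower, since only billable seconds are charged.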

For more detailed functionality, refer to the comprehensive notebooks in the `notebooks/` directory and the pipeline documentation in `scripts/setup/CORE_PIPELINE_README.md`.