# Data Scientist Core Workflow with MLFlow

This notebook provides essential functionality for Data Scientists to explore and prepare drone imagery data for YOLOv11 model training, with MLFlow experiment tracking integration.

## Workflow Overview

1. **Data Exploration**: Analyze and visualize the drone imagery dataset
2. **Data Preparation**: Prepare data for YOLOv11 training
3. **Ground Truth Labeling**: Create labeling jobs for annotation
4. **Experiment Tracking**: Track data exploration experiments with MLFlow

## Prerequisites

- AWS account with appropriate permissions
- AWS CLI configured with an "ab" profile
- SageMaker Studio access with the Data Scientist role
- Access to the drone imagery dataset in the S3 bucket `lucaskle-ab3-project-pv`
- A SageMaker managed MLFlow tracking server

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
import os
import boto3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from IPython.display import display, HTML
import io
import json
from PIL import Image
import mlflow
import mlflow.sklearn
import sagemaker

# Set up AWS session with "ab" profile
session = boto3.Session(profile_name='ab')
s3_client = session.client('s3')
sagemaker_client = session.client('sagemaker')
sagemaker_session = sagemaker.Session(boto_session=session)
region = session.region_name
account_id = session.client('sts').get_caller_identity()['Account']

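# Set up MLFlow tracking
# SageMaker managed MLFlow uses the tracking server ARN as the tracking URI
# (this requires the sagemaker-mlflow plugin). Assumption: the first tracking
# server returned for this account and region is the one to use.
tracking_servers = sagemaker_client.list_mlflow_tracking_servers()['TrackingServerSummaries']
if not tracking_servers:
    raise RuntimeError("No SageMaker managed MLFlow tracking server found in this region")
mlflow_tracking_uri = tracking_servers[0]['TrackingServerArn']
mlflow.set_tracking_uri(mlflow_tracking_uri)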

# Set up visualization
plt.rcParams["figure.figsize"] = (12, 6)

# Define bucket name
BUCKET_NAME = 'lucaskle-ab3-project-pv'

print(f"Data Bucket: {BUCKET_NAME}")
print(f"Region: {region}")
print(f"Account ID: {account_id}")
print(f"MLFlow Tracking URI: {mlflow_tracking_uri}")

## 1. Data Exploration with MLFlow Tracking

Let's start by exploring the drone imagery dataset stored in S3 and track our exploration with MLFlow.
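
The listing helper in the next cell passes `MaxKeys=100`, so it returns at most the first 100 keys under the prefix. For a full count, the S3 paginator can be followed instead. The sketch below is a minimal illustration: `list_all_objects` is a hypothetical helper name, and it takes the boto3 S3 client as an argument so it does not depend on notebook state.

```python
def list_all_objects(s3_client, bucket, prefix=""):
    """Return every object under a prefix by following list_objects_v2 pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching keys has no "Contents" entry
        objects.extend(page.get("Contents", []))
    return objects
```

With the session created earlier, `list_all_objects(s3_client, BUCKET_NAME, prefix="raw-images/")` could replace the capped listing when an exact image count matters.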

In [None]:
# Start MLFlow experiment for data exploration
experiment_name = "drone-imagery-data-exploration"
mlflow.set_experiment(experiment_name)

# Start MLFlow run
with mlflow.start_run(run_name=f"data-exploration-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
    # Function to list objects in S3 bucket
    def list_s3_objects(bucket, prefix="", max_items=100):
        """List objects in an S3 bucket with the given prefix"""
        response = s3_client.list_objects_v2(
            Bucket=bucket,
            Prefix=prefix,
            MaxKeys=max_items
        )
        
        if 'Contents' in response:
            return response['Contents']
        else:
            return []

    # Function to filter image files
    def filter_image_files(objects):
        """Filter image files from S3 objects list"""
        image_extensions = [".jpg", ".jpeg", ".png", ".tiff", ".tif"]
        return [obj for obj in objects 
                if any(obj['Key'].lower().endswith(ext) for ext in image_extensions)]

    # List raw images in the bucket
    raw_objects = list_s3_objects(BUCKET_NAME, prefix="raw-images/")
    raw_images = filter_image_files(raw_objects)

    print(f"Found {len(raw_images)} raw images in the bucket")
    
    # Log dataset statistics to MLFlow (total_images is capped by max_items in list_s3_objects)
    mlflow.log_param("bucket_name", BUCKET_NAME)
    mlflow.log_param("data_prefix", "raw-images/")
    mlflow.log_metric("total_images", len(raw_images))
    
    # Display the first few image keys
    if raw_images:
        print("\nSample image keys:")
        for i, img in enumerate(raw_images[:5]):
            print(f"  {i+1}. {img['Key']}")

### 1.1 Display Sample Images

Let's display some sample images from the dataset to get a visual understanding.

In [None]:
# Function to download and display images
def display_sample_images(bucket, image_objects, num_samples=4):
    """Download and display sample images from S3"""
    # Limit to the requested number of samples
    samples = image_objects[:min(num_samples, len(image_objects))]
    
    # Create a figure with subplots
    fig, axes = plt.subplots(1, len(samples), figsize=(16, 4))
    
    # If only one sample, axes is not an array
    if len(samples) == 1:
        axes = [axes]
    
    # Download and display each image
    for i, img_obj in enumerate(samples):
        try:
            # Download image from S3
            response = s3_client.get_object(Bucket=bucket, Key=img_obj['Key'])
            img_data = response['Body'].read()
            
            # Open image with PIL
            img = Image.open(io.BytesIO(img_data))
            
            # Display image
            axes[i].imshow(img)
            axes[i].set_title(os.path.basename(img_obj['Key']))
            axes[i].axis('off')
            
        except Exception as e:
            print(f"Error displaying image {img_obj['Key']}: {str(e)}")
            axes[i].text(0.5, 0.5, f"Error loading image", ha='center')
            axes[i].axis('off')
    
    plt.tight_layout()

    # Save before plt.show(); saving afterwards can write an empty figure
    fig.savefig('sample_images.png', dpi=150, bbox_inches='tight')
    plt.show()

    # Log the saved plot as an artifact in MLFlow
    mlflow.log_artifact('sample_images.png')

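# Display sample images inside an explicit run, so that log_artifact
# does not leave an implicitly started run active for later cells
if raw_images:
    with mlflow.start_run(run_name=f"sample-images-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
        display_sample_images(BUCKET_NAME, raw_images, num_samples=4)
else:
    print("No images found to display")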

### 1.2 Basic Image Analysis with MLFlow Tracking

Let's analyze some basic characteristics of the images in our dataset and track the results.
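
The analysis below takes the first `sample_size` keys from the listing, which can bias the statistics if the keys are ordered by capture date or flight. One option is to draw a reproducible random sample first. This is a minimal sketch under that assumption; `sample_objects` and the `seed` default are hypothetical and not part of the original workflow.

```python
import random

def sample_objects(image_objects, sample_size=20, seed=42):
    """Pick a reproducible random subset instead of the first N keys."""
    if len(image_objects) <= sample_size:
        return list(image_objects)
    # A dedicated Random instance keeps the global random state untouched
    return random.Random(seed).sample(image_objects, sample_size)
```

The result could be passed to `analyze_images` in place of `raw_images`, and logging the seed as a parameter would keep the run reproducible.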

In [None]:
# Start a separate MLFlow run for image analysis
with mlflow.start_run(run_name=f"image-analysis-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
    # Function to analyze image characteristics
    def analyze_images(bucket, image_objects, sample_size=20):
        """Analyze basic characteristics of images"""
        # Limit to sample size
        samples = image_objects[:min(sample_size, len(image_objects))]
        
        # Initialize lists to store image characteristics
        widths = []
        heights = []
        aspect_ratios = []
        file_sizes = []
        formats = []
        
        print(f"Analyzing {len(samples)} sample images...")
        
        # Process each image
        for img_obj in samples:
            try:
                # Download image from S3
                response = s3_client.get_object(Bucket=bucket, Key=img_obj['Key'])
                img_data = response['Body'].read()
                
                # Get file size
                file_size = len(img_data) / (1024 * 1024)  # Convert to MB
                file_sizes.append(file_size)
                
                # Open image with PIL
                img = Image.open(io.BytesIO(img_data))
                
                # Get image dimensions
                width, height = img.size
                widths.append(width)
                heights.append(height)
                
                # Calculate aspect ratio
                aspect_ratio = width / height
                aspect_ratios.append(aspect_ratio)
                
                # Get image format
                formats.append(img.format)
                
            except Exception as e:
                print(f"Error analyzing image {img_obj['Key']}: {str(e)}")
        
        # Calculate statistics
        stats = {
            'count': len(widths),
            'avg_width': np.mean(widths) if widths else 0,
            'avg_height': np.mean(heights) if heights else 0,
            'min_width': min(widths) if widths else 0,
            'max_width': max(widths) if widths else 0,
            'min_height': min(heights) if heights else 0,
            'max_height': max(heights) if heights else 0,
            'avg_aspect_ratio': np.mean(aspect_ratios) if aspect_ratios else 0,
            'avg_file_size': np.mean(file_sizes) if file_sizes else 0,
            'formats': list(set(formats)) if formats else []
        }
        
        return {
            'stats': stats,
            'widths': widths,
            'heights': heights,
            'aspect_ratios': aspect_ratios,
            'file_sizes': file_sizes,
            'formats': formats
        }

    # Analyze sample images
    if raw_images:
        analysis_results = analyze_images(BUCKET_NAME, raw_images, sample_size=20)
        
        # Display statistics
        stats = analysis_results['stats']
        print("\nImage Statistics:")
        print(f"Total images analyzed: {stats['count']}")
        print(f"Average dimensions: {stats['avg_width']:.1f}x{stats['avg_height']:.1f} pixels")
        print(f"Dimension range: {stats['min_width']}x{stats['min_height']} to {stats['max_width']}x{stats['max_height']} pixels")
        print(f"Average aspect ratio: {stats['avg_aspect_ratio']:.2f}")
        print(f"Average file size: {stats['avg_file_size']:.2f} MB")
        print(f"Image formats: {', '.join(stats['formats'])}")
        
        # Log statistics to MLFlow
        mlflow.log_param("sample_size", stats['count'])
        mlflow.log_metric("avg_width", stats['avg_width'])
        mlflow.log_metric("avg_height", stats['avg_height'])
        mlflow.log_metric("min_width", stats['min_width'])
        mlflow.log_metric("max_width", stats['max_width'])
        mlflow.log_metric("min_height", stats['min_height'])
        mlflow.log_metric("max_height", stats['max_height'])
        mlflow.log_metric("avg_aspect_ratio", stats['avg_aspect_ratio'])
        mlflow.log_metric("avg_file_size_mb", stats['avg_file_size'])
        mlflow.log_param("image_formats", ", ".join(stats['formats']))
        
    else:
        print("No images found to analyze")

### 1.3 Visualize Image Characteristics

Let's create some visualizations to better understand our dataset and save them as MLFlow artifacts.

In [None]:
# Visualize image characteristics
if 'analysis_results' in locals() and analysis_results['stats']['count'] > 0:
    # Create a figure with subplots
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Plot image dimensions
    axes[0, 0].scatter(analysis_results['widths'], analysis_results['heights'])
    axes[0, 0].set_xlabel('Width (pixels)')
    axes[0, 0].set_ylabel('Height (pixels)')
    axes[0, 0].set_title('Image Dimensions')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Plot aspect ratio distribution
    axes[0, 1].hist(analysis_results['aspect_ratios'], bins=10)
    axes[0, 1].set_xlabel('Aspect Ratio (width/height)')
    axes[0, 1].set_ylabel('Count')
    axes[0, 1].set_title('Aspect Ratio Distribution')
    axes[0, 1].grid(True, alpha=0.3)
    
    # Plot file size distribution
    axes[1, 0].hist(analysis_results['file_sizes'], bins=10)
    axes[1, 0].set_xlabel('File Size (MB)')
    axes[1, 0].set_ylabel('Count')
    axes[1, 0].set_title('File Size Distribution')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Plot format distribution
    format_counts = {}
    for fmt in analysis_results['formats']:
        if fmt in format_counts:
            format_counts[fmt] += 1
        else:
            format_counts[fmt] = 1
    
    formats = list(format_counts.keys())
    counts = list(format_counts.values())
    
    axes[1, 1].bar(formats, counts)
    axes[1, 1].set_xlabel('Image Format')
    axes[1, 1].set_ylabel('Count')
    axes[1, 1].set_title('Image Format Distribution')
    
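    plt.tight_layout()

    # Save before plt.show(); saving afterwards can write an empty figure
    fig.savefig('image_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()

    # Log the figure inside an explicit run so no implicit run is left active
    with mlflow.start_run(run_name=f"image-visualization-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
        mlflow.log_artifact('image_analysis.png')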
    
else:
    print("No analysis results available for visualization")

## 2. Data Preparation for YOLOv11 Training

Now let's prepare our data for YOLOv11 training and track the preparation process.

In [None]:
# Start new MLFlow run for data preparation
with mlflow.start_run(run_name=f"data-preparation-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
    # Function to check if labeled data exists
    def check_labeled_data(bucket, prefix="labeled-data/"):
        """Check if labeled data exists in the bucket"""
        objects = list_s3_objects(bucket, prefix=prefix)
        
        if objects:
            print(f"Found {len(objects)} objects in labeled data directory")
            
            # Group by job name (assuming directory structure)
            jobs = {}
            for obj in objects:
                key = obj['Key']
                parts = key.split('/')
                if len(parts) > 2:
                    job_name = parts[1]
                    if job_name not in jobs:
                        jobs[job_name] = []
                    jobs[job_name].append(key)
            
            # Display job information
            if jobs:
                print(f"\nFound {len(jobs)} labeling jobs:")
                for job, files in jobs.items():
                    print(f"  - {job}: {len(files)} files")
                
                # Log to MLFlow
                mlflow.log_metric("labeled_jobs_count", len(jobs))
                mlflow.log_metric("labeled_files_count", len(objects))
            
            return jobs
        else:
            print("No labeled data found")
            mlflow.log_metric("labeled_jobs_count", 0)
            return {}

    # Check for labeled data
    labeled_jobs = check_labeled_data(BUCKET_NAME)

### 2.1 Prepare Data Structure for YOLOv11

YOLOv11 requires a specific data structure. Let's prepare our data accordingly.
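
Besides the `train/` and `val/` directories, YOLO training tools usually read a small dataset YAML file that points at the image directories and names the classes. The sketch below builds such a file as plain text, assuming an Ultralytics-style layout (`path`, `train`, `val`, `names`). `build_dataset_yaml` and `dataset_root` are hypothetical names, and the class list mirrors the labels used for the labeling job later in this notebook.

```python
def build_dataset_yaml(dataset_root, class_names):
    """Render a dataset YAML for a train/ and val/ layout with images/ and labels/."""
    lines = [
        f"path: {dataset_root}",   # assumed local root where the S3 dataset was synced
        "train: train/images",
        "val: val/images",
        "names:",
    ]
    # Class IDs follow list order and must match the IDs in the label files
    lines += [f"  {idx}: {name}" for idx, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


print(build_dataset_yaml("/opt/ml/input/data/dataset", ["drone", "vehicle", "person", "building"]))
```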

In [None]:
# Function to create YOLO dataset structure
def prepare_yolo_structure(bucket, job_name=None):
    """Prepare YOLO dataset structure in S3"""
    # Define dataset structure
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dataset_name = f"yolov11_dataset_{timestamp}"
    
    # Define directories
    base_prefix = f"datasets/{dataset_name}/"
    train_prefix = f"{base_prefix}train/"
    val_prefix = f"{base_prefix}val/"
    
    # Create empty directories in S3
    for prefix in [train_prefix, val_prefix]:
        for subdir in ["images/", "labels/"]:
            full_prefix = f"{prefix}{subdir}"
            # Create an empty object to represent the directory
            s3_client.put_object(Bucket=bucket, Key=full_prefix)
    
    print(f"Created YOLO dataset structure at s3://{bucket}/{base_prefix}")
    print("\nDirectory structure:")
    print(f"s3://{bucket}/{base_prefix}")
    print(f"├── train/")
    print(f"│   ├── images/")
    print(f"│   └── labels/")
    print(f"└── val/")
    print(f"    ├── images/")
    print(f"    └── labels/")
    
    # Log to MLFlow
    mlflow.log_param("dataset_name", dataset_name)
    mlflow.log_param("dataset_s3_path", f"s3://{bucket}/{base_prefix}")
    
    return {
        'dataset_name': dataset_name,
        'base_prefix': base_prefix,
        'train_prefix': train_prefix,
        'val_prefix': val_prefix
    }

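# Create YOLO dataset structure inside an explicit MLFlow run
with mlflow.start_run(run_name=f"yolo-structure-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
    yolo_structure = prepare_yolo_structure(BUCKET_NAME)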

## 3. Ground Truth Labeling Job Creation

Now let's create a SageMaker Ground Truth labeling job to annotate our drone imagery for object detection.

### 3.1 Configure Labeling Job Parameters

Let's configure the parameters for our Ground Truth labeling job.
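
Bounding box jobs generally also need a label category configuration file that lists the class names. `create_labeling_job` accepts its location through the `LabelCategoryConfigS3Uri` parameter. The sketch below builds that JSON document from a list of labels. The helper name is hypothetical, and the `document-version` value is assumed to be the one Ground Truth expects for this file.

```python
import json

def build_label_category_config(labels):
    """Return the label category configuration as a JSON string."""
    config = {
        "document-version": "2018-11-28",  # assumed schema version
        "labels": [{"label": name} for name in labels],
    }
    return json.dumps(config, indent=2)


print(build_label_category_config(["drone", "vehicle", "person", "building"]))
```

The resulting string would be uploaded to S3 (for example with `s3_client.put_object`) and its URI passed when the job is created.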

In [None]:
# Start new MLFlow run for labeling job creation
with mlflow.start_run(run_name=f"labeling-job-creation-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
    # Configure labeling job parameters
    labeling_job_config = {
        'job_name': f"drone-detection-labeling-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
        'input_s3_path': f"s3://{BUCKET_NAME}/raw-images/",
        'output_s3_path': f"s3://{BUCKET_NAME}/labeled-data/",
        'task_type': 'BoundingBox',
        'labels': ['drone', 'vehicle', 'person', 'building'],
        'instructions': 'Please draw bounding boxes around all drones and other objects visible in the image.',
        'max_budget_usd': 50.00,  # recorded for tracking only; not passed to the labeling job
        'workforce_type': 'private'  # or 'public' for Mechanical Turk; recorded for tracking only
    }
    
    # Display configuration
    print("Labeling Job Configuration:")
    for key, value in labeling_job_config.items():
        print(f"  {key}: {value}")
    
    # Log configuration to MLFlow
    for key, value in labeling_job_config.items():
        if isinstance(value, (str, int, float)):
            mlflow.log_param(f"labeling_{key}", value)
        elif isinstance(value, list):
            mlflow.log_param(f"labeling_{key}", ", ".join(value))

### 3.2 Create Input Manifest for Ground Truth

Ground Truth requires an input manifest file that lists all images to be labeled.
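
The manifest is a JSON Lines file: one JSON object per line, each with a `source-ref` key holding the S3 URI of an image. A quick local check before uploading can catch malformed lines early. This is a minimal sketch, and `validate_manifest` is a hypothetical helper rather than a Ground Truth API.

```python
import json

def validate_manifest(manifest_text):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        source = entry.get("source-ref") if isinstance(entry, dict) else None
        if not isinstance(source, str) or not source.startswith("s3://"):
            problems.append((number, "missing or non-S3 source-ref"))
    return problems
```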

In [None]:
# Function to create input manifest for Ground Truth
def create_input_manifest(bucket, image_objects, output_key="input-manifest.json"):
    """Create input manifest file for Ground Truth labeling job"""
    
    # Create manifest entries
    manifest_entries = []
    for img_obj in image_objects[:10]:  # Limit to first 10 images for demo
        s3_uri = f"s3://{bucket}/{img_obj['Key']}"
        manifest_entry = {
            "source-ref": s3_uri
        }
        manifest_entries.append(json.dumps(manifest_entry))
    
    # Create manifest content
    manifest_content = "\n".join(manifest_entries)
    
    # Upload manifest to S3
    s3_client.put_object(
        Bucket=bucket,
        Key=output_key,
        Body=manifest_content,
        ContentType='application/json'
    )
    
    manifest_s3_uri = f"s3://{bucket}/{output_key}"
    print(f"Created input manifest: {manifest_s3_uri}")
    print(f"Number of images in manifest: {len(manifest_entries)}")
    
    return manifest_s3_uri

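# Create input manifest and log it inside an explicit MLFlow run
if raw_images:
    with mlflow.start_run(run_name=f"input-manifest-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
        manifest_uri = create_input_manifest(BUCKET_NAME, raw_images)
        mlflow.log_param("input_manifest_uri", manifest_uri)
        mlflow.log_metric("images_to_label", min(10, len(raw_images)))
else:
    print("No images available for labeling")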

### 3.3 Create Ground Truth Labeling Job

Now let's create the actual Ground Truth labeling job.

In [None]:
# Function to create Ground Truth labeling job
def create_ground_truth_job(config, manifest_uri):
    """Create SageMaker Ground Truth labeling job"""
    
    try:
        # Get execution role
        role_arn = sagemaker_session.get_caller_identity_arn()
        
        # Create labeling job
        response = sagemaker_client.create_labeling_job(
            LabelingJobName=config['job_name'],
            LabelAttributeName='drone-detection',
            InputConfig={
                'DataSource': {
                    'S3DataSource': {
                        'ManifestS3Uri': manifest_uri
                    }
                }
            },
            OutputConfig={
                'S3OutputPath': config['output_s3_path']
            },
            RoleArn=role_arn,
            HumanTaskConfig={
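                # Assumes a private work team named "default" exists in this account
                'WorkteamArn': f"arn:aws:sagemaker:{region}:{account_id}:workteam/private-crowd/default",
                'UiConfig': {
                    # Template location as given in the original notebook; replace with your own template if needed
                    'UiTemplateS3Uri': f"s3://sagemaker-{region}/labeling-jobs/templates/bounding-box/template.liquid"
                },
                # The AWS-owned Lambda account ID is region-specific; 432418664414 is the us-east-1 account
                'PreHumanTaskLambdaArn': f"arn:aws:lambda:{region}:432418664414:function:PRE-BoundingBox",
                'TaskTitle': 'Drone Detection Labeling',
                'TaskDescription': config['instructions'],
                'NumberOfHumanWorkersPerDataObject': 1,
                'TaskTimeLimitInSeconds': 3600,
                # The consolidation Lambda must be nested under AnnotationConsolidationConfig
                'AnnotationConsolidationConfig': {
                    'AnnotationConsolidationLambdaArn': f"arn:aws:lambda:{region}:432418664414:function:ACS-BoundingBox"
                }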
            }
        )
        
        job_arn = response['LabelingJobArn']
        print(f"✅ Created labeling job: {config['job_name']}")
        print(f"Job ARN: {job_arn}")
        print(f"You can monitor the job in the SageMaker console")
        
        # Log to MLFlow
        mlflow.log_param("labeling_job_arn", job_arn)
        mlflow.log_param("labeling_job_status", "created")
        
        return job_arn
        
    except Exception as e:
        print(f"❌ Error creating labeling job: {str(e)}")
        print("\nPossible causes:")
        print("1. Insufficient permissions to create labeling jobs")
        print("2. No private workforce configured")
        print("3. Invalid S3 paths or manifest format")
        print("\nTo set up a private workforce:")
        print("1. Go to SageMaker Console > Ground Truth > Labeling workforces")
        print("2. Create a private workforce")
        print("3. Add team members to the workforce")
        
        # Log error to MLFlow
        mlflow.log_param("labeling_job_error", str(e))
        mlflow.log_param("labeling_job_status", "failed")
        
        return None

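# Create the labeling job inside an explicit MLFlow run
if 'manifest_uri' in locals():
    with mlflow.start_run(run_name=f"labeling-job-submit-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
        job_arn = create_ground_truth_job(labeling_job_config, manifest_uri)
else:
    print("No manifest available for labeling job creation")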

### 3.4 Monitor Labeling Job Progress

Let's create a function to monitor the labeling job progress.
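
A single status check is often followed by polling until the job reaches a terminal state. The sketch below shows one way to do that. `wait_for_labeling_job`, `poll_seconds`, and `max_polls` are hypothetical names, and the client is passed in so the helper can be exercised without AWS access.

```python
import time

# Statuses after which a labeling job no longer changes
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_labeling_job(sagemaker_client, job_name, poll_seconds=60, max_polls=60):
    """Poll describe_labeling_job until a terminal status or the poll limit is reached."""
    status = None
    for _ in range(max_polls):
        response = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
        status = response["LabelingJobStatus"]
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    return status
```

Human labeling can take hours or days, so a generous `poll_seconds` (or a scheduled check instead of a blocking loop) is usually the better fit.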

In [None]:
# Function to monitor labeling job
def monitor_labeling_job(job_name):
    """Monitor Ground Truth labeling job progress"""
    
    try:
        response = sagemaker_client.describe_labeling_job(
            LabelingJobName=job_name
        )
        
        status = response['LabelingJobStatus']
        creation_time = response['CreationTime']
        
        print(f"Job Name: {job_name}")
        print(f"Status: {status}")
        print(f"Created: {creation_time.strftime('%Y-%m-%d %H:%M:%S')}")
        
        if 'LabelCounters' in response:
            counters = response['LabelCounters']
            print(f"Total objects: {counters.get('TotalLabeled', 0) + counters.get('Unlabeled', 0)}")
            print(f"Labeled: {counters.get('TotalLabeled', 0)}")
            print(f"Remaining: {counters.get('Unlabeled', 0)}")
        
        if status == 'Completed':
            output_location = response['LabelingJobOutput']['OutputDatasetS3Uri']
            print(f"✅ Job completed! Output: {output_location}")
            
            # Log completion to MLFlow
            mlflow.log_param("labeling_job_output", output_location)
            mlflow.log_param("labeling_job_status", "completed")
            
        elif status == 'Failed':
            failure_reason = response.get('FailureReason', 'Unknown')
            print(f"❌ Job failed: {failure_reason}")
            
            # Log failure to MLFlow
            mlflow.log_param("labeling_job_failure", failure_reason)
            mlflow.log_param("labeling_job_status", "failed")
        
        return response
        
    except Exception as e:
        print(f"Error monitoring labeling job: {str(e)}")
        return None

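# Monitor the job if it was created, logging status inside an explicit MLFlow run
if 'job_arn' in locals() and job_arn:
    with mlflow.start_run(run_name=f"labeling-job-monitor-{datetime.now().strftime('%Y%m%d-%H%M%S')}"):
        job_status = monitor_labeling_job(labeling_job_config['job_name'])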

## 4. View MLFlow Experiments

Let's view our MLFlow experiments and runs.

In [None]:
# Function to list MLFlow experiments
def list_mlflow_experiments():
    """List all MLFlow experiments"""
    experiments = mlflow.search_experiments()
    
    if experiments:
        print("MLFlow Experiments:")
        for exp in experiments:
            print(f"  - {exp.name} (ID: {exp.experiment_id})")
            
            # Get runs for this experiment
            runs = mlflow.search_runs(experiment_ids=[exp.experiment_id])
            print(f"    Runs: {len(runs)}")
            
            if len(runs) > 0:
                print("    Recent runs:")
                for _, run in runs.head(3).iterrows():
                    run_name = run.get('tags.mlflow.runName', 'Unnamed')
                    status = run.get('status', 'Unknown')
                    print(f"      - {run_name} ({status})")
            print()
    else:
        print("No MLFlow experiments found")

# List experiments
list_mlflow_experiments()

## 5. Summary and Next Steps

In this notebook, we've explored the drone imagery dataset, prepared the structure for YOLOv11 training, created a Ground Truth labeling job, and tracked our work with MLFlow. Here's a summary of what we've accomplished:

1. **Data Exploration with MLFlow**:
   - Listed and displayed sample images from the S3 bucket
   - Analyzed image characteristics (dimensions, aspect ratios, file sizes)
   - Visualized image statistics
   - Tracked all metrics and artifacts in MLFlow

2. **Data Preparation**:
   - Checked for existing labeled data
   - Created YOLO dataset structure in S3
   - Logged dataset information to MLFlow

3. **Ground Truth Labeling**:
   - Configured labeling job parameters
   - Created input manifest for Ground Truth
   - Set up Ground Truth labeling job for object detection
   - Implemented job monitoring functionality

4. **Experiment Tracking**:
   - Used MLFlow to track all data exploration and labeling activities
   - Saved visualizations as artifacts
   - Logged parameters and metrics for reproducibility

### Next Steps

1. **Monitor Labeling Job**: Check the progress of your Ground Truth labeling job
2. **Complete Labeling**: Ensure all images are properly labeled
3. **Convert Labels**: Convert Ground Truth output to YOLOv11 format
4. **Organize Training Data**: Place labeled data in the YOLO structure we created
5. **Proceed to Training**: Use the ML Engineer notebook for model training
6. **Review MLFlow**: Check all experiments in the SageMaker Studio MLFlow UI
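
The label conversion in step 3 can be sketched as follows. It assumes the Ground Truth bounding box output format, where each manifest line stores pixel coordinates (`left`, `top`, `width`, `height`) and the image size under the label attribute name, which is `drone-detection` in this notebook. YOLO label files expect `class x_center y_center width height`, normalized to the range 0 to 1. `ground_truth_to_yolo` is a hypothetical helper name.

```python
import json

def ground_truth_to_yolo(manifest_line, label_attribute="drone-detection"):
    """Convert one Ground Truth output manifest line into YOLO label rows."""
    record = json.loads(manifest_line)
    label = record[label_attribute]
    size = label["image_size"][0]
    img_w, img_h = size["width"], size["height"]

    rows = []
    for box in label["annotations"]:
        # Ground Truth gives the top-left corner; YOLO wants the box centre
        x_center = (box["left"] + box["width"] / 2) / img_w
        y_center = (box["top"] + box["height"] / 2) / img_h
        width = box["width"] / img_w
        height = box["height"] / img_h
        rows.append(f"{box['class_id']} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
    return rows
```

Each image's rows would be written to a `.txt` file with the same base name as the image, under the matching `labels/` prefix.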

### Ground Truth Integration Benefits

- **Quality Control**: Professional annotation with quality checks
- **Scalability**: Handle large datasets efficiently
- **Cost Management**: Budget controls and cost estimation
- **Workforce Management**: Private or public workforce options
- **Integration**: Seamless integration with SageMaker training pipelines

### MLFlow Integration Benefits

- **Complete Tracking**: All data exploration and labeling activities are tracked
- **Reproducibility**: Parameters and configurations are logged
- **Collaboration**: Team members can view and compare experiments
- **Artifact Management**: Visualizations and data summaries are stored
- **Lineage**: Track data from exploration to training

### Accessing Your Work

- **MLFlow UI**: Go to "Experiments and trials" > "MLflow" in SageMaker Studio
- **Ground Truth Console**: Monitor labeling jobs in the SageMaker console
- **S3 Data**: All data and artifacts are stored in your S3 bucket

For more detailed functionality, refer to the comprehensive notebooks in the `notebooks/data-labeling/` directory.