[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sci-ndp/ndp-ep-py/blob/main/docs/source/tutorials/s3_management.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sci-ndp/ndp-ep-py/main?filepath=docs%2Fsource%2Ftutorials%2Fs3_management.ipynb)

# S3 Buckets and Objects Management Tutorial

This tutorial walks through managing S3 buckets and objects with the NDP EP Python client.

## 🎯 What You'll Learn

This tutorial covers complete S3 data management workflows:

- **🪣 Bucket Management**: Create, list, inspect, and delete S3 buckets
- **📁 Object Operations**: Upload, download, list, and delete objects
- **📊 Metadata Handling**: Work with object metadata and properties
- **🔗 Presigned URLs**: Generate secure temporary URLs for uploads/downloads
- **📈 Batch Operations**: Efficiently manage multiple files
- **🛡️ Error Handling**: Robust error handling and best practices
- **🧹 Resource Cleanup**: Proper cleanup of S3 resources

## 🔧 Use Cases

Perfect for:
- **Data Lake Management**: Managing large datasets in S3
- **File Storage**: Storing and retrieving files programmatically
- **Data Sharing**: Creating secure links for data sharing
- **Backup Systems**: Automated backup and restore workflows
- **Data Processing**: ETL pipelines with S3 integration

## ⚠️ Prerequisites

- Valid NDP EP API credentials (token)
- S3 permissions on the target API instance
- Basic understanding of S3 concepts (buckets, objects, keys)

## 🛡️ Safety Note

This tutorial creates and deletes real S3 resources. Run it against a development system first, and complete the cleanup section at the end so that no test buckets or objects are left behind.

In [None]:
# Install required libraries
!pip install ndp-ep

# Import required modules
import io
import json
import time
import getpass
from typing import List, Dict, Any, Optional
from ndp_ep import APIClient

print("✅ Libraries installed and imported successfully!")
print("📚 Ready to start S3 management tutorial")

## 1. 🔐 Authentication and Client Setup

First, let's configure the client with your credentials for S3 operations.

In [None]:
# Interactive API configuration
print("🔧 S3 Management Configuration")
print("=" * 35)

# Get API base URL
api_url = input("Enter API base URL [http://localhost:8000]: ").strip()
if not api_url:
    api_url = "http://localhost:8000"

print(f"📡 API URL: {api_url}")

# Get API token securely
print("\n🔐 Authentication")
print("Please enter your API token (it will be hidden):")
api_token = getpass.getpass("API Token: ")

if not api_token.strip():
    raise ValueError("❌ API token is required for S3 operations")

print("✅ Credentials configured securely")
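
For non-interactive runs (scheduled jobs, CI), prompting with `input()` and `getpass` is not practical. The following is a minimal sketch of reading the same two settings from environment variables instead. The variable names `NDP_EP_API_URL` and `NDP_EP_API_TOKEN` are hypothetical, chosen only for this example; they are not defined by the library.

```python
import os
from typing import Mapping, Tuple

DEFAULT_API_URL = "http://localhost:8000"  # same default as the interactive prompt


def load_api_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_url, api_token) read from an environment-style mapping.

    NDP_EP_API_URL and NDP_EP_API_TOKEN are hypothetical names used only
    in this sketch.
    """
    api_url = env.get("NDP_EP_API_URL", "").strip() or DEFAULT_API_URL
    api_token = env.get("NDP_EP_API_TOKEN", "").strip()
    if not api_token:
        raise ValueError("API token is required for S3 operations")
    return api_url, api_token


# Example with an explicit mapping instead of the real environment
example_url, example_token = load_api_config({"NDP_EP_API_TOKEN": "example-token"})
print(f"API URL: {example_url}")
```

Passing the mapping as a parameter keeps the function easy to exercise without touching the real environment; the returned values can be handed to `APIClient(base_url=..., token=...)` as in the next cell.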

In [None]:
# Initialize and test the API client
print("🚀 Initializing S3-enabled API Client...")

try:
    client = APIClient(base_url=api_url, token=api_token)
    
    # Test basic connection
    try:
        system_status = client.get_system_status()
        print("✅ API client initialized successfully")
        print(f"🌐 Connected to: {api_url}")
        print("🔑 Authentication verified")
        
        # Test S3 functionality by listing buckets
        try:
            buckets = client.list_buckets()
            print(f"🪣 S3 functionality confirmed - {len(buckets)} buckets found")
        except Exception as e:
            print(f"⚠️  S3 functionality test: {e}")
            print("💡 S3 features may require additional setup")
            
    except Exception as e:
        print(f"⚠️  API connection test failed: {e}")
        print("💡 Continuing in demo mode - some features may not work")
        print("✅ Client object created successfully")
    
except Exception as e:
    print(f"❌ Failed to initialize client: {e}")
    print("💡 Please check your credentials and API URL")
    raise

## 2. 📋 Helper Functions and Configuration

Let's create utility functions for our S3 operations.

In [None]:
# Configuration for the tutorial
TUTORIAL_PREFIX = "tutorial"
TUTORIAL_BUCKET = f"{TUTORIAL_PREFIX}-bucket-{int(time.time())}"
SAMPLE_FILES = {
    "data/sample.txt": "This is a sample text file for S3 tutorial.",
    "data/config.json": '{"environment": "tutorial", "version": "1.0"}',
    "logs/app.log": "2024-01-01 10:00:00 INFO Application started\n2024-01-01 10:01:00 INFO Processing data",
    "images/placeholder.txt": "This represents an image file in the tutorial"
}

print("📊 Tutorial Configuration")
print("=" * 30)
print(f"Bucket name: {TUTORIAL_BUCKET}")
print(f"Sample files: {len(SAMPLE_FILES)} files")
print(f"Prefix: {TUTORIAL_PREFIX}")

# Storage for tracking created resources
created_buckets = []
uploaded_objects = []
operation_log = []

In [None]:
def log_operation(operation: str, resource_type: str, 
                  resource_name: str, success: bool, 
                  details: str = "") -> None:
    """
    Log an S3 operation for tracking and debugging.
    
    Args:
        operation: Type of operation (create, upload, download, delete)
        resource_type: Type of resource (bucket, object)
        resource_name: Name of the resource
        success: Whether the operation was successful
        details: Additional details or error messages
    """
    timestamp = time.strftime("%H:%M:%S")
    status = "✅" if success else "❌"
    
    log_entry = {
        "timestamp": timestamp,
        "operation": operation,
        "resource_type": resource_type,
        "resource_name": resource_name,
        "success": success,
        "details": details
    }
    
    operation_log.append(log_entry)
    print(f"{status} [{timestamp}] {operation.title()} {resource_type}: {resource_name}")
    
    if details and not success:
        print(f"   └─ Error: {details}")
    elif details and success:
        print(f"   └─ {details}")


def format_file_size(size_bytes: int) -> str:
    """
    Format file size in human-readable format.
    
    Args:
        size_bytes: Size in bytes
        
    Returns:
        Formatted size string
    """
    for unit in ['B', 'KB', 'MB', 'GB']:
        if size_bytes < 1024.0:
            return f"{size_bytes:.1f} {unit}"
        size_bytes /= 1024.0
    return f"{size_bytes:.1f} TB"


def safe_s3_operation(func, *args, operation_name: str, **kwargs):
    """
    Safely execute an S3 operation with error handling.
    
    Args:
        func: Function to execute
        operation_name: Description of the operation
        *args, **kwargs: Arguments for the function
        
    Returns:
        Result of the function or None if failed
    """
    try:
        result = func(*args, **kwargs)
        return result
    except ValueError as e:
        print(f"❌ {operation_name} failed: {e}")
        return None
    except Exception as e:
        print(f"⚠️  Unexpected error in {operation_name}: {e}")
        return None

print("🔧 Helper functions defined successfully")
print("📝 Ready for S3 operations")
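
`safe_s3_operation` gives up after the first failure. Where transient network errors are likely, a retrying variant may be useful. The sketch below is one possible approach using exponential backoff; `retry_s3_operation` and its parameters are hypothetical names for this example, and the flaky function is a stand-in rather than a real client call.

```python
import time
from typing import Any, Callable, Optional


def retry_s3_operation(
    func: Callable[..., Any],
    *args: Any,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    **kwargs: Any,
) -> Optional[Any]:
    """Call func, retrying with exponential backoff; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose, mirroring safe_s3_operation
            print(f"Attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                # Delay doubles after each failed attempt: 0.5s, 1.0s, 2.0s, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return None


# Demonstration with a stand-in function that fails twice, then succeeds
calls = {"count": 0}


def flaky_list_buckets():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["bucket-a"]


delays = []  # record requested delays instead of actually sleeping
result = retry_s3_operation(flaky_list_buckets, sleep=delays.append)
print(result, delays)
```

The `sleep` parameter is injectable so the demonstration finishes instantly; in real use the default `time.sleep` applies.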

## 3. 🪣 Bucket Management

Let's start with basic bucket operations: listing, creating, and inspecting buckets.

In [None]:
# List existing buckets
print("🪣 Current S3 Buckets")
print("=" * 25)

try:
    existing_buckets = client.list_buckets()
    print(f"📋 Found {len(existing_buckets)} existing buckets:")
    
    for i, bucket in enumerate(existing_buckets, 1):
        bucket_name = bucket.get('name', bucket.get('Name', 'Unknown'))
        created_date = bucket.get('created', bucket.get('CreationDate', 'Unknown'))
        print(f"   {i}. {bucket_name} (Created: {created_date})")
    
    if not existing_buckets:
        print("   📭 No buckets found")
        
    log_operation("list", "buckets", "all", True, f"Found {len(existing_buckets)} buckets")
    
except Exception as e:
    print(f"❌ Failed to list buckets: {e}")
    log_operation("list", "buckets", "all", False, str(e))
    existing_buckets = []

In [None]:
# Create a new bucket for our tutorial
print(f"\n🔨 Creating Tutorial Bucket: {TUTORIAL_BUCKET}")
print("=" * 45)

try:
    # Check if bucket already exists
    bucket_exists = any(
        bucket.get('name', bucket.get('Name', '')) == TUTORIAL_BUCKET 
        for bucket in existing_buckets
    )
    
    if bucket_exists:
        print(f"ℹ️  Bucket '{TUTORIAL_BUCKET}' already exists")
        log_operation("check", "bucket", TUTORIAL_BUCKET, True, "Already exists")
    else:
        # Create the bucket (use name parameter for API compatibility)
        result = client.create_bucket(TUTORIAL_BUCKET, name=TUTORIAL_BUCKET)
        print("✅ Bucket created successfully")
        print(f"📍 Bucket name: {TUTORIAL_BUCKET}")
        
        # The response shape is not guaranteed, so check before indexing
        if isinstance(result, dict) and 'location' in result:
            print(f"🌍 Location: {result['location']}")
        
        created_buckets.append(TUTORIAL_BUCKET)
        log_operation("create", "bucket", TUTORIAL_BUCKET, True, "Successfully created")
    
except Exception as e:
    print(f"❌ Failed to create bucket: {e}")
    log_operation("create", "bucket", TUTORIAL_BUCKET, False, str(e))
    raise
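
Bucket creation fails if the name is not acceptable to the storage backend, so it can help to validate the name before calling the API. The sketch below checks the general-purpose bucket naming rules documented for Amazon S3 (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no consecutive dots; not formatted like an IPv4 address). Whether the NDP EP backend enforces exactly these rules is an assumption, and `is_valid_bucket_name` is a hypothetical helper.

```python
import re

# First and last character: letter or digit; 1-61 characters in between
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name satisfies the subset of S3 naming rules listed above."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    if ".." in name:
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    return True


for candidate in ["tutorial-bucket-1704103200", "Tutorial_Bucket", "ab"]:
    print(f"{candidate!r}: {is_valid_bucket_name(candidate)}")
```

The timestamped names generated by this tutorial (`tutorial-bucket-<epoch seconds>`) pass this check.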

In [None]:
# Get detailed information about our bucket
print(f"\n🔍 Inspecting Bucket: {TUTORIAL_BUCKET}")
print("=" * 35)

try:
    bucket_info = client.get_bucket_info(TUTORIAL_BUCKET)
    print("✅ Bucket information retrieved:")
    
    # Display bucket details
    for key, value in bucket_info.items():
        if key.lower() in ['name', 'created', 'location', 'region', 'versioning']:
            print(f"   📋 {key.title()}: {value}")
    
    log_operation("inspect", "bucket", TUTORIAL_BUCKET, True, "Information retrieved")
    
except Exception as e:
    print(f"❌ Failed to get bucket info: {e}")
    log_operation("inspect", "bucket", TUTORIAL_BUCKET, False, str(e))

## 4. 📁 Object Upload Operations

Now let's upload various types of files to demonstrate object management.

In [None]:
# Upload sample files to the bucket
print(f"📤 Uploading Sample Files to {TUTORIAL_BUCKET}")
print("=" * 45)

upload_start_time = time.time()
successful_uploads = 0
failed_uploads = 0

for object_key, content in SAMPLE_FILES.items():
    try:
        # Convert string content to bytes
        file_data = content.encode('utf-8')
        
        # Determine content type based on file extension
        if object_key.endswith('.json'):
            content_type = 'application/json'
        elif object_key.endswith('.txt') or object_key.endswith('.log'):
            content_type = 'text/plain'
        else:
            content_type = 'application/octet-stream'
        
        # Upload the object
        result = client.upload_object(
            bucket_name=TUTORIAL_BUCKET,
            object_key=object_key,
            file_data=file_data,
            content_type=content_type
        )
        
        file_size = len(file_data)
        print(f"✅ Uploaded: {object_key} ({format_file_size(file_size)})")
        
        uploaded_objects.append({
            'bucket': TUTORIAL_BUCKET,
            'key': object_key,
            'size': file_size,
            'content_type': content_type
        })
        
        successful_uploads += 1
        log_operation("upload", "object", object_key, True, 
                     f"Size: {format_file_size(file_size)}")
        
    except Exception as e:
        failed_uploads += 1
        print(f"❌ Failed to upload {object_key}: {e}")
        log_operation("upload", "object", object_key, False, str(e))

upload_duration = time.time() - upload_start_time

print(f"\n📊 Upload Summary:")
print(f"   ✅ Successful: {successful_uploads}/{len(SAMPLE_FILES)}")
print(f"   ❌ Failed: {failed_uploads}/{len(SAMPLE_FILES)}")
print(f"   ⏱️  Duration: {upload_duration:.2f} seconds")

if successful_uploads > 0:
    total_size = sum(obj['size'] for obj in uploaded_objects)
    print(f"   📦 Total uploaded: {format_file_size(total_size)}")
    # Guard against a zero-length interval on very fast runs
    if upload_duration > 0:
        print(f"   🚀 Average speed: {successful_uploads/upload_duration:.1f} files/second")
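
The upload loop maps file extensions to content types by hand. As an alternative sketch, Python's standard `mimetypes` module can do most of this lookup. The helper name `guess_content_type` and the override table are assumptions for this example; `.log` is overridden explicitly because it is not guaranteed to appear in every platform's MIME database.

```python
import mimetypes

# Extensions to force, regardless of the platform's MIME database
CONTENT_TYPE_OVERRIDES = {".log": "text/plain"}
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(object_key: str) -> str:
    """Pick a content type from the object key's extension."""
    lowered = object_key.lower()
    for extension, content_type in CONTENT_TYPE_OVERRIDES.items():
        if lowered.endswith(extension):
            return content_type
    guessed, _encoding = mimetypes.guess_type(object_key)
    # Fall back to a generic binary type when the extension is unknown
    return guessed or DEFAULT_CONTENT_TYPE


for key in ["data/sample.txt", "data/config.json", "logs/app.log"]:
    print(key, "->", guess_content_type(key))
```

The result can be passed as the `content_type` argument of `client.upload_object` in place of the if/elif chain.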

## 5. 📋 Object Listing and Discovery

Let's explore different ways to list and discover objects in our bucket.

In [None]:
# List all objects in the bucket
print(f"📋 Listing All Objects in {TUTORIAL_BUCKET}")
print("=" * 40)

try:
    all_objects = client.list_objects(TUTORIAL_BUCKET)
    print(f"📁 Found {len(all_objects)} objects:")
    
    for i, obj in enumerate(all_objects, 1):
        obj_key = obj.get('key', obj.get('Key', 'Unknown'))
        obj_size = obj.get('size', obj.get('Size', 0))
        obj_modified = obj.get('last_modified', obj.get('LastModified', 'Unknown'))
        
        print(f"   {i}. {obj_key}")
        print(f"      📦 Size: {format_file_size(obj_size)}")
        print(f"      📅 Modified: {obj_modified}")
    
    log_operation("list", "objects", "all", True, f"Found {len(all_objects)} objects")
    
except Exception as e:
    print(f"❌ Failed to list objects: {e}")
    log_operation("list", "objects", "all", False, str(e))
    all_objects = []

In [None]:
# List objects with prefix filtering
print(f"\n🔍 Filtering Objects by Prefix")
print("=" * 35)

prefixes_to_test = ["data/", "logs/", "images/"]

for prefix in prefixes_to_test:
    try:
        filtered_objects = client.list_objects(TUTORIAL_BUCKET, prefix=prefix)
        print(f"\n📂 Objects with prefix '{prefix}': {len(filtered_objects)} found")
        
        for obj in filtered_objects:
            obj_key = obj.get('key', obj.get('Key', 'Unknown'))
            obj_size = obj.get('size', obj.get('Size', 0))
            print(f"   📄 {obj_key} ({format_file_size(obj_size)})")
        
        log_operation("filter", "objects", prefix, True, 
                     f"Found {len(filtered_objects)} objects")
        
    except Exception as e:
        print(f"❌ Failed to filter objects with prefix '{prefix}': {e}")
        log_operation("filter", "objects", prefix, False, str(e))
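
The prefixes above are hard-coded. When the set of "folders" is not known in advance, it can be derived from the object keys themselves. The sketch below groups keys by their first path segment on the client side; `group_keys_by_prefix` is a hypothetical helper, and the key list is copied from `SAMPLE_FILES` rather than fetched from the API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_keys_by_prefix(keys: Iterable[str], delimiter: str = "/") -> Dict[str, List[str]]:
    """Group object keys by their top-level prefix (text up to the first delimiter)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        head, sep, _rest = key.partition(delimiter)
        # Keys without a delimiter sit at the bucket root, grouped under ""
        prefix = head + sep if sep else ""
        groups[prefix].append(key)
    return dict(groups)


sample_keys = [
    "data/sample.txt",
    "data/config.json",
    "logs/app.log",
    "images/placeholder.txt",
]
for prefix, members in group_keys_by_prefix(sample_keys).items():
    print(f"{prefix!r}: {len(members)} object(s)")
```

With a live bucket, the keys could be taken from `client.list_objects(TUTORIAL_BUCKET)` using the same `obj.get('key', obj.get('Key'))` lookup as the listing cell.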

## 6. 📥 Object Download and Metadata

Let's download objects and examine their metadata.

In [None]:
# Download and examine a sample object
sample_object_key = "data/sample.txt"

print(f"📥 Downloading Object: {sample_object_key}")
print("=" * 35)

try:
    # Download the object
    object_data = client.download_object(TUTORIAL_BUCKET, sample_object_key)
    
    # Decode and display content
    content = object_data.decode('utf-8')
    print(f"✅ Downloaded successfully")
    print(f"📦 Size: {format_file_size(len(object_data))}")
    print(f"📄 Content preview:")
    print(f"   {content[:100]}{'...' if len(content) > 100 else ''}")
    
    log_operation("download", "object", sample_object_key, True, 
                 f"Size: {format_file_size(len(object_data))}")
    
except Exception as e:
    print(f"❌ Failed to download object: {e}")
    log_operation("download", "object", sample_object_key, False, str(e))
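
A successful download does not by itself show that the bytes match what was uploaded. One way to check, sketched below, is to record a SHA-256 digest for each file at upload time and compare it against the digest of the downloaded bytes. The helper names are hypothetical, and the "downloaded" bytes here are simulated locally instead of coming from `client.download_object`.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, str]) -> Dict[str, str]:
    """Map each object key to the digest of its UTF-8 encoded content."""
    return {key: sha256_hex(text.encode("utf-8")) for key, text in files.items()}


def matches_manifest(manifest: Dict[str, str], object_key: str, downloaded: bytes) -> bool:
    """True only if the key is known and the downloaded bytes hash to the recorded digest."""
    expected = manifest.get(object_key)
    return expected is not None and expected == sha256_hex(downloaded)


sample = {"data/sample.txt": "This is a sample text file for S3 tutorial."}
manifest = build_manifest(sample)

# Simulated download: identical bytes pass, altered bytes fail
intact = sample["data/sample.txt"].encode("utf-8")
print(matches_manifest(manifest, "data/sample.txt", intact))
print(matches_manifest(manifest, "data/sample.txt", intact + b"!"))
```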

In [None]:
# Get object metadata
print(f"\n🔍 Object Metadata: {sample_object_key}")
print("=" * 30)

try:
    metadata = client.get_object_metadata(TUTORIAL_BUCKET, sample_object_key)
    print("✅ Metadata retrieved:")
    
    # Display relevant metadata fields
    metadata_fields = [
        'content_type', 'content_length', 'last_modified', 
        'etag', 'content_encoding', 'cache_control'
    ]
    
    for field in metadata_fields:
        if field in metadata:
            value = metadata[field]
            if field == 'content_length':
                value = f"{value} bytes ({format_file_size(int(value))})"
            print(f"   📋 {field.replace('_', ' ').title()}: {value}")
    
    # Display any remaining fields returned by the API (this may include
    # user-defined metadata such as x-amz-meta-* keys)
    other_metadata = {k: v for k, v in metadata.items() 
                      if k not in metadata_fields}
    
    if other_metadata:
        print("   🏷️  Other Metadata:")
        for key, value in other_metadata.items():
            print(f"      {key}: {value}")
    
    log_operation("metadata", "object", sample_object_key, True, "Retrieved successfully")
    
except Exception as e:
    print(f"❌ Failed to get metadata: {e}")
    log_operation("metadata", "object", sample_object_key, False, str(e))

## 7. 🔗 Presigned URLs for Secure Sharing

Presigned URLs grant time-limited access to a single object without sharing your API token. This section generates one for download and one for upload.

In [None]:
# Generate presigned download URL
print(f"🔗 Generating Presigned URLs")
print("=" * 30)

target_object = "data/config.json"
expiration_time = 3600  # 1 hour

try:
    # Generate download URL
    download_url_data = client.generate_presigned_download_url(
        bucket_name=TUTORIAL_BUCKET,
        object_key=target_object,
        expiration=expiration_time
    )
    
    print(f"✅ Download URL generated for: {target_object}")
    print(f"⏰ Expires in: {expiration_time} seconds ({expiration_time//3600}h {(expiration_time%3600)//60}m)")
    
    if 'url' in download_url_data:
        url = download_url_data['url']
        # Show a shortened preview; only elide the middle of long URLs
        preview = f"{url[:50]}...{url[-20:]}" if len(url) > 70 else url
        print(f"🔗 URL: {preview}")
    
    log_operation("presign_download", "object", target_object, True, 
                 f"Expires in {expiration_time}s")
    
except Exception as e:
    print(f"❌ Failed to generate download URL: {e}")
    log_operation("presign_download", "object", target_object, False, str(e))

In [None]:
# Generate presigned upload URL
print(f"\n📤 Generating Presigned Upload URL")
print("=" * 35)

upload_object_key = "uploads/new-file.txt"
upload_expiration = 1800  # 30 minutes

try:
    # Generate upload URL
    upload_url_data = client.generate_presigned_upload_url(
        bucket_name=TUTORIAL_BUCKET,
        object_key=upload_object_key,
        expiration=upload_expiration
    )
    
    print(f"✅ Upload URL generated for: {upload_object_key}")
    print(f"⏰ Expires in: {upload_expiration} seconds ({upload_expiration//60}m)")
    
    if 'url' in upload_url_data:
        url = upload_url_data['url']
        # Show a shortened preview; only elide the middle of long URLs
        preview = f"{url[:50]}...{url[-20:]}" if len(url) > 70 else url
        print(f"🔗 URL: {preview}")
    
    if 'fields' in upload_url_data:
        print(f"📋 Form fields: {len(upload_url_data['fields'])} required")
    
    log_operation("presign_upload", "object", upload_object_key, True, 
                 f"Expires in {upload_expiration}s")
    
except Exception as e:
    print(f"❌ Failed to generate upload URL: {e}")
    log_operation("presign_upload", "object", upload_object_key, False, str(e))
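
Once a presigned URL has been handed to someone else, it is useful to know when it stops working. The sketch below reads the expiry from the URL itself, assuming the service returns AWS Signature Version 4 style URLs that carry `X-Amz-Date` and `X-Amz-Expires` query parameters. Whether the NDP EP API produces URLs in this form is an assumption; the example URL is made up and `presigned_url_expiry` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry of a SigV4-style presigned URL, or None if it cannot be determined."""
    query = parse_qs(urlparse(url).query)
    try:
        # X-Amz-Date is the signing time, e.g. 20240101T100000Z
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        # X-Amz-Expires is the lifetime in seconds
        lifetime = int(query["X-Amz-Expires"][0])
    except (KeyError, ValueError):
        return None
    return signed_at.replace(tzinfo=timezone.utc) + timedelta(seconds=lifetime)


# Made-up URL for illustration only
example_presigned_url = (
    "https://example.com/tutorial-bucket/data/config.json"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20240101T100000Z"
    "&X-Amz-Expires=3600"
    "&X-Amz-Signature=0000"
)
print(presigned_url_expiry(example_presigned_url))
```

If the parameters are absent, the function returns `None` instead of guessing, so callers can fall back to the `expiration` value they requested.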

## 8. 📊 Batch Operations and Advanced Management

This section uploads several generated files in a loop and reports the overall success count and throughput.

In [None]:
# Create additional files for batch operations
print("📦 Creating Additional Files for Batch Demo")
print("=" * 45)

batch_files = {}
for i in range(1, 6):
    file_key = f"batch/file_{i:02d}.txt"
    file_content = f"This is batch file number {i}\nGenerated for tutorial purposes\nTimestamp: {time.strftime('%Y-%m-%d %H:%M:%S')}"
    batch_files[file_key] = file_content

batch_start_time = time.time()
batch_successful = 0
batch_failed = 0

for object_key, content in batch_files.items():
    try:
        file_data = content.encode('utf-8')
        
        result = client.upload_object(
            bucket_name=TUTORIAL_BUCKET,
            object_key=object_key,
            file_data=file_data,
            content_type='text/plain'
        )
        
        batch_successful += 1
        uploaded_objects.append({
            'bucket': TUTORIAL_BUCKET,
            'key': object_key,
            'size': len(file_data),
            'content_type': 'text/plain'
        })
        
        print(f"✅ Batch upload: {object_key}")
        
    except Exception as e:
        batch_failed += 1
        print(f"❌ Batch upload failed: {object_key} - {e}")

batch_duration = time.time() - batch_start_time

print(f"\n📈 Batch Upload Results:")
print(f"   ✅ Successful: {batch_successful}/{len(batch_files)}")
print(f"   ❌ Failed: {batch_failed}/{len(batch_files)}")
print(f"   ⏱️  Duration: {batch_duration:.2f} seconds")
# Guard against a zero-length interval on very fast runs
if batch_duration > 0:
    print(f"   🚀 Rate: {batch_successful/batch_duration:.1f} files/second")
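
The loop above uploads one file at a time. For larger batches, running uploads concurrently may improve throughput. The following is a minimal sketch with `concurrent.futures.ThreadPoolExecutor`; `upload_many` is a hypothetical helper, and the uploader is an in-memory stand-in. Whether `APIClient` is safe to share across threads is not stated in this tutorial, so treat that as an assumption to verify before using this pattern with the real client.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def upload_many(
    upload_one: Callable[[str, bytes], None],
    files: Dict[str, bytes],
    max_workers: int = 4,
) -> Tuple[List[str], Dict[str, str]]:
    """Upload files concurrently; return (sorted succeeded keys, {failed key: error message})."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, key, data): key for key, data in files.items()}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(key)
            except Exception as exc:
                failed[key] = str(exc)
    return sorted(succeeded), failed


# Stand-in uploader that stores data in a dict and rejects empty payloads
store: Dict[str, bytes] = {}


def fake_upload(key: str, data: bytes) -> None:
    if not data:
        raise ValueError("empty payload")
    store[key] = data


demo_files = {f"batch/file_{i:02d}.txt": f"file {i}".encode("utf-8") for i in range(1, 6)}
demo_files["batch/empty.txt"] = b""

ok_keys, errors = upload_many(fake_upload, demo_files)
print(f"{len(ok_keys)} uploaded, {len(errors)} failed: {errors}")
```

With the real client, `upload_one` could be a small wrapper around `client.upload_object(bucket_name=..., object_key=key, file_data=data, content_type='text/plain')`, matching the call used in the loop above.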

## 9. 🧹 Cleanup and Resource Management

Let's properly clean up all the resources we created during this tutorial.

In [None]:
# Confirmation before cleanup
print("🧹 CLEANUP CONFIRMATION")
print("=" * 30)
print(f"About to delete:")
print(f"   📁 {len(uploaded_objects)} objects")
print(f"   🪣 1 bucket ({TUTORIAL_BUCKET})")
print(f"\nTotal resources: {len(uploaded_objects) + 1}")

# Get user confirmation
confirmation = input("\nProceed with cleanup? (yes/no): ").strip().lower()

if confirmation != 'yes':
    print("🚫 Cleanup cancelled by user")
    print("💡 Resources remain in S3 for manual cleanup")
else:
    print("\n✅ Proceeding with cleanup...")

In [None]:
# Delete all objects (only if user confirmed)
if confirmation == 'yes':
    print(f"\n🗑️  Deleting Objects from {TUTORIAL_BUCKET}")
    print("=" * 40)
    
    cleanup_start_time = time.time()
    deleted_objects = 0
    failed_deletions = 0
    
    # Get current objects (in case some were added/removed)
    try:
        current_objects = client.list_objects(TUTORIAL_BUCKET)
        objects_to_delete = [obj.get('key', obj.get('Key', '')) for obj in current_objects]
        
        print(f"📋 Found {len(objects_to_delete)} objects to delete")
        
        for i, object_key in enumerate(objects_to_delete, 1):
            try:
                client.delete_object(TUTORIAL_BUCKET, object_key)
                deleted_objects += 1
                print(f"✅ Deleted: {object_key}")
                log_operation("delete", "object", object_key, True, "Cleanup")
                
            except Exception as e:
                failed_deletions += 1
                print(f"❌ Failed to delete {object_key}: {e}")
                log_operation("delete", "object", object_key, False, str(e))
            
            # Small delay to avoid overwhelming the API
            if i % 5 == 0:
                time.sleep(0.1)
        
        cleanup_duration = time.time() - cleanup_start_time
        
        print(f"\n📈 Object Deletion Summary:")
        print(f"   ✅ Deleted: {deleted_objects}/{len(objects_to_delete)}")
        print(f"   ❌ Failed: {failed_deletions}/{len(objects_to_delete)}")
        print(f"   ⏱️  Duration: {cleanup_duration:.2f} seconds")
        
    except Exception as e:
        print(f"❌ Failed to list objects for deletion: {e}")
else:
    print("\n⏭️  Skipping object deletion (cleanup cancelled)")

In [None]:
# Delete the bucket (only if user confirmed and objects were deleted)
if confirmation == 'yes':
    print(f"\n🪣 Deleting Bucket: {TUTORIAL_BUCKET}")
    print("=" * 30)
    
    try:
        # Verify bucket is empty
        remaining_objects = client.list_objects(TUTORIAL_BUCKET)
        
        if len(remaining_objects) == 0:
            # Safe to delete bucket
            result = client.delete_bucket(TUTORIAL_BUCKET)
            print(f"✅ Bucket '{TUTORIAL_BUCKET}' deleted successfully")
            log_operation("delete", "bucket", TUTORIAL_BUCKET, True, "Cleanup complete")
        else:
            print(f"⚠️  Bucket '{TUTORIAL_BUCKET}' still contains {len(remaining_objects)} objects")
            print(f"   Bucket will not be deleted for safety")
            log_operation("delete", "bucket", TUTORIAL_BUCKET, False, 
                         f"Still contains {len(remaining_objects)} objects")
            
    except Exception as e:
        print(f"❌ Failed to delete bucket '{TUTORIAL_BUCKET}': {e}")
        log_operation("delete", "bucket", TUTORIAL_BUCKET, False, str(e))
else:
    print("\n⏭️  Skipping bucket deletion (cleanup cancelled)")
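
The two cleanup cells can also be folded into one reusable function, which is convenient inside a `try`/`finally` block in scripts. The sketch below relies only on the call shapes used in this tutorial (`list_objects(bucket)`, `delete_object(bucket, key)`, `delete_bucket(bucket)`). `InMemoryClient` is a stand-in written for this example, not the real `APIClient`, and `empty_and_delete_bucket` is a hypothetical helper.

```python
from typing import Dict, List


class InMemoryClient:
    """Stand-in that mimics the call shapes used in this tutorial."""

    def __init__(self) -> None:
        self.buckets: Dict[str, Dict[str, bytes]] = {}

    def list_objects(self, bucket_name: str) -> List[dict]:
        return [{"key": k, "size": len(v)} for k, v in self.buckets[bucket_name].items()]

    def delete_object(self, bucket_name: str, object_key: str) -> None:
        del self.buckets[bucket_name][object_key]

    def delete_bucket(self, bucket_name: str) -> None:
        if self.buckets[bucket_name]:
            raise ValueError("bucket not empty")
        del self.buckets[bucket_name]


def empty_and_delete_bucket(client, bucket_name: str) -> int:
    """Delete every object, then the bucket; return the number of objects removed."""
    removed = 0
    for obj in client.list_objects(bucket_name):
        key = obj.get("key", obj.get("Key"))
        if key:
            client.delete_object(bucket_name, key)
            removed += 1
    # Re-check before deleting the bucket, as the cell above does
    remaining = client.list_objects(bucket_name)
    if remaining:
        raise RuntimeError(f"{len(remaining)} objects remain in {bucket_name}")
    client.delete_bucket(bucket_name)
    return removed


demo_client = InMemoryClient()
demo_client.buckets["tutorial-bucket-demo"] = {
    "data/sample.txt": b"sample",
    "logs/app.log": b"log line",
}
removed_count = empty_and_delete_bucket(demo_client, "tutorial-bucket-demo")
print(f"Removed {removed_count} objects; buckets left: {list(demo_client.buckets)}")
```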