# Amazon S3 Operations with Boto3

## Learning Objectives

By the end of this notebook, you will be able to:

1. Create, list, and delete S3 buckets
2. Upload and download files to/from S3
3. List and filter objects in a bucket
4. Generate presigned URLs for temporary access
5. Manage object metadata and tags

---

## 1. S3 Fundamentals

### What is Amazon S3?

Amazon Simple Storage Service (S3) is an object storage service that offers:
- **Durability**: Designed for 99.999999999% (11 nines) object durability
- **Availability**: Designed for 99.99% availability (S3 Standard storage class)
- **Scalability**: Virtually unlimited storage
- **Security**: Encryption, access policies, versioning

### S3 Concepts

| Concept | Description | Example |
|---------|-------------|---------|
| **Bucket** | Container for objects | `my-company-data` |
| **Object** | File + metadata | `images/logo.png` |
| **Key** | Unique object identifier | `folder/subfolder/file.txt` |
| **Region** | Geographic location | `us-east-1` |

### S3 URL Formats

```
Virtual-hosted style (preferred):
  https://bucket-name.s3.region.amazonaws.com/key
  https://my-bucket.s3.us-east-1.amazonaws.com/images/photo.jpg

Path style (legacy):
  https://s3.region.amazonaws.com/bucket-name/key
  https://s3.us-east-1.amazonaws.com/my-bucket/images/photo.jpg

S3 URI:
  s3://bucket-name/key
  s3://my-bucket/images/photo.jpg
```
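
To make the relationship between these formats concrete, here is a minimal sketch that splits an S3 URI into its bucket and key and rebuilds the virtual-hosted URL. The helper names `parse_s3_uri` and `virtual_hosted_url` are illustrative assumptions, not part of Boto3.

```python
from urllib.parse import quote


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    scheme = 's3://'
    if not uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the key
    bucket, _, key = uri[len(scheme):].partition('/')
    if not bucket:
        raise ValueError(f"Missing bucket name in: {uri!r}")
    return bucket, key


def virtual_hosted_url(bucket: str, key: str, region: str = 'us-east-1') -> str:
    """Build the virtual-hosted-style URL for an object. Hypothetical helper."""
    # quote() percent-encodes characters that are unsafe in a URL path
    # and leaves '/' untouched
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


bucket, key = parse_s3_uri('s3://my-bucket/images/photo.jpg')
print(bucket, key)
print(virtual_hosted_url(bucket, key))
```

The resulting URL only identifies the object; whether it can be fetched still depends on the bucket's access settings.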

In [None]:
# Setup: Import libraries and create clients
import boto3
from botocore.exceptions import ClientError, NoCredentialsError
import json
from datetime import datetime
import os
from typing import List, Dict, Optional, Any

# Create S3 client and resource
# Note: Creating a client does not contact AWS, so this succeeds even without
# credentials; later API calls raise NoCredentialsError instead
try:
    s3_client = boto3.client('s3', region_name='us-east-1')
    s3_resource = boto3.resource('s3', region_name='us-east-1')
    print("S3 client and resource created successfully")
except Exception as e:
    print(f"Could not create S3 client: {e}")
    print("Continuing with examples - they will show code patterns")

## 2. Bucket Operations

### Bucket Naming Rules

- 3-63 characters long
- Lowercase letters, numbers, hyphens, and periods only
- Must start and end with a letter or number
- Cannot contain two adjacent periods
- **Globally unique** across ALL AWS accounts
- Cannot be formatted as an IP address

In [None]:
import re

def validate_bucket_name(name: str) -> tuple[bool, str]:
    """
    Validate S3 bucket name according to AWS rules.
    
    Args:
        name: Proposed bucket name
        
    Returns:
        Tuple of (is_valid, message)
    """
    # Check length
    if len(name) < 3 or len(name) > 63:
        return False, "Name must be 3-63 characters long"
    
    # Check for lowercase and valid characters
    if not re.match(r'^[a-z0-9][a-z0-9.-]*[a-z0-9]$', name):
        return False, "Must start/end with letter or number, contain only lowercase, numbers, hyphens, periods"
    
    # Check for consecutive periods
    if '..' in name:
        return False, "Cannot contain consecutive periods"
    
    # Check for IP address format
    if re.match(r'^\d+\.\d+\.\d+\.\d+$', name):
        return False, "Cannot be formatted as IP address"
    
    return True, "Valid bucket name"

# Test bucket names
test_names = [
    "my-bucket",
    "My-Bucket",  # Invalid - uppercase
    "ab",          # Invalid - too short
    "my..bucket",  # Invalid - consecutive periods
    "192.168.1.1", # Invalid - IP address
    "my-company-data-2024",
]

print("Bucket Name Validation:")
print("=" * 50)
for name in test_names:
    is_valid, message = validate_bucket_name(name)
    status = "VALID" if is_valid else "INVALID"
    print(f"{name:25} -> {status}: {message}")

### Creating Buckets

In [None]:
def create_bucket(
    bucket_name: str,
    region: str = 'us-east-1',
    enable_versioning: bool = False
) -> Dict[str, Any]:
    """
    Create an S3 bucket with optional versioning.
    
    Args:
        bucket_name: Name of the bucket to create
        region: AWS region for the bucket
        enable_versioning: Whether to enable versioning
        
    Returns:
        Dictionary with creation result
    """
    s3 = boto3.client('s3', region_name=region)
    
    try:
        # Note: us-east-1 doesn't require LocationConstraint
        if region == 'us-east-1':
            response = s3.create_bucket(Bucket=bucket_name)
        else:
            response = s3.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={
                    'LocationConstraint': region
                }
            )
        
        # Enable versioning if requested
        if enable_versioning:
            s3.put_bucket_versioning(
                Bucket=bucket_name,
                VersioningConfiguration={'Status': 'Enabled'}
            )
        
        return {
            'success': True,
            'bucket_name': bucket_name,
            'region': region,
            'versioning': enable_versioning,
            'location': response.get('Location')
        }
        
    except ClientError as e:
        error_code = e.response['Error']['Code']
        error_message = e.response['Error']['Message']
        return {
            'success': False,
            'error_code': error_code,
            'error_message': error_message
        }

# Example usage (would require real AWS credentials)
print("Example: Creating a bucket")
print("-" * 40)
print("""
result = create_bucket(
    bucket_name='my-unique-bucket-12345',
    region='us-west-2',
    enable_versioning=True
)
print(result)

# Expected output:
# {
#     'success': True,
#     'bucket_name': 'my-unique-bucket-12345',
#     'region': 'us-west-2',
#     'versioning': True,
#     'location': '/my-unique-bucket-12345'
# }
""")

### Listing Buckets

In [None]:
def list_buckets(s3_client=None) -> List[Dict[str, Any]]:
    """
    List all S3 buckets in the account.
    
    Returns:
        List of bucket information dictionaries
    """
    if s3_client is None:
        s3_client = boto3.client('s3')
    
    try:
        response = s3_client.list_buckets()
        
        buckets = []
        for bucket in response['Buckets']:
            # Get bucket region
            try:
                location = s3_client.get_bucket_location(Bucket=bucket['Name'])
                region = location['LocationConstraint'] or 'us-east-1'
            except ClientError:
                region = 'unknown'
            
            buckets.append({
                'name': bucket['Name'],
                'created': bucket['CreationDate'].isoformat(),
                'region': region
            })
        
        return buckets
        
    except ClientError as e:
        print(f"Error listing buckets: {e}")
        return []

# Demonstration with mock data
print("Example: List buckets output")
print("-" * 40)
mock_buckets = [
    {'name': 'my-app-data', 'created': '2024-01-15T10:30:00', 'region': 'us-east-1'},
    {'name': 'my-app-backups', 'created': '2024-02-20T14:45:00', 'region': 'us-west-2'},
    {'name': 'my-app-logs', 'created': '2024-03-10T09:00:00', 'region': 'eu-west-1'},
]

for bucket in mock_buckets:
    print(f"Bucket: {bucket['name']}")
    print(f"  Region: {bucket['region']}")
    print(f"  Created: {bucket['created']}")
    print()
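
The `or 'us-east-1'` fallback in `list_buckets` exists because `get_bucket_location` reports a null `LocationConstraint` for buckets in `us-east-1`. Older buckets in `eu-west-1` can also report the legacy value `EU`. A small sketch of a normalizer that handles both cases follows; the function name is an assumption made for illustration.

```python
from typing import Optional


def region_from_location_constraint(constraint: Optional[str]) -> str:
    """Map a LocationConstraint value to a region name. Hypothetical helper."""
    if not constraint:
        # us-east-1 buckets report a null (None) constraint
        return 'us-east-1'
    if constraint == 'EU':
        # Legacy alias for eu-west-1
        return 'eu-west-1'
    return constraint


for value in (None, 'EU', 'us-west-2'):
    print(f"{value!r:12} -> {region_from_location_constraint(value)}")
```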

### Deleting Buckets

In [None]:
def delete_bucket(bucket_name: str, force: bool = False) -> Dict[str, Any]:
    """
    Delete an S3 bucket.
    
    Args:
        bucket_name: Name of the bucket to delete
        force: If True, delete all objects first
        
    Returns:
        Dictionary with deletion result
    """
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    
    try:
        if force:
            # Delete all objects (including versions)
            bucket.object_versions.delete()
            print(f"Deleted all objects from {bucket_name}")
        
        # Delete the bucket
        bucket.delete()
        
        return {
            'success': True,
            'bucket_name': bucket_name,
            'message': 'Bucket deleted successfully'
        }
        
    except ClientError as e:
        error_code = e.response['Error']['Code']
        
        if error_code == 'BucketNotEmpty':
            return {
                'success': False,
                'error_code': error_code,
                'message': 'Bucket is not empty. Use force=True to delete all objects first.'
            }
        
        return {
            'success': False,
            'error_code': error_code,
            'message': e.response['Error']['Message']
        }

print("Example: Deleting a bucket")
print("-" * 40)
print("""
# Delete empty bucket
result = delete_bucket('my-empty-bucket')

# Force delete (removes all objects first)
result = delete_bucket('my-bucket-with-data', force=True)

# WARNING: force=True will permanently delete all data!
""")

## 3. Object Operations

### Uploading Files

In [None]:
from pathlib import Path

def upload_file(
    file_path: str,
    bucket_name: str,
    object_key: Optional[str] = None,
    metadata: Optional[Dict[str, str]] = None,
    tags: Optional[Dict[str, str]] = None
) -> Dict[str, Any]:
    """
    Upload a file to S3.
    
    Args:
        file_path: Local path to the file
        bucket_name: Target bucket name
        object_key: S3 key (default: filename)
        metadata: Custom metadata dictionary
        tags: Tags dictionary
        
    Returns:
        Dictionary with upload result
    """
    s3 = boto3.client('s3')
    
    # Use filename as key if not specified
    if object_key is None:
        object_key = Path(file_path).name
    
    # Build extra args
    extra_args = {}
    if metadata:
        extra_args['Metadata'] = metadata
    if tags:
        # Tagging expects a URL-encoded query string, so encode keys and
        # values that may contain '&', '=', or spaces
        from urllib.parse import urlencode
        extra_args['Tagging'] = urlencode(tags)
    
    try:
        s3.upload_file(
            Filename=file_path,
            Bucket=bucket_name,
            Key=object_key,
            ExtraArgs=extra_args if extra_args else None
        )
        
        return {
            'success': True,
            'bucket': bucket_name,
            'key': object_key,
            's3_uri': f's3://{bucket_name}/{object_key}'
        }
        
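    except (ClientError, boto3.exceptions.S3UploadFailedError) as e:
        # upload_file re-raises transfer-time ClientErrors as S3UploadFailedError
        return {
            'success': False,
            'error': str(e)
        }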
    except FileNotFoundError:
        return {
            'success': False,
            'error': f'File not found: {file_path}'
        }

print("Example: Upload file")
print("-" * 40)
print("""
result = upload_file(
    file_path='./data/report.pdf',
    bucket_name='my-bucket',
    object_key='reports/2024/report.pdf',
    metadata={'author': 'John Doe', 'department': 'Finance'},
    tags={'project': 'quarterly-reports', 'year': '2024'}
)

# Output:
# {
#     'success': True,
#     'bucket': 'my-bucket',
#     'key': 'reports/2024/report.pdf',
#     's3_uri': 's3://my-bucket/reports/2024/report.pdf'
# }
""")

In [None]:
# Upload with progress callback
import sys
import threading

class UploadProgressCallback:
    """Callback class to track upload progress."""
    
    def __init__(self, filename: str):
        self._filename = filename
        self._size = os.path.getsize(filename)
        self._seen_so_far = 0
        self._lock = threading.Lock()
    
    def __call__(self, bytes_amount: int):
        with self._lock:
            self._seen_so_far += bytes_amount
            # Guard against zero-byte files to avoid ZeroDivisionError
            percentage = (self._seen_so_far / self._size) * 100 if self._size else 100.0
            sys.stdout.write(
                f"\r{self._filename}: {self._seen_so_far}/{self._size} bytes "
                f"({percentage:.1f}%)"
            )
            sys.stdout.flush()

def upload_with_progress(file_path: str, bucket: str, key: str):
    """
    Upload a file with progress tracking.
    
    Args:
        file_path: Local file path
        bucket: S3 bucket name
        key: S3 object key
    """
    s3 = boto3.client('s3')
    
    callback = UploadProgressCallback(file_path)
    s3.upload_file(file_path, bucket, key, Callback=callback)
    print()  # New line after progress

print("Upload with progress tracking:")
print("-" * 40)
print("""
upload_with_progress(
    file_path='./large_file.zip',
    bucket='my-bucket',
    key='uploads/large_file.zip'
)

# Output:
# ./large_file.zip: 52428800/104857600 bytes (50.0%)
""")

### Uploading Data Directly (Without Files)

In [None]:
import io

def upload_data(
    data: bytes | str,
    bucket_name: str,
    object_key: str,
    content_type: Optional[str] = None
) -> Dict[str, Any]:
    """
    Upload data directly to S3 without a local file.
    
    Args:
        data: Bytes or string data to upload
        bucket_name: Target bucket
        object_key: S3 key
        content_type: MIME type of the content
        
    Returns:
        Dictionary with upload result
    """
    s3 = boto3.client('s3')
    
    # Convert string to bytes if necessary
    if isinstance(data, str):
        data = data.encode('utf-8')
    
    try:
        extra_args = {}
        if content_type:
            extra_args['ContentType'] = content_type
        
        s3.put_object(
            Bucket=bucket_name,
            Key=object_key,
            Body=data,
            **extra_args
        )
        
        return {
            'success': True,
            'bucket': bucket_name,
            'key': object_key,
            'size': len(data)
        }
        
    except ClientError as e:
        return {'success': False, 'error': str(e)}

# Example: Upload JSON data
print("Example: Upload JSON directly")
print("-" * 40)
print("""
config = {
    'app_name': 'MyApp',
    'version': '1.0.0',
    'settings': {'debug': False, 'log_level': 'INFO'}
}

result = upload_data(
    data=json.dumps(config, indent=2),
    bucket_name='my-bucket',
    object_key='config/app_config.json',
    content_type='application/json'
)
""")

### Downloading Files

In [None]:
def download_file(
    bucket_name: str,
    object_key: str,
    local_path: str
) -> Dict[str, Any]:
    """
    Download a file from S3.
    
    Args:
        bucket_name: Source bucket
        object_key: S3 object key
        local_path: Local file path to save to
        
    Returns:
        Dictionary with download result
    """
    s3 = boto3.client('s3')
    
    # Create directory if needed
    Path(local_path).parent.mkdir(parents=True, exist_ok=True)
    
    try:
        s3.download_file(bucket_name, object_key, local_path)
        
        return {
            'success': True,
            'local_path': local_path,
            'size': os.path.getsize(local_path)
        }
        
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '404':
            return {'success': False, 'error': 'Object not found'}
        return {'success': False, 'error': str(e)}

def download_to_memory(bucket_name: str, object_key: str) -> bytes:
    """
    Download S3 object directly to memory.
    
    Args:
        bucket_name: Source bucket
        object_key: S3 object key
        
    Returns:
        Object content as bytes
    """
    s3 = boto3.client('s3')
    
    response = s3.get_object(Bucket=bucket_name, Key=object_key)
    return response['Body'].read()

print("Example: Download files")
print("-" * 40)
print("""
# Download to file
result = download_file(
    bucket_name='my-bucket',
    object_key='reports/2024/report.pdf',
    local_path='./downloads/report.pdf'
)

# Download to memory
data = download_to_memory('my-bucket', 'config/app_config.json')
config = json.loads(data)
""")

### Listing Objects

In [None]:
def list_objects(
    bucket_name: str,
    prefix: str = '',
    max_keys: int = 1000
) -> List[Dict[str, Any]]:
    """
    List objects in an S3 bucket.
    
    Args:
        bucket_name: Bucket to list
        prefix: Filter by prefix (folder path)
        max_keys: Maximum number of keys to return
        
    Returns:
        List of object information dictionaries
    """
    s3 = boto3.client('s3')
    objects = []
    
    paginator = s3.get_paginator('list_objects_v2')
    
    for page in paginator.paginate(
        Bucket=bucket_name,
        Prefix=prefix,
        PaginationConfig={'MaxItems': max_keys}
    ):
        for obj in page.get('Contents', []):
            objects.append({
                'key': obj['Key'],
                'size': obj['Size'],
                'last_modified': obj['LastModified'].isoformat(),
                'storage_class': obj.get('StorageClass', 'STANDARD')
            })
    
    return objects

# Mock example output
print("Example: List objects output")
print("-" * 40)
mock_objects = [
    {'key': 'data/users.json', 'size': 1024, 'last_modified': '2024-03-15T10:00:00', 'storage_class': 'STANDARD'},
    {'key': 'data/products.json', 'size': 2048, 'last_modified': '2024-03-15T11:00:00', 'storage_class': 'STANDARD'},
    {'key': 'backups/db_backup.sql', 'size': 5242880, 'last_modified': '2024-03-14T00:00:00', 'storage_class': 'GLACIER'},
]

for obj in mock_objects:
    size_kb = obj['size'] / 1024
    print(f"{obj['key']:30} {size_kb:>10.1f} KB  {obj['storage_class']}")
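
On the server side, `list_objects_v2` narrows results by prefix (and delimiter) only, so criteria such as file extension, size, or storage class have to be applied client-side. The sketch below filters dictionaries shaped like the ones `list_objects` returns. `filter_objects` and its parameters are hypothetical names, and the sample data mirrors the mock listing above.

```python
from typing import Any, Dict, Iterable, List, Optional


def filter_objects(
    objects: Iterable[Dict[str, Any]],
    suffix: Optional[str] = None,
    min_size: int = 0,
    storage_class: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Client-side filter over listing results. Hypothetical helper."""
    matches = []
    for obj in objects:
        if suffix is not None and not obj['key'].endswith(suffix):
            continue  # wrong extension
        if obj['size'] < min_size:
            continue  # too small
        if storage_class is not None and obj['storage_class'] != storage_class:
            continue  # different storage class
        matches.append(obj)
    return matches


sample = [
    {'key': 'data/users.json', 'size': 1024, 'storage_class': 'STANDARD'},
    {'key': 'data/products.json', 'size': 2048, 'storage_class': 'STANDARD'},
    {'key': 'backups/db_backup.sql', 'size': 5242880, 'storage_class': 'GLACIER'},
]

json_keys = [obj['key'] for obj in filter_objects(sample, suffix='.json')]
print(json_keys)
```

For large buckets, narrowing the listing with `prefix` first keeps the number of objects that must be filtered locally small.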

In [None]:
def list_folders(bucket_name: str, prefix: str = '') -> List[str]:
    """
    List 'folders' (common prefixes) in an S3 bucket.
    
    Args:
        bucket_name: Bucket to list
        prefix: Parent prefix to list folders under
        
    Returns:
        List of folder prefixes
    """
    s3 = boto3.client('s3')
    
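    # Paginate: a single list_objects_v2 call returns at most 1,000 entries
    paginator = s3.get_paginator('list_objects_v2')
    
    folders = []
    for page in paginator.paginate(
        Bucket=bucket_name,
        Prefix=prefix,
        Delimiter='/'  # This groups results by 'folder'
    ):
        for cp in page.get('CommonPrefixes', []):
            folders.append(cp['Prefix'])
    
    return folders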

print("Example: List folders")
print("-" * 40)
print("""
# Bucket structure:
# my-bucket/
#   data/
#     users.json
#     products.json
#   backups/
#     db_backup.sql
#   logs/
#     2024/
#       app.log

folders = list_folders('my-bucket')
# Result: ['backups/', 'data/', 'logs/']  (S3 returns prefixes in lexicographic order)

folders = list_folders('my-bucket', prefix='logs/')
# Result: ['logs/2024/']
""")

## 4. Presigned URLs

Presigned URLs allow temporary access to private S3 objects without sharing credentials.
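
A presigned URL signed with Signature Version 4 carries its signing time and lifetime in the query string, as `X-Amz-Date` and `X-Amz-Expires`. The sketch below reads those two parameters to work out when a URL stops being valid. It assumes a SigV4-style URL; `presigned_url_expiry` is a hypothetical helper, and the sample URL is made up with a placeholder signature.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a SigV4 presigned URL expires."""
    query = parse_qs(urlparse(url).query)
    if 'X-Amz-Date' not in query or 'X-Amz-Expires' not in query:
        raise ValueError("URL has no X-Amz-Date / X-Amz-Expires parameters")
    # X-Amz-Date uses the compact ISO 8601 form, e.g. 20240315T100000Z
    signed_at = datetime.strptime(
        query['X-Amz-Date'][0], '%Y%m%dT%H%M%SZ'
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query['X-Amz-Expires'][0]))
    return signed_at + lifetime


# Illustrative URL only: the signature is a placeholder
example_url = (
    'https://my-bucket.s3.amazonaws.com/reports/confidential.pdf'
    '?X-Amz-Algorithm=AWS4-HMAC-SHA256'
    '&X-Amz-Date=20240315T100000Z'
    '&X-Amz-Expires=3600'
    '&X-Amz-Signature=placeholder'
)
expires_at = presigned_url_expiry(example_url)
print(expires_at.isoformat())
print("Expired:", datetime.now(timezone.utc) >= expires_at)
```

This only inspects the URL; S3 itself enforces the expiry when the request arrives.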

In [None]:
def generate_presigned_url(
    bucket_name: str,
    object_key: str,
    expiration: int = 3600,
    operation: str = 'get_object'
) -> Optional[str]:
    """
    Generate a presigned URL for S3 operations.
    
    Args:
        bucket_name: S3 bucket name
        object_key: S3 object key
        expiration: URL expiration time in seconds (default: 1 hour)
        operation: S3 operation ('get_object' or 'put_object')
        
    Returns:
        Presigned URL or None if error
    """
    s3 = boto3.client('s3')
    
    try:
        url = s3.generate_presigned_url(
            operation,
            Params={
                'Bucket': bucket_name,
                'Key': object_key
            },
            ExpiresIn=expiration
        )
        return url
        
    except ClientError as e:
        print(f"Error generating presigned URL: {e}")
        return None

print("Example: Presigned URLs")
print("-" * 40)
print("""
# Generate download URL (valid for 1 hour)
download_url = generate_presigned_url(
    bucket_name='my-bucket',
    object_key='reports/confidential.pdf',
    expiration=3600  # 1 hour
)

# Result:
# https://my-bucket.s3.amazonaws.com/reports/confidential.pdf?
#   X-Amz-Algorithm=AWS4-HMAC-SHA256&
#   X-Amz-Credential=...&
#   X-Amz-Date=...&
#   X-Amz-Expires=3600&
#   X-Amz-Signature=...

# Users can download using this URL without AWS credentials
""")

In [None]:
def generate_presigned_upload_url(
    bucket_name: str,
    object_key: str,
    expiration: int = 3600,
    content_type: Optional[str] = None,
    max_size: Optional[int] = None
) -> Dict[str, Any]:
    """
    Generate a presigned POST URL for uploads with conditions.
    
    Args:
        bucket_name: S3 bucket name
        object_key: S3 object key
        expiration: URL expiration time in seconds
        content_type: Required content type
        max_size: Maximum file size in bytes
        
    Returns:
        Dictionary with URL and fields for form upload
    """
    s3 = boto3.client('s3')
    
    conditions = []
    fields = {}
    
    if content_type:
        conditions.append({'Content-Type': content_type})
        fields['Content-Type'] = content_type
    
    if max_size:
        conditions.append(['content-length-range', 1, max_size])
    
    try:
        response = s3.generate_presigned_post(
            Bucket=bucket_name,
            Key=object_key,
            Fields=fields,
            Conditions=conditions,
            ExpiresIn=expiration
        )
        return response
        
    except ClientError as e:
        return {'error': str(e)}

print("Example: Presigned upload URL")
print("-" * 40)
print("""
# Generate upload URL (for user uploads)
# (named post_data so it does not shadow the upload_data() function defined earlier)
post_data = generate_presigned_upload_url(
    bucket_name='my-bucket',
    object_key='user-uploads/${filename}',
    expiration=300,  # 5 minutes
    content_type='image/jpeg',
    max_size=5*1024*1024  # 5 MB
)

# Use in HTML form:
# <form action="{post_data['url']}" method="post" enctype="multipart/form-data">
#     {for field, value in post_data['fields'].items()}
#         <input type="hidden" name="{field}" value="{value}">
#     {endfor}
#     <input type="file" name="file">
#     <input type="submit" value="Upload">
# </form>
""")

## 5. Object Metadata and Tags
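
S3 limits object tags to 10 per object, with keys of up to 128 characters and values of up to 256 characters. Checking these limits locally gives a clearer error than a rejected API call. The following is a minimal sketch under those limits; `build_tag_set` is a hypothetical helper that returns the `TagSet` list shape expected by `put_object_tagging`, and it does not check the allowed character set.

```python
from typing import Dict, List

MAX_TAGS_PER_OBJECT = 10
MAX_TAG_KEY_LENGTH = 128
MAX_TAG_VALUE_LENGTH = 256


def build_tag_set(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Validate tags and convert them to a TagSet list. Hypothetical helper."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(
            f"Too many tags: {len(tags)} (limit is {MAX_TAGS_PER_OBJECT})"
        )
    for key, value in tags.items():
        if not 1 <= len(key) <= MAX_TAG_KEY_LENGTH:
            raise ValueError(f"Tag key length out of range: {key!r}")
        if len(value) > MAX_TAG_VALUE_LENGTH:
            raise ValueError(f"Tag value too long for key {key!r}")
    return [{'Key': key, 'Value': value} for key, value in tags.items()]


tag_set = build_tag_set({'environment': 'production', 'retention': '7-years'})
print(tag_set)
```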

In [None]:
def get_object_metadata(bucket_name: str, object_key: str) -> Dict[str, Any]:
    """
    Get metadata for an S3 object.
    
    Args:
        bucket_name: S3 bucket name
        object_key: S3 object key
        
    Returns:
        Dictionary with object metadata
    """
    s3 = boto3.client('s3')
    
    try:
        response = s3.head_object(Bucket=bucket_name, Key=object_key)
        
        return {
            'content_type': response.get('ContentType'),
            'content_length': response.get('ContentLength'),
            'last_modified': response.get('LastModified'),
            'etag': response.get('ETag'),
            'metadata': response.get('Metadata', {}),
            'storage_class': response.get('StorageClass', 'STANDARD')
        }
        
    except ClientError as e:
        return {'error': str(e)}

def set_object_tags(
    bucket_name: str,
    object_key: str,
    tags: Dict[str, str]
) -> bool:
    """
    Set tags on an S3 object.
    
    Args:
        bucket_name: S3 bucket name
        object_key: S3 object key
        tags: Dictionary of tags
        
    Returns:
        True if successful
    """
    s3 = boto3.client('s3')
    
    tag_set = [{'Key': k, 'Value': v} for k, v in tags.items()]
    
    try:
        s3.put_object_tagging(
            Bucket=bucket_name,
            Key=object_key,
            Tagging={'TagSet': tag_set}
        )
        return True
    except ClientError:
        return False

print("Example: Object metadata and tags")
print("-" * 40)
print("""
# Get metadata
metadata = get_object_metadata('my-bucket', 'data/users.json')
# {
#     'content_type': 'application/json',
#     'content_length': 1024,
#     'last_modified': datetime(2024, 3, 15, 10, 0, 0, tzinfo=tzutc()),
#     'etag': '"abc123..."',
#     'metadata': {'author': 'John', 'version': '1.0'},
#     'storage_class': 'STANDARD'
# }

# Set tags
set_object_tags(
    'my-bucket',
    'data/users.json',
    {'environment': 'production', 'data-type': 'pii', 'retention': '7-years'}
)
""")

---

## Exercises

### Exercise 1: S3 File Manager Class

Create a class that provides a simple interface for common S3 operations.

In [None]:
# Exercise 1: Your code here
from typing import List, Dict, Any, Optional

class S3FileManager:
    """Simple interface for S3 file operations."""
    
    def __init__(self, bucket_name: str, region: str = 'us-east-1'):
        """Initialize the file manager."""
        pass
    
    def upload(self, local_path: str, s3_path: str) -> bool:
        """Upload a file."""
        pass
    
    def download(self, s3_path: str, local_path: str) -> bool:
        """Download a file."""
        pass
    
    def list(self, prefix: str = '') -> List[str]:
        """List files with optional prefix."""
        pass
    
    def delete(self, s3_path: str) -> bool:
        """Delete a file."""
        pass
    
    def exists(self, s3_path: str) -> bool:
        """Check if a file exists."""
        pass

# Test your class
# manager = S3FileManager('my-bucket')
# manager.upload('./data.json', 'uploads/data.json')

<details>
<summary>Click to see solution</summary>

```python
import boto3
from botocore.exceptions import ClientError
from pathlib import Path
from typing import List, Dict, Any, Optional

class S3FileManager:
    """Simple interface for S3 file operations."""
    
    def __init__(self, bucket_name: str, region: str = 'us-east-1'):
        """
        Initialize the file manager.
        
        Args:
            bucket_name: S3 bucket to manage
            region: AWS region
        """
        self.bucket_name = bucket_name
        self.region = region
        self.client = boto3.client('s3', region_name=region)
        self.resource = boto3.resource('s3', region_name=region)
        self.bucket = self.resource.Bucket(bucket_name)
    
    def upload(self, local_path: str, s3_path: str) -> bool:
        """
        Upload a file to S3.
        
        Args:
            local_path: Local file path
            s3_path: S3 object key
            
        Returns:
            True if successful
        """
        try:
            self.client.upload_file(local_path, self.bucket_name, s3_path)
            return True
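        # upload_file re-raises transfer-time ClientErrors as S3UploadFailedError
        except (ClientError, boto3.exceptions.S3UploadFailedError, FileNotFoundError) as e: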
            print(f"Upload failed: {e}")
            return False
    
    def download(self, s3_path: str, local_path: str) -> bool:
        """
        Download a file from S3.
        
        Args:
            s3_path: S3 object key
            local_path: Local file path
            
        Returns:
            True if successful
        """
        try:
            # Create directory if needed
            Path(local_path).parent.mkdir(parents=True, exist_ok=True)
            self.client.download_file(self.bucket_name, s3_path, local_path)
            return True
        except ClientError as e:
            print(f"Download failed: {e}")
            return False
    
    def list(self, prefix: str = '') -> List[str]:
        """
        List files with optional prefix.
        
        Args:
            prefix: S3 prefix to filter by
            
        Returns:
            List of object keys
        """
        keys = []
        for obj in self.bucket.objects.filter(Prefix=prefix):
            keys.append(obj.key)
        return keys
    
    def delete(self, s3_path: str) -> bool:
        """
        Delete a file from S3.
        
        Args:
            s3_path: S3 object key
            
        Returns:
            True if successful
        """
        try:
            self.client.delete_object(Bucket=self.bucket_name, Key=s3_path)
            return True
        except ClientError as e:
            print(f"Delete failed: {e}")
            return False
    
    def exists(self, s3_path: str) -> bool:
        """
        Check if a file exists in S3.
        
        Args:
            s3_path: S3 object key
            
        Returns:
            True if exists
        """
        try:
            self.client.head_object(Bucket=self.bucket_name, Key=s3_path)
            return True
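        except ClientError as e:
            # head_object reports a missing key as error code '404';
            # anything else (e.g. '403') is a real failure, so re-raise it
            if e.response['Error']['Code'] == '404':
                return False
            raise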
    
    def get_url(self, s3_path: str, expiration: int = 3600) -> Optional[str]:
        """
        Get a presigned URL for a file.
        
        Args:
            s3_path: S3 object key
            expiration: URL expiration in seconds
            
        Returns:
            Presigned URL
        """
        try:
            return self.client.generate_presigned_url(
                'get_object',
                Params={'Bucket': self.bucket_name, 'Key': s3_path},
                ExpiresIn=expiration
            )
        except ClientError:
            return None

# Example usage
manager = S3FileManager('my-bucket', 'us-west-2')
print(f"Managing bucket: {manager.bucket_name}")
```
</details>

### Exercise 2: Sync Local Directory to S3

Create a function that syncs a local directory to an S3 prefix (like `aws s3 sync`).

In [None]:
# Exercise 2: Your code here
import hashlib

def sync_to_s3(
    local_dir: str,
    bucket_name: str,
    s3_prefix: str = '',
    delete: bool = False
) -> Dict[str, List[str]]:
    """
    Sync a local directory to S3.
    
    Args:
        local_dir: Local directory path
        bucket_name: S3 bucket name
        s3_prefix: S3 prefix (folder)
        delete: If True, delete S3 objects not in local
        
    Returns:
        Dictionary with 'uploaded', 'skipped', 'deleted' lists
    """
    # Your implementation here
    pass

# Test
# result = sync_to_s3('./my_data', 'my-bucket', 'data/')

<details>
<summary>Click to see solution</summary>

```python
import boto3
import hashlib
from pathlib import Path
from typing import Dict, List
from botocore.exceptions import ClientError

def sync_to_s3(
    local_dir: str,
    bucket_name: str,
    s3_prefix: str = '',
    delete: bool = False
) -> Dict[str, List[str]]:
    """
    Sync a local directory to S3.
    
    Args:
        local_dir: Local directory path
        bucket_name: S3 bucket name
        s3_prefix: S3 prefix (folder)
        delete: If True, delete S3 objects not in local
        
    Returns:
        Dictionary with 'uploaded', 'skipped', 'deleted' lists
    """
    s3 = boto3.client('s3')
    local_path = Path(local_dir)
    
    result = {
        'uploaded': [],
        'skipped': [],
        'deleted': []
    }
    
    def get_md5(file_path: str) -> str:
        """Calculate MD5 hash of a file."""
        hash_md5 = hashlib.md5()
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b''):
                hash_md5.update(chunk)
        return hash_md5.hexdigest()
    
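    def get_s3_etag(bucket: str, key: str) -> str:
        """
        Get ETag of S3 object ('' if it cannot be read).

        Note: the ETag equals the content MD5 only for single-part uploads
        that are not SSE-KMS or SSE-C encrypted. Multipart ETags never
        match, so such files are re-uploaded on every sync.
        """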
        try:
            response = s3.head_object(Bucket=bucket, Key=key)
            # Remove quotes from ETag
            return response['ETag'].strip('"')
        except ClientError:
            return ''
    
    # Get all local files
    local_files = {}
    for file_path in local_path.rglob('*'):
        if file_path.is_file():
            relative_path = file_path.relative_to(local_path)
            # as_posix() yields forward slashes on every OS without altering the prefix
            s3_key = f"{s3_prefix}{relative_path.as_posix()}"
            local_files[s3_key] = str(file_path)
    
    # Upload new or changed files
    for s3_key, file_path in local_files.items():
        local_md5 = get_md5(file_path)
        s3_etag = get_s3_etag(bucket_name, s3_key)
        
        if local_md5 != s3_etag:
            s3.upload_file(file_path, bucket_name, s3_key)
            result['uploaded'].append(s3_key)
        else:
            result['skipped'].append(s3_key)
    
    # Delete files in S3 that don't exist locally
    if delete:
        paginator = s3.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket_name, Prefix=s3_prefix):
            for obj in page.get('Contents', []):
                if obj['Key'] not in local_files:
                    s3.delete_object(Bucket=bucket_name, Key=obj['Key'])
                    result['deleted'].append(obj['Key'])
    
    return result

# Example usage
print("Syncing ./my_data to s3://my-bucket/data/")
# result = sync_to_s3('./my_data', 'my-bucket', 'data/', delete=True)
# print(f"Uploaded: {len(result['uploaded'])} files")
# print(f"Skipped: {len(result['skipped'])} files")
# print(f"Deleted: {len(result['deleted'])} files")
```
</details>
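
The sync solution above compares a local MD5 with the object's ETag. That comparison holds only for single-part uploads without SSE-KMS. For multipart uploads, the ETag is commonly observed to be the MD5 of the concatenated per-part MD5 digests followed by `-<part count>`. This layout is an observed behavior rather than a documented contract, so the sketch below is an assumption-laden estimate. `local_etag`, `DEFAULT_CHUNK_SIZE`, and the `chunk_size` parameter are hypothetical names, and the chunk size must match the part size the uploader actually used.

```python
import hashlib

# Assumption: boto3's default multipart threshold and part size are both 8 MB.
DEFAULT_CHUNK_SIZE = 8 * 1024 * 1024


def local_etag(file_path: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> str:
    """Estimate the ETag S3 would report for a file sent with upload_file.

    Files smaller than chunk_size are treated as single-part uploads (plain
    MD5). Larger files are treated as multipart uploads: MD5 of the
    concatenated part digests, then "-<number of parts>".
    """
    whole_md5 = hashlib.md5()
    part_digests = []
    total_size = 0

    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            whole_md5.update(chunk)
            part_digests.append(hashlib.md5(chunk).digest())
            total_size += len(chunk)

    if total_size < chunk_size:
        # Single-part upload: the ETag is the MD5 of the whole file
        return whole_md5.hexdigest()

    # Multipart upload: hash the binary part digests, append the part count
    combined = hashlib.md5(b''.join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Swapping `get_md5` for a helper like this lets the sync skip unchanged large files, provided the objects were uploaded with the same part size and without SSE-KMS.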

### Exercise 3: Batch Delete with Prefix

Create a function to efficiently delete all objects matching a prefix.

In [None]:
# Exercise 3: Your code here

def batch_delete(bucket_name: str, prefix: str, dry_run: bool = True) -> Dict[str, Any]:
    """
    Delete all objects matching a prefix.
    
    Args:
        bucket_name: S3 bucket
        prefix: Prefix to match
        dry_run: If True, only list what would be deleted
        
    Returns:
        Dictionary with deletion results
    """
    # Your implementation here
    pass

# Test
# result = batch_delete('my-bucket', 'old-data/', dry_run=True)

<details>
<summary>Click to see solution</summary>

```python
import boto3
from typing import Dict, Any
from botocore.exceptions import ClientError

def batch_delete(bucket_name: str, prefix: str, dry_run: bool = True) -> Dict[str, Any]:
    """
    Delete all objects matching a prefix.
    
    Args:
        bucket_name: S3 bucket
        prefix: Prefix to match
        dry_run: If True, only list what would be deleted
        
    Returns:
        Dictionary with deletion results
    """
    s3 = boto3.client('s3')
    
    result = {
        'dry_run': dry_run,
        'bucket': bucket_name,
        'prefix': prefix,
        'objects_found': 0,
        'objects_deleted': 0,
        'errors': []
    }
    
    # Collect objects to delete
    objects_to_delete = []
    paginator = s3.get_paginator('list_objects_v2')
    
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get('Contents', []):
            objects_to_delete.append({'Key': obj['Key']})
    
    result['objects_found'] = len(objects_to_delete)
    
    if dry_run:
        print(f"DRY RUN: Would delete {len(objects_to_delete)} objects")
        for obj in objects_to_delete[:10]:  # Show first 10
            print(f"  - {obj['Key']}")
        if len(objects_to_delete) > 10:
            print(f"  ... and {len(objects_to_delete) - 10} more")
        return result
    
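    # Delete in batches of 1000 (the per-request limit of delete_objects).
    # On a versioning-enabled bucket this adds delete markers; earlier versions remain.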
    batch_size = 1000
    
    for i in range(0, len(objects_to_delete), batch_size):
        batch = objects_to_delete[i:i + batch_size]
        
        try:
            response = s3.delete_objects(
                Bucket=bucket_name,
                Delete={'Objects': batch}
            )
            
            result['objects_deleted'] += len(response.get('Deleted', []))
            
            for error in response.get('Errors', []):
                result['errors'].append({
                    'key': error['Key'],
                    'code': error['Code'],
                    'message': error['Message']
                })
                
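        except ClientError as e:
            # The whole request failed; record it in the same shape as per-key errors
            result['errors'].append({
                'key': None,
                'code': e.response['Error']['Code'],
                'message': str(e)
            })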
    
    return result

# Example
print("Batch delete example:")
# result = batch_delete('my-bucket', 'old-data/', dry_run=True)
# print(f"Found: {result['objects_found']} objects")
```
</details>
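
The solution above collects every matching key in memory before deleting. For very large prefixes, a streaming variant may be preferable. The sketch below groups keys into batches as the pages arrive and deletes each batch immediately. `key_batches` and `delete_prefix_streaming` are hypothetical helpers, and the client is passed in as a parameter so the logic can be exercised with a stub. It assumes that deleting keys already listed does not disturb later pages.

```python
from typing import Any, Dict, Iterable, Iterator, List


def key_batches(
    pages: Iterable[Dict[str, Any]],
    batch_size: int = 1000,
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of {'Key': ...} dicts, at most batch_size long each."""
    batch: List[Dict[str, str]] = []
    for page in pages:
        # Pages with no matching objects have no 'Contents' entry
        for obj in page.get('Contents', []):
            batch.append({'Key': obj['Key']})
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


def delete_prefix_streaming(s3_client: Any, bucket_name: str, prefix: str) -> int:
    """Delete objects under a prefix batch by batch; return the count deleted."""
    paginator = s3_client.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    deleted = 0
    for batch in key_batches(pages):
        response = s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={'Objects': batch},
        )
        deleted += len(response.get('Deleted', []))
    return deleted
```

With a real client this would be called as `delete_prefix_streaming(boto3.client('s3'), 'my-bucket', 'old-data/')`. Unlike the exercise solution, this sketch has no dry-run mode and does not collect per-key errors.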

### Exercise 4: Generate Report of Bucket Usage

Create a function that generates a usage report for a bucket.

In [None]:
# Exercise 4: Your code here

def generate_bucket_report(bucket_name: str) -> Dict[str, Any]:
    """
    Generate a usage report for an S3 bucket.
    
    Returns:
        Dictionary with:
        - total_objects: count
        - total_size: bytes
        - by_prefix: breakdown by top-level prefix
        - by_extension: breakdown by file extension
        - by_storage_class: breakdown by storage class
    """
    # Your implementation here
    pass

<details>
<summary>Click to see solution</summary>

```python
import boto3
from collections import defaultdict
from typing import Dict, Any
from pathlib import Path

def generate_bucket_report(bucket_name: str) -> Dict[str, Any]:
    """
    Generate a usage report for an S3 bucket.
    
    Returns:
        Dictionary with detailed bucket statistics
    """
    s3 = boto3.client('s3')
    
    report = {
        'bucket_name': bucket_name,
        'total_objects': 0,
        'total_size_bytes': 0,
        'by_prefix': defaultdict(lambda: {'count': 0, 'size': 0}),
        'by_extension': defaultdict(lambda: {'count': 0, 'size': 0}),
        'by_storage_class': defaultdict(lambda: {'count': 0, 'size': 0}),
    }
    
    paginator = s3.get_paginator('list_objects_v2')
    
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            key = obj['Key']
            size = obj['Size']
            storage_class = obj.get('StorageClass', 'STANDARD')
            
            report['total_objects'] += 1
            report['total_size_bytes'] += size
            
            # By prefix (top-level folder)
            prefix = key.split('/')[0] if '/' in key else '(root)'
            report['by_prefix'][prefix]['count'] += 1
            report['by_prefix'][prefix]['size'] += size
            
            # By extension
            ext = Path(key).suffix.lower() or '(no extension)'
            report['by_extension'][ext]['count'] += 1
            report['by_extension'][ext]['size'] += size
            
            # By storage class
            report['by_storage_class'][storage_class]['count'] += 1
            report['by_storage_class'][storage_class]['size'] += size
    
    # Convert defaultdicts to regular dicts
    report['by_prefix'] = dict(report['by_prefix'])
    report['by_extension'] = dict(report['by_extension'])
    report['by_storage_class'] = dict(report['by_storage_class'])
    
    # Add human-readable size
    def human_size(size_bytes):
        for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
            if size_bytes < 1024.0:
                return f"{size_bytes:.2f} {unit}"
            size_bytes /= 1024.0
        return f"{size_bytes:.2f} PB"
    
    report['total_size_human'] = human_size(report['total_size_bytes'])
    
    return report

def print_report(report: Dict[str, Any]):
    """Print a formatted bucket report."""
    print(f"\nBucket Report: {report['bucket_name']}")
    print("=" * 50)
    print(f"Total Objects: {report['total_objects']:,}")
    print(f"Total Size: {report['total_size_human']}")
    
    print("\nBy Prefix:")
    for prefix, stats in sorted(report['by_prefix'].items(), 
                                 key=lambda x: x[1]['size'], reverse=True)[:5]:
        print(f"  {prefix:20} {stats['count']:>8,} files  {stats['size']:>15,} bytes")
    
    print("\nBy Extension:")
    for ext, stats in sorted(report['by_extension'].items(), 
                              key=lambda x: x[1]['size'], reverse=True)[:5]:
        print(f"  {ext:20} {stats['count']:>8,} files  {stats['size']:>15,} bytes")
    
    print("\nBy Storage Class:")
    for storage, stats in report['by_storage_class'].items():
        print(f"  {storage:20} {stats['count']:>8,} files  {stats['size']:>15,} bytes")

# Example
# report = generate_bucket_report('my-bucket')
# print_report(report)
```
</details>
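
The grouping rules used in the report can be checked without touching S3. The sketch below isolates them in a hypothetical helper, `classify_key`, and runs it on a few sample keys. It uses `PurePosixPath` on the assumption that S3 keys always use `/` as the separator, regardless of the local operating system.

```python
from pathlib import PurePosixPath
from typing import Tuple


def classify_key(key: str) -> Tuple[str, str]:
    """Return (top_level_prefix, extension) for an S3 key."""
    # Keys without a '/' sit at the bucket root
    prefix = key.split('/', 1)[0] if '/' in key else '(root)'
    # Only the last suffix counts, lowercased so '.PNG' and '.png' group together
    ext = PurePosixPath(key).suffix.lower() or '(no extension)'
    return prefix, ext


for sample in ['images/logo.PNG', 'README', 'backups/2024/db.tar.gz', 'logs/']:
    print(sample, '->', classify_key(sample))
```

Two edge cases are worth knowing when reading a report: `db.tar.gz` is counted under `.gz`, and a zero-byte "folder" marker such as `logs/` is counted as an object with no extension.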

### Exercise 5: Implement Copy Between Buckets

Create a function to copy objects between buckets, optionally changing storage class.

In [None]:
# Exercise 5: Your code here

def copy_between_buckets(
    source_bucket: str,
    source_prefix: str,
    dest_bucket: str,
    dest_prefix: str,
    storage_class: str = 'STANDARD'
) -> Dict[str, Any]:
    """
    Copy objects between S3 buckets.
    
    Args:
        source_bucket: Source bucket name
        source_prefix: Source prefix
        dest_bucket: Destination bucket name
        dest_prefix: Destination prefix
        storage_class: Storage class for copied objects
        
    Returns:
        Dictionary with copy results
    """
    # Your implementation here
    pass

<details>
<summary>Click to see solution</summary>

```python
import boto3
from typing import Dict, Any
from botocore.exceptions import ClientError

def copy_between_buckets(
    source_bucket: str,
    source_prefix: str,
    dest_bucket: str,
    dest_prefix: str,
    storage_class: str = 'STANDARD'
) -> Dict[str, Any]:
    """
    Copy objects between S3 buckets.
    
    Args:
        source_bucket: Source bucket name
        source_prefix: Source prefix
        dest_bucket: Destination bucket name
        dest_prefix: Destination prefix
        storage_class: Storage class for copied objects
            Options: STANDARD, STANDARD_IA, ONEZONE_IA, 
                     INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
        
    Returns:
        Dictionary with copy results
    """
    s3 = boto3.client('s3')
    
    result = {
        'copied': [],
        'failed': [],
        'total_size': 0
    }
    
    # List source objects
    paginator = s3.get_paginator('list_objects_v2')
    
    for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
        for obj in page.get('Contents', []):
            source_key = obj['Key']
            
            # Calculate destination key
            relative_key = source_key[len(source_prefix):]
            dest_key = f"{dest_prefix}{relative_key}"
            
            try:
                # Copy object. copy_object handles sources up to 5 GB;
                # larger objects need a multipart copy.
                copy_source = {
                    'Bucket': source_bucket,
                    'Key': source_key
                }
                
                s3.copy_object(
                    CopySource=copy_source,
                    Bucket=dest_bucket,
                    Key=dest_key,
                    StorageClass=storage_class
                )
                
                result['copied'].append({
                    'source': f"{source_bucket}/{source_key}",
                    'destination': f"{dest_bucket}/{dest_key}",
                    'size': obj['Size']
                })
                result['total_size'] += obj['Size']
                
            except ClientError as e:
                result['failed'].append({
                    'source': f"{source_bucket}/{source_key}",
                    'error': str(e)
                })
    
    return result

# Example usage
print("Copy between buckets example:")
# result = copy_between_buckets(
#     source_bucket='production-data',
#     source_prefix='2024/01/',
#     dest_bucket='backup-bucket',
#     dest_prefix='archive/2024/01/',
#     storage_class='GLACIER'
# )
# print(f"Copied: {len(result['copied'])} objects")
# print(f"Failed: {len(result['failed'])} objects")
# print(f"Total size: {result['total_size']} bytes")
```
</details>
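
A single `copy_object` request is limited to source objects of up to 5 GB. For larger objects, boto3's managed `copy` method on the client performs a multipart copy when needed. The sketch below shows how the inner copy step might look with that method. `destination_key` and `copy_one_object` are hypothetical helpers, and the client is passed in so the code can be tried against a stub.

```python
from typing import Any


def destination_key(source_key: str, source_prefix: str, dest_prefix: str) -> str:
    """Map a source key to its destination key by swapping the prefix."""
    if not source_key.startswith(source_prefix):
        raise ValueError(f"{source_key!r} does not start with {source_prefix!r}")
    return dest_prefix + source_key[len(source_prefix):]


def copy_one_object(
    s3_client: Any,
    source_bucket: str,
    source_key: str,
    dest_bucket: str,
    dest_key: str,
    storage_class: str = 'STANDARD',
) -> None:
    """Copy one object using the managed transfer, which handles large objects."""
    s3_client.copy(
        CopySource={'Bucket': source_bucket, 'Key': source_key},
        Bucket=dest_bucket,
        Key=dest_key,
        # ExtraArgs carries per-object settings such as the storage class
        ExtraArgs={'StorageClass': storage_class},
    )
```

In the exercise solution, the `s3.copy_object(...)` call could be replaced by `copy_one_object(s3, source_bucket, source_key, dest_bucket, dest_key, storage_class)`. The trade-off is that the managed copy may issue several requests per object.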

---

## Summary

In this notebook, we covered:

1. **S3 Fundamentals**
   - Buckets, objects, and keys
   - URL formats and naming rules

2. **Bucket Operations**
   - Creating buckets with versioning
   - Listing and deleting buckets

3. **Object Operations**
   - Uploading files and data
   - Downloading to file or memory
   - Listing and filtering objects

4. **Presigned URLs**
   - Temporary download URLs
   - Secure upload URLs

5. **Metadata and Tags**
   - Custom metadata
   - Object tagging for management

### Key Takeaways

- Use **presigned URLs** for secure temporary access
- **Paginate** when listing large buckets
- Use **batch operations** for deleting multiple objects
- Consider **storage classes** for cost optimization
- Always **validate bucket names** before creation

### AWS Cost Warning

> **S3 Pricing**: You pay for:
> - Storage (per GB/month)
> - Requests (PUT, GET, LIST, etc.)
> - Data transfer OUT
> 
> **Tips**:
> - Use lifecycle policies to transition to cheaper storage
> - Delete unused objects
> - Use Intelligent-Tiering for unpredictable access patterns
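
As an illustration of the lifecycle tip above, the sketch below builds a rule that moves objects under a prefix to cheaper storage classes and eventually expires them. `build_lifecycle_config` and `apply_lifecycle_config` are hypothetical helpers, and the day counts are example values rather than recommendations. The client is passed in as a parameter, so nothing is sent to AWS until `apply_lifecycle_config` is called with a real client.

```python
from typing import Any, Dict


def build_lifecycle_config(
    prefix: str,
    rule_id: str = 'tier-and-expire',
    ia_after_days: int = 30,
    glacier_after_days: int = 90,
    expire_after_days: int = 365,
) -> Dict[str, Any]:
    """Build a lifecycle configuration with one tiering-and-expiry rule."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day thresholds must be strictly increasing")

    return {
        'Rules': [
            {
                'ID': rule_id,
                'Filter': {'Prefix': prefix},
                'Status': 'Enabled',
                'Transitions': [
                    {'Days': ia_after_days, 'StorageClass': 'STANDARD_IA'},
                    {'Days': glacier_after_days, 'StorageClass': 'GLACIER'},
                ],
                'Expiration': {'Days': expire_after_days},
            }
        ]
    }


def apply_lifecycle_config(s3_client: Any, bucket_name: str, config: Dict[str, Any]) -> None:
    """Apply the configuration. This replaces any existing lifecycle rules."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=config,
    )
```

A call such as `apply_lifecycle_config(boto3.client('s3'), 'my-bucket', build_lifecycle_config('logs/'))` would apply the rule. Because the call overwrites the bucket's current lifecycle configuration, existing rules need to be merged into `config` first if they should be kept.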

---

## Next Steps

Continue to [03_other_services.ipynb](03_other_services.ipynb) to learn about other AWS services:
- DynamoDB for NoSQL databases
- Lambda for serverless computing
- SQS and SNS for messaging