# AWS S3 Operations - Comprehensive Guide

This notebook provides a complete walkthrough of AWS S3 operations using the boto3 SDK.

## Prerequisites - Manual AWS Setup

Before running this notebook, the following setup was completed manually in AWS Console:

### Step 1: S3 Bucket Creation
- Logged into AWS account
- Created S3 bucket named **"real-learn-s3"** in **us-east-1** region
- Configuration:
  - Public access: **BLOCKED** (security best practice)
  - Versioning: **ENABLED** (maintains file history)
- Uploaded initial test file: **bookings.csv**

### Step 2: IAM User Configuration
- Created IAM user with **programmatic access** (no console access needed)
- Attached policy: **AmazonS3FullAccess** (full S3 permissions)
- Generated access credentials:
  - Access Key ID
  - Secret Access Key
- Saved credentials securely in `.env` file (never commit to git)

### Step 3: Environment File
Created `.env` file with the following variables:
```
AWS_BUCKET_NAME=real-learn-s3
AWS_ACCESS_KEY=your-access-key-id
AWS_SECRET_KEY=your-secret-access-key
AWS_REGION=us-east-1
```

### Step 4: Python Environment
- Installed boto3 library for AWS SDK
- Installed python-dotenv for secure credential loading
- Created initial Python script to verify S3 connection and list objects

Now we're ready to explore S3 operations programmatically.

---

## What You'll Learn

1. **S3 Fundamentals** - Understanding buckets and objects
2. **Bucket Operations** - List, create, and manage buckets
3. **Object Operations** - Upload, download, list, and delete files
4. **Advanced Features** - Metadata, presigned URLs, progress tracking
5. **Error Handling** - Robust exception management
6. **Best Practices** - Security and performance optimization

## S3 Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│  AWS S3 STRUCTURE                                       │
│                                                         │
│  Your AWS Account                                       │
│  └─► Buckets (Globally unique names)                    │
│      ├─► Bucket 1: my-app-data-2024                     │
│      │   ├─► Object: file1.txt                          │
│      │   ├─► Object: images/photo.jpg                   │
│      │   └─► Object: data/2024/records.csv              │
│      │                                                  │
│      └─► Bucket 2: backups-prod                         │
│          ├─► Object: backup-2024-01.tar.gz              │
│          └─► Object: logs/app.log                       │
│                                                         │
│  Key Concepts:                                          │
│  • Bucket = Container (like a root folder)              │
│  • Object = File with metadata                          │
│  • Key = Full path/name (e.g., "folder/file.txt")       │
│  • Region = Geographic location                         │
│  • Storage Class = Cost/access tier                     │
└─────────────────────────────────────────────────────────┘
```

In [18]:
import os

import boto3
from dotenv import load_dotenv

load_dotenv()  # read the variables in .env into the process environment

## 1. Environment Setup

Loading AWS credentials securely from environment variables using `.env` file.

**Security Best Practice:** Never hardcode credentials in your code!
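
A missing variable otherwise only surfaces later as a confusing client error, so it can help to check for all of them up front. The sketch below is one way to do that, assuming the variable names from this notebook's `.env` file; `REQUIRED_VARS` and `load_settings` are hypothetical names, not part of boto3 or python-dotenv.

```python
import os

# Variable names follow the .env layout used in this notebook.
REQUIRED_VARS = ("AWS_BUCKET_NAME", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", "AWS_REGION")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Report names only, never values, so secrets do not end up in logs.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` right after `load_dotenv()` fails fast with the names of the missing variables instead of a later authentication error.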

In [19]:
BUCKET_NAME = os.getenv("AWS_BUCKET_NAME")    
ACCESS_KEY = os.getenv("AWS_ACCESS_KEY")
SECRET_KEY = os.getenv("AWS_SECRET_KEY")
AWS_REGION = os.getenv("AWS_REGION")

## 2. AWS Credentials Configuration

Required environment variables in your `.env` file:
```
AWS_BUCKET_NAME=your-bucket-name
AWS_ACCESS_KEY=your-access-key-id
AWS_SECRET_KEY=your-secret-access-key
AWS_REGION=us-east-1
```

## 3. Initialize S3 Client

boto3 provides two interfaces:
- **Client**: Low-level service access (more control)
- **Resource**: Higher-level object-oriented interface (easier to use)

We'll use the **client** interface for maximum flexibility.

In [20]:
# create an s3 client
from botocore.config import Config

# Configure S3 client with Signature Version 4 and regional endpoint
# Regional endpoint is required for presigned URLs to work correctly

s3_client = boto3.client('s3', 
                         endpoint_url=f'https://s3.{AWS_REGION}.amazonaws.com',
                         config=Config(signature_version='s3v4'),
                         region_name=AWS_REGION,
                         aws_secret_access_key=SECRET_KEY,
                         aws_access_key_id=ACCESS_KEY,
                         )

## 4. Bucket Operations

### Operation Flow
```
┌─────────────────────────────────────────┐
│  BUCKET LIFECYCLE                       │
│                                         │
│  1. CREATE → Bucket exists in S3        │
│  2. LIST → View all your buckets        │
│  3. CONFIGURE → Set permissions/policies│
│  4. USE → Upload/download objects       │
│  5. DELETE → Remove bucket (if empty)   │
└─────────────────────────────────────────┘
```

## 5. Create Bucket

Create a new S3 bucket in your account.

**Important Rules:**
- Bucket names must be globally unique across ALL AWS accounts
- Only lowercase letters, numbers, hyphens, and dots allowed
- Must be 3-63 characters long
- Must begin and end with a letter or number
- Region matters for latency and compliance

**Regions:**
- `us-east-1` is the default and doesn't require LocationConstraint
- Other regions need explicit LocationConstraint configuration
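
Checking a candidate name locally avoids a round trip to AWS for obviously invalid names. The sketch below covers only the length, character set, and first/last character rules, and also rejects consecutive dots. S3 enforces further rules (for example, names formatted like IP addresses are rejected) that are not covered here. `is_valid_bucket_name` is a hypothetical helper, not a boto3 function.

```python
import re

# Lowercase letters, digits, dots and hyphens; first and last character must be
# a letter or digit; 3 to 63 characters overall.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Check a name against a subset of the S3 bucket naming rules."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    # S3 also rejects two adjacent dots.
    return ".." not in name


print(is_valid_bucket_name("real-learn-s3"))  # True
print(is_valid_bucket_name("Real_Learn_S3"))  # False
```

Passing this check does not guarantee that the name is available, since uniqueness can only be confirmed by the create call itself.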

In [21]:
from botocore.exceptions import ClientError

def create_bucket(bucket_name, region=None):
    """
    Create an S3 bucket in a specified region
    
    Args:
        bucket_name (str): Name for the bucket (must be globally unique)
        region (str): AWS region (if None, no LocationConstraint is sent, so the client must target us-east-1)
    
    Returns:
        bool: True if bucket created, False otherwise
    """
    try:
        if region is None or region == 'us-east-1':
            # us-east-1 doesn't require LocationConstraint
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            # Other regions require LocationConstraint
            s3_client.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={'LocationConstraint': region}
            )
        print(f"SUCCESS: Bucket '{bucket_name}' created successfully")
        return True
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == 'BucketAlreadyExists':
            print(f"ERROR: Bucket '{bucket_name}' already exists (owned by someone else)")
        elif error_code == 'BucketAlreadyOwnedByYou':
            print(f"INFO: Bucket '{bucket_name}' already exists and is owned by you")
        else:
            print(f"ERROR: Failed to create bucket - {e}")
        return False

# Example usage:
create_bucket(f"{BUCKET_NAME}-test", region=AWS_REGION)

INFO: Bucket 'real-learn-s3-test' already exists and is owned by you


False

## 6. Upload File to S3

Upload files from your local filesystem to an S3 bucket.

**Key Parameters:**
- `file_name`: Path to local file
- `bucket`: Target S3 bucket name
- `object_name`: Key (path/name) in S3 (if None, uses the base name of `file_name`, without its directory)

**Use Cases:**
- Backup files to cloud storage
- Store application data
- Host static website content
- Archive logs and reports

**File Types:** Any file type supported (images, videos, documents, code, etc.)
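
Because S3 keys are plain strings with `/` as a conventional separator, the choice of default key decides whether the local folder structure is kept. The sketch below contrasts the two options; `default_object_key` and its `keep_path` flag are hypothetical names used only for illustration.

```python
import posixpath


def default_object_key(file_name, keep_path=False):
    """Derive an S3 key from a local path.

    keep_path=False keeps only the file name; keep_path=True keeps the relative path.
    """
    # S3 keys use forward slashes, so normalize Windows-style separators first.
    normalized = file_name.replace("\\", "/")
    if not keep_path:
        return posixpath.basename(normalized)
    # normpath drops "./" and duplicate slashes; lstrip removes any leading "/".
    return posixpath.normpath(normalized).lstrip("/")


print(default_object_key("data/hosts.csv"))                    # hosts.csv
print(default_object_key("./data/hosts.csv", keep_path=True))  # data/hosts.csv
```

With the base-name default, `data/hosts.csv` and `backup/hosts.csv` would both map to the key `hosts.csv`, so passing an explicit key is safer when file names can repeat.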

In [22]:
def upload_file(file_name, bucket, object_name=None):
    """
    Upload a file to an S3 bucket
    
    Args:
        file_name (str): Path to file to upload
        bucket (str): Bucket name
        object_name (str): S3 object name (if None, uses the base name of file_name)
    
    Returns:
        bool: True if upload successful, False otherwise
    """
    # If the S3 object_name is not specified, use the file's base name
    if object_name is None:
        object_name = os.path.basename(file_name)
    
    try:
        s3_client.upload_file(file_name, bucket, object_name)
        print(f"SUCCESS: '{file_name}' uploaded to '{bucket}/{object_name}'")
        return True
    except FileNotFoundError:
        print(f"ERROR: File '{file_name}' not found")
        return False
    except ClientError as e:
        print(f"ERROR: Failed to upload file - {e}")
        return False

# Example usage:
upload_file('data/hosts.csv', BUCKET_NAME, 'data/hosts.csv')

SUCCESS: 'data/hosts.csv' uploaded to 'real-learn-s3/data/hosts.csv'


True

## 7. Download File from S3

Download objects from S3 to your local filesystem.

**Key Parameters:**
- `bucket`: Source S3 bucket name
- `object_name`: Key (path/name) in S3
- `file_name`: Destination path on local filesystem

**Important Notes:**
- Creates the destination's parent directory automatically if it doesn't exist
- Overwrites existing local files without warning
- Preserves file content but not S3 metadata

**Common Pattern:**
```python
# Download backup
download_file('my-bucket', 'backups/db-2024.sql', './local-backups/db.sql')

# Download with same name
download_file('my-bucket', 'report.pdf', 'report.pdf')
```

In [23]:
def download_file(bucket, object_name, file_name):
    """
    Download a file from an S3 bucket
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name to download
        file_name (str): Local file path to save
    
    Returns:
        bool: True if download successful, False otherwise
    """
    try:
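        # Create the parent directory if the path has one
        # (dirname is '' for a bare file name, and os.makedirs('') would raise)
        parent_dir = os.path.dirname(file_name)
        if parent_dir:
            os.makedirs(parent_dir, exist_ok=True)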
        
        s3_client.download_file(bucket, object_name, file_name)
        print(f"SUCCESS: '{bucket}/{object_name}' downloaded to '{file_name}'")
        return True
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '404':
            print(f"ERROR: Object '{object_name}' not found in bucket '{bucket}'")
        else:
            print(f"ERROR: Failed to download file - {e}")
        return False
    except Exception as e:
        print(f"ERROR: Unexpected error - {e}")
        return False

# Example usage:
download_file(BUCKET_NAME, 'raw/bookings.csv', './downloads/bookings.csv')

SUCCESS: 'real-learn-s3/raw/bookings.csv' downloaded to './downloads/bookings.csv'


True

## 8. Upload String Content to S3

Upload string/text data directly to S3 without creating a local file first.

**Use Cases:**
- Store JSON data from API responses
- Save generated text, logs, or reports
- Write configuration files
- Store processed data without local disk I/O

**Advantages:**
- No temporary files needed
- Faster for small text content
- Memory-efficient for string data
- Direct encoding control (UTF-8 by default)

**Example Scenarios:**
```python
# Save JSON data
upload_string('{"status": "ok"}', 'my-bucket', 'config.json')

# Save log entry
upload_string('2024-01-28: System started', 'logs-bucket', 'app.log')
```
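
For the JSON use case, it can be convenient to serialize the data and build the `put_object` arguments in one place. The sketch below does only that and sends nothing to S3; `build_json_put_kwargs` is a hypothetical helper, and setting `ContentType` is an optional addition, not something the functions in this notebook do.

```python
import json


def build_json_put_kwargs(data, bucket, object_name):
    """Build keyword arguments for put_object from a JSON-serializable value."""
    body = json.dumps(data, ensure_ascii=False, sort_keys=True).encode("utf-8")
    return {
        "Bucket": bucket,
        "Key": object_name,
        "Body": body,
        # Explicit content type; without one the object is reported as binary/octet-stream.
        "ContentType": "application/json",
    }


kwargs = build_json_put_kwargs({"status": "ok"}, "my-bucket", "config.json")
print(kwargs["Body"])  # b'{"status": "ok"}'
# s3_client.put_object(**kwargs)  # assumes the s3_client created earlier in this notebook
```

Keeping serialization separate from the network call makes the encoding step easy to test without AWS access.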

In [24]:
def upload_string(content, bucket, object_name):
    """
    Upload string content directly to S3
    
    Args:
        content (str): String content to upload
        bucket (str): Bucket name
        object_name (str): S3 object name (key)
    
    Returns:
        bool: True if upload successful, False otherwise
    """
    try:
        s3_client.put_object(
            Bucket=bucket,
            Key=object_name,
            Body=content.encode('utf-8')
        )
        print(f"SUCCESS: String content uploaded to '{bucket}/{object_name}'")
        return True
    except ClientError as e:
        print(f"ERROR: Failed to upload string - {e}")
        return False

# Example usage:
upload_string('Hello from S3!', BUCKET_NAME, 'test/hello.txt')

SUCCESS: String content uploaded to 'real-learn-s3/test/hello.txt'


True

## 9. Read Object Content from S3

Read S3 object content directly into memory without downloading to a file.

**Use Cases:**
- Read configuration files
- Load small datasets for processing
- Retrieve API responses or JSON data
- Stream log files for analysis

**Important Notes:**
- Best for small to medium files (avoid for GB-sized files)
- `get_object` returns the content as bytes (the helper in this section decodes it as UTF-8 and returns a string)
- Entire file loaded into memory
- More efficient than download + read for small files

**Performance Tips:**
- For large files: Use download_file() or streaming
- For text files: Decode with proper encoding (UTF-8 default)
- For binary files: Use raw bytes without decoding
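
The streaming tip can be sketched as a chunked read loop. This assumes the `Body` returned by `get_object` behaves like a file object with a `read(size)` method; the example uses `io.BytesIO` as a stand-in so it runs without AWS access, and `iter_chunks` and `count_lines` are hypothetical helper names.

```python
import io


def iter_chunks(body, chunk_size=1024 * 1024):
    """Yield successive byte chunks from a file-like object until it is exhausted."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        yield chunk


def count_lines(body, chunk_size=1024 * 1024):
    """Count newline bytes without holding the whole object in memory."""
    return sum(chunk.count(b"\n") for chunk in iter_chunks(body, chunk_size))


# Stand-in for response['Body']; with boto3 you would pass the real streaming body.
fake_body = io.BytesIO(b"id,name\n1,alpha\n2,beta\n")
print(count_lines(fake_body, chunk_size=8))  # 3
```

Only one chunk is held in memory at a time, so the peak memory use depends on `chunk_size`, not on the object size.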

In [25]:
def read_object(bucket, object_name):
    """
    Read S3 object content directly into memory
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name to read
    
    Returns:
        str: File content as string, or None if error
    """
    try:
        response = s3_client.get_object(Bucket=bucket, Key=object_name)
        content = response['Body'].read().decode('utf-8')
        print(f"SUCCESS: Read {len(content)} characters from '{bucket}/{object_name}'")
        return content
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == 'NoSuchKey':
            print(f"ERROR: Object '{object_name}' not found in bucket '{bucket}'")
        else:
            print(f"ERROR: Failed to read object - {e}")
        return None
    except Exception as e:
        print(f"ERROR: Unexpected error - {e}")
        return None

# Example usage:
content = read_object(BUCKET_NAME, 'test/hello.txt')
if content is not None:
    print(f"Content: {content}")

SUCCESS: Read 14 characters from 'real-learn-s3/test/hello.txt'
Content: Hello from S3!


## 10. List Objects with Details

List all objects in a bucket with comprehensive metadata.

**Retrieved Information:**
- Object Key (name/path)
- Size (bytes, KB, MB)
- Last Modified date
- Storage Class (STANDARD, GLACIER, etc.)
- ETag (an MD5 digest of the content for simple single-part uploads; not an MD5 for multipart uploads or KMS-encrypted objects)

**Use Cases:**
- Audit bucket contents
- Monitor storage usage
- Track file modifications
- Verify data integrity
- Generate inventory reports

**Performance Notes:**
- Returns up to 1000 objects per request
- For buckets with more objects, use pagination (not shown here for simplicity)
- Can filter by prefix to list specific folders
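
For buckets that exceed the 1000-object page size, boto3 clients expose paginators through `get_paginator`. The sketch below takes the client as a parameter, so it assumes you pass the `s3_client` created earlier; `iter_all_objects` is a hypothetical helper name.

```python
def iter_all_objects(client, bucket, prefix=""):
    """Yield every object entry under `prefix`, following pagination transparently."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page for an empty listing carries no 'Contents' key.
        yield from page.get("Contents", [])


# Example (requires AWS access):
# total_bytes = sum(obj["Size"] for obj in iter_all_objects(s3_client, BUCKET_NAME))
```

Because this is a generator, objects are processed page by page and the full listing is never held in memory at once.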

In [26]:
from datetime import datetime

def list_objects_detailed(bucket, prefix=''):
    """
    List all objects in a bucket with detailed metadata
    
    Args:
        bucket (str): Bucket name
        prefix (str): Filter objects by prefix (folder path)
    
    Returns:
        list: List of object metadata dictionaries, or empty list if error
    """
    try:
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
        
        if 'Contents' not in response:
            print(f"No objects found in bucket '{bucket}' with prefix '{prefix}'")
            return []
        
        objects = []
        print(f"Objects in '{bucket}/{prefix}':")
        
        for obj in response['Contents']:
            # Convert size to human-readable format
            size_bytes = obj['Size']
            if size_bytes < 1024:
                size_str = f"{size_bytes} B"
            elif size_bytes < 1024**2:
                size_str = f"{size_bytes/1024:.2f} KB"
            else:
                size_str = f"{size_bytes/(1024**2):.2f} MB"
            
            # Format last modified date
            last_modified = obj['LastModified'].strftime('%Y-%m-%d %H:%M:%S')
            
            print(f"Key: {obj['Key']}")
            print(f"Size: {size_str} ({size_bytes:,} bytes)")
            print(f"Last Modified: {last_modified}")
            print(f"Storage Class: {obj.get('StorageClass', 'STANDARD')}")
            print(f"ETag: {obj['ETag']}")
            
            objects.append(obj)
        
        print(f"Total: {len(objects)} object(s)")
        return objects
        
    except ClientError as e:
        print(f"ERROR: Failed to list objects - {e}")
        return []

# Example usage:
list_objects_detailed(BUCKET_NAME)

Objects in 'real-learn-s3/':
Key: data/hosts-with-meta.csv
Size: 12.78 KB (13,083 bytes)
Last Modified: 2026-01-29 05:56:48
Storage Class: STANDARD
ETag: "7588197a4f4c485949e7bfc641356122"
Key: data/hosts.csv
Size: 12.78 KB (13,083 bytes)
Last Modified: 2026-01-29 06:02:40
Storage Class: STANDARD
ETag: "7588197a4f4c485949e7bfc641356122"
Key: raw/
Size: 0 B (0 bytes)
Last Modified: 2026-01-29 03:49:08
Storage Class: STANDARD
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Key: raw/bookings.csv
Size: 501.35 KB (513,378 bytes)
Last Modified: 2026-01-29 03:49:39
Storage Class: STANDARD
ETag: "203775ebda6b0e99de614895de78159f"
Key: test/hello.txt
Size: 14 B (14 bytes)
Last Modified: 2026-01-29 06:02:41
Storage Class: STANDARD
ETag: "e19169950b1b59f05e3412e2f3975a3b"
Total: 5 object(s)


[{'Key': 'data/hosts-with-meta.csv',
  'LastModified': datetime.datetime(2026, 1, 29, 5, 56, 48, tzinfo=tzutc()),
  'ETag': '"7588197a4f4c485949e7bfc641356122"',
  'ChecksumAlgorithm': ['CRC32'],
  'ChecksumType': 'FULL_OBJECT',
  'Size': 13083,
  'StorageClass': 'STANDARD'},
 {'Key': 'data/hosts.csv',
  'LastModified': datetime.datetime(2026, 1, 29, 6, 2, 40, tzinfo=tzutc()),
  'ETag': '"7588197a4f4c485949e7bfc641356122"',
  'ChecksumAlgorithm': ['CRC32'],
  'ChecksumType': 'FULL_OBJECT',
  'Size': 13083,
  'StorageClass': 'STANDARD'},
 {'Key': 'raw/',
  'LastModified': datetime.datetime(2026, 1, 29, 3, 49, 8, tzinfo=tzutc()),
  'ETag': '"d41d8cd98f00b204e9800998ecf8427e"',
  'ChecksumAlgorithm': ['CRC64NVME'],
  'ChecksumType': 'FULL_OBJECT',
  'Size': 0,
  'StorageClass': 'STANDARD'},
 {'Key': 'raw/bookings.csv',
  'LastModified': datetime.datetime(2026, 1, 29, 3, 49, 39, tzinfo=tzutc()),
  'ETag': '"203775ebda6b0e99de614895de78159f"',
  'ChecksumAlgorithm': ['CRC64NVME'],
  'Checks

In [27]:
list_objects_detailed(BUCKET_NAME, prefix='data/')

Objects in 'real-learn-s3/data/':
Key: data/hosts-with-meta.csv
Size: 12.78 KB (13,083 bytes)
Last Modified: 2026-01-29 05:56:48
Storage Class: STANDARD
ETag: "7588197a4f4c485949e7bfc641356122"
Key: data/hosts.csv
Size: 12.78 KB (13,083 bytes)
Last Modified: 2026-01-29 06:02:40
Storage Class: STANDARD
ETag: "7588197a4f4c485949e7bfc641356122"
Total: 2 object(s)


[{'Key': 'data/hosts-with-meta.csv',
  'LastModified': datetime.datetime(2026, 1, 29, 5, 56, 48, tzinfo=tzutc()),
  'ETag': '"7588197a4f4c485949e7bfc641356122"',
  'ChecksumAlgorithm': ['CRC32'],
  'ChecksumType': 'FULL_OBJECT',
  'Size': 13083,
  'StorageClass': 'STANDARD'},
 {'Key': 'data/hosts.csv',
  'LastModified': datetime.datetime(2026, 1, 29, 6, 2, 40, tzinfo=tzutc()),
  'ETag': '"7588197a4f4c485949e7bfc641356122"',
  'ChecksumAlgorithm': ['CRC32'],
  'ChecksumType': 'FULL_OBJECT',
  'Size': 13083,
  'StorageClass': 'STANDARD'}]

## 11. Delete Object from S3

Delete a specific object (file) from an S3 bucket.

**Important Warnings:**
- Deletion is PERMANENT unless versioning is enabled (on a versioned bucket, a delete marker is added and earlier versions remain)
- No confirmation prompt - deletes immediately
- Cannot be undone for non-versioned buckets
- Returns success even if object doesn't exist

**Best Practices:**
- Always verify object key before deletion
- Enable versioning for important buckets
- Use lifecycle policies for automated cleanup
- Consider archiving to Glacier before deletion

**Safety Tips:**
```python
# List objects first to verify
list_objects_detailed(bucket, prefix='folder/')

# Then delete specific object
delete_object(bucket, 'folder/file.txt')
```
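
Since a delete call reports success even for a key that does not exist, the "verify first" step can be automated. The sketch below is one possible approach, assuming the caller is allowed to list the bucket; `key_exists` and `delete_if_exists` are hypothetical helpers, and the client is passed in as a parameter (for example, the `s3_client` created earlier).

```python
def key_exists(client, bucket, key):
    """Return True if an object with exactly this key is listed in the bucket."""
    # Keys are listed in lexicographic order, so an exact match for `key`
    # would be the first entry returned for Prefix=key.
    response = client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))


def delete_if_exists(client, bucket, key):
    """Delete `key` only when it is currently listed; return whether a delete was sent."""
    if not key_exists(client, bucket, key):
        return False
    client.delete_object(Bucket=bucket, Key=key)
    return True
```

A `False` return value then signals a likely typo in the key, which a plain delete call would silently accept.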

In [28]:
def delete_object(bucket, object_name):
    """
    Delete an object from S3 bucket
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name (key) to delete
    
    Returns:
        bool: True if deletion successful, False otherwise
    """
    try:
        s3_client.delete_object(Bucket=bucket, Key=object_name)
        print(f"SUCCESS: Object '{object_name}' deleted from bucket '{bucket}'")
        return True
    except ClientError as e:
        print(f"ERROR: Failed to delete object - {e}")
        return False

# Example usage (run with caution: this deletes the object):
delete_object(BUCKET_NAME, 'test/hello.txt')

# list the object to confirm deletion
list_objects_detailed(BUCKET_NAME, prefix='test/')

SUCCESS: Object 'test/hello.txt' deleted from bucket 'real-learn-s3'


No objects found in bucket 'real-learn-s3' with prefix 'test/'


[]

## 12. Object Metadata Operations

S3 allows you to attach custom metadata to objects for organization and tracking.

**What is Metadata?**
- Key-value pairs attached to S3 objects
- Two types: System metadata (AWS-managed) and User metadata (custom)
- User metadata is stored as `x-amz-meta-*` HTTP headers; boto3 adds the prefix for you, so pass bare keys such as `author`
- Useful for categorization, tracking, and application logic

**System Metadata (AWS-managed):**
- Content-Type (MIME type)
- Content-Length (file size)
- Last-Modified (timestamp)
- ETag (entity tag that changes when the object's content changes)

**User Metadata (Custom):**
- Any custom key-value pairs
- Examples: author, version, department, project-id
- Maximum 2KB total metadata size
- Cannot be edited in place (re-upload the object, or copy it onto itself with `MetadataDirective='REPLACE'`)

**Use Cases:**
- Tag files by department, project, or owner
- Store application-specific data
- Track file versions or processing status
- Add searchable attributes
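
The 2 KB limit can be checked before uploading. The sketch below assumes the size is measured as the sum of the UTF-8 byte lengths of every key and value; `MAX_USER_METADATA_BYTES`, `user_metadata_size`, and `check_user_metadata` are hypothetical names.

```python
MAX_USER_METADATA_BYTES = 2 * 1024  # the 2 KB limit mentioned above


def user_metadata_size(metadata):
    """Sum the UTF-8 byte lengths of all keys and values."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in metadata.items())


def check_user_metadata(metadata):
    """Return the metadata size, or raise ValueError if it cannot be uploaded as is."""
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"Metadata keys and values must be strings: {key!r}")
    size = user_metadata_size(metadata)
    if size > MAX_USER_METADATA_BYTES:
        raise ValueError(f"User metadata is {size} bytes; limit is {MAX_USER_METADATA_BYTES}")
    return size


print(check_user_metadata({"author": "Data Team", "version": "1.0"}))  # 25
```

Values such as version numbers need to be passed as strings (`'1.0'`, not `1.0`), which the type check catches before the upload is attempted.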

In [29]:
def get_object_metadata(bucket, object_name):
    """
    Retrieve metadata for an S3 object
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name (key)
    
    Returns:
        dict: Metadata dictionary, or None if error
    """
    try:
        response = s3_client.head_object(Bucket=bucket, Key=object_name)
        
        print(f"Metadata for '{bucket}/{object_name}':")
        
        # System metadata
        print("SYSTEM METADATA:")
        print(f"Content-Type: {response.get('ContentType', 'N/A')}")
        print(f"Content-Length: {response.get('ContentLength', 0):,} bytes")
        print(f"Last-Modified: {response.get('LastModified', 'N/A')}")
        print(f"ETag: {response.get('ETag', 'N/A')}")
        print(f"Storage-Class: {response.get('StorageClass', 'STANDARD')}")
        
        # User metadata (custom)
        user_metadata = response.get('Metadata', {})
        if user_metadata:
            print("USER METADATA (Custom):")
            for key, value in user_metadata.items():
                print(f"  {key}: {value}")
        else:
            print("USER METADATA: None")
        
        return response
        
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '404':
            print(f"ERROR: Object '{object_name}' not found in bucket '{bucket}'")
        else:
            print(f"ERROR: Failed to get metadata - {e}")
        return None

# Example usage:
get_object_metadata(BUCKET_NAME, 'data/hosts.csv')

Metadata for 'real-learn-s3/data/hosts.csv':
SYSTEM METADATA:
Content-Type: binary/octet-stream
Content-Length: 13,083 bytes
Last-Modified: 2026-01-29 06:02:40+00:00
ETag: "7588197a4f4c485949e7bfc641356122"
Storage-Class: STANDARD
USER METADATA: None


{'ResponseMetadata': {'RequestId': 'YAPXGBHT668TXHYK',
  'HostId': '5Df/VbMimaH+k0twYiW3ciArqmCZeEdlJA92EUXDlTrbmGYs8By2ZTwXcJnBh/LeaSJYCga7o8E=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': '5Df/VbMimaH+k0twYiW3ciArqmCZeEdlJA92EUXDlTrbmGYs8By2ZTwXcJnBh/LeaSJYCga7o8E=',
   'x-amz-request-id': 'YAPXGBHT668TXHYK',
   'date': 'Thu, 29 Jan 2026 06:02:41 GMT',
   'last-modified': 'Thu, 29 Jan 2026 06:02:40 GMT',
   'etag': '"7588197a4f4c485949e7bfc641356122"',
   'x-amz-server-side-encryption': 'AES256',
   'accept-ranges': 'bytes',
   'content-type': 'binary/octet-stream',
   'content-length': '13083',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'AcceptRanges': 'bytes',
 'LastModified': datetime.datetime(2026, 1, 29, 6, 2, 40, tzinfo=tzutc()),
 'ContentLength': 13083,
 'ETag': '"7588197a4f4c485949e7bfc641356122"',
 'ContentType': 'binary/octet-stream',
 'ServerSideEncryption': 'AES256',
 'Metadata': {}}

In [30]:
def upload_with_metadata(file_name, bucket, object_name=None, metadata=None):
    """
    Upload a file to S3 with custom metadata
    
    Args:
        file_name (str): Path to file to upload
        bucket (str): Bucket name
        object_name (str): S3 object name (if None, uses the base name of file_name)
        metadata (dict): Custom metadata key-value pairs
    
    Returns:
        bool: True if upload successful, False otherwise
    
    Example metadata:
        {
            'author': 'John Doe',
            'department': 'Engineering',
            'version': '1.0',
            'project': 'data-analysis'
        }
    """
    if object_name is None:
        object_name = os.path.basename(file_name)
    
    if metadata is None:
        metadata = {}
    
    try:
        # Upload with metadata
        s3_client.upload_file(
            file_name, 
            bucket, 
            object_name,
            ExtraArgs={'Metadata': metadata}
        )
        
        print(f"SUCCESS: '{file_name}' uploaded to '{bucket}/{object_name}'")
        if metadata:
            print("With metadata:")
            for key, value in metadata.items():
                print(f"  {key}: {value}")
        return True
        
    except FileNotFoundError:
        print(f"ERROR: File '{file_name}' not found")
        return False
    except ClientError as e:
        print(f"ERROR: Failed to upload file - {e}")
        return False

# Example usage:
metadata = {
    'author': 'Data Team',
    'department': 'Analytics',
    'version': '1.0',
    'description': 'CSV data file'
}
upload_with_metadata('data/hosts.csv', BUCKET_NAME, 'data/hosts-with-meta.csv', metadata)

SUCCESS: 'data/hosts.csv' uploaded to 'real-learn-s3/data/hosts-with-meta.csv'
With metadata:
  author: Data Team
  department: Analytics
  version: 1.0
  description: CSV data file


True

In [31]:
# Verify metadata was saved
get_object_metadata(BUCKET_NAME, 'data/hosts-with-meta.csv')

Metadata for 'real-learn-s3/data/hosts-with-meta.csv':
SYSTEM METADATA:
Content-Type: binary/octet-stream
Content-Length: 13,083 bytes
Last-Modified: 2026-01-29 06:02:41+00:00
ETag: "7588197a4f4c485949e7bfc641356122"
Storage-Class: STANDARD
USER METADATA (Custom):
  department: Analytics
  version: 1.0
  author: Data Team
  description: CSV data file


{'ResponseMetadata': {'RequestId': 'YAPTCYMZN7420R0T',
  'HostId': 'LT1zUH6REHmFVXEiMlCtpl4ScRDkk/IO5CskhA0B/AQE5tayMw+cJWoHpdvJsHsa43hcXd2WhHs=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'LT1zUH6REHmFVXEiMlCtpl4ScRDkk/IO5CskhA0B/AQE5tayMw+cJWoHpdvJsHsa43hcXd2WhHs=',
   'x-amz-request-id': 'YAPTCYMZN7420R0T',
   'date': 'Thu, 29 Jan 2026 06:02:41 GMT',
   'last-modified': 'Thu, 29 Jan 2026 06:02:41 GMT',
   'etag': '"7588197a4f4c485949e7bfc641356122"',
   'x-amz-server-side-encryption': 'AES256',
   'x-amz-meta-department': 'Analytics',
   'x-amz-meta-version': '1.0',
   'x-amz-meta-author': 'Data Team',
   'x-amz-meta-description': 'CSV data file',
   'accept-ranges': 'bytes',
   'content-type': 'binary/octet-stream',
   'content-length': '13083',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'AcceptRanges': 'bytes',
 'LastModified': datetime.datetime(2026, 1, 29, 6, 2, 41, tzinfo=tzutc()),
 'ContentLength': 13083,
 'ETag': '"7588197a4f4c485949e7bfc641356122"',
 'C

## 13. Presigned URLs

Generate temporary, secure URLs that grant time-limited access to S3 objects without requiring AWS credentials.

**What are Presigned URLs?**
- Temporary URLs with embedded authentication
- Allow anyone holding the URL to access a private S3 object for a limited time
- No AWS credentials needed by the recipient
- Expires after specified duration (default 1 hour, max 7 days)

**How They Work:**
```
1. Your app generates presigned URL with your credentials
2. URL includes signature and expiration time
3. Share URL with user (email, web page, etc.)
4. User accesses S3 directly using the URL
5. URL expires after set time period
```

**Use Cases:**
- Share private files temporarily (reports, images, videos)
- Allow users to download files without authentication
- Enable temporary file uploads from web forms
- Distribute time-limited content (tickets, invoices)
- Securely share large files without email attachments

**Security Benefits:**
- No credential sharing required
- Automatic expiration prevents long-term access
- Can revoke by deleting original object
- Audit trail in CloudTrail logs (when S3 data events are enabled)
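
The signing time and lifetime travel in the URL's query string, so the expiry time can be recovered from the URL alone. The sketch below assumes a Signature Version 4 URL carrying `X-Amz-Date` and `X-Amz-Expires` parameters; `presigned_url_expiry` is a hypothetical helper, and the example URL is illustrative and not a working link.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC datetime at which a SigV4 presigned URL stops working."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


# Illustrative URL with a placeholder bucket and no credential or signature.
example = (
    "https://s3.us-east-1.amazonaws.com/example-bucket/report.csv"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260129T060241Z"
    "&X-Amz-Expires=3600&X-Amz-SignedHeaders=host"
)
print(presigned_url_expiry(example).isoformat())  # 2026-01-29T07:02:41+00:00
```

This is handy for showing recipients when a shared link will stop working, without storing the expiry separately.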

In [32]:
def generate_presigned_download_url(bucket, object_name, expiration=3600):
    """
    Generate a presigned URL for downloading an S3 object
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name (key)
        expiration (int): URL expiration time in seconds (default 3600 = 1 hour)
    
    Returns:
        str: Presigned URL, or None if error
    
    Common expiration times:
        - 3600 = 1 hour (default)
        - 7200 = 2 hours
        - 86400 = 24 hours
        - 604800 = 7 days (maximum)
    
    Note: Uses AWS Signature Version 4 (required by S3)
    """
    try:
        url = s3_client.generate_presigned_url(
            'get_object',
            Params={
                'Bucket': bucket,
                'Key': object_name
            },
            ExpiresIn=expiration
        )
        
        print(f"SUCCESS: Presigned URL generated for '{bucket}/{object_name}'")
        print(f"Expires in: {expiration} seconds ({expiration/3600:.1f} hours)")
        print(f"\nURL (valid for {expiration/3600:.1f} hours):")
        print(url)
        print("\nAnyone with this URL can download the file until it expires.")
        
        return url
        
    except ClientError as e:
        print(f"ERROR: Failed to generate presigned URL - {e}")
        return None


# Example usage:
url = generate_presigned_download_url(BUCKET_NAME, 'data/hosts.csv', expiration=3600)

SUCCESS: Presigned URL generated for 'real-learn-s3/data/hosts.csv'
Expires in: 3600 seconds (1.0 hours)

URL (valid for 1.0 hours):
https://s3.us-east-2.amazonaws.com/real-learn-s3/data/hosts.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAWPVGU3OO6BQJAVE3%2F20260129%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20260129T060241Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=34b2f2b03a62ad3d171556fdb530ae4c94cfe538df7939ec323ff64ab1e19f71

Anyone with this URL can download the file until it expires.


In [33]:
# Test the URL by downloading the object through it
import requests
response = requests.get(url)
if response.status_code == 200:
    print(f"Successfully downloaded {len(response.content)} bytes")

Successfully downloaded 13083 bytes


In [34]:
def generate_presigned_upload_url(bucket, object_name, expiration=3600):
    """
    Generate a presigned URL for uploading to S3
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name (key) where file will be uploaded
        expiration (int): URL expiration time in seconds (default 3600 = 1 hour)
    
    Returns:
        str: Presigned URL, or None if error
    
    Use case:
        Allow users to upload files directly to S3 without AWS credentials
        Common in web forms, mobile apps, and file sharing applications
    
    Note: Uses AWS Signature Version 4 (required by S3)
    """
    try:
        url = s3_client.generate_presigned_url(
            'put_object',
            Params={
                'Bucket': bucket,
                'Key': object_name
            },
            ExpiresIn=expiration
        )
        
        print(f"SUCCESS: Presigned upload URL generated for '{bucket}/{object_name}'")
        print(f"Expires in: {expiration} seconds ({expiration/3600:.1f} hours)")
        print(f"URL (valid for {expiration/3600:.1f} hours):")
        print(url)
        print("Anyone with this URL can upload a file to this location until it expires.")
        print("Upload using HTTP PUT method.")
        
        return url
        
    except ClientError as e:
        print(f"ERROR: Failed to generate presigned upload URL - {e}")
        return None

# Example usage:

upload_url = generate_presigned_upload_url(BUCKET_NAME, 'uploads/user-file.txt', expiration=3600)

SUCCESS: Presigned upload URL generated for 'real-learn-s3/uploads/user-file.txt'
Expires in: 3600 seconds (1.0 hours)
URL (valid for 1.0 hours):
https://s3.us-east-2.amazonaws.com/real-learn-s3/uploads/user-file.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAWPVGU3OO6BQJAVE3%2F20260129%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20260129T060241Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=4784f860ac1afb85460b93871c632ecb1257e1ef8cc56804fb70db593b0208a5
Anyone with this URL can upload a file to this location until it expires.
Upload using HTTP PUT method.


In [35]:
# Test the upload URL with a plain HTTP PUT (no AWS credentials needed)
import requests
test_data = b"This is test content uploaded via presigned URL"
response = requests.put(upload_url, data=test_data)
if response.status_code == 200:
    print("File uploaded successfully via presigned URL!")
    # Verify the upload
    get_object_metadata(BUCKET_NAME, 'uploads/user-file.txt')
else:
    print(f"Upload failed with HTTP status {response.status_code}")

File uploaded successfully via presigned URL!
Metadata for 'real-learn-s3/uploads/user-file.txt':
SYSTEM METADATA:
Content-Type: binary/octet-stream
Content-Length: 47 bytes
Last-Modified: 2026-01-29 06:02:42+00:00
ETag: "603b54cde85cdaad11167f9df423d785"
Storage-Class: STANDARD
USER METADATA: None
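
A client that does not have `requests` installed can send the same PUT with the standard library. The sketch below only builds the request so it can be inspected without network access. The helper name `build_presigned_put_request` and the placeholder URL are hypothetical. It assumes the URL was signed without a `ContentType` parameter, as in the cell above, so the `Content-Type` header is not part of the signature. The header is set explicitly because `urllib` otherwise adds a form-encoding default when a request body is present.

```python
import urllib.request


def build_presigned_put_request(upload_url, data, content_type="application/octet-stream"):
    """Build (but do not send) an HTTP PUT request for a presigned upload URL."""
    return urllib.request.Request(
        upload_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Placeholder URL for illustration; in this notebook it would be `upload_url`
request = build_presigned_put_request(
    "https://example.com/uploads/user-file.txt?X-Amz-Expires=3600",
    b"hello",
)
print(request.get_method(), len(request.data))

# To actually send it (requires a valid, unexpired presigned URL):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```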


## 14. Progress Tracking for Large Files

Monitor upload and download progress for large files to provide user feedback.

**Why Progress Tracking?**
- Essential for large file transfers (videos, backups, datasets)
- Prevents user frustration by showing operation status
- Helps detect stalled or slow transfers
- Enables progress bars in applications

**How It Works:**
- Uses callback functions that execute during transfer
- Callback receives the number of bytes transferred since the previous call, so the tracker keeps its own running total
- Can calculate percentage, speed, and ETA
- Works with upload_file() and download_file() operations
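
The percentage, speed, and ETA mentioned above reduce to a few lines of arithmetic. The sketch below is a standalone illustration; the function name `transfer_stats` is hypothetical and is not part of boto3 or of this notebook's other cells.

```python
def transfer_stats(bytes_transferred, filesize, elapsed_seconds):
    """Return (percentage, speed in MB/s, ETA in seconds or None)."""
    mb = 1024 * 1024
    # Treat an empty file as complete to avoid dividing by zero
    percentage = (bytes_transferred / filesize) * 100 if filesize else 100.0
    # Average speed since the transfer started
    speed = (bytes_transferred / mb) / elapsed_seconds if elapsed_seconds > 0 else 0.0
    if speed > 0:
        eta = ((filesize - bytes_transferred) / mb) / speed
    else:
        eta = None  # not enough data to estimate yet
    return percentage, speed, eta


# 6 MB of a 12 MB file transferred after 3 seconds
print(transfer_stats(6 * 1024 * 1024, 12 * 1024 * 1024, 3.0))  # (50.0, 2.0, 3.0)
```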

**Use Cases:**
- Web applications with file upload forms
- Backup systems with large data transfers
- Data migration tools
- Video/media upload platforms
- Batch file processing systems

**Implementation:**
boto3's `upload_file()` and `download_file()` accept a `Callback` parameter: a callable that is invoked periodically during the transfer with the number of bytes transferred since the previous invocation.
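
The callback contract can be tried out without any AWS calls. In the sketch below, `simulate_transfer` is a hypothetical stand-in for a transfer: it reports each chunk's size to the callback, and `ByteCounter` accumulates those increments into a running total. Neither name comes from boto3.

```python
def simulate_transfer(total_bytes, chunk_size, callback):
    """Stand-in for a transfer: report each chunk's size to the callback."""
    sent = 0
    while sent < total_bytes:
        chunk = min(chunk_size, total_bytes - sent)
        sent += chunk
        callback(chunk)  # incremental byte count, not the running total


class ByteCounter:
    """Accumulate per-chunk byte counts into a running total."""
    def __init__(self):
        self.total = 0
        self.calls = 0

    def __call__(self, bytes_amount):
        self.total += bytes_amount
        self.calls += 1


counter = ByteCounter()
simulate_transfer(total_bytes=1000, chunk_size=256, callback=counter)
print(counter.calls, counter.total)  # 4 1000
```

Any callable with this one-argument signature can be passed as `Callback`; a class instance is convenient because it can carry state between calls.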

In [36]:
import threading
import time


class ProgressTracker:
    """
    Track upload/download progress with visual feedback
    """
    def __init__(self, filename, filesize):
        self.filename = filename
        self.filesize = filesize
        self.bytes_transferred = 0
        self.start_time = None
        # boto3 may invoke the callback from several worker threads
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        """
        Called by boto3 during transfer

        Args:
            bytes_amount (int): Bytes transferred in this chunk
        """
        with self._lock:
            # The clock starts at the first callback, so speeds reported
            # for very small files are not meaningful
            if self.start_time is None:
                self.start_time = time.time()

            self.bytes_transferred += bytes_amount

            # Calculate progress percentage (an empty file counts as complete)
            if self.filesize > 0:
                percentage = (self.bytes_transferred / self.filesize) * 100
            else:
                percentage = 100.0

            # Calculate transfer speed in MB/s
            elapsed_time = time.time() - self.start_time
            if elapsed_time > 0:
                speed_mb_per_s = (self.bytes_transferred / (1024 * 1024)) / elapsed_time
            else:
                speed_mb_per_s = 0

            # Calculate ETA
            if speed_mb_per_s > 0:
                remaining_mb = (self.filesize - self.bytes_transferred) / (1024 * 1024)
                eta_seconds = remaining_mb / speed_mb_per_s
                eta_str = f"{int(eta_seconds)}s"
            else:
                eta_str = "calculating..."

            # Display progress on a single, continuously rewritten line
            print(f"\r{self.filename}: {percentage:.1f}% | "
                  f"{self.bytes_transferred/(1024*1024):.2f}/{self.filesize/(1024*1024):.2f} MB | "
                  f"Speed: {speed_mb_per_s:.2f} MB/s | ETA: {eta_str}", end='')

            # Print newline when complete
            if self.bytes_transferred >= self.filesize:
                print()  # New line after completion

In [37]:
def upload_file_with_progress(file_name, bucket, object_name=None):
    """
    Upload a file to S3 with progress tracking
    
    Args:
        file_name (str): Path to file to upload
        bucket (str): Bucket name
        object_name (str): S3 object name (if None, uses file_name)
    
    Returns:
        bool: True if upload successful, False otherwise
    
    Example output:
        hosts.csv: 45.2% | 5.43/12.00 MB | Speed: 2.31 MB/s | ETA: 3s
    """
    if object_name is None:
        object_name = os.path.basename(file_name)
    
    try:
        # Get file size
        filesize = os.path.getsize(file_name)
        
        # Create progress tracker
        progress = ProgressTracker(os.path.basename(file_name), filesize)
        
        # Upload with progress callback
        s3_client.upload_file(
            file_name, 
            bucket, 
            object_name,
            Callback=progress
        )
        
        print(f"SUCCESS: '{file_name}' uploaded to '{bucket}/{object_name}'")
        return True
        
    except FileNotFoundError:
        print(f"ERROR: File '{file_name}' not found")
        return False
    except ClientError as e:
        print(f"ERROR: Failed to upload file - {e}")
        return False

# Example usage (a larger file shows more intermediate progress updates):
upload_file_with_progress('data/hosts.csv', BUCKET_NAME, 'data/hosts-progress-test.csv')

hosts.csv: 100.0% | 0.01/0.01 MB | Speed: 4025.54 MB/s | ETA: 0s
SUCCESS: 'data/hosts.csv' uploaded to 'real-learn-s3/data/hosts-progress-test.csv'


True

In [39]:
def download_file_with_progress(bucket, object_name, file_name):
    """
    Download a file from S3 with progress tracking
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name to download
        file_name (str): Local file path to save
    
    Returns:
        bool: True if download successful, False otherwise
    
    Example output:
        hosts.csv: 78.5% | 9.42/12.00 MB | Speed: 3.45 MB/s | ETA: 1s
    """
    try:
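        # Create the target directory if needed (skip when file_name has no directory part,
        # because os.makedirs('') raises FileNotFoundError)
        directory = os.path.dirname(file_name)
        if directory:
            os.makedirs(directory, exist_ok=True)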
        
        # Get file size from S3
        response = s3_client.head_object(Bucket=bucket, Key=object_name)
        filesize = response['ContentLength']
        
        # Create progress tracker
        progress = ProgressTracker(os.path.basename(file_name), filesize)
        
        # Download with progress callback
        s3_client.download_file(
            bucket, 
            object_name, 
            file_name,
            Callback=progress
        )
        
        print(f"SUCCESS: '{bucket}/{object_name}' downloaded to '{file_name}'")
        return True
        
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '404':
            print(f"ERROR: Object '{object_name}' not found in bucket '{bucket}'")
        else:
            print(f"ERROR: Failed to download file - {e}")
        return False
    except Exception as e:
        print(f"ERROR: Unexpected error - {e}")
        return False

# Example usage:
download_file_with_progress(BUCKET_NAME, 'data/hosts.csv', './downloads/hosts-progress-test.csv')

hosts-progress-test.csv: 100.0% | 0.01/0.01 MB | Speed: 13083.00 MB/s | ETA: 0s
SUCCESS: 'real-learn-s3/data/hosts.csv' downloaded to './downloads/hosts-progress-test.csv'


True

### Progress Tracking Demo

Test the progress tracking with a file upload and download.

**Note:** Progress updates work best with larger files (several MB or more). For very small files, the transfer may complete too quickly to see detailed progress.

In [40]:
# Test upload with progress tracking
print("Testing upload with progress tracking:")
upload_file_with_progress('data/hosts.csv', BUCKET_NAME, 'data/hosts-progress-demo.csv')

Testing upload with progress tracking:
hosts.csv: 100.0% | 0.01/0.01 MB | Speed: 5814.67 MB/s | ETA: 0s
SUCCESS: 'data/hosts.csv' uploaded to 'real-learn-s3/data/hosts-progress-demo.csv'


True

In [41]:
# Test download with progress tracking
print("Testing download with progress tracking:")
download_file_with_progress(BUCKET_NAME, 'data/hosts.csv', './downloads/hosts-progress-demo.csv')

Testing download with progress tracking:
hosts-progress-demo.csv: 100.0% | 0.01/0.01 MB | Speed: 4025.54 MB/s | ETA: 0s
SUCCESS: 'real-learn-s3/data/hosts.csv' downloaded to './downloads/hosts-progress-demo.csv'


True

## Summary - What We've Learned

This notebook covered S3 operations with boto3 in four progressive, hands-on phases:

### Phase 1: Basic Operations
- **create_bucket()** - Create S3 buckets with region awareness
- **upload_file()** - Upload files from local filesystem
- **download_file()** - Download objects to local storage

### Phase 2: Content Operations
- **upload_string()** - Direct text upload without local files
- **read_object()** - Read content directly into memory

### Phase 3: Management Operations
- **list_objects_detailed()** - Comprehensive metadata listing
- **delete_object()** - Safe object deletion

### Phase 4: Advanced Features
**Part 1 - Metadata:**
- **get_object_metadata()** - Retrieve system and user metadata
- **upload_with_metadata()** - Attach custom key-value pairs

**Part 2 - Presigned URLs:**
- **generate_presigned_download_url()** - Temporary download URLs
- **generate_presigned_upload_url()** - Temporary upload URLs

**Part 3 - Progress Tracking:**
- **ProgressTracker** class - Real-time transfer monitoring
- **upload_file_with_progress()** - Upload with progress feedback
- **download_file_with_progress()** - Download with progress feedback