# AWS S3 Operations - Comprehensive Guide

This notebook provides a complete walkthrough of AWS S3 operations using the boto3 SDK.

## Prerequisites - Manual AWS Setup

Before running this notebook, the following setup was completed manually in the AWS Console:

### Step 1: S3 Bucket Creation
- Logged into AWS account
- Created S3 bucket named **"real-learn-s3"** in **us-east-1** region
- Configuration:
  - Public access: **BLOCKED** (security best practice)
  - Versioning: **ENABLED** (maintains file history)
- Uploaded initial test file: **bookings.csv**

### Step 2: IAM User Configuration
- Created IAM user with **programmatic access** (no console access needed)
- Attached policy: **AmazonS3FullAccess** (full S3 permissions)
- Generated access credentials:
  - Access Key ID
  - Secret Access Key
- Saved credentials securely in `.env` file (never commit to git)

### Step 3: Environment File
Created `.env` file with the following variables:
```
AWS_BUCKET_NAME=real-learn-s3
AWS_ACCESS_KEY=your-access-key-id
AWS_SECRET_KEY=your-secret-access-key
AWS_REGION=us-east-1
```

### Step 4: Python Environment
- Installed the boto3 library (the AWS SDK for Python)
- Installed python-dotenv for secure credential loading
- Created initial Python script to verify S3 connection and list objects

Now we're ready to explore S3 operations programmatically.

---

## What You'll Learn

1. **S3 Fundamentals** - Understanding buckets and objects
2. **Bucket Operations** - List, create, and manage buckets
3. **Object Operations** - Upload, download, list, and delete files
4. **Advanced Features** - Metadata, presigned URLs, progress tracking
5. **Error Handling** - Robust exception management
6. **Best Practices** - Security and performance optimization

## S3 Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│  AWS S3 STRUCTURE                                       │
│                                                         │
│  Your AWS Account                                       │
│  └─► Buckets (Globally unique names)                    │
│      ├─► Bucket 1: my-app-data-2024                     │
│      │   ├─► Object: file1.txt                          │
│      │   ├─► Object: images/photo.jpg                   │
│      │   └─► Object: data/2024/records.csv              │
│      │                                                  │
│      └─► Bucket 2: backups-prod                         │
│          ├─► Object: backup-2024-01.tar.gz              │
│          └─► Object: logs/app.log                       │
│                                                         │
│  Key Concepts:                                          │
│  • Bucket = Container (like a root folder)              │
│  • Object = File with metadata                          │
│  • Key = Full path/name (e.g., "folder/file.txt")       │
│  • Region = Geographic location                         │
│  • Storage Class = Cost/access tier                     │
└─────────────────────────────────────────────────────────┘
```

In [6]:
import os

import boto3
from dotenv import load_dotenv

load_dotenv()  # read variables from .env into the process environment

## 1. Environment Setup

Loading AWS credentials securely from environment variables using `.env` file.

**Security Best Practice:** Never hardcode credentials in your code!

In [9]:
BUCKET_NAME = os.getenv("AWS_BUCKET_NAME")    
ACCESS_KEY = os.getenv("AWS_ACCESS_KEY")
SECRET_KEY = os.getenv("AWS_SECRET_KEY")
AWS_REGION = os.getenv("AWS_REGION")

## 2. AWS Credentials Configuration

Required environment variables in your `.env` file:
```
AWS_BUCKET_NAME=your-bucket-name
AWS_ACCESS_KEY=your-access-key-id
AWS_SECRET_KEY=your-secret-access-key
AWS_REGION=us-east-1
```
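
If one of these variables is missing, `os.getenv` returns `None` and the failure only surfaces later, when the client is used. The following is a minimal sketch of an early check, assuming the four variable names listed above; `load_required_env` and `REQUIRED_VARS` are hypothetical names introduced here for illustration, not part of boto3 or python-dotenv.

```python
import os

# Variable names taken from the .env listing above.
REQUIRED_VARS = ("AWS_BUCKET_NAME", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", "AWS_REGION")


def load_required_env(names=REQUIRED_VARS, environ=None):
    """Return the named variables as a dict, or raise if any is unset or empty.

    Hypothetical helper: `environ` defaults to os.environ but can be any
    mapping, which makes the check easy to try out without touching real
    credentials.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling `load_required_env()` right after `load_dotenv()` would stop the notebook at the first cell instead of at the first S3 call.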

## 3. Initialize S3 Client

boto3 provides two interfaces:
- **Client**: Low-level service access (more control)
- **Resource**: Higher-level object-oriented interface (easier to use)

We'll use the **client** interface for maximum flexibility.

In [10]:
# create an s3 client
s3_client = boto3.client('s3', region_name=AWS_REGION, 
                         aws_access_key_id=ACCESS_KEY, 
                         aws_secret_access_key=SECRET_KEY)

## 4. Bucket Operations

### Operation Flow
```
┌─────────────────────────────────────────┐
│  BUCKET LIFECYCLE                       │
│                                         │
│  1. CREATE → Bucket exists in S3        │
│  2. LIST → View all your buckets        │
│  3. CONFIGURE → Set permissions/policies│
│  4. USE → Upload/download objects       │
│  5. DELETE → Remove bucket (if empty)   │
└─────────────────────────────────────────┘
```
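
As an illustration of the LIST step, here is a minimal sketch that extracts bucket names from the `list_buckets` response. It assumes the client is passed in as an argument (for example the `s3_client` created earlier); `list_bucket_names` is a hypothetical helper name.

```python
def list_bucket_names(client):
    """Return the names of all buckets visible to the given S3 client.

    `client` is assumed to be a boto3 S3 client. Its list_buckets() call
    returns a dict whose 'Buckets' entry is a list of dicts, each with a
    'Name' and a 'CreationDate'.
    """
    response = client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]
```

Usage would look like `list_bucket_names(s3_client)`, which needs the `s3:ListAllMyBuckets` permission (included in AmazonS3FullAccess).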

## 5. Create Bucket

Create a new S3 bucket in your account.

**Important Rules:**
- Bucket names must be globally unique across ALL AWS accounts
- Only lowercase letters, numbers, hyphens, and dots allowed
- Must be 3-63 characters long
- Must begin and end with a letter or number
- Region matters for latency and compliance
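
The naming rules above can be checked locally before calling AWS. This is a rough sketch, not an exhaustive validator: it covers length, allowed characters, and the first and last character (assumed here to be a letter or number), and it deliberately skips AWS's additional restrictions such as IP-address-like names and reserved prefixes. `is_valid_bucket_name` is a hypothetical helper.

```python
import re

# 3-63 characters: a lowercase letter or digit at each end, with lowercase
# letters, digits, hyphens, and dots allowed in between.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` passes the basic naming rules listed above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None
```

A name that passes this check can still be rejected by S3, most commonly because another account already owns it.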

**Regions:**
- `us-east-1` is the default region; create buckets there without a `LocationConstraint`
- Every other region needs an explicit `LocationConstraint` in `CreateBucketConfiguration`
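
The region rule can be isolated into a small function that only builds the keyword arguments for `create_bucket`. This is a sketch for illustration; `create_bucket_kwargs` is a hypothetical helper, and it treats a missing region the same as `us-east-1`.

```python
def create_bucket_kwargs(bucket_name, region=None):
    """Build the keyword arguments for an S3 client's create_bucket() call.

    No LocationConstraint is added for us-east-1 (or when region is None);
    any other region gets an explicit CreateBucketConfiguration.
    """
    kwargs = {"Bucket": bucket_name}
    if region not in (None, "us-east-1"):
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With a client in hand, the call becomes `s3_client.create_bucket(**create_bucket_kwargs(name, region))`.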

In [11]:
from botocore.exceptions import ClientError

def create_bucket(bucket_name, region=None):
    """
    Create an S3 bucket in a specified region
    
    Args:
        bucket_name (str): Name for the bucket (must be globally unique)
        region (str): AWS region (if None, no LocationConstraint is sent,
            which only works when the client targets us-east-1)
    
    Returns:
        bool: True if bucket created, False otherwise
    """
    try:
        if region is None or region == 'us-east-1':
            # us-east-1 doesn't require LocationConstraint
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            # Other regions require LocationConstraint
            s3_client.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={'LocationConstraint': region}
            )
        print(f"SUCCESS: Bucket '{bucket_name}' created successfully")
        return True
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == 'BucketAlreadyExists':
            print(f"ERROR: Bucket '{bucket_name}' already exists (owned by someone else)")
        elif error_code == 'BucketAlreadyOwnedByYou':
            print(f"INFO: Bucket '{bucket_name}' already exists and is owned by you")
        else:
            print(f"ERROR: Failed to create bucket - {e}")
        return False

# Example usage:
create_bucket(f"{BUCKET_NAME}-test", region=AWS_REGION)

SUCCESS: Bucket 'real-learn-s3-test' created successfully


True

## 6. Upload File to S3

Upload files from your local filesystem to an S3 bucket.

**Key Parameters:**
- `file_name`: Path to local file
- `bucket`: Target S3 bucket name
- `object_name`: Key (path/name) in S3 (if None, uses the base name of `file_name`)

**Use Cases:**
- Backup files to cloud storage
- Store application data
- Host static website content
- Archive logs and reports

**File Types:** Any file type supported (images, videos, documents, code, etc.)
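
For use cases such as static website hosting, the object's `Content-Type` matters because browsers rely on it. boto3's `upload_file` accepts an `ExtraArgs` dict, and `ContentType` is one of the supported keys. The sketch below guesses the type from the file extension with the standard library; `build_extra_args` is a hypothetical helper, and the fallback value is an assumption chosen for illustration.

```python
import mimetypes


def build_extra_args(file_name, default="application/octet-stream"):
    """Return an ExtraArgs dict with a ContentType guessed from the file name.

    Falls back to `default` when the extension is not recognised.
    """
    content_type, _encoding = mimetypes.guess_type(file_name)
    return {"ContentType": content_type or default}
```

It would be passed along as `s3_client.upload_file(file_name, bucket, object_name, ExtraArgs=build_extra_args(file_name))`.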

In [18]:
def upload_file(file_name, bucket, object_name=None):
    """
    Upload a file to an S3 bucket
    
    Args:
        file_name (str): Path to file to upload
        bucket (str): Bucket name
        object_name (str): S3 object name (if None, uses the base name of file_name)
    
    Returns:
        bool: True if upload successful, False otherwise
    """
    # If S3 object_name not specified, use the base name of file_name
    if object_name is None:
        object_name = os.path.basename(file_name)
    
    try:
        s3_client.upload_file(file_name, bucket, object_name)
        print(f"SUCCESS: '{file_name}' uploaded to '{bucket}/{object_name}'")
        return True
    except FileNotFoundError:
        print(f"ERROR: File '{file_name}' not found")
        return False
    except ClientError as e:
        print(f"ERROR: Failed to upload file - {e}")
        return False

# Example usage:
upload_file('data/hosts.csv', BUCKET_NAME, 'data/hosts.csv')

SUCCESS: 'data/hosts.csv' uploaded to 'real-learn-s3/data/hosts.csv'


True

## 7. Download File from S3

Download objects from S3 to your local filesystem.

**Key Parameters:**
- `bucket`: Source S3 bucket name
- `object_name`: Key (path/name) in S3
- `file_name`: Destination path on local filesystem

**Important Notes:**
- The `download_file` wrapper below creates missing local directories before downloading
- Overwrites existing local files without warning
- Preserves file content but not S3 metadata

**Common Pattern:**
```python
# Download backup
download_file('my-bucket', 'backups/db-2024.sql', './local-backups/db.sql')

# Download with same name
download_file('my-bucket', 'report.pdf', 'report.pdf')
```
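
Directory creation and silent overwriting can both be made explicit in a small path-preparation step. The following is a sketch under stated assumptions: `prepare_download_path` is a hypothetical helper, and the `overwrite` flag is an addition for illustration, not something boto3 provides.

```python
from pathlib import Path


def prepare_download_path(file_name, overwrite=True):
    """Create the parent directory for `file_name` and return the path as str.

    Raises FileExistsError when the file already exists and overwrite is False.
    Works for bare file names such as 'report.pdf', whose parent is the
    current directory.
    """
    path = Path(file_name)
    if path.exists() and not overwrite:
        raise FileExistsError(f"Refusing to overwrite existing file: {path}")
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)
```

The returned string can be used directly as the local destination, for example `s3_client.download_file(bucket, object_name, prepare_download_path('./downloads/bookings.csv'))`.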

In [14]:
def download_file(bucket, object_name, file_name):
    """
    Download a file from an S3 bucket
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name to download
        file_name (str): Local file path to save
    
    Returns:
        bool: True if download successful, False otherwise
    """
    try:
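        # Create the parent directory if the path has one and it doesn't exist
        # (os.makedirs('') raises for bare file names such as 'report.pdf')
        parent_dir = os.path.dirname(file_name)
        if parent_dir:
            os.makedirs(parent_dir, exist_ok=True)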
        
        s3_client.download_file(bucket, object_name, file_name)
        print(f"SUCCESS: '{bucket}/{object_name}' downloaded to '{file_name}'")
        return True
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '404':
            print(f"ERROR: Object '{object_name}' not found in bucket '{bucket}'")
        else:
            print(f"ERROR: Failed to download file - {e}")
        return False
    except Exception as e:
        print(f"ERROR: Unexpected error - {e}")
        return False

# Example usage:
download_file(BUCKET_NAME, 'raw/bookings.csv', './downloads/bookings.csv')

SUCCESS: 'real-learn-s3/raw/bookings.csv' downloaded to './downloads/bookings.csv'


True

## 8. Upload String Content to S3

Upload string/text data directly to S3 without creating a local file first.

**Use Cases:**
- Store JSON data from API responses
- Save generated text, logs, or reports
- Write configuration files
- Store processed data without local disk I/O

**Advantages:**
- No temporary files needed
- Faster for small text content
- Memory-efficient for string data
- Direct encoding control (UTF-8 by default)

**Example Scenarios:**
```python
# Save JSON data
upload_string('{"status": "ok"}', 'my-bucket', 'config.json')

# Save log entry
upload_string('2024-01-28: System started', 'logs-bucket', 'app.log')
```
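
For the JSON use case, serialising a Python object is usually safer than writing the JSON text by hand. Here is a minimal sketch that builds the arguments for `put_object`, including a `ContentType`; `json_put_kwargs` is a hypothetical helper introduced for illustration.

```python
import json


def json_put_kwargs(data, bucket, object_name):
    """Build put_object() keyword arguments for storing `data` as UTF-8 JSON."""
    body = json.dumps(data, ensure_ascii=False).encode("utf-8")
    return {
        "Bucket": bucket,
        "Key": object_name,
        "Body": body,
        "ContentType": "application/json",
    }
```

It would be used as `s3_client.put_object(**json_put_kwargs({"status": "ok"}, 'my-bucket', 'config.json'))`. Note that `put_object` replaces any existing object under the same key rather than appending to it.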

In [15]:
def upload_string(content, bucket, object_name):
    """
    Upload string content directly to S3
    
    Args:
        content (str): String content to upload
        bucket (str): Bucket name
        object_name (str): S3 object name (key)
    
    Returns:
        bool: True if upload successful, False otherwise
    """
    try:
        s3_client.put_object(
            Bucket=bucket,
            Key=object_name,
            Body=content.encode('utf-8')
        )
        print(f"SUCCESS: String content uploaded to '{bucket}/{object_name}'")
        return True
    except ClientError as e:
        print(f"ERROR: Failed to upload string - {e}")
        return False

# Example usage:
upload_string('Hello from S3!', BUCKET_NAME, 'test/hello.txt')

SUCCESS: String content uploaded to 'real-learn-s3/test/hello.txt'


True

## 9. Read Object Content from S3

Read S3 object content directly into memory without downloading to a file.

**Use Cases:**
- Read configuration files
- Load small datasets for processing
- Retrieve API responses or JSON data
- Stream log files for analysis

**Important Notes:**
- Best for small to medium files (avoid for GB-sized files)
- `get_object` returns the body as bytes; the `read_object` function below decodes it to a UTF-8 string
- Entire file loaded into memory
- More efficient than download + read for small files

**Performance Tips:**
- For large files: Use download_file() or streaming
- For text files: Decode with proper encoding (UTF-8 default)
- For binary files: Use raw bytes without decoding
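
For the streaming option, the `Body` returned by `get_object` is a file-like object whose `read(amt)` method returns at most `amt` bytes. The sketch below reads it in chunks and decodes incrementally, so a multi-byte UTF-8 character split across two chunks is still decoded correctly. `iter_text_chunks` is a hypothetical helper, and the 1 MiB default chunk size is an arbitrary assumption.

```python
import codecs


def iter_text_chunks(body, chunk_size=1024 * 1024, encoding="utf-8"):
    """Yield decoded text pieces from a file-like `body` without loading it all.

    `body` only needs a read(n) method returning bytes, such as the
    StreamingBody found under response['Body'].
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

A caller would obtain the body with `s3_client.get_object(Bucket=bucket, Key=object_name)['Body']` and process each yielded piece as it arrives.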

In [16]:
def read_object(bucket, object_name):
    """
    Read S3 object content directly into memory
    
    Args:
        bucket (str): Bucket name
        object_name (str): S3 object name to read
    
    Returns:
        str: File content as string, or None if error
    """
    try:
        response = s3_client.get_object(Bucket=bucket, Key=object_name)
        content = response['Body'].read().decode('utf-8')
        print(f"SUCCESS: Read {len(content)} characters from '{bucket}/{object_name}'")
        return content
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == 'NoSuchKey':
            print(f"ERROR: Object '{object_name}' not found in bucket '{bucket}'")
        else:
            print(f"ERROR: Failed to read object - {e}")
        return None
    except Exception as e:
        print(f"ERROR: Unexpected error - {e}")
        return None

# Example usage:
content = read_object(BUCKET_NAME, 'test/hello.txt')
if content is not None:  # an empty object is still a successful read
    print(f"Content: {content}")

SUCCESS: Read 14 characters from 'real-learn-s3/test/hello.txt'
Content: Hello from S3!
