# S3 File Versioning

**Duration:** 30 minutes  
**Level:** Intermediate

**Note:** Versioning requires a real S3 bucket. The runnable cells use the in-memory backend, which doesn't support versioning; S3-specific calls are shown as comments or printed examples.

## What You'll Learn

- S3 versioning capabilities
- Accessing previous versions
- Version history and metadata
- Comparing versions
- Compacting redundant versions
- Avoiding duplicate versions

## What is Versioning?

Once enabled on a bucket, S3 versioning preserves every version of an object. Each time you overwrite a file:
- The old version is kept
- A new version becomes the latest
- You can read any past version
- You can roll back changes

**Important:** Versioning must be enabled on the S3 bucket first!
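Versioning is a bucket-level setting. If you manage the bucket yourself, one way to turn it on is boto3's `put_bucket_versioning`. This is a minimal sketch that assumes boto3 is installed and AWS credentials are configured; `enable_versioning` and its `client` parameter are hypothetical helpers, not part of genro_storage.

```python
def enable_versioning(bucket, client=None):
    """Enable S3 versioning on `bucket` and return the status S3 reports."""
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub client
        client = boto3.client("s3")
    client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # 'Status' is absent from the response if versioning was never enabled
    return client.get_bucket_versioning(Bucket=bucket).get("Status")
```

Keep in mind that once versioning has been enabled, S3 only lets you suspend it, not return the bucket to the unversioned state.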

Let's explore! ⏰

In [None]:
from genro_storage import StorageManager

# Note: This example uses the memory backend for demonstration
# In real usage, configure with S3:
# storage.configure([{
#     'name': 's3',
#     'type': 's3',
#     'bucket': 'my-bucket',
#     'region': 'us-east-1'
# }])

storage = StorageManager()
storage.configure([{'name': 'mem', 'type': 'memory'}])

print("✓ Storage ready")
print("\nNote: The memory backend doesn't support versioning.")
print("This notebook shows the API; S3 examples require real S3 access.")

## 1. Checking Versioning Capability

First, check whether the backend supports versioning:

In [None]:
# Check capabilities
node = storage.node('mem:test.txt')
caps = node.capabilities

print("Backend capabilities:")
print(f"  Versioning: {caps.versioning}")
print(f"  Version listing: {caps.version_listing}")
print(f"  Version access: {caps.version_access}")

# For S3 with versioning enabled:
# s3_node = storage.node('s3:document.pdf')
# if s3_node.capabilities.versioning:
#     print("✓ Versioning available!")

## 2. Version Count

Check how many versions exist:

In [None]:
# The memory backend doesn't support versioning, so this reports 0
node = storage.node('mem:file.txt')
node.write_text('Version 1')

print(f"Version count: {node.version_count}")

# For S3:
# s3_file = storage.node('s3:config.json')
# s3_file.write_text('{"v": 1}')
# s3_file.write_text('{"v": 2}')
# s3_file.write_text('{"v": 3}')
# print(f"S3 versions: {s3_file.version_count}")  # Would show 3+
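`version_count` hides how the count is obtained. For reference, here is one way to count the versions of a single key directly with boto3's `list_object_versions` paginator; it is an illustration, not necessarily how genro_storage implements `version_count`, and `count_versions` is a hypothetical helper. Because `Prefix` matches every key that starts with the given string (`config.json` also matches `config.json.bak`), the loop keeps only exact key matches.

```python
def count_versions(client, bucket, key):
    """Count stored versions of exactly `key` (delete markers are not counted)."""
    paginator = client.get_paginator("list_object_versions")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # A page may lack 'Versions'; delete markers are listed under 'DeleteMarkers'
        total += sum(1 for v in page.get("Versions", []) if v["Key"] == key)
    return total
```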

## 3. Listing Versions (S3 Only)

Get metadata for all versions:

In [None]:
# Example S3 version listing:
example_versions = [
    {
        'version_id': 'abc123...',
        'last_modified': '2024-01-15T10:30:00Z',
        'size': 1024,
        'etag': 'a1b2c3...',
        'is_latest': False
    },
    {
        'version_id': 'def456...',
        'last_modified': '2024-01-15T14:20:00Z',
        'size': 2048,
        'etag': 'd4e5f6...',
        'is_latest': True
    }
]

print("Example version list:")
n = len(example_versions)  # list is ordered oldest first, so the last entry is -1
for i, v in enumerate(example_versions):
    marker = "⭐" if v['is_latest'] else "  "
    print(f"{marker} Version {i - n}: {v['last_modified']} ({v['size']} bytes)")

# Real S3 usage:
# for v in s3_file.versions:
#     print(f"Version: {v['version_id']}, Modified: {v['last_modified']}")
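The labels above come from each version's position in time, not from the order the listing happens to arrive in. A minimal sketch that makes the mapping explicit, assuming (as in `example_versions`) that each entry carries an ISO-8601 UTC `last_modified` string: sort oldest first, then the entry at position `i` of `n` gets index `i - n`, so the newest is `-1`. `label_versions` and the sample IDs are illustrative only.

```python
def label_versions(versions):
    """Return (negative_index, version) pairs, oldest first; -1 is the newest."""
    # ISO-8601 UTC timestamps in one fixed format sort correctly as plain strings
    ordered = sorted(versions, key=lambda v: v["last_modified"])
    n = len(ordered)
    return [(i - n, v) for i, v in enumerate(ordered)]

# Illustrative data, deliberately listed newest first
versions = [
    {"version_id": "def456", "last_modified": "2024-01-15T14:20:00Z", "is_latest": True},
    {"version_id": "abc123", "last_modified": "2024-01-15T10:30:00Z", "is_latest": False},
]
for idx, v in label_versions(versions):
    print(idx, v["version_id"], "(latest)" if v["is_latest"] else "")
```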

## 4. Accessing Previous Versions

Read old versions by index (negative values count back from the latest, `0` is the oldest) or by version ID:

In [None]:
# Example S3 version access:
print("Example version access:")
print("\n# Read current version (latest)")
print("with s3_file.open() as f:")
print("    current = f.read()")

print("\n# Read previous version")
print("with s3_file.open(version=-2) as f:")
print("    previous = f.read()")

print("\n# Read oldest version")
print("with s3_file.open(version=0) as f:")
print("    original = f.read()")

print("\n# Read by version ID")
print("with s3_file.open(version='abc123...') as f:")
print("    specific = f.read()")
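All three selectors above (a negative index, `0` for the oldest, or a version ID string) can be resolved against a version list ordered oldest first. This is a minimal sketch of that lookup using a hypothetical `resolve_version` helper and the dict shape from Section 3; genro_storage's own resolution logic may differ.

```python
def resolve_version(versions, spec):
    """Pick one entry from `versions` (ordered oldest first).

    spec: int -> Python-style index (0 = oldest, -1 = latest, -2 = previous)
          str -> exact version_id
    """
    if isinstance(spec, int):
        try:
            return versions[spec]
        except IndexError:
            raise LookupError(f"no version at index {spec}") from None
    for v in versions:
        if v["version_id"] == spec:
            return v
    raise LookupError(f"unknown version id {spec!r}")

history = [{"version_id": "a"}, {"version_id": "b"}, {"version_id": "c"}]
print(resolve_version(history, 0)["version_id"])    # a (oldest)
print(resolve_version(history, -2)["version_id"])   # b (previous)
print(resolve_version(history, "c")["version_id"])  # c (by ID)
```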

## 5. Time Travel with as_of

Access the version that was current at a specific date and time:

In [None]:
from datetime import datetime, timedelta

# Example time travel:
print("Example time-based access:")
print("\n# How was the file yesterday?")
print("yesterday = datetime.now() - timedelta(days=1)")
print("with s3_file.open(as_of=yesterday) as f:")
print("    old_content = f.read()")

print("\n# Version from specific date")
print("specific_date = datetime(2024, 1, 15, 10, 30)")
print("with s3_file.open(as_of=specific_date) as f:")
print("    content = f.read()")
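Time travel amounts to choosing the newest version whose `last_modified` is at or before the requested moment. A sketch under the assumption that timestamps are UTC strings in the format shown in Section 3; `parse_ts` and `resolve_as_of` are hypothetical helpers. Python refuses to compare naive and timezone-aware datetimes (it raises `TypeError`), so this sketch expects an aware datetime such as `datetime.now(timezone.utc)` rather than the naive `datetime.now()` used above.

```python
from datetime import datetime, timezone

def parse_ts(s):
    """Parse '2024-01-15T10:30:00Z' into an aware UTC datetime."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def resolve_as_of(versions, when):
    """Return the version that was current at `when` (an aware datetime)."""
    candidates = [v for v in versions if parse_ts(v["last_modified"]) <= when]
    if not candidates:
        raise LookupError(f"file did not exist at {when.isoformat()}")
    return max(candidates, key=lambda v: parse_ts(v["last_modified"]))

versions = [
    {"version_id": "abc", "last_modified": "2024-01-15T10:30:00Z"},
    {"version_id": "def", "last_modified": "2024-01-15T14:20:00Z"},
]
noon = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(resolve_as_of(versions, noon)["version_id"])  # abc
```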

## 6. Comparing Versions with diffnode

Compare different versions:

In [None]:
# Create example versions
v1 = storage.node('mem:config_v1.txt')
v1.write_text('timeout: 30\nretries: 3\n')

v2 = storage.node('mem:config_v2.txt')
v2.write_text('timeout: 60\nretries: 5\n')

# Compare
diff = storage.diffnode(v1, v2)
print("Changes between versions:")
print(diff.read_text())

# In S3, you would create version-specific nodes:
# current = storage.node('s3:config.txt')
# previous = storage.node('s3:config.txt', version=-2)
# diff = storage.diffnode(previous, current)
# print(diff.read_text())
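The exact output format of `diffnode` isn't shown here. If you want a comparable line-by-line diff using only the standard library, `difflib.unified_diff` works on the same two config strings; treat it as a stand-in, not a description of what `diffnode` produces.

```python
import difflib

old = "timeout: 30\nretries: 3\n"
new = "timeout: 60\nretries: 5\n"

# unified_diff expects sequences of lines; keepends=True preserves the newlines
diff = difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="config_v1.txt",
    tofile="config_v2.txt",
)
diff_text = "".join(diff)
print(diff_text)
```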

## 7. Avoiding Duplicate Versions

Use `skip_if_unchanged` to prevent unnecessary versions:

In [None]:
# Example usage:
print("Example: Avoiding duplicate versions")
print("\n# Write the same content the file already holds")
print("s3_file.write_text('config data', skip_if_unchanged=True)")
print("# Returns False - no new version created")

print("\n# Write different content")
print("s3_file.write_text('new config', skip_if_unchanged=True)")
print("# Returns True - new version created")

print("\n✓ Saves storage costs by avoiding duplicate versions!")
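One plausible way to implement a check like `skip_if_unchanged` is to hash the new content and compare it with the hash of what's already stored. The sketch below compares MD5 digests, which for S3 objects uploaded in a single part (and not encrypted with SSE-KMS or SSE-C) match the ETag once its surrounding quotes are stripped. Multipart ETags contain a `-` and are not a plain MD5, so the sketch conservatively treats them as "changed". `should_write` is hypothetical, not genro_storage's code.

```python
import hashlib

def should_write(new_content: bytes, current_etag):
    """Return True if writing `new_content` would create a new, different version."""
    if current_etag is None:      # object doesn't exist yet
        return True
    etag = current_etag.strip('"')
    if "-" in etag:               # multipart ETag: not an MD5 of the content
        return True               # can't tell cheaply, so assume it changed
    return hashlib.md5(new_content).hexdigest() != etag

stored = '"' + hashlib.md5(b"config data").hexdigest() + '"'
print(should_write(b"config data", stored))  # False: identical, skip
print(should_write(b"new config", stored))   # True: content differs
```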

## 8. Compacting Versions

Remove consecutive duplicate versions:

In [None]:
# Example compaction:
print("Example: Version compaction")
print("\nBefore compaction:")
print("  v1: content A (etag: xxx)")
print("  v2: content A (etag: xxx)  ← duplicate")
print("  v3: content B (etag: yyy)")
print("  v4: content B (etag: yyy)  ← duplicate")
print("  v5: content A (etag: xxx)  ← kept (not consecutive)")

print("\n# Dry run to see what would be removed")
print("count = s3_file.compact_versions(dry_run=True)")
print("print(f'Would remove {count} versions')")

print("\n# Actually compact")
print("removed = s3_file.compact_versions()")
print("print(f'Removed {removed} duplicate versions')")

print("\nAfter compaction:")
print("  v1: content A (etag: xxx)")
print("  v3: content B (etag: yyy)")
print("  v5: content A (etag: xxx)")
print("\n✓ Space saved, history preserved!")
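The rule illustrated above (drop a version whose ETag equals that of the version immediately before it) takes only a few lines. This sketch computes which versions a compaction would delete, given a list ordered oldest first; `plan_compaction` is hypothetical. Actually deleting them would then mean removing each returned version ID, for example with boto3's `delete_object(Bucket=..., Key=..., VersionId=...)`.

```python
def plan_compaction(versions):
    """Return version_ids whose etag matches the immediately preceding version."""
    to_remove = []
    previous_etag = None
    for v in versions:                  # oldest -> newest
        if v["etag"] == previous_etag:
            to_remove.append(v["version_id"])
        previous_etag = v["etag"]
    return to_remove

history = [
    {"version_id": "v1", "etag": "xxx"},
    {"version_id": "v2", "etag": "xxx"},
    {"version_id": "v3", "etag": "yyy"},
    {"version_id": "v4", "etag": "yyy"},
    {"version_id": "v5", "etag": "xxx"},
]
print(plan_compaction(history))  # ['v2', 'v4']
```

Comparing only adjacent versions is what keeps v5: its content matches v1, but removing it would erase the fact that the file changed back to content A.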

## 9. Practical: Config Version Tracking

Complete example for tracking config changes:

In [None]:
class ConfigVersionTracker:
    """Track configuration changes with versions"""
    
    def __init__(self, storage, config_path):
        self.storage = storage
        self.config_node = storage.node(config_path)
    
    def update_config(self, new_config, description=""):
        """Update config only if changed"""
        changed = self.config_node.write_text(
            new_config,
            skip_if_unchanged=True
        )
        
        if changed:
            print(f"✓ Config updated: {description}")
            return True
        else:
            print("⊘ Config unchanged, no new version")
            return False
    
    def get_history(self):
        """Get version history"""
        if not self.config_node.capabilities.version_listing:
            return []
        
        return self.config_node.versions
    
    def rollback(self, version=-2):
        """Rollback to previous version"""
        if not self.config_node.capabilities.version_access:
            raise ValueError("Backend doesn't support version access")
        
        # Read old version
        with self.config_node.open(version=version) as f:
            old_content = f.read()
        
        # Write as new version
        self.config_node.write_text(old_content)
        print(f"✓ Rolled back to version {version}")

# Example usage (would work with S3):
# tracker = ConfigVersionTracker(storage, 's3:app/config.json')
# tracker.update_config('{"timeout": 30}', "Initial config")
# tracker.update_config('{"timeout": 60}', "Increase timeout")
# tracker.rollback()  # Undo last change

print("ConfigVersionTracker class defined")
print("✓ Ready for S3 version tracking")

## 10. Practical: Automated Backup with Cleanup

Back up a file, then remove consecutive duplicate versions from the backup:

In [None]:
def backup_with_cleanup(source_node, backup_node):
    """
    Back up a file and clean up duplicate versions.
    Only for S3 with versioning enabled.
    """
    # Copy to backup
    source_node.copy(backup_node)
    print(f"✓ Backed up: {source_node.basename}")
    
    # Check if versioning available
    if not backup_node.capabilities.versioning:
        print("  ⚠ Versioning not available, skipping cleanup")
        return
    
    # Compact versions
    removed = backup_node.compact_versions()
    if removed > 0:
        print(f"  ✓ Removed {removed} duplicate versions")
    else:
        print("  ✓ No duplicates to remove")

# Example:
# important = storage.node('local:important.txt')
# backup = storage.node('s3:backups/important.txt')
# backup_with_cleanup(important, backup)

print("backup_with_cleanup function defined")
print("✓ Ready for automated backups with cleanup")

## 11. Try It Yourself! 🎯

**Exercise 1:** Write a function that compares the current version with the version from N days ago:

In [None]:
def compare_with_past(node, days_ago=7):
    """
    Compare current version with version from N days ago.
    Return the diff as string.
    """
    # Your code here
    pass

**Exercise 2:** Implement version statistics:

In [None]:
def version_statistics(node):
    """
    Return statistics about versions:
    - Total versions
    - Total size
    - Average size
    - Oldest/newest timestamps
    """
    # Your code here
    pass

**Exercise 3:** Auto-cleanup old versions:

In [None]:
def cleanup_old_versions(node, keep_count=10):
    """
    Keep only the N most recent versions.
    Delete older versions.
    """
    # Your code here
    pass

## Summary

You've learned how to work with S3 versioning:

- ✓ Checking versioning capabilities
- ✓ Accessing previous versions (index, ID, date)
- ✓ Listing version history
- ✓ Comparing versions with diffnode
- ✓ Avoiding duplicate versions
- ✓ Compacting redundant versions
- ✓ Practical version tracking patterns

## Key Concepts

- **Versioning** must be enabled on S3 bucket
- **Every write** creates a new version, unless `skip_if_unchanged` detects identical content
- **Indexing**: -1 (latest), -2 (previous), etc.; 0 is the oldest
- **Version IDs** are unique identifiers
- **as_of** allows time-based access
- **Compaction** removes consecutive duplicates
- **skip_if_unchanged** prevents duplicates

## Best Practices

✅ **Do:**
- Use `skip_if_unchanged` for frequent writes
- Compact versions periodically
- Set lifecycle policies on bucket
- Monitor version count and costs

❌ **Don't:**
- Enable versioning for temporary/cache data
- Forget about storage costs
- Try to modify old versions (S3 object versions are immutable; roll back by writing old content as a new version)
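For the lifecycle-policy advice, S3 can expire noncurrent versions automatically. A sketch that builds a rule for boto3's `put_bucket_lifecycle_configuration`; the helper names, the rule ID, the 30-day window and keeping 10 noncurrent versions are example values, not recommendations from this notebook.

```python
def noncurrent_expiry_rule(days=30, keep_newer=10, prefix=""):
    """Lifecycle rule: expire noncurrent versions after `days`,
    always retaining the `keep_newer` most recent noncurrent versions."""
    return {
        "ID": "expire-old-versions",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "NoncurrentVersionExpiration": {
            "NoncurrentDays": days,
            "NewerNoncurrentVersions": keep_newer,
        },
    }

def apply_lifecycle(bucket, rules, client=None):
    if client is None:
        import boto3  # lazy import: only needed when talking to real S3
        client = boto3.client("s3")
    # This call REPLACES the bucket's whole lifecycle configuration,
    # so include any existing rules you want to keep in `rules`.
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )
```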

## What's Next?

Continue to:

- **[07_advanced_features.ipynb](07_advanced_features.ipynb)** - Advanced integrations
- **[08_real_world_examples.ipynb](08_real_world_examples.ipynb)** - Complete use cases

Happy versioning! 🕐