# Advanced Features and Integrations

**Duration:** 35 minutes  
**Level:** Advanced

Explore advanced features for tool integration and web serving.

## What You'll Learn

- `local_path()` - Work with external tools
- `call()` - Execute commands with automatic file handling
- `serve()` - Serve files through web frameworks
- Cloud metadata
- Presigned URLs
- Base64 encoding
- MIME type detection
- MD5 hashing
- URL downloads with `fill_from_url()`

Let's dive into advanced territory! 🚀

In [None]:
from genro_storage import StorageManager
import tempfile
import os

storage = StorageManager()
temp_dir = tempfile.mkdtemp()

storage.configure([
    {'name': 'mem', 'type': 'memory'},
    {'name': 'local', 'type': 'local', 'path': temp_dir}
])

print("✓ Storage ready")

## 1. local_path() Context Manager

Get a real filesystem path, so that tools which only accept file paths can work with files on any backend:

In [None]:
# Create a file in memory
mem_file = storage.node('mem:data.txt')
mem_file.write_text('Hello from memory storage')

# Get local path (downloads if needed)
with mem_file.local_path(mode='r') as local_path:
    print(f"Temporary local path: {local_path}")
    print(f"File exists: {os.path.exists(local_path)}")
    
    # Use with any tool that needs a file path
    with open(local_path, 'r') as f:
        content = f.read()
        print(f"Content: {content}")

print("\n✓ Temp file automatically cleaned up after context")
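
Conceptually, `local_path(mode='r')` has to copy the bytes somewhere a path-based tool can reach and delete that copy afterwards. The sketch below shows this pattern with only the standard library. `materialized_path` is a hypothetical helper, not part of genro_storage, and the library's real implementation may differ.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def materialized_path(data: bytes, suffix: str = ""):
    """Hypothetical helper: expose in-memory bytes as a temporary file path."""
    # delete=False so the file can be reopened by path (required on Windows)
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            tmp.write(data)
        yield tmp.name
    finally:
        # Remove the copy even if the with-block raised
        os.remove(tmp.name)


with materialized_path(b"Hello from memory storage", suffix=".txt") as path:
    with open(path, "rb") as f:
        print(f.read().decode())
```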

## 2. local_path() Write Mode

Write to a temporary local file; its content is uploaded to storage when the `with` block exits:

In [None]:
# Create a new file
output = storage.node('mem:output.txt')

with output.local_path(mode='w') as local_path:
    print(f"Writing to: {local_path}")
    
    # Write using standard Python
    with open(local_path, 'w') as f:
        f.write('Generated content\n')
        f.write('More lines...\n')

# File is automatically uploaded
print("\n✓ File uploaded to storage")
print(f"Content: {output.read_text()}")
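
Write mode reverses the flow: hand out an empty temporary path, then read the file back when the block exits. In this sketch a plain `dict` stands in for the storage backend, and `write_back_path` is a hypothetical helper. It uploads only when the block finishes without an exception; whether genro_storage's `local_path(mode='w')` behaves the same way on errors is not stated in this notebook.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def write_back_path(store: dict, key: str, suffix: str = ""):
    """Hypothetical helper: yield a temp path, then 'upload' its bytes to store[key]."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the caller reopens the file by path
    try:
        yield path
        # Only reached if the with-block did not raise
        with open(path, "rb") as f:
            store[key] = f.read()
    finally:
        os.remove(path)


backend = {}  # stand-in for a storage backend
with write_back_path(backend, "output.txt") as p:
    with open(p, "w") as f:
        f.write("Generated content\n")
print(backend["output.txt"].decode())
```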

## 3. local_path() Read-Write Mode

Download, modify, and re-upload:

In [None]:
# Create initial file
doc = storage.node('mem:document.txt')
doc.write_text('Line 1\nLine 2\n')

# Modify in place
with doc.local_path(mode='rw') as local_path:
    # Read current content
    with open(local_path, 'r') as f:
        lines = f.readlines()
    
    # Modify
    lines.append('Line 3\n')
    
    # Write back
    with open(local_path, 'w') as f:
        f.writelines(lines)

print("Modified content:")
print(doc.read_text())

## 4. call() Method - Simple Command

Execute an external command, with its `{placeholders}` filled from the nodes passed as keyword arguments:

In [None]:
# Create input file
input_file = storage.node('local:input.txt')
input_file.write_text('apple\nbanana\ncherry\n')

# Create output location
output_file = storage.node('local:sorted.txt')

# Sort using Unix sort command
try:
    input_file.call(
        'sort {input} > {output}',
        input=input_file,
        output=output_file,
        shell=True
    )
    
    print("Sorted content:")
    print(output_file.read_text())
except Exception as e:
    print(f"Note: Sort command may not be available: {e}")
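
One way to implement placeholder substitution like `call()`'s without a shell is to substitute each placeholder into an argument list and pass that list to `subprocess.run`. The sketch below is hypothetical (`run_template` is not a genro_storage function) and uses the current Python interpreter as the external program, so it runs on any platform. Without a shell there is no `>` redirection, so the child writes the output file itself.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_template(args, timeout=30, **paths):
    """Hypothetical helper: replace '{name}' arguments with file paths, then run.

    Passing a list (no shell) means paths with spaces or quotes need no escaping.
    """
    def fill(arg):
        # Whole-argument placeholders only, e.g. "{input}"; other text passes through
        if arg.startswith("{") and arg.endswith("}") and arg[1:-1] in paths:
            return str(paths[arg[1:-1]])
        return arg

    cmd = [fill(str(a)) for a in args]
    # check=True raises CalledProcessError on a non-zero exit code
    return subprocess.run(cmd, check=True, timeout=timeout, capture_output=True, text=True)


# Portable stand-in for `sort input > output`: a tiny Python child process
SORT_SCRIPT = (
    "import sys, pathlib; "
    "src, dst = map(pathlib.Path, sys.argv[1:3]); "
    "dst.write_text(''.join(sorted(src.read_text().splitlines(keepends=True))))"
)

workdir = Path(tempfile.mkdtemp())
src, dst = workdir / "input file.txt", workdir / "sorted.txt"
src.write_text("cherry\napple\nbanana\n")
run_template([sys.executable, "-c", SORT_SCRIPT, "{input}", "{output}"], input=src, output=dst)
print(dst.read_text())
```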

## 5. call() with Multiple Files

Process multiple inputs:

In [None]:
# Create two files
file1 = storage.node('local:file1.txt')
file1.write_text('Content from file 1\n')

file2 = storage.node('local:file2.txt')
file2.write_text('Content from file 2\n')

merged = storage.node('local:merged.txt')

# Concatenate using cat
try:
    file1.call(
        'cat {f1} {f2} > {out}',
        f1=file1,
        f2=file2,
        out=merged,
        shell=True
    )
    
    print("Merged:")
    print(merged.read_text())
except Exception as e:
    print(f"Note: cat command may not be available: {e}")
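
When a shell is genuinely needed (here, for the `>` redirection), every substituted path should be quoted. `shlex.quote` does this for POSIX shells; it is not suitable for Windows `cmd.exe`. The helper `build_shell_command` is hypothetical, and whether genro_storage quotes substituted paths itself is not stated in this notebook.

```python
import shlex


def build_shell_command(template: str, **paths) -> str:
    """Hypothetical helper: fill a shell template with POSIX-quoted paths."""
    quoted = {name: shlex.quote(str(path)) for name, path in paths.items()}
    return template.format(**quoted)


cmd = build_shell_command(
    "cat {f1} {f2} > {out}",
    f1="file1.txt",
    f2="my file 2.txt",         # unquoted, the spaces would split this into three words
    out="merged; echo pwned",   # a hostile name stays one literal word
)
print(cmd)
```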

## 6. MIME Type Detection

The `mimetype` property is inferred from the file extension, so the placeholder contents below don't affect the result:

In [None]:
# Different file types
files = [
    ('image.jpg', 'JPG data'),
    ('document.pdf', 'PDF content'),
    ('video.mp4', 'Video data'),
    ('data.json', '{"key": "value"}'),
    ('style.css', 'body { color: red; }'),
    ('script.js', 'console.log("hi");'),
    ('page.html', '<h1>Hello</h1>'),
    ('archive.zip', 'ZIP data'),
]

print("MIME types:")
for filename, content in files:
    node = storage.node(f'mem:{filename}')
    node.write_text(content)
    print(f"  {filename:20s} -> {node.mimetype}")
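
The standard library's `mimetypes` module performs the same kind of extension-based lookup. Whether genro_storage uses `mimetypes` internally is an assumption, and the `application/octet-stream` fallback is a common convention chosen for this sketch, not documented library behavior.

```python
import mimetypes


def guess_mimetype(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension only; contents are never read."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or default


for name in ["image.jpg", "document.pdf", "page.html", "notes.unknownext"]:
    print(f"{name:20s} -> {guess_mimetype(name)}")
```

Results can vary slightly between platforms, because `mimetypes` also reads system tables such as `/etc/mime.types`.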

## 7. Base64 Encoding

Encode files as base64 or data URIs:

In [None]:
# Create small image-like file
img = storage.node('mem:icon.png')
img.write_bytes(b'\x89PNG\r\n\x1a\n' + b'fake image data')

# Get as data URI (for HTML)
data_uri = img.to_base64(data_uri=True)
print("Data URI (for <img src=...>):")
print(data_uri[:80] + "...")

# Get just base64
b64 = img.to_base64(data_uri=False)
print("\nBase64 only:")
print(b64[:60] + "...")
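
A data URI is just `data:<mime>;base64,<payload>` (RFC 2397). The sketch below builds one with the standard library; `to_data_uri` is a hypothetical name, and it is an assumption that `to_base64(data_uri=True)` produces exactly this layout.

```python
import base64


def to_data_uri(data: bytes, mimetype: str) -> str:
    """Encode bytes as an RFC 2397 data URI."""
    payload = base64.b64encode(data).decode("ascii")  # bytes -> ASCII text
    return f"data:{mimetype};base64,{payload}"


uri = to_data_uri(b"hi", "text/plain")
print(uri)  # data:text/plain;base64,aGk=
```

Base64 output is about a third larger than the input (4 characters per 3 bytes), so data URIs suit small assets such as icons.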

## 8. MD5 Hash

Identify files by content: identical bytes give identical hashes, whatever the filename:

In [None]:
# Create files
file_a = storage.node('mem:a.txt')
file_a.write_text('Same content')

file_b = storage.node('mem:b.txt')
file_b.write_text('Same content')

file_c = storage.node('mem:c.txt')
file_c.write_text('Different content')

print("MD5 hashes:")
print(f"  file_a: {file_a.md5hash}")
print(f"  file_b: {file_b.md5hash}")
print(f"  file_c: {file_c.md5hash}")

print(f"\na and b same: {file_a.md5hash == file_b.md5hash}")
print(f"a and c same: {file_a.md5hash == file_c.md5hash}")

## 9. Cloud Metadata (S3/GCS/Azure)

Attach custom key-value metadata to files on cloud backends:

In [None]:
# Example S3 metadata usage:
print("Example: S3 metadata")
print("""
# Set metadata
s3_file = storage.node('s3:document.pdf')
s3_file.write_bytes(pdf_data)
s3_file.set_metadata({
    'author': 'John Doe',
    'department': 'Engineering',
    'classification': 'internal'
})

# Read metadata
metadata = s3_file.get_metadata()
print(metadata['author'])  # 'John Doe'
""")

# Memory backend doesn't support metadata
mem_file = storage.node('mem:test.txt')
print(f"\nMemory backend metadata support: {mem_file.capabilities.metadata}")
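
On S3, user-defined metadata travels as HTTP headers with an `x-amz-meta-` prefix, and the values are strings. The sketch shows that mapping in plain Python so it runs without AWS credentials. `to_s3_metadata_headers` and `from_s3_metadata_headers` are hypothetical helpers; how genro_storage translates `set_metadata()` for S3, GCS, or Azure is not shown in this notebook.

```python
PREFIX = "x-amz-meta-"


def to_s3_metadata_headers(metadata: dict) -> dict:
    """Hypothetical helper: map user metadata to S3-style request headers."""
    headers = {}
    for key, value in metadata.items():
        if not isinstance(value, str):
            raise TypeError(f"metadata value for {key!r} must be str, got {type(value).__name__}")
        # HTTP header names are case-insensitive; normalize to lowercase
        headers[PREFIX + key.lower()] = value
    return headers


def from_s3_metadata_headers(headers: dict) -> dict:
    """Hypothetical helper: recover user metadata from response headers."""
    return {
        name.lower()[len(PREFIX):]: value
        for name, value in headers.items()
        if name.lower().startswith(PREFIX)
    }


headers = to_s3_metadata_headers({"author": "John Doe", "department": "Engineering"})
print(headers)
```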

## 10. Presigned URLs (S3/GCS)

Generate temporary download links:

In [None]:
# Example S3 presigned URL:
print("Example: S3 presigned URL")
print("""
# Generate 1-hour link
s3_file = storage.node('s3:private/document.pdf')
url = s3_file.url(expires_in=3600)  # 1 hour
print(url)
# https://bucket.s3.amazonaws.com/private/document.pdf?...

# Share with user (send_email stands in for your own mail helper)
send_email(user, f'Download: {url}')
""")

# Memory backend doesn't support URLs
mem_file = storage.node('mem:test.txt')
print(f"\nMemory backend URL support: {mem_file.capabilities.presigned_urls}")
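
Presigned URLs work by embedding an expiry time and a signature in the query string, so the storage service can verify a link without a login. The sketch below illustrates the idea with a generic HMAC-SHA256 scheme; it is **not** AWS Signature Version 4 or GCS V4 signing, and `sign_url`, `verify_url`, and the secret are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key; real services keep this server-side


def _signature(path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, expires_in: int, now=None) -> str:
    """Return base+path with an expiry timestamp and HMAC signature appended."""
    expires = int(now if now is not None else time.time()) + expires_in
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify_url(url: str, now=None) -> bool:
    """Check that the signature matches and the link has not expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    expected = _signature(parts.path, expires)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(signature, expected) and current < expires


url = sign_url("https://files.example.com", "/private/document.pdf", expires_in=3600)
print(url)
print(verify_url(url))  # True
```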

## 11. serve() for Web Frameworks

Stream stored files as WSGI responses, for example from a Flask view:

In [None]:
# Example Flask integration:
print("Example: Flask file serving")
print("""
from flask import Flask
app = Flask(__name__)

@app.route('/files/<path:filename>')
def serve_file(filename):
    node = storage.node(f's3:files/{filename}')
    return node.serve(
        mimetype='auto',          # Auto-detect MIME type
        as_attachment=False,      # Display inline
        cache_timeout=3600,       # Cache for 1 hour
        add_etags=True,           # Enable ETag caching
        conditional=True          # Support If-Modified-Since
    )

@app.route('/download/<path:filename>')
def download_file(filename):
    node = storage.node(f's3:files/{filename}')
    return node.serve(
        as_attachment=True,
        attachment_filename='document.pdf'
    )
""")

print("\n✓ Efficient streaming with ETag support")
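
ETag support boils down to a simple exchange: the server sends an `ETag` header, the browser echoes it back in `If-None-Match`, and the server replies `304 Not Modified` with no body if nothing changed. This minimal WSGI sketch shows that exchange with the standard library only. `make_file_app` and `fake_request` are hypothetical, deriving the ETag from an MD5 of the content is an assumption of this sketch, and `serve()` may do it differently.

```python
import hashlib


def make_file_app(data: bytes, mimetype: str, max_age: int = 3600):
    """Hypothetical helper: a WSGI app serving one in-memory file with ETag support."""
    etag = '"' + hashlib.md5(data).hexdigest() + '"'  # ETag values are quoted strings

    def app(environ, start_response):
        cache_headers = [("ETag", etag), ("Cache-Control", f"max-age={max_age}")]
        # Simplification: exact match only (no lists or weak validators)
        if environ.get("HTTP_IF_NONE_MATCH") == etag:
            start_response("304 Not Modified", cache_headers)
            return []
        start_response(
            "200 OK",
            [("Content-Type", mimetype), ("Content-Length", str(len(data)))] + cache_headers,
        )
        return [data]

    return app


def fake_request(app, **environ):
    """Call a WSGI app directly and collect (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


app = make_file_app(b"<h1>Hello</h1>", "text/html")
status, headers, body = fake_request(app)
print(status, headers["ETag"])
status2, _, body2 = fake_request(app, HTTP_IF_NONE_MATCH=headers["ETag"])
print(status2, body2)  # 304 Not Modified b''
```

A production server would also stream the body in chunks rather than returning it whole.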

## 12. fill_from_url() - Download from Web

Download content from a URL and save it to a storage node:

In [None]:
# Example URL download:
print("Example: Download from URL")
print("""
# Download and save
remote_file = storage.node('s3:downloads/readme.md')
remote_file.fill_from_url(
    'https://raw.githubusercontent.com/genropy/genro-storage/main/README.md',
    timeout=30
)

print(f"Downloaded: {remote_file.size} bytes")
""")

print("\n✓ Useful for fetching remote resources")
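
A comparable download can be sketched with the standard library: open the URL with a timeout and copy the response into a file object in chunks, so a large download never sits fully in memory. `download_to` is hypothetical and makes no claim about how `fill_from_url()` is implemented. The demo reads a local `file://` URL so it runs offline.

```python
import io
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to(url: str, dest, timeout: float = 30, chunk_size: int = 64 * 1024) -> int:
    """Hypothetical helper: stream url into the binary file object dest; return bytes copied."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        before = dest.tell()
        shutil.copyfileobj(response, dest, chunk_size)
        return dest.tell() - before


# Offline demo: read a temporary file through a file:// URL
src = Path(tempfile.mkdtemp()) / "readme.md"
src.write_bytes(b"# genro-storage\n")
buffer = io.BytesIO()
n = download_to(src.as_uri(), buffer)
print(f"Downloaded: {n} bytes")  # Downloaded: 16 bytes
```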

## 13. Practical: Image Processing Pipeline

Complete example with external tool:

In [None]:
def process_image(source_node, dest_node, width=200):
    """
    Resize image using ImageMagick (if available).
    This is a template - actual implementation needs ImageMagick.
    """
    print(f"Processing: {source_node.basename}")
    
    # Check MIME type
    if not source_node.mimetype.startswith('image/'):
        raise ValueError(f"Not an image: {source_node.mimetype}")
    
    # Option 1: Using call()
    # source_node.call(
    #     'convert {input} -resize {width}x {output}',
    #     input=source_node,
    #     output=dest_node,
    #     width=width
    # )
    
    # Option 2: Using local_path() (requires `import subprocess`)
    # with source_node.local_path(mode='r') as src_path:
    #     with dest_node.local_path(mode='w') as dst_path:
    #         subprocess.run([
    #             'convert', src_path,
    #             '-resize', f'{width}x',
    #             dst_path
    #         ], check=True)
    
    print(f"✓ Would resize to {width}px width")
    print(f"✓ Output: {dest_node.fullpath}")

# Example usage:
# original = storage.node('s3:photos/vacation.jpg')
# thumbnail = storage.node('s3:thumbnails/vacation_thumb.jpg')
# process_image(original, thumbnail, width=200)

print("Image processing pipeline defined")
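
Before shelling out, it helps to check that the tool exists and to fail with a clear message. This sketch uses `shutil.which` and assumes ImageMagick 7's `magick` executable or the older `convert` command; `magick` is tried first because on Windows `convert` can resolve to an unrelated system utility. The helper names are hypothetical.

```python
import shutil
import subprocess


def find_imagemagick():
    """Return the ImageMagick executable path, preferring IM7's `magick`, or None."""
    return shutil.which("magick") or shutil.which("convert")


def build_resize_cmd(tool: str, src: str, dst: str, width: int) -> list:
    """Build an argument list that resizes src to the given width, keeping aspect ratio."""
    return [tool, src, "-resize", f"{width}x", dst]


def resize(src: str, dst: str, width: int = 200, timeout: float = 60) -> None:
    tool = find_imagemagick()
    if tool is None:
        raise RuntimeError("ImageMagick not found on PATH (looked for 'magick' and 'convert')")
    subprocess.run(build_resize_cmd(tool, src, dst, width), check=True, timeout=timeout)


print(build_resize_cmd("magick", "vacation.jpg", "vacation_thumb.jpg", 200))
```

With genro_storage, `src` and `dst` would be the paths yielded by `local_path()` in Option 2.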

## 14. Practical: Document Converter

Convert between formats:

In [None]:
def convert_markdown_to_pdf(md_node, pdf_node):
    """
    Convert Markdown to PDF using pandoc (if available).
    """
    print(f"Converting: {md_node.basename} -> {pdf_node.basename}")
    
    # Using call() with pandoc:
    # md_node.call(
    #     'pandoc {input} -o {output} --pdf-engine=xelatex',
    #     input=md_node,
    #     output=pdf_node,
    #     timeout=60
    # )
    
    print("✓ Would convert MD -> PDF")
    print(f"Input: {md_node.fullpath}")
    print(f"Output: {pdf_node.fullpath}")

# Example:
# markdown = storage.node('local:README.md')
# pdf = storage.node('s3:docs/README.pdf')
# convert_markdown_to_pdf(markdown, pdf)

print("Document converter defined")
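
Converters such as pandoc can run long or hang, which is why the commented `call()` example passes `timeout=60`. With `subprocess.run`, an expired timeout kills the child and raises `subprocess.TimeoutExpired`. The sketch turns both timeouts and non-zero exits into clear errors; `run_converter` is hypothetical and uses Python child processes in place of pandoc.

```python
import subprocess
import sys


def run_converter(args, timeout: float) -> str:
    """Hypothetical helper: run a converter, turning failures into clear errors."""
    try:
        result = subprocess.run(args, check=True, timeout=timeout, capture_output=True, text=True)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child at this point
        raise RuntimeError(f"{args[0]} did not finish within {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"{args[0]} failed with exit code {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    return result.stdout


print(run_converter([sys.executable, "-c", "print('converted')"], timeout=10))
```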

## 15. Try It Yourself! 🎯

**Exercise 1:** Create a thumbnail generator:

In [None]:
def generate_thumbnails(image_dir, thumb_dir, sizes=(100, 200, 400)):
    """
    Generate multiple thumbnail sizes for all images.
    """
    # Your code here
    pass

**Exercise 2:** Video processor with metadata:

In [None]:
def process_video(video_node, output_node):
    """
    Process video and set metadata:
    - Convert to H.264
    - Extract duration
    - Set metadata with codec, duration, size
    """
    # Your code here
    pass

**Exercise 3:** Smart cache with expiry:

In [None]:
def cached_download(url, cache_dir, max_age_hours=24):
    """
    Download URL and cache locally.
    Reuse cache if less than max_age_hours old.
    Return the cached node.
    """
    # Your code here
    pass

## 16. Cleanup

In [None]:
import shutil

if os.path.exists(temp_dir):
    shutil.rmtree(temp_dir)

print("✓ Cleanup complete")

## Summary

You've mastered advanced features:

- ✓ `local_path()` for external tools
- ✓ `call()` for command execution
- ✓ `serve()` for web frameworks
- ✓ MIME type detection
- ✓ Base64 encoding
- ✓ MD5 hashing
- ✓ Cloud metadata (S3/GCS/Azure)
- ✓ Presigned URLs
- ✓ URL downloads

## Key Methods

**Tool Integration:**
- `local_path(mode='r'|'w'|'rw')` - Get filesystem path
- `call(command, **kwargs)` - Execute external command

**Web Integration:**
- `serve(**kwargs)` - Stream via WSGI
- `url(expires_in=3600)` - Presigned URL
- `fill_from_url(url)` - Download from web

**Utilities:**
- `mimetype` - MIME type detection
- `md5hash` - Content hash
- `to_base64()` - Base64 encoding
- `get_metadata()` / `set_metadata()` - Cloud metadata

## Best Practices

✅ **Do:**
- Use `call()` for simple command substitution
- Use `local_path()` for complex workflows
- Set timeouts on external commands
- Handle errors from external tools
- Use presigned URLs so clients download large files directly from the cloud provider

❌ **Don't:**
- Use `shell=True` with untrusted input (shell injection risk)
- Load large files into memory (use streaming)
- Forget to set cache headers for web serving

## What's Next?

Continue to:

- **[08_real_world_examples.ipynb](08_real_world_examples.ipynb)** - Complete real-world use cases

You're almost done! 🎉