# Customer.IO Batch Operations

This notebook demonstrates how to use batch operations in Customer.IO for high-volume data processing.
A batch bundles multiple operations into a single API request, which cuts the number of HTTP round trips during bulk imports.

## Setup and Imports

Import the necessary functions and initialize the Customer.IO client.

In [ ]:
# Setup and imports
import os
import json
from datetime import datetime
from utils.api_client import CustomerIOClient
from utils.people_manager import identify_user
from utils.batch_manager import (
    send_batch,
    create_batch_operations,
    validate_batch_size,
    split_oversized_batch,
    MAX_BATCH_SIZE_BYTES,
    MAX_OPERATION_SIZE_BYTES
)
from utils.exceptions import CustomerIOError, ValidationError

# Initialize client
API_KEY = os.getenv('CUSTOMERIO_API_KEY', 'your-api-key-here')
client = CustomerIOClient(api_key=API_KEY, region='us')

print("Customer.IO batch operations loaded successfully")
print(f"Max batch size: {MAX_BATCH_SIZE_BYTES:,} bytes")
print(f"Max operation size: {MAX_OPERATION_SIZE_BYTES:,} bytes")

## Understanding Batch Operations

Batch operations in Customer.IO:
- Send multiple operations in a single API request
- Support identify, track, group, page, and screen operations
- Maximum batch size: 500KB
- Maximum individual operation size: 32KB
- Improve performance for bulk data imports
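
Both limits apply to the serialized payload, so an operation's size depends on how it is encoded. The sketch below measures one operation as UTF-8 encoded JSON and flags any that exceed the per-operation limit. It assumes the 32KB limit means 32 * 1024 bytes; `operation_size` and `find_oversized` are hypothetical helpers, not part of `utils.batch_manager`.

```python
import json

# Assumed value for illustration only; the notebook imports the real
# constant (MAX_OPERATION_SIZE_BYTES) from utils.batch_manager.
ASSUMED_MAX_OPERATION_SIZE_BYTES = 32 * 1024


def operation_size(operation):
    """Return the size in bytes of one operation serialized as UTF-8 JSON."""
    return len(json.dumps(operation).encode("utf-8"))


def find_oversized(operations, limit=ASSUMED_MAX_OPERATION_SIZE_BYTES):
    """Return the indexes of operations whose serialized size exceeds the limit."""
    return [i for i, op in enumerate(operations) if operation_size(op) > limit]


sample_ops = [
    {"type": "identify", "userId": "u1", "traits": {"name": "Alice"}},
    {"type": "identify", "userId": "u2", "traits": {"bio": "x" * 40_000}},
]
print(find_oversized(sample_ops))  # indexes of operations that are too large
```

Running a check like this before batching makes it possible to trim or skip a single oversized record instead of having the whole batch rejected.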

## Basic Batch Operations

Create and send simple batch operations.

In [None]:
# Example: Batch identify multiple users
users_data = [
    {
        "user_id": "batch_user_001",
        "traits": {
            "email": "user1@example.com",
            "name": "Alice Smith",
            "plan": "free"
        }
    },
    {
        "user_id": "batch_user_002",
        "traits": {
            "email": "user2@example.com",
            "name": "Bob Johnson",
            "plan": "premium"
        }
    },
    {
        "user_id": "batch_user_003",
        "traits": {
            "email": "user3@example.com",
            "name": "Carol Davis",
            "plan": "enterprise"
        }
    }
]

# Create batch operations
operations = create_batch_operations("identify", users_data)
print(f"Created {len(operations)} identify operations")

# Send batch
try:
    result = send_batch(client, operations)
    print("Batch sent successfully")
except CustomerIOError as e:
    print(f"Error sending batch: {e}")

In [None]:
# Example: Batch track multiple events
events_data = [
    {
        "user_id": "batch_user_001",
        "event": "Account Created",
        "properties": {"source": "organic", "referrer": "google"}
    },
    {
        "user_id": "batch_user_001",
        "event": "Tutorial Started",
        "properties": {"tutorial_name": "Getting Started"}
    },
    {
        "user_id": "batch_user_002",
        "event": "Subscription Upgraded",
        "properties": {"old_plan": "free", "new_plan": "premium"}
    }
]

# Create and send event batch
event_operations = create_batch_operations("track", events_data)

try:
    result = send_batch(client, event_operations)
    print(f"Sent {len(event_operations)} events in batch")
except CustomerIOError as e:
    print(f"Error sending events: {e}")

## Mixed Operation Batches

Combine different types of operations in a single batch.

In [None]:
# Create mixed operations manually
mixed_operations = [
    # Identify a user
    {
        "type": "identify",
        "userId": "mixed_user_001",
        "traits": {
            "email": "mixed@example.com",
            "name": "Mixed User"
        }
    },
    # Track an event for the same user
    {
        "type": "track",
        "userId": "mixed_user_001",
        "event": "Onboarding Started",
        "properties": {
            "step": 1,
            "total_steps": 5
        }
    },
    # Add user to a group
    {
        "type": "group",
        "userId": "mixed_user_001",
        "groupId": "company_xyz",
        "traits": {
            "name": "XYZ Corporation",
            "industry": "Technology"
        }
    },
    # Track a page view
    {
        "type": "page",
        "userId": "mixed_user_001",
        "name": "Dashboard",
        "properties": {
            "url": "https://app.example.com/dashboard",
            "title": "User Dashboard"
        }
    }
]

try:
    result = send_batch(client, mixed_operations)
    print(f"Sent mixed batch with {len(mixed_operations)} operations")
except CustomerIOError as e:
    print(f"Error sending mixed batch: {e}")

## Batch Size Validation

Validate batch sizes before sending to avoid errors.

In [None]:
# Example: Check batch size before sending
def check_batch_size(operations):
    """Check and report batch size."""
    batch_json = json.dumps({"batch": operations})
    batch_size = len(batch_json.encode('utf-8'))
    
    print(f"Batch contains {len(operations)} operations")
    print(f"Total batch size: {batch_size:,} bytes")
    print(f"Size limit: {MAX_BATCH_SIZE_BYTES:,} bytes")
    print(f"Usage: {(batch_size / MAX_BATCH_SIZE_BYTES * 100):.1f}%")
    
    return batch_size <= MAX_BATCH_SIZE_BYTES

# Create a larger batch
large_batch_data = [
    {
        "user_id": f"user_{i:04d}",
        "traits": {
            "email": f"user{i}@example.com",
            "name": f"User {i}",
            "created_at": datetime.now().isoformat(),
            "metadata": {"batch_import": True, "source": "csv"}
        }
    }
    for i in range(100)
]

large_operations = create_batch_operations("identify", large_batch_data)

if check_batch_size(large_operations):
    print("\nBatch is within size limits")
else:
    print("\nBatch exceeds size limits!")

In [None]:
# Example: Validate batch using built-in function
try:
    validate_batch_size(large_operations)
    print("Batch validation passed")
except ValidationError as e:
    print(f"Batch validation failed: {e}")

## Handling Large Batches

Split oversized batches into smaller chunks automatically.

In [None]:
# Create a very large dataset that will exceed batch limits
very_large_data = []
for i in range(1000):
    very_large_data.append({
        "user_id": f"bulk_user_{i:05d}",
        "traits": {
            "email": f"bulk{i}@example.com",
            "name": f"Bulk User {i}",
            "description": "A" * 1000,  # Large field to increase size
            "metadata": {
                "import_batch": "2024-01-15",
                "source": "legacy_system",
                "migrated": True
            }
        }
    })

# Create operations
oversized_operations = create_batch_operations("identify", very_large_data)
print(f"Created {len(oversized_operations)} operations")

# Split into smaller batches
batches = split_oversized_batch(oversized_operations)
print(f"\nSplit into {len(batches)} batches:")
for i, batch in enumerate(batches):
    batch_size = len(json.dumps({"batch": batch}).encode('utf-8'))
    print(f"  Batch {i+1}: {len(batch)} operations, {batch_size:,} bytes")
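
This notebook does not show how `split_oversized_batch` works internally. One plausible approach is greedy packing: walk the operations in order and start a new batch whenever adding the next one would push the serialized payload over the limit. The sketch below is an illustration under that assumption; `greedy_split` is a hypothetical function and the 500 * 1024 default is an assumed value for the 500KB limit.

```python
import json


def greedy_split(operations, max_bytes=500 * 1024):
    """Pack operations, in order, into batches that each stay within max_bytes.

    Size is measured on the {"batch": [...]} JSON envelope, the same way
    check_batch_size measures it. An operation that is larger than max_bytes
    on its own still ends up alone in a batch; this sketch does not drop it.
    """
    envelope = len(json.dumps({"batch": []}).encode("utf-8"))
    batches, current, current_size = [], [], envelope
    for op in operations:
        op_size = len(json.dumps(op).encode("utf-8"))
        # json.dumps separates list items with ", " (2 bytes)
        extra = op_size + (2 if current else 0)
        if current and current_size + extra > max_bytes:
            batches.append(current)
            current, current_size = [], envelope
            extra = op_size
        current.append(op)
        current_size += extra
    if current:
        batches.append(current)
    return batches


demo_ops = [{"userId": f"u{i}", "traits": {"pad": "A" * 1000}} for i in range(50)]
demo_batches = greedy_split(demo_ops, max_bytes=10_000)
print([len(b) for b in demo_batches])
```

Greedy packing preserves the original order of operations, which matters when an identify should arrive before the events that reference the same user.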

In [ ]:
# Send split batches
def send_split_batches(client, batches):
    """Send multiple batches with progress tracking."""
    successful = 0
    failed = 0
    
    for i, batch in enumerate(batches):
        try:
            send_batch(client, batch)
            successful += 1
            print(f"Batch {i+1}/{len(batches)} sent successfully")
        except CustomerIOError as e:
            failed += 1
            print(f"Batch {i+1}/{len(batches)} failed: {e}")
    
    print(f"\nSummary: {successful} successful, {failed} failed")
    return successful, failed

# Send the split batches
# Uncomment to actually send (be careful with large volumes!)
# send_split_batches(client, batches[:2])  # Send only first 2 for demo

## Real-World Example: User Import

Complete workflow for importing users from an external system.

In [None]:
class UserImporter:
    """Helper class for bulk user imports."""
    
    def __init__(self, client, batch_size=100):
        self.client = client
        self.batch_size = batch_size
        self.stats = {
            "total": 0,
            "processed": 0,
            "failed": 0,
            "batches_sent": 0
        }
    
    def import_users(self, users_data):
        """Import users in optimized batches."""
        self.stats["total"] = len(users_data)
        
        # Process users in chunks
        for i in range(0, len(users_data), self.batch_size):
            chunk = users_data[i:i + self.batch_size]
            self._process_chunk(chunk)
        
        return self.stats
    
    def _process_chunk(self, chunk):
        """Process a chunk of users."""
        # Create identify operations
        identify_ops = create_batch_operations("identify", [
            {
                "user_id": user["id"],
                "traits": user["attributes"]
            }
            for user in chunk
        ])
        
        # Create initial event operations
        event_ops = create_batch_operations("track", [
            {
                "user_id": user["id"],
                "event": "User Imported",
                "properties": {
                    "import_source": "bulk_import",
                    "import_date": datetime.now().isoformat()
                }
            }
            for user in chunk
        ])
        
        # Combine operations
        all_operations = identify_ops + event_ops
        
        # Validate and potentially split
        try:
            validate_batch_size(all_operations)
            batches = [all_operations]
        except ValidationError:
            batches = split_oversized_batch(all_operations)
        
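        # Send batches
        # Note: len(batch) // 2 assumes each batch holds matching identify + track
        # pairs. That holds for an unsplit chunk, but after split_oversized_batch
        # the per-batch user counts are only approximate.
        for batch in batches:
            try:
                send_batch(self.client, batch)
                self.stats["processed"] += len(batch) // 2
                self.stats["batches_sent"] += 1
            except CustomerIOError as e:
                self.stats["failed"] += len(batch) // 2
                print(f"Batch failed: {e}")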

In [None]:
# Example usage of UserImporter
sample_import_data = [
    {
        "id": f"import_{i:04d}",
        "attributes": {
            "email": f"import{i}@example.com",
            "name": f"Imported User {i}",
            "signup_date": "2024-01-01",
            "plan": "free" if i % 3 == 0 else "premium",
            "source": "salesforce"
        }
    }
    for i in range(250)  # Import 250 users
]

importer = UserImporter(client, batch_size=50)
stats = importer.import_users(sample_import_data)

print("Import Summary:")
print(f"  Total users: {stats['total']}")
print(f"  Processed: {stats['processed']}")
print(f"  Failed: {stats['failed']}")
print(f"  Batches sent: {stats['batches_sent']}")

## Real-World Example: Event Migration

Migrate historical events from another system.

In [ ]:
def migrate_historical_events(client, events_data, events_per_batch=200):
    """Migrate historical events with proper timestamps."""
    
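    # Sort events by timestamp to maintain order. Comparing the strings directly is
    # chronological only if every timestamp uses the same ISO 8601 format and offset.
    sorted_events = sorted(events_data, key=lambda x: x["timestamp"])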
    
    total_batches = (len(sorted_events) + events_per_batch - 1) // events_per_batch
    print(f"Migrating {len(sorted_events)} events in {total_batches} batches")
    
    for batch_num in range(total_batches):
        start_idx = batch_num * events_per_batch
        end_idx = min((batch_num + 1) * events_per_batch, len(sorted_events))
        batch_events = sorted_events[start_idx:end_idx]
        
        # Create track operations with timestamps
        operations = [
            {
                "type": "track",
                "userId": event["user_id"],
                "event": event["event_name"],
                "properties": event["properties"],
                "timestamp": event["timestamp"]
            }
            for event in batch_events
        ]
        
        try:
            send_batch(client, operations)
            print(f"Batch {batch_num + 1}/{total_batches} migrated")
        except CustomerIOError as e:
            print(f"Batch {batch_num + 1} failed: {e}")

# Example historical events
historical_events = [
    {
        "user_id": "hist_user_001",
        "event_name": "Account Created",
        "timestamp": "2023-01-15T10:00:00Z",
        "properties": {"source": "organic"}
    },
    {
        "user_id": "hist_user_001",
        "event_name": "Feature Used",
        "timestamp": "2023-01-16T14:30:00Z",
        "properties": {"feature": "dashboard"}
    },
    {
        "user_id": "hist_user_002",
        "event_name": "Subscription Started",
        "timestamp": "2023-02-01T09:00:00Z",
        "properties": {"plan": "premium", "period": "monthly"}
    }
]

# Migrate the events
# migrate_historical_events(client, historical_events, events_per_batch=10)
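
Historical data often mixes timestamp styles (a `Z` suffix, explicit offsets, or no offset at all), and string sorting only gives chronological order when the format is uniform. A minimal sketch of normalizing timestamps to UTC before migration follows. `to_utc_iso` is a hypothetical helper; treating offset-free timestamps as UTC is an assumption of the sketch, and fractional seconds are dropped.

```python
from datetime import datetime, timezone


def to_utc_iso(timestamp):
    """Parse an ISO 8601 string and return it normalized to UTC with a Z suffix.

    Timestamps without an offset are assumed to be UTC. That is an assumption
    of this sketch, not something the source system guarantees.
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit offset first.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc_iso("2023-01-15T10:00:00Z"))
print(to_utc_iso("2023-01-15T12:00:00+02:00"))
```

Applying a function like this to every event before sorting would make the string comparison in `migrate_historical_events` reliable.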

## Batch Operations with Context

Add context information to all operations in a batch.

In [None]:
# Example: Add context to batch operations
context_operations = [
    {
        "type": "identify",
        "userId": "context_user_001",
        "traits": {"email": "context1@example.com"}
    },
    {
        "type": "track",
        "userId": "context_user_001",
        "event": "Button Clicked",
        "properties": {"button_name": "signup"}
    }
]

# Add context about the import
batch_context = {
    "app": {
        "name": "Data Importer",
        "version": "2.0.0"
    },
    "library": {
        "name": "customer-io-python",
        "version": "1.0.0"
    },
    "import": {
        "source": "legacy_database",
        "import_id": "import_20240115_001",
        "imported_at": datetime.now().isoformat()
    }
}

try:
    result = send_batch(client, context_operations, context=batch_context)
    print("Batch with context sent successfully")
except CustomerIOError as e:
    print(f"Error: {e}")

## Error Handling and Retry Logic

Implement robust error handling for batch operations.

In [None]:
import time

class BatchProcessor:
    """Batch processor with retry logic."""
    
    def __init__(self, client, max_retries=3, retry_delay=1):
        self.client = client
        self.max_retries = max_retries
        self.retry_delay = retry_delay
    
    def send_with_retry(self, operations):
        """Send batch with automatic retry on failure."""
        for attempt in range(self.max_retries):
            try:
                result = send_batch(self.client, operations)
                print(f"Batch sent successfully on attempt {attempt + 1}")
                return result
            except CustomerIOError as e:
                if attempt < self.max_retries - 1:
                    wait_time = self.retry_delay * (2 ** attempt)  # Exponential backoff
                    print(f"Attempt {attempt + 1} failed: {e}")
                    print(f"Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    print(f"All {self.max_retries} attempts failed")
                    raise
    
    def process_large_dataset(self, operations_list, batch_size=100):
        """Process large dataset with batching and retry."""
        results = {
            "successful_batches": 0,
            "failed_batches": 0,
            "total_operations": len(operations_list)
        }
        
        # Process in batches
        for i in range(0, len(operations_list), batch_size):
            batch = operations_list[i:i + batch_size]
            
            try:
                self.send_with_retry(batch)
                results["successful_batches"] += 1
            except CustomerIOError as e:
                results["failed_batches"] += 1
                print(f"Failed to process batch after retries: {e}")
        
        return results

# Example usage
processor = BatchProcessor(client, max_retries=3, retry_delay=0.5)

# Create test operations
test_operations = [
    {
        "type": "identify",
        "userId": f"retry_user_{i:03d}",
        "traits": {"test": True}
    }
    for i in range(10)
]

# Process with retry
# processor.send_with_retry(test_operations)
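
When many workers retry on the same schedule, their retries can land at the same moment. A common variation on the exponential backoff used in `send_with_retry` is to cap the delay and randomize it ("full jitter"). The sketch below shows only the delay calculation; `backoff_delay` and its `cap` parameter are hypothetical and not part of `BatchProcessor`.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized delay in seconds for a zero-based attempt number.

    The upper bound doubles with each attempt and is capped at `cap`;
    the actual delay is drawn uniformly between 0 and that bound.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


for attempt in range(4):
    print(f"attempt {attempt}: wait up to {min(30.0, 0.5 * 2 ** attempt):.1f}s, "
          f"chosen {backoff_delay(attempt):.2f}s")
```

To try it in `send_with_retry`, `wait_time` could be computed with `backoff_delay(attempt, base=self.retry_delay)` instead of the fixed doubling.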

## Best Practices

### Batch Size Optimization
- Keep batches under 500KB total size
- Aim for 100-500 operations per batch for optimal performance
- Monitor individual operation sizes (32KB limit)

### Data Preparation
- Validate data before creating batches
- Sort events by timestamp when order matters
- Remove duplicate operations before batching
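
Removing duplicates can be sketched as follows, assuming "duplicate" means two operations with exactly the same content regardless of key order. `dedupe_operations` is a hypothetical helper; a real import may need a looser rule, such as keeping only the latest identify per user.

```python
import json


def dedupe_operations(operations):
    """Drop operations that are exact duplicates, keeping the first occurrence."""
    seen = set()
    unique = []
    for op in operations:
        # sort_keys makes the fingerprint independent of dict key order
        fingerprint = json.dumps(op, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(op)
    return unique


ops = [
    {"type": "identify", "userId": "a"},
    {"userId": "a", "type": "identify"},  # same content, different key order
    {"type": "track", "userId": "a", "event": "Signed Up"},
]
print(len(dedupe_operations(ops)))
```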

### Error Handling
- Implement retry logic with exponential backoff
- Log failed batches for manual review
- Consider resending a failed batch in smaller pieces so one bad operation does not block the rest
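
Logging failed batches for manual review can be as simple as appending each one to a JSON Lines file. The sketch below assumes operations are JSON-serializable dicts; `log_failed_batch`, `load_failed_batches`, and the file layout are hypothetical choices for illustration.

```python
import json
from datetime import datetime, timezone


def log_failed_batch(path, operations, error):
    """Append one failed batch to a JSON Lines file for later review or replay."""
    record = {
        "failed_at": datetime.now(timezone.utc).isoformat(),
        "error": str(error),
        "operations": operations,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def load_failed_batches(path):
    """Read back every logged record, one per line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

In the `except CustomerIOError as e:` branches of this notebook, a call such as `log_failed_batch("failed_batches.jsonl", batch, e)` would keep the payload available for a later retry.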

### Performance
- Use batch operations for imports of more than 10 records
- Process large datasets in chunks
- Monitor API rate limits (3000 requests/3 seconds)
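
One way to stay under a request limit on the client side is a sliding-window limiter that blocks before each request. The sketch below uses the 3000 requests per 3 seconds figure quoted above as its default. `SlidingWindowLimiter` is a hypothetical class, and the injectable `clock` and `sleep` arguments exist only to make it testable.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=3000, period=3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # start times of calls still inside the window

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.acquire()` immediately before each `send_batch` call keeps a single process under the limit; several processes sharing one API key would need a shared limiter instead.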

### Monitoring
- Track batch success/failure rates
- Monitor average batch sizes
- Set up alerts for failed imports

## Next Steps

Now that you understand batch operations, explore:

- **01_people_management.ipynb** - Batch user identification
- **02_event_tracking.ipynb** - Batch event tracking
- **03_objects_and_relationships.ipynb** - Batch object operations

For production use:
- Implement comprehensive error handling
- Set up monitoring for batch job performance
- Create data validation pipelines before import