# Workflow Tiers: STANDARD, CHUNKED, STREAMING

> Understand the different processing tiers and when each is used

**20 minutes** | **Level: Intermediate**

---

## What You'll Learn

By the end of this notebook, you will be able to:

- Describe the four workflow tiers: STANDARD, CHUNKED, STREAMING, STREAMING_CHECKPOINT
- Identify the dataset-size thresholds that drive automatic tier selection
- Override automatic selection with `WorkflowConfig(tier=...)`
- Choose the appropriate tier for your dataset and memory constraints

---

## Learning Path

**You are here:** Workflow System > **Workflow Tiers**

```
fit() Quickstart --> [You are here: Workflow Tiers] --> Optimization Goals --> Presets
```

**Recommended flow:**
- **Previous:** [01_fit_quickstart.ipynb](01_fit_quickstart.ipynb) - Basic fit() usage
- **Next:** [03_optimization_goals.ipynb](03_optimization_goals.ipynb) - FAST, ROBUST, QUALITY goals

---

## Before You Begin

**Required knowledge:**
- Basic familiarity with `fit()` or `curve_fit()`
- Understanding of dataset size and memory constraints

**Required software:**
- NLSQ >= 0.3.4
- Python >= 3.12

---

## Why This Matters

Different dataset sizes require different processing strategies:

- **Small datasets** (< 10K points): Fit entirely in memory, no special handling needed
- **Medium datasets** (10K - 10M points): May need chunking to manage memory
- **Large datasets** (10M - 100M points): Require streaming to avoid memory overflow
- **Massive datasets** (> 100M points): Need checkpointing for fault tolerance

NLSQ's workflow system automatically selects the best tier, but understanding the tiers
helps you make informed decisions when memory is constrained.
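
As a rough illustration of the size ranges listed above, the lookup can be sketched in plain Python. The helper `tier_for_size` and its constants are hypothetical names for this sketch, not part of NLSQ, and it looks at dataset size only. Because the listed ranges share endpoints, treating each lower bound as inclusive is an assumption.

```python
from bisect import bisect_right

# Size boundaries copied from the list above. Treating each lower bound
# as inclusive is an assumption, since the ranges share endpoints.
SIZE_BOUNDARIES = [10_000, 10_000_000, 100_000_000]
TIER_NAMES = ["STANDARD", "CHUNKED", "STREAMING", "STREAMING_CHECKPOINT"]


def tier_for_size(n_points: int) -> str:
    """Return the tier name suggested by dataset size alone (illustrative)."""
    # bisect_right counts how many boundaries are <= n_points,
    # which is exactly the index of the matching tier.
    return TIER_NAMES[bisect_right(SIZE_BOUNDARIES, n_points)]


for n in (1_000, 50_000, 50_000_000, 500_000_000):
    print(f"{n:>11,d} points -> {tier_for_size(n)}")
```

The real selector also weighs available memory and the optimization goal, so its choice can differ from this size-only lookup.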

---

## Quick Start (30 seconds)

See workflow tiers in action:

In [None]:
# Configure matplotlib for inline plotting (MUST come before imports)
%matplotlib inline

In [None]:
from nlsq import WorkflowConfig, WorkflowTier

# View all available tiers
for tier in WorkflowTier:
    print(f"  {tier.name}: {tier}")

---

## Setup

In [None]:
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np

from nlsq import WorkflowConfig, WorkflowTier, OptimizationGoal, fit
from nlsq.workflow import (
    DatasetSizeTier,
    MemoryTier,
    auto_select_workflow,
)
from nlsq.large_dataset import MemoryEstimator, get_memory_tier

# Set random seed for reproducibility
np.random.seed(42)

---

## Tutorial Content

### Section 1: The Four Workflow Tiers

NLSQ provides four workflow tiers, each optimized for different dataset sizes and memory constraints.

In [None]:
# Display tier information
tier_info = {
    WorkflowTier.STANDARD: {
        "description": "Standard curve_fit() for small datasets",
        "dataset_size": "< 10K points",
        "memory": "O(N) - loads all data into memory",
        "use_case": "Most common use case, full precision",
    },
    WorkflowTier.CHUNKED: {
        "description": "LargeDatasetFitter with automatic chunking",
        "dataset_size": "10K - 10M points",
        "memory": "O(chunk_size) - processes data in chunks",
        "use_case": "Medium-to-large datasets, memory-constrained",
    },
    WorkflowTier.STREAMING: {
        "description": "AdaptiveHybridStreamingOptimizer for huge datasets",
        "dataset_size": "10M - 100M points",
        "memory": "O(batch_size) - mini-batch gradient descent",
        "use_case": "Large datasets with limited memory",
    },
    WorkflowTier.STREAMING_CHECKPOINT: {
        "description": "Streaming with automatic checkpointing",
        "dataset_size": "> 100M points",
        "memory": "O(batch_size) + checkpoint storage",
        "use_case": "Massive datasets, fault tolerance required",
    },
}

print("Workflow Tiers Overview")
print("=" * 70)
for tier, info in tier_info.items():
    print(f"\n{tier.name}:")
    print(f"  Description: {info['description']}")
    print(f"  Dataset Size: {info['dataset_size']}")
    print(f"  Memory: {info['memory']}")
    print(f"  Use Case: {info['use_case']}")

### Section 2: Automatic Tier Selection

The `WorkflowSelector` automatically chooses the appropriate tier based on:
1. Dataset size (number of points)
2. Available memory (CPU + GPU)
3. Optimization goal (FAST, ROBUST, QUALITY, MEMORY_EFFICIENT)

In [None]:
# Dataset size thresholds for tier selection
print("Dataset Size Tiers and Thresholds")
print("=" * 50)

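for size_tier in DatasetSizeTier:
    max_pts = size_tier.max_points
    tol = size_tier.tolerance
    if max_pts == float("inf"):
        print(f"{size_tier.name:12s}: > 100M points, tolerance = {tol:.0e}")
    elif max_pts >= 1e6:
        print(f"{size_tier.name:12s}: < {max_pts/1e6:.0f}M points, tolerance = {tol:.0e}")
    else:
        # Limits below 1M would round to "0M", so report them in thousands
        print(f"{size_tier.name:12s}: < {max_pts/1e3:.0f}K points, tolerance = {tol:.0e}")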

In [None]:
# Memory tier thresholds
print("\nMemory Tiers")
print("=" * 50)

for mem_tier in MemoryTier:
    print(f"{mem_tier.name:10s}: {mem_tier.description}")

# Check current system memory
available_memory = MemoryEstimator.get_available_memory_gb()
current_tier = get_memory_tier(available_memory)
print(f"\nCurrent system: {available_memory:.1f} GB available -> {current_tier.name}")

In [None]:
# Demonstrate automatic tier selection for different dataset sizes
test_sizes = [1_000, 50_000, 500_000, 5_000_000, 50_000_000, 500_000_000]
n_params = 5

print("Automatic Tier Selection (based on current memory)")
print("=" * 70)
print(f"Available memory: {available_memory:.1f} GB")
print()

for n_points in test_sizes:
    config = auto_select_workflow(n_points, n_params)
    config_type = type(config).__name__
    
    # Determine tier from config type
    if "GlobalOptimization" in config_type:
        tier = "STANDARD (with multi-start)"
    elif "LDMemory" in config_type:
        tier = "STANDARD or CHUNKED"
    elif "HybridStreaming" in config_type:
        tier = "STREAMING or STREAMING_CHECKPOINT"
    else:
        tier = config_type
    
    if n_points >= 1_000_000:
        size_str = f"{n_points/1_000_000:.0f}M"
    elif n_points >= 1_000:
        size_str = f"{n_points/1_000:.0f}K"
    else:
        size_str = str(n_points)
    
    print(f"{size_str:>8s} points -> {tier}")

### Section 3: Tier Selection Decision Tree

The following diagram shows how tiers are selected based on dataset size and memory.
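
The branches of that tree can also be read as a small function. This is a minimal sketch, not NLSQ's actual selector: `select_tier` is a hypothetical helper, the 16 GB and 64 GB thresholds are copied from the labels in the figure, and the behavior exactly at each boundary is an assumption.

```python
def select_tier(n_points: int, available_gb: float) -> str:
    """Mirror the decision tree's branches (illustrative, not NLSQ's selector)."""
    if n_points > 100_000_000:
        # Massive datasets: streaming plus checkpointing for fault tolerance
        return "STREAMING_CHECKPOINT"
    if n_points < 10_000:
        # Small datasets always fit in memory
        return "STANDARD"
    if n_points <= 10_000_000:
        # Medium datasets: chunk only when memory is tight
        return "STANDARD" if available_gb > 16 else "CHUNKED"
    # Large datasets: chunk with plenty of memory, otherwise stream
    return "CHUNKED" if available_gb > 64 else "STREAMING"


for n, gb in [(5_000, 8), (1_000_000, 32), (1_000_000, 8),
              (50_000_000, 128), (50_000_000, 16), (500_000_000, 128)]:
    print(f"{n:>11,d} points, {gb:>3d} GB -> {select_tier(n, gb)}")
```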

In [None]:
# Create tier selection decision tree visualization
fig, ax = plt.subplots(figsize=(14, 10))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.axis('off')

# Title
ax.text(5, 9.5, "Workflow Tier Selection Decision Tree", ha='center', fontsize=16, fontweight='bold')

# Root node
ax.add_patch(plt.Rectangle((3.5, 8.2), 3, 0.8, fill=True, facecolor='lightblue', edgecolor='black'))
ax.text(5, 8.6, "Dataset Size?", ha='center', va='center', fontsize=11)

# Level 1 branches
# Small
ax.plot([4.2, 2, 2], [8.2, 7.5, 7.0], 'k-', linewidth=1)
ax.text(2.5, 7.7, "< 10K", fontsize=9)
ax.add_patch(plt.Rectangle((0.5, 6.2), 3, 0.8, fill=True, facecolor='lightgreen', edgecolor='black'))
ax.text(2, 6.6, "STANDARD", ha='center', va='center', fontsize=10, fontweight='bold')

# Medium
ax.plot([5, 5], [8.2, 7.0], 'k-', linewidth=1)
ax.text(5.3, 7.5, "10K - 10M", fontsize=9)
ax.add_patch(plt.Rectangle((3.5, 6.2), 3, 0.8, fill=True, facecolor='lightyellow', edgecolor='black'))
ax.text(5, 6.6, "Memory Check", ha='center', va='center', fontsize=10)

# Large
ax.plot([5.8, 8, 8], [8.2, 7.5, 7.0], 'k-', linewidth=1)
ax.text(7.2, 7.7, "> 10M", fontsize=9)
ax.add_patch(plt.Rectangle((6.5, 6.2), 3, 0.8, fill=True, facecolor='lightyellow', edgecolor='black'))
ax.text(8, 6.6, "Memory Check", ha='center', va='center', fontsize=10)

# Level 2 - Medium dataset branches
ax.plot([4.2, 3, 3], [6.2, 5.5, 5.0], 'k-', linewidth=1)
ax.text(3.3, 5.6, "> 16GB", fontsize=9)
ax.add_patch(plt.Rectangle((1.5, 4.2), 3, 0.8, fill=True, facecolor='lightgreen', edgecolor='black'))
ax.text(3, 4.6, "STANDARD", ha='center', va='center', fontsize=10, fontweight='bold')

ax.plot([5.8, 7, 7], [6.2, 5.5, 5.0], 'k-', linewidth=1)
ax.text(6.5, 5.6, "< 16GB", fontsize=9)
ax.add_patch(plt.Rectangle((5.5, 4.2), 3, 0.8, fill=True, facecolor='orange', edgecolor='black'))
ax.text(7, 4.6, "CHUNKED", ha='center', va='center', fontsize=10, fontweight='bold')

# Level 2 - Large dataset branches
ax.plot([7.2, 6, 6], [6.2, 5.5, 3.0], 'k-', linewidth=1)
ax.text(6.3, 5.6, "> 64GB", fontsize=9)
ax.add_patch(plt.Rectangle((4.5, 2.2), 3, 0.8, fill=True, facecolor='orange', edgecolor='black'))
ax.text(6, 2.6, "CHUNKED", ha='center', va='center', fontsize=10, fontweight='bold')

ax.plot([8.8, 9.5, 9.5], [6.2, 5.5, 3.0], 'k-', linewidth=1)
ax.text(9.2, 5.6, "< 64GB", fontsize=9)
ax.add_patch(plt.Rectangle((8, 2.2), 1.8, 0.8, fill=True, facecolor='salmon', edgecolor='black'))
ax.text(8.9, 2.6, "STREAMING", ha='center', va='center', fontsize=9, fontweight='bold')

# Additional note for massive datasets
ax.add_patch(plt.Rectangle((0.5, 0.5), 9, 1.2, fill=True, facecolor='lightgray', edgecolor='black', alpha=0.3))
ax.text(5, 1.1, "For > 100M points: STREAMING_CHECKPOINT (adds fault tolerance)", 
        ha='center', va='center', fontsize=10, style='italic')

plt.tight_layout()
import os

os.makedirs("figures", exist_ok=True)  # savefig raises if the folder is missing
plt.savefig("figures/02_tier_decision_tree.png", dpi=300, bbox_inches="tight")
plt.show()

### Section 4: Manual Tier Override

You can override the automatic tier selection using `WorkflowConfig`.

In [None]:
# Create configs with explicit tiers
config_standard = WorkflowConfig(tier=WorkflowTier.STANDARD)
config_chunked = WorkflowConfig(tier=WorkflowTier.CHUNKED)
config_streaming = WorkflowConfig(tier=WorkflowTier.STREAMING)
config_checkpoint = WorkflowConfig(tier=WorkflowTier.STREAMING_CHECKPOINT)

print("Manual Tier Override Examples")
print("=" * 50)
print(f"config_standard.tier = {config_standard.tier}")
print(f"config_chunked.tier = {config_chunked.tier}")
print(f"config_streaming.tier = {config_streaming.tier}")
print(f"config_checkpoint.tier = {config_checkpoint.tier}")

In [None]:
# Define a test model
def exponential_decay(x, a, b, c):
    """Exponential decay: y = a * exp(-b * x) + c"""
    return a * jnp.exp(-b * x) + c

# Generate test data
n_samples = 1000
x_data = np.linspace(0, 5, n_samples)
true_a, true_b, true_c = 3.0, 1.2, 0.5
y_true = true_a * np.exp(-true_b * x_data) + true_c
y_data = y_true + 0.1 * np.random.randn(n_samples)

print(f"Test dataset: {n_samples} points")
print(f"True parameters: a={true_a}, b={true_b}, c={true_c}")

In [None]:
# Fit with the default workflow: no tier is forced here.
# Auto-selection uses STANDARD for a 1000-point dataset.

print("\nUsing STANDARD tier (auto-selected for small data):")
popt_standard, _ = fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
)
print(f"  Result: a={popt_standard[0]:.4f}, b={popt_standard[1]:.4f}, c={popt_standard[2]:.4f}")

# Note: For small datasets, manually forcing CHUNKED or STREAMING 
# would require using curve_fit_large directly

### Section 5: Memory Usage Comparison

Each tier has different memory characteristics. Let's visualize the theoretical memory usage.

In [None]:
# Memory usage estimation for different tiers
def estimate_memory_usage(n_points, n_params, tier):
    """Estimate memory usage in GB for a given tier."""
    # 8 bytes (float64) each for x, y, and the residual,
    # plus one Jacobian entry per parameter
    bytes_per_point = 8 * (3 + n_params)
    
    if tier == WorkflowTier.STANDARD:
        # All data in memory
        return n_points * bytes_per_point / 1e9
    elif tier == WorkflowTier.CHUNKED:
        # Chunk size typically 100K-1M
        chunk_size = min(1_000_000, n_points)
        return chunk_size * bytes_per_point / 1e9
    elif tier in (WorkflowTier.STREAMING, WorkflowTier.STREAMING_CHECKPOINT):
        # Batch size typically 50K
        batch_size = 50_000
        return batch_size * bytes_per_point / 1e9
    else:
        return 0

In [None]:
# Compare memory usage across dataset sizes
dataset_sizes = np.logspace(3, 9, 50)  # 1K to 1B points
n_params = 5

memory_standard = [estimate_memory_usage(int(n), n_params, WorkflowTier.STANDARD) for n in dataset_sizes]
memory_chunked = [estimate_memory_usage(int(n), n_params, WorkflowTier.CHUNKED) for n in dataset_sizes]
memory_streaming = [estimate_memory_usage(int(n), n_params, WorkflowTier.STREAMING) for n in dataset_sizes]

# Plot memory comparison
fig, ax = plt.subplots(figsize=(12, 7))

ax.loglog(dataset_sizes, memory_standard, 'b-', linewidth=2, label='STANDARD')
ax.loglog(dataset_sizes, memory_chunked, color='orange', linewidth=2, label='CHUNKED')
ax.loglog(dataset_sizes, memory_streaming, 'r-', linewidth=2, label='STREAMING')

# Add memory threshold lines
ax.axhline(y=16, color='gray', linestyle='--', alpha=0.5, label='16 GB limit')
ax.axhline(y=64, color='gray', linestyle=':', alpha=0.5, label='64 GB limit')

# Add tier transition zones
ax.axvline(x=10_000, color='green', linestyle='--', alpha=0.3)
ax.axvline(x=10_000_000, color='orange', linestyle='--', alpha=0.3)
ax.axvline(x=100_000_000, color='red', linestyle='--', alpha=0.3)

ax.text(3000, 100, "STANDARD\nzone", fontsize=9, ha='center')
ax.text(300_000, 100, "CHUNKED\nzone", fontsize=9, ha='center')
ax.text(30_000_000, 100, "STREAMING\nzone", fontsize=9, ha='center')

ax.set_xlabel("Dataset Size (points)")
ax.set_ylabel("Peak Memory Usage (GB)")
ax.set_title("Memory Usage by Workflow Tier")
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3, which='both')
ax.set_xlim(1e3, 1e9)
ax.set_ylim(1e-3, 1e3)

plt.tight_layout()
import os

os.makedirs("figures", exist_ok=True)  # savefig raises if the folder is missing
plt.savefig("figures/02_memory_comparison.png", dpi=300, bbox_inches="tight")
plt.show()

print("Interpretation:")
print("  - STANDARD: Memory grows linearly with dataset size")
print("  - CHUNKED: Memory capped at chunk size (~1M points)")
print("  - STREAMING: Memory capped at batch size (~50K points)")

---

## Key Takeaways

After completing this notebook, remember:

1. **Four tiers available:** STANDARD, CHUNKED, STREAMING, STREAMING_CHECKPOINT

2. **Automatic selection based on:**
   - Dataset size (primary factor)
   - Available memory (CPU + GPU)
   - Optimization goal

3. **Memory trade-offs:**
   - STANDARD: O(N) memory, best precision
   - CHUNKED: O(chunk_size) memory, good precision
   - STREAMING: O(batch_size) memory, converges through mini-batch updates

4. **Override when needed:**
   ```python
   config = WorkflowConfig(tier=WorkflowTier.STREAMING)
   ```

---

## Common Questions

**Q: When should I manually override the tier?**

A: Override when you know your memory constraints better than the auto-detector. For example, if you're running alongside other processes that consume memory, force a lower-memory tier.

**Q: Does CHUNKED give the same results as STANDARD?**

A: Nearly identical. CHUNKED processes data in chunks and refines parameters progressively. For well-conditioned problems, results are typically within 0.1% of STANDARD.
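
To build intuition for why chunked processing can match a full in-memory fit, here is a minimal sketch on a *linear* model, where per-chunk sums can be added up exactly. It is a generic illustration with hypothetical helper names, not how NLSQ's CHUNKED tier is implemented. Nonlinear models are refined progressively, so their results agree only approximately.

```python
import math
import random


def accumulate(xs, ys):
    """Return (n, sum x, sum y, sum x*x, sum x*y) for one block of data."""
    return (
        len(xs),
        math.fsum(xs),
        math.fsum(ys),
        math.fsum(x * x for x in xs),
        math.fsum(x * y for x, y in zip(xs, ys)),
    )


def fit_line_from_sums(n, sx, sy, sxx, sxy):
    """Solve the 2x2 normal equations for y = slope * x + intercept."""
    denom = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


random.seed(0)
xs = [i / 1000 for i in range(10_000)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.1) for x in xs]

# Full-data fit: statistics computed over every point at once.
full = fit_line_from_sums(*accumulate(xs, ys))

# Chunked fit: only one chunk is reduced to sums at a time.
chunk_size = 1_000
totals = [0.0] * 5
for start in range(0, len(xs), chunk_size):
    stop = start + chunk_size
    part = accumulate(xs[start:stop], ys[start:stop])
    totals = [t + p for t, p in zip(totals, part)]
chunked = fit_line_from_sums(*totals)

print(f"full:    slope={full[0]:.6f}, intercept={full[1]:.6f}")
print(f"chunked: slope={chunked[0]:.6f}, intercept={chunked[1]:.6f}")
```

The two fits agree to floating-point rounding because the normal equations depend on the data only through those five sums.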

**Q: When is STREAMING_CHECKPOINT needed?**

A: For multi-hour fits on massive datasets where fault tolerance is important. Checkpointing allows resuming from the last saved state if the job is interrupted.
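
The save-and-resume idea can be sketched with a toy loop. This is a generic illustration under stated assumptions, not NLSQ's checkpoint format or API: the `run` function, the JSON file layout, and the `fail_at` parameter (used to simulate an interruption) are all hypothetical.

```python
import json
import os
import tempfile


def run(total_steps, checkpoint_path, every=10, fail_at=None):
    """Toy loop: accumulate a running sum, saving state every `every` steps."""
    state = {"step": 0, "value": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)  # resume from the last saved state

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated interruption")
        state["value"] += state["step"]  # stand-in for one optimizer update
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temporary file, then rename it, so a crash
            # mid-write cannot leave a corrupt checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as fh:
                json.dump(state, fh)
            os.replace(tmp_path, checkpoint_path)
    return state


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    try:
        run(100, path, fail_at=37)  # last checkpoint written at step 30
    except RuntimeError:
        pass
    resumed = run(100, path)  # picks up at step 30, not step 0

print(resumed)
```

Only the work since the last checkpoint (steps 30 to 36 here) is repeated after the interruption, which is the trade-off behind the checkpoint interval: frequent saves cost I/O, infrequent saves cost recomputation.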

---

## Related Resources

**Next steps:**
- [03_optimization_goals.ipynb](03_optimization_goals.ipynb) - FAST, ROBUST, QUALITY goals
- [06_auto_selection.ipynb](06_auto_selection.ipynb) - Deep dive into WorkflowSelector

**Further reading:**
- [Large Dataset Guide](https://nlsq.readthedocs.io/large-datasets/)

---

## Glossary

**Chunking:** Processing data in fixed-size portions to manage memory usage.

**Streaming:** Processing data in mini-batches using gradient-based optimization.

**Checkpointing:** Saving optimization state periodically to enable recovery from failures.

**Memory Tier:** Classification of available system memory (LOW, MEDIUM, HIGH, VERY_HIGH).

In [None]:
# Final summary
print("Summary")
print("=" * 60)
print()
print("Workflow Tiers:")
print("  STANDARD:            < 10K points, full precision")
print("  CHUNKED:             10K - 10M points, memory-managed")
print("  STREAMING:           10M - 100M points, mini-batch")
print("  STREAMING_CHECKPOINT: > 100M points, fault-tolerant")
print()
print("Override syntax:")
print("  config = WorkflowConfig(tier=WorkflowTier.CHUNKED)")
print()
print(f"Current system memory: {available_memory:.1f} GB ({current_tier.name})")