# 4-Layer Defense Strategy for L-BFGS Warmup Divergence Prevention

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/imewei/NLSQ/blob/main/examples/notebooks/05_feature_demos/defense_layers_demo.ipynb)

**Level**: Intermediate | **Time**: 25-30 minutes | **Version**: 0.3.6+

---

## Overview

This notebook demonstrates NLSQ's **4-layer defense strategy**, which prevents the L-BFGS warmup phase from diverging when the initial parameters are already near optimal.

### What You'll Learn

- Understanding each defense layer and when it activates
- Using telemetry to monitor defense layer behavior
- Configuring defense sensitivity for different scenarios
- Using preset configurations (strict, relaxed, scientific)
- Troubleshooting common defense layer issues

### Why Defense Layers?

The hybrid streaming optimizer runs an L-BFGS warmup phase before switching to Gauss-Newton. However, if the initial parameters are already close to optimal, aggressive warmup updates can push them **away** from the optimum and cause divergence.

The 4-layer defense strategy prevents this by:
1. **Layer 1**: Detecting warm starts and skipping warmup
2. **Layer 2**: Adapting learning rate based on initial fit quality
3. **Layer 3**: Aborting if loss increases beyond tolerance
4. **Layer 4**: Clipping update magnitude to prevent large jumps

In [1]:
# @title Install NLSQ (run once in Colab)
import sys

if 'google.colab' in sys.modules:
    print("Running in Google Colab - installing NLSQ...")
    !pip install -q nlsq
    print("NLSQ installed successfully!")
else:
    print("Not running in Colab - assuming NLSQ is already installed")

Not running in Colab - assuming NLSQ is already installed


In [2]:
# Configure matplotlib for inline plotting
%matplotlib inline

In [3]:
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np

from nlsq import (
    HybridStreamingConfig,
    curve_fit,
    get_defense_telemetry,
    reset_defense_telemetry,
)

np.random.seed(42)
print("Setup complete!")

Setup complete!


---

## 1. Understanding the 4 Defense Layers

Let's set up a test model and explore each defense layer.

In [4]:
def exponential_decay(x, a, b, c):
    """Three-parameter exponential decay: y = a * exp(-b * x) + c"""
    return a * jnp.exp(-b * x) + c

# Generate synthetic data
true_params = np.array([5.0, 0.5, 1.0])
x = np.linspace(0, 10, 500)  # Reduced for faster execution
y_true = exponential_decay(x, *true_params)
y = y_true + np.random.normal(0, 0.1, len(x))

print(f"Dataset: {len(x)} samples")
print(f"True parameters: a={true_params[0]}, b={true_params[1]}, c={true_params[2]}")

Dataset: 500 samples
True parameters: a=5.0, b=0.5, c=1.0


### Layer 1: Warm Start Detection

**Purpose**: Skip L-BFGS warmup entirely when initial parameters are already near optimal.

**How it works**: Computes `relative_loss = initial_loss / y_variance`. If `relative_loss < warm_start_threshold` (default 0.01, i.e. 1% of the data variance), warmup is skipped.

**When it helps**: Refinement workflows, iterative fitting, warm starts from previous fits.
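
The check above can be sketched in plain NumPy. This is an illustration, not NLSQ's implementation: it assumes `initial_loss` is the mean squared residual at `p0`, and the helper names `relative_loss` and `is_warm_start` are hypothetical.

```python
import numpy as np


def relative_loss(residuals, y):
    """Initial loss divided by the variance of the data.

    Assumption: the loss is the mean squared residual at p0.
    """
    initial_loss = np.mean(np.square(residuals))
    return initial_loss / np.var(y)


def is_warm_start(residuals, y, warm_start_threshold=0.01):
    """True when the starting point already explains the data well."""
    return relative_loss(residuals, y) < warm_start_threshold


def decay_model(x, a, b, c):
    return a * np.exp(-b * x) + c


# Same shape of problem as this notebook: exponential decay with noise sigma = 0.1
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 500)
y_demo = decay_model(x_demo, 5.0, 0.5, 1.0) + rng.normal(0, 0.1, x_demo.size)

near = relative_loss(y_demo - decay_model(x_demo, 5.0, 0.5, 1.0), y_demo)
poor = relative_loss(y_demo - decay_model(x_demo, 1.0, 0.1, 5.0), y_demo)
print(f"near-optimal guess: relative_loss={near:.2e}, warm start={near < 0.01}")
print(f"poor guess:         relative_loss={poor:.2e}, warm start={poor < 0.01}")
```

With noise of standard deviation 0.1 on data whose variance is roughly 1.5, a guess at the true parameters lands below the 1% threshold, which is the same order of magnitude as the `relative_loss` values reported in the log messages of this notebook.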

In [5]:
# Demonstrate Layer 1: Near-optimal initial guess
print("=" * 60)
print("Layer 1: Warm Start Detection Demo")
print("=" * 60)

# Reset telemetry to track this specific fit
reset_defense_telemetry()

# Use parameters very close to true values (simulating a refinement scenario)
near_optimal_p0 = true_params + np.random.normal(0, 0.01, 3)
print(f"\nNear-optimal p0: {near_optimal_p0}")
print(f"True params:     {true_params}")

# Fit with hybrid_streaming
popt, pcov = curve_fit(
    exponential_decay,
    x, y,
    p0=near_optimal_p0,
    method='hybrid_streaming',
    verbose=0,
)

# Check telemetry
telemetry = get_defense_telemetry()
summary = telemetry.get_summary()
layer1_triggers = summary['layer1']['triggers']

print(f"\nFitted params: {popt}")
print("\n--- Telemetry ---")
print(f"Layer 1 triggers: {layer1_triggers}")
print(f"Total warmup calls: {summary['total_warmup_calls']}")

if layer1_triggers > 0:
    print("\n-> Layer 1 detected warm start and skipped L-BFGS warmup!")
else:
    print("\n-> Initial guess not close enough for warm start detection")

Layer 1: Warm Start Detection Demo

Near-optimal p0: [5.00926178 0.51909417 0.98601432]
True params:     [5.  0.5 1. ]


INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=8.1283e-03)


  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.9s
  Initial JTJ complete: cost=6.195290e+00, time=0.9s


  GN iter 1/100: cost=4.789448e+00, grad_norm=4.423933e+01, reduction=1.405842e+00, Δ=1.0000, time=2.3s


  GN iter 2/100: cost=4.788553e+00, grad_norm=9.575944e-01, reduction=8.949255e-04, Δ=1.0000, time=0.3s


  GN iter 3/100: cost=4.788553e+00, grad_norm=2.941278e-04, reduction=6.766934e-10, Δ=1.0000, time=0.4s



Fitted params: [4.97480135 0.49754671 1.00100805]

--- Telemetry ---
Layer 1 triggers: 1
Total warmup calls: 1

-> Layer 1 detected warm start and skipped L-BFGS warmup!


### Layer 2: Adaptive Step Size

**Purpose**: Automatically select the warmup learning rate from the initial relative loss.

**Learning rate tiers**:
- **Refinement** (1e-6): `relative_loss < 0.1` - ultra-conservative for excellent fits
- **Careful** (1e-5): `0.1 <= relative_loss < 1.0` - conservative for good fits
- **Exploration** (0.001): `relative_loss >= 1.0` - standard rate for poor fits

**When it helps**: Multi-scale parameter problems, mixed starting point quality.
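
The tier boundaries listed above map onto a simple lookup. The function below is a hypothetical sketch of that mapping (the name `select_warmup_lr` and its keyword arguments are not part of NLSQ); it only restates the thresholds and rates from the list.

```python
def select_warmup_lr(relative_loss,
                     lr_refinement=1e-6,
                     lr_careful=1e-5,
                     lr_exploration=1e-3):
    """Return (mode, learning_rate) for a given relative loss.

    Thresholds follow the tiers described in this section:
    < 0.1 -> refinement, < 1.0 -> careful, otherwise exploration.
    """
    if relative_loss < 0.1:
        return "refinement", lr_refinement
    if relative_loss < 1.0:
        return "careful", lr_careful
    return "exploration", lr_exploration


for rl in (0.05, 0.5, 8.0):
    mode, lr = select_warmup_lr(rl)
    print(f"relative_loss={rl} -> {mode} (lr={lr:g})")
```

The boundaries are half-open, so a relative loss of exactly 0.1 falls in the careful tier and exactly 1.0 falls in the exploration tier.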

In [6]:
# Demonstrate Layer 2: Different LR modes
print("=" * 60)
print("Layer 2: Adaptive Step Size Demo")
print("=" * 60)

# Test with different initial guess qualities
test_cases = [
    ("Excellent fit (refinement LR)", true_params * np.array([1.02, 1.01, 0.99])),
    ("Good fit (careful LR)", true_params * np.array([1.2, 0.8, 1.3])),
    ("Poor fit (exploration LR)", np.array([1.0, 0.1, 5.0])),
]

for name, p0 in test_cases:
    reset_defense_telemetry()

    # Disable Layer 1 to see Layer 2 in action
    config = HybridStreamingConfig(
        enable_warm_start_detection=False,  # Force warmup to run
        warmup_iterations=50,  # Short warmup for demo
    )

    popt, _ = curve_fit(
        exponential_decay, x, y, p0=p0,
        method='hybrid_streaming',
        config=config,
        verbose=0,
    )

    telemetry = get_defense_telemetry()
    lr_counts = telemetry.layer2_lr_mode_counts

    print(f"\n{name}")
    print(f"  p0: {p0}")
    print(f"  LR mode counts: {lr_counts}")

INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.8137e-03)


Layer 2: Adaptive Step Size Demo
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=5.193323e+00, time=0.2s


  GN iter 1/100: cost=4.788592e+00, grad_norm=1.704892e+01, reduction=4.047309e-01, Δ=1.0000, time=0.4s


  GN iter 2/100: cost=4.788553e+00, grad_norm=2.317917e-01, reduction=3.916702e-05, Δ=1.0000, time=0.4s


  GN iter 3/100: cost=4.788553e+00, grad_norm=3.017041e-04, reduction=9.035031e-10, Δ=1.0000, time=0.4s

Excellent fit (refinement LR)
  p0: [5.1   0.505 0.99 ]
  LR mode counts: {'refinement': 0, 'careful': 0, 'exploration': 0, 'fixed': 0}


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=2.717884e-09, reduction=1.776357e-15, Δ=1.0000, time=0.3s

Good fit (careful LR)
  p0: [6.  0.4 1.3]
  LR mode counts: {'refinement': 0, 'careful': 1, 'exploration': 0, 'fixed': 0}


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=5.713902e-08, reduction=-8.881784e-16, Δ=0.5000, time=0.3s

Poor fit (exploration LR)
  p0: [1.  0.1 5. ]
  LR mode counts: {'refinement': 0, 'careful': 0, 'exploration': 1, 'fixed': 0}


### Layer 3: Cost-Increase Guard

**Purpose**: Abort warmup immediately if loss increases beyond tolerance.

**How it works**: After each warmup step, checks if `current_loss > initial_loss * (1 + cost_increase_tolerance)`. If triggered, returns the **best parameters found** (not the diverged ones).

**Default tolerance**: 5% (`cost_increase_tolerance=0.05`, configurable)

**When it helps**: Preventing divergence from near-optimal starting points.
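
A minimal sketch of the guard logic follows. It uses plain gradient descent on a toy quadratic as a stand-in for the real warmup optimizer, so only the abort-and-return-best behavior is meant to carry over; `guarded_warmup` and its arguments are hypothetical names.

```python
import numpy as np


def guarded_warmup(loss_fn, grad_fn, p0, lr, n_steps=50,
                   cost_increase_tolerance=0.05):
    """Run gradient steps, aborting if the loss exceeds the guard threshold.

    Returns (best_params, best_loss, aborted).
    """
    p = np.asarray(p0, dtype=float)
    initial_loss = loss_fn(p)
    best_p, best_loss = p.copy(), initial_loss
    for _ in range(n_steps):
        p = p - lr * grad_fn(p)
        current_loss = loss_fn(p)
        # Guard: loss grew past initial_loss * (1 + tolerance)
        if current_loss > initial_loss * (1 + cost_increase_tolerance):
            return best_p, best_loss, True
        if current_loss < best_loss:
            best_p, best_loss = p.copy(), current_loss
    return best_p, best_loss, False


# Ill-conditioned quadratic: curvature 1 along one axis, 100 along the other
A = np.diag([1.0, 100.0])


def quad_loss(p):
    return 0.5 * p @ A @ p


def quad_grad(p):
    return A @ p


start = np.array([0.1, 0.1])
_, stable_loss, stable_aborted = guarded_warmup(quad_loss, quad_grad, start, lr=0.001)
best, unstable_loss, unstable_aborted = guarded_warmup(quad_loss, quad_grad, start, lr=0.03)

print(f"lr=0.001: aborted={stable_aborted}, best loss={stable_loss:.4f}")
print(f"lr=0.03:  aborted={unstable_aborted}, best params={best}")
```

With the larger step the very first update overshoots along the stiff axis, the guard fires, and the starting point is returned because nothing better was seen.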

In [7]:
# Demonstrate Layer 3: Cost Guard Protection
print("=" * 60)
print("Layer 3: Cost-Increase Guard Demo")
print("=" * 60)

# Scenario: Aggressive LR that would cause divergence
# Layer 3 should catch this and return best params
reset_defense_telemetry()

# Use a config that might cause divergence (high LR, disabled other defenses)
config = HybridStreamingConfig(
    enable_warm_start_detection=False,
    enable_adaptive_warmup_lr=False,  # Use fixed (high) LR
    warmup_learning_rate=0.1,  # Very aggressive LR
    enable_cost_guard=True,  # But cost guard is ON
    cost_increase_tolerance=0.05,  # Abort if loss increases >5%
    enable_step_clipping=False,
    warmup_iterations=100,
)

# Start from excellent initial guess
excellent_p0 = true_params * np.array([1.01, 0.99, 1.005])
print(f"Starting from near-optimal p0: {excellent_p0}")
print(f"With aggressive LR={config.warmup_learning_rate}")

popt, _ = curve_fit(
    exponential_decay, x, y, p0=excellent_p0,
    method='hybrid_streaming',
    config=config,
    verbose=0,
)

telemetry = get_defense_telemetry()
summary = telemetry.get_summary()
layer3_triggers = summary['layer3']['triggers']

print(f"\nFinal params: {popt}")
print("\n--- Telemetry ---")
print(f"Layer 3 (cost guard) triggers: {layer3_triggers}")

if layer3_triggers > 0:
    print("\n-> Layer 3 detected loss increase and aborted warmup!")
    print("   Returned best parameters found, not diverged values.")

INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.9240e-03)


Layer 3: Cost-Increase Guard Demo
Starting from near-optimal p0: [5.05  0.495 1.005]
With aggressive LR=0.1
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=5.277428e+00, time=0.2s


  GN iter 1/100: cost=4.788560e+00, grad_norm=3.112457e+01, reduction=4.888676e-01, Δ=1.0000, time=0.3s


  GN iter 2/100: cost=4.788553e+00, grad_norm=9.679260e-02, reduction=6.680049e-06, Δ=1.0000, time=0.3s


  GN iter 3/100: cost=4.788553e+00, grad_norm=1.278903e-04, reduction=1.141194e-10, Δ=1.0000, time=0.2s

Final params: [4.97480136 0.49754672 1.00100806]

--- Telemetry ---
Layer 3 (cost guard) triggers: 0


### Layer 4: Trust Region Constraint (Step Clipping)

**Purpose**: Limit parameter update magnitude to prevent large destabilizing jumps.

**How it works**: Clips each warmup parameter update to a maximum L2 norm of `max_warmup_step_size` (default 0.1).

**When it helps**: Multi-scale parameters, ill-conditioned problems, preventing overshooting.
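
Clipping by L2 norm rescales an update so its direction is kept while its length is capped. The helper below is a hypothetical illustration of that operation, not the NLSQ internals.

```python
import numpy as np


def clip_update(update, max_step_size=0.1):
    """Scale `update` down so its L2 norm is at most `max_step_size`.

    Returns (clipped_update, was_clipped).
    """
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    if norm <= max_step_size:
        return update, False
    return update * (max_step_size / norm), True


large, large_clipped = clip_update([0.3, 0.4])    # norm 0.5, above the cap
small, small_clipped = clip_update([0.03, 0.04])  # norm 0.05, below the cap

print(large, large_clipped)
print(small, small_clipped)
```

An update of norm 0.5 is scaled by 0.1 / 0.5 = 0.2, so `[0.3, 0.4]` becomes `[0.06, 0.08]`; the smaller update passes through unchanged.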

In [8]:
# Demonstrate Layer 4: Step Clipping
print("=" * 60)
print("Layer 4: Trust Region Constraint Demo")
print("=" * 60)

reset_defense_telemetry()

# Config with step clipping
config = HybridStreamingConfig(
    enable_warm_start_detection=False,
    enable_adaptive_warmup_lr=False,
    warmup_learning_rate=0.01,  # High LR that would produce large steps
    enable_cost_guard=False,
    enable_step_clipping=True,  # Step clipping ON
    max_warmup_step_size=0.1,  # Max L2 norm of update
    warmup_iterations=100,
)

# Poor initial guess that needs exploration
poor_p0 = np.array([10.0, 0.1, 5.0])
print(f"Starting from poor p0: {poor_p0}")

popt, _ = curve_fit(
    exponential_decay, x, y, p0=poor_p0,
    method='hybrid_streaming',
    config=config,
    verbose=0,
)

telemetry = get_defense_telemetry()
summary = telemetry.get_summary()
layer4_triggers = summary['layer4']['triggers']

print(f"\nFinal params: {popt}")
print(f"True params:  {true_params}")
print("\n--- Telemetry ---")
print(f"Layer 4 (step clipping) triggers: {layer4_triggers}")

if layer4_triggers > 0:
    print(f"\n-> Layer 4 clipped {layer4_triggers} large parameter updates")

Layer 4: Trust Region Constraint Demo
Starting from poor p0: [10.   0.1  5. ]


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=8.327549e-04, reduction=5.099432e-10, Δ=1.0000, time=0.3s

Final params: [4.97480135 0.49754671 1.00100805]
True params:  [5.  0.5 1. ]

--- Telemetry ---
Layer 4 (step clipping) triggers: 32

-> Layer 4 clipped 32 large parameter updates


---

## 2. Using Defense Layer Telemetry

Telemetry helps monitor defense layer behavior in production.

In [9]:
# Run multiple fits and collect telemetry
print("=" * 60)
print("Defense Layer Telemetry Demo")
print("=" * 60)

reset_defense_telemetry()

# Simulate a batch of fits with varying starting points
n_fits = 5  # Reduced for faster execution
for i in range(n_fits):
    # Vary initial guess quality
    noise_scale = 0.01 if i < 2 else (0.3 if i < 4 else 1.0)
    p0 = true_params * (1 + np.random.uniform(-noise_scale, noise_scale, 3))

    popt, _ = curve_fit(
        exponential_decay, x, y, p0=p0,
        method='hybrid_streaming',
        verbose=0,
    )

# Get comprehensive telemetry report
telemetry = get_defense_telemetry()

print(f"\n--- Summary after {n_fits} fits ---")
summary = telemetry.get_summary()
for key, value in summary.items():
    print(f"  {key}: {value}")

print("\n--- Trigger Rates ---")
rates = telemetry.get_trigger_rates()
for key, value in rates.items():
    print(f"  {key}: {value:.1f}%")

INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.3225e-03)


Defense Layer Telemetry Demo
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.818970e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=7.345916e+00, reduction=3.041718e-02, Δ=1.0000, time=0.4s


INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.3068e-03)


  GN iter 2/100: cost=4.788553e+00, grad_norm=2.649206e-03, reduction=2.097836e-08, Δ=1.0000, time=0.3s
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=4.807003e+00, time=0.2s


  GN iter 1/100: cost=4.788553e+00, grad_norm=5.642843e+00, reduction=1.845015e-02, Δ=1.0000, time=0.5s


  GN iter 2/100: cost=4.788553e+00, grad_norm=1.196050e-03, reduction=1.037113e-08, Δ=1.0000, time=0.2s


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=5.578396e-08, reduction=-1.776357e-15, Δ=0.5000, time=0.3s


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.529763e-13, reduction=0.000000e+00, Δ=0.5000, time=0.4s


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=4.788553e+00, time=0.2s


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.368436e-09, reduction=1.776357e-15, Δ=1.0000, time=0.4s

--- Summary after 5 fits ---
  total_warmup_calls: 5
  layer1: {'name': 'warm_start_detection', 'triggers': 2, 'rate_pct': 40.0}
  layer2: {'name': 'adaptive_lr_selection', 'mode_counts': {'refinement': 1, 'careful': 2, 'exploration': 0, 'fixed': 0}, 'rates_pct': {'refinement': 20.0, 'careful': 40.0, 'exploration': 0.0}}
  layer3: {'name': 'cost_increase_guard', 'triggers': 0, 'rate_pct': 0.0}
  layer4: {'name': 'step_clipping', 'triggers': 38, 'rate_pct': 760.0}

--- Trigger Rates ---
  layer1_warm_start_rate: 40.0%
  layer2_refinement_rate: 20.0%
  layer2_careful_rate: 40.0%
  layer2_exploration_rate: 0.0%
  layer3_cost_guard_rate: 0.0%
  layer4_clip_rate: 760.0%
  lbfgs_history_buffer_fill_rate: 60.0%
  lbfgs_line_search_failure_rate: 0.0%
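
The 760% clip rate above looks odd at first, but it is consistent with the counts in the summary if each rate is computed as triggers per warmup call. That formula is inferred from the printed numbers, not taken from the NLSQ source, and `trigger_rate_pct` is a hypothetical helper.

```python
def trigger_rate_pct(triggers, total_warmup_calls):
    """Triggers per warmup call, as a percentage.

    Assumption: returns 0.0 when no warmup calls were recorded.
    """
    if total_warmup_calls == 0:
        return 0.0
    return 100.0 * triggers / total_warmup_calls


# Counts taken from the summary printed above
print(trigger_rate_pct(2, 5))   # Layer 1: 2 triggers in 5 calls
print(trigger_rate_pct(38, 5))  # Layer 4: 38 clipped steps in 5 calls
```

Because Layer 4 can clip many steps within a single warmup, its rate can exceed 100%; under this reading, 760% means an average of 7.6 clipped steps per warmup call.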


In [10]:
# View recent events
print("\n--- Recent Events (last 5) ---")
events = telemetry.get_recent_events(5)
for event in events:
    print(f"  {event['timestamp']}: {event['type']} - {event.get('data', {})}")


--- Recent Events (last 5) ---
  1767720020.8401947: layer4_clip - {'original_norm': 0.2910247763332054, 'max_norm': 0.1}
  1767720021.5858948: layer4_clip - {'original_norm': 0.24028729046779526, 'max_norm': 0.1}
  1767720022.3690147: layer4_clip - {'original_norm': 0.19601930467622072, 'max_norm': 0.1}
  1767720023.2728324: layer4_clip - {'original_norm': 0.14798067841343993, 'max_norm': 0.1}
  1767720023.9341311: layer4_clip - {'original_norm': 0.10031077975273962, 'max_norm': 0.1}


In [11]:
# Export metrics (Prometheus/Grafana compatible)
print("\n--- Prometheus-Compatible Metrics ---")
metrics = telemetry.export_metrics()
for metric_name, value in metrics.items():
    print(f"  {metric_name}: {value}")


--- Prometheus-Compatible Metrics ---
  nlsq_defense_warmup_calls_total: 5
  nlsq_defense_layer1_triggers_total: 2
  nlsq_defense_layer2_refinement_total: 1
  nlsq_defense_layer2_careful_total: 2
  nlsq_defense_layer2_exploration_total: 0
  nlsq_defense_layer3_triggers_total: 0
  nlsq_defense_layer4_triggers_total: 38
  nlsq_defense_lbfgs_history_fill_total: 3
  nlsq_defense_lbfgs_line_search_failures_total: 0
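
To expose these values to a Prometheus scraper, the dictionary can be rendered in the Prometheus text exposition format. The sketch below assumes `export_metrics()` returns a flat mapping of metric name to number, as the loop above suggests, and treats every metric as a counter because the names end in `_total`; `to_prometheus_text` is a hypothetical helper, and the sample values are copied from the output above.

```python
def to_prometheus_text(metrics):
    """Render a {name: value} mapping as Prometheus text exposition lines."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} counter")  # assumption: all are counters
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


sample = {
    "nlsq_defense_warmup_calls_total": 5,
    "nlsq_defense_layer1_triggers_total": 2,
    "nlsq_defense_layer4_triggers_total": 38,
}
text = to_prometheus_text(sample)
print(text, end="")
```

In a real pipeline the resulting string would be served from an HTTP endpoint or written to a file picked up by a collector; that wiring is outside the scope of this notebook.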


---

## 3. Configuration Presets

NLSQ provides preset configurations for common scenarios.

In [12]:
# Available presets
print("=" * 60)
print("Configuration Presets")
print("=" * 60)

presets = [
    ("Default", HybridStreamingConfig()),
    ("defense_strict()", HybridStreamingConfig.defense_strict()),
    ("defense_relaxed()", HybridStreamingConfig.defense_relaxed()),
    ("defense_disabled()", HybridStreamingConfig.defense_disabled()),
    ("scientific_default()", HybridStreamingConfig.scientific_default()),
]

print("\n{:<22} {:>8} {:>10} {:>12} {:>8}".format(
    "Preset", "Layer1", "Layer2", "Layer3", "Layer4"
))
print("-" * 60)

for name, config in presets:
    print("{:<22} {:>8} {:>10} {:>12} {:>8}".format(
        name,
        "ON" if config.enable_warm_start_detection else "OFF",
        "ON" if config.enable_adaptive_warmup_lr else "OFF",
        "ON" if config.enable_cost_guard else "OFF",
        "ON" if config.enable_step_clipping else "OFF",
    ))

Configuration Presets

Preset                   Layer1     Layer2       Layer3   Layer4
------------------------------------------------------------
Default                      ON         ON           ON       ON
defense_strict()             ON         ON           ON       ON
defense_relaxed()            ON         ON           ON       ON
defense_disabled()          OFF        OFF          OFF      OFF
scientific_default()         ON         ON           ON       ON


In [13]:
# Compare preset behavior
print("\n--- Preset Comparison: Near-Optimal Start ---")

near_optimal_p0 = true_params * np.array([1.005, 0.995, 1.002])

for name, config in presets[:4]:  # Skip scientific_default for brevity
    reset_defense_telemetry()

    popt, _ = curve_fit(
        exponential_decay, x, y, p0=near_optimal_p0,
        method='hybrid_streaming',
        config=config,
        verbose=0,
    )

    telemetry = get_defense_telemetry()
    rates = telemetry.get_trigger_rates()

    error = np.linalg.norm(popt - true_params)
    print(f"\n{name}:")
    print(f"  Parameter error: {error:.6f}")
    print(f"  Layer 1 rate: {rates.get('layer1_warm_start_rate', 0):.0f}%")

INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.4655e-03)



--- Preset Comparison: Near-Optimal Start ---
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=4.927918e+00, time=0.2s


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.599895e+01, reduction=1.393652e-01, Δ=1.0000, time=0.4s


INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.4655e-03)


  GN iter 2/100: cost=4.788553e+00, grad_norm=9.977496e-04, reduction=7.763559e-10, Δ=1.0000, time=0.4s

Default:
  Parameter error: 0.025338
  Layer 1 rate: 100%
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=4.927918e+00, time=0.2s


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.599895e+01, reduction=1.393652e-01, Δ=1.0000, time=0.4s


  GN iter 2/100: cost=4.788553e+00, grad_norm=9.977496e-04, reduction=7.763559e-10, Δ=1.0000, time=0.4s

defense_strict():
  Parameter error: 0.025338
  Layer 1 rate: 100%


INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.4655e-03)


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=4.927918e+00, time=0.2s


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.599895e+01, reduction=1.393652e-01, Δ=1.0000, time=0.4s


INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.4655e-03)


  GN iter 2/100: cost=4.788553e+00, grad_norm=9.977496e-04, reduction=7.763559e-10, Δ=1.0000, time=0.4s

defense_relaxed():
  Parameter error: 0.025338
  Layer 1 rate: 100%
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=4.927918e+00, time=0.2s


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.599895e+01, reduction=1.393652e-01, Δ=1.0000, time=0.4s


  GN iter 2/100: cost=4.788553e+00, grad_norm=9.977496e-04, reduction=7.763559e-10, Δ=1.0000, time=0.4s

defense_disabled():
  Parameter error: 0.025338
  Layer 1 rate: 100%


---

## 4. Custom Configuration

Fine-tune defense layers for your specific needs.

In [14]:
# Custom configuration example
print("=" * 60)
print("Custom Configuration Example")
print("=" * 60)

# Scenario: Scientific computing with multi-scale parameters
custom_config = HybridStreamingConfig(
    # Layer 1: Stricter warm start detection
    enable_warm_start_detection=True,
    warm_start_threshold=0.005,  # 0.5% instead of 1%

    # Layer 2: Conservative learning rates
    enable_adaptive_warmup_lr=True,
    warmup_lr_refinement=1e-7,  # Ultra-conservative
    warmup_lr_careful=1e-6,
    warmup_learning_rate=1e-4,  # Lower than default 0.001

    # Layer 3: Tighter cost tolerance
    enable_cost_guard=True,
    cost_increase_tolerance=0.02,  # 2% instead of 5%

    # Layer 4: Smaller step limit
    enable_step_clipping=True,
    max_warmup_step_size=0.05,  # Half the default

    # Other settings
    precision='float64',  # Full precision for scientific work
    warmup_iterations=300,
)

print("Custom config created with:")
print(f"  warm_start_threshold: {custom_config.warm_start_threshold}")
print(f"  warmup_lr_refinement: {custom_config.warmup_lr_refinement}")
print(f"  cost_increase_tolerance: {custom_config.cost_increase_tolerance}")
print(f"  max_warmup_step_size: {custom_config.max_warmup_step_size}")

# Use the custom config
reset_defense_telemetry()
popt, pcov = curve_fit(
    exponential_decay, x, y,
    p0=np.array([4.5, 0.45, 1.1]),
    method='hybrid_streaming',
    config=custom_config,
    verbose=0,
)

print(f"\nFitted params: {popt}")
print(f"Std errors: {np.sqrt(np.diag(pcov))}")

Custom Configuration Example
Custom config created with:
  warm_start_threshold: 0.005
  warmup_lr_refinement: 1e-07
  cost_increase_tolerance: 0.02
  max_warmup_step_size: 0.05


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=2.993245e-12, reduction=8.881784e-16, Δ=1.0000, time=0.4s

Fitted params: [4.97480136 0.49754672 1.00100807]
Std errors: [0.01946782 0.00422352 0.00862452]


---

## 5. Practical Scenarios

### Scenario A: Warm Start Refinement

Refining parameters from a previous fit.

In [15]:
# Warm Start Refinement Scenario
print("=" * 60)
print("Scenario A: Warm Start Refinement")
print("=" * 60)

# Step 1: Initial fit on the first batch of data (in this reduced demo, x[:500] covers all 500 points)
x1 = x[:500]
y1 = y[:500]

popt_v1, _ = curve_fit(
    exponential_decay, x1, y1,
    p0=np.array([3.0, 0.3, 0.5]),
    method='hybrid_streaming',
    verbose=0,
)
print(f"Initial fit (v1): {popt_v1}")

# Step 2: Refinement with full data (using v1 as starting point)
reset_defense_telemetry()

popt_v2, pcov_v2 = curve_fit(
    exponential_decay, x, y,
    p0=popt_v1,  # Use previous result as initial guess
    method='hybrid_streaming',
    config=HybridStreamingConfig.defense_strict(),  # Use strict for refinement
    verbose=0,
)

telemetry = get_defense_telemetry()
rates = telemetry.get_trigger_rates()

print(f"Refined fit (v2): {popt_v2}")
print(f"True params:      {true_params}")
print(f"\nLayer 1 (warm start) triggered: {rates.get('layer1_warm_start_rate', 0):.0f}%")
print("-> Defense layers protected against divergence from good initial guess")

Scenario A: Warm Start Refinement


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=6.2826e-03)


  GN iter 1/100: cost=4.788553e+00, grad_norm=1.374809e-07, reduction=8.881784e-16, Δ=1.0000, time=0.3s
Initial fit (v1): [4.97480136 0.49754672 1.00100807]
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=4.788553e+00, time=0.1s


  GN iter 1/100: cost=4.788553e+00, grad_norm=6.604561e-10, reduction=-8.881784e-16, Δ=0.5000, time=0.3s
Refined fit (v2): [4.97480136 0.49754672 1.00100807]
True params:      [5.  0.5 1. ]

Layer 1 (warm start) triggered: 100%
-> Defense layers protected against divergence from good initial guess


### Scenario B: Production Monitoring

Monitoring defense layer activations in a batch processing pipeline.

In [16]:
# Production Monitoring Scenario
print("=" * 60)
print("Scenario B: Production Monitoring")
print("=" * 60)

reset_defense_telemetry()

# Simulate production batch with varying data quality
results = []
# Reduced from 20 fits for faster execution; with 3 fits only the "Excellent" and "Good" branches below are reached
for i in range(3):
    # Simulate different starting point qualities
    if i < 2:
        p0 = true_params * (1 + np.random.uniform(-0.01, 0.01, 3))  # Excellent
    elif i < 7:
        p0 = true_params * (1 + np.random.uniform(-0.2, 0.2, 3))  # Good
    else:
        p0 = np.array([1.0, 0.1, 5.0]) + np.random.uniform(-0.5, 0.5, 3)  # Poor

    # Add some noise to data
    y_noisy = y + np.random.normal(0, 0.05 * (i % 3 + 1), len(y))

    popt, _ = curve_fit(
        exponential_decay, x, y_noisy, p0=p0,
        method='hybrid_streaming',
        verbose=0,
    )
    results.append(popt)

# Production monitoring report
telemetry = get_defense_telemetry()
rates = telemetry.get_trigger_rates()

print(f"\n--- Production Report ({len(results)} fits) ---")
print("\nDefense Layer Activation Rates:")
print(f"  Layer 1 (Warm Start):     {rates.get('layer1_warm_start_rate', 0):.1f}%")
print(f"  Layer 2 (Refinement LR):  {rates.get('layer2_refinement_rate', 0):.1f}%")
print(f"  Layer 2 (Careful LR):     {rates.get('layer2_careful_rate', 0):.1f}%")
print(f"  Layer 2 (Exploration LR): {rates.get('layer2_exploration_rate', 0):.1f}%")
print(f"  Layer 3 (Cost Guard):     {rates.get('layer3_cost_guard_rate', 0):.1f}%")
print(f"  Layer 4 (Step Clipping):  {rates.get('layer4_clip_rate', 0):.1f}%")

# Alerts based on rates
print("\n--- Alerts ---")
if rates.get('layer1_warm_start_rate', 0) > 50:
    print("INFO: >50% warm starts - consider using defense_strict()")
if rates.get('layer3_cost_guard_rate', 0) > 20:
    print("WARNING: >20% cost guard triggers - review initial guess quality")
if rates.get('layer3_cost_guard_rate', 0) == 0 and rates.get('layer1_warm_start_rate', 0) == 0:
    print("OK: Defense layers active but not frequently triggered")

INFO:nlsq.adaptive_hybrid_streaming:Phase 1: Skipping L-BFGS warmup - warm start detected (relative_loss=7.8675e-03)


Scenario B: Production Monitoring
  Computing initial JTJ (1 chunks, 500 points)...


  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=5.959736e+00, time=0.1s


  GN iter 1/100: cost=5.773530e+00, grad_norm=8.409017e+00, reduction=1.862067e-01, Δ=1.0000, time=0.3s


  GN iter 2/100: cost=5.773523e+00, grad_norm=9.276995e-02, reduction=6.188714e-06, Δ=1.0000, time=0.3s


  GN iter 3/100: cost=5.773523e+00, grad_norm=5.816759e-05, reduction=3.226042e-11, Δ=1.0000, time=0.3s


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.1s
  Initial JTJ complete: cost=9.551355e+00, time=0.1s


  GN iter 1/100: cost=9.551355e+00, grad_norm=4.900457e-13, reduction=0.000000e+00, Δ=0.5000, time=0.3s


  Computing initial JTJ (1 chunks, 500 points)...
  Initial JTJ: 1/1 chunks (100%), elapsed=0.2s
  Initial JTJ complete: cost=1.572212e+01, time=0.2s


  GN iter 1/100: cost=1.572212e+01, grad_norm=2.268008e-09, reduction=-1.776357e-15, Δ=0.5000, time=0.5s

--- Production Report (3 fits) ---

Defense Layer Activation Rates:
  Layer 1 (Warm Start):     33.3%
  Layer 2 (Refinement LR):  66.7%
  Layer 2 (Careful LR):     0.0%
  Layer 2 (Exploration LR): 0.0%
  Layer 3 (Cost Guard):     0.0%
  Layer 4 (Step Clipping):  0.0%

--- Alerts ---


---

## 6. Troubleshooting

### Common Issues and Solutions

In [17]:
# Troubleshooting guide
print("=" * 60)
print("Troubleshooting Guide")
print("=" * 60)

issues = [
    {
        "problem": "Warmup always skipped (Layer 1 always triggers)",
        "cause": "warm_start_threshold too high for your use case",
        "solution": "config = HybridStreamingConfig(warm_start_threshold=0.001)",
    },
    {
        "problem": "Convergence too slow after upgrading to 0.3.6",
        "cause": "Layer 2 using ultra-conservative LR",
        "solution": "config = HybridStreamingConfig.defense_relaxed()",
    },
    {
        "problem": "Cost guard aborts warmup too early",
        "cause": "cost_increase_tolerance too strict",
        "solution": "config = HybridStreamingConfig(cost_increase_tolerance=0.2)",
    },
    {
        "problem": "Results different from pre-0.3.6",
        "cause": "Defense layers preventing previous divergence",
        "solution": "config = HybridStreamingConfig.defense_disabled() # For testing only",
    },
    {
        "problem": "Need pre-0.3.6 behavior for regression tests",
        "cause": "Defense layers change optimization path",
        "solution": "config = HybridStreamingConfig.defense_disabled()",
    },
]

for i, issue in enumerate(issues, 1):
    print(f"\n{i}. Problem: {issue['problem']}")
    print(f"   Cause: {issue['cause']}")
    print(f"   Solution: {issue['solution']}")

Troubleshooting Guide

1. Problem: Warmup always skipped (Layer 1 always triggers)
   Cause: warm_start_threshold too high for your use case
   Solution: config = HybridStreamingConfig(warm_start_threshold=0.001)

2. Problem: Convergence too slow after upgrading to 0.3.6
   Cause: Layer 2 using ultra-conservative LR
   Solution: config = HybridStreamingConfig.defense_relaxed()

3. Problem: Cost guard aborts warmup too early
   Cause: cost_increase_tolerance too strict
   Solution: config = HybridStreamingConfig(cost_increase_tolerance=0.2)

4. Problem: Results different from pre-0.3.6
   Cause: Defense layers preventing previous divergence
   Solution: config = HybridStreamingConfig.defense_disabled() # For testing only

5. Problem: Need pre-0.3.6 behavior for regression tests
   Cause: Defense layers change optimization path
   Solution: config = HybridStreamingConfig.defense_disabled()


---

## Summary

### Key Takeaways

1. **4 Defense Layers** protect against L-BFGS warmup divergence:
   - Layer 1: Warm start detection (skip warmup if near optimal)
   - Layer 2: Adaptive step size (scale step size by fit quality)
   - Layer 3: Cost-increase guard (abort on loss increase)
   - Layer 4: Step clipping (limit update magnitude)

2. **Enabled by default** - no code changes required for most users

3. **Telemetry** helps monitor defense behavior in production:
   - `get_defense_telemetry()` - access telemetry singleton
   - `reset_defense_telemetry()` - reset counters
   - `.get_summary()` - comprehensive report
   - `.get_trigger_rates()` - percentage rates
   - `.export_metrics()` - Prometheus-compatible format

4. **Presets** for common scenarios:
   - `defense_strict()` - for refinement/warm starts
   - `defense_relaxed()` - for exploration
   - `defense_disabled()` - pre-0.3.6 behavior
   - `scientific_default()` - optimized for physics/scientific computing

### When to Customize

| Scenario | Recommendation |
|----------|----------------|
| Default usage | Use defaults (all layers ON) |
| Warm start refinement | `defense_strict()` |
| Exploration from poor guess | `defense_relaxed()` |
| Regression testing | `defense_disabled()` |
| Scientific computing | `scientific_default()` |
| Custom requirements | Create `HybridStreamingConfig(...)` |

### Next Steps

- [Hybrid Streaming API](../06_streaming/05_hybrid_streaming_api.ipynb) - Complete hybrid streaming guide
- [Troubleshooting Guide](../03_advanced/troubleshooting_guide.ipynb) - General troubleshooting
- [Migration Guide](https://nlsq.readthedocs.io/en/latest/migration/v0.3.6_defense_layers.html) - v0.3.6 migration details