# 📘 Large Dataset Fitting: Handle Millions of Data Points

> Master NLSQ's strategies for fitting curves to datasets too large for memory

⏱️ **20-30 minutes** | 📊 **Level: ●●○ Intermediate** | 🏷️ **Memory Management** | **Performance** | **Scalability**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/imewei/NLSQ/blob/main/examples/notebooks/02_core_tutorials/large_dataset_demo.ipynb)

---

## 🗺️ Learning Path

**You are here:** Core Tutorials > **Large Dataset Fitting**

```
Getting Started → Quickstart → [Large Dataset Demo] ← You are here → GPU Optimization
```

**Prerequisites:**
- ✓ Completed [NLSQ Quickstart](../01_getting_started/nlsq_quickstart.ipynb)
- ✓ Familiar with NumPy arrays and JAX basics
- ✓ Understand basic curve fitting concepts
- ✓ Knowledge of memory constraints in data processing

**Recommended flow:**
- ← **Previous:** [NLSQ Quickstart](../01_getting_started/nlsq_quickstart.ipynb)
- → **Next (Recommended):** [GPU Optimization Deep Dive](../03_advanced/gpu_optimization_deep_dive.ipynb)
- → **Alternative:** [Performance Optimization](performance_optimization_demo.ipynb)

---

## 🎯 What You'll Learn

After completing this tutorial, you will be able to:

- ✓ **Estimate memory requirements** before fitting to avoid out-of-memory errors
- ✓ **Use automatic chunking** for datasets larger than available memory
- ✓ **Implement streaming optimization** for unlimited dataset sizes (100M+ points)
- ✓ **Choose between chunking and streaming** based on dataset characteristics
- ✓ **Configure memory limits** and use context managers for temporary settings
- ✓ **Monitor and troubleshoot** large dataset fits with progress reporting

---

## 💡 Why This Matters

**The problem:** SciPy's `curve_fit` loads entire datasets into memory, failing on large datasets or becoming prohibitively slow. For datasets >1M points, traditional approaches either crash or require excessive computation time.

**NLSQ's solution:**
- **Automatic memory management** - Detects available memory and optimizes strategy
- **GPU acceleration** - 150-270x faster than CPU-only approaches
- **Intelligent chunking** - Achieves <1% error for well-conditioned problems
- **Streaming optimization** - Handles unlimited dataset sizes with zero accuracy loss
- **Progress reporting** - Track long-running fits in real time

**Real-world use cases:**
- 🔬 **High-throughput screening** - Millions of measurements from automated experiments
- 📡 **Sensor calibration** - Continuous data streams from IoT devices
- 🧬 **Genomics data fitting** - Large-scale biological datasets
- 🌡️ **Climate model parameter estimation** - Decades of environmental measurements
- 📊 **Financial time series** - Years of high-frequency trading data

**When to use this approach:**
- ✅ **Good for:** Datasets >100K points, memory-constrained environments, production systems
- ❌ **Not needed for:** Small datasets (<10K points) → Use [Quickstart](../01_getting_started/nlsq_quickstart.ipynb) instead

**Performance characteristics:**
- **Speed:** GPU acceleration provides 150-270x speedup vs SciPy
- **Memory:** Processes datasets 10-100x larger than available RAM
- **Accuracy:** <1% error with chunking, zero loss with streaming

---

## ⚡ Quick Start

Fit a 1 million point dataset in 3 steps:

```python
import jax.numpy as jnp
import numpy as np
from nlsq import fit_large_dataset

# 1. Generate data
x = np.linspace(0, 5, 1_000_000)
y = 5.0 * np.exp(-1.2 * x) + 0.5 + np.random.normal(0, 0.05, 1_000_000)

# 2. Define model (use jnp so JAX can trace and differentiate it)
def exponential_decay(x, a, b, c):
    return a * jnp.exp(-b * x) + c

# 3. Fit automatically
result = fit_large_dataset(exponential_decay, x, y, p0=[4.0, 1.0, 0.4])
print(f"Parameters: {result.popt}")
```

**Expected output** (approximate; exact values vary with noise and hardware, and the timing and relative-error lines are printed by the fuller demo in Section 3, not by this snippet):

```
✅ Fit completed in 0.8 seconds
Parameters: [5.001, 1.199, 0.500]
Relative errors: <0.1%
```

---

## 📖 Setup and Imports

First, let's import the necessary modules and report the Python version.

In [None]:
# Configure matplotlib for inline plotting in VS Code/Jupyter
# MUST come before importing matplotlib
%matplotlib inline

In [None]:
# Report Python version
import sys

print(f"✅ Python {sys.version_info.major}.{sys.version_info.minor} meets requirements")

import time

import jax.numpy as jnp
import numpy as np
from nlsq import (
    AlgorithmSelector,
    CurveFit,
    LargeDatasetConfig,
    LargeDatasetFitter,
    LDMemoryConfig,
    MemoryConfig,
    __version__,
    auto_select_algorithm,
    configure_for_large_datasets,
    curve_fit_large,
    estimate_memory_requirements,
    fit_large_dataset,
    get_memory_config,
    large_dataset_context,
    memory_context,
    set_memory_limits,
)

print(f"NLSQ version: {__version__}")
print("NLSQ Large Dataset Demo - Enhanced Version")

### Define Model Functions

We'll use several model functions throughout this tutorial to demonstrate different aspects of large dataset fitting.

In [None]:
def exponential_decay(x, a, b, c):
    """Exponential decay model with offset: y = a * exp(-b * x) + c"""
    return a * jnp.exp(-b * x) + c


def polynomial_model(x, a, b, c, d):
    """Polynomial model: y = a*x^3 + b*x^2 + c*x + d"""
    return a * x**3 + b * x**2 + c * x + d


def gaussian(x, a, mu, sigma, offset):
    """Gaussian model: y = a * exp(-((x - mu)^2) / (2*sigma^2)) + offset"""
    return a * jnp.exp(-((x - mu) ** 2) / (2 * sigma**2)) + offset


def complex_model(x, a, b, c, d, e, f):
    """Complex model with many parameters for algorithm selection testing"""
    return a * jnp.exp(-b * x) + c * jnp.sin(d * x) + e * x**2 + f

## 1. Memory Estimation

**Key concept:** Before fitting large datasets, use `estimate_memory_requirements()` to predict memory usage and determine the optimal processing strategy.

**Why it matters:** Prevents out-of-memory errors and helps you choose between single-pass, chunked, or streaming approaches.

**How it works:**
1. Calculates memory needed for data arrays (x, y)
2. Estimates Jacobian matrix size (n_points × n_params)
3. Accounts for JAX compilation overhead
4. Recommends chunk count based on available memory
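
The first two steps and the chunk recommendation can be approximated by hand. The sketch below is a back-of-envelope calculation, not the formula `estimate_memory_requirements()` uses: the helper names `rough_memory_estimate_gb` and `rough_chunk_count` are hypothetical, float64 storage is assumed, and JAX compilation overhead is ignored, so real usage will be higher.

```python
import math


def rough_memory_estimate_gb(n_points, n_params, itemsize=8):
    """Rough estimate: x and y arrays plus a dense Jacobian (float64 by default).

    Hypothetical helper for illustration; ignores JAX overhead and temporaries.
    """
    data_bytes = 2 * n_points * itemsize             # x and y arrays
    jacobian_bytes = n_points * n_params * itemsize  # n_points x n_params matrix
    return (data_bytes + jacobian_bytes) / 1024**3


def rough_chunk_count(n_points, n_params, memory_limit_gb):
    """Number of equal chunks needed so each chunk stays under the limit."""
    total_gb = rough_memory_estimate_gb(n_points, n_params)
    return max(1, math.ceil(total_gb / memory_limit_gb))


for n in (1_000_000, 100_000_000):
    gb = rough_memory_estimate_gb(n, 3)
    chunks = rough_chunk_count(n, 3, memory_limit_gb=1.0)
    print(f"{n:>11,} points: ~{gb:.2f} GB, chunks at a 1 GB limit: {chunks}")
```

Comparing these rough numbers with the values reported by `estimate_memory_requirements()` shows how much the library adds for overhead on your system.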

In [None]:
def demo_memory_estimation():
    """Demonstrate memory estimation capabilities."""
    print("=" * 60)
    print("MEMORY ESTIMATION DEMO")
    print("=" * 60)

    # Estimate requirements for different dataset sizes
    test_cases = [
        (100_000, 3, "Small dataset"),
        (1_000_000, 3, "Medium dataset"),
        (10_000_000, 3, "Large dataset"),
        (50_000_000, 3, "Very large dataset"),
        (100_000_000, 3, "Extremely large dataset"),
    ]

    for n_points, n_params, description in test_cases:
        stats = estimate_memory_requirements(n_points, n_params)
        print(f"\n{description} ({n_points:,} points, {n_params} parameters):")
        print(f"  Total memory estimate: {stats.total_memory_estimate_gb:.3f} GB")
        print(f"  Number of chunks: {stats.n_chunks}")

        # Determine strategy description
        if stats.n_chunks == 1:
            print("  Strategy: Single pass (fits in memory)")
        elif stats.n_chunks > 1:
            print(f"  Strategy: Chunked processing ({stats.n_chunks} chunks)")

        # For very large datasets, suggest streaming
        if n_points > 50_000_000:
            print("  💡 Consider: Streaming optimization for zero accuracy loss")


demo_memory_estimation()

## 2. Advanced Configuration & Algorithm Selection

**Key concept:** NLSQ provides configuration management and automatic algorithm selection for optimal performance.

**Features:**
- **`get_memory_config()`** - View current memory settings
- **`configure_for_large_datasets()`** - Optimize settings for large data
- **`auto_select_algorithm()`** - Automatically choose the best optimization algorithm
- **Context managers** - Temporarily change settings for specific operations

In [None]:
def demo_advanced_configuration():
    """Demonstrate advanced configuration and algorithm selection."""
    print("=" * 60)
    print("ADVANCED CONFIGURATION & ALGORITHM SELECTION DEMO")
    print("=" * 60)

    # Current memory configuration
    current_config = get_memory_config()
    print("Current memory configuration:")
    print(f"  Memory limit: {current_config.memory_limit_gb} GB")
    print(
        f"  Mixed precision fallback: {current_config.enable_mixed_precision_fallback}"
    )

    # Automatically configure for large datasets
    print("\nConfiguring for large dataset processing...")
    configure_for_large_datasets(memory_limit_gb=8.0, enable_chunking=True)

    # Show updated configuration
    new_config = get_memory_config()
    print(f"Updated memory limit: {new_config.memory_limit_gb} GB")

    # Generate test dataset for algorithm selection
    print("\n=== Algorithm Selection Demo ===")
    np.random.seed(42)

    # Test different model complexities
    test_cases = [
        ("Simple exponential", exponential_decay, 3, [5.0, 1.2, 0.5]),
        ("Polynomial", polynomial_model, 4, [0.1, -0.5, 2.0, 1.0]),
        ("Complex multi-param", complex_model, 6, [3.0, 0.8, 1.5, 2.0, 0.1, 0.2]),
    ]

    for model_name, model_func, n_params, true_params in test_cases:
        print(f"\n{model_name} ({n_params} parameters):")

        # Generate sample data
        n_sample = 10000  # Smaller sample for algorithm analysis
        x_sample = np.linspace(0, 5, n_sample)
        y_sample = model_func(x_sample, *true_params) + np.random.normal(
            0, 0.05, n_sample
        )

        # Get algorithm recommendation
        try:
            recommendations = auto_select_algorithm(model_func, x_sample, y_sample)
            print(f"  Recommended algorithm: {recommendations['algorithm']}")
            print(f"  Recommended tolerance: {recommendations['ftol']}")
            print(
                f"  Problem complexity: {recommendations.get('complexity', 'Unknown')}"
            )

            # Estimate memory for full dataset
            large_n = 1_000_000  # 1M points
            stats = estimate_memory_requirements(large_n, n_params)
            print(f"  Memory for 1M points: {stats.total_memory_estimate_gb:.3f} GB")
            print(
                f"  Chunking strategy: {'Required' if stats.n_chunks > 1 else 'Not needed'}"
            )
        except Exception as e:
            print(f"  Algorithm selection failed: {e}")
            print(f"  Using default settings for {model_name}")


# Run the demo
demo_advanced_configuration()

## 3. Basic Large Dataset Fitting

**Key function:** `fit_large_dataset()` - Convenience function for automatic large dataset handling

**Features:**
- Automatic memory management
- Progress reporting for long-running fits
- Intelligent strategy selection (single-pass, chunked, or streaming)
- Returns standard `OptimizeResult` with fitted parameters

In [None]:
def demo_basic_large_dataset_fitting():
    """Demonstrate basic large dataset fitting."""
    print("\n" + "=" * 60)
    print("BASIC LARGE DATASET FITTING DEMO")
    print("=" * 60)

    # Generate synthetic large dataset (1M points)
    print("Generating 1M point exponential decay dataset...")
    np.random.seed(42)
    n_points = 1_000_000
    x_data = np.linspace(0, 5, n_points, dtype=np.float64)
    true_params = [5.0, 1.2, 0.5]
    noise_level = 0.05
    y_true = true_params[0] * np.exp(-true_params[1] * x_data) + true_params[2]
    y_data = y_true + np.random.normal(0, noise_level, n_points)

    print(f"Dataset: {n_points:,} points")
    print(
        f"True parameters: a={true_params[0]}, b={true_params[1]}, c={true_params[2]}"
    )

    # Fit using convenience function
    print("\nFitting with automatic memory management...")
    start_time = time.time()
    result = fit_large_dataset(
        exponential_decay,
        x_data,
        y_data,
        p0=[4.0, 1.0, 0.4],
        memory_limit_gb=2.0,  # 2GB limit
        show_progress=True,
    )
    fit_time = time.time() - start_time

    if result.success:
        fitted_params = np.array(result.popt)
        errors = np.abs(fitted_params - np.array(true_params))
        rel_errors = errors / np.abs(np.array(true_params)) * 100

        print(f"\n✅ Fit completed in {fit_time:.2f} seconds")
        print(
            f"Fitted parameters: [{fitted_params[0]:.3f}, {fitted_params[1]:.3f}, {fitted_params[2]:.3f}]"
        )
        print(f"Absolute errors: [{errors[0]:.4f}, {errors[1]:.4f}, {errors[2]:.4f}]")
        print(
            f"Relative errors: [{rel_errors[0]:.2f}%, {rel_errors[1]:.2f}%, {rel_errors[2]:.2f}%]"
        )
    else:
        print(f"❌ Fit failed: {result.message}")


# Run the demo
demo_basic_large_dataset_fitting()

## 4. Context Managers for Temporary Configuration

**Key concept:** Use context managers to temporarily change settings without affecting global state.

**Available contexts:**
- **`memory_context(MemoryConfig)`** - Temporarily change memory settings
- **`large_dataset_context(LargeDatasetConfig)`** - Optimize for large dataset processing

**Why use context managers:**
- Settings are restored automatically when the context exits
- Safe for nested operations
- Lets you experiment with different configurations
- No risk of forgetting to restore settings
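
The restore-on-exit behavior described above is the standard Python context manager pattern. The sketch below illustrates that pattern with the standard library only; `DemoConfig`, `get_demo_config`, and `demo_memory_context` are hypothetical stand-ins and do not show how NLSQ implements `memory_context()`.

```python
from contextlib import contextmanager


class DemoConfig:
    """Hypothetical stand-in for a memory configuration object."""

    def __init__(self, memory_limit_gb=8.0):
        self.memory_limit_gb = memory_limit_gb


# Hypothetical "global" settings holder
_state = {"config": DemoConfig()}


def get_demo_config():
    return _state["config"]


@contextmanager
def demo_memory_context(temp_config):
    """Swap in temp_config for the duration of the block, then restore."""
    previous = _state["config"]
    _state["config"] = temp_config
    try:
        yield temp_config
    finally:
        # Runs on normal exit and when the block raises
        _state["config"] = previous


with demo_memory_context(DemoConfig(memory_limit_gb=0.5)):
    inside = get_demo_config().memory_limit_gb
    with demo_memory_context(DemoConfig(memory_limit_gb=0.1)):
        nested = get_demo_config().memory_limit_gb
    after_nested = get_demo_config().memory_limit_gb
after = get_demo_config().memory_limit_gb

print(inside, nested, after_nested, after)  # 0.5 0.1 0.5 8.0
```

Because each context remembers the configuration it replaced, nesting unwinds in the right order, and the `finally` clause restores the setting even if a fit raises an exception.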

In [None]:
def demo_context_managers():
    """Demonstrate context managers for temporary configuration."""
    print("\n" + "=" * 60)
    print("CONTEXT MANAGERS DEMO")
    print("=" * 60)

    # Show current configuration
    original_mem_config = get_memory_config()
    print(f"Original memory limit: {original_mem_config.memory_limit_gb} GB")

    # Generate test data
    np.random.seed(555)
    n_points = 500_000
    x_data = np.linspace(0, 5, n_points)
    y_data = exponential_decay(x_data, 4.0, 1.5, 0.3) + np.random.normal(
        0, 0.05, n_points
    )
    print(f"Test dataset: {n_points:,} points")

    # Test 1: Memory context for memory-constrained fitting
    print("\n--- Test 1: Memory-constrained fitting ---")
    constrained_config = MemoryConfig(
        memory_limit_gb=0.5,  # Very low limit
        enable_mixed_precision_fallback=True,
    )

    with memory_context(constrained_config):
        temp_config = get_memory_config()
        print(f"Inside context memory limit: {temp_config.memory_limit_gb} GB")
        print(f"Mixed precision enabled: {temp_config.enable_mixed_precision_fallback}")

        start_time = time.time()
        result1 = fit_large_dataset(
            exponential_decay, x_data, y_data, p0=[3.5, 1.3, 0.25], show_progress=False
        )
        time1 = time.time() - start_time

        if result1.success:
            print(f"✅ Constrained fit completed: {time1:.3f}s")
            print(f"   Parameters: {result1.popt}")
        else:
            print(f"❌ Constrained fit failed: {result1.message}")

    # Check that configuration is restored
    restored_config = get_memory_config()
    print(f"After context memory limit: {restored_config.memory_limit_gb} GB")

    # Test 2: Large dataset context for optimized processing
    print("\n--- Test 2: Large dataset optimization ---")
    ld_config = LargeDatasetConfig()

    with large_dataset_context(ld_config):
        print("Inside large dataset context - chunking optimized")

        start_time = time.time()
        result2 = fit_large_dataset(
            exponential_decay, x_data, y_data, p0=[3.5, 1.3, 0.25], show_progress=False
        )
        time2 = time.time() - start_time

        if result2.success:
            print(f"✅ Optimized fit completed: {time2:.3f}s")
            print(f"   Parameters: {result2.popt}")
        else:
            print(f"❌ Optimized fit failed: {result2.message}")

    print("\n✓ Context managers allow flexible, temporary configuration changes!")


# Run the demo
demo_context_managers()

## 5. Chunked Processing

**Key concept:** For datasets that don't fit in memory, NLSQ automatically chunks the data and processes it in batches using an exponential moving average algorithm.

**How it works:**
1. The dataset is divided into manageable chunks based on the memory limit
2. Each chunk is processed separately to compute a partial gradient
3. The gradients are combined using an exponential moving average
4. The result achieves <1% error for well-conditioned problems

**When to use:**
- Dataset larger than available RAM
- Memory-constrained environments
- Well-conditioned optimization problems
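
To give a feel for the exponential-moving-average idea, here is a deliberately simplified NumPy sketch. It is not NLSQ's internal algorithm: instead of combining gradients, it fits each chunk independently with `np.polyfit` and blends the per-chunk parameter estimates with an EMA. The weight `alpha`, the chunk count, and the random sampling of `x` are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_chunks = 200_000, 10

# x is drawn at random so every chunk spans the full range of the data
x = rng.uniform(-2.0, 2.0, n_points)
true_params = np.array([0.5, -1.2, 2.0, 1.5])  # cubic coefficients, highest degree first
y = np.polyval(true_params, x) + rng.normal(0.0, 0.1, n_points)

alpha = 0.3  # EMA weight given to the newest chunk (assumed value)
estimate = None
for x_chunk, y_chunk in zip(np.array_split(x, n_chunks), np.array_split(y, n_chunks)):
    chunk_params = np.polyfit(x_chunk, y_chunk, deg=3)  # fit this chunk on its own
    if estimate is None:
        estimate = chunk_params
    else:
        # Blend the running estimate with the newest chunk's result
        estimate = (1 - alpha) * estimate + alpha * chunk_params

rel_error = np.abs(estimate - true_params) / np.abs(true_params)
print("EMA estimate:", np.round(estimate, 3))
print(f"Max relative error: {rel_error.max():.3%}")
```

The sketch also hints at why conditioning matters: if each chunk only saw a narrow slice of `x` (for example, consecutive slices of sorted data), its individual fit would be poorly constrained and the blended estimate would suffer.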

In [None]:
def demo_chunked_processing():
    """Demonstrate chunked processing with progress reporting."""
    print("\n" + "=" * 60)
    print("CHUNKED PROCESSING DEMO")
    print("=" * 60)

    # Generate a dataset that will require chunking
    print("Generating 2M point polynomial dataset...")
    np.random.seed(123)
    n_points = 2_000_000
    x_data = np.linspace(-2, 2, n_points, dtype=np.float64)
    true_params = [0.5, -1.2, 2.0, 1.5]
    noise_level = 0.1
    y_true = (
        true_params[0] * x_data**3
        + true_params[1] * x_data**2
        + true_params[2] * x_data
        + true_params[3]
    )
    y_data = y_true + np.random.normal(0, noise_level, n_points)

    print(f"Dataset: {n_points:,} points")
    print(f"True parameters: {true_params}")

    # Create fitter with limited memory to force chunking
    fitter = LargeDatasetFitter(memory_limit_gb=0.5)  # Small limit to force chunking

    # Get processing recommendations
    recs = fitter.get_memory_recommendations(n_points, 4)
    print(f"\nProcessing strategy: {recs['processing_strategy']}")
    print(f"Chunk size: {recs['recommendations']['chunk_size']:,}")
    print(f"Number of chunks: {recs['recommendations']['n_chunks']}")
    print(
        f"Memory estimate: {recs['recommendations']['total_memory_estimate_gb']:.2f} GB"
    )

    # Fit with progress reporting
    print("\nFitting with chunked processing...")
    start_time = time.time()
    result = fitter.fit_with_progress(
        polynomial_model, x_data, y_data, p0=[0.4, -1.0, 1.8, 1.2]
    )
    fit_time = time.time() - start_time

    if result.success:
        fitted_params = np.array(result.popt)
        errors = np.abs(fitted_params - np.array(true_params))
        rel_errors = errors / np.abs(np.array(true_params)) * 100

        print(f"\n✅ Chunked fit completed in {fit_time:.2f} seconds")
        if hasattr(result, "n_chunks"):
            print(
                f"Used {result.n_chunks} chunks with {result.success_rate:.1%} success rate"
            )
        print(f"Fitted parameters: {fitted_params}")
        print(f"Absolute errors: {errors}")
        print(f"Relative errors: {rel_errors}%")
    else:
        print(f"❌ Chunked fit failed: {result.message}")


# Run the demo
demo_chunked_processing()

## 6. Streaming Optimization for Unlimited Datasets

**Key concept:** For datasets too large to fit in memory, NLSQ uses streaming optimization with mini-batch gradient descent. Unlike the subsampling approach it replaces, streaming processes **100% of the data with zero accuracy loss**.

**⚠️ Deprecation Notice:**
- **Removed:** Subsampling (which caused data loss)
- **Added:** Streaming optimization (processes all data)
- **Deprecated:** The `enable_sampling`, `sampling_threshold`, and `max_sampled_size` parameters now emit warnings

**How streaming works:**
1. Processes data in sequential batches
2. Uses mini-batch gradient descent
3. No data is skipped or discarded
4. Zero accuracy loss compared to full dataset processing

**When to use:**
- Dataset > available RAM
- Unlimited or continuously generated data
- When accuracy is critical
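
The batch-by-batch idea can be sketched in plain NumPy. The example below is a minimal illustration of mini-batch gradient descent over sequential batches, not NLSQ's streaming optimizer: it uses a linear model so the gradient can be written by hand, and `batch_stream`, the learning rate, and the batch sizes are assumptions chosen for the demo.

```python
import numpy as np


def batch_stream(n_batches, batch_size, seed=0):
    """Yield (x, y) batches one at a time; the full dataset is never held in memory.

    Hypothetical data source standing in for a file reader or sensor feed.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        x = rng.uniform(0.0, 1.0, batch_size)
        y = 2.0 * x + 0.5 + rng.normal(0.0, 0.05, batch_size)  # true a=2.0, b=0.5
        yield x, y


params = np.array([0.0, 0.0])  # initial guess for (a, b)
learning_rate = 0.5            # assumed value; needs tuning for a real problem

for x, y in batch_stream(n_batches=500, batch_size=1_000):
    residual = params[0] * x + params[1] - y
    # Gradient of the mean squared residual with respect to (a, b)
    grad = 2.0 * np.array([np.mean(residual * x), np.mean(residual)])
    params = params - learning_rate * grad

print("Estimated (a, b):", np.round(params, 3))
```

Every batch contributes one update and is then discarded, so memory use depends on the batch size rather than on the total number of points.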

In [None]:
def demo_streaming_optimization():
    """Demonstrate streaming optimization for unlimited datasets."""
    print("\n" + "=" * 60)
    print("STREAMING OPTIMIZATION DEMO")
    print("=" * 60)

    # Simulate a very large dataset scenario
    print("Simulating extremely large dataset (100M points)...")
    print("Using streaming optimization for zero data loss\n")

    n_points_full = 100_000_000  # 100M points
    true_params = [3.0, 0.8, 0.2]

    # For demo purposes, generate a representative dataset
    # In production, streaming would process full dataset in batches
    print("Generating representative dataset for demo...")
    np.random.seed(777)
    n_demo = 1_000_000  # 1M points for demo
    x_data = np.linspace(0, 10, n_demo)
    y_data = exponential_decay(x_data, *true_params) + np.random.normal(0, 0.1, n_demo)

    # Memory estimation
    stats = estimate_memory_requirements(n_points_full, len(true_params))
    print(f"\nFull dataset memory estimate: {stats.total_memory_estimate_gb:.1f} GB")
    print(f"Number of chunks required: {stats.n_chunks}")

    # Configure streaming optimization
    print("\nConfiguring streaming optimization...")
    config = LDMemoryConfig(
        memory_limit_gb=4.0,
        use_streaming=True,  # Enable streaming
        streaming_batch_size=50000,  # Process 50K points per batch
    )

    fitter = LargeDatasetFitter(config=config)

    print("\nFitting with streaming optimization...")
    print("(Processing 100% of data in batches)\n")

    try:
        start_time = time.time()
        result = fitter.fit(exponential_decay, x_data, y_data, p0=[2.5, 0.6, 0.15])
        fit_time = time.time() - start_time

        if result.success:
            print(f"\n✅ Streaming fit completed in {fit_time:.2f} seconds")
            print(f"\nFitted parameters: {result.x}")
            print(f"True parameters:    {true_params}")

            errors = np.abs(result.x - np.array(true_params))
            rel_errors = errors / np.abs(np.array(true_params)) * 100
            print(f"Relative errors:    {[f'{e:.2f}%' for e in rel_errors]}")
            print("\nℹ️ Streaming processed 100% of data (zero accuracy loss)")
        else:
            print(f"❌ Streaming fit failed: {result.message}")
    except Exception as e:
        print(f"❌ Error during streaming fit: {e}")


demo_streaming_optimization()

## 7. curve_fit_large Convenience Function

**Key function:** `curve_fit_large()` provides automatic detection and handling of large datasets.

**Features:**
- Automatic dataset size detection
- Intelligent processing strategy selection
- SciPy-compatible API (drop-in replacement)
- Returns standard `(popt, pcov)` tuple

**When to use:**
- You want automatic handling of both small and large datasets
- You are migrating from SciPy's `curve_fit`
- You don't want to configure chunking or streaming manually

In [None]:
def demo_curve_fit_large():
    """Demonstrate the curve_fit_large convenience function."""
    print("\n" + "=" * 60)
    print("CURVE_FIT_LARGE CONVENIENCE FUNCTION DEMO")
    print("=" * 60)

    # Generate test dataset
    print("Generating 3M point dataset for curve_fit_large demo...")
    np.random.seed(789)
    n_points = 3_000_000
    x_data = np.linspace(0, 10, n_points, dtype=np.float64)
    true_params = [5.0, 5.0, 1.5, 0.5]
    y_true = gaussian(x_data, *true_params)
    y_data = y_true + np.random.normal(0, 0.1, n_points)

    print(f"Dataset: {n_points:,} points")
    print(
        f"True parameters: a={true_params[0]:.2f}, mu={true_params[1]:.2f}, sigma={true_params[2]:.2f}, offset={true_params[3]:.2f}"
    )

    # Use curve_fit_large - automatic large dataset handling
    print("\nUsing curve_fit_large with automatic optimization...")
    start_time = time.time()
    popt, pcov = curve_fit_large(
        gaussian,
        x_data,
        y_data,
        p0=[4.5, 4.8, 1.3, 0.4],
        memory_limit_gb=1.0,  # Force chunking with low memory limit
        show_progress=True,
        auto_size_detection=True,  # Automatically detect large dataset
    )
    fit_time = time.time() - start_time

    errors = np.abs(popt - np.array(true_params))
    rel_errors = errors / np.abs(np.array(true_params)) * 100

    print(f"\n✅ curve_fit_large completed in {fit_time:.2f} seconds")
    print(f"Fitted parameters: {popt}")
    print(f"Absolute errors: {errors}")
    print(f"Relative errors: {rel_errors}%")

    # Show parameter uncertainties from covariance matrix
    param_std = np.sqrt(np.diag(pcov))
    print(f"Parameter uncertainties (std): {param_std}")


# Run the demo
demo_curve_fit_large()

## 8. Performance Comparison

Let's compare different fitting approaches across various dataset sizes to understand when each strategy is most effective.

In [None]:
def compare_approaches():
    """Compare different fitting approaches."""
    print("\n" + "=" * 60)
    print("PERFORMANCE COMPARISON")
    print("=" * 60)

    # Test different dataset sizes
    sizes = [10_000, 100_000, 500_000]

    print(f"\n{'Size':>10} {'Time (s)':>12} {'Memory (GB)':>12} {'Strategy':>20}")
    print("-" * 55)

    for n in sizes:
        # Generate data
        np.random.seed(42)
        x = np.linspace(0, 10, n)
        y = 2.0 * np.exp(-0.5 * x) + 0.3 + np.random.normal(0, 0.05, n)

        # Get memory estimate
        stats = estimate_memory_requirements(n, 3)

        # Determine strategy
        if stats.n_chunks == 1:
            strategy = "Single chunk"
        else:
            strategy = f"Chunked ({stats.n_chunks} chunks)"

        # Time the fit
        start = time.time()
        result = fit_large_dataset(
            exponential_decay,
            x,
            y,
            p0=[2.5, 0.6, 0.2],
            memory_limit_gb=0.5,  # Small limit to test chunking
            show_progress=False,
        )
        elapsed = time.time() - start

        print(
            f"{n:10,} {elapsed:12.3f} {stats.total_memory_estimate_gb:12.3f} {strategy:>20}"
        )


# Run comparison
compare_approaches()

## 🔑 Key Takeaways

1. **Memory estimation first:** Always use `estimate_memory_requirements()` before fitting large datasets to predict memory usage and avoid crashes.
2. **Automatic is best:** Use `curve_fit_large()` for automatic optimization. It selects the best strategy (single-pass, chunked, or streaming) for you.
3. **Chunking for large data:** Chunked processing works well when the dataset is larger than RAM but can be processed in batches. It achieves <1% error for well-conditioned problems.
4. **Streaming for unlimited:** Use streaming optimization when the dataset exceeds available memory or is continuously generated. It processes 100% of the data with zero accuracy loss.
5. **Context managers for flexibility:** Use `memory_context()` and `large_dataset_context()` for temporary configuration changes without affecting global settings.
6. **Monitor progress:** Enable `show_progress=True` for long-running fits to track optimization progress in real time.
7. **Algorithm selection matters:** Use `auto_select_algorithm()` to choose the best optimization algorithm for your specific problem.

---

## ⚠️ Common Pitfalls

**Pitfall 1: Not checking memory requirements**
- **Symptom:** Out of memory errors, system crashes, or extremely slow performance
- **Cause:** Dataset too large for available RAM, not using chunking/streaming
- **Solution:** Always call `estimate_memory_requirements()` first to understand memory needs

```python
# ✅ Correct approach
stats = estimate_memory_requirements(n_points, n_params)
if stats.n_chunks > 1:
    # Use chunking or streaming
    result = fit_large_dataset(func, x, y, memory_limit_gb=2.0)
```

**Pitfall 2: Using streaming when chunking is sufficient**
- **Symptom:** Slower performance than necessary
- **Cause:** Streaming uses mini-batch gradient descent, which is slower than direct optimization
- **Solution:** Chunking is faster when data fits in memory (even if split into chunks)

```python
# Choose based on memory requirements
# available_ram_gb: memory you can spare, in GB (set this for your system)
stats = estimate_memory_requirements(n_points, n_params)
if stats.total_memory_estimate_gb < available_ram_gb:
    # Use chunking (faster)
    result = fit_large_dataset(func, x, y, memory_limit_gb=available_ram_gb)
else:
    # Use streaming (handles unlimited data)
    config = LDMemoryConfig(use_streaming=True)
    fitter = LargeDatasetFitter(config=config)
```

**Pitfall 3: Forgetting to restore configuration**
- **Symptom:** Global settings changed unexpectedly, affecting subsequent fits
- **Cause:** Manually changing config without restoring
- **Solution:** Use context managers to automatically restore settings

```python
# ❌ Wrong approach
configure_for_large_datasets(memory_limit_gb=1.0)
# ... do work ...
# (forgot to restore original settings)

# ✅ Correct approach
with memory_context(MemoryConfig(memory_limit_gb=1.0)):
    pass  # ... do work ...
# Settings automatically restored here
```

**Pitfall 4: Not monitoring long-running fits**
- **Symptom:** Fits appear frozen, no feedback on progress
- **Cause:** Not enabling progress reporting
- **Solution:** Use `show_progress=True` for datasets >100K points

```python
# ✅ Always use progress reporting for large datasets
result = fit_large_dataset(
    func, x, y,
    p0=initial_guess,
    show_progress=True,  # Get real-time updates
)
```

---

## 💡 Best Practices

1. **Start with memory estimation**
   - Call `estimate_memory_requirements()` before fitting
   - Plan your strategy based on the results
   - Set appropriate `memory_limit_gb` for your system

2. **Use automatic functions when possible**
   - `curve_fit_large()` handles most cases automatically
   - `fit_large_dataset()` provides explicit control when needed
   - Let NLSQ choose the optimal strategy

3. **Enable progress reporting**
   - Use `show_progress=True` for datasets >100K points
   - Monitor optimization progress for long-running fits
   - Helps identify convergence issues early

4. **Choose the right approach**
   - **Small (<100K):** Regular `curve_fit()` is sufficient
   - **Medium (100K-10M):** Use `curve_fit_large()` with chunking
   - **Large (>10M):** Consider streaming optimization
   - **Unlimited:** Always use streaming

5. **Use context managers**
   - Temporary configuration changes with automatic restoration
   - Safe for nested operations
   - Prevents global state pollution

6. **Leverage algorithm selection**
   - Use `auto_select_algorithm()` for complex models
   - Let NLSQ choose optimal tolerance and algorithm
   - Improves convergence for difficult problems

7. **Monitor memory usage**
   - Check system memory before starting
   - Leave headroom (20-30%) for other processes
   - Use mixed precision fallback for memory-constrained systems

---
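
The size thresholds from item 4 can be captured in a small lookup function. This is a hypothetical helper written for this tutorial, not part of NLSQ; it only encodes the rules of thumb listed above, and the returned strings are labels rather than API names.

```python
def suggest_strategy(n_points=None):
    """Map a dataset size to the approach suggested in the best-practices list.

    Hypothetical helper, not part of NLSQ. Pass None for an unbounded stream.
    """
    if n_points is None:
        return "streaming"
    if n_points < 100_000:
        return "curve_fit"
    if n_points <= 10_000_000:
        return "curve_fit_large (chunked)"
    return "consider streaming"


for size in (5_000, 2_000_000, 50_000_000, None):
    print(size, "->", suggest_strategy(size))
```

Treat the output as a starting point: the memory estimate for your model and hardware should have the final say.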

## 📊 Performance Considerations

**Memory usage:**
- **Single-pass:** Requires `n_points × n_params × 8 bytes` for the Jacobian
- **Chunked:** Memory divided by number of chunks
- **Streaming:** Constant memory regardless of dataset size
- **Trade-off:** Memory vs accuracy (chunking has <1% error, streaming has 0% error)

**Computational cost:**
- **Time complexity:** O(n × m) where n = points, m = parameters
- **JAX compilation:** First fit is slow (~1-5s), subsequent fits are fast
- **GPU acceleration:** 150-270x speedup for large datasets (>1M points)
- **Chunking overhead:** Minimal (<5%) for well-conditioned problems

**Scaling behavior:**
- **Linear scaling:** Fit time scales linearly with dataset size
- **GPU advantage:** Increases with dataset size (more parallelism)
- **Memory scaling:** O(n × m) for the Jacobian matrix
- **Chunking efficiency:** >95% accuracy retention for most problems

**Trade-offs:**

| Approach | Speed | Memory | Accuracy | Best For |
|----------|-------|--------|----------|----------|
| Single-pass | Fastest | High | 100% | Fits in RAM |
| Chunked | Fast | Medium | >99% | Larger than RAM |
| Streaming | Moderate | Low | 100% | Unlimited size |

**Optimization tips:**
1. Use GPU when available (automatic in JAX)
2. Set `memory_limit_gb` to 70-80% of available RAM
3. Enable mixed precision fallback for memory-constrained systems
4. Use `auto_select_algorithm()` for complex models
5. Reuse `CurveFit` objects to avoid recompilation

---
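
The `n_points × n_params × 8 bytes` figure, and the saving from dropping to float32 under mixed precision, can be checked directly with NumPy. This is a quick sanity check on array sizes only; it does not measure what NLSQ itself allocates.

```python
import numpy as np

n_points, n_params = 1_000_000, 3

# Dense Jacobian in double precision: n_points x n_params x 8 bytes
jac64 = np.zeros((n_points, n_params), dtype=np.float64)
# The same array in single precision takes half the space
jac32 = jac64.astype(np.float32)

print(f"float64 Jacobian: {jac64.nbytes / 1e6:.1f} MB")  # 24.0 MB
print(f"float32 Jacobian: {jac32.nbytes / 1e6:.1f} MB")  # 12.0 MB

# The cost of the saving is resolution: float32 carries about 7 decimal digits
print("float32 machine epsilon:", np.finfo(np.float32).eps)
print("float64 machine epsilon:", np.finfo(np.float64).eps)
```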

## ❓ Common Questions

**Q: How do I know if I need chunking vs streaming?**

A: Use `estimate_memory_requirements()`. If `n_chunks > 1` but the total memory estimate is less than your available RAM, use chunking (faster). If the dataset exceeds available memory, use streaming (handles unlimited data).

**Q: What's the accuracy trade-off with chunking?**

A: NLSQ's chunking algorithm (exponential moving average) achieves <1% error for well-conditioned problems. For ill-conditioned problems or when accuracy is critical, use streaming for zero accuracy loss.

**Q: Why is my first fit slow?**

A: JAX compiles functions on first use (JIT compilation). Subsequent fits with the same function signature reuse the compiled code and run 100-300x faster.

**Q: Can I use large dataset features on a GPU?**

A: Yes! JAX automatically uses GPU when available. Large dataset features work seamlessly on both CPU and GPU, with GPU providing additional 2-5x speedup.

**Q: What if my dataset doesn't fit in RAM at all?**

A: Use streaming optimization with `LDMemoryConfig(use_streaming=True)`. Streaming processes data in batches and can handle unlimited dataset sizes with zero accuracy loss.

**Q: How do I monitor long-running fits?**

A: Set `show_progress=True` when calling `fit_large_dataset()` or `curve_fit_large()`. This provides real-time progress updates showing iteration count and current objective value.

**Q: Should I always use `curve_fit_large()` instead of `curve_fit()`?**

A: For small datasets (<100K points), regular `curve_fit()` is simpler and equally fast. Use `curve_fit_large()` when you have >100K points or want automatic dataset size detection.

[Complete FAQ](../../docs/faq.md)

---

## 🔗 Related Resources

**Build on this knowledge:**
- [GPU Optimization Deep Dive](../03_advanced/gpu_optimization_deep_dive.ipynb) - Maximize GPU performance
- [Performance Optimization Demo](performance_optimization_demo.ipynb) - General optimization strategies
- [Streaming Tutorials](../06_streaming/) - Production streaming workflows

**Alternative approaches:**
- [NLSQ Quickstart](../01_getting_started/nlsq_quickstart.ipynb) - For small datasets (<100K points)
- [Custom Algorithms Advanced](../03_advanced/custom_algorithms_advanced.ipynb) - When standard algorithms don't converge

**Feature demos:**
- [Callbacks Demo](../05_feature_demos/callbacks_demo.ipynb) - Monitor optimization progress
- [Enhanced Error Messages](../05_feature_demos/enhanced_error_messages_demo.ipynb) - Debug fitting issues

**References:**
- [API Documentation - Large Dataset Functions](https://nlsq.readthedocs.io/en/latest/api.html#large-dataset-fitting)
- [Memory Management Guide](https://nlsq.readthedocs.io/en/latest/guides/memory.html)
- [Performance Benchmarks](https://nlsq.readthedocs.io/en/latest/benchmarks.html)

---

## 📚 Technical Glossary

**Chunking:** Dividing a large dataset into smaller batches that fit in memory, processing each batch separately, and combining results using an exponential moving average algorithm.

**Streaming optimization:** Processing data in sequential batches using mini-batch gradient descent. Handles unlimited dataset sizes with zero accuracy loss.

**Memory estimation:** Predicting memory requirements before fitting by calculating data array sizes, Jacobian matrix size, and JAX compilation overhead.

**Exponential moving average (EMA):** Algorithm used in chunking to combine gradients from different chunks with decaying weights, achieving <1% error for well-conditioned problems.

**JIT compilation:** Just-In-Time compilation by JAX that converts Python functions to optimized machine code on first use. Subsequent calls reuse the compiled code for 100-300x speedup.

**Context manager:** Python construct (`with` statement) that automatically manages resource setup and cleanup, used for temporary configuration changes.

**Well-conditioned problem:** Optimization problem where the objective function is smooth, has a clear minimum, and small parameter changes lead to proportional objective changes.

**Ill-conditioned problem:** Optimization problem with steep gradients, multiple local minima, or high sensitivity to parameter changes. Benefits from streaming (zero accuracy loss) over chunking.

**Auto-detection:** NLSQ feature that automatically detects dataset size and chooses the optimal processing strategy (single-pass, chunked, or streaming).

**Mixed precision fallback:** Memory optimization technique that uses float32 instead of float64 when memory is constrained, trading slight accuracy for 50% memory reduction.

[Complete glossary](../../docs/glossary.md)