# Unified fit() Entry Point

> Learn to use NLSQ's unified `fit()` function for automatic workflow selection

**15 minutes** | **Level: Beginner**

---

## What You'll Learn

By the end of this notebook, you will be able to:

- Use the `fit()` function with automatic workflow selection
- Apply preset configurations like `preset="robust"` and `preset="fast"`
- Configure `fit()` with `WorkflowConfig` for custom workflows
- Understand when to use `fit()`, `curve_fit()`, and `curve_fit_large()`

---

## Learning Path

**You are here:** Workflow System > **Unified fit() Entry Point**

```
Getting Started --> [You are here: fit() Quickstart] --> Workflow Tiers --> Presets
```

**Recommended flow:**
- **Next:** [02_workflow_tiers.ipynb](02_workflow_tiers.ipynb) - Learn about STANDARD, CHUNKED, STREAMING tiers

**Alternative paths:**
- Want global optimization? Go to [../07_global_optimization/01_multistart_basics.ipynb](../07_global_optimization/01_multistart_basics.ipynb)
- Need domain examples? Go to [../04_gallery/](../04_gallery/)

---

## Before You Begin

**Required knowledge:**
- Basic Python and NumPy
- Familiarity with curve fitting concepts

**Required software:**
- NLSQ >= 0.3.4
- Python >= 3.12

**First time with NLSQ?** Start here: [NLSQ Quickstart](../01_getting_started/nlsq_quickstart.ipynb)

---

## Why This Matters

NLSQ provides three main APIs for curve fitting:
- `curve_fit()` - Standard fitting for small datasets
- `curve_fit_large()` - Memory-managed fitting for large datasets
- `fit()` - **Unified entry point that automatically selects the best approach**

The `fit()` function simplifies your workflow by:
- Automatically detecting dataset size and selecting appropriate strategy
- Providing preset configurations for common use cases
- Offering a consistent API regardless of dataset size

**Common use cases:**
- Rapid prototyping with sensible defaults
- Production code that handles varying dataset sizes
- Switching between speed-optimized and accuracy-optimized workflows

---

## Quick Start (30 seconds)

See NLSQ's `fit()` in action with this minimal example:

In [None]:
# Configure matplotlib for inline plotting (MUST come before imports)
%matplotlib inline

In [None]:
import numpy as np
import jax.numpy as jnp
from nlsq import fit

# Define model and generate data
def model(x, a, b):
    return a * jnp.exp(-b * x)
x = np.linspace(0, 5, 100)
y = 2.5 * np.exp(-1.3 * x) + 0.1 * np.random.randn(100)

# Fit with automatic workflow selection
popt, pcov = fit(model, x, y, p0=[1, 1])
print(f"Fitted parameters: a={popt[0]:.3f}, b={popt[1]:.3f}")

If you see fitted parameters close to `a=2.5, b=1.3`, you're ready to continue!

---

## Setup

In [None]:
import numpy as np
import jax.numpy as jnp
import matplotlib.pyplot as plt

from nlsq import fit, curve_fit, curve_fit_large
from nlsq import WorkflowConfig, OptimizationGoal

# Set random seed for reproducibility
np.random.seed(42)

---

## Tutorial Content

### Section 1: Basic fit() Usage

The `fit()` function provides a unified interface for curve fitting. It automatically
selects the appropriate backend based on dataset size.
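
To make size-based selection concrete, here is a minimal sketch in plain Python. The function name `choose_tier` and both thresholds are hypothetical and chosen only for illustration (the 1M figure echoes the threshold quoted for `curve_fit_large()` in this notebook). NLSQ's actual selection logic may weigh other factors, such as available memory.

```python
from enum import Enum


class Tier(Enum):
    """Workflow tiers, named after those used in this tutorial series."""
    STANDARD = "standard"
    CHUNKED = "chunked"
    STREAMING = "streaming"


def choose_tier(n_points, chunk_threshold=1_000_000, stream_threshold=100_000_000):
    """Pick a tier from the dataset size alone (hypothetical thresholds)."""
    if n_points < chunk_threshold:
        return Tier.STANDARD
    if n_points < stream_threshold:
        return Tier.CHUNKED
    return Tier.STREAMING


for n in (500, 5_000_000, 500_000_000):
    print(f"{n:>11,d} points -> {choose_tier(n).name}")
```

The point of a dispatcher like this is that calling code stays the same as the dataset grows; only the returned tier changes.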

In [None]:
# Define an exponential decay model
def exponential_decay(x, a, b, c):
    """Exponential decay: y = a * exp(-b * x) + c"""
    return a * jnp.exp(-b * x) + c

In [None]:
# Generate synthetic data
n_samples = 500
x_data = np.linspace(0, 5, n_samples)

# True parameters
true_a, true_b, true_c = 3.0, 1.2, 0.5

# Generate noisy observations
y_true = true_a * np.exp(-true_b * x_data) + true_c
noise = 0.15 * np.random.randn(n_samples)
y_data = y_true + noise

print(f"True parameters: a={true_a}, b={true_b}, c={true_c}")
print(f"Dataset size: {n_samples} points")

In [None]:
# Basic fit() - auto-selects workflow
popt, pcov = fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],  # Initial guess
)

print("\nfit() with automatic workflow selection:")
print(f"  Fitted: a={popt[0]:.4f}, b={popt[1]:.4f}, c={popt[2]:.4f}")
print(f"  True:   a={true_a:.4f}, b={true_b:.4f}, c={true_c:.4f}")
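
Because `fit()` returns `popt, pcov` just as `curve_fit()` does, a common follow-up is to turn the covariance matrix into one-sigma parameter uncertainties. The sketch below assumes `pcov` follows the SciPy convention (a parameter covariance matrix). The numbers are made up for illustration and are not output from this notebook.

```python
import numpy as np

# Illustrative values only: stand-ins for the popt and pcov returned by a fit
popt_example = np.array([3.0, 1.2, 0.5])
pcov_example = np.array([
    [4.0e-4, 1.0e-4, 0.0],
    [1.0e-4, 9.0e-4, 0.0],
    [0.0, 0.0, 1.0e-4],
])

# One-sigma uncertainties are the square roots of the diagonal entries
perr = np.sqrt(np.diag(pcov_example))

# Correlation matrix: covariance scaled by the outer product of the uncertainties
corr = pcov_example / np.outer(perr, perr)

for name, value, err in zip("abc", popt_example, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

Off-diagonal entries of `corr` close to +1 or -1 would suggest that two parameters are hard to determine independently from the data.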

### Section 2: Using Presets

The `fit()` function supports preset configurations for common use cases:

| Preset | Description | Multi-start | Use Case |
|--------|-------------|-------------|----------|
| `fast` | Maximum speed, single-start | No | Quick exploration |
| `robust` | Multi-start with 5 starts | Yes | Production use |
| `global` | Thorough search with 20 starts | Yes | Complex problems |
| `streaming` | For large datasets | Yes | Big data |
| `large` | Auto-detect large datasets | Yes | Variable sizes |
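
The multi-start idea behind `robust` and `global` can be sketched without NLSQ. The example below uses SciPy's `least_squares` as a stand-in local optimizer and uniform random starting points. It illustrates the general technique under those assumptions and is not NLSQ's implementation; the helper name `multistart_fit` is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def residuals(params, x, y):
    """Residuals of the exponential decay model a * exp(-b * x) + c."""
    a, b, c = params
    return a * np.exp(-b * x) + c - y


def multistart_fit(x, y, lower, upper, n_starts=5, seed=0):
    """Run a local least-squares fit from several random starts; keep the best."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best = None
    for _ in range(n_starts):
        p0 = rng.uniform(lower, upper)  # random start inside the bounds
        result = least_squares(residuals, p0, bounds=(lower, upper), args=(x, y))
        if best is None or result.cost < best.cost:
            best = result
    return best


# Synthetic data mirroring the tutorial's exponential decay setup
rng = np.random.default_rng(42)
x = np.linspace(0, 5, 500)
y = 3.0 * np.exp(-1.2 * x) + 0.5 + 0.15 * rng.standard_normal(x.size)

best = multistart_fit(x, y, [0.1, 0.1, -1.0], [10.0, 5.0, 2.0], n_starts=5)
print("Best parameters:", np.round(best.x, 3))
```

More starts cost more local fits but lower the risk that every run ends in the same poor local minimum, which is the trade-off the presets encode.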

In [None]:
# Define bounds for constrained optimization
bounds = ([0.1, 0.1, -1.0], [10.0, 5.0, 2.0])

In [None]:
# Preset: 'fast' - single-start for maximum speed
popt_fast, _ = fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
    bounds=bounds,
    preset="fast",
)

print("preset='fast':")
print(f"  Fitted: a={popt_fast[0]:.4f}, b={popt_fast[1]:.4f}, c={popt_fast[2]:.4f}")

In [None]:
# Preset: 'robust' - multi-start with 5 starting points
popt_robust, _ = fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
    bounds=bounds,
    preset="robust",
)

print("\npreset='robust':")
print(f"  Fitted: a={popt_robust[0]:.4f}, b={popt_robust[1]:.4f}, c={popt_robust[2]:.4f}")

In [None]:
# Preset: 'global' - thorough global search with 20 starts
popt_global, _ = fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
    bounds=bounds,
    preset="global",
)

print("\npreset='global':")
print(f"  Fitted: a={popt_global[0]:.4f}, b={popt_global[1]:.4f}, c={popt_global[2]:.4f}")

### Section 3: Using WorkflowConfig

For more control, you can create a `WorkflowConfig` object to customize the workflow.

In [None]:
# Create a custom WorkflowConfig
config = WorkflowConfig(
    goal=OptimizationGoal.QUALITY,  # Prioritize accuracy
    enable_multistart=True,
    n_starts=15,
    sampler="lhs",
)

print("WorkflowConfig:")
print(f"  goal: {config.goal}")
print(f"  enable_multistart: {config.enable_multistart}")
print(f"  n_starts: {config.n_starts}")
print(f"  sampler: {config.sampler}")

In [None]:
# Pass the same multi-start settings to fit() directly as keyword arguments
# (the config object created above is not passed in this call)
popt_custom, _ = fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
    bounds=bounds,
    multistart=True,
    n_starts=15,
    sampler="lhs",
)

print("\nCustom configuration result:")
print(f"  Fitted: a={popt_custom[0]:.4f}, b={popt_custom[1]:.4f}, c={popt_custom[2]:.4f}")
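
Assuming `sampler="lhs"` refers to Latin hypercube sampling, the sketch below shows how such a sampler can spread starting points across the parameter bounds: each parameter's range is split into `n_starts` equal strata, and every stratum is used exactly once. This is a plain NumPy illustration with a hypothetical helper name; NLSQ's own sampler may differ in its details.

```python
import numpy as np


def latin_hypercube_starts(lower, upper, n_starts, seed=0):
    """Draw n_starts points so each parameter's strata are each hit once."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_params = lower.size
    # One stratum index per start, shuffled independently for each parameter
    strata = np.column_stack([rng.permutation(n_starts) for _ in range(n_params)])
    # Jitter each point uniformly inside its stratum, then map to [0, 1)
    unit = (strata + rng.random((n_starts, n_params))) / n_starts
    # Rescale the unit cube to the requested bounds
    return lower + unit * (upper - lower)


bounds = ([0.1, 0.1, -1.0], [10.0, 5.0, 2.0])
starts = latin_hypercube_starts(*bounds, n_starts=15)
print(starts.shape)
print(starts[:3])
```

Compared with independent uniform draws, this stratification avoids clusters of nearly identical starting points along any single parameter.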

### Section 4: Comparison with curve_fit() and curve_fit_large()

Let's compare `fit()` with the lower-level APIs to understand when to use each.

In [None]:
# Using curve_fit() - standard API
popt_cf, pcov_cf = curve_fit(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
    bounds=bounds,
)

print("curve_fit():")
print(f"  Fitted: a={popt_cf[0]:.4f}, b={popt_cf[1]:.4f}, c={popt_cf[2]:.4f}")

In [None]:
# Using curve_fit_large() - auto-detects and falls back to curve_fit for small datasets
# For datasets > 1M points, it uses chunked processing
popt_cfl, pcov_cfl = curve_fit_large(
    exponential_decay,
    x_data,
    y_data,
    p0=[1.0, 1.0, 0.0],
    bounds=bounds,
)

print("\ncurve_fit_large():")
print(f"  Fitted: a={popt_cfl[0]:.4f}, b={popt_cfl[1]:.4f}, c={popt_cfl[2]:.4f}")
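
The comment in the cell above mentions chunked processing for large datasets. How NLSQ chunks internally is not covered here. As a general illustration of the idea, the sketch below accumulates a sum of squared residuals over fixed-size slices, so that only one slice's temporary arrays exist at a time. The function name `chunked_sse` and the `chunk_size` default are hypothetical.

```python
import numpy as np


def chunked_sse(model, params, x, y, chunk_size=100_000):
    """Sum of squared residuals, accumulated one slice at a time."""
    total = 0.0
    for start in range(0, x.size, chunk_size):
        stop = start + chunk_size
        r = model(x[start:stop], *params) - y[start:stop]
        total += float(np.dot(r, r))
    return total


def decay(x, a, b, c):
    """Exponential decay: y = a * exp(-b * x) + c"""
    return a * np.exp(-b * x) + c


rng = np.random.default_rng(0)
x = np.linspace(0, 5, 1_000_000)
y = decay(x, 3.0, 1.2, 0.5) + 0.15 * rng.standard_normal(x.size)

params = (3.0, 1.2, 0.5)
sse_chunked = chunked_sse(decay, params, x, y)
sse_full = float(np.sum((decay(x, *params) - y) ** 2))
print(f"chunked: {sse_chunked:.3f}  full: {sse_full:.3f}")
```

Both numbers agree up to floating-point rounding; the chunked version simply trades one large temporary array for many small ones.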

### When to Use Each API

| API | Best For | Dataset Size |
|-----|----------|-------------|
| `fit()` | General use, automatic selection | Any |
| `curve_fit()` | Full control, SciPy compatibility | < 1M points |
| `curve_fit_large()` | Explicit large dataset handling | > 1M points |

In [None]:
# Visualize the fits
y_pred = exponential_decay(x_data, *popt)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: Data and fit
ax1 = axes[0]
ax1.scatter(x_data, y_data, alpha=0.4, s=10, label="Data")
ax1.plot(x_data, y_true, "k--", linewidth=2, label="True function")
ax1.plot(x_data, y_pred, "r-", linewidth=2, label="fit() result")
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title("Exponential Decay Fit")
ax1.legend()

# Right: Residuals
ax2 = axes[1]
residuals = y_data - y_pred
ax2.scatter(x_data, residuals, alpha=0.5, s=10)
ax2.axhline(y=0, color="k", linestyle="--", alpha=0.5)
ax2.set_xlabel("x")
ax2.set_ylabel("Residual")
ax2.set_title("Residuals")

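plt.tight_layout()

# Create the output directory first so savefig() does not fail if it is missing
import os
os.makedirs("figures", exist_ok=True)
plt.savefig("figures/01_fit_result.png", dpi=300, bbox_inches="tight")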
plt.show()

---

## Key Takeaways

After completing this notebook, remember:

1. **`fit()` is the unified entry point:** It automatically selects the best workflow based on dataset size and configuration.

2. **Presets simplify configuration:** Use `preset="fast"` for speed, `preset="robust"` for production, or `preset="global"` for complex problems.

3. **`WorkflowConfig` provides full control:** Create custom configurations with `OptimizationGoal`, multi-start settings, and more.

4. **Choose the right API:**
   - `fit()` - General use, automatic selection
   - `curve_fit()` - SciPy compatibility, full control
   - `curve_fit_large()` - Explicit large dataset handling

---

## Common Questions

**Q: When should I use `fit()` vs `curve_fit()`?**

A: Use `fit()` when you want automatic workflow selection and preset configurations. Use `curve_fit()` when you need SciPy compatibility or want explicit control over all parameters.

**Q: What's the default preset?**

A: The default is `preset="fast"` for small datasets (< 1M points) and `preset="large"` for datasets exceeding the size threshold.

**Q: Can I combine presets with custom parameters?**

A: Yes! Custom parameters override preset defaults. For example, `fit(..., preset="robust", n_starts=10)` uses the robust preset but with 10 starts instead of 5.
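
The override rule can be pictured as a dictionary merge in which explicit keyword arguments win over preset values. The preset table below is a hypothetical stand-in: the start counts for `robust` and `global` come from this notebook, and everything else is assumed. It is not NLSQ's internal data structure, and `resolve_options` is an illustrative name.

```python
# Hypothetical preset table; only the robust/global start counts come from this tutorial
PRESETS = {
    "fast": {"multistart": False, "n_starts": 1},
    "robust": {"multistart": True, "n_starts": 5},
    "global": {"multistart": True, "n_starts": 20},
}


def resolve_options(preset, **overrides):
    """Start from the preset's defaults, then let explicit arguments win."""
    if preset not in PRESETS:
        raise ValueError(f"Unknown preset: {preset!r}")
    # Later keys win in a dict merge, so overrides replace preset values
    return {**PRESETS[preset], **overrides}


print(resolve_options("robust"))
print(resolve_options("robust", n_starts=10))
```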

---

## Related Resources

**Next steps:**
- [02_workflow_tiers.ipynb](02_workflow_tiers.ipynb) - Learn about STANDARD, CHUNKED, STREAMING tiers
- [04_workflow_presets.ipynb](04_workflow_presets.ipynb) - Explore all available presets

**Further reading:**
- [API Documentation](https://nlsq.readthedocs.io/)
- [GitHub Repository](https://github.com/imewei/NLSQ)

**Need help?**
- [Discussions](https://github.com/imewei/NLSQ/discussions)
- [Report issues](https://github.com/imewei/NLSQ/issues)

---

## Glossary

**Preset:** A named configuration that sets multiple parameters at once (e.g., 'fast', 'robust', 'global').

**Workflow:** A processing strategy for curve fitting (STANDARD, CHUNKED, STREAMING).

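**Multi-start:** An optimization technique that runs the local optimizer from several starting points and keeps the best result, which improves the chance of finding the global optimum but does not guarantee it.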

In [None]:
# Final summary
print("Summary")
print("=" * 50)
print(f"True parameters: a={true_a}, b={true_b}, c={true_c}")
print()
print("Results from different approaches:")
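print(f"  fit() auto:        a={popt[0]:.4f}, b={popt[1]:.4f}, c={popt[2]:.4f}")
print(f"  preset='fast':     a={popt_fast[0]:.4f}, b={popt_fast[1]:.4f}, c={popt_fast[2]:.4f}")
print(f"  preset='robust':   a={popt_robust[0]:.4f}, b={popt_robust[1]:.4f}, c={popt_robust[2]:.4f}")
print(f"  preset='global':   a={popt_global[0]:.4f}, b={popt_global[1]:.4f}, c={popt_global[2]:.4f}")
print(f"  custom multistart: a={popt_custom[0]:.4f}, b={popt_custom[1]:.4f}, c={popt_custom[2]:.4f}")
print(f"  curve_fit():       a={popt_cf[0]:.4f}, b={popt_cf[1]:.4f}, c={popt_cf[2]:.4f}")
print(f"  curve_fit_large(): a={popt_cfl[0]:.4f}, b={popt_cfl[1]:.4f}, c={popt_cfl[2]:.4f}")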