# 📘 Performance Optimization: Maximize NLSQ Speed

⏱️ **25-35 minutes** | 📊 **Level: ●●○ Intermediate** | 🏷️ **Performance**

---

## 🎯 What You'll Learn

- ✓ **Profile** curve fitting performance
- ✓ **Optimize** JAX compilation and JIT
- ✓ **Leverage** GPU acceleration
- ✓ **Minimize** memory overhead
- ✓ **Apply** best practices for speed

---

## Setup

In [None]:
import time

import jax
import jax.numpy as jnp
import numpy as np
from nlsq import CurveFit, curve_fit

devices = jax.devices()
print(f"Device: {devices[0].platform}")

## 1. JAX Compilation Basics

**Key insight:** The first fit is slow (compilation); subsequent fits are fast.

In [None]:
def exponential(x, a, b, c):
    return a * jnp.exp(-b * x) + c


x = np.linspace(0, 10, 1000)
y = 5 * np.exp(-0.5 * x) + 2 + np.random.normal(0, 0.1, 1000)

# First fit (slow - compilation)
start = time.time()
popt1, _ = curve_fit(exponential, x, y, p0=[4, 0.4, 1.5])
t1 = time.time() - start

# Second fit (fast - reuses compiled code)
start = time.time()
popt2, _ = curve_fit(exponential, x, y, p0=[4, 0.4, 1.5])
t2 = time.time() - start

print(f"First fit:  {t1:.4f}s (with compilation)")
print(f"Second fit: {t2:.4f}s (compiled)")
print(f"Speedup: {t1/t2:.1f}x")
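To see why the first call pays a one-time cost, it can help to count how often JAX actually traces a jitted function. The sketch below uses plain `jax.jit` rather than NLSQ; the counter `trace_count` and the wrapper `model` are illustrative names, and NLSQ's own caching may differ in detail.

```python
import jax
import jax.numpy as jnp

trace_count = 0  # incremented only while JAX traces the Python function


@jax.jit
def model(x, a, b, c):
    global trace_count
    trace_count += 1  # Python side effect: runs at trace time, not on cached calls
    return a * jnp.exp(-b * x) + c


model(jnp.ones(100), 5.0, 0.5, 2.0)  # first call: trace + compile
model(jnp.ones(100), 4.0, 0.4, 1.5)  # same shapes and dtypes: cache hit
model(jnp.ones(200), 5.0, 0.5, 2.0)  # new input shape: traced again

print(f"Traces: {trace_count}")
```

Changing parameter values alone does not trigger a new trace, while changing the input shape does.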

## 2. Reusing CurveFit Objects

**Best practice:** Reuse CurveFit objects to avoid recompilation.

In [None]:
# Create reusable fitter
jcf = CurveFit()

# Multiple fits reuse compiled functions
times = []
for _ in range(10):
    start = time.time()
    popt, _ = jcf.curve_fit(exponential, x, y, p0=[4, 0.4, 1.5])
    times.append(time.time() - start)

# Skip the first fit, which includes compilation
print(f"Average fit time: {np.mean(times[1:]):.4f}s")
print("✓ Reusing CurveFit object eliminates compilation overhead")
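The same measurement pattern (one cold call, then several warm calls) can be wrapped in a small helper. This is a minimal sketch using only the standard library; `benchmark` is a hypothetical helper name, not part of NLSQ, and it is demonstrated here on the built-in `sum` so that it runs anywhere.

```python
import statistics
import time


def benchmark(fn, *args, repeats=10, **kwargs):
    """Time one cold call, then `repeats` warm calls of fn."""
    # Cold call: for a JIT-compiled function this includes compilation.
    start = time.perf_counter()
    fn(*args, **kwargs)
    first = time.perf_counter() - start

    # Warm calls: steady-state cost once any compiled code is cached.
    warm = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        warm.append(time.perf_counter() - start)

    return {
        "first": first,
        "median": statistics.median(warm),
        "min": min(warm),
        "calls": 1 + repeats,
    }


stats = benchmark(sum, range(1000), repeats=5)
print(f"first: {stats['first']:.6f}s, median warm: {stats['median']:.6f}s")
```

With the fitter from the cell above, a call such as `benchmark(jcf.curve_fit, exponential, x, y, p0=[4, 0.4, 1.5])` should separate compilation time from steady-state fit time. Reporting the median rather than the mean makes the result less sensitive to occasional slow runs.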

## 3. Fixed Array Size

**Issue:** Different array sizes trigger recompilation.

**Solution:** Use the `flength` parameter.

In [None]:
# Without fixed length (recompiles for each size)
jcf_dynamic = CurveFit()

sizes = [100, 200, 300]
for n in sizes:
    x_test = np.linspace(0, 10, n)
    y_test = 5 * np.exp(-0.5 * x_test) + 2

    start = time.time()
    popt, _ = jcf_dynamic.curve_fit(exponential, x_test, y_test, p0=[4, 0.4, 1.5])
    elapsed = time.time() - start
    print(f"Size {n}: {elapsed:.4f}s")

print("\n⚠️ Each size triggers recompilation")
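The general idea behind a fixed array length is to pad every dataset to one common shape and mask out the padding, so that compiled code sees a single shape. The sketch below illustrates that idea in plain NumPy. It is an assumption about the technique in general, not NLSQ's internal implementation, and `pad_to_length` and `masked_sse` are hypothetical helper names.

```python
import numpy as np


def pad_to_length(x, y, flength):
    """Zero-pad x and y to `flength` and return a 0/1 validity mask."""
    n = len(x)
    if n > flength:
        raise ValueError(f"data length {n} exceeds fixed length {flength}")
    x_pad = np.zeros(flength)
    y_pad = np.zeros(flength)
    mask = np.zeros(flength)
    x_pad[:n] = x
    y_pad[:n] = y
    mask[:n] = 1.0
    return x_pad, y_pad, mask


def masked_sse(params, x_pad, y_pad, mask):
    """Sum of squared residuals, ignoring padded entries."""
    a, b, c = params
    residuals = (a * np.exp(-b * x_pad) + c - y_pad) * mask
    return float(np.sum(residuals**2))


params = (5.0, 0.5, 2.0)
shapes = set()
matches = []
for n in (100, 200, 300):
    x_n = np.linspace(0, 10, n)
    y_n = 4.8 * np.exp(-0.45 * x_n) + 2.1
    x_pad, y_pad, mask = pad_to_length(x_n, y_n, flength=300)
    shapes.add(x_pad.shape)

    # The masked objective on padded data equals the plain objective.
    plain = float(np.sum((5.0 * np.exp(-0.5 * x_n) + 2.0 - y_n) ** 2))
    matches.append(np.isclose(masked_sse(params, x_pad, y_pad, mask), plain))

print(f"Distinct shapes seen: {shapes}, objectives match: {all(matches)}")
```

Because all three datasets end up with the same shape, a jitted objective would need to compile only once. The trade-off is extra arithmetic on the padded entries, so the fixed length is best chosen close to the largest expected dataset.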

## 4. GPU vs CPU Performance

**GPU advantage:** Increases with dataset size.

In [None]:
if devices[0].platform == 'gpu':
    print("🚀 GPU Performance Test")

    sizes = [1000, 10000, 100000]
    for n in sizes:
        x_large = np.linspace(0, 10, n)
        y_large = 5 * np.exp(-0.5 * x_large) + 2 + np.random.normal(0, 0.1, n)

        start = time.time()
        popt, _ = curve_fit(exponential, x_large, y_large, p0=[4, 0.4, 1.5])
        elapsed = time.time() - start
        print(f"  {n:>6} points: {elapsed:.4f}s")
else:
    print("💻 CPU mode - GPU would provide 100-300x speedup")
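One caveat when timing JAX code, especially on a GPU: JAX dispatches work asynchronously, so a call can return before the computation has finished. A minimal sketch of a timing wrapper that waits for the result is shown below. `timed` and `sum_of_squares` are illustrative names. Whether NLSQ's `curve_fit` already returns fully materialized results is not stated here, so treat this as a general JAX precaution rather than a required step.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def sum_of_squares(x):
    return jnp.sum(x**2)


def timed(fn, *args):
    """Run fn and wait for its result before stopping the clock."""
    start = time.perf_counter()
    out = jax.block_until_ready(fn(*args))  # blocks until the device is done
    return out, time.perf_counter() - start


x_demo = jnp.arange(1000.0)
_, cold = timed(sum_of_squares, x_demo)       # includes compilation
result, warm = timed(sum_of_squares, x_demo)  # reuses compiled code

print(f"cold: {cold:.6f}s, warm: {warm:.6f}s")
```

Without the blocking call, the measured time may reflect only how long it took to enqueue the work, which can make GPU runs look faster than they are.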

## 🔑 Key Takeaways

1. **Reuse CurveFit objects** to avoid recompilation
2. **Use fixed array size** when processing multiple datasets
3. **GPU acceleration** scales with dataset size
4. **Warmup first fit** to compile functions
5. **Profile before optimizing** - measure actual bottlenecks

---

## 🔗 Next Steps

- [Large Dataset Demo](large_dataset_demo.ipynb) - Memory optimization
- [GPU Deep Dive](../03_advanced/gpu_optimization_deep_dive.ipynb) - Advanced GPU techniques
- [Advanced Features](advanced_features_demo.ipynb) - Callbacks, robust fitting

---