# HPXPy Parallel Numerical Integration

This notebook computes a definite integral with HPXPy's parallel array operations and compares the runtime against a NumPy reference implementation on a compute-intensive workload.

## The Function

We integrate $f(x) = \sin(x) \cdot e^{-x^2} \cdot \cos(10x)$ over $[-5, 5]$ using the midpoint rule.

This is a rapidly oscillating function (real-valued, despite its intricate shape) that needs many evaluation points to integrate accurately, which makes it a good workload for demonstrating parallel speedup.
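
A useful sanity check for the benchmark results: the integrand is odd, because $\sin(x)$ is odd while $e^{-x^2}$ and $\cos(10x)$ are even, so its exact integral over any symmetric interval is 0. The sketch below checks this in pure Python with a small midpoint rule. The names `f` and `midpoint_integral` are illustrative and not part of HPXPy or the demo script.

```python
import math


def f(x):
    """Integrand: sin(x) * exp(-x^2) * cos(10x)."""
    return math.sin(x) * math.exp(-x * x) * math.cos(10 * x)


def midpoint_integral(func, a, b, n_points):
    """Midpoint rule: evaluate func at the center of n_points equal-width cells."""
    dx = (b - a) / n_points
    # math.fsum limits accumulated rounding error in the long sum
    return math.fsum(func(a + (i + 0.5) * dx) for i in range(n_points)) * dx


# Odd symmetry: f(-x) + f(x) should vanish for any x
symmetry_error = max(abs(f(-x) + f(x)) for x in (0.1, 0.7, 1.3, 2.9))

# Symmetric interval: the positive and negative halves cancel
symmetric_integral = midpoint_integral(f, -5.0, 5.0, 10_000)

# One-sided interval: no cancellation, so the result is nonzero
half_integral = midpoint_integral(f, 0.0, 5.0, 10_000)

print(f"symmetry error:      {symmetry_error:.2e}")
print(f"integral on [-5, 5]: {symmetric_integral:.2e}")
print(f"integral on [0, 5]:  {half_integral:.6f}")
```

Because the true value is 0, the benchmark comparison is best judged with an absolute tolerance; a relative tolerance would be meaningless here.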

**Note:** For proper scalability testing with multiple thread counts, run the `parallel_integration_demo.py` script instead.

In [None]:
import time
import numpy as np
import hpxpy as hpx

hpx.init(num_threads=4)

## NumPy Reference Implementation

In [None]:
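def numpy_integration(n_points, a=-5.0, b=5.0):
    """Midpoint-rule integration with NumPy.

    Returns (elapsed_seconds, integral). Only the function evaluation
    and the sum are timed, not the creation of the sample points.
    """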
    dx = (b - a) / n_points
    x = np.linspace(a + dx/2, b - dx/2, n_points)
    
    start = time.perf_counter()
    f = np.sin(x) * np.exp(-x**2) * np.cos(10*x)
    integral = np.sum(f) * dx
    elapsed = time.perf_counter() - start
    
    return elapsed, integral

## HPXPy Integration

In [None]:
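def hpxpy_integration(n_points, a=-5.0, b=5.0):
    """Midpoint-rule integration with HPXPy parallel array operations.

    Returns (elapsed_seconds, integral). Only the function evaluation
    and the sum are timed, not the creation or transfer of the sample points.
    """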
    dx = (b - a) / n_points
    
    # Create evaluation points
    x_np = np.linspace(a + dx/2, b - dx/2, n_points)
    x = hpx.from_numpy(x_np)
    
    start = time.perf_counter()
    
    # Evaluate: f(x) = sin(x) * exp(-x²) * cos(10x)
    sin_x = hpx.sin(x)
    exp_neg_x2 = hpx.exp(-(x * x))
    cos_10x = hpx.cos(x * 10)
    
    f = sin_x * exp_neg_x2 * cos_10x
    
    # Riemann sum
    integral = float(hpx.sum(f)) * dx
    
    elapsed = time.perf_counter() - start
    
    return elapsed, integral

## Benchmark: NumPy vs HPXPy

In [None]:
# Warm up both implementations so one-time setup costs are not timed
_ = numpy_integration(1000)
_ = hpxpy_integration(1000)

n_points = 50_000_000

print("=" * 60)
print("HPXPy Parallel Numerical Integration")
print("Integrating: f(x) = sin(x) * exp(-x²) * cos(10x) over [-5, 5]")
print("=" * 60)
print(f"\nIntegration points: {n_points:,}")

# NumPy
np_time, np_integral = numpy_integration(n_points)
print("\nNumPy:")
print(f"  Time: {np_time*1000:.2f} ms")
print(f"  Result: {np_integral:.10f}")

# HPXPy
hpx_time, hpx_integral = hpxpy_integration(n_points)
print("\nHPXPy (4 threads):")
print(f"  Time: {hpx_time*1000:.2f} ms")
print(f"  Result: {hpx_integral:.10f}")

print(f"\nSpeedup: {np_time/hpx_time:.2f}x")
print(f"Results match: {abs(np_integral - hpx_integral) < 1e-10}")

## Scaling with Problem Size

In [None]:
sizes = [1_000_000, 10_000_000, 50_000_000, 100_000_000]

print(f"\n{'Points':>15} | {'NumPy (ms)':>12} | {'HPXPy (ms)':>12} | {'Speedup':>10}")
print("-" * 60)

for n_pts in sizes:
    np_t, _ = numpy_integration(n_pts)
    hpx_t, _ = hpxpy_integration(n_pts)
    speedup = np_t / hpx_t
    print(f"{n_pts:>15,} | {np_t*1000:>12.2f} | {hpx_t*1000:>12.2f} | {speedup:>9.2f}x")

## Distributed Computing Projection

Numerical integration is embarrassingly parallel, which makes it well suited to distribution across localities:

### 1. Minimal Communication
- Each locality evaluates f(x) on its portion independently
- Only the final sum needs a global reduction
- Near-ideal scaling is expected as long as each locality has enough points to amortize the reduction

### 2. Distribution Strategy
- Block distribution: with `N` localities, locality `i` (for `i = 0, ..., N-1`) evaluates the points in `[a + i*(b-a)/N, a + (i+1)*(b-a)/N)`
- Each locality computes partial sum
- Single reduction at the end
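
The strategy above can be sketched in pure Python. This is a serial simulation under stated assumptions, not HPXPy code: each "locality" is just a loop iteration, and the helper names `block_bounds` and `partial_sum` are hypothetical. The blocks are defined by index ranges rather than by x-intervals so that every midpoint belongs to exactly one block, even when the point count does not divide evenly.

```python
import math


def f(x):
    """Integrand: sin(x) * exp(-x^2) * cos(10x)."""
    return math.sin(x) * math.exp(-x * x) * math.cos(10 * x)


def block_bounds(n_points, n_localities, i):
    """Half-open index range [start, stop) owned by locality i."""
    start = i * n_points // n_localities
    stop = (i + 1) * n_points // n_localities
    return start, stop


def partial_sum(a, dx, start, stop):
    """Sum of f at the cell midpoints with indices in [start, stop)."""
    return math.fsum(f(a + (k + 0.5) * dx) for k in range(start, stop))


a, b = -5.0, 5.0
n_points = 100_000
n_localities = 7  # deliberately does not divide n_points evenly
dx = (b - a) / n_points

# Each simulated locality computes its partial sum independently.
partials = [
    partial_sum(a, dx, *block_bounds(n_points, n_localities, i))
    for i in range(n_localities)
]

# The single reduction: combine the partial sums, then scale by dx.
distributed_integral = math.fsum(partials) * dx

# Reference: one pass over all points.
serial_integral = partial_sum(a, dx, 0, n_points) * dx

print(f"distributed: {distributed_integral:.3e}")
print(f"serial:      {serial_integral:.3e}")
```

Only the `n_localities` partial sums cross locality boundaries, which is why the communication cost stays small relative to the evaluation work.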

### 3. Expected Distributed Performance

| Localities | Points/Locality | Communication | Projected Speedup |
|------------|-----------------|---------------|-------------------|
| 1 | 50,000,000 | none | 1x |
| 4 | 12,500,000 | 1 reduce | ~4x |
| 16 | 3,125,000 | 1 reduce | ~16x |
| 64 | 781,250 | 1 reduce | ~64x |
| 256 | ~195,312 | 1 reduce | ~200x+ |
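
The table's shape can be reproduced with a simple cost model. The sketch below is an illustration under assumptions, not a measurement: it assumes the evaluation work divides evenly and that the reduction costs a fixed amount per step of a binary tree. The constants `T_COMPUTE` and `T_REDUCE_STEP` are placeholder values, and the function names are hypothetical.

```python
import math

TOTAL_POINTS = 50_000_000


def points_per_locality(total, n_localities):
    """Return (base, extra): every locality gets `base` points,
    and the first `extra` localities get one additional point."""
    return divmod(total, n_localities)


def modeled_speedup(n_localities, t_compute, t_reduce_step):
    """Speedup under the assumed model
    T(p) = t_compute / p + t_reduce_step * ceil(log2(p))."""
    if n_localities == 1:
        return 1.0
    steps = math.ceil(math.log2(n_localities))
    t_parallel = t_compute / n_localities + t_reduce_step * steps
    return t_compute / t_parallel


# Placeholder costs in arbitrary time units (assumptions, not measurements)
T_COMPUTE = 1.0
T_REDUCE_STEP = 1e-4

for p in (1, 4, 16, 64, 256):
    base, extra = points_per_locality(TOTAL_POINTS, p)
    s = modeled_speedup(p, T_COMPUTE, T_REDUCE_STEP)
    print(f"{p:>4} localities: {base:>10,} points each "
          f"(+1 on {extra}), modeled speedup {s:.1f}x")
```

In this model the speedup falls below the locality count once the reduction term becomes comparable to the per-locality compute time, which is the effect behind the sub-linear last row. Real reduction costs depend on the network and on the HPX runtime, so the constants would need to be measured.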

### 4. Proposed HPXPy API

```python
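# Projected API sketch: distribute work across all localities
dx = (b - a) / n_points
# Midpoints of the n_points cells, matching the implementations above
x = hpx.linspace(a + dx/2, b - dx/2, n_points, distribution='block')
f = hpx.sin(x) * hpx.exp(-x*x) * hpx.cos(10*x)
integral = hpx.sum(f) * dx  # Automatic distributed reduction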
```

In [None]:
hpx.finalize()
print("Demo complete!")