# Distributed Sensor Analytics with PartitionedArray

This notebook demonstrates distributed data analytics using HPXPy's `PartitionedArray`, which distributes data across multiple HPX localities (processes) using `hpx::partitioned_vector`.

When run with multiple localities, data is physically partitioned across processes. Each locality holds a portion of the array and performs local computation. Distributed reductions (sum, mean, min, max, var, std) automatically combine results across all localities.

**To run:** Execute all cells. The worker script is launched across multiple localities automatically.

In [None]:
%%writefile _distributed_analytics_worker.py
"""Distributed sensor analytics using PartitionedArray.

This script runs on each HPX locality. The PartitionedArray distributes
data across all localities automatically.
"""
import numpy as np
import hpxpy as hpx
from hpxpy.launcher import init_from_args

# Initialize HPX with launcher-provided configuration
init_from_args()

my_id = hpx.locality_id()
num_locs = hpx.num_localities()
print(f"[Locality {my_id}/{num_locs}] Started with {hpx.num_threads()} threads")

# --- Create distributed sensor data ---
n_sensors = 100_000

# PartitionedArray distributes data across localities
temperatures = hpx.partitioned_from_numpy(
    20.0 + np.random.RandomState(42).randn(n_sensors) * 5.0
)
humidity = hpx.partitioned_from_numpy(
    60.0 + np.random.RandomState(43).randn(n_sensors) * 10.0
)

print(f"[Locality {my_id}] Created PartitionedArrays:")
print(f"  Temperature: {temperatures.num_partitions} partitions, distributed={temperatures.is_distributed}")
print(f"  Humidity:    {humidity.num_partitions} partitions, distributed={humidity.is_distributed}")

# --- Distributed reductions ---
# These automatically combine across all localities
temp_sum = hpx.distributed_sum(temperatures)
temp_mean = hpx.distributed_mean(temperatures)
temp_min = hpx.distributed_min(temperatures)
temp_max = hpx.distributed_max(temperatures)
temp_std = hpx.distributed_std(temperatures)

humid_mean = hpx.distributed_mean(humidity)
humid_std = hpx.distributed_std(humidity)

if my_id == 0:
    print(f"\n--- Distributed Reduction Results (computed across {num_locs} localities) ---")
    print(f"  Temperature: sum={temp_sum:.2f}, mean={temp_mean:.2f}, std={temp_std:.2f}, min={temp_min:.2f}, max={temp_max:.2f}")
    print(f"  Humidity:    mean={humid_mean:.2f}, std={humid_std:.2f}")
    print(f"  Total sensor readings: {n_sensors:,}")

# Synchronize before exit
hpx.barrier("done")
if my_id == 0:
    print("\nAll localities completed successfully.")
hpx.finalize()

## Launch Distributed Execution

The cell below spawns multiple HPX localities (processes) connected via TCP. Each locality runs the worker script above. Data created with `partitioned_from_numpy` is automatically distributed across all localities.

In [None]:
from hpxpy.launcher import launch_localities

launch_localities(
    "_distributed_analytics_worker.py",
    num_localities=2,
    threads_per_locality=2,
    verbose=True,
)
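
Because the worker seeds its random generators, the reduction results can be cross-checked against a plain NumPy computation in a single process. The sketch below is not part of HPXPy. It assumes the distributed reductions are numerically equivalent to NumPy's over the full array, and that `distributed_std` uses the population convention (`ddof=0`), which the notebook does not state. The `reference` dictionary is a hypothetical name used only for this check.

```python
import numpy as np

n_sensors = 100_000

# Same seeds and transforms as the worker script, held in one process
temperatures = 20.0 + np.random.RandomState(42).randn(n_sensors) * 5.0
humidity = 60.0 + np.random.RandomState(43).randn(n_sensors) * 10.0

# Single-process reference values (np.std defaults to ddof=0;
# whether HPXPy uses the same convention is an assumption)
reference = {
    "temp_sum": temperatures.sum(),
    "temp_mean": temperatures.mean(),
    "temp_std": temperatures.std(),
    "temp_min": temperatures.min(),
    "temp_max": temperatures.max(),
    "humid_mean": humidity.mean(),
    "humid_std": humidity.std(),
}

for name, value in reference.items():
    print(f"{name}: {value:.2f}")
```

With 100,000 samples, the difference between the population and sample standard deviation is far below the two decimal places printed, so either convention should match the worker output at that precision.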

## How PartitionedArray Works

`PartitionedArray` wraps HPX's `hpx::partitioned_vector`, which physically distributes data across localities:

```
Locality 0                    Locality 1
┌────────────────────┐       ┌────────────────────┐
│ Partition 0        │       │ Partition 1        │
│ temps[0:50000]     │       │ temps[50000:100000]│
│ humid[0:50000]     │       │ humid[50000:100000]│
└────────────────────┘       └────────────────────┘
         │                            │
         └──────────┬─────────────────┘
                    ▼
        Distributed Reduction
        (sum, mean, min, max, std)
        → combines partial results
          from all localities
```
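
The diagram can be mimicked in a single process with NumPy. The sketch below simulates localities as contiguous blocks of one array, computes partial results per block, and then merges them. It illustrates the general pattern only and is not how HPXPy or `hpx::partitioned_vector` is implemented. The helpers `split_blocks`, `local_partials`, and `combine` are hypothetical names, and the even block split is an assumption taken from the diagram.

```python
import numpy as np

def split_blocks(data, num_localities):
    """Split data into contiguous, near-equal blocks, one per simulated locality."""
    return np.array_split(data, num_localities)

def local_partials(block):
    """Partial results a locality can compute from its own block alone."""
    return {
        "count": block.size,
        "sum": block.sum(),
        "min": block.min(),
        "max": block.max(),
    }

def combine(partials):
    """Merge per-locality partial results into global statistics."""
    count = sum(p["count"] for p in partials)
    total = sum(p["sum"] for p in partials)
    return {
        "sum": total,
        "mean": total / count,  # the global mean needs global sum and count
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
    }

temps = 20.0 + np.random.RandomState(42).randn(100_000) * 5.0
blocks = split_blocks(temps, num_localities=2)
result = combine([local_partials(b) for b in blocks])

print("block sizes:", [b.size for b in blocks])
print({name: round(float(value), 2) for name, value in result.items()})
```

Note that the global mean is derived from the combined sum and count rather than by averaging the per-block means, which would only be correct when all blocks have the same size.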

### Key API

| Function | Description |
|----------|-------------|
| `partitioned_from_numpy(arr)` | Create a distributed array from a NumPy array |
| `partitioned_arange(start, stop)` | Create a distributed range |
| `partitioned_zeros(shape)` | Create a distributed array of zeros |
| `distributed_sum(arr)` | Sum across all localities |
| `distributed_mean(arr)` | Mean across all localities |
| `distributed_min(arr)` | Min across all localities |
| `distributed_max(arr)` | Max across all localities |
| `distributed_std(arr)` | Std dev across all localities |
| `distributed_var(arr)` | Variance across all localities |
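
Variance and standard deviation are less obvious to combine than sum or min, because each partition has its own local mean. One common approach merges per-partition `(count, mean, M2)` triples with the pairwise update attributed to Chan, Golub, and LeVeque, where `M2` is the sum of squared deviations from the mean. The NumPy sketch below shows that technique. Whether `distributed_var` and `distributed_std` work this way internally is not stated in this notebook, and `local_moments` and `merge_moments` are hypothetical helper names.

```python
from functools import reduce

import numpy as np

def local_moments(block):
    """Return (count, mean, M2) for one block; M2 is the sum of squared deviations."""
    mean = block.mean()
    m2 = ((block - mean) ** 2).sum()
    return block.size, mean, m2

def merge_moments(a, b):
    """Merge two (count, mean, M2) triples without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term corrects for the difference between the two local means
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

humidity = 60.0 + np.random.RandomState(43).randn(100_000) * 10.0

# Three uneven blocks stand in for three localities
parts = np.array_split(humidity, 3)
total_n, total_mean, total_m2 = reduce(merge_moments, map(local_moments, parts))

variance = total_m2 / total_n  # population variance (ddof=0), an assumed convention
std = np.sqrt(variance)
print(f"mean={total_mean:.2f}, var={variance:.2f}, std={std:.2f}")
```

Each locality only has to send three numbers, regardless of how many elements it holds, which is why this kind of merge suits distributed reductions.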

### Single vs Multi-Locality

- **Single locality**: `PartitionedArray` behaves like a regular array, since all data lives in one process
- **Multi-locality**: Data is physically split across processes, and reductions combine the partial results from each locality automatically

In [None]:
import os
os.remove("_distributed_analytics_worker.py")
print("Cleaned up worker script.")