# Distributed Computing with HPXPy

HPXPy supports distributed computing across multiple processes (localities). This tutorial covers:

- Collective operations (all_reduce, broadcast, gather, scatter)
- Distributed arrays
- Distribution policies
- Multi-locality concepts

**Note**: In single-locality mode (one process), the collective operations have no peers to communicate with, so they return the local data unchanged (`gather` returns a one-element list). Their full effect is visible only when running across multiple localities.

In [1]:
import hpxpy as hpx
import numpy as np

# Initialize HPX runtime
hpx.init()

print(f"HPXPy version: {hpx.__version__}")
print(f"Number of localities: {hpx.num_localities()}")
print(f"Current locality ID: {hpx.locality_id()}")
print(f"Number of threads: {hpx.num_threads()}")

HPXPy version: 0.1.0
Number of localities: 1
Current locality ID: 0
Number of threads: 12


## 1. Collective Operations

Collective operations are communication patterns that involve all localities in a distributed computation.

### All-Reduce

Combines values from all localities elementwise and delivers the result to every locality.

```
Locality 0: [1, 2, 3]  ─┐
Locality 1: [4, 5, 6]  ─┼─► all_reduce(sum) ─► [5, 7, 9] (to all)
Locality 2: [0, 0, 0]  ─┘
```
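
The diagram above can be reproduced without a cluster. The following is a NumPy-only sketch, not HPXPy code: `simulate_all_reduce` is a hypothetical helper that treats each list entry as one locality's array and reduces elementwise across them, which is the behavior the diagram describes.

```python
import numpy as np

# NumPy reductions standing in for the op names accepted by hpx.all_reduce.
REDUCERS = {
    "sum": np.sum,
    "prod": np.prod,
    "min": np.min,
    "max": np.max,
}

def simulate_all_reduce(per_locality, op="sum"):
    """Reduce elementwise across localities; every locality gets a copy."""
    stacked = np.stack(per_locality)          # shape: (num_localities, n)
    reduced = REDUCERS[op](stacked, axis=0)   # combine along the locality axis
    return [reduced.copy() for _ in per_locality]

localities = [
    np.array([1, 2, 3]),
    np.array([4, 5, 6]),
    np.array([0, 0, 0]),
]
results = simulate_all_reduce(localities, op="sum")
for loc_id, result in enumerate(results):
    print(f"Locality {loc_id}: {result}")
```

With a single entry in the list, every reduction returns the input unchanged, which is why a single-locality run prints the local data for all four operations.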

In [2]:
# Create local data
local_data = hpx.array([1.0, 2.0, 3.0, 4.0, 5.0])
print("Local data:", local_data.to_numpy())

# All-reduce with different operations
print("\nAll-reduce operations:")
print("  sum:", hpx.all_reduce(local_data, op='sum').to_numpy())
print("  prod:", hpx.all_reduce(local_data, op='prod').to_numpy())
print("  min:", hpx.all_reduce(local_data, op='min').to_numpy())
print("  max:", hpx.all_reduce(local_data, op='max').to_numpy())

Local data: [1. 2. 3. 4. 5.]

All-reduce operations:
  sum: [1. 2. 3. 4. 5.]
  prod: [1. 2. 3. 4. 5.]
  min: [1. 2. 3. 4. 5.]
  max: [1. 2. 3. 4. 5.]


In [3]:
# Practical example: Global statistics
# In a distributed setting, each locality would have different local_stats
local_stats = hpx.array([100.0, 50.0, 200.0])  # [count, min, max]

# Combine across localities
global_count = hpx.all_reduce(hpx.array([local_stats[0]]), op='sum')
global_min = hpx.all_reduce(hpx.array([local_stats[1]]), op='min')
global_max = hpx.all_reduce(hpx.array([local_stats[2]]), op='max')

print("Global statistics:")
print(f"  Total count: {global_count.to_numpy()[0]}")
print(f"  Global min: {global_min.to_numpy()[0]}")
print(f"  Global max: {global_max.to_numpy()[0]}")

Global statistics:
  Total count: 100.0
  Global min: 50.0
  Global max: 200.0
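
Reducing counts and sums separately, rather than averaging per-locality means, is what keeps a global mean correct when localities hold different amounts of data. The sketch below simulates three localities in plain NumPy. The data values and the `local_stats` helper are made up for illustration, and no HPXPy calls are involved.

```python
import numpy as np

# Hypothetical per-locality data of unequal length.
locality_data = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([10.0, 20.0]),
    np.array([-5.0, 0.0, 5.0]),
]

def local_stats(values):
    """Return [count, sum, min, max] for one locality's data."""
    return np.array([values.size, values.sum(), values.min(), values.max()])

stats = np.stack([local_stats(v) for v in locality_data])

# Each column needs its own reduction, mirroring op='sum', 'min', 'max'.
global_count = stats[:, 0].sum()
global_sum = stats[:, 1].sum()
global_min = stats[:, 2].min()
global_max = stats[:, 3].max()
global_mean = global_sum / global_count

# Averaging the local means weights every locality equally and differs here.
mean_of_means = np.mean([v.mean() for v in locality_data])

print(f"count={global_count:.0f} min={global_min} max={global_max}")
print(f"global mean={global_mean:.4f}, mean of local means={mean_of_means:.4f}")
```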


### Broadcast

Sends data from one locality (root) to all other localities.

```
Locality 0: [1, 2, 3]  ─── broadcast(root=0) ───► [1, 2, 3] (to all)
Locality 1: [?, ?, ?]  ─────────────────────────► [1, 2, 3]
Locality 2: [?, ?, ?]  ─────────────────────────► [1, 2, 3]
```

In [4]:
# Root locality has the data to share
if hpx.locality_id() == 0:
    config_data = hpx.array([3.14159, 2.71828, 1.41421])
else:
    config_data = hpx.zeros(3)  # Placeholder on other localities

print("Before broadcast:", config_data.to_numpy())

# Broadcast from root=0
shared_config = hpx.broadcast(config_data, root=0)
print("After broadcast:", shared_config.to_numpy())

Before broadcast: [3.14159 2.71828 1.41421]
After broadcast: [3.14159 2.71828 1.41421]
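
A single-locality run cannot show placeholders being overwritten. As a rough illustration, the NumPy-only sketch below models three localities as list entries; `simulate_broadcast` is a hypothetical helper, not part of HPXPy.

```python
import numpy as np

def simulate_broadcast(per_locality, root=0):
    """Return what each locality holds after the root's array is broadcast."""
    root_data = per_locality[root]
    # Every locality, including the root, ends up with its own copy.
    return [root_data.copy() for _ in per_locality]

before = [
    np.array([3.14159, 2.71828, 1.41421]),  # root (locality 0)
    np.zeros(3),                             # placeholder on locality 1
    np.zeros(3),                             # placeholder on locality 2
]
after = simulate_broadcast(before, root=0)
for loc_id, (old, new) in enumerate(zip(before, after)):
    print(f"Locality {loc_id}: {old} -> {new}")
```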


### Gather

Collects data from all localities to a single root locality.

```
Locality 0: [1, 2]  ─┐
Locality 1: [3, 4]  ─┼─► gather(root=0) ─► [[1,2], [3,4], [5,6]] (on root)
Locality 2: [5, 6]  ─┘
```

In [5]:
# Each locality contributes its data
my_contribution = hpx.array([hpx.locality_id() * 10 + i for i in range(3)])
print(f"My contribution (locality {hpx.locality_id()}):", my_contribution.to_numpy())

# Gather to root
all_data = hpx.gather(my_contribution, root=0)

if hpx.locality_id() == 0:
    print(f"\nGathered {len(all_data)} arrays at root:")
    for i, arr in enumerate(all_data):
        print(f"  From locality {i}: {arr}")

My contribution (locality 0): [0 1 2]

Gathered 1 arrays at root:
  From locality 0: [0 1 2]
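
To see what the root would receive from several localities, the following NumPy-only sketch evaluates the same contribution formula for three locality IDs. `simulate_gather` is a hypothetical helper, and returning `None` on non-root localities is an assumption made for the sketch; this tutorial does not state what `hpx.gather` returns there.

```python
import numpy as np

def simulate_gather(per_locality, root=0):
    """Return each locality's view after a gather to `root`."""
    gathered = [arr.copy() for arr in per_locality]
    # Assumption: only the root receives the list; others get None.
    return [gathered if loc_id == root else None
            for loc_id in range(len(per_locality))]

# Same formula as my_contribution, evaluated for locality IDs 0, 1, 2.
contributions = [
    np.array([loc_id * 10 + i for i in range(3)]) for loc_id in range(3)
]
views = simulate_gather(contributions, root=0)

print(f"Gathered {len(views[0])} arrays at root:")
for i, arr in enumerate(views[0]):
    print(f"  From locality {i}: {arr}")
```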


### Scatter

Distributes portions of data from root to all localities.

```
Locality 0: [1,2,3,4,5,6] ─► scatter(root=0) ─► [1,2] (to loc 0)
                                             ─► [3,4] (to loc 1)
                                             ─► [5,6] (to loc 2)
```

In [6]:
# Root has all the data
if hpx.locality_id() == 0:
    full_data = hpx.arange(12)  # [0, 1, 2, ..., 11]
    print("Full data on root:", full_data.to_numpy())
else:
    full_data = hpx.zeros(12)  # Placeholder

# Scatter distributes chunks
my_chunk = hpx.scatter(full_data, root=0)
print(f"My chunk (locality {hpx.locality_id()}):", my_chunk.to_numpy())

Full data on root: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]
My chunk (locality 0): [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]
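
With one locality the chunk is the whole array. The sketch below shows the split that the diagram implies for three localities, using NumPy only. `simulate_scatter` is a hypothetical helper, and the use of contiguous, near-equal chunks is an assumption about how the root's data would be divided.

```python
import numpy as np

def simulate_scatter(full_data, num_localities):
    """Split the root's array into one contiguous chunk per locality."""
    # np.array_split tolerates sizes that do not divide evenly; whether
    # hpx.scatter does the same is not stated in this tutorial.
    return np.array_split(full_data, num_localities)

full_data = np.arange(12, dtype=float)
chunks = simulate_scatter(full_data, num_localities=3)
for loc_id, chunk in enumerate(chunks):
    print(f"My chunk (locality {loc_id}): {chunk}")
```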


### Barrier

Synchronizes all localities: each one blocks at the barrier until every locality has reached it.

In [7]:
print(f"Locality {hpx.locality_id()} starting computation...")

# Simulate some work
result = hpx.sum(hpx.arange(1000000))

# Synchronize before continuing
hpx.barrier("computation_done")

print(f"Locality {hpx.locality_id()} passed barrier with result: {result}")

Locality 0 starting computation...
Locality 0 passed barrier with result: 499999500000.0
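
The ordering guarantee of a barrier is easier to see with more than one participant. The sketch below uses Python's standard `threading.Barrier` with three threads standing in for localities. It is an analogy for the semantics only and does not use `hpx.barrier`.

```python
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
events = []                       # ordered log of (phase, worker_id)
partials = [None] * NUM_WORKERS   # per-worker results
events_lock = threading.Lock()

def worker(worker_id):
    # Phase 1: each worker does its own computation.
    partials[worker_id] = sum(range(worker_id * 1000, (worker_id + 1) * 1000))
    with events_lock:
        events.append(("computed", worker_id))

    # Block until all NUM_WORKERS threads have reached this point.
    barrier.wait()

    # Phase 2: runs only after every worker has finished phase 1.
    with events_lock:
        events.append(("passed", worker_id))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for phase, worker_id in events:
    print(f"Worker {worker_id}: {phase}")
```

The order of workers within each phase varies between runs, but every `computed` entry is logged before any `passed` entry.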


### Collectives Module

The `hpx.collectives` module also exposes the locality queries under `get_`-prefixed names.

In [8]:
# Access locality information via collectives module
print("Via hpx.collectives module:")
print(f"  Number of localities: {hpx.collectives.get_num_localities()}")
print(f"  Current locality ID: {hpx.collectives.get_locality_id()}")

Via hpx.collectives module:
  Number of localities: 1
  Current locality ID: 0


## 2. Distributed Arrays

Distributed arrays span multiple localities with automatic data distribution.

In [9]:
# Create distributed arrays
d_zeros = hpx.distributed_zeros([100])
d_ones = hpx.distributed_ones([50, 2])
d_full = hpx.distributed_full([20], 3.14)

print("Distributed zeros:")
print(f"  Shape: {d_zeros.shape}")
print(f"  Size: {d_zeros.size}")
print(f"  First 10: {d_zeros.to_numpy()[:10]}")

print("\nDistributed ones:")
print(f"  Shape: {d_ones.shape}")
print(f"  First row: {d_ones.to_numpy()[0]}")

print("\nDistributed full (3.14):")
print(f"  Values: {d_full.to_numpy()}")

Distributed zeros:
  Shape: [100]
  Size: 100
  First 10: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Distributed ones:
  Shape: [50, 2]
  First row: [1. 1.]

Distributed full (3.14):
  Values: [3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14 3.14
 3.14 3.14 3.14 3.14 3.14 3.14]


In [10]:
# Create distributed array from NumPy
np_data = np.linspace(0, 10, 20)
d_arr = hpx.distributed_from_numpy(np_data)

print("From NumPy:")
print(f"  Original: {np_data}")
print(f"  Distributed: {d_arr.to_numpy()}")

From NumPy:
  Original: [ 0.          0.52631579  1.05263158  1.57894737  2.10526316  2.63157895
  3.15789474  3.68421053  4.21052632  4.73684211  5.26315789  5.78947368
  6.31578947  6.84210526  7.36842105  7.89473684  8.42105263  8.94736842
  9.47368421 10.        ]
  Distributed: [ 0.          0.52631579  1.05263158  1.57894737  2.10526316  2.63157895
  3.15789474  3.68421053  4.21052632  4.73684211  5.26315789  5.78947368
  6.31578947  6.84210526  7.36842105  7.89473684  8.42105263  8.94736842
  9.47368421 10.        ]


### Distribution Policies

Control how data is partitioned across localities:

- **none**: No distribution (local array)
- **block**: Contiguous chunks to each locality
- **cyclic**: Round-robin distribution
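
The difference between block and cyclic is clearest when written as an index-to-locality mapping. The following sketch computes which global indices each locality would own for a 10-element array on three localities. Both helpers are hypothetical, and how HPXPy's block policy treats a remainder is an assumption here (near-equal chunks, as `np.array_split` produces).

```python
import numpy as np

def block_owner_indices(size, num_localities):
    """Block: each locality owns one contiguous range of indices."""
    chunks = np.array_split(np.arange(size), num_localities)
    return [chunk.tolist() for chunk in chunks]

def cyclic_owner_indices(size, num_localities):
    """Cyclic: index i goes to locality i % num_localities."""
    return [list(range(loc_id, size, num_localities))
            for loc_id in range(num_localities)]

size, num_localities = 10, 3
layouts = {
    "block": block_owner_indices(size, num_localities),
    "cyclic": cyclic_owner_indices(size, num_localities),
}
for name, layout in layouts.items():
    print(name)
    for loc_id, indices in enumerate(layout):
        print(f"  locality {loc_id}: {indices}")
```

Block keeps neighboring elements together, which suits stencil-like access; cyclic spreads them out, which can balance load when the cost per element varies along the array.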

In [11]:
# Available distribution policies
print("Distribution Policies:")
print(f"  None:  {hpx.DistributionPolicy.none}")
print(f"  Block: {hpx.DistributionPolicy.block}")
print(f"  Cyclic: {hpx.DistributionPolicy.cyclic}")

Distribution Policies:
  None:  DistributionPolicy.none
  Block: DistributionPolicy.block
  Cyclic: DistributionPolicy.cyclic


In [12]:
# Create arrays with different distribution policies
arr_none = hpx.distributed_zeros([100])  # Default: no distribution
arr_block = hpx.distributed_zeros([100], distribution='block')
arr_cyclic = hpx.distributed_zeros([100], distribution='cyclic')

print("Distribution policies:")
print(f"  None policy:   {arr_none.policy}")
print(f"  Block policy:  {arr_block.policy}")
print(f"  Cyclic policy: {arr_cyclic.policy}")

Distribution policies:
  None policy:   DistributionPolicy.none
  Block policy:  DistributionPolicy.block
  Cyclic policy: DistributionPolicy.cyclic


### Distributed Array Properties

In [13]:
# Create a distributed array
darr = hpx.distributed_ones([1000], distribution='block')

print("Distributed Array Properties:")
print(f"  Shape: {darr.shape}")
print(f"  Size: {darr.size}")
print(f"  Dimensions: {darr.ndim}")
print(f"  Policy: {darr.policy}")
print(f"  Num partitions: {darr.num_partitions}")
print(f"  Locality ID: {darr.locality_id}")
print(f"  Is distributed: {darr.is_distributed()}")

Distributed Array Properties:
  Shape: [1000]
  Size: 1000
  Dimensions: 1
  Policy: DistributionPolicy.block
  Num partitions: 1
  Locality ID: 0
  Is distributed: False


In [14]:
# Get detailed distribution information
info = darr.get_distribution_info()

print("Distribution Info:")
print(f"  Policy: {info.policy}")
print(f"  Num partitions: {info.num_partitions}")
print(f"  Chunk size: {info.chunk_size}")
print(f"  Locality ID: {info.locality_id}")
print(f"  Is distributed: {info.is_distributed()}")

Distribution Info:
  Policy: DistributionPolicy.block
  Num partitions: 1
  Chunk size: 1000
  Locality ID: 0
  Is distributed: False


### Distributed Array Methods

In [15]:
# Fill with a value
darr = hpx.distributed_zeros([10], distribution='block')
print("Before fill:", darr.to_numpy())

darr.fill(42.0)
print("After fill(42):", darr.to_numpy())

Before fill: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
After fill(42): [42. 42. 42. 42. 42. 42. 42. 42. 42. 42.]


In [16]:
# Convert to NumPy (gathers all data if distributed)
darr = hpx.distributed_full([5], 7.0, distribution='block')

np_arr = darr.to_numpy()
print(f"Type: {type(np_arr)}")
print(f"Values: {np_arr}")

Type: <class 'numpy.ndarray'>
Values: [7. 7. 7. 7. 7.]
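
Gathering a distributed array into one NumPy array means putting each partition's elements back at their global positions. The sketch below shows one way that reassembly could work for block and cyclic layouts. It uses NumPy only, and `assemble_block` and `assemble_cyclic` are hypothetical helpers rather than what `to_numpy()` does internally.

```python
import numpy as np

def assemble_block(parts):
    """Block partitions are contiguous, so concatenation restores the order."""
    return np.concatenate(parts)

def assemble_cyclic(parts):
    """Cyclic partition r holds global elements r, r+n, r+2n, ..."""
    n = len(parts)
    total = sum(p.size for p in parts)
    out = np.empty(total, dtype=parts[0].dtype)
    for r, part in enumerate(parts):
        out[r::n] = part          # write back with stride n
    return out

original = np.arange(10, dtype=float)
block_parts = np.array_split(original, 3)          # 3 contiguous chunks
cyclic_parts = [original[r::3] for r in range(3)]  # 3 interleaved chunks

print(assemble_block(block_parts))
print(assemble_cyclic(cyclic_parts))
```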


### String Representation

In [17]:
# See the string representation
darr_none = hpx.distributed_zeros([100])
darr_block = hpx.distributed_ones([200], distribution='block')
darr_cyclic = hpx.distributed_full([150], 5.0, distribution='cyclic')

print("String representations:")
print(f"  {repr(darr_none)}")
print(f"  {repr(darr_block)}")
print(f"  {repr(darr_cyclic)}")

String representations:
  DistributedArrayF64(shape=[100], distribution='none', partitions=1)
  DistributedArrayF64(shape=[200], distribution='block', partitions=1)
  DistributedArrayF64(shape=[150], distribution='cyclic', partitions=1)


## 3. Multi-Locality Concepts

This section shows how the pieces above fit together when running with multiple localities.

### SPMD Pattern (Single Program, Multiple Data)

The SPMD pattern runs the same program on all localities, each working on different data:

```python
# Sketch only: total_size, data, and process() are placeholders
my_id = hpx.locality_id()
num_locs = hpx.num_localities()

# Each locality processes its chunk (the last one also takes any remainder)
chunk_size = total_size // num_locs
my_start = my_id * chunk_size
my_end = total_size if my_id == num_locs - 1 else my_start + chunk_size
my_data = process(data[my_start:my_end])

# Combine results
global_result = hpx.all_reduce(my_data, op='sum')
```

In [18]:
# SPMD example: Distributed sum
def distributed_sum_example():
    """Example of SPMD pattern for distributed computation."""
    my_id = hpx.locality_id()
    num_locs = hpx.num_localities()
    
    # Total problem size
    total_size = 1000000
    chunk_size = total_size // num_locs
    
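    # Each locality works on its chunk; the last one also takes any
    # remainder so no element is dropped when num_locs does not divide total_size
    my_start = my_id * chunk_size
    my_end = total_size if my_id == num_locs - 1 else my_start + chunk_size
    my_chunk = hpx.arange(my_start, my_end)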
    
    # Compute local sum
    local_sum = hpx.sum(my_chunk)
    print(f"Locality {my_id}: local sum = {local_sum}")
    
    # Combine across all localities
    global_sum = hpx.all_reduce(hpx.array([local_sum]), op='sum')
    
    return global_sum.to_numpy()[0]

result = distributed_sum_example()
expected = sum(range(1000000))
print(f"\nGlobal sum: {result}")
print(f"Expected:   {expected}")

Locality 0: local sum = 499999500000.0

Global sum: 499999500000.0
Expected:   499999500000
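
Integer division leaves a remainder whenever the problem size is not a multiple of the locality count. One common way to split the range so that no element is dropped and chunk sizes differ by at most one is sketched below in plain Python. `chunk_bounds` is a hypothetical helper, the locality count of 3 is chosen only to force a remainder, and the built-in `sum` stands in for `hpx.sum` and `hpx.all_reduce`.

```python
def chunk_bounds(total_size, num_locs, loc_id):
    """Return the half-open range [start, stop) owned by loc_id."""
    base, extra = divmod(total_size, num_locs)
    # The first `extra` localities each take one additional element.
    start = loc_id * base + min(loc_id, extra)
    stop = start + base + (1 if loc_id < extra else 0)
    return start, stop

total_size = 1_000_000
num_locs = 3  # hypothetical count that does not divide total_size

local_sums = []
for loc_id in range(num_locs):
    start, stop = chunk_bounds(total_size, num_locs, loc_id)
    local_sums.append(sum(range(start, stop)))  # stands in for hpx.sum
    print(f"Locality {loc_id}: [{start}, {stop}) local sum = {local_sums[-1]}")

global_sum = sum(local_sums)  # stands in for all_reduce(op='sum')
print(f"Global sum: {global_sum}")
```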


### Multi-Locality Launcher

HPXPy includes a launcher module for running across multiple processes:

```python
import hpxpy as hpx
from hpxpy.launcher import launch_localities, spmd_main

# Launch 4 localities running the same script
launch_localities("my_script.py", num_localities=4)

# Or use the decorator
@spmd_main(num_localities=4)
def main():
    with hpx.runtime():
        # Distributed code here
        pass
```

In [19]:
# Check launcher utilities
from hpxpy.launcher import (
    is_multi_locality_mode,
    get_expected_num_localities,
    LocalityConfig,
)

print("Launcher utilities:")
print(f"  In multi-locality mode: {is_multi_locality_mode()}")
print(f"  Expected localities: {get_expected_num_localities()}")

# Example of LocalityConfig
config = LocalityConfig(
    locality_id=0,
    num_localities=4,
    host="localhost",
    port=7910
)
print(f"\nExample HPX args for locality 0:")
for arg in config.to_hpx_args():
    print(f"  {arg}")

Launcher utilities:
  In multi-locality mode: False
  Expected localities: 1

Example HPX args for locality 0:
  --hpx:localities=4
  --hpx:hpx=localhost:7910
  --hpx:agas=localhost:7910


In [20]:
# Clean up
hpx.finalize()
print("Runtime finalized")

Runtime finalized


## Summary

In this tutorial, you learned:

### Collective Operations
- `all_reduce(arr, op)` - Combine values across localities (sum, prod, min, max)
- `broadcast(arr, root)` - Send data from root to all localities
- `gather(arr, root)` - Collect data from all localities to root
- `scatter(arr, root)` - Distribute data from root to all localities
- `barrier(name)` - Synchronize all localities

### Distributed Arrays
- `distributed_zeros()`, `distributed_ones()`, `distributed_full()`, `distributed_from_numpy()`
- Distribution policies: `none`, `block`, `cyclic`
- Properties: `shape`, `size`, `ndim`, `policy`, `num_partitions`, `locality_id`
- Methods: `to_numpy()`, `fill()`, `is_distributed()`, `get_distribution_info()`

### Multi-Locality Support
- SPMD pattern for distributed computing
- `hpxpy.launcher` module for multi-process execution
- `@spmd_main` decorator for automatic process spawning

For more examples, see the `examples/` directory:
- `distributed_reduction_demo.py` - SPMD pattern example
- `multi_locality_demo.py` - Multi-process launch example