# Using GPUs

The speed of RockVerse routines can be greatly enhanced by taking advantage of GPU devices.

RockVerse's internal distribution strategies can use multiple GPU devices simultaneously and prioritize executing operations on GPUs whenever they are available. This is transparent to the user and requires no manual configuration.

At import time, RockVerse uses [Numba](https://numba.readthedocs.io/en/stable/index.html) to detect and map the available GPU devices. You can manage these devices through the library-wide ``config`` object.
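
To check what Numba itself detects, independently of RockVerse, you can query Numba directly. The following is a minimal sketch that assumes only Numba's public ``numba.cuda`` module (``cuda.is_available`` and ``cuda.list_devices``). The helper ``list_cuda_devices`` is a hypothetical name, not a RockVerse function, and RockVerse's own device mapping may collect more information.

```python
# Hypothetical helper (not part of RockVerse): list the CUDA devices that
# Numba can see. Returns an empty list when Numba is not installed or when
# no usable CUDA driver/device is present.
def list_cuda_devices():
    try:
        from numba import cuda
    except ImportError:
        return []  # Numba not installed
    if not cuda.is_available():
        return []  # no usable CUDA driver or device
    devices = []
    for dev in cuda.list_devices():
        name = dev.name
        if isinstance(name, bytes):  # some Numba versions return bytes
            name = name.decode()
        devices.append((dev.id, name))
    return devices


for index, name in list_cuda_devices():
    print(f"GPU {index}: {name}")
```

On a machine without CUDA, the loop simply prints nothing.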

## Check for Availability

The ``config`` object provides methods for inspecting the GPU devices at runtime:

```python
import rockverse as rv

# Importing RockVerse creates the 'config' object as an instance of the
# rockverse.configuration.Config class.

# Let's use the print_available_gpus method to check for available devices.
print("Available GPU devices:")
rv.config.print_available_gpus()

# By default, RockVerse will utilize all available devices to execute its tasks.
print("\nSelected GPU devices:")
rv.config.print_selected_gpus()
```

```
Available GPU devices:
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-ad64cd83-d3d5-a418-1272-37181609ab5c)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-dbff0259-af71-fbf6-bac4-ff7278a00a12)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-44331e7a-a604-1c8b-6f1f-799e78d1e9c2)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-d10bfadb-d96f-9347-c449-7d202fd3ba96)
GPU 4: Tesla V100-SXM2-32GB (UUID: GPU-e16c7693-2e5b-47a7-1d12-2d6c23011cf8)
GPU 5: Tesla V100-SXM2-32GB (UUID: GPU-c36418e8-4138-6cac-27f9-71f7a01dc9e7)
GPU 6: Tesla V100-SXM2-32GB (UUID: GPU-8a0b5b3e-b301-ea11-f89f-c224a926edcd)
GPU 7: Tesla V100-SXM2-32GB (UUID: GPU-2b2fd0ef-7a2b-fded-012f-0ae9c20f2acd)

Selected GPU devices:
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-ad64cd83-d3d5-a418-1272-37181609ab5c)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-dbff0259-af71-fbf6-bac4-ff7278a00a12)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-44331e7a-a604-1c8b-6f1f-799e78d1e9c2)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-d10bfadb-d96f-9347-c449-7d202fd3ba96)
GPU 4: Tesla V100-SXM2-32GB (U
```

## Select the devices using ``rockverse.config``

The machine used in this tutorial has 8 Tesla V100-SXM2-32GB GPUs available.

As seen above, RockVerse will by default utilize all available devices to execute its tasks.
This behavior can be modified by setting a list of device indices in ``config['selected_gpus']``:

```python
# Print the original indices of selected devices
print(f"Old list of selected GPU devices: {rv.config['selected_gpus']}")

# Change the list of selected devices
rv.config['selected_gpus'] = [1, 2, 4, 5]
print(f"New device list: {rv.config['selected_gpus']}")

print("\nSelected GPU devices:")
rv.config.print_selected_gpus()
```

```
Old list of selected GPU devices: [0, 1, 2, 3, 4, 5, 6, 7]
New device list: [1, 2, 4, 5]

Selected GPU devices:
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-dbff0259-af71-fbf6-bac4-ff7278a00a12)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-44331e7a-a604-1c8b-6f1f-799e78d1e9c2)
GPU 4: Tesla V100-SXM2-32GB (UUID: GPU-e16c7693-2e5b-47a7-1d12-2d6c23011cf8)
GPU 5: Tesla V100-SXM2-32GB (UUID: GPU-c36418e8-4138-6cac-27f9-71f7a01dc9e7)
```


You can use any iterable of integers, in any order, to set the selected devices:

```python
# All of these are valid commands and produce the same outcome
rv.config['selected_gpus'] = (0, 1, 2, 3)     # tuple
rv.config['selected_gpus'] = [1, 2, 0, 3]     # list
rv.config['selected_gpus'] = {0, 1, 3, 2}     # set
rv.config['selected_gpus'] = [0, 1, 3, 2, 3]  # repeated indices are filtered out
rv.config['selected_gpus'] = range(4)         # range
print(f"New selected GPU devices: {rv.config['selected_gpus']}")
```

```
New selected GPU devices: [0, 1, 2, 3]
```


Set an empty list to disable GPU processing:

```python
rv.config['selected_gpus'] = []
print(f"New selected GPU devices: {rv.config['selected_gpus']}")
```

```
New selected GPU devices: []
```


Invalid indices raise an exception. For example:
```python
# This will raise a runtime error: maximum index is 7 in this example.
rv.config['selected_gpus'] = (0, 1, 2, 3, 11)
```
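
The selection rules illustrated in this section (any iterable of integers, duplicates removed, result sorted, out-of-range indices rejected) can be summarized in a few lines of plain Python. This is a hypothetical helper written for illustration, not RockVerse's implementation; raising ``RuntimeError`` follows the comment above, and the exact exception type RockVerse raises may differ.

```python
# Hypothetical helper (not part of RockVerse) mimicking the selection rules
# shown above: accept any iterable of integers, drop duplicates, sort the
# result, and reject indices outside 0 .. n_available - 1.
def normalize_gpu_selection(indices, n_available):
    selected = sorted(set(int(i) for i in indices))
    invalid = [i for i in selected if not 0 <= i < n_available]
    if invalid:
        raise RuntimeError(
            f"Invalid GPU indices {invalid}; valid indices are 0 to {n_available - 1}."
        )
    return selected


print(normalize_gpu_selection([1, 2, 0, 3], 8))     # [0, 1, 2, 3]
print(normalize_gpu_selection([0, 1, 3, 2, 3], 8))  # [0, 1, 2, 3]
print(normalize_gpu_selection([], 8))               # []
try:
    normalize_gpu_selection((0, 1, 2, 3, 11), 8)
except RuntimeError as err:
    print(err)
```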



## Select the devices using the config context manager

The ``rv.config_context`` context manager temporarily modifies the GPU selection (or other configuration settings) within a specific block of code. This is particularly useful when you want to experiment with different device selections without permanently altering your configuration.

When the block exits, the original settings are restored automatically, so the temporary changes do not affect the rest of your application.

```python
# Default to full device list
rv.config['selected_gpus'] = range(8)
print(f"Permanent device list: {rv.config['selected_gpus']}")

# Temporary reassignment with the context manager
with rv.config_context({'selected_gpus': [2, 4, 6]}):
    print(f"This block device list: {rv.config['selected_gpus']}")

# After the with block, everything goes back to normal
print(f"Back to permanent device list: {rv.config['selected_gpus']}")
```

```
Permanent device list: [0, 1, 2, 3, 4, 5, 6, 7]
This block device list: [2, 4, 6]
Back to permanent device list: [0, 1, 2, 3, 4, 5, 6, 7]
```
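
Conceptually, a context manager like this saves the current values, applies the overrides, and restores the saved values when the block exits, even if an exception is raised inside it. Below is a minimal sketch of that pattern using ``contextlib.contextmanager`` and a plain dictionary; ``config`` and ``config_context`` here are hypothetical stand-ins for ``rv.config`` and ``rv.config_context``, whose actual implementation may differ.

```python
from contextlib import contextmanager

# Hypothetical stand-in for rv.config: a plain dictionary of settings.
config = {"selected_gpus": list(range(8))}


@contextmanager
def config_context(overrides):
    # Remember the current values of every key being overridden.
    saved = {key: config[key] for key in overrides}
    try:
        for key, value in overrides.items():
            config[key] = list(value)
        yield config
    finally:
        # Restore the saved values, even if the with-block raised.
        config.update(saved)


with config_context({"selected_gpus": [2, 4, 6]}):
    print(config["selected_gpus"])  # [2, 4, 6]
print(config["selected_gpus"])      # [0, 1, 2, 3, 4, 5, 6, 7]
```

Using ``try``/``finally`` around the ``yield`` is what guarantees the restore step runs on both normal and exceptional exit.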


## Using several GPUs

Each MPI process uses exactly one device, selected automatically at runtime from the selection list by the ``rank_select_gpu`` method:

```python
rv.config['selected_gpus'] = [1, 2, 4, 5]

print("Selected devices:")
rv.config.print_selected_gpus()

print(f"\nRunning {rv.mpi_nprocs} MPI process(es):")
print(f"   Rank {rv.mpi_rank} using device {rv.config.rank_select_gpu()}")
```

```
Selected devices:
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-dbff0259-af71-fbf6-bac4-ff7278a00a12)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-44331e7a-a604-1c8b-6f1f-799e78d1e9c2)
GPU 4: Tesla V100-SXM2-32GB (UUID: GPU-e16c7693-2e5b-47a7-1d12-2d6c23011cf8)
GPU 5: Tesla V100-SXM2-32GB (UUID: GPU-c36418e8-4138-6cac-27f9-71f7a01dc9e7)

Running 1 MPI process(es):
   Rank 0 using device 1
```


To utilize more than one device, you need to run with multiple MPI processes.

Let's illustrate this directly within this Jupyter Notebook, 
using [ipyparallel](https://ipyparallel.readthedocs.io/).

We will create a cluster of 8 MPI engines. RockVerse automatically assigns each MPI process to one GPU:

```python
import ipyparallel as ipp

# Create an MPI cluster with 8 engines
cluster = ipp.Cluster(engines="mpi", n=8)

# Start and connect to the cluster
rc = cluster.start_and_connect_sync()

# Enable IPython magics for parallel processing
rc[:].activate()

# Now we have the %%px cell magic, which will direct Jupyter to run in the parallel cluster we just created.
```

```
Starting 8 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
```



```python
%%px --block

import rockverse as rv

if rv.mpi_rank == 0:
    print("Selected devices:")
    rv.config.print_selected_gpus()
    print(f"\nRunning {rv.mpi_nprocs} MPI process(es)")
```


```
[stdout:0] Selected devices:
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-ad64cd83-d3d5-a418-1272-37181609ab5c)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-dbff0259-af71-fbf6-bac4-ff7278a00a12)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-44331e7a-a604-1c8b-6f1f-799e78d1e9c2)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-d10bfadb-d96f-9347-c449-7d202fd3ba96)
GPU 4: Tesla V100-SXM2-32GB (UUID: GPU-e16c7693-2e5b-47a7-1d12-2d6c23011cf8)
GPU 5: Tesla V100-SXM2-32GB (UUID: GPU-c36418e8-4138-6cac-27f9-71f7a01dc9e7)
GPU 6: Tesla V100-SXM2-32GB (UUID: GPU-8a0b5b3e-b301-ea11-f89f-c224a926edcd)
GPU 7: Tesla V100-SXM2-32GB (UUID: GPU-2b2fd0ef-7a2b-fded-012f-0ae9c20f2acd)

Running 8 MPI process(es)
```


```python
%%px --block
print(f"Rank {rv.mpi_rank} using device {rv.config.rank_select_gpu()}")
```

```
[stdout:0] Rank 0 using device 0
[stdout:1] Rank 1 using device 1
[stdout:2] Rank 2 using device 2
[stdout:4] Rank 4 using device 4
[stdout:3] Rank 3 using device 3
[stdout:5] Rank 5 using device 5
[stdout:6] Rank 6 using device 6
[stdout:7] Rank 7 using device 7
```


When there are more MPI processes than selected GPU devices, RockVerse cycles through the selected devices in rank order, distributing the processes as evenly as possible:

```python
%%px --block
rv.config['selected_gpus'] = [1, 2, 4, 5]
print(f"Rank {rv.mpi_rank} using device {rv.config.rank_select_gpu()}")
```

```
[stdout:1] Rank 1 using device 2
[stdout:3] Rank 3 using device 5
[stdout:0] Rank 0 using device 1
[stdout:2] Rank 2 using device 4
[stdout:7] Rank 7 using device 5
[stdout:6] Rank 6 using device 4
[stdout:4] Rank 4 using device 1
[stdout:5] Rank 5 using device 2
```
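
The assignments shown in this tutorial are consistent with a simple round-robin rule: rank ``r`` uses ``selected[r % len(selected)]``. The sketch below reproduces the output above under that assumption. ``round_robin_device`` is a hypothetical name, and returning ``None`` for an empty selection (standing in for CPU execution) is also an assumption, so RockVerse's ``rank_select_gpu`` may behave differently in edge cases.

```python
# Hypothetical sketch (not RockVerse's code) of a round-robin rank-to-device
# rule that matches the outputs in this tutorial.
def round_robin_device(rank, selected):
    if not selected:
        return None  # assumption: no selected GPU means CPU execution
    return selected[rank % len(selected)]


selected = [1, 2, 4, 5]
for rank in range(8):
    print(f"Rank {rank} using device {round_robin_device(rank, selected)}")
```

Printed in rank order, this gives devices 1, 2, 4, 5, 1, 2, 4, 5, matching the per-rank lines above.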


Let's open another cluster with 32 engines and observe how the devices are assigned.
First, we need to close the current cluster:

```python
# stop_cluster() is a coroutine and does nothing unless awaited;
# use the blocking _sync variant to actually close the current cluster
cluster.stop_cluster_sync()
```

and then call ipyparallel again to create the new cluster:

```python
# New cluster with 32 engines
cluster = ipp.Cluster(engines="mpi", n=32)
rc = cluster.start_and_connect_sync()
rc[:].activate()
```

```
Starting 32 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
```



Now we can print the device assignment:

```python
%%px --block
import rockverse as rv
print(f"Rank {rv.mpi_rank} using device {rv.config.rank_select_gpu()}")
```


```
[stdout:29] Rank 29 using device 5
[stdout:23] Rank 23 using device 7
[stdout:24] Rank 24 using device 0
[stdout:9] Rank 9 using device 1
[stdout:18] Rank 18 using device 2
[stdout:28] Rank 28 using device 4
[stdout:7] Rank 7 using device 7
[stdout:5] Rank 5 using device 5
[stdout:25] Rank 25 using device 1
[stdout:0] Rank 0 using device 0
[stdout:4] Rank 4 using device 4
[stdout:20] Rank 20 using device 4
[stdout:8] Rank 8 using device 0
[stdout:22] Rank 22 using device 6
[stdout:17] Rank 17 using device 1
[stdout:26] Rank 26 using device 2
[stdout:6] Rank 6 using device 6
[stdout:2] Rank 2 using device 2
[stdout:3] Rank 3 using device 3
[stdout:14] Rank 14 using device 6
[stdout:27] Rank 27 using device 3
[stdout:10] Rank 10 using device 2
[stdout:15] Rank 15 using device 7
[stdout:13] Rank 13 using device 5
[stdout:11] Rank 11 using device 3
[stdout:30] Rank 30 using device 6
[stdout:21] Rank 21 using device 5
[stdout:31] Rank 31 using device 7
[stdout:12] Rank 12 using device 4
[stdout:1] Rank 1 using device 1
[stdout:16] Rank 16 using device 0
[stdout:19] Rank 19 using device 3
```


We can use MPI collective calls to organize this output more clearly:

```python
%%px --block

def print_rank_list():
    # Map each selected device to the list of ranks that use it
    rank_list = {k: [] for k in rv.config['selected_gpus']}
    for rank in range(rv.mpi_nprocs):
        # Broadcast the device chosen by `rank` so every process learns it
        device = rv.mpi_comm.bcast(rv.config.rank_select_gpu(), root=rank)
        rank_list[device].append(rank)

    # Only rank 0 prints the summary
    if rv.mpi_rank == 0:
        print("Device assignment:")
        for k in sorted(rank_list.keys()):
            print(f"GPU {k}: ranks {rank_list[k]}")

print_rank_list()
```

```
[stdout:0] Device assignment:
GPU 0: ranks [0, 8, 16, 24]
GPU 1: ranks [1, 9, 17, 25]
GPU 2: ranks [2, 10, 18, 26]
GPU 3: ranks [3, 11, 19, 27]
GPU 4: ranks [4, 12, 20, 28]
GPU 5: ranks [5, 13, 21, 29]
GPU 6: ranks [6, 14, 22, 30]
GPU 7: ranks [7, 15, 23, 31]
```


Now let's test with restricted device lists:

```python
%%px --block
with rv.config_context({'selected_gpus': range(5)}):
    print_rank_list()
```

```
[stdout:0] Device assignment:
GPU 0: ranks [0, 5, 10, 15, 20, 25, 30]
GPU 1: ranks [1, 6, 11, 16, 21, 26, 31]
GPU 2: ranks [2, 7, 12, 17, 22, 27]
GPU 3: ranks [3, 8, 13, 18, 23, 28]
GPU 4: ranks [4, 9, 14, 19, 24, 29]
```


```python
%%px --block
with rv.config_context({'selected_gpus': [1, 3, 5, 7]}):
    print_rank_list()
```

```
[stdout:0] Device assignment:
GPU 1: ranks [0, 4, 8, 12, 16, 20, 24, 28]
GPU 3: ranks [1, 5, 9, 13, 17, 21, 25, 29]
GPU 5: ranks [2, 6, 10, 14, 18, 22, 26, 30]
GPU 7: ranks [3, 7, 11, 15, 19, 23, 27, 31]
```


```python
%%px --block
with rv.config_context({'selected_gpus': [0, 1, 7]}):
    print_rank_list()
```

```
[stdout:0] Device assignment:
GPU 0: ranks [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
GPU 1: ranks [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31]
GPU 7: ranks [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```
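
The device tables above can also be predicted serially, without starting an MPI cluster, which can help when planning how many ranks will share each GPU before launching a job. This sketch assumes the round-robin rule (rank ``r`` uses ``selected[r % len(selected)]``) that matches every table in this tutorial; ``ranks_per_device`` is a hypothetical helper, not part of RockVerse.

```python
# Hypothetical serial helper (not part of RockVerse): group MPI ranks by the
# GPU they would use under a round-robin rank-to-device rule.
def ranks_per_device(selected, nprocs):
    assignment = {device: [] for device in selected}
    for rank in range(nprocs):
        assignment[selected[rank % len(selected)]].append(rank)
    return assignment


for device, ranks in sorted(ranks_per_device([0, 1, 7], 32).items()):
    print(f"GPU {device}: ranks {ranks}")
```

With ``[0, 1, 7]`` and 32 ranks, this prints the same three rows as the last cell above.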
