
Conversation

@jan-janssen (Member) commented on Jul 27, 2023

The MetaExecutor manages multiple Executors in the background.

Example:

from time import sleep
from pympipool.external_interfaces.meta import MetaExecutor

def calc(i):
    sleep(5)
    return i

# two workers process the submitted tasks concurrently,
# so both calls finish after ~5 seconds rather than ~10
with MetaExecutor(max_workers=2) as exe:
    fs_1 = exe.submit(calc, 1)
    fs_2 = exe.submit(calc, 2)
    print(fs_1.result(), fs_2.result())
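
Conceptually, this is one submit() front end in front of a pool of single-worker executors. Below is a minimal standard-library sketch of that idea; the class name RoundRobinMetaExecutor and the round-robin scheduling are illustrative only, and pympipool's internal task distribution may differ (e.g. first-free-worker via a shared queue):

from concurrent.futures import ProcessPoolExecutor

class RoundRobinMetaExecutor:
    """Illustrative sketch only, not the pympipool implementation."""

    def __init__(self, max_workers=2):
        # one single-worker executor per requested worker
        self._executors = [
            ProcessPoolExecutor(max_workers=1) for _ in range(max_workers)
        ]
        self._counter = 0

    def submit(self, fn, *args, **kwargs):
        # hand each new task to the next executor in turn
        executor = self._executors[self._counter % len(self._executors)]
        self._counter += 1
        return executor.submit(fn, *args, **kwargs)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        for executor in self._executors:
            executor.shutdown(wait=True)
        return False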

@jan-janssen merged commit dc2d282 into main on Jul 27, 2023
@jan-janssen deleted the meta branch on Jul 27, 2023 at 17:16
@jan-janssen (Member, Author) commented:

GPU example test.py:

from pympipool import MetaExecutor
from tensorflow.python.client import device_lib

def get_available_gpus():
    # report the GPUs visible to the worker process executing this task
    local_device_protos = device_lib.list_local_devices()
    return [(x.name, x.physical_device_desc) for x in local_device_protos if x.device_type == 'GPU']


if __name__ == "__main__":
    # four workers, each bound to one CPU core and one GPU via the flux backend
    with MetaExecutor(max_workers=4, cores_per_worker=1, gpus_per_worker=1, enable_flux_backend=True) as exe:
        fs_1 = exe.submit(get_available_gpus)
        fs_2 = exe.submit(get_available_gpus)
        fs_3 = exe.submit(get_available_gpus)
        fs_4 = exe.submit(get_available_gpus)

    # leaving the with-block waits for the pending tasks, so the results are ready here
    print(fs_1.result())
    print(fs_2.result())
    print(fs_3.result())
    print(fs_4.result())

Output:

>>> python test.py
2023-07-27 14:00:45.008247: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[('/device:GPU:0', 'device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:01:00.0, compute capability: 8.0')]
[('/device:GPU:0', 'device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:41:00.0, compute capability: 8.0')]
[('/device:GPU:0', 'device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:c1:00.0, compute capability: 8.0')]
[('/device:GPU:0', 'device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:81:00.0, compute capability: 8.0')]
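
Each of the four workers reports a single '/device:GPU:0', but with a distinct PCI bus ID, i.e. every worker received exclusive access to one of the four A100s. Outside of flux, that per-process isolation can be approximated by pinning CUDA_VISIBLE_DEVICES before launching each worker; a minimal sketch of the effect (worker_script.py is a placeholder, and this is not how pympipool binds GPUs, which it delegates to the flux scheduler):

import os
import subprocess

# hypothetical manual variant: give each child process exactly one GPU;
# CUDA renumbers the visible device, so each child sees it as '/device:GPU:0'
processes = []
for gpu_id in range(4):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    processes.append(subprocess.Popen(["python", "worker_script.py"], env=env))

for process in processes:
    process.wait()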

@jan-janssen (Member, Author) commented:

The same approach works across multiple nodes of a GPU cluster:

import socket
from pympipool import MetaExecutor
from tensorflow.python.client import device_lib

def get_available_gpus():
    # additionally report the hostname to show which node the task ran on
    local_device_protos = device_lib.list_local_devices()
    return [(x.name, x.physical_device_desc, socket.gethostname()) for x in local_device_protos if x.device_type == 'GPU']


if __name__ == "__main__":
    # two workers, one GPU each; here flux placed them on separate nodes
    with MetaExecutor(max_workers=2, cores_per_worker=1, gpus_per_worker=1, enable_flux_backend=True) as exe:
        fs_1 = exe.submit(get_available_gpus)
        fs_2 = exe.submit(get_available_gpus)

    print(fs_1.result())
    print(fs_2.result())

Output:

[('/device:GPU:0', 'device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:84:00.0, compute capability: 7.0', 'cn138')]
[('/device:GPU:0', 'device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:84:00.0, compute capability: 7.0', 'cn139')]
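
Since the MetaExecutor exposes the concurrent.futures.Executor submit() interface, the same pattern scales from two hand-written submissions to a whole batch. A sketch, with the task count and workload as placeholders, assuming the returned futures are standard concurrent.futures.Future objects as the .result() calls above suggest:

from concurrent.futures import as_completed
from pympipool import MetaExecutor

def calc(i):
    return i ** 2  # placeholder workload

if __name__ == "__main__":
    with MetaExecutor(max_workers=2, cores_per_worker=1, gpus_per_worker=1, enable_flux_backend=True) as exe:
        # submit one task per input and print results as workers finish them
        futures = [exe.submit(calc, i) for i in range(10)]
        for future in as_completed(futures):
            print(future.result())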


Development

Successfully merging this pull request may close these issues.

[feature] Simplify distributing tasks over a number of GPUs in a cluster
