
Support device mapping for tensor libraries #21

Closed
fabawi opened this issue Sep 23, 2022 · 2 comments
Labels: help wanted (Extra attention is needed)

Comments

fabawi (Member) commented on Sep 23, 2022

Add an argument to tensor data structures with direct GPU/TPU mapping to support re-mapping on the mirrored node, e.g.,

@PluginRegistrar.register
class MXNetTensor(Plugin):
    def __init__(self, load_mxnet_device=None, map_mxnet_devices=None, **kwargs):
        # load_mxnet_device: device the received tensors are loaded to on the mirrored node
        # map_mxnet_devices: optional {source device: target device} overrides
        ...

where map_mxnet_devices should default to {'all': mxnet.gpu(0)} when load_mxnet_device=mxnet.gpu(0) and map_mxnet_devices=None.
For instance, when load_mxnet_device=mxnet.gpu(0) or load_mxnet_device="cuda:0", map_mxnet_devices can be set manually as a dictionary with the source device as key and the target device as value, for non-default device maps.
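
A minimal sketch of how this default could be resolved (the helper name resolve_map_mxnet_devices is hypothetical, not part of the existing plugin API):

def resolve_map_mxnet_devices(load_mxnet_device=None, map_mxnet_devices=None):
    # Hypothetical helper: when no explicit map is given, map every source
    # device ("all") onto the single device the tensors are loaded to.
    if map_mxnet_devices is None:
        return {"all": load_mxnet_device}
    return map_mxnet_devices

# Default case from above: load on GPU 0, no explicit map.
# resolve_map_mxnet_devices(load_mxnet_device="cuda:0")  ->  {"all": "cuda:0"}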

Suppose we have the following wrapified function:

@MiddlewareCommunicator.register("NativeObject", args.mware, "Notify", "/notify/test_native_exchange",
                                 carrier="tcp", should_wait=True, load_mxnet_device=mxnet.cpu(0),
                                 map_mxnet_devices={"cuda:0": "cuda:1", mxnet.gpu(1): "cuda:0",
                                                    "cuda:3": "cpu:0", mxnet.gpu(2): mxnet.gpu(0)})
def exchange_object(self):
    msg = input("Type your message: ")
    ret = {"message": msg,
           "mx_ones": mxnet.nd.ones((2, 4)),
           "mxnet_zeros_cuda1": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(1)),
           "mxnet_zeros_cuda0": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(0)),
           "mxnet_zeros_cuda2": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(2)),
           "mxnet_zeros_cuda3": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(3))}
    return ret,

then source GPUs 1 and 0 would be flipped on the target node, GPU 3 would be placed on CPU 0, and GPU 2 would be placed on GPU 0. Defining mxnet.gpu(1): mxnet.gpu(0) and "cuda:1": "cuda:2" in the same mapping should raise an error, since the same source device would be mapped to two different targets.
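
A minimal sketch of such a validation step, assuming a normalizer that canonicalizes both mxnet.Context objects and device strings (both helper names are hypothetical):

import mxnet

def normalize_device(device):
    # Canonicalize mxnet.Context objects and "gpu:N"/"cuda:N"/"cpu:N" strings to "cuda:N"/"cpu:N".
    if isinstance(device, mxnet.Context):
        kind = "cpu" if device.device_type == "cpu" else "cuda"
        return f"{kind}:{device.device_id}"
    return device.replace("gpu:", "cuda:")

def validate_device_map(map_mxnet_devices):
    # Reject maps in which one source device points to two different targets,
    # e.g. {mxnet.gpu(1): mxnet.gpu(0), "cuda:1": "cuda:2"}.
    seen = {}
    for src, dst in map_mxnet_devices.items():
        src_key, dst_key = normalize_device(src), normalize_device(dst)
        if src_key in seen and seen[src_key] != dst_key:
            raise ValueError(f"{src_key} is mapped to both {seen[src_key]} and {dst_key}")
        seen[src_key] = dst_key
    return seen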

fabawi added the help wanted label on Sep 23, 2022
fabawi (Member, Author) commented on Oct 11, 2022

Resolved, but needs to be more consistent: e.g., PyTorch does not accept a "gpu:0" mapping, whereas MXNet and Paddle accept both "gpu:0" and "cuda:0".
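
One way this could be smoothed over is by canonicalizing device strings per framework before they reach the plugin, for example (a sketch; the function name is hypothetical):

def to_torch_device_string(device):
    # PyTorch rejects "gpu:0", so translate it to "cuda:0"; MXNet and Paddle
    # accept either spelling, so the canonical "cuda:N" form works for all three.
    if isinstance(device, str) and device.startswith("gpu"):
        return device.replace("gpu", "cuda", 1)
    return device

# to_torch_device_string("gpu:0")  ->  "cuda:0", usable as torch.device("cuda:0")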

fabawi (Member, Author) commented on Oct 11, 2022

Closing for now as resolved with #24, but needs improvement.

fabawi closed this as completed on Oct 11, 2022