
Support device mapping for tensor libraries #21

Closed
fabawi opened this issue Sep 23, 2022 · 2 comments
Labels: help wanted (Extra attention is needed)

Comments

fabawi (Member) commented on Sep 23, 2022

Add an argument to tensor data structures with direct GPU/TPU mapping to support re-mapping on the mirrored node, e.g.,

@PluginRegistrar.register
class MXNetTensor(Plugin):
    def __init__(self, load_mxnet_device=None, map_mxnet_devices=None, **kwargs):
        # load_mxnet_device: device the received tensors are loaded to on the mirrored node
        # map_mxnet_devices: optional {source device: target device} overrides
        ...

where map_mxnet_devices should default to {'all': mxnet.gpu(0)} when load_mxnet_device=mxnet.gpu(0) and map_mxnet_devices=None.
For instance, when load_mxnet_device=mxnet.gpu(0) or load_mxnet_device="cuda:0", map_mxnet_devices can be set manually as a dictionary with the source device as key and the target device as value, for non-default device maps.
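
A minimal sketch of how this default could be resolved (the helper name resolve_map_mxnet_devices is hypothetical, not part of the existing plugin API):

def resolve_map_mxnet_devices(load_mxnet_device=None, map_mxnet_devices=None):
    # Hypothetical helper: when no explicit map is given, map every source
    # device ("all") onto the single device the tensors are loaded to.
    if map_mxnet_devices is None:
        return {"all": load_mxnet_device}
    return map_mxnet_devices

# Default case from above: load on GPU 0, no explicit map.
# resolve_map_mxnet_devices(load_mxnet_device="cuda:0")  ->  {"all": "cuda:0"}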

Suppose we have the following wrapified function:

@MiddlewareCommunicator.register("NativeObject", args.mware, "Notify", "/notify/test_native_exchange",
                                 carrier="tcp", should_wait=True, load_mxnet_device=mxnet.cpu(0),
                                 map_mxnet_devices={"cuda:0": "cuda:1", mxnet.gpu(1): "cuda:0",
                                                    "cuda:3": "cpu:0", mxnet.gpu(2): mxnet.gpu(0)})
def exchange_object(self):
    msg = input("Type your message: ")
    ret = {"message": msg,
           "mx_ones": mxnet.nd.ones((2, 4)),
           "mxnet_zeros_cuda1": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(1)),
           "mxnet_zeros_cuda0": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(0)),
           "mxnet_zeros_cuda2": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(2)),
           "mxnet_zeros_cuda3": mxnet.nd.zeros((2, 3), ctx=mxnet.gpu(3))}
    return ret,

then source GPUs 1 and 0 would be flipped on the target node, GPU 3 would be placed on CPU 0, and GPU 2 would be placed on GPU 0. Defining mxnet.gpu(1): mxnet.gpu(0) and "cuda:1": "cuda:2" in the same mapping should raise an error, since the same source device would be mapped to two different targets.
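
A minimal sketch of such a validation step, assuming a normalizer that canonicalizes both mxnet.Context objects and device strings (both helper names are hypothetical):

import mxnet

def normalize_device(device):
    # Canonicalize mxnet.Context objects and "gpu:N"/"cuda:N"/"cpu:N" strings to "cuda:N"/"cpu:N".
    if isinstance(device, mxnet.Context):
        kind = "cpu" if device.device_type == "cpu" else "cuda"
        return f"{kind}:{device.device_id}"
    return device.replace("gpu:", "cuda:")

def validate_device_map(map_mxnet_devices):
    # Reject maps in which one source device points to two different targets,
    # e.g. {mxnet.gpu(1): mxnet.gpu(0), "cuda:1": "cuda:2"}.
    seen = {}
    for src, dst in map_mxnet_devices.items():
        src_key, dst_key = normalize_device(src), normalize_device(dst)
        if src_key in seen and seen[src_key] != dst_key:
            raise ValueError(f"{src_key} is mapped to both {seen[src_key]} and {dst_key}")
        seen[src_key] = dst_key
    return seen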

fabawi added the help wanted label on Sep 23, 2022
fabawi (Member, Author) commented on Oct 11, 2022

Resolved, but needs to be more consistent: e.g., PyTorch does not accept a "gpu:0" mapping, whereas MXNet and Paddle accept both "gpu:0" and "cuda:0".
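
One way this could be smoothed over is by canonicalizing device strings per framework before they reach the plugin, for example (a sketch; the function name is hypothetical):

def to_torch_device_string(device):
    # PyTorch rejects "gpu:0", so translate it to "cuda:0"; MXNet and Paddle
    # accept either spelling, so the canonical "cuda:N" form works for all three.
    if isinstance(device, str) and device.startswith("gpu"):
        return device.replace("gpu", "cuda", 1)
    return device

# to_torch_device_string("gpu:0")  ->  "cuda:0", usable as torch.device("cuda:0")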

fabawi (Member, Author) commented on Oct 11, 2022

Closing for now as resolved with #24, but needs improvement.

fabawi closed this as completed on Oct 11, 2022