simple and small optimization: default device argument for custom device #103828

Closed
heidongxianhua opened this issue Jun 19, 2023 · 12 comments
Labels
module: python frontend, needs design, triaged


heidongxianhua (Contributor) commented Jun 19, 2023

🚀 The feature, motivation and pitch

1. For many operators (such as pin_memory), the device argument defaults to cuda if it is not given; for any other device we have to pass an extra device_type argument that cuda users never need. So we propose an API to set the default argument device once at the beginning, keeping usage consistent with cuda.
2. Some APIs defined in Python take a device_type argument whose default value is cuda (for example https://github.com/pytorch/pytorch/blob/main/torch/random.py#L104); with a settable default we could support more devices (such as a privateuse1 device).
In short, we want to add an API to set the default argument device once at the beginning, plus an API to get that default when device_type is not given, so that other devices can be used exactly like cuda.
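For illustration, a minimal sketch of the intended usage; the function names follow this proposal, and the exact names and namespace are assumptions, not an existing PyTorch API:

import torch

# Hypothetical API proposed here (names and namespace are assumptions):
torch.set_default_argument_device_type("npu")  # run once at startup
torch.get_default_argument_device_type()       # -> "npu"

# Operators whose device argument is hard-coded to cuda today would
# consult the getter instead, so plain calls keep working:
t = torch.empty(2)
t.is_pinned()  # would check "npu" pinned memory rather than cuda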

Alternatives

No response

Additional context

No response

cc @albanD

albanD (Collaborator) commented Jun 26, 2023

cc @ezyang who was discussing this on the proposed PRs

malfet added the triaged label and removed the triage review label on Jun 26, 2023
malfet (Contributor) commented Jun 26, 2023

There is already torch.set_default_device, but some APIs seem to be device specific (for example is_pinned).

heidongxianhua (Author) commented Jun 27, 2023

Yes, the torch.set_default_device function is rather special: it makes every operator's device default to the value that was set.

torch.set_default_device('cuda')
a = torch.rand(2)
a.device  # device(type='cuda', index=0)

But we want to add an API for the operators whose device argument is hard-coded to cuda; that hard-coding makes them difficult to extend to other devices and forces a different usage. Take is_pinned or pin_memory: for cuda you simply call is_pinned(), but for another device you must call is_pinned("foo").
Another case is torch.device: if we call torch.device(2), we get "cuda:2".
So we want an API to set the device argument for the operators that hard-code cuda as the default. It would only need to be run once to switch to another device, and those operators could then keep their current usage on other devices. @malfet @albanD @ezyang could you consider our idea again?
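For concreteness, the current hard-coded behavior looks like this ("foo" stands for a renamed privateuse1 backend, as above):

import torch

torch.device(2)      # device(type='cuda', index=2): a bare index implies cuda

t = torch.empty(3)
t.is_pinned()        # checks cuda pinned memory by default
t.is_pinned("foo")   # a custom backend must be named explicitly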

ezyang (Contributor) commented Jul 4, 2023

As I said, I have proposed a context manager similar to the torch.device context manager, but one which ONLY applies to things like the is_pinned device argument. You can see the existing implementation of the device context manager at torch/utils/_device.py; it is easy to adapt.

heidongxianhua (Author) commented

> As I said, I have proposed a context manager similar to the torch.device context manager, but one which ONLY applies to things like the is_pinned device argument. You can see the existing implementation of the device context manager at torch/utils/_device.py; it is easy to adapt.

@ezyang yeah, thank you so much. We have run some tests with DeviceContext, and it solves the problem for operators.
I would also like to discuss this issue in more depth. The current situation is that in the PyTorch framework, on both the C++ and Python sides, many operators and interfaces take a device parameter, but when the device is not specified, or only a device index is given, a hard-coded cuda default is used. This is very unfriendly when extending to other device types, whether mps/xla or a custom device, and it is the problem we want to solve.
We propose the following ideas to address it.

  1. As in the PR add default argument device type api #103575, we add an API to set the default device type, and the hard-coded cuda type can then be replaced by calling a function named get_default_argument_device_type, available in both C++ and Python. This idea solves the whole problem and is very friendly to other device types, whether mps/xla or a custom device.

  2. In another PR, Deprecated the device usage without device_type #104457, the core idea is to discard the inappropriate usage: we want to deprecate device usage like a = torch.device(0) in favor of a = torch.device("cuda:0"). This idea covers all operators with a device argument and the core API torch.device, but the APIs defined in Python cannot be solved this way.

  3. The idea you suggested, using DeviceContext or TorchFunctionMode: we have run many tests, and it also covers all operators with a device argument, but the core API torch.device and the APIs defined in Python cannot be solved this way.

So right now we do not know how to handle torch.device when it is given only an index argument; maybe we should deprecate the torch.device(0) usage.

ezyang (Contributor) commented Jul 24, 2023

Is it just torch.device? We can make torch.device interposable by TorchFunctionMode; would that be sufficient?

heidongxianhua (Author) commented

> Is it just torch.device? We can make torch.device interposable by TorchFunctionMode; would that be sufficient?

Ehha, except for torch.device there are also some APIs defined in Python, and some hard-coded cuda in C++, that are hard to extend to any device other than cuda.

heidongxianhua (Author) commented

And I have tried to use TorchFunctionMode to solve torch.device, but it does not work. Are there any examples? @ezyang

ezyang (Contributor) commented Jul 25, 2023

You need to modify it, but the modification is a lot smaller.

PyObject* THPDevice_pynew(
    PyTypeObject* type,
    PyObject* args,
    PyObject* kwargs) {
  HANDLE_TH_ERRORS
  static torch::PythonArgParser parser(
      {"Device(Device device)",
       "Device(c10::string_view type, int64_t? index=-1)"});
  torch::ParsedArgs<2> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  if (r.idx == 0) {
    auto device = r.device(0);
    return THPDevice_New(device);
  } else if (r.idx == 1) {
    auto as_device = r.device(0); // this works, because device can take strings
    auto device_type = r.string(0);
    if (as_device.has_index()) {
      throw std::runtime_error(
          "type (string) must not include an index because index "
          "was passed explicitly: " +
          device_type);
    }
    int32_t device_index = -1;
    if (!r.isNone(1)) {
      device_index = r.toInt64(1);
      // -1 is allowed in ATen/C++, to mean the default device, but not in
      // Python.
      TORCH_CHECK(device_index >= 0, "Device index must not be negative");
    }
    at::Device device(as_device.type(), device_index);
    return THPDevice_New(device);
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

change it to something like

PyObject* THPDevice_pynew(
    PyTypeObject* type,
    PyObject* args,
    PyObject* kwargs) {
  HANDLE_TH_ERRORS
  static torch::PythonArgParser parser(
      {"device(Device device)",
       "device(c10::string_view type, int64_t? index=-1)"});
  torch::ParsedArgs<2> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
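  // Added relative to the original: when a TorchFunctionMode is active,
  // dispatch through __torch_function__ so the mode can interpose on
  // torch.device(...) constructions.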
  if (r.has_torch_function()) {
    return handle_torch_function(
        r, nullptr, args, kwargs, THPVariableFunctionsModule, "torch");
  }
  if (r.idx == 0) {
    auto device = r.device(0);
    return THPDevice_New(device);
  } else if (r.idx == 1) {
    auto as_device = r.device(0); // this works, because device can take strings
    auto device_type = r.string(0);
    if (as_device.has_index()) {
      throw std::runtime_error(
          "type (string) must not include an index because index "
          "was passed explicitly: " +
          device_type);
    }
    int32_t device_index = -1;
    if (!r.isNone(1)) {
      device_index = r.toInt64(1);
      // -1 is allowed in ATen/C++, to mean the default device, but not in
      // Python.
      TORCH_CHECK(device_index >= 0, "Device index must not be negative");
    }
    at::Device device(as_device.type(), device_index);
    return THPDevice_New(device);
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

and then you should be able to interpose on torch.device constructions. Then, based on your other patch, you just need to modify is_pinned (it should already be interposable) and fork_rng: do something similar, but use the Python-side torch function handling idiom, as in this storage() example:

    def storage(self):
        r"""
        storage() -> torch.TypedStorage

        Returns the underlying :class:`TypedStorage`.

        .. warning::

            :class:`TypedStorage` is deprecated. It will be removed in the future, and
            :class:`UntypedStorage` will be the only storage class. To access the
            :class:`UntypedStorage` directly, use :attr:`Tensor.untyped_storage()`.
        """
        if has_torch_function_unary(self):
            return handle_torch_function(Tensor.storage, (self,), self)

        torch.storage._warn_typed_storage_removal(stacklevel=2)
        return self._typed_storage()

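A minimal sketch of that idiom applied to a module-level function such as fork_rng might look as follows; the argument plumbing here is an assumption for illustration, not the actual patch:

from torch.overrides import handle_torch_function, has_torch_function

def fork_rng(devices=None, enabled=True, device_type="cuda"):
    # When a TorchFunctionMode is active, has_torch_function reports True
    # even for plain arguments, so the mode gets a chance to rewrite
    # device_type before the real implementation runs.
    if has_torch_function((devices,)):
        return handle_torch_function(
            fork_rng, (devices,), devices=devices,
            enabled=enabled, device_type=device_type)
    ...  # existing fork_rng body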

heidongxianhua (Author) commented Jul 26, 2023

Yeah, I gave a wrong string, and I have fixed it now; this is PR #106017.
But this PR causes errors in the prims tests: unlike the operators defined in ATen, torch.device is an independent class. I am not familiar with prims, so I do not have a good idea how to solve that.
I have also made a simple test; TorchFunctionMode may cause performance degradation as well:

import time

import torch
from torch.overrides import TorchFunctionMode

_device_constructors = {torch.tensor, torch.device, torch.rand}

class DeviceContext(TorchFunctionMode):
    def __init__(self, device):
        pass  # the device is ignored here; this mode only measures overhead

    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Rewrite a bare integer device index for the intercepted constructors.
        if func in _device_constructors and isinstance(kwargs.get("device"), int):
            kwargs["device"] = "cpu"
        return func(*args, **kwargs)

def run():
    # Baseline: no TorchFunctionMode active.
    start_time = time.time()
    for i in range(100000):
        a = torch.rand(2, 3)
    print("origin time:", time.time() - start_time)

    # Same loop with the mode entered (never exited, for the test).
    DeviceContext("npu").__enter__()
    start_time = time.time()
    for i in range(100000):
        a = torch.rand(2, 3, device=0)
    print("torch_function time:", time.time() - start_time)

run()

The result is:

origin time: 0.3201165199279785
torch_function time: 0.6074237823486328

And on the Python side, as in the storage example you gave, we may need to add a handle_torch_function call to many APIs, because many APIs have a device-type argument whose default value is cuda.
So given the questions above, perhaps the simpler way is to add a default device parameter setting in C++, as in PR #103575? @ezyang

ezyang (Contributor) commented Jul 28, 2023

Are you more comfortable with the overhead if you assume you're going to torch.compile the model anyway?

heidongxianhua (Author) commented

> Are you more comfortable with the overhead if you assume you're going to torch.compile the model anyway?

Yeah, we want to support training models in eager mode as well as in torch.compile mode. @ezyang
