
How to get available devices and set a specific device in Pytorch-DML? #165

Closed
Coderx7 opened this issue Oct 22, 2021 · 11 comments
Labels: pytorch-directml (Issues in PyTorch when using its DirectML backend)

Coderx7 commented Oct 22, 2021

Hi,
To query the available devices in PyTorch, we'd normally do:

    print(f'available devices: {torch.cuda.device_count()}')
    print(f'current device: { torch.cuda.current_device()}')

However, I noticed this fails (AssertionError: Torch not compiled with CUDA enabled).
I thought the transition would be minimal and that calls like this would work out of the box, especially after noting that we can't write:

    print(f'available devices: {torch.dml.device_count()}')
    print(f'current device: { torch.dml.current_device()}')

as it fails with the error:

AttributeError: module 'torch.dml' has no attribute 'device_count'

Apart from this, trying to specify a device using the form "dml:number" fails for any number > 0!
That is, this fails for "dml:1":

import torch 
import time
def bench(device ='cpu'):
    print(f'running on {device}:')
    a = torch.randn(size=(2000,2000)).to(device=device)
    b = torch.randn(size=(2000,2000)).to(device=device)
   
    start = time.time()
    c = a+b
    end = time.time()
    
    # print(f'available devices: {torch.dml.device_count()}')
    # print(f'current device: { torch.dml.current_device()}')
    print(f'--took {end-start:.2f} seconds')

bench('cpu')
bench('dml')
bench('dml:0')
bench('dml:1')    

it outputs :

running on cpu:
--took 0.00 seconds
running on dml:
--took 0.01 seconds
running on dml:0:
--took 0.00 seconds
running on dml:1:

and that's it: execution stops when it reaches "dml:1".
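The "dml:N" suffix follows the same "type:index" convention as "cuda:N" (and a bare "dml" behaves like "dml:0"). A minimal pure-Python sketch of that convention, using a hypothetical parse_device helper:

```python
def parse_device(spec: str):
    """Split a device string such as 'dml:1' into (backend, index).

    A bare backend name like 'dml' maps to index 0, mirroring how
    pytorch-directml treats 'dml' as 'dml:0'.
    """
    backend, _, index = spec.partition(":")
    return backend, int(index) if index else 0

print(parse_device("dml"))    # ('dml', 0)
print(parse_device("dml:1"))  # ('dml', 1)
```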

Also, trying to run:

import torch 
import time
def bench(device ='cpu'):
    print(f'running on {device}:')
    a = torch.randn(size=(2000,2000)).to(device=device)
    b = torch.randn_like(a).to(device=device)
    
    start = time.time()
    c = a+b
    end = time.time()
    
    # print(f'available devices: {torch.dml.device_count()}')
    # print(f'current device: { torch.dml.current_device()}')
    print(f'--took {end-start:.2f} seconds')

bench('cpu')
bench('dml')
bench('dml:0')
bench('dml:1')    

fails with the following error:

running on cpu:
--took 0.00 seconds
running on dml:
Traceback (most recent call last):
  File "g:\tests.py", line 1246, in <module>
    bench('dml')
  File "g:\tests.py", line 1235, in bench
    b = torch.randn_like(a).to(device=device)
RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom 
build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:37 [kernel]


xsacha commented Oct 24, 2021

I got the same issue here. Yet their examples work.
It looks like you need to import more things from torch before .to("dml") works; otherwise it complains. You still can't do some things, like create a new tensor with its device set to "dml".

Once I import the same things as the examples, I can use DML, but none of my models appear to be supported. I usually have to freeze the model first so it can run, but I still get:
RuntimeError: tensor.is_dml() INTERNAL ASSERT FAILED at "D:\a\_work\1\s\aten\src\ATen\native\dml\DMLTensor.cpp":422, please report a bug to PyTorch. unbox expects Dml tensor as inputs

I decided to dive into their headers to figure out more, since they have exposed almost nothing to Python.

  • When you pick "dml", it defaults to "dml:0"
  • None of the operators I require appear to be supported. You can see the full list in include/ATen/DMLFunctions.h
  • There is a HardwareAdapter class in the C++ that can enumerate the devices and return a list with vendor, driver version, and name. It's only used by the DmlBackend, which isn't visible to Python.
  • I noticed it responds to an environment variable, DML_VISIBLE_DEVICES, similar to CUDA's CUDA_VISIBLE_DEVICES
  • They appear to have created it from caffe2 headers, copying some of their TensorFlow implementation and basing some parts on the torch CUDA implementation, which gives some idea of how it came about.
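Building on the DML_VISIBLE_DEVICES observation above, here is a hedged sketch of the usual CUDA-style pattern. Whether pytorch-directml honors the same semantics is an assumption based on CUDA_VISIBLE_DEVICES; the variable would need to be set before torch is imported:

```python
import os

# Assumption: as with CUDA_VISIBLE_DEVICES, only the listed adapter
# ordinals are enumerated, and the variable must be set before torch
# is imported for it to take effect.
os.environ["DML_VISIBLE_DEVICES"] = "1"

# `import torch` would go here, after the variable is in place;
# "dml" (i.e. "dml:0") would then map to physical adapter 1.
print(os.environ["DML_VISIBLE_DEVICES"])  # 1
```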

@ryanlai2 ryanlai2 added the pytorch-directml Issues in PyTorch when using its DirectML backend label Oct 26, 2021

alimoezzi commented Oct 29, 2021

I'm also getting the same errors:


Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:18:16) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> device = torch.device('dml')
>>> torch.rand(10, device=device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]

>>> torch.randn(size=(2000,2000)).to(device='dml:0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]

>>> torch.randn(size=(2000,2000)).to(device='dml:1')
[libprotobuf FATAL D:\a\_work\1\s\caffe2\dml\dml_operator.cc:65] CHECK failed: ((((HRESULT)((backend_->dml_device->CreateOperator(&op_desc_, __uuidof(**(&op)), IID_PPV_ARGS_Helper(&op))))) >= 0)) == (true):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 274, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 274, in <listcomp>
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 267, in get_summarized_data
    return torch.cat((self[:PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems:]))
RuntimeError: CHECK failed: ((((HRESULT)((backend_->dml_device->CreateOperator(&op_desc_, __uuidof(**(&op)), IID_PPV_ARGS_Helper(&op))))) >= 0)) == (true):


@Adele101 Adele101 added this to the PyTorch-DirectML Next Release milestone Nov 1, 2021

Adele101 commented Nov 1, 2021

Hi,
Thanks for trying out PyTorch+DML, and reporting these issues!

We are currently actively developing the next pre-release version of PyTorch-DML, in which we will investigate and fix these issues.

We will update you with more details on the next pre-release shortly.


Hyenadae commented Nov 5, 2021

Hi, definitely looking forward to more features/DML porting. I've hit a similar issue with the new 'Ruclip/RuDalle' AI/ML software for generating images.
I'm on Windows 10 20H2 with a Vega 56 GPU. I was able to run and train the SqueezeNet classifier with DML without too many problems, but of course other PyTorch models have a lot of CUDA-specific functions/calls in them that don't make porting possible or easy this early on.

Basically, device type DML hits the UNKNOWN_TENSOR_TYPE_ID error for a few of these PyTorch functions:

Traceback (most recent call last):
File "", line 11, in
File "C:\Users\Hy\Downloads\pytorch\rdalle\ru-dalle-master\rudalle\pipelines.py", line 35, in generate_images
attention_mask = torch.tril(torch.ones((chunk_bs, 1, total_seq_length, total_seq_length), device=device))

RuntimeError: Could not run 'aten::tril.out' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::tril.out' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at D:\a_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradCPU: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradCUDA: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradXLA: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradNestedTensor: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradPrivateUse1: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradPrivateUse2: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradPrivateUse3: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
Tracer: registered at D:\a_work\1\s\torch\csrc\autograd\generated\TraceType_2.cpp:10525 [kernel]
Autocast: fallthrough registered at D:\a_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]


Adele101 commented Mar 8, 2022

Thank you for reporting these issues. The new release of PyTorch-DirectML has support for selecting a specific device. Check it out here: https://pypi.org/project/pytorch-directml/

@Adele101 Adele101 closed this as completed Mar 8, 2022
Coderx7 commented Mar 9, 2022

@Adele101 Thanks, but not all of the issues are fixed. For example, the line b = torch.randn_like(a).to(device=device)
still generates the error:

RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
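Until aten::normal_ gets a DML kernel, a possible workaround (a sketch, not an official API) is to draw the samples on the CPU and only then move the result over, since plain torch.randn followed by .to(device) did work in the benchmark above. The helper name randn_like_via_cpu is hypothetical:

```python
import torch

def randn_like_via_cpu(t: torch.Tensor) -> torch.Tensor:
    # Workaround sketch: aten::normal_ lacks a DML kernel, so sample on
    # the CPU first, then move the result to the source tensor's device.
    return torch.randn(t.shape, dtype=t.dtype).to(t.device)

# b = randn_like_via_cpu(a)   # instead of torch.randn_like(a)
```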

Could we get a changelog of what's changed/fixed?

tautomer commented:

For me, a = torch.randn(size=(2000,2000)).to('dml') gives the following error

RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /home/vsts/work/1/s/build/aten/src/ATen/RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at /home/vsts/work/1/s/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

The package is the latest version, pytorch-directml 1.8.0a0.dev220224.
The error appears to be the same as what was reported a few months back.

smk2007 (Member) commented Jun 15, 2022

For me, a = torch.randn(size=(2000,2000)).to('dml') gives the following error

RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].


The package is the latest version, pytorch-directml 1.8.0a0.dev220224. The error appears to be the same as what was reported a few months back.

Hi, please try out masked_select in the latest prerelease: https://pypi.org/project/pytorch-directml/1.8.0a0.dev220506/
Your error is because masked_select was not implemented. It should be available in the latest version.

I am not sure why the to() aten function is failing to move your tensor to DirectML given the information provided, but please make sure that torch has been uninstalled and that pytorch-directml is listed in your environment.

Can you share a list of packages in your environment?


xsacha commented Jun 15, 2022

Is there a reason the latest pre-release is 4 versions behind the current torch pre-release?
It makes it quite difficult to work out whether the issues are due to that torch version or to a change in DML, for instance that masked_select.


smk2007 commented Jun 15, 2022

b = torch.randn_like(a).to(device=device)

Hi, sorry for the inconvenience. normal_ is not implemented yet.
The roadmap incorrectly flags this operator as complete (https://github.com/microsoft/DirectML/wiki/PyTorch-DirectML-Operator-Roadmap).
We will remedy this issue shortly.


smk2007 commented Jun 15, 2022

Is there a reason the latest pre-release is 4 versions behind the current torch pre-release? It makes it quite difficult to work out if the issues are because of that torch version or a change in DML, for instance that masked_select.

The current version of pytorch-directml is snapped to PyTorch 1.8, but we understand the pain here, given the drift caused by rapid progress and updates to Torch.

We are working on a solution to address this problem.

8 participants