
How to get available devices and set a specific device in Pytorch-DML? #165

Closed
Coderx7 opened this issue Oct 22, 2021 · 11 comments
Labels: pytorch-directml (Issues in PyTorch when using its DirectML backend)

Coderx7 commented Oct 22, 2021

Hi,
To query the available devices in PyTorch, we'd normally do:

    print(f'available devices: {torch.cuda.device_count()}')
    print(f'current device: { torch.cuda.current_device()}')

However, I noticed this fails (AssertionError: Torch not compiled with CUDA enabled).
I thought the transition would be minimal and that calls like this would work out of the box, especially after noting that we can't write:

    print(f'available devices: {torch.dml.device_count()}')
    print(f'current device: { torch.dml.current_device()}')

as it fails with the error:

AttributeError: module 'torch.dml' has no attribute 'device_count'

Apart from this, trying to specify a device using the form "dml:number" fails for any number > 0!
That is, this fails for "dml:1":

import torch 
import time
def bench(device ='cpu'):
    print(f'running on {device}:')
    a = torch.randn(size=(2000,2000)).to(device=device)
    b = torch.randn(size=(2000,2000)).to(device=device)
   
    start = time.time()
    c = a+b
    end = time.time()
    
    # print(f'available devices: {torch.dml.device_count()}')
    # print(f'current device: { torch.dml.current_device()}')
    print(f'--took {end-start:.2f} seconds')

bench('cpu')
bench('dml')
bench('dml:0')
bench('dml:1')    

it outputs :

running on cpu:
--took 0.00 seconds
running on dml:
--took 0.01 seconds
running on dml:0:
--took 0.00 seconds
running on dml:1:

and that's it: execution stops when it reaches "dml:1".
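The "dml:N" suffix follows the same "type:index" convention as "cuda:N" (and a bare "dml" behaves like "dml:0"). A minimal pure-Python sketch of that convention, using a hypothetical parse_device helper:

```python
def parse_device(spec: str):
    """Split a device string such as 'dml:1' into (backend, index).

    A bare backend name like 'dml' maps to index 0, mirroring how
    pytorch-directml treats 'dml' as 'dml:0'.
    """
    backend, _, index = spec.partition(":")
    return backend, int(index) if index else 0

print(parse_device("dml"))    # ('dml', 0)
print(parse_device("dml:1"))  # ('dml', 1)
```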

Also, trying to run:

import torch 
import time
def bench(device ='cpu'):
    print(f'running on {device}:')
    a = torch.randn(size=(2000,2000)).to(device=device)
    b = torch.randn_like(a).to(device=device)
    
    start = time.time()
    c = a+b
    end = time.time()
    
    # print(f'available devices: {torch.dml.device_count()}')
    # print(f'current device: { torch.dml.current_device()}')
    print(f'--took {end-start:.2f} seconds')

bench('cpu')
bench('dml')
bench('dml:0')
bench('dml:1')    

fails with the following error:

running on cpu:
--took 0.00 seconds
running on dml:
Traceback (most recent call last):
  File "g:\tests.py", line 1246, in <module>
    bench('dml')
  File "g:\tests.py", line 1235, in bench
    b = torch.randn_like(a).to(device=device)
RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom 
build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:37 [kernel]


xsacha commented Oct 24, 2021

I got the same issue here. Yet their examples work.
It looks like you need to import more things from torch before .to("dml") works; otherwise it complains. You still can't do some things, like create a new tensor with its device set to "dml".

Once I import the same things as the examples, I can use DML, but none of my models appear to be supported. I usually have to freeze the model first so it can run, but I still get:
RuntimeError: tensor.is_dml() INTERNAL ASSERT FAILED at "D:\a\_work\1\s\aten\src\ATen\native\dml\DMLTensor.cpp":422, please report a bug to PyTorch. unbox expects Dml tensor as inputs

I decided to dive into their headers to figure out more, since they have exposed almost nothing to Python.

  • When you pick "dml", it defaults to "dml:0"
  • None of the operators I require appear to be supported. You can see the full list in include/ATen/DMLFunctions.h
  • There is a HardwareAdapter class in the C++ that can enumerate the devices and return a list with vendor, driver version, and name. It's only used by the DmlBackend, which isn't visible to Python.
  • I noticed it responds to an environment variable, DML_VISIBLE_DEVICES, similar to CUDA's CUDA_VISIBLE_DEVICES
  • They appear to have created it from caffe2 headers, copying some of their TensorFlow implementation and basing some parts on the torch CUDA implementation, which gives some idea of how it came about.
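Building on the DML_VISIBLE_DEVICES observation above, here is a hedged sketch of the usual CUDA-style pattern. Whether pytorch-directml honors the same semantics is an assumption based on CUDA_VISIBLE_DEVICES; the variable would need to be set before torch is imported:

```python
import os

# Assumption: as with CUDA_VISIBLE_DEVICES, only the listed adapter
# ordinals are enumerated, and the variable must be set before torch
# is imported for it to take effect.
os.environ["DML_VISIBLE_DEVICES"] = "1"

# `import torch` would go here, after the variable is in place;
# "dml" (i.e. "dml:0") would then map to physical adapter 1.
print(os.environ["DML_VISIBLE_DEVICES"])  # 1
```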

@ryanlai2 ryanlai2 added the pytorch-directml Issues in PyTorch when using its DirectML backend label Oct 26, 2021

alimoezzi commented Oct 29, 2021

I'm also getting the same errors:


Python 3.8.8 (tags/v3.8.8:024d805, Feb 19 2021, 13:18:16) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> device = torch.device('dml')
>>> torch.rand(10, device=device)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]

>>> torch.randn(size=(2000,2000)).to(device='dml:0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]

>>> torch.randn(size=(2000,2000)).to(device='dml:1')
[libprotobuf FATAL D:\a\_work\1\s\caffe2\dml\dml_operator.cc:65] CHECK failed: ((((HRESULT)((backend_->dml_device->CreateOperator(&op_desc_, __uuidof(**(&op)), IID_PPV_ARGS_Helper(&op))))) >= 0)) == (true):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 274, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 274, in <listcomp>
    return torch.stack([get_summarized_data(x) for x in (start + end)])
  File "C:\Users\User\scoop\apps\python\current\lib\site-packages\torch\_tensor_str.py", line 267, in get_summarized_data
    return torch.cat((self[:PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems:]))
RuntimeError: CHECK failed: ((((HRESULT)((backend_->dml_device->CreateOperator(&op_desc_, __uuidof(**(&op)), IID_PPV_ARGS_Helper(&op))))) >= 0)) == (true):


@Adele101 Adele101 added this to the PyTorch-DirectML Next Release milestone Nov 1, 2021

Adele101 commented Nov 1, 2021

Hi,
Thanks for trying out PyTorch+DML, and reporting these issues!

We are currently actively developing the next pre-release version of PyTorch-DML, in which we will investigate and fix these issues.

We will update you with more details on the next pre-release shortly.


Hyenadae commented Nov 5, 2021

Hi, definitely looking forward to more features/DML porting. I've hit a similar issue with the new 'Ruclip/RuDalle' AI/ML software for generating images.
I'm on Windows 10 20H2 with a Vega 56 GPU. I was able to run and train the SqueezeNet classifier with DML without too many problems, but of course other PyTorch models have a lot of CUDA-specific functions/calls in them that don't make porting possible or easy this early on.

Basically, device type DML hits the UNKNOWN_TENSOR_TYPE_ID error for a few of these PyTorch functions:

Traceback (most recent call last):
File "", line 11, in
File "C:\Users\Hy\Downloads\pytorch\rdalle\ru-dalle-master\rudalle\pipelines.py", line 35, in generate_images
attention_mask = torch.tril(torch.ones((chunk_bs, 1, total_seq_length, total_seq_length), device=device))

RuntimeError: Could not run 'aten::tril.out' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::tril.out' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at D:\a_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at D:\a_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradCPU: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradCUDA: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradXLA: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradNestedTensor: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradPrivateUse1: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradPrivateUse2: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
AutogradPrivateUse3: registered at D:\a_work\1\s\torch\csrc\autograd\generated\VariableType_2.cpp:9170 [autograd kernel]
Tracer: registered at D:\a_work\1\s\torch\csrc\autograd\generated\TraceType_2.cpp:10525 [kernel]
Autocast: fallthrough registered at D:\a_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at D:\a_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]


Adele101 commented Mar 8, 2022

Thank you for reporting these issues. The new release of PyTorch-DirectML has support for selecting a specific device. Check it out here: https://pypi.org/project/pytorch-directml/

@Adele101 Adele101 closed this as completed Mar 8, 2022
Coderx7 commented Mar 9, 2022

@Adele101 Thanks, but not all of the issues are fixed. For example, the line b = torch.randn_like(a).to(device=device)
still generates the error:

RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
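Until aten::normal_ gets a DML kernel, a possible workaround (a sketch, not an official API) is to draw the samples on the CPU and only then move the result over, since plain torch.randn followed by .to(device) did work in the benchmark above. The helper name randn_like_via_cpu is hypothetical:

```python
import torch

def randn_like_via_cpu(t: torch.Tensor) -> torch.Tensor:
    # Workaround sketch: aten::normal_ lacks a DML kernel, so sample on
    # the CPU first, then move the result to the source tensor's device.
    return torch.randn(t.shape, dtype=t.dtype).to(t.device)

# b = randn_like_via_cpu(a)   # instead of torch.randn_like(a)
```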

Could we get a changelog of what's changed/fixed?

tautomer commented:

For me, a = torch.randn(size=(2000,2000)).to('dml') gives the following error

RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /home/vsts/work/1/s/build/aten/src/ATen/RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at /home/vsts/work/1/s/torch/csrc/autograd/generated/TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at /home/vsts/work/1/s/aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at /home/vsts/work/1/s/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

The package is the latest version, pytorch-directml 1.8.0a0.dev220224.
The error appears to be the same as what was reported a few months back.

smk2007 (Member) commented Jun 15, 2022

For me, a = torch.randn(size=(2000,2000)).to('dml') gives the following error

RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].


The package is the latest version, pytorch-directml 1.8.0a0.dev220224. The error appears to be the same as what was reported a few months back.

Hi, please try out masked_select in the latest prerelease: https://pypi.org/project/pytorch-directml/1.8.0a0.dev220506/
Your error is because masked_select was not implemented. It should be available in the latest version.

I am not sure why the to() aten function is failing to move your tensor to DirectML given the information provided, but please make sure that torch has been uninstalled and that pytorch-directml is listed in your environment.

Can you share a list of packages in your environment?


xsacha commented Jun 15, 2022

Is there a reason the latest pre-release is 4 versions behind the current torch pre-release?
It makes it quite difficult to work out whether the issues are due to that torch version or to a change in DML, for instance that masked_select.


smk2007 commented Jun 15, 2022

b = torch.randn_like(a).to(device=device)

Hi, sorry for the inconvenience. normal_ is not implemented yet.
The roadmap incorrectly flags this operator as complete (https://github.com/microsoft/DirectML/wiki/PyTorch-DirectML-Operator-Roadmap).
We will remedy this issue shortly.


smk2007 commented Jun 15, 2022

Is there a reason the latest pre-release is 4 versions behind the current torch pre-release? It makes it quite difficult to work out if the issues are because of that torch version or a change in DML, for instance that masked_select.

The current version of pytorch-directml is snapped to PyTorch 1.8, but we understand the pain here, given the drift caused by rapid progress and updates to Torch.

We are working on a solution to address this problem.

8 participants