RuntimeError: don't know how to restore data location of torch.FloatStorage #196

masc-it · 2022-01-23T12:12:45Z

I am trying to train YOLOv5 using DML. After tweaking the code to let it use the DML device, I get this error when it tries to load the weights onto the GPU:

  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 592, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 851, in _load
    result = unpickler.load()
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 843, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 832, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 812, in restore_location
    return default_restore_location(storage, str(map_location))
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 178, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.FloatStorage (tagged with dml)

Is loading pretrained weights not a supported feature?


masc-it · 2022-01-23T13:12:19Z

Update, I have fixed that problem using:
torch.load(weights, map_location={'0':'dml'})
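According to the `torch.load` documentation, a dict passed as `map_location` remaps the location tags stored in the file (keys) to new locations (values), so a key that matches no stored tag leaves the storages where they were saved. A commonly used alternative, sketched here and not confirmed in this thread for DirectML, is to deserialize onto the CPU and only then move the model to the target device. The helper name `load_state_dict_via_cpu` and the `target_device` argument are hypothetical, and the example uses a plain `state_dict` of a toy layer instead of a YOLOv5 checkpoint.

```python
import io

import torch
import torch.nn as nn


def load_state_dict_via_cpu(model, checkpoint_file, target_device="cpu"):
    """Load a state_dict onto the CPU first, then move the model.

    `target_device` is a placeholder: on a DirectML setup it would be the
    DML device instead of "cpu" (assumption, not tested here).
    """
    # map_location="cpu" makes torch.load restore every storage on the CPU,
    # regardless of the device tag saved in the file.
    state = torch.load(checkpoint_file, map_location="cpu")
    model.load_state_dict(state)
    return model.to(target_device)


# Self-contained demo: save a toy layer to memory and restore it.
reference = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(reference.state_dict(), buffer)
buffer.seek(0)

restored = load_state_dict_via_cpu(nn.Linear(4, 2), buffer, target_device="cpu")
```

Moving the model after loading keeps the device-specific step out of the unpickling path, which is where the "don't know how to restore data location" error is raised.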

But now I get this right at the start of training (I have replaced SiLU with ReLU, since SiLU is not supported):

libprotobuf FATAL D:\a\_work\1\s\caffe2\dml\dml_command_recorder.cc:361] CHECK failed: ((((HRESULT)((current_command_list_->Close()))) >= 0)) == (true):
val: Scanning 'D:\Projects\python\semantics\project\tests\annotations\multiclass\dataset\valid\labels.cache' images and
  0%|          | 0/11 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 640, in <module>
    main(opt)
  File "train.py", line 537, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 331, in train
    loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
  File "D:\Documenti\models\yolov5\utils\loss.py", line 120, in __call__
    tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets
  File "D:\Documenti\models\yolov5\utils\loss.py", line 199, in build_targets
    j, k = ((gxy % 1 < g) & (gxy > 1)).T
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\tensor.py", line 591, in __iter__
    return iter(self.unbind(0))
RuntimeError: CHECK failed: ((((HRESULT)((current_command_list_->Close()))) >= 0)) == (true):

If needed, this is the YOLOv5 loss script (utils/loss.py).

ryanlai2 · 2022-03-10T02:16:37Z

Hi @masc-it, thanks for reporting this issue. Can you try our latest release on PyPI to see if you are still experiencing this issue?

Also can you share an e2e python script for us to reproduce your error?

Mlekow · 2022-03-12T16:25:55Z

Hi @ryanlai2, I have run into the same issue. When running train.py, I got this runtime error: RuntimeError: CHECK failed: ((((HRESULT)((dml_device_->GetDeviceRemovedReason()))) >= 0)) == (true):. The traceback was the same as above.

masc-it · 2022-03-16T06:18:07Z

> Hi @masc-it, thanks for reporting this issue. Can you try our latest release on PyPI to see if you are still experiencing this issue?
>
> Also can you share an e2e python script for us to reproduce your error?

You can directly try the YOLOv5 tutorial (train section), using dml as the device.

masc-it · 2022-03-16T06:44:24Z

Anyways, @ryanlai2 I have tried the latest pytorch-directml and this is the result now:

YOLOv5  v6.0-207-g8efe977 torch 1.8.0a0+14f3b5d CPU

Fusing layers...
Model Summary: 213 layers, 1760518 parameters, 0 gradients
1/1: 0...  Success (inf frames 640x480 at 30.00 FPS)

Traceback (most recent call last):
  File "age.py", line 319, in <module>
    main(opt)
  File "age.py", line 314, in main
    run(**vars(opt))
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "age.py", line 173, in run
    model.warmup(imgsz=(1, 3, *imgsz), half=half)  # warmup
  File "D:\Documenti\models\yolov5\models\common.py", line 460, in warmup
    self.forward(im)  # warmup
  File "D:\Documenti\models\yolov5\models\common.py", line 397, in forward
    y = self.model(im) if self.jit else self.model(im, augment=augment, visualize=visualize)
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Documenti\models\yolov5\models\yolo.py", line 126, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "D:\Documenti\models\yolov5\models\yolo.py", line 149, in _forward_once
    x = m(x)  # run
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Documenti\models\yolov5\models\common.py", line 50, in forward_fuse
    return self.act(self.conv(x))
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: tensor.is_dml() INTERNAL ASSERT FAILED at "D:\\a\\_work\\1\\s\\pytorch-directml\\aten\\src\\ATen\\native\\dml\\DMLTensor.cpp":33, please report a bug to PyTorch. unbox expects Dml tensor as inputs

Model seems to be correctly loaded though.

Edit
This is the "problematic" code:

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))
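The failing `tensor.is_dml()` assertion inside `F.conv2d` reads as if a non-DML tensor reached a DML kernel, for example when the input image and the convolution weights sit on different devices. That is an interpretation of the message, not something confirmed in this thread. A minimal sketch of how one might check for such a mismatch before the forward pass follows; `device_report` is a hypothetical helper, not part of YOLOv5 or PyTorch, and the demo runs on the CPU only.

```python
import torch
import torch.nn as nn


def device_report(module, *inputs):
    """Group parameter, buffer and input names by the device they live on."""
    devices = {}
    for name, param in module.named_parameters():
        devices.setdefault(str(param.device), []).append(name)
    for name, buf in module.named_buffers():
        devices.setdefault(str(buf.device), []).append(name)
    for index, tensor in enumerate(inputs):
        devices.setdefault(str(tensor.device), []).append(f"input[{index}]")
    return devices


# Demo on the CPU: a single device key means nothing is mismatched.
conv = nn.Conv2d(3, 8, kernel_size=1, bias=False)
im = torch.zeros(1, 3, 16, 16)
report = device_report(conv, im)
```

If the report contains more than one key, some tensor was left behind on another device and would need a `.to(...)` call before the forward pass.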

PersoDevo · 2022-04-28T10:55:38Z

Hi @masc-it, I'm facing the same issue "unbox expects Dml tensor as inputs" while using YOLOv5. Did you find a solution? Thanks.

masc-it · 2022-04-28T11:04:19Z

> Hi @masc-it, I'm facing the same issue "unbox expects Dml tensor as inputs" while using YOLOv5. Did you find a solution? Thanks.

Nope, nobody here seems interested :/

Anyway, I tried running some parts of the YOLOv5 loss on the CPU and it kind of worked, which suggests DML lacks some of the operators involved. So we can only hope they'll implement more operators in the future.
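A minimal sketch of the "run parts of the loss on the CPU" workaround described above, assuming the unsupported step can be isolated in a function. `run_on_cpu` and `offset_mask` are hypothetical names; the expression inside `offset_mask` is the line from `build_targets` shown in the earlier traceback, and the demo itself runs entirely on the CPU.

```python
import torch


def run_on_cpu(fn, *tensors):
    """Run fn on CPU copies of the tensors, then move the result back."""
    origin = tensors[0].device             # device the caller was working on
    cpu_args = [t.cpu() for t in tensors]  # .cpu() is tracked by autograd
    result = fn(*cpu_args)
    return result.to(origin)


def offset_mask(gxy, g=0.5):
    # Same expression as in utils/loss.py, line 199 of the traceback.
    return ((gxy % 1 < g) & (gxy > 1)).T


gxy = torch.tensor([[1.2, 3.7], [0.4, 2.1]])
j, k = run_on_cpu(offset_mask, gxy)
```

Each round trip copies data between devices, so this is a stopgap for isolating unsupported operators and would likely slow training down.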

smk2007 · 2022-05-18T00:51:07Z

Hi, and thanks for posting the issue.

We have not taken a look at yolov5 yet, and as such the operators needed to get yolov5 working end-to-end are spotty.
We absolutely intend to support yolov5 in the future, and will update this thread when we expect to have the operator conformance needed to enable it.

foemre · 2022-05-29T18:35:07Z

ultralytics/yolov5#7642

lyleaf-81 · 2022-09-07T02:42:30Z

Hello, do you support YOLOV4?

mahi80 · 2023-02-16T01:12:45Z

The code below should work for the majority of issues related to "don't know how to restore data location of torch.storage._UntypedStorage (tagged with gpu)":

import torch
import whisper

# Fall back to the CPU when CUDA is not available
devices = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = whisper.load_model("medium", device=devices)

ag2s20150909 · 2023-09-23T13:45:12Z

Traceback (most recent call last):
  File "d:\whisper\app.py", line 16, in <module>
    model=whisper.load_model(name="base",device="privateuseone:0",download_root="model")
  File "d:\dev\python\lib\site-packages\whisper\__init__.py", line 144, in load_model
    checkpoint = torch.load(fp, map_location=device)
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1086, in restore_location
    return default_restore_location(storage, str(map_location))
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 220, in default_restore_location     
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage.UntypedStorage (tagged with privateuseone:0)
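The traceback shows `whisper.load_model` forwarding `device` to `torch.load` as `map_location`, and the default restore logic then fails because it does not recognise the `privateuseone:0` tag. One possible way around that step, not confirmed in this thread, is to let deserialization finish on the CPU and move the tensors afterwards. The sketch below uses the callable form of `map_location` on a toy checkpoint; `keep_on_cpu` is a hypothetical name.

```python
import io

import torch


def keep_on_cpu(storage, location):
    """Callable map_location: ignore the saved tag, keep the CPU storage."""
    # torch.load deserializes each storage on the CPU first and passes it
    # here together with its saved location tag; returning it unchanged
    # leaves the tensor on the CPU.
    return storage


# Toy checkpoint saved to memory instead of a real model file.
checkpoint = {"weight": torch.arange(6, dtype=torch.float32).reshape(2, 3)}
buffer = io.BytesIO()
torch.save(checkpoint, buffer)
buffer.seek(0)

loaded = torch.load(buffer, map_location=keep_on_cpu)
```

For whisper this would presumably correspond to calling `load_model` with `device="cpu"` and moving the returned model to the DirectML device afterwards, which has not been tested here.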

risharde · 2023-09-28T21:41:43Z

Same thing, this really gets me when Microsoft makes it sound like it's so simple to do something and then you do it and it just barfs.

Amna26103 · 2024-07-23T07:28:35Z

Hey there,
I had the same issue while using easyocr, but it worked when I set gpu=False. As far as I can tell from searching, this issue has something to do with the GPU.

Adele101 added the pytorch-directml label ("Issues in PyTorch when using its DirectML backend") on Jan 26, 2022.

Naozumi520 mentioned this issue on Jun 12, 2023: "Fix RuntimeError when using MPS under MacOS", maxrmorrison/torchcrepe#27 (closed).
