RuntimeError: don't know how to restore data location of torch.FloatStorage #196

Open
masc-it opened this issue Jan 23, 2022 · 14 comments
Labels
pytorch-directml Issues in PyTorch when using its DirectML backend

Comments

@masc-it

masc-it commented Jan 23, 2022

I am trying to train YOLOv5 using DML. After tweaking the code so that it uses the DML device, I get this error when it tries to load the weights onto the GPU:

  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 592, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 851, in _load
    result = unpickler.load()
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 843, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 832, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 812, in restore_location
    return default_restore_location(storage, str(map_location))
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\serialization.py", line 178, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.FloatStorage (tagged with dml)

Is loading pretrained weights not a supported feature?

@masc-it
Author

masc-it commented Jan 23, 2022

Update: I fixed that problem using:
torch.load(weights, map_location={'0':'dml'})
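A common workaround, sketched here rather than taken from the pytorch-directml docs, is to let `torch.load` restore every storage on the CPU and only afterwards move the model with `.to()`, so `default_restore_location` is never asked about a device tag it does not recognise. The helper names `pick_device` and `load_to_device` are hypothetical; `pick_device` assumes the newer `torch-directml` package (whose `torch_directml.device()` returns a `privateuseone` device) and falls back to CPU when it is missing. With the older `pytorch-directml` fork used in this thread, `torch.device("dml")` would take its place. The sketch also assumes the checkpoint is a plain `state_dict`.

```python
import io

import torch
import torch.nn as nn


def pick_device() -> torch.device:
    """Hypothetical helper: use DirectML if torch-directml is installed, else CPU."""
    try:
        import torch_directml  # assumption: the newer torch-directml package
    except ImportError:
        return torch.device("cpu")
    return torch_directml.device()


def load_to_device(source, model: nn.Module, device: torch.device) -> nn.Module:
    """Load a state_dict checkpoint on the CPU, then move the model to `device`."""
    # With a string map_location, every storage is restored to "cpu",
    # a location torch.load always knows how to handle.
    state_dict = torch.load(source, map_location="cpu")
    model.load_state_dict(state_dict)
    # Move parameters and buffers to the accelerator only after loading.
    return model.to(device)


if __name__ == "__main__":
    # Round-trip a tiny model through an in-memory checkpoint.
    buf = io.BytesIO()
    torch.save(nn.Linear(4, 2).state_dict(), buf)
    buf.seek(0)
    model = load_to_device(buf, nn.Linear(4, 2), pick_device())
    print(next(model.parameters()).device)
```

The same load-on-CPU-then-move step applies to the `privateuseone:0` variant of this error reported later in the thread, since the location handed to `torch.load` is then always `"cpu"`.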

But now I get this error right at the start of training (I replaced SiLU with ReLU, since SiLU is not supported):

libprotobuf FATAL D:\a\_work\1\s\caffe2\dml\dml_command_recorder.cc:361] CHECK failed: ((((HRESULT)((current_command_list_->Close()))) >= 0)) == (true):
val: Scanning 'D:\Projects\python\semantics\project\tests\annotations\multiclass\dataset\valid\labels.cache' images and
  0%|          | 0/11 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 640, in <module>
    main(opt)
  File "train.py", line 537, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 331, in train
    loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
  File "D:\Documenti\models\yolov5\utils\loss.py", line 120, in __call__
    tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets
  File "D:\Documenti\models\yolov5\utils\loss.py", line 199, in build_targets
    j, k = ((gxy % 1 < g) & (gxy > 1)).T
  File "D:\Documenti\models\GTA_code\dml\lib\site-packages\torch\tensor.py", line 591, in __iter__
    return iter(self.unbind(0))
RuntimeError: CHECK failed: ((((HRESULT)((current_command_list_->Close()))) >= 0)) == (true):

If needed, this is the YOLOv5 loss script.

@Adele101 Adele101 added the pytorch-directml Issues in PyTorch when using its DirectML backend label Jan 26, 2022
@ryanlai2
Contributor

Hi @masc-it, thanks for reporting this issue. Can you try our latest release on PyPI to see if you are still experiencing this issue?

Also, can you share an end-to-end Python script so we can reproduce your error?

@Mlekow

Mlekow commented Mar 12, 2022

Hi @ryanlai2, I've run into the same issue. When running train.py, I got the following error, with the same traceback as above: RuntimeError: CHECK failed: ((((HRESULT)((dml_device_->GetDeviceRemovedReason()))) >= 0)) == (true):

@masc-it
Author

masc-it commented Mar 16, 2022

> Hi @masc-it, thanks for reporting this issue. Can you try our latest release on Pypi to see if you are still experiencing this issue?
>
> Also can you share an e2e python script for us to reproduce your error?

You can try the YOLOv5 tutorial (train section) directly, using dml as the device.

@masc-it
Author

masc-it commented Mar 16, 2022

Anyway, @ryanlai2, I tried the latest pytorch-directml and this is the result now:

YOLOv5 v6.0-207-g8efe977 torch 1.8.0a0+14f3b5d CPU

Fusing layers...
Model Summary: 213 layers, 1760518 parameters, 0 gradients
1/1: 0...  Success (inf frames 640x480 at 30.00 FPS)

Traceback (most recent call last):
  File "age.py", line 319, in <module>
    main(opt)
  File "age.py", line 314, in main
    run(**vars(opt))
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "age.py", line 173, in run
    model.warmup(imgsz=(1, 3, *imgsz), half=half)  # warmup
  File "D:\Documenti\models\yolov5\models\common.py", line 460, in warmup
    self.forward(im)  # warmup
  File "D:\Documenti\models\yolov5\models\common.py", line 397, in forward
    y = self.model(im) if self.jit else self.model(im, augment=augment, visualize=visualize)
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Documenti\models\yolov5\models\yolo.py", line 126, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "D:\Documenti\models\yolov5\models\yolo.py", line 149, in _forward_once
    x = m(x)  # run
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Documenti\models\yolov5\models\common.py", line 50, in forward_fuse
    return self.act(self.conv(x))
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Documenti\win_venvs\directml\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: tensor.is_dml() INTERNAL ASSERT FAILED at "D:\\a\\_work\\1\\s\\pytorch-directml\\aten\\src\\ATen\\native\\dml\\DMLTensor.cpp":33, please report a bug to PyTorch. unbox expects Dml tensor as inputs

The model seems to be loaded correctly, though.

Edit
This is the "problematic" code:

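import torch.nn as nn


def autopad(k, p=None):  # kernel, padding
    # Pad to 'same' (helper from YOLOv5's models/common.py)
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution (SiLU swapped for ReLU, as described above)
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        # Used after BatchNorm has been fused into the conv weights
        return self.act(self.conv(x))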
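The "unbox expects Dml tensor as inputs" assert fires inside `F.conv2d`, which suggests that at least one of its operands (input, weight, or bias) was not a DML tensor; the YOLOv5 banner above also reports `CPU`. A quick diagnostic, sketched below with the hypothetical helpers `model_device` and `forward_on_model_device`, is to check that all parameters and buffers share one device and to move the input there before the forward pass. This only rules out a device mismatch; it does not work around missing DML operators.

```python
import torch
import torch.nn as nn


def model_device(model: nn.Module) -> torch.device:
    """Device of the first parameter (assumes the model has at least one)."""
    return next(model.parameters()).device


def forward_on_model_device(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: check device consistency, then run the forward pass."""
    # Collect every device the weights and buffers (e.g. BatchNorm stats) live on.
    devices = {t.device for t in model.parameters()}
    devices |= {t.device for t in model.buffers()}
    if len(devices) > 1:
        raise RuntimeError(f"model tensors are split across devices: {devices}")
    # Move the input to the same device as the weights before calling the model.
    return model(x.to(model_device(model)))


if __name__ == "__main__":
    # A Conv -> BatchNorm -> ReLU stack shaped like the Conv block above.
    block = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8), nn.ReLU()
    ).eval()
    y = forward_on_model_device(block, torch.zeros(1, 3, 16, 16))
    print(y.shape, y.device)
```

Raising early on a split model turns a mismatch into a readable error instead of an internal assert deep inside the convolution kernel.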

@PersoDevo

PersoDevo commented Apr 28, 2022

Hi @masc-it, I'm facing the same issue, "unbox expects Dml tensor as inputs", while using YOLOv5. Did you find a solution? Thanks.

@masc-it
Author

masc-it commented Apr 28, 2022

> Hi @masc-it , I'm facing the same issue "unbox expects Dml tensor as inputs" while using Yolov5. Did you find a solution? Thanks.

Nope, nobody here seems interested :/

Anyway, I tried running some parts of the YOLOv5 loss on the CPU and it more or less worked, which suggests DML lacks some of the operators involved. So we can only hope they'll implement more operators in the future.
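Running part of the loss on the CPU can be sketched for the `j, k = ((gxy % 1 < g) & (gxy > 1)).T` line from the earlier training traceback, where unpacking `.T` goes through `Tensor.__iter__` and `unbind(0)`. The helper `offsets_mask_cpu` is hypothetical, `g` is assumed here to be the 0.5 offset bias, and it is not confirmed which DML operator actually failed. The sketch evaluates the same mask on the CPU, takes the two columns by indexing instead of iterating, and moves the results back to the original device.

```python
import torch


def offsets_mask_cpu(gxy: torch.Tensor, g: float = 0.5):
    """Hypothetical CPU fallback for `j, k = ((gxy % 1 < g) & (gxy > 1)).T`.

    gxy: (N, 2) tensor of grid x/y target coordinates, on any device.
    Returns two (N,) boolean masks on gxy's original device.
    """
    device = gxy.device
    xy = gxy.detach().to("cpu")
    # Same expression as in build_targets, evaluated on the CPU.
    mask = (xy % 1 < g) & (xy > 1)  # (N, 2) bool
    # Take the columns by indexing rather than unpacking `.T`, which
    # iterates the tensor via Tensor.__iter__ -> unbind(0).
    j, k = mask[:, 0], mask[:, 1]
    return j.to(device), k.to(device)


if __name__ == "__main__":
    gxy = torch.tensor([[1.2, 3.7], [0.4, 2.1], [5.9, 1.3]])
    j, k = offsets_mask_cpu(gxy)
    print(j.tolist(), k.tolist())  # [True, False, False] [False, True, True]
```

Inside `build_targets` the original line would become `j, k = offsets_mask_cpu(gxy, g)`; whether this alone is enough to get training running on DML is untested.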

@smk2007
Member

smk2007 commented May 18, 2022

Hi, and thanks for posting the issue.

We have not looked at YOLOv5 yet, so operator coverage for running it end-to-end is spotty.
We absolutely intend to support yolov5 in the future, and will update this thread when we expect to have the operator conformance needed to enable it.

@foemre

foemre commented May 29, 2022

ultralytics/yolov5#7642

@lyleaf-81

Hello, do you support YOLOv4?

@mahi80

mahi80 commented Feb 16, 2023

The code below should work for most issues related to "don't know how to restore data location of torch.storage._UntypedStorage (tagged with gpu)":

import torch
import whisper

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # fall back to CPU without CUDA
model = whisper.load_model("medium", device=device)

@ag2s20150909

Traceback (most recent call last):
  File "d:\whisper\app.py", line 16, in <module>
    model=whisper.load_model(name="base",device="privateuseone:0",download_root="model")
  File "d:\dev\python\lib\site-packages\whisper\__init__.py", line 144, in load_model
    checkpoint = torch.load(fp, map_location=device)
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1142, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1116, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 1086, in restore_location
    return default_restore_location(storage, str(map_location))
  File "d:\dev\python\lib\site-packages\torch\serialization.py", line 220, in default_restore_location     
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage.UntypedStorage (tagged with privateuseone:0)   

@risharde

Same thing here. It really gets me when Microsoft makes something sound simple, and then you try it and it just barfs.

@Amna26103

Hey there,
I had the same issue while using EasyOCR, but when I set gpu=False it worked. As far as I can tell from searching, this issue has something to do with the GPU.
