
module 'torch.cuda' has no attribute '_UntypedStorage' #88839

Closed
emailweixu opened this issue Nov 10, 2022 · 7 comments
emailweixu commented Nov 10, 2022

🐛 Describe the bug

On a machine with PyTorch version 1.12.1+cu116, running the following code fails with the error message: module 'torch.cuda' has no attribute '_UntypedStorage'. However, the error disappears if CUDA is not used. The same code runs correctly on a different machine with PyTorch version 1.8.2+cu111.

import time
import torch
import torch.multiprocessing as mp

def set_device():
    # Note: the code can run if the following two lines are commented out
    if torch.cuda.is_available():
        torch.set_default_tensor_type(torch.cuda.FloatTensor)
    return

def worker(job_queue: mp.Queue, done_queue: mp.Queue, result_queue: mp.Queue):
    set_device()
    para = torch.zeros((100, 100))
    try:
        while True:
            result = para + torch.randn_like(para)

            if not job_queue.empty():
                job_queue.get()
                break
            if result_queue.full():
                time.sleep(0.1)
                continue
            result_queue.put(result)

        done_queue.put(None)
        result_queue.cancel_join_thread()

    except Exception as e:
        print(f'{mp.current_process().name} - {e}')

def test_queue():
    set_device()
    ctx = mp.get_context('spawn')
    job_queue = ctx.Queue()
    result_queue = ctx.Queue(100)
    done_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(job_queue, done_queue, result_queue))
    proc.start()
    for i in range(10):
        result = result_queue.get()
        for j in range(100):
            if not result_queue.empty():
                result = result_queue.get()
            else:
                break
        print("result: ", result.sum().item())
        time.sleep(0.1)
    job_queue.put(None)
    proc.join()

if __name__ == '__main__':
    test_queue()

Error message:

Traceback (most recent call last):
  File "test_queue.py", line 53, in <module>
    test_queue()
  File "test_queue.py", line 44, in test_queue
    result = result_queue.get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/home/weixu/venvs/python3.8/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 122, in rebuild_cuda_tensor
    shared_cache[(storage_handle, storage_offset_bytes)] = StorageWeakRef(storage)
  File "/home/weixu/venvs/python3.8/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 65, in __setitem__
    self.free_dead_references()
  File "/home/weixu/venvs/python3.8/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 70, in free_dead_references
    if storage_ref.expired():
  File "/home/weixu/venvs/python3.8/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 35, in expired
    return torch.Storage._expired(self.cdata)  # type: ignore[attr-defined]
  File "/home/weixu/venvs/python3.8/lib/python3.8/site-packages/torch/storage.py", line 757, in _expired
    return eval(cls.__module__)._UntypedStorage._expired(*args, **kwargs)
AttributeError: module 'torch.cuda' has no attribute '_UntypedStorage'
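The failing line in torch/storage.py, eval(cls.__module__)._UntypedStorage, resolves the module named by the storage class's __module__ attribute (here torch.cuda) and then looks up _UntypedStorage on it; on this 1.12 install that attribute does not exist on torch.cuda, so the lookup raises AttributeError. A stdlib-only sketch of the same lookup failure (cuda_stub is a stand-in module for illustration, not real PyTorch):

```python
import types

# Stand-in for the torch.cuda module, with no _UntypedStorage attribute.
cuda_stub = types.ModuleType("torch.cuda")

try:
    # Mirrors the lookup in torch/storage.py: eval(cls.__module__)._UntypedStorage
    cuda_stub._UntypedStorage
except AttributeError as err:
    print(err)  # prints: module 'torch.cuda' has no attribute '_UntypedStorage'
```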

Versions

Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.8.15 (default, Oct 12 2022, 19:15:16) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 510.47.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect


ngimel commented Nov 10, 2022

cc @kurtamohler @ezyang

kurtamohler self-assigned this Nov 10, 2022
kurtamohler commented Nov 10, 2022

This problem doesn't exist in the newer PyTorch 1.13.

I could fix this on the 1.12 branch, but will there be a 1.12.2 release?


ngimel commented Nov 10, 2022

No, 1.13 is out. Thanks for confirming @kurtamohler. @emailweixu, please reopen if the error reproduces on PyTorch 1.13.

ngimel closed this as completed Nov 10, 2022

hnanacc commented Dec 1, 2022

Hi @ngimel, @kurtamohler,

I'm stuck on this issue, and the problem is that I cannot use the latest version of PyTorch (I'm currently on 1.12+cu11.3). Is there a workaround, or could I get some context on why this is occurring?
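One possible interim workaround (an untested sketch, not an official PyTorch fix) is to alias the storage class onto torch.cuda before any tensors cross process boundaries, so that the lookup in torch/storage.py succeeds. This assumes that on 1.12 the class still exists as torch._UntypedStorage; the helper name ensure_untyped_storage_alias is made up here for illustration.

```python
# Hypothetical workaround sketch (untested; not an official PyTorch fix).
# Assumes the storage class exists as torch._UntypedStorage on 1.12.

def ensure_untyped_storage_alias(torch_module):
    """If torch.cuda lacks _UntypedStorage, alias it from the top-level module."""
    cuda = getattr(torch_module, "cuda", None)
    storage_cls = getattr(torch_module, "_UntypedStorage", None)
    if cuda is not None and storage_cls is not None \
            and not hasattr(cuda, "_UntypedStorage"):
        cuda._UntypedStorage = storage_cls

# Intended use (in both the parent and each worker, before tensors
# are sent through a queue):
#   import torch
#   ensure_untyped_storage_alias(torch)
```

Since spawned workers re-import everything, the call would have to run in every process that deserializes CUDA tensors, not just the parent.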


breakds commented Dec 6, 2022

I ran into this problem as well. At the moment we are not planning to move to PyTorch 1.13. Could we reopen this issue and maybe get a backport to 1.12? Thanks!


ngimel commented Dec 6, 2022

PyTorch doesn't do backport fixes.


breakds commented Dec 6, 2022

Thanks for the clarification.
