shared torch.tensor with multiprocesses using python Queue causes coredump #56480

Open

jackzhou121 opened this issue Apr 20, 2021 · 4 comments

Labels: module: multiprocessing, shadow review, triaged

Comments


jackzhou121 commented Apr 20, 2021

🐛 Bug

Our process crashes when we send a torch.tensor through the Queue, but when the torch.tensor is converted to numpy first, the process works fine.

Even when we send a small torch.tensor, the process crashes again after a few successful puts.

To Reproduce

Steps to reproduce the behavior:

1. Run the following code on Linux with kernel 3.10.0-693.el7.x86_64:

import time
from multiprocessing.managers import SyncManager
from queue import PriorityQueue

import torch


def Manager():
    # Register a shared PriorityQueue on a SyncManager subclass and start it.
    class PipelineManager(SyncManager):
        pass

    PipelineManager.register("PriorityQueue", PriorityQueue)
    m = PipelineManager()
    m.start()
    return m


m = Manager()
pr_queue = m.PriorityQueue()

# ~92 MB float32 tensor
mytensor = torch.ones((153, 3, 224, 224), dtype=torch.float32) * 3.141592

print(mytensor.shape)

# Putting the tensor on the managed queue is where the crash happens.
pr_queue.put({"data": mytensor})

print("put data done")

time.sleep(600)
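
For reference, a minimal sketch of the numpy workaround described above; the consumer-side conversion back with torch.from_numpy is illustrative and not part of the original repro:

# Workaround sketch: convert the tensor to a numpy array before putting it on
# the managed queue; this put does not crash.
pr_queue.put({"data": mytensor.numpy()})

# Consumer side (illustrative): convert back to a tensor after getting the item.
# item = pr_queue.get()
# restored = torch.from_numpy(item["data"])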

Expected behavior

The process calling queue.put should not crash; instead it crashes with a coredump.

Environment

  • Python 3.6
  • Linux kernel 3.10.0-693.el7.x86_64
  • torch 1.2.0


Additional context

cc @ezyang


ezyang commented Apr 21, 2021

Can you try using torch.multiprocessing instead of stock multiprocessing?
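
For illustration, a minimal sketch of that suggestion, assuming the priority ordering is not strictly required; torch.multiprocessing is a drop-in wrapper around the stdlib module, and its Queue moves tensor data through shared memory instead of pickling it (the consumer function below is illustrative):

import torch
import torch.multiprocessing as mp


def consumer(q):
    # The tensor arrives through shared memory rather than as a pickled copy.
    item = q.get()
    print(item["data"].shape, item["data"].mean())


if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    q.put({"data": torch.ones((153, 3, 224, 224), dtype=torch.float32) * 3.141592})
    p.join()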

jackzhou121 (Author) commented

torch.multiprocessing has no SyncManager package.


ezyang commented Apr 26, 2021

OK, I filed an issue for this: #56921


ezyang commented Apr 26, 2021

Do you really need a priority queue here?
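
If strict priority ordering is still needed, one possible direction (a sketch under the assumption that priorities are distinct, not something proposed in this thread) is to send (priority, tensor) pairs over a plain torch.multiprocessing.Queue and order them on the consumer side with heapq:

import heapq

import torch
import torch.multiprocessing as mp


def consumer(q, n_items):
    # Collect (priority, tensor) pairs, then process them in priority order.
    # Priorities must be distinct here; otherwise add a counter as a tiebreaker,
    # since comparing tensors element-wise on a tie would raise an error.
    heap = []
    for _ in range(n_items):
        heapq.heappush(heap, q.get())
    while heap:
        priority, data = heapq.heappop(heap)
        print(priority, data.shape)


if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q, 2))
    p.start()
    q.put((1, torch.ones(3, 224, 224)))
    q.put((0, torch.zeros(3, 224, 224)))
    p.join()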
