chunk = read(handle, remaining) #23

Closed
sunshantong opened this issue Sep 17, 2020 · 4 comments


@sunshantong

sunshantong commented Sep 17, 2020

Hi, thank you for your work! The code runs fine during training but hangs when validation starts. This does not seem to be caused by the "try-except" code in the testdataloader. When I press CTRL+C to shut it down, I get this:

Traceback (most recent call last):
  File "train.py", line 96, in <module>
    for j, data in enumerate(testdataloader, 0):
  File "/home/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 330, in __next__
    idx, batch = self._get_batch()
  File "/home/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 309, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

What might be causing this problem? Thank you very much.

@LyuJ1998
Collaborator

Hi, I have fixed the bug you mentioned in another issue. Check it and your problem may disappear. If not, please make sure again that you have removed all of the "try-except" code, especially at line 488 and line 471 in dataset/dataset_nocs.py.

@sunshantong
Author

@LyuJ1998 Hi, thank you for your reply and update. The dataloader bug has been fixed, but I still have the same problem when I run the code: it still gets stuck at testdataloader. When I set num_workers = 0, the code runs without worker processes, but it is very slow. Could there be a deadlock in the dataloader? Thank you very much.
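
For readers trying to confirm where a hang like this is blocked, here is a minimal sketch using Python's standard `faulthandler` module. It is not part of the repository: `hang_watchdog` is a hypothetical helper name, and the 120-second threshold is an arbitrary assumption. It dumps the stack of every thread in the main process only, not the worker processes, so it shows where the loop is waiting rather than proving a deadlock.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds=120.0, stream=None):
    """Dump all thread stacks of this process if the block runs longer than `seconds`.

    `hang_watchdog` is a hypothetical helper, not part of the project.
    `stream` must be a real file (it needs a file descriptor); defaults to stderr.
    """
    target = sys.stderr if stream is None else stream
    # repeat=True keeps dumping every `seconds` while the block is still running.
    faulthandler.dump_traceback_later(seconds, repeat=True, file=target)
    try:
        yield
    finally:
        # Always cancel, so no dump fires after the block has finished.
        faulthandler.cancel_dump_traceback_later()


# Usage sketch (testdataloader is the loader from the traceback above):
# with hang_watchdog(120):
#     for j, data in enumerate(testdataloader, 0):
#         ...
```

If the dump keeps ending in `multiprocessing/connection.py`, as in the traceback above, the main process is waiting on a worker that never delivers a batch.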

@sunshantong
Author

I fixed the problem by launching the main script with OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 python train.py. I hope this helps others in similar situations.
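
The same workaround can probably be applied from inside the script. The sketch below assumes the OpenMP and MKL runtimes read these variables when they initialise, so the variables have to be set before `torch` (or NumPy) is imported. `limit_math_threads` is a hypothetical helper name, not something from the repository.

```python
import os


def limit_math_threads(n=1):
    """Set OMP_NUM_THREADS and MKL_NUM_THREADS for this process.

    Child processes (such as DataLoader workers) inherit the environment.
    `limit_math_threads` is a hypothetical helper, not part of the project.
    """
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n)


# Call this at the very top of train.py, before any numeric library is imported.
limit_math_threads(1)
# import torch  # assumption: importing after the call lets the runtimes see the limit
```

Setting the variables on the command line, as in the comment above, remains the simplest option because it does not depend on import order.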

@mystorm16

mystorm16 commented Jun 1, 2022

Looks like this was a bug in Python.
I fixed this issue by upgrading my Python version from 3.6.0 to 3.6.7.
https://stackoverflow.com/questions/53300965/pytorch-exception-in-thread-valueerror-signal-number-32-out-of-range
