This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
After merging the commit "fix maskrnn typo" (#154), when I run the training procedure it always fails with the error below:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 243, in reduce_storage
RuntimeError: unable to open shared memory object </torch_30997_2076642173> in read-write mode
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 149, in _serve
send(conn, destination_pid)
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 50, in send
reduction.send_handle(conn, new_fd, pid)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 176, in send_handle
with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
File "/usr/lib/python3.6/socket.py", line 460, in fromfd
nfd = dup(fd)
OSError: [Errno 24] Too many open files
Traceback (most recent call last):
File "tools/train_net.py", line 170, in <module>
main()
File "tools/train_net.py", line 163, in main
model = train(cfg, args.local_rank, args.distributed)
File "tools/train_net.py", line 73, in train
arguments,
File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 60, in do_train
for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 631, in __next__
idx, batch = self._get_batch()
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
return self.data_queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 204, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/usr/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
raise EOFError
EOFError
Does anyone know how to fix this?
Thanks.
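Not specific to this repo, but the `OSError: [Errno 24] Too many open files` and "unable to open shared memory object" lines point at a known failure mode of PyTorch `DataLoader` workers: each worker hands tensors back over Unix sockets, which consumes file descriptors in the main process until the per-process limit is hit. Two commonly suggested workarounds are raising the open-file limit and switching `torch.multiprocessing` to the `file_system` sharing strategy. A minimal sketch (the limit 4096 is an arbitrary example value, not a recommendation from this repo):

```python
import resource

# 1) Raise the soft open-file limit (RLIMIT_NOFILE) toward the hard
#    limit, so DataLoader workers have more descriptors to work with.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))

# 2) Share tensors through the file system instead of file descriptors.
#    set_sharing_strategy('file_system') is a documented PyTorch API;
#    the import is guarded so the snippet also runs where torch is absent.
try:
    import torch.multiprocessing
    torch.multiprocessing.set_sharing_strategy('file_system')
except ImportError:
    pass
```

Running either change before the `DataLoader` is created (e.g. at the top of `tools/train_net.py`) is the usual advice; alternatively, `ulimit -n 4096` in the shell before launching has the same effect as the first step.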