
OSError: [Errno 24] Too many open files #158

Closed
zimenglan-sysu-512 opened this issue Nov 14, 2018 · 3 comments
Labels
question Further information is requested

Comments

@zimenglan-sysu-512
Contributor

zimenglan-sysu-512 commented Nov 14, 2018

❓ Questions and Help

After merging the commit "fix maskrnn typo" (#154), when I run the training procedure it always hits the error below:

 Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 243, in reduce_storage
RuntimeError: unable to open shared memory object </torch_30997_2076642173> in read-write mode
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 149, in _serve
    send(conn, destination_pid)
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 50, in send
    reduction.send_handle(conn, new_fd, pid)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 176, in send_handle
    with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
  File "/usr/lib/python3.6/socket.py", line 460, in fromfd
    nfd = dup(fd)
OSError: [Errno 24] Too many open files

Traceback (most recent call last):
  File "tools/train_net.py", line 170, in <module>
    main()
  File "tools/train_net.py", line 163, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 60, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 631, in __next__
    idx, batch = self._get_batch()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 610, in _get_batch
    return self.data_queue.get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/reductions.py", line 204, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
    raise EOFError
EOFError

Does anyone know how to fix this?
Thanks.

@zimenglan-sysu-512
Contributor Author

zimenglan-sysu-512 commented Nov 14, 2018

I followed OSError: Too many open files #396 and added these two lines to /etc/security/limits.conf:

*               soft    nofile         65535
*               hard    nofile         65535

Then I rebooted, which solved it.
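
For reference, the same limit can also be inspected and raised from inside the training script with Python's standard resource module. This is only a minimal sketch, not part of the fix above; the soft limit can only be raised up to the hard limit, so the limits.conf change is still what lifts the ceiling.

import resource

# Query the current per-process open-file limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"nofile limit: soft={soft}, hard={hard}")

# Raise the soft limit to the hard limit (e.g. 65535 after the
# /etc/security/limits.conf change and a reboot).
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))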

@fmassa added the question (Further information is requested) label on Nov 14, 2018
@yaohuaxin

Do we really need to open so many files?

@fmassa
Contributor

fmassa commented Feb 28, 2019

@yaohuaxin this is due to how the DataLoader with multiple worker processes works with some particular combinations of settings
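
A common workaround for this failure mode (a hedged sketch, not something proposed in this thread) is to switch PyTorch's tensor sharing strategy so that workers do not hold one file descriptor per shared tensor:

import torch.multiprocessing as mp

# The default "file_descriptor" strategy can exhaust the nofile limit when
# many tensors are shared between DataLoader worker processes; the
# "file_system" strategy avoids keeping those descriptors open.
mp.set_sharing_strategy("file_system")  # call before the DataLoader is created
print(mp.get_sharing_strategy())        # prints: file_system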
