
Couldn't open shared file mapping: <torch_573824_1569179339>, error code: <1455> #31874


Closed
weilueluo opened this issue Jan 5, 2020 · 11 comments
Labels
module: tensorboard triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@weilueluo

🐛 Bug

Runtime error similar to #18797.

To Reproduce

Steps to reproduce the behavior:
No idea how to reproduce this reliably.

Expected behavior

no error

Environment

Collecting environment information...
PyTorch version: 1.2.0+cu92
Is debug build: No
CUDA used to build PyTorch: 9.2

OS: Microsoft Windows 10 家庭版 (Home edition)
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1050
Nvidia driver version: 441.28
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudnn64_7.dll

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.2.0+cu92
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.4.0+cu92
[conda] Could not collect

Additional context

[...]
epoch 35/40: 100%|████████████████████████████████████████████████| 174/174 [03:04<00:00,  1.73it/s]
evaluate: 100%|█████████████████████████████████████████████████████| 39/39 [00:51<00:00,  1.31s/it]
epoch 36/40: 100%|████████████████████████████████████████████████| 174/174 [03:12<00:00,  1.03s/it]
evaluate: 100%|█████████████████████████████████████████████████████| 39/39 [00:53<00:00,  1.48it/s]
epoch 37/40: 100%|████████████████████████████████████████████████| 174/174 [03:10<00:00,  1.56it/s]
evaluate: 100%|█████████████████████████████████████████████████████| 39/39 [00:54<00:00,  1.40s/it]
epoch 38/40:  52%|█████████████████████████▋                       | 91/174 [01:59<01:40,  1.21s/it]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-186500daf789> in <module>
----> 1 acc, w = run(lr=0.001, epochs=40, data_loaders=data_loaders, net=None, write_tensorboard=True, weight_decay=0, start_epoch=1)

<ipython-input-12-7ab892883eff> in run(lr, epochs, data_loaders, net, write_tensorboard, weight_decay, start_epoch)
     19     best_acc, best_w = train_net(net=net, criterion=criterion, optimizer=optimizer, 
     20                                  data_loaders=data_loaders, epochs=epochs, writer=writer,
---> 21                                  scheduler=scheduler, start_epochs=start_epoch)
     22     if writer:
     23         writer.close()

<ipython-input-11-0d98cdc0971c> in train_net(net, criterion, optimizer, data_loaders, epochs, writer, scheduler, start_epochs)
      6 
      7         loss_sum, train_acc, val_acc = train_one_epoch(net=net, criterion=criterion, optimizer=optimizer,
----> 8                                                       data_loaders=data_loaders, epoch=epoch, epochs=epochs)
      9 
     10         if writer:

<ipython-input-10-d126b1e19a5e> in train_one_epoch(net, criterion, optimizer, data_loaders, epoch, epochs)
      3     corrects_sum = 0.0
      4     total_samples = 0
----> 5     for images, labels in tqdm(data_loaders[train], desc=f'epoch {epoch}/{epochs}', ncols=100):
      6         loss, corrects = train_one_sample(net=net, criterion=criterion, optimizer=optimizer,
      7                                          inputs=images, labels=labels)

~\AppData\Roaming\Python\Python37\site-packages\tqdm\_tqdm.py in __iter__(self)
   1015                 """), fp_write=getattr(self.fp, 'write', sys.stderr.write))
   1016 
-> 1017             for obj in iterable:
   1018                 yield obj
   1019                 # Update and possibly print the progressbar.

~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    817             else:
    818                 del self.task_info[idx]
--> 819                 return self._process_data(data)
    820 
    821     next = __next__  # Python 2 compatibility

~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py in _process_data(self, data)
    844         self._try_put_index()
    845         if isinstance(data, ExceptionWrapper):
--> 846             data.reraise()
    847         return data
    848 

~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\_utils.py in reraise(self)
    367             # (https://bugs.python.org/issue2651), so we work around it.
    368             msg = KeyErrorMessage(msg)
--> 369         raise self.exc_type(msg)

RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\_utils\fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\_utils\collate.py", line 80, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\_utils\collate.py", line 80, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\_utils\collate.py", line 54, in default_collate
    storage = elem.storage()._new_shared(numel)
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\storage.py", line 128, in _new_shared
    return cls._new_using_filename(size)
RuntimeError: Couldn't open shared file mapping: <torch_573824_1569179339>, error code: <1455>
@weilueluo
Author

Not sure if it is related, but I also have the following error in TensorBoard:

TensorBoard 2.0.2 at http://localhost:6006/ (Press CTRL+C to quit)
Fatal Python error: Aborted

Current thread 0x0007c8c0 (most recent call first):
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 1034 in GetNext
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\event_file_loader.py", line 71 in Load
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\event_file_loader.py", line 94 in Load
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\directory_watcher.py", line 113 in _LoadInternal
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\directory_watcher.py", line 89 in Load
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\plugin_event_accumulator.py", line 177 in Reload
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\plugin_event_multiplexer.py", line 224 in Worker
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\event_processing\plugin_event_multiplexer.py", line 246 in Reload
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\backend\application.py", line 504 in _reload
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\threading.py", line 870 in run
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\threading.py", line 926 in _bootstrap_inner
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\threading.py", line 890 in _bootstrap

Thread 0x00080788 (most recent call first):
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\selectors.py", line 314 in _select
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\selectors.py", line 323 in select
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\socketserver.py", line 232 in serve_forever
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\werkzeug\serving.py", line 735 in serve_forever
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\program.py", line 284 in _run_serve_subcommand
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\program.py", line 267 in main
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 250 in _run_main
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\absl\app.py", line 299 in run
  File "C:\Users\wweilue\AppData\Roaming\Python\Python37\site-packages\tensorboard\main.py", line 66 in run_main
  File "C:\Users\wweilue\AppData\Local\Programs\Python\Python37\Scripts\tensorboard.exe\__main__.py", line 7 in <module>
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\runpy.py", line 85 in _run_code
  File "c:\users\wweilue\appdata\local\programs\python\python37\lib\runpy.py", line 193 in _run_module_as_main

@peterjc123
Collaborator

What about using python my_script.py instead of calling ipython?

@peterjc123
Collaborator

Alternatively, you may try reducing the number of workers in the DataLoader.
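
For reference, a minimal sketch of that change, assuming the loader is built roughly like the one in the traceback (the dataset and batch size below are placeholders, not the reporter's code). Error code 1455 is Windows' "the paging file is too small for this operation to complete", and each DataLoader worker process opens its own shared-memory file mappings, so fewer workers means less commit charge:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset so the snippet stands on its own; substitute the real Dataset.
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

# Fewer workers -> fewer worker processes creating shared-memory mappings.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=2)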

@weilueluo
Author

@peterjc123 I was not able to reproduce the same error the second time I ran it; I got a memory error instead. It runs normally after reducing the number of workers.

@jerryzh168 jerryzh168 added module: tensorboard triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jan 8, 2020
@jerryzh168
Contributor

@Redcxx is this resolved?

@weilueluo
Author

@jerryzh168 For me yes.

@zimonitrome

Alternatively, you may try out reducing the number of workers in the DataLoader.

I feel like num_workers should not need to be reduced. I am trying to run 8 workers on a 16-thread CPU and it fails ~50% of the time, while my CPU is working at ~25% and there is plenty of memory available.

@Di-Ma-S21

@Redcxx Hi, I also encountered this error when using the multiprocessing functionality in PyTorch on Windows 10. Here is my issue post: #63331. I noticed that you have resolved this problem. Could you please let me know how you solved it? I have tried to put all the code (including train()) under if __name__ == '__main__':, but it still could not solve the problem. Thanks in advance!
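
(For anyone following along, the Windows-safe layout being described is roughly the following; the toy dataset and empty training loop are placeholders, not code from the linked issue.)

import torch
from torch.utils.data import DataLoader, TensorDataset

def train():
    # Toy data so the sketch is self-contained; replace with the real Dataset.
    dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
    loader = DataLoader(dataset, batch_size=16, num_workers=2)
    for images, labels in loader:
        pass  # training step goes here

if __name__ == '__main__':
    # On Windows, DataLoader worker processes re-import this module,
    # so anything that starts workers must only run under this guard.
    train()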

@sinhong96

@Redcxx Hi, I also encountered this error when using the multiprocessing functionality in PyTorch on Windows 10. Here is my issue post: #63331. I noticed that you have resolved this problem. Could you please let me know how you solved it? I have tried to put all the code (including train()) under if __name__ == '__main__':, but it still could not solve the problem. Thanks in advance!

Hi, I am still facing the same problem. May I know whether you have solved it, and how? Thanks!

@Keeyahto

Keeyahto commented Mar 3, 2023

I think I have solved this problem. I called torch.multiprocessing.set_start_method('spawn', True) in the train function, and the error stopped appearing.
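
(A sketch of where such a call can sit; here it is placed at the script entry point rather than inside the train function, and the data and training loop are placeholders. The second positional argument of set_start_method is force, and 'spawn' is already the default start method on Windows.)

import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

def train():
    dataset = TensorDataset(torch.randn(32, 4), torch.randint(0, 2, (32,)))  # placeholder data
    loader = DataLoader(dataset, batch_size=8, num_workers=2)
    for batch in loader:
        pass  # training step goes here

if __name__ == '__main__':
    # force=True keeps the call from raising if the start method was already set.
    mp.set_start_method('spawn', True)
    train()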

@Originofamonia

@sinhong96

Hi weilueluo, could you please share how you solved this problem? I also faced the same problem. Thanks
