
BrokenPipeError: [Errno 32] Broken pipe #2341

Closed

mjchen611 opened this issue Aug 8, 2017 · 38 comments

Comments

@mjchen611

mjchen611 commented Aug 8, 2017

Hi, I use PyTorch to run a triplet network (GPU), but when I load data there is always a BrokenPipeError: [Errno 32] Broken pipe.

I thought something was wrong in the following code:

for batch_idx, (data1, data2, data3) in enumerate(test_loader):
    if args.cuda:
        data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
    data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)

Can you give me some suggestions? Thank you so much.

@alykhantejani
Contributor

Would you be able to post a snippet of code that can reproduce this?

@mjchen611
Author

mjchen611 commented Aug 8, 2017

@alykhantejani

  1. The code link is: https://github.com/andreasveit/triplet-network-pytorch/blob/master/train.py

  2. The error occurred in train.py at line 136.

  3. The error was:

runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')
Reloaded modules: triplet_mnist_loader, triplet_image_loader, tripletnet

Number of params: 21840
Traceback (most recent call last):

  File "", line 1, in <module>
    runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')

  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 258, in <module>
    main()

  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 116, in main
    train(train_loader, tnet, criterion, optimizer, epoch)

  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 137, in train
    for batch_idx, (data1, data2) in enumerate(train_loader):

  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 303, in __iter__
    return DataLoaderIter(self)

  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 162, in __init__
    w.start()

  File "D:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

  File "D:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)

  File "D:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

  4. Part of the related training code is as follows:
def train(train_loader, tnet, criterion, optimizer, epoch):
    losses = AverageMeter()
    accs = AverageMeter()
    emb_norms = AverageMeter()

    # switch to train mode
    tnet.train()
    for batch_idx, (data1, data2, data3) in enumerate(train_loader):
        if args.cuda:
            data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
        data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)

        # compute output
        dista, distb, embedded_x, embedded_y, embedded_z = tnet(data1, data2, data3)
        # 1 means, dista should be larger than distb
        target = torch.FloatTensor(dista.size()).fill_(1)
        if args.cuda:
            target = target.cuda()
        target = Variable(target)

        loss_triplet = criterion(dista, distb, target)
        loss_embedd = embedded_x.norm(2) + embedded_y.norm(2) + embedded_z.norm(2)
        loss = loss_triplet + 0.001 * loss_embedd

        # measure accuracy and record loss
        acc = accuracy(dista, distb)
        losses.update(loss_triplet.data[0], data1.size(0))
        accs.update(acc, data1.size(0))
        emb_norms.update(loss_embedd.data[0] / 3, data1.size(0))

        # compute gradient and do optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{}]\t'
                  'Loss: {:.4f} ({:.4f}) \t'
                  'Acc: {:.2f}% ({:.2f}%) \t'
                  'Emb_Norm: {:.2f} ({:.2f})'.format(
                epoch, batch_idx * len(data1), len(train_loader.dataset),
                losses.val, losses.avg,
                100. * accs.val, 100. * accs.avg, emb_norms.val, emb_norms.avg))

    # log avg values to somewhere
    plotter.plot('acc', 'train', epoch, accs.avg)
    plotter.plot('loss', 'train', epoch, losses.avg)
    plotter.plot('emb_norms', 'train', epoch, emb_norms.avg)

Thank you so much.

@mjchen611
Author

@alykhantejani
And I am running it on Windows 8.1 with CUDA.

@soumith
Member

soumith commented Aug 30, 2017

We do not support Windows officially yet. Maybe @peterjc123 knows what's wrong.

@soumith soumith closed this as completed Aug 30, 2017
@peterjc123
Collaborator

@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?
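A minimal sketch of that debugging step, assuming a typical DataLoader setup (train_dataset and the batch size here are placeholders):

from torch.utils.data import DataLoader

# With num_workers=0 the dataset runs in the main process, so any exception
# raised inside __getitem__ or a transform surfaces directly instead of being
# masked by a BrokenPipeError from a spawned worker.
debug_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=0)

for batch_idx, batch in enumerate(debug_loader):
    pass  # iterating once is enough to trigger any hidden dataset error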

@ratteripenta

ratteripenta commented Nov 23, 2017

I can verify that setting num_workers to 0 or 1 helped. With any higher value, DataLoader always failed for me regardless of the dataset. The error has to do with DataLoader's multiprocessing:


  File "D:/Opiskelu/PyTorch Tutorials/cnn_transfer_learning_cuda.py", line 76, in <module>
    inputs, classes = next(iter(dataloaders['train']))

  File "C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 301, in __iter__
    return DataLoaderIter(self)

  File "C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 158, in __init__
    w.start()

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

@peterjc123
Collaborator

peterjc123 commented Nov 23, 2017

@karmus89 Actually, this error only occurs when you try to do multiprocessing on code that has errors in it. It's unexpected that you face this issue when your code is right. I don't know which version you are using. Can you send a small piece of code that can reproduce your issue?

@ratteripenta

ratteripenta commented Nov 23, 2017

Will do! And remember, I'm using a Windows machine. The code is directly copied from the PyTorch Transfer Learning tutorial. This means that the dataset has to be downloaded and extracted as instructed.

The code to reproduce the error:

import torch
import torchvision
from torchvision import datasets, models, transforms
import os

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}

# The code will fail here when trying to iterate over the DataLoader with multiple workers (Windows only)
inputs, classes = next(iter(dataloaders['train']))

And I just made some PyTorch forum posts regarding this. The problem lies with Python's multiprocessing and Windows. Please see this PyTorch discussion reply, as I don't want to copy-paste too much here.

Edit:

Here's the code that doesn't crash, which at the same time complies with Python's multiprocessing programming guidelines for Windows machines:

import torch
import torchvision
from torchvision import datasets, models, transforms
import os

if __name__ == "__main__":
    
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomSizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Scale(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }
    
    data_dir = 'hymenoptera_data'
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x])
                      for x in ['train', 'val']}
    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                                 shuffle=True, num_workers=4)
                  for x in ['train', 'val']}

    inputs, classes = next(iter(dataloaders['train']))

@peterjc123
Collaborator

peterjc123 commented Nov 23, 2017

@karmus89 Well, I think I have stated it where the package was published. I'm so sad that you installed the package without reading the notice.

@ratteripenta

ratteripenta commented Nov 23, 2017

@peterjc123 Please see my edited response, where I did exactly that. The requirement to wrap the code inside an if __name__ == '__main__' block isn't immediately obvious, as it only applies to Windows machines.

Edit:
Regarding the statement of the requirement, I have indeed missed it. I used conda to install the package directly, so I never came across any introductory notes. But thanks anyway! And sorry for making you sad!

Edit 2:
Wow, I couldn't even have known where to look for that 😄 👍

@Dehde

Dehde commented Sep 21, 2018

A question regarding the above: I am running into this problem within a Jupyter notebook. How do you solve it there? Wrapping the code in if __name__ == '__main__': does not change a thing. Does someone know how to translate this fix to Jupyter notebooks?

@peterjc123
Collaborator

@Dehde What about setting the num_workers of the DataLoader to zero?

@Dehde

Dehde commented Sep 21, 2018

@peterjc123
Thanks for the quick reply! I did not fully make myself clear, sorry: is there a way to run PyTorch on Windows in a Jupyter notebook and still use the worker functionality, i.e. not set num_workers to zero? I definitely need parallelized preprocessing. Thanks for your time!

@peterjc123
Collaborator

Could you show me the minimal code so that I could reproduce?

@Dehde

Dehde commented Sep 22, 2018

@peterjc123
I will edit it into this post on Monday; I don't have access to the code right now. Thank you!

As promised, the code I use:

if __name__ == '__main__':

    batch_size = 256

    size = (128, 128)
    image_datasets = {}
    image_datasets["train"] = WaterbodyDataset(masks=train_masks, images=train_imgs,
                                               transform_img=transforms.Compose([
                                                   RandomCrop(size),
                                                   transforms.ToTensor(),
                                               ]),
                                               transform_mask=transforms.Compose([
                                                   RandomCrop(size),
                                                   transforms.ToTensor(),
                                               ]))

    image_datasets["val"] = WaterbodyDataset(masks=val_masks, images=val_imgs,
                                             transform_img=transforms.Compose([
                                                 transforms.ToTensor(),
                                             ]),
                                             transform_mask=transforms.Compose([
                                                 transforms.ToTensor()
                                             ]))

    dataloaders = {'train': torch.utils.data.DataLoader(image_datasets['train'], batch_size=batch_size,
                                                        shuffle=True, num_workers=1),
                   'val': torch.utils.data.DataLoader(image_datasets['val'], batch_size=batch_size,
                                                      shuffle=False, num_workers=1)}

    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

    hps = HyperParams()
    hps.update("name=resnet34_128_deconv_pret00rained_bs32_adam_lr0.0001_wd0_pat5,"
               "arch=resnet34,input_channel=4,freeze=0,deconv=1,opt=adam,debug=0,"
               "weight_decay=0.0,patience=100,pretrained=1,lr=0.0001,print_freq=10,every_x_epoch_eval=1")
    pprint(attr.asdict(hps))

    model = Model(hps)
    model.train(dataloaders)

The WaterbodyDataset inherits from the PyTorch Dataset class.

@Jerry-Jie-Xie

I also got the same error. When I set num_workers to 0, the error does not appear again. However, when I set num_workers to 1, the error is still there.

@saurabh502

When I set num_workers to 0, there is no error.

@ghost

ghost commented Jan 13, 2019

Please, I need assistance with this error: "BrokenPipeError: [Errno 32] Broken pipe".
The code is from: https://github.com/higgsfield/np-hard-deep-reinforcement-learning/blob/master/Neural%20Combinatorial%20Optimization.ipynb
I am using Windows 10.

@MarcinMisiurewicz

  1. Wrap the code in if __name__ == '__main__':, but for me the error nonetheless sometimes appears again. I know it sounds silly, but what helps me then is just
  2. rebooting the computer.

Windows 10 here.

@BramVanroy
Contributor

I found that the issue is still present, but only when I use a custom collate_fn.

@angeloyeo

For me, just changing num_workers from 2 to 0 made the code work properly...

@cp9612

cp9612 commented Aug 2, 2019

I had the same issue when I ran the PyTorch Data Loading and Processing Tutorial. Changing num_workers from 2 to 0 solved the problem, but num_workers = 2 worked fine with other datasets. I use Windows.

@divyanshj16

num_workers > 0 doesn't work for me on Windows, even with the new IterableDataset.

@ShoufaChen

I met this same error. And while I was trying to find a method to solve the problem, the program continued running on its own (after waiting about 10 minutes). Amazing 😕

@CorentinJ

I've run the exact same code multiple times with different results. Also, I've copied code that causes a broken pipe to a new file (the contents being exactly the same) and it would run fine. I think there's an external factor in play here. I can't reproduce the bug anymore, but maybe try deleting your __pycache__ directory if there's any.

@germanjke

I have the same problem on Windows 10. I don't know why, but I think the problem is the DataLoader (setting num_workers to 0 doesn't help) and multiprocessing.

@morawi

morawi commented Mar 3, 2020

I have the same problem on Windows 10. I don't know why, but I think the problem is the DataLoader (setting num_workers to 0 doesn't help) and multiprocessing.

After using Ubuntu for quite some time, I have been trying Windows 10 lately (just for prototyping before moving to the cluster machine) and bumped into the same error; setting num_workers to 0 helped. Make sure you are setting it for all dataloaders: train, test, and validation.

@PiPiNam

PiPiNam commented Mar 5, 2020

I also have the same problem on Windows 10. I get the error message '[Errno 32] Broken pipe' when I set num_workers greater than 0, and my code is downloaded from the official PyTorch tutorial.

I guess this is a bug on Windows 10, and I am looking forward to seeing a fix in the next release.

@paleomoon

Same error; num_workers=0 worked, but I want multiprocessing to speed up data loading.

@morawi

morawi commented Mar 24, 2020

Same error; num_workers=0 worked, but I want multiprocessing to speed up data loading.

It seems that the only way for this to work is to use Linux. I am using Windows 10 for prototyping and then pushing everything to the cluster, which runs Linux. For example:

if platform.system() == 'Windows': n_cpu = 0
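Expanded into a minimal sketch (the dataset variable and the non-Windows worker count of 4 are placeholder assumptions):

import platform

from torch.utils.data import DataLoader

# Fall back to single-process loading on Windows, keep worker processes elsewhere.
n_cpu = 0 if platform.system() == 'Windows' else 4

loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=n_cpu)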

@msminhas93

msminhas93 commented Apr 28, 2020

I also encountered a similar problem on Windows 10 when defining a custom torchvision dataset and trying to run it in JupyterLab. Apparently the custom dataset does not get registered as an attribute of the __main__ module, which the DataLoader's worker processes look up in multiprocessing\spawn.py. I fixed it by writing the dataset into a module and then importing it, as mentioned here:

https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror

  File "C:\Users\johndoe\Anaconda3\envs\PyTorch15\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\johndoe\Anaconda3\envs\PyTorch15\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'RandomPatchExtractor' on <module '__main__' (built-in)>
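As an illustration of that workaround, a sketch follows: the class name is taken from the traceback above, but the module name, constructor, and usage are hypothetical stand-ins. The dataset class lives in its own .py file and the notebook only imports it, so the spawned workers can re-import it by module path when unpickling.

# my_datasets.py -- hypothetical module saved next to the notebook
from torch.utils.data import Dataset

class RandomPatchExtractor(Dataset):
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

# notebook cell: import the class instead of defining it inline
from my_datasets import RandomPatchExtractor
from torch.utils.data import DataLoader

dataset = RandomPatchExtractor(list(range(100)))
loader = DataLoader(dataset, batch_size=8, num_workers=2)
batch = next(iter(loader))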

@arnabsinha99

@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?

Setting num_workers to 0 worked for me. Could you explain why this causes an error?

@ltjkoomen

I have noticed this issue is closed, but I do not think it is fixed. Is there any effort to fix the multiprocessing DataLoader on Windows? Currently there are two options as far as I know:

  1. wrap the code in if __name__ == '__main__':, which does not always work.
  2. do not use multiprocessing on Windows: if platform.system() == 'Windows': n_cpu = 0

So the first one is an imperfect fix, while the second one amounts to just giving up. Is there any effort to fix multiprocessed data loading on Windows going on somewhere else, or should we re-open this one?

@BlackTeaAttenuation

BlackTeaAttenuation commented Oct 4, 2020

Use if __name__ == '__main__' and '__file__' in globals(): instead of if __name__ == '__main__':.
That works for me. I use Jupyter Notebook and Windows 10.

This is the reference.

@doanhung95wkm

I got this problem when trying to train on my custom COCO dataset (which is a little different from the default CocoDetection PyTorch class). Adding the parameter collate_fn=utils.collate_fn worked for me (see the sketch below):
trainloader = torch.utils.data.DataLoader(coco_train, batch_size=2, shuffle=False, num_workers=1, collate_fn=utils.collate_fn)
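For context, the utils.collate_fn referred to here most likely comes from the torchvision detection reference scripts; a minimal stand-in with the same effect (an assumption, not the exact reference file) just batches samples as tuples of lists instead of stacking them:

def collate_fn(batch):
    # Detection samples have variable-sized targets, so keep each batch as
    # (tuple_of_images, tuple_of_targets) rather than stacking into one tensor.
    return tuple(zip(*batch))

With that, the DataLoader call above works unchanged apart from passing collate_fn=collate_fn.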

@bigbizze

If anyone runs into this issue and none of the above works, my problem ended up being that my file name had "-" in it, as opposed to, say, "_", and multiprocessing was unable to resolve the references as a result.

@willdone1337

You must put all the training code inside if __name__ == '__main__':.

@smolboii

Another thing is that, at least in my experience with detectron2, the number of workers has to be <= your CPU core count, unlike on Linux. So if you have 12 CPU cores like I do, you can't use more than 12 workers (not that that would be very beneficial to begin with, I suppose).

And with detectron2 in particular, if you use an evaluator this then doubles the number of workers, as it creates N additional workers (N being num_workers) for evaluation while the other workers are not terminated. So with a 12-core CPU you can actually only use 6 workers; a sketch of such a cap follows.
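A small sketch of that cap (the halving rule is only an illustration of the comment above, not something documented by detectron2):

import os

# Leave room for a framework that spawns a second set of N evaluation workers
# by keeping the training loader's worker count at half the available CPU cores.
cpu_cores = os.cpu_count() or 1
num_workers = max(1, cpu_cores // 2)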
