Bus error (core dumped) model share memory #2244

acrosson opened this Issue Jul 29, 2017 · 3 comments



acrosson commented Jul 29, 2017

I'm getting a Bus error (core dumped) when using the share_memory method on a model.

OS : Ubuntu 16.04
It happens with Python 2.7 and 3.5, in both a conda environment and a hard install. I'm using the latest version from … I've also tried installing from source, with the same result.

I tried doing a basic test using this code:

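import torch.nn as nn
import torch.nn.functional as F  # provides relu, max_pool2d, dropout, log_softmax

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # The very large input channel count makes conv1's weight tensor huge
        self.conv1 = nn.Conv2d(2563*50, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        # Apply dropout only in training mode
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x)

n = Net()
n.share_memory()  # moving the parameters to shared memory is what triggers the Bus error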

If the input size is small it works fine, but anything greater than some threshold throws the Bus error. If I don't call share_memory() it works fine.
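To make "some threshold" concrete, here is a rough back-of-the-envelope sketch of how much memory `conv1` alone would need. It assumes float32 parameters (4 bytes each) and, since this runs inside a Docker container, Docker's default `/dev/shm` size of 64 MiB; neither assumption is confirmed in this report. The helper `conv2d_param_count` is a hypothetical name used only for illustration.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size):
    """Weights (out * in * k * k) plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

BYTES_PER_FLOAT32 = 4
# Assumption: the container uses Docker's default /dev/shm size of 64 MiB.
DOCKER_DEFAULT_SHM = 64 * 1024 * 1024

conv1_params = conv2d_param_count(2563 * 50, 10, 5)
conv1_bytes = conv1_params * BYTES_PER_FLOAT32

print(conv1_params)                      # 32037510
print(conv1_bytes)                       # 128150040
print(conv1_bytes > DOCKER_DEFAULT_SHM)  # True
```

Under those assumptions the first layer needs roughly 128 MB of shared memory, about twice the assumed limit, which would be consistent with small inputs working and large ones failing.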

I ran trace; here are the last few lines of the output:

            if module is not None and module not in memo:
                memo.add(module)
                yield name, module
            yield module
            module._apply(fn)
 --- modulename: module, funcname: _apply
        for module in self.children():
 --- modulename: module, funcname: children
        for name, module in self.named_children():
 --- modulename: module, funcname: named_children
        memo = set()
        for name, module in self._modules.items():
        for param in self._parameters.values():
            if param is not None:
       = fn(
 --- modulename: module, funcname: <lambda>
        return self._apply(lambda t: t.share_memory_())
 --- modulename: tensor, funcname: share_memory_
 --- modulename: storage, funcname: share_memory_
        from torch.multiprocessing import get_sharing_strategy
 --- modulename: _bootstrap, funcname: _handle_fromlist
<frozen importlib._bootstrap>(1006): <frozen importlib._bootstrap>(1007): <frozen importlib._bootstrap>(1012): <frozen importlib._bootstrap>(1013): <frozen importlib._bootstrap>(1012): <frozen importlib._bootstrap>(1025):
        if self.is_cuda:
        elif get_sharing_strategy() == 'file_system':
 --- modulename: __init__, funcname: get_sharing_strategy
    return _sharing_strategy
            self._share_fd_()
Bus error (core dumped)

I tried running gdb, but it won't give me a full trace.

I've tried creating a symbolic link to the … as I suspect it's a similar issue, but I still get the same error.

Any suggestions? This is running inside a docker container btw.



acrosson commented Jul 29, 2017

Okay, I think I solved it. It looks like the Docker container's shared memory limit was too low. Raising it by adding --shm-size 8G to the docker run command seems to do the trick, as mentioned here. Let me test it fully; if that solves it, I'll close the issue.
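For anyone who wants to check this before calling share_memory(), a minimal sketch along these lines could compare the free space on /dev/shm with the number of bytes the model needs. The function names `shm_free_bytes` and `fits_in_shm` are hypothetical helpers written for this illustration, not part of PyTorch, and the sketch assumes shared memory is backed by /dev/shm, as it is on typical Linux containers.

```python
import os
import shutil

def shm_free_bytes(path="/dev/shm"):
    """Return free bytes on the filesystem backing `path`, or None if it is missing."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).free

def fits_in_shm(required_bytes, path="/dev/shm"):
    """True if `required_bytes` is no larger than the free space under `path`."""
    free = shm_free_bytes(path)
    return free is not None and required_bytes <= free

if __name__ == "__main__":
    free = shm_free_bytes()
    if free is None:
        print("/dev/shm not found on this system")
    else:
        print("free shared memory: %.1f MiB" % (free / (1024.0 * 1024.0)))
```

With a PyTorch model, `required_bytes` could be estimated as `sum(p.numel() * p.element_size() for p in model.parameters())`. If `fits_in_shm` returns False, raising the limit with --shm-size as described above is the likely remedy.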



acrosson commented Jul 30, 2017

Works fine now!



dneprDroid commented Sep 13, 2018

@acrosson Do you have experience with Google Cloud ML? Sorry to bother you, but I got this error in a Cloud ML job with machine type standard_gpu (NVIDIA Tesla K80 GPU, 30GB memory).
How can I configure the --shm-size param for a Cloud ML job?

My config.yaml file:

  scaleTier: CUSTOM
  masterType: standard_gpu
  workerType: standard_gpu
  workerCount: 1
  parameterServerCount: 1
  parameterServerType: standard_gpu
