
Memory Issues #18

Open
mxtsai opened this issue Jun 5, 2019 · 12 comments


@mxtsai mxtsai commented Jun 5, 2019

Hi Luke,

Thank you for the awesome work. I tried running EfficientNet-B0 on my GTX 1070 (8 GB) with an input batch of shape [44x1x256x256] (single-channel images), and I am running into 'CUDA out of memory' (with the model in training mode).

I tried running another implementation and wasn't getting this issue. After digging into the code, it seems the implementation of MBConv (or its repeated application) is too memory-hungry.

I really like your implementation of EfficientNet, and if I had more time I would definitely take a deeper dive into the code. In the meantime, could you help me look into this issue (maybe it will also speed up training in the future)? Thank you!

@lukemelas lukemelas commented Jun 6, 2019

Hello, thanks for filing the issue. I'll look into it this weekend.

@jlevy44 jlevy44 commented Jun 9, 2019

I think I'm running into a similar issue as well.

@crazyblacker crazyblacker commented Jun 16, 2019

I hit the same issue when I swap the backbone to B3. The model is smaller, but I still run out of memory on my 2080 Ti (11 GB).

@dami23 dami23 commented Jun 20, 2019

Hi @lukemelas. Thank you for your awesome work. I find the newly released version requires more GPU memory than the original one. The same project using the B3 model ran with the older version, but after the upgrade the code can no longer run on the same workstation and fails with "RuntimeError: CUDA out of memory". Could you give me any advice on this problem? Thank you!!

@lukemelas lukemelas commented Jun 20, 2019

Hi @dami23 , that's strange because nothing about the B3 model changed in any way during the update (apart from the addition of drop_connect). In fact, you can see all the changes that were made here: 9a0d45f

Can you try measuring memory usage with the two versions to confirm the issue?

@xbpgithub xbpgithub commented Jun 25, 2019

Hello, I also hit a similar memory issue. EfficientNet-B3 consumes more memory than ResNet-50 as an FPN backbone; is that normal? Can someone provide some help?

@twmht twmht commented Jul 11, 2019

any update on this?

@seilna seilna commented Jul 15, 2019

Similar problem to @xbpgithub's. It seems that in MBConvBlock(), relu_fn(x) and self._bn are the memory bottlenecks. Replacing them with Inplace BatchNorm significantly reduced memory usage, but so far it does not support the swish activation function (it only supports invertible activations such as leaky_relu).
(I'm not sure that focusing on batchnorm is the right direction for reducing memory usage, but I hope this is helpful.)
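Besides Inplace BatchNorm, a generic framework-level way to trade compute for memory in activation-heavy blocks like MBConvBlock is gradient checkpointing via torch.utils.checkpoint, which recomputes a block's activations during backward instead of storing them in forward. A minimal sketch, not something this repo does; CheckpointedBlock and the Swish module here are hypothetical illustrations:

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)

class CheckpointedBlock(nn.Module):
    """Recomputes the wrapped block's activations during backward
    instead of storing them during forward (compute-for-memory trade)."""
    def __init__(self, block):
        super().__init__()
        self.block = block

    def forward(self, x):
        # use_reentrant=False requires PyTorch >= 1.10.
        # Caveat: BatchNorm running stats are updated again on the
        # recomputed forward pass, a known checkpointing quirk.
        return checkpoint(self.block, x, use_reentrant=False)

block = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.BatchNorm2d(8), Swish())
wrapped = CheckpointedBlock(block)

x = torch.randn(2, 8, 16, 16, requires_grad=True)
wrapped(x).sum().backward()
```

This keeps only the block's input alive between forward and backward, at the cost of one extra forward pass per checkpointed block.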

@seilna seilna commented Jul 16, 2019

I also observed that in relu_fn(), using a custom autograd op for the swish activation function (instead of return x * torch.sigmoid(x)) reduced memory usage by up to 30%:

```python
import torch
import torch.nn as nn

class Swish(torch.autograd.Function):
    @staticmethod
    def forward(ctx, i):
        result = i * torch.sigmoid(i)
        ctx.save_for_backward(i)  # save only the input; sigmoid is recomputed in backward
        return result

    @staticmethod
    def backward(ctx, grad_output):
        i, = ctx.saved_tensors  # saved_variables is deprecated
        sigmoid_i = torch.sigmoid(i)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i)))

swish = Swish.apply

class Swish_module(nn.Module):
    def forward(self, x):
        return swish(x)

swish_layer = Swish_module()

def relu_fn(x):
    """ Swish activation function """
    # return x * torch.sigmoid(x)
    return swish_layer(x)
```
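The hand-written backward above can be sanity-checked against autograd's own gradient of the plain x * torch.sigmoid(x) expression. A quick verification (not part of the original comment), done in double precision so the comparison is tight:

```python
import torch

class Swish(torch.autograd.Function):
    @staticmethod
    def forward(ctx, i):
        ctx.save_for_backward(i)
        return i * torch.sigmoid(i)

    @staticmethod
    def backward(ctx, grad_output):
        i, = ctx.saved_tensors
        s = torch.sigmoid(i)
        return grad_output * (s * (1 + i * (1 - s)))

x = torch.randn(64, dtype=torch.double, requires_grad=True)

# Gradient from the memory-efficient custom op.
y1 = Swish.apply(x).sum()
g1, = torch.autograd.grad(y1, x)

# Gradient from the naive expression, computed by autograd.
y2 = (x * torch.sigmoid(x)).sum()
g2, = torch.autograd.grad(y2, x)

print(torch.allclose(g1, g2))  # the two gradients should match
```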
@jackytu256 jackytu256 commented Aug 1, 2019

@seilna
Perhaps h-swish (from MobileNetV3) could be another option to deal with this issue?

```python
import torch.nn as nn

class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)
```

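One caveat worth noting: h-swish is a piecewise-linear approximation of swish, not a drop-in equivalent, so pretrained swish weights may shift slightly under it. A quick check of the gap (redefining the classes above so the snippet is self-contained):

```python
import torch
from torch import nn

class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)
    def forward(self, x):
        return self.relu(x + 3) / 6

class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super().__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)
    def forward(self, x):
        return x * self.sigmoid(x)

x = torch.linspace(-6, 6, 1001)
gap = (h_swish()(x) - x * torch.sigmoid(x)).abs().max()
print(f"max |h_swish - swish| on [-6, 6]: {gap:.3f}")  # peaks near x = +/-3
```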

@tmabraham tmabraham commented Oct 5, 2019

Any update on this? @DrHB mentioned this over here and it would be nice if this repository was updated with this fix.

@lukemelas lukemelas commented Oct 24, 2019

The latest version of the repo (released a week ago) should fix this issue!

A memory-efficient swish function is now used by default. However, this swish function is incompatible with ONNX model export, so you also have the option of calling model.set_swish(memory_efficient=False).
