AssertionError: more than one group is unsupported on GPU #386

Open
zoeStartover opened this issue Jun 17, 2022 · 7 comments

@zoeStartover

I ran into an issue when using Conv2d in my model.

The assertion is as follows:
File "/home/data/anaconda3/anaconda/envs/mpc/lib/python3.7/site-packages/crypten/cuda/cuda_tensor.py", line 195, in __patched_conv_ops
), f"more than one group is unsupported on GPU (groups = {groups})"
AssertionError: more than one group is unsupported on GPU (groups = 256)

256 is the number of output channels of the Conv2d layer.
I define the layer as "self.conv3 = nn.Conv2d(128, 256, 5, 1, 2)".

How can I solve the problem?

@lvdmaaten
Member

Can you please provide a minimal repro so that I can reproduce this issue?

@sonnguyenasu

Hello,
I think the problem is with this line in the gradient calculation:

grad_kernel = input.conv2d(

Basically, on GPU you have to use groups=1, but this line in the conv2d gradient computation calls conv2d with groups > 1, which will cause this problem for every model that contains a convolution operation.
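
To make the connection concrete, here is a small plain-PyTorch sketch (not CrypTen's actual autograd code; the shapes and names are illustrative) of why the conv2d weight gradient is naturally expressed as a grouped convolution with groups > 1, which is exactly the kind of call that hits the GPU assertion:

import torch
import torch.nn.functional as F

B, Ci, Co, H, W, K = 2, 3, 4, 12, 12, 5
x = torch.randn(B, Ci, H, W, requires_grad=True)
w = torch.randn(Co, Ci, K, K, requires_grad=True)
grad_out = torch.randn(B, Co, H - K + 1, W - K + 1)

# Weight gradient via a grouped convolution: fold the batch into the channel
# dimension and convolve with groups = B * Ci (> 1 for any real model).
x_flat = x.detach().reshape(1, B * Ci, H, W)
g_flat = (
    grad_out.unsqueeze(1)
    .expand(B, Ci, Co, *grad_out.shape[2:])
    .reshape(B * Ci * Co, 1, *grad_out.shape[2:])
)
grad_w = (
    F.conv2d(x_flat, g_flat, groups=B * Ci)
    .reshape(B, Ci, Co, K, K)
    .sum(0)
    .permute(1, 0, 2, 3)
)

# Cross-check against autograd.
F.conv2d(x, w).backward(grad_out)
print(torch.allclose(grad_w, w.grad, atol=1e-4))  # True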

@zoeStartover
Author

Hello, I think the problem is with this line in the gradient calculation:

grad_kernel = input.conv2d(

Basically, on GPU you have to use groups=1, but this line in the conv2d gradient computation calls conv2d with groups > 1, which will cause this problem for every model that contains a convolution operation.

Unfortunately I no longer have the code snippet that caused this problem, so I didn't post it.
But I think you are right, thank you. Do you have any solution?

@Tobias512

Tobias512 commented Nov 24, 2022

Can you please provide a minimal repro so that I can reproduce this issue?

Hi,
I ran into the same issue. Here is some code that produces the error:

import torch
import torchvision
import crypten

class model_CNN(torch.nn.Module):
    def __init__(self):
        super(model_CNN, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 16, kernel_size=5, padding=0)
        self.fc1 = torch.nn.Linear(16 * 12 * 12, 100)
        self.fc2 = torch.nn.Linear(100, 10)
        
    def forward(self, x):
        out = self.conv1(x)
        out = torch.nn.functional.relu(out)
        out = torch.nn.functional.max_pool2d(out, 2)
        out = out.view(-1, 16 * 12 * 12)
        out = self.fc1(out)
        out = torch.nn.functional.relu(out)
        out = self.fc2(out)
        return out

if __name__ == "__main__":
    crypten.init()

    # load data
    train_data = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=torchvision.transforms.Compose([
         torchvision.transforms.ToTensor(),
         torchvision.transforms.Normalize((0.1307,), (0.3081,))
    ]))
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=100, shuffle=False, pin_memory=True)
    data, labels = next(iter(train_loader))

    data_enc = crypten.cryptensor(data).cuda()
    label_enc = crypten.cryptensor(torch.nn.functional.one_hot(labels)).cuda()

    # load model
    model_plaintext = model_CNN()
    dummy_input = torch.empty(size=(1, 1, 28, 28))
    model = crypten.nn.from_pytorch(model_plaintext, dummy_input)
    model.encrypt()
    model.cuda()

    loss_fn = crypten.nn.CrossEntropyLoss()

    output = model(data_enc)
    loss = loss_fn(output, label_enc)
    loss.backward()  # AssertionError occurs during backward pass of Conv2D layer

The AssertionError occurs during the backward pass of the Conv2d layer in the model. Models without Conv2d layers work without problems.

@Tobias512

Hi,
I found a fix that makes the backward pass work. You need to change this function in crypten/cuda/cuda_tensor.py:

def __patched_conv_ops(op, x, y, *args, **kwargs):
        if "groups" in kwargs:
            groups = kwargs["groups"]
            assert (
                groups == 1
            ), f"more than one group is unsupported on GPU (groups = {groups})"
            del kwargs["groups"]

        bs, c, *img = x.size()
        c_out, c_in, *ks = y.size()
        kernel_elements = functools.reduce(operator.mul, ks)

        nb = 3 if kernel_elements < 256 else 4
        nb2 = nb**2

        x_encoded = CUDALongTensor.__encode_as_fp64(x, nb).data
        y_encoded = CUDALongTensor.__encode_as_fp64(y, nb).data

        repeat_idx = [1] * (x_encoded.dim() - 1)
        x_enc_span = x_encoded.repeat(nb, *repeat_idx)
        y_enc_span = torch.repeat_interleave(y_encoded, repeats=nb, dim=0)

        x_enc_span = x_enc_span.transpose_(0, 1).reshape(bs, nb2 * c, *img)
        y_enc_span = y_enc_span.reshape(nb2 * c_out, c_in, *ks)

        c_z = c_out if op in ["conv1d", "conv2d"] else c_in

        z_encoded = getattr(torch, op)(
            x_enc_span, y_enc_span, *args, **kwargs, groups=nb2
        )
        z_encoded = z_encoded.reshape(bs, nb2, c_z, *z_encoded.size()[2:]).transpose_(
            0, 1
        )
        return CUDALongTensor.__decode_as_int64(z_encoded, nb)

The function needs to be changed as follows:

def __patched_conv_ops(op, x, y, *args, **kwargs):
        if "groups" in kwargs:
            groups = kwargs["groups"]
            del kwargs["groups"]
        else:
            groups = 1

        bs, c, *img = x.size()
        c_out, c_in, *ks = y.size()
        kernel_elements = functools.reduce(operator.mul, ks)

        nb = 3 if kernel_elements < 256 else 4
        nb2 = nb**2

        x_encoded = CUDALongTensor.__encode_as_fp64(x, nb).data
        y_encoded = CUDALongTensor.__encode_as_fp64(y, nb).data

        repeat_idx = [1] * (x_encoded.dim() - 1)
        x_enc_span = x_encoded.repeat(nb, *repeat_idx)
        y_enc_span = torch.repeat_interleave(y_encoded, repeats=nb, dim=0)

        x_enc_span = x_enc_span.transpose_(0, 1).reshape(bs, nb2 * c, *img)
        y_enc_span = y_enc_span.reshape(nb2 * c_out, c_in, *ks)

        c_z = c_out if op in ["conv1d", "conv2d"] else c_in

        z_encoded = getattr(torch, op)(
            x_enc_span, y_enc_span, *args, **kwargs, groups=(nb2 * groups)
        )
        z_encoded = z_encoded.reshape(bs, nb2, c_z, *z_encoded.size()[2:]).transpose_(
            0, 1
        )
        return CUDALongTensor.__decode_as_int64(z_encoded, nb)

I successfully trained models with this fix using a GPU, so I am pretty sure the backpropagation is calculated correctly.
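
A quick way to sanity-check such a patch directly (a hedged sketch, assuming a CUDA device and that conv2d on a cryptensor forwards the groups keyword to the CUDA backend) is to compare an encrypted grouped convolution against its plaintext counterpart; the difference should be on the order of the fixed-point encoding error:

import torch
import crypten

crypten.init()

x = torch.randn(1, 4, 8, 8, device="cuda")
w = torch.randn(8, 2, 3, 3, device="cuda")  # groups=2: the 4 input channels are split into 2 groups

expected = torch.conv2d(x, w, groups=2)
actual = crypten.cryptensor(x).conv2d(w, groups=2).get_plain_text()
print((expected - actual).abs().max())  # should be close to zero (fixed-point error only)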

@kwmaeng91

Are there any updates on this? I am experiencing the same problem, and @Tobias512's solution does not work for me. It gives me a dimension mismatch, which I haven't looked into closely yet.

@Tobias512

I found a better solution by looking at CryptGPU. They implement it as follows:

https://github.com/jeffreysijuntan/CryptGPU/blob/2ff57b2b4d718f9665f4b2ac8245b0bcd7e65165/crypten/cuda/cuda_tensor.py#L183-L217

I changed the CrypTen implementation to:

@staticmethod
def __patched_conv_ops(op, x, y, *args, **kwargs):
        if "groups" in kwargs:
            groups = kwargs["groups"]
            #del kwargs["groups"]
        else:
            groups = 1

        bs, c, *img = x.size()
        c_out, c_in, *ks = y.size()
        kernel_elements = functools.reduce(operator.mul, ks)

        nb = 3 if kernel_elements < 256 else 4
        nb2 = nb**2

        x_encoded = CUDALongTensor.__encode_as_fp64(x, nb).data
        y_encoded = CUDALongTensor.__encode_as_fp64(y, nb).data

        repeat_idx = [1] * (x_encoded.dim() - 1)
        x_enc_span = x_encoded.repeat(nb, *repeat_idx)
        y_enc_span = torch.repeat_interleave(y_encoded, repeats=nb, dim=0)

        x_enc_span = x_enc_span.transpose_(0, 1).reshape(bs, nb2 * c, *img)
        y_enc_span = y_enc_span.reshape(nb2 * c_out, c_in, *ks)

        c_z = c_out if op in ["conv1d", "conv2d"] else c_in

        if "groups" in kwargs:
            kwargs["groups"] *= nb2
        else:
            kwargs["groups"] = nb2

        z_encoded = getattr(torch, op)(
            x_enc_span, y_enc_span, *args, **kwargs
        )

        groups = kwargs["groups"] // nb2 if op in ["conv_transpose1d", "conv_transpose2d"] else 1
        z_encoded = z_encoded.reshape(bs, nb2, c_z * groups, *z_encoded.size()[2:]).transpose_(
            0, 1
        )

        return CUDALongTensor.__decode_as_int64(z_encoded, nb)

It is nearly the same code as before, but the groups argument is set differently depending on whether groups is in kwargs, and the final reshape accounts for groups in the transposed-convolution case. @kwmaeng91 I hope this really fixes the bug.
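
That grouped transposed-convolution path (the case the c_z * groups reshape is there for, and which the earlier patch appears not to handle) is worth checking separately. A hedged sketch along the same lines as before, assuming a CUDA device and that conv_transpose2d on a cryptensor forwards the groups keyword:

import torch
import crypten

crypten.init()

x = torch.randn(1, 4, 8, 8, device="cuda")
# conv_transpose2d weight layout: (in_channels, out_channels // groups, kH, kW)
w = torch.randn(4, 2, 3, 3, device="cuda")  # groups=2 -> 4 output channels

expected = torch.conv_transpose2d(x, w, groups=2)
actual = crypten.cryptensor(x).conv_transpose2d(w, groups=2).get_plain_text()
print((expected - actual).abs().max())  # should be close to zero (fixed-point error only)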
