
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc) #106

Open · aleemsidra opened this issue Jul 28, 2023 · 4 comments


aleemsidra commented Jul 28, 2023

Hi! I am trying to use LoRA for my convolution layers: self.conv = Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False). I used the LoRA counterpart of nn.Conv2d as lora.Conv2d(n_chans_in, n, self.kernel_size, padding=self.padding, bias=False, r=2, lora_alpha=2).

The shapes of the tensors are: x.shape = torch.Size([32, 1, 256, 256]), self.lora_B.shape = torch.Size([48, 6]), self.lora_A.shape = torch.Size([6, 3]).
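
For context, here is a minimal standalone sketch of this setup (not my actual model code): it assumes loralib is imported as lora and plugs in the values 1, 16, and 3 from the Conv2d call above, which reproduces the reported LoRA shapes.

import torch
import loralib as lora

# Standalone repro of the layer described above: in_channels=1, out_channels=16,
# kernel_size=3, padding=1, with LoRA rank r=2 and lora_alpha=2.
layer = lora.Conv2d(1, 16, 3, padding=1, bias=False, r=2, lora_alpha=2)

x = torch.randn(32, 1, 256, 256)
print(layer.lora_B.shape)  # torch.Size([48, 6])
print(layer.lora_A.shape)  # torch.Size([6, 3])
out = layer(x)             # forward uses conv.weight + (lora_B @ lora_A).view(...) * scaling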

The expression (self.lora_B @ self.lora_A).view(self.conv.weight.shape) raises the following error:

/Documents/Domain_Apatation/UDAS/src/LoRA/loralib/layers.py:315, in forward(self, x)
    312 if self.r > 0 and not self.merged:
    313     return self.conv._conv_forward(
--> 314         x, 
    315 
    316         self.conv.weight + (self.lora_B @ self.lora_A).view(self.conv.weight.shape) * self.scaling,
    317         self.conv.bias
    318     )
    319 return self.conv(x)

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

The number of columns in self.lora_B is 6 and the number of rows in self.lora_A is 6, which makes the matrix multiplication valid. But I still face this issue. Can you please help me resolve this bug?
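
As a sanity check on the shapes alone, here is a toy snippet with random tensors of the reported sizes (separate from my model; the names lora_B, lora_A, and delta are only illustrative):

import torch

lora_B = torch.randn(48, 6)
lora_A = torch.randn(6, 3)
delta = lora_B @ lora_A                 # shape (48, 3), i.e. 48 * 3 = 144 elements
print(delta.shape)                      # torch.Size([48, 3])
print(delta.view(16, 1, 3, 3).shape)    # torch.Size([16, 1, 3, 3]); 16*1*3*3 = 144, so view is valid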

@edwardjhu (Collaborator)

Can you try this operation on CPU to exclude GPU-related issues?

aleemsidra (Author) commented Jul 31, 2023

@edwardjhu I did that as:

lora_b = self.lora_B.detach().cpu()
lora_b.shape
(48, 6)
lora_a = self.lora_A.detach().cpu()
lora_a.shape
(6, 3)

Given these dimensions, lora_b @ lora_a is compatible for matrix multiplication.

self.conv.weight.shape
torch.Size([16, 1, 3, 3])

Then I tested the following, replacing view with reshape, and it worked:

a = self.conv._conv_forward(
    x.detach().cpu(),
    self.conv.weight.detach().cpu() + (self.lora_B.detach().cpu() @ self.lora_A.detach().cpu()).reshape(self.conv.weight.detach().cpu().shape) * self.scaling,
    self.conv.bias
)

I want to understand why this did not work on CUDA, since the inputs are all the same. I would like to run my computation on the GPU.
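
One way to narrow this down (a debugging sketch, not something from my training run): run just the product from the traceback on the GPU with synchronous kernel launches, since CUDA errors can be reported at a later call than the one that actually failed. CUDA_LAUNCH_BLOCKING only takes effect if set before CUDA is first used.

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"   # must be set before CUDA is initialized

import torch

lora_B = torch.randn(48, 6, device="cuda")
lora_A = torch.randn(6, 3, device="cuda")
delta = lora_B @ lora_A                    # the cublasSgemm call from the error
print(delta.view(16, 1, 3, 3).shape)       # torch.Size([16, 1, 3, 3])
torch.cuda.synchronize()                   # flush any deferred error so it surfaces here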

@edwardjhu (Collaborator)

Does reshape resolve the issue on GPU as well?
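
Something like this, keeping everything on the GPU (a sketch assuming it is run inside the layer's forward, with x and the parameters already on CUDA):

a = self.conv._conv_forward(
    x,
    self.conv.weight + (self.lora_B @ self.lora_A).reshape(self.conv.weight.shape) * self.scaling,
    self.conv.bias
)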

@aleemsidra (Author)

No.
