CUDA problems in causal linear product #58
Comments
Same issue here. When the data are put on a device other than the default GPU, the kernel returns all zeros. To reproduce the error:

```python
import torch
from fast_transformers.causal_product import causal_dot_product

q = k = v = torch.randn(5, 10, 10, 10).to(0)
print(causal_dot_product(q, k, v))  # this produces the right result
q = k = v = torch.randn(5, 10, 10, 10).to(1)
print(causal_dot_product(q, k, v))  # the output is all zeros
```
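When debugging issues like this, it helps to have a CPU reference to compare the CUDA kernel against. Below is a minimal pure-PyTorch sketch of the causal dot product (not part of the library; the function name `causal_dot_product_reference` is made up here). It materializes an `(E, E)` running sum per position, so it is only suitable for small tensors used in cross-checks.

```python
import torch

def causal_dot_product_reference(q, k, v):
    """Pure-PyTorch reference for the causal linear attention product:

        out[n, h, i, :] = sum_{j <= i} (q[n, h, i] . k[n, h, j]) * v[n, h, j]

    Intended only for cross-checking a fast kernel on small inputs.
    """
    # Outer products k_j v_j^T, then a causal prefix sum over the length dim.
    kv = torch.einsum("nhle,nhlm->nhlem", k, v).cumsum(dim=2)
    # Contract each query with its prefix sum of key-value outer products.
    return torch.einsum("nhle,nhlem->nhlm", q, kv)
```

Running this on the CPU copies of `q`, `k`, `v` and comparing with `torch.allclose` against the GPU output makes it easy to see which device produces wrong results.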
Hi @angeloskath!
@katie-cathy-hunt I will push a fix today. Sorry this took so long. Cheers,
@angeloskath |
@angeloskath I just rebuilt my environment to try your patch, but I am running into a new issue.
I can import fast_transformers, but if I try to import fast_transformers.causal_product I get the same error. I verified I had pulled your fix and that it is in the environment.
There are no errors in the build/install log.
Hmm, that is weird. What did you do to rebuild? Could I bother you to do a (Next step should be to provide prebuilt binaries for common setups to avoid all these issues.)
I thought I may have induced the error myself: I am using a conda environment with CUDA installed via conda, which only installs the shared libraries, not nvcc. Looking through your setup.py, it doesn't produce an error or message if it doesn't find nvcc. I then loaded the module to add CUDA 11 (the same version PyTorch is compiled against) into my path and verified it. I removed the build and dist directories, and still no luck.
This is on RHEL 8.2, Python 3.7.9, PyTorch 1.7.1
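Since a missing or mismatched nvcc silently produces a CPU-only build, a quick sanity check comparing the toolkit on `PATH` against the CUDA version PyTorch was built with can save a rebuild cycle. This is a small sketch (the helper names `parse_nvcc_release` and `toolkit_matches_torch` are made up for illustration):

```python
import re
import shutil
import subprocess

def parse_nvcc_release(output):
    """Extract the toolkit release (e.g. '11.0') from `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", output)
    return m.group(1) if m else None

def toolkit_matches_torch():
    """Return (nvcc_release, torch_cuda); either is None if unavailable."""
    nvcc = shutil.which("nvcc")
    release = None
    if nvcc:
        out = subprocess.run([nvcc, "--version"],
                             capture_output=True, text=True).stdout
        release = parse_nvcc_release(out)
    try:
        import torch
        torch_cuda = torch.version.cuda
    except ImportError:
        torch_cuda = None
    return release, torch_cuda
```

If the two versions differ (or `nvcc` is `None`), the extension will either fail to build or build without CUDA support.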
@angeloskath I apologize, everything is working correctly. I had started a Python REPL in the fast-transformers source directory after the install, so Python was picking up the local fast_transformers subdirectory first instead of the installed package. My mistake!
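This kind of shadowing is easy to detect by checking where an import would actually resolve from before debugging further. A minimal sketch (the helper name `describe_import` is made up here):

```python
import importlib.util
import os

def describe_import(name):
    """Report where `import name` would resolve from, to catch a local
    source tree shadowing the installed package. Returns (path, shadowed)
    or None if the module cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    origin = os.path.abspath(spec.origin)
    # If the origin lives under the current directory, a local copy wins.
    shadowed = origin.startswith(os.getcwd() + os.sep)
    return origin, shadowed
```

Calling `describe_import("fast_transformers")` from the source checkout would have shown the local path rather than the site-packages install.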
Hi,
My machine has 4 GPUs, but when I use GPU 1 (the default GPU being 0), I find the CUDA code is still computed on GPU 0. Also, the code cannot run when I use multiple GPUs at once; there is an out-of-memory error.