
Causal model running on GPU #7

Closed
Warvito opened this issue Oct 26, 2020 · 7 comments

@Warvito

Warvito commented Oct 26, 2020

Hi, I am trying to run the LM model with causal = True on the GPU, but I am running into an error.

I am trying to run the following example:

import torch
from torch import nn
from performer_pytorch import PerformerLM

model = PerformerLM(
    num_tokens = 20000,
    max_seq_len = 2048,             # max sequence length
    dim = 512,                      # dimension
    depth = 6,                      # layers
    heads = 8,                      # heads
    causal = True,                  # auto-regressive or not
    nb_features = 256,              # number of random features, if not set, will default to (d * log(d)), where d is the dimension of each head
    generalized_attention = False,  # defaults to softmax approximation, but can be set to True for generalized attention
    kernel_fn = nn.ReLU(),          # the kernel function to be used, if generalized attention is turned on, defaults to Relu
    reversible = True,              # reversible layers, from Reformer paper
    ff_chunks = 10,                 # chunk feedforward layer, from Reformer paper
).cuda()

x = torch.randint(0, 20000, (1, 2048)).cuda()
model(x) # (1, 2048, 20000)

And I am getting this error:

Traceback (most recent call last):
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-a530c03a976e>", line 20, in <module>
    model(x) # (1, 2048, 20000)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/performer_pytorch.py", line 253, in forward
    x = self.performer(x, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/performer_pytorch.py", line 238, in forward
    return self.net(x, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/reversible.py", line 160, in forward
    out =  _ReversibleFunction.apply(x, blocks, args)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/reversible.py", line 113, in forward
    x = block(x, **kwarg)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/reversible.py", line 65, in forward
    y1 = x1 + self.f(x2, record_rng=self.training, **f_args)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/reversible.py", line 40, in forward
    return self.net(*args, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/performer_pytorch.py", line 170, in forward
    return self.fn(self.norm(x), **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/performer_pytorch.py", line 216, in forward
    out = self.fast_attention(q, k, v)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/performer_pytorch.py", line 159, in forward
    out = attn_fn(q, k, v)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/performer_pytorch/performer_pytorch.py", line 110, in causal_linear_attention
    return CausalDotProduct.apply(q, k, v)
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/fast_transformers/causal_product/__init__.py", line 48, in forward
    product
TypeError: 'NoneType' object is not callable

My system has:
TITAN RTX
CUDA Version: 10.2
Driver Version: 440.100

@lucidrains
Owner

lucidrains commented Oct 26, 2020

@Warvito ahh, so not often spoken about is the fact that the auto-regressive flavor of linear attention actually incurs a pretty big memory cost (× sequence length) and requires special CUDA code to be performant (it is probably why Google chose to do this in Jax)

EPFL wrote up a nice implementation, but I think it is somehow failing to be imported on your machine: https://github.com/idiap/fast-transformers/blob/master/fast_transformers/causal_product/__init__.py#L12
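For reference, the causal variant boils down to a running prefix sum over the outer products k_j v_jᵀ, which that CUDA kernel fuses into one pass. Here is a minimal single-head sketch in plain PyTorch (illustrative names; this is a slow reference loop, not the library's implementation, and it assumes positive feature maps so the denominator never vanishes):

```python
import torch

def causal_linear_attention_naive(q, k, v):
    """Sequential reference for causal linear attention (single head).
    q, k: (n, d) positive feature maps; v: (n, e).
    out_i = (q_i @ S_i) / (q_i @ z_i), where S_i = sum_{j<=i} k_j v_j^T
    and z_i = sum_{j<=i} k_j. The O(n) sequential loop over positions is
    what requires a custom CUDA kernel to be fast."""
    n, d = q.shape
    S = q.new_zeros(d, v.shape[1])  # running sum of outer products
    z = q.new_zeros(d)              # running sum of keys (normalizer)
    out = []
    for i in range(n):
        S = S + torch.outer(k[i], v[i])
        z = z + k[i]
        out.append((q[i] @ S) / (q[i] @ z))
    return torch.stack(out)
```

At position 0 the output reduces to v[0] exactly (numerator and denominator share the factor q_0 · k_0), which makes the recurrence easy to sanity-check.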

@lucidrains
Owner

@Warvito could you try opening a Python interactive session and running

> import fast_transformers.causal_product.causal_product_cuda

and see what happens?
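A slightly more defensive version of that check (a sketch; the module path comes from fast-transformers, the function name is illustrative):

```python
import importlib

def cuda_extension_importable(module_name):
    """Return True if the compiled extension module can be imported,
    False if it is missing, i.e. it was never built at install time."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False

# False here would be consistent with the "'NoneType' object is not
# callable" failure above: the guarded import in fast-transformers
# appears to leave the kernel as None when the extension is missing.
print(cuda_extension_importable(
    "fast_transformers.causal_product.causal_product_cuda"))
```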

@Warvito
Author

Warvito commented Oct 26, 2020

@lucidrains Thank you for the quick reply.

I tried the command you suggested and got the following error:

Traceback (most recent call last):
  File "/home/walter/Desktop/minGPT/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-87c39b6d500c>", line 1, in <module>
    import fast_transformers.causal_product.causal_product_cuda
  File "/home/walter/pycharm-2020.1.1/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'fast_transformers.causal_product.causal_product_cuda'

I have version 0.3.0 installed here, and it works as expected when using causal=False.
I tried to uninstall pytorch-fast-transformers and install it again, but it did not work.

I also had the chance to try it on a system with a V100 and CUDA 11, and it worked as expected.
I also tried it on Google Colab with a Tesla T4 and CUDA 10.1, and it worked as expected. Maybe it is something related to the RTX architecture? In any case, it might be an issue in pytorch-fast-transformers.

Thank you again for the quick reply, and thank you very much for all your repositories. ^^

@lucidrains
Owner

@Warvito I'm in the dark as much as you are :( I have been putting off custom CUDA code for as long as I could, but the results of this paper were irresistible

@arti32lehtonen

I had the same issue. I am not sure exactly what worked for me, but after the following steps, training with causal=True is working.

My steps:

  1. Add CUDA to the PATH variable:
     export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
  2. Set LD_LIBRARY_PATH:
     export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  3. Create a new environment and install fast-transformers as described in idiap/fast-transformers#23 (comment)
  4. Install performer-pytorch after that
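Put together, the steps above look roughly like this shell session (paths assume CUDA 10.1 under /usr/local, as in the export lines above; adjust to your install):

```shell
# 1. Put nvcc on the PATH
export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}

# 2. Make the CUDA runtime libraries visible to the linker
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Sanity check: this should now print the CUDA release
nvcc --version

# 3. In a fresh environment, reinstall fast-transformers so the
#    causal_product_cuda extension gets compiled against this toolchain
pip install --no-cache-dir pytorch-fast-transformers

# 4. Then install performer-pytorch
pip install performer-pytorch
```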

@yygle

yygle commented Nov 6, 2020

@arti32lehtonen is right, make sure the C++ toolchain (gcc) and the CUDA toolchain (nvcc) are available in your environment. If not, use the export commands to make them visible (try "nvcc --version" after that), then reinstall the package.

@Warvito
Author

Warvito commented Nov 12, 2020

Thx @arti32lehtonen and @yygle !
I tried your suggestions and it worked!

@Warvito Warvito closed this as completed Nov 12, 2020