RuntimeError: CUDA error: invalid argument when running tests/attention/test_improved_clustered_transformer_gpu.py #42
Comments
Hi,
d_model should be divisible by the number of attention heads. Could you try the following code and see if it works for you?
Thanks,
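The snippet referred to above is not preserved in this thread. As a rough illustration of the divisibility requirement being described, a minimal check might look like the following (the helper name `per_head_dim` is hypothetical, not part of the library):

```python
# Hypothetical helper: verify that d_model splits evenly across
# attention heads before building the transformer.
def per_head_dim(d_model, n_heads):
    """Return the per-head query dimension, or fail loudly."""
    if d_model % n_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by n_heads={n_heads}"
        )
    return d_model // n_heads
```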
When d_model is 2048 it also hits the error. I found that whenever d_model is larger than 1540, the script raises the runtime error. You can change query_dim in your script to 2048 to reproduce the case.
Hi,
You are right, I also get an error when I set query_dim=2048. However, note that d_model is not equivalent to query_dim: query_dim refers to the query embedding size of a single attention head. That said, there are two sources of the bug, both arising from the shared-memory optimization in our CUDA implementations. I will write down the constraints imposed by each.
The hashing kernel is used by both clustered and improved-clustered attention. The sparse dot product is only used by improved-clustered attention. Note that the constraint only needs to hold for a single attention head, so you can still set n_heads to anything to reach a much higher d_model. For instance, setting query_dim = 192 and n_heads = 30 gives a d_model of 5760 and runs without error. Is this only for testing, or do you need the query_dim of each head to be that high? I will keep the bug open to provide fixes for some of these, or at least graceful exits.
Thanks,
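The workaround above can be sketched numerically. The values are taken directly from the reply (nothing here touches the CUDA kernels themselves); the point is that the per-head query_dim stays modest while n_heads drives the overall d_model:

```python
# Reaching a large d_model without a large per-head query_dim:
# the kernel constraint applies per head, so raising n_heads is safe.
query_dim = 192   # per-head query embedding size the reply says works
n_heads = 30
d_model = query_dim * n_heads
print(d_model)    # 5760
```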
Thank you for your reply.
I wouldn't advise changing the sparse dot product as a first step. Instead, I would introduce a linear layer that projects from 2048 down to d_model, where d_model = query_dim * n_heads. I have provided an example of using the transformer on top of arbitrary features:
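The example mentioned above is not reproduced in this thread. As a minimal NumPy sketch of the idea (all dimensions here are illustrative, and a real model would use a learned torch.nn.Linear rather than a fixed random matrix):

```python
import numpy as np

# Project 2048-dimensional input features down to
# d_model = query_dim * n_heads, then split into heads.
query_dim, n_heads = 64, 8
d_model = query_dim * n_heads             # 512

rng = np.random.default_rng(0)
W = rng.standard_normal((2048, d_model))  # stand-in for a learned weight

x = rng.standard_normal((2, 475, 2048))   # (batch, seq_len, features)
projected = x @ W                         # (batch, seq_len, d_model)
heads = projected.reshape(2, 475, n_heads, query_dim)  # per-head split
```

After the projection, each head sees only its own query_dim-sized slice, which is what keeps the CUDA kernels within their per-head limits.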
Let me know if this works or if you have other questions.
--Apoorv
Many thanks for your reply.
Hi,
I will try to help. Given that your code runs, only a few things could be going wrong.
I suspect this is the cause of the issue. If it is, I would advise using
For other practical tips, or an example of improved-clustered attention on a toy task, you can now look at the Colab notebook we provide. I could help more if you shared the transformer architecture with some dummy input, either in a Colab notebook or here.
Thanks,
I am assuming that this was resolved, so I will close the issue. Feel free to reopen it.
Thanks,
I have changed some hyperparameters of test_improved_clustered_transformer_gpu.py as shown in the following figure.
When the input length is 475 and d_model is larger than 1540, the script raises "RuntimeError: CUDA error: invalid argument."
Could you tell me why this happens?