RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED #5

Open
hxzwqw opened this issue Sep 9, 2022 · 1 comment

Comments

@hxzwqw

hxzwqw commented Sep 9, 2022

COMMAND: python main.py --encoder cnn --decoder rnn --encoder-dropout 0.05 --decoder-dropout 0.2

Namespace(batch_size=1, cuda=True, decoder='rnn', decoder_dropout=0.2, decoder_hidden=256, dims=6, dynamic_graph=True, edge_types=4, encoder='cnn', encoder_dropout=0.05, encoder_hidden=256, epochs=500, factor=True, gamma=0.5, hard=True, load_folder='', lr=0.0005, lr_decay=200, no_cuda=False, no_factor=False, num_residues=77, number_exp=56, number_expstart=0, prediction_steps=1, prior=True, save_folder='logs', seed=42, skip_first=True, temp=0.5, timesteps=50, var=5e-05)
Testing with dynamically re-computed graph.
Using factor graph CNN encoder.
Using learned recurrent interaction net decoder.
Using prior
[0.91 0.03 0.03 0.03]
Start Training...
Traceback (most recent call last):
  File "main.py", line 419, in <module>
    epoch, best_val_loss)
  File "main.py", line 209, in train
    logits = encoder(data, rel_rec, rel_send)
  File "/home/hxz/anaconda3/envs/netw_2.3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hxz/Mdel/NRI-MD/NRI-MD/modules.py", line 213, in forward
    edges = self.node2edge_temporal(inputs, rel_rec, rel_send)
  File "/media/hxz/Mdel/NRI-MD/NRI-MD/modules.py", line 182, in node2edge_temporal
    receivers = torch.matmul(rel_rec, x)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

Description:
I have tried the default PDB file and even the default settings (without the cnn encoder or rnn decoder options), and the error always occurs.
I also tried to follow the traceback and debug it, but the trail seems endless.
I would be grateful if someone or the author could give me some advice.

@DivingWhale

I ran into the same problem. Did you solve it?
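
One way to narrow this down is sketched below. It is not a confirmed fix. A commonly reported cause of CUBLAS_STATUS_EXECUTION_FAILED is a PyTorch build compiled for a CUDA version that does not support the installed GPU, but nothing in the log above confirms that this is the cause here. The script prints the PyTorch version, the CUDA version it was built for, and the GPU's compute capability. It then runs a small float32 matmul on the CPU and, if available, on the GPU. The function names and tensor shapes are made up for illustration and do not come from NRI-MD.

```python
import importlib.util


def collect_cuda_report():
    """Gather version and device facts that often matter for cuBLAS failures."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


def matmul_smoke_test(device):
    """Run a small matmul of the same kind as the failing call on `device`."""
    import torch

    # Hypothetical shapes, chosen only to exercise a broadcast float32 matmul.
    rel_rec = torch.rand(12, 4, device=device)
    x = torch.rand(2, 4, 8, device=device)
    out = torch.matmul(rel_rec, x)
    if device != "cpu":
        # Force any asynchronous CUDA error to surface here.
        torch.cuda.synchronize()
    return tuple(out.shape)


if __name__ == "__main__":
    report = collect_cuda_report()
    for key, value in report.items():
        print(f"{key}: {value}")
    if report["torch_installed"]:
        print("cpu matmul shape:", matmul_smoke_test("cpu"))
        if report.get("cuda_available"):
            print("cuda matmul shape:", matmul_smoke_test("cuda"))
```

If the CPU matmul succeeds but the GPU one fails with the same cuBLAS error, the problem is probably in the PyTorch/CUDA/driver installation and not in main.py or the input data. Rerunning the original command with the environment variable CUDA_LAUNCH_BLOCKING=1 makes CUDA kernel launches synchronous, so the traceback points at the call that actually failed.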
