Thanks for your work! Please comment: when training, another error is reported. #2

Open
showfaker66 opened this issue Jan 6, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@showfaker66

RuntimeError: CUDA error: device-side assert triggered.

    for i, data in enumerate(dataloader):
        inputs, labels = data
        # inputs, labels = Variable(inputs), Variable(labels)-1
        inputs = inputs.squeeze(0).to(device)
        labels = labels.to(device, dtype=torch.long)

        optimizer.zero_grad()
        outputs = model(inputs).expand(1, -1, -1)

        loss = criterion(outputs[0], labels[0])

@matyasbohacek
Owner

Thank you for reporting! Could you please provide the full error trace? (It is best to run with the CUDA_LAUNCH_BLOCKING=1 environment variable set, so that low-level CUDA errors are reported at the call that actually triggered them.)
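
One way to set the flag mentioned above is from inside the script itself, before PyTorch makes its first CUDA call. This is only a sketch: it assumes the lines are placed at the very top of the training script (train.py in this thread), ahead of `import torch`.

```python
import os

# Set this before the first CUDA call; the simplest place is the very top of
# the training script, ahead of `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... `import torch` and the rest of the training script would follow here ...
```

Setting the variable in the shell before launching works just as well (on Windows cmd: `set CUDA_LAUNCH_BLOCKING=1`, then `python train.py`). Either way, kernel launches become synchronous, so the Python traceback points at the operation that failed rather than at a later call.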

@matyasbohacek matyasbohacek self-assigned this Jan 27, 2022
@matyasbohacek matyasbohacek added the bug Something isn't working label Jan 27, 2022
@showfaker66
Author

Thank you for your reply! The complete error appears below.

    C:/cb/pytorch_1000000000000/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
    Traceback (most recent call last):
      File "train.py", line 274, in <module>
        train(args)
      File "train.py", line 174, in train
        train_loss, _, _, train_acc = train_epoch(slrt_model, train_loader, cel_criterion, sgd_optimizer, device)
      File "I:\action_recognition\spoter-main-hand-sign\spoter\utils.py", line 25, in train_epoch
        loss.backward()
      File "D:\anaconda\envs\ctpgr\lib\site-packages\torch\tensor.py", line 245, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "D:\anaconda\envs\ctpgr\lib\site-packages\torch\autograd\__init__.py", line 145, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: CUDA error: device-side assert triggered
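
The assertion on the first line of the trace, t >= 0 && t < n_classes, says that some target passed to the loss was not a valid class index. Below is a minimal sketch of the same check in plain Python, which can run on the host before a batch is sent to the GPU. The helper name find_invalid_targets and the sample values are hypothetical and not part of the repository.

```python
def find_invalid_targets(targets, n_classes):
    """Return (position, value) pairs for targets outside [0, n_classes)."""
    return [(i, t) for i, t in enumerate(targets) if not (0 <= t < n_classes)]


# Hypothetical labels for a 100-class problem; -1 and 100 are out of range.
targets = [0, 42, -1, 99, 100]
bad = find_invalid_targets(targets, n_classes=100)
print(bad)  # [(2, -1), (4, 100)]
```

With a label tensor, the values could be passed in as `labels.flatten().tolist()`. A non-empty result points to a problem in the labels rather than in the model.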

@muhammad-ahmed-ghani

@matyasbohacek Hi, have you resolved this error? I am also getting the same error.

@RodGal-2020

I'm having the same problem. Is there a solution for it?

@RodGal-2020

Hey, I have found a solution!

Go to datasets/czech_slr_dataset.py, and around line 105, find the following:

    label = torch.Tensor([self.labels[idx] - 1])

That -1 is the cause of our problems: with WLASL100, labels go from 0 to 99, so for a sample of class 0 the CzechSLRDataset class returns something like tensor([[-1]]), and there is no class labelled -1. This explains the CUDA error and the failed assertion t >= 0 && t < n_classes.

Taking that into account, the following fix worked for me:

    label = torch.Tensor([self.labels[idx]])  # Just drop the "-1"

Hope this helps! :D
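
If the same loader has to handle both label conventions, the offset can be derived from the data instead of hard-coded. The sketch below assumes that the original "- 1" was written for a dataset whose labels start at 1 (a guess, not confirmed in this thread) and that the class with the smallest label actually occurs in the data. label_offset is a hypothetical helper, not part of the repository.

```python
def label_offset(raw_labels):
    """Return the amount to subtract so that the smallest label becomes 0."""
    smallest = min(raw_labels)
    if smallest not in (0, 1):
        raise ValueError(f"unexpected smallest label: {smallest}")
    return smallest


# WLASL100-style labels already start at 0, so nothing is subtracted.
zero_based = list(range(0, 100))
# Hypothetical 1-based labels, which is what the original "- 1" would suit.
one_based = list(range(1, 101))

print(label_offset(zero_based))  # 0
print(label_offset(one_based))   # 1

# Either way the shifted labels span 0..99.
shifted = [label - label_offset(one_based) for label in one_based]
print(min(shifted), max(shifted))  # 0 99
```

The offset would be computed once when the dataset is loaded and then subtracted from each label where the dataset class builds its label tensor.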
