Open
Description
Hello,
Running the inference code produces no error,
but running the training code does.
import torch
import torchvision
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
BATCH = 32
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)
loss.backward()
optimizer.step()
When I run the code above, the call to prepare produces the following message:
TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
With rtol=1e-05 and atol=1e-05, found 297 element(s) (out of 320) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.0005762577056884766 (-0.9435920119285583 vs. -0.9441682696342468), which occurred at index (15, 3).
Since I get the error shown above, I would like to ask which part of my code is wrong.
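For reference, the numbers in the message can be checked by hand. The sketch below assumes the tracer's comparison follows the usual allclose-style rule, |actual - expected| <= atol + rtol * |expected|; that rule, the choice of which value counts as "expected", and the helper name `exceeds_tolerance` are assumptions made for illustration and are not stated in the message itself.

```python
def exceeds_tolerance(actual, expected, rtol=1e-5, atol=1e-5):
    """Return (diff, allowed, exceeded) for an allclose-style comparison.

    Assumed rule: the pair passes when |actual - expected| <= atol + rtol * |expected|.
    """
    diff = abs(actual - expected)
    allowed = atol + rtol * abs(expected)
    return diff, allowed, diff > allowed


# The two values reported at index (15, 3) in the warning.
diff, allowed, exceeded = exceeds_tolerance(-0.9435920119285583, -0.9441682696342468)
print(f"diff={diff:.3e} allowed={allowed:.3e} exceeded={exceeded}")
```

Under that assumed rule the largest reported difference (about 5.8e-4) is roughly 30 times the allowed margin (about 1.9e-5). The 320 compared elements match BATCH x num_classes = 32 x 10, the shape of the model output.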
Environment:
Ubuntu : 18.04
Linux kernel : 5.4.0
PyTorch : 1.7.0
Python : 3.7.10
cuDNN and CUDA are the versions required by Nimble.
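The exact CUDA and cuDNN versions are not listed above. A minimal sketch for printing them together with the other entries is shown here; the helper name `collect_environment` is hypothetical, and the PyTorch entries are skipped when torch cannot be imported. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which may differ from the system toolkit.

```python
import platform


def collect_environment():
    """Gather version strings for a bug report as a dict of label -> value."""
    info = {
        "OS": platform.platform(),
        "Kernel release": platform.release(),
        "Python": platform.python_version(),
    }
    try:
        import torch
    except ImportError:
        # torch is not installed; report only the system-level entries.
        return info
    info["PyTorch"] = torch.__version__
    info["CUDA (PyTorch build)"] = str(torch.version.cuda)
    info["cuDNN"] = str(torch.backends.cudnn.version())
    return info


for name, value in collect_environment().items():
    print(f"{name} : {value}")
```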