Closed
Labels: high priority, module: autograd, module: nn, module: regression, triaged
Description
🐛 Bug
The gradient is incorrect for torch.nn.functional.nll_loss on CUDA.
To Reproduce
import torch
from torch.autograd import gradcheck
from torch.nn.functional import nll_loss, log_softmax
device = "cuda"
reduction = "mean"
torch.manual_seed(0)
input = log_softmax(torch.rand((1, 2), device=device, dtype=torch.float64), dim=1).requires_grad_(True)
target = torch.randint(0, 2, (1,), device=device, dtype=torch.int64)
gradcheck(lambda input, target: nll_loss(input, target, reduction=reduction), (input, target))
Running this raises:
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[-1.0133],
[ 0.0000]], device='cuda:0', dtype=torch.float64)
analytical:tensor([[-1.],
[ 0.]], device='cuda:0', dtype=torch.float64)
The failure only happens on CUDA and only if reduction="mean" (the default) or reduction="sum" is selected. reduction="none" uses a different code path than the other two and does not show the behavior.
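For reference, a minimal sketch (not part of the original report) that expands the repro above across devices and reductions to confirm this pattern; it assumes a CUDA-capable build is available:
import torch
from torch.autograd import gradcheck
from torch.nn.functional import nll_loss, log_softmax

torch.manual_seed(0)
base = torch.rand((1, 2), dtype=torch.float64)
target_cpu = torch.randint(0, 2, (1,), dtype=torch.int64)

for device in ("cpu", "cuda"):
    for reduction in ("mean", "sum", "none"):
        inp = log_softmax(base.to(device), dim=1).requires_grad_(True)
        target = target_cpu.to(device)
        ok = gradcheck(
            lambda i, t: nll_loss(i, t, reduction=reduction),
            (inp, target),
            raise_exception=False,  # report pass/fail instead of raising
        )
        print(f"{device:4} reduction={reduction:5} gradcheck {'passed' if ok else 'FAILED'}")
Under the report's claim, only the cuda/mean and cuda/sum combinations should fail.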
Expected behavior
torch.nn.functional.nll_loss should pass the gradient check. This is a regression: the check passes with torch==1.9.0.
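For context on what the correct gradient looks like: with reduction="mean" (and no class weights or ignore_index), nll_loss is just the negative mean of the selected log-probabilities, so the gradient with respect to the input is -1/N at each sample's target class and 0 elsewhere; that matches the analytical [-1., 0.] column in the repro above, where N=1. A small sketch checking against this hand-written reference (the helper reference_nll_grad is hypothetical, not from the report):
import torch
from torch.nn.functional import nll_loss

def reference_nll_grad(inp, target):
    # d(loss)/d(inp[i, c]) = -1/N if c == target[i], else 0
    # (reduction="mean", no class weights, no ignore_index)
    grad = torch.zeros_like(inp)
    grad[torch.arange(inp.shape[0]), target] = -1.0 / inp.shape[0]
    return grad

torch.manual_seed(0)
inp = torch.randn(4, 3, dtype=torch.float64, requires_grad=True)
target = torch.randint(0, 3, (4,))
nll_loss(inp, target, reduction="mean").backward()
print(torch.allclose(inp.grad, reference_nll_grad(inp, target)))  # expected: True (passes on CPU)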
Environment
This was detected in master CI while adding an OpInfo for nll_loss in #63854.
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @albanD @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7 @mruberry