
Gradient is incorrect for torch.nn.functional.nll_loss for CUDA #64163

@pmeier

Description


🐛 Bug

Gradient is incorrect for torch.nn.functional.nll_loss for CUDA.

To Reproduce

import torch
from torch.autograd import gradcheck
from torch.nn.functional import nll_loss, log_softmax

device = "cuda"
reduction = "mean"

torch.manual_seed(0)
# Single sample with two classes; float64 keeps gradcheck's finite differences accurate.
input = log_softmax(torch.rand((1, 2), device=device, dtype=torch.float64), dim=1).requires_grad_(True)
target = torch.randint(0, 2, (1,), device=device, dtype=torch.int64)

gradcheck(lambda input, target: nll_loss(input, target, reduction=reduction), (input, target))

This fails with:

torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[-1.0133],
        [ 0.0000]], device='cuda:0', dtype=torch.float64)
analytical:tensor([[-1.],
        [ 0.]], device='cuda:0', dtype=torch.float64)

The failure only happens on CUDA, and only if reduction="mean" (the default) or reduction="sum" is selected. reduction="none" uses a different code path than the other two and does not exhibit the behavior.
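A minimal sketch (not part of the original report) that runs the same check over all device/reduction combinations to confirm which ones are affected:

import torch
from torch.autograd import gradcheck
from torch.nn.functional import nll_loss, log_softmax

for device in ("cpu", "cuda"):
    for reduction in ("mean", "sum", "none"):
        torch.manual_seed(0)
        input = log_softmax(torch.rand((1, 2), device=device, dtype=torch.float64), dim=1).requires_grad_(True)
        target = torch.randint(0, 2, (1,), device=device, dtype=torch.int64)
        # raise_exception=False makes gradcheck return False instead of raising,
        # so the loop can report every combination.
        passed = gradcheck(
            lambda input, target: nll_loss(input, target, reduction=reduction),
            (input, target),
            raise_exception=False,
        )
        # Per the report, only (cuda, mean) and (cuda, sum) are expected to fail.
        print(f"{device:>4} {reduction:>4}: {'pass' if passed else 'FAIL'}")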

Expected behavior

torch.nn.functional.nll_loss should pass the gradient check. This is a regression: the check passes with torch==1.9.0.

Environment

This was detected in master CI while adding an OpInfo for nll_loss in #63854.

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @albanD @gqchen @pearu @nikitaved @soulitzer @lezcano @Varal7 @mruberry


Labels

high priority
module: autograd (Related to torch.autograd, and the autograd engine in general)
module: nn (Related to torch.nn)
module: regression (It used to work, and now it doesn't)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
