
JIT-related failures using Core nightly 20220426 #5881

@datumbox

🐛 Describe the bug

The latest main branch fails the cmake_* jobs with the following error:

+ ./test_frcnn_tracing
Loading model
Model loaded
terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/torchvision/models/detection/faster_rcnn.py", line 93, in forward
    roi_heads = self.roi_heads
    image_sizes = images0.image_sizes
    _26 = (roi_heads).forward(features0, proposals, image_sizes, targets4, )
           ~~~~~~~~~~~~~~~~~~ <--- HERE
    detections, detector_losses, = _26
    transform0 = self.transform
  File "code/__torch__/torchvision/models/detection/roi_heads.py", line 341, in forward
      labels8 = torch.index(labels7, _104)
      nms_thresh = self.nms_thresh
      keep0 = _90(boxes4, scores3, labels8, nms_thresh, )
              ~~~ <--- HERE
      detections_per_img = self.detections_per_img
      keep1 = torch.slice(keep0, 0, None, detections_per_img)
  File "code/__torch__/torchvision/ops/boxes.py", line 49, in batched_nms
    _14 = False
  if _14:
    _17 = _9(boxes, scores, idxs, iou_threshold, )
          ~~ <--- HERE
    _16 = _17
  else:
  File "code/__torch__/torchvision/ops/boxes.py", line 74, in _batched_nms_vanilla
    iou_threshold: float) -> Tensor:
  _26 = __torch__.torch.functional._return_output
  keep_mask = torch.zeros_like(scores, dtype=11)
              ~~~~~~~~~~~~~~~~ <--- HERE
  _27 = _26(idxs, True, False, False, None, )
  for _28 in range(torch.len(_27)):

Traceback of TorchScript, original code (most recent call last):
  File "/root/project/torchvision/models/detection/generalized_rcnn.py", line 105, in forward
            features = OrderedDict([("0", features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
                                      ~~~~~~~~~~~~~~ <--- HERE
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)  # type: ignore[operator]
    
  File "/root/project/torchvision/models/detection/roi_heads.py", line 716, in forward
    
            # non-maximum suppression, independently done per class
            keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)
                   ~~~~~~~~~~~~~~~~~~~ <--- HERE
            # keep only topk scoring predictions
            keep = keep[: self.detections_per_img]
  File "/root/project/torchvision/ops/boxes.py", line 72, in batched_nms
    # https://github.com/pytorch/vision/issues/1311#issuecomment-781329339
    if boxes.numel() > (4000 if boxes.device.type == "cpu" else 20000) and not torchvision._is_tracing():
        return _batched_nms_vanilla(boxes, scores, idxs, iou_threshold)
               ~~~~~~~~~~~~~~~~~~~~ <--- HERE
    else:
        return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
  File "/root/project/torchvision/ops/boxes.py", line 105, in _batched_nms_vanilla
) -> Tensor:
    # Based on Detectron2 implementation, just manually call nms() on each class independently
    keep_mask = torch.zeros_like(scores, dtype=torch.bool)
                ~~~~~~~~~~~~~~~~ <--- HERE
    for class_id in torch.unique(idxs):
        curr_indices = torch.where(idxs == class_id)[0]
RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1650956998902/work/torch/csrc/autograd/functions/utils.h":65, please report a bug to PyTorch. 

packaging/build_cmake.sh: line 106:  2593 Aborted                 (core dumped) ./test_frcnn_tracing

Exited with code exit status 134
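
The failing frame is the keep_mask = torch.zeros_like(scores, dtype=torch.bool) allocation in _batched_nms_vanilla. Below is a minimal sketch of the pattern the interpreter ends up executing; it is a hypothetical repro written for illustration (the requires_grad setup is an assumption about how the traced model reaches this code path), not a script from our CI:

import torch

@torch.jit.script
def make_keep_mask(scores: torch.Tensor) -> torch.Tensor:
    # Same pattern as torchvision.ops.boxes._batched_nms_vanilla:
    # allocate a boolean mask with the shape of the per-box scores.
    return torch.zeros_like(scores, dtype=torch.bool)

# Scores typically participate in autograd when the traced Faster R-CNN model runs,
# which is the situation in which the isDifferentiableType assert is reported above.
scores = torch.rand(100, requires_grad=True)
print(make_keep_mask(scores).dtype)  # torch.bool on a healthy build

On a working nightly this simply prints torch.bool; on 20220426 the same zeros_like-with-bool-dtype pattern trips the INTERNAL ASSERT shown above.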

More JIT-related failures can be seen in the unittest_linux_gpu_py3.8 job:

test_jit[4-dtype0-cuda-mean-2-1.0]
Traceback (most recent call last):
  File "/home/circleci/project/test/test_ops.py", line 1566, in test_jit
    scripted_focal_loss = script_fn(inputs, targets, gamma=gamma, alpha=alpha, reduction=reduction)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: required keyword attribute 'cache_id' is undefined

It looks like this is due to an upstream change in PyTorch Core.
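
For reference, here is a minimal sketch of what that test exercises, assuming the public torchvision.ops.sigmoid_focal_loss API; the shapes and the gamma/alpha values are illustrative rather than the exact failing parametrization:

import torch
from torchvision.ops import sigmoid_focal_loss

# Script the op the same way test_ops.py::test_jit does, then call it with keyword args.
script_fn = torch.jit.script(sigmoid_focal_loss)

inputs = torch.randn(16, 4, requires_grad=True)
targets = torch.randint(0, 2, (16, 4), dtype=torch.float32)

loss = script_fn(inputs, targets, gamma=2.0, alpha=0.25, reduction="mean")
# On the 20220426 nightly the call above fails with:
# RuntimeError: required keyword attribute 'cache_id' is undefined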

Versions

The breakage in the cmake_* jobs first appeared with the 20220426 nightly, see 66ed693.

Though other breakages existed with the 20220425 nightly, the cmake jobs were passing normally (see de31e4b). All of those breakages are due to upstream changes and are documented in #5873.

We have merged #5875 to resolve the issues related to the 20220425 nightly. For the new issues caused by the 20220426 nightly, please check the CI results of commit cc53cd0, which provide a clearer view of the problem.

cc @seemethere
