Building torchvision for 3.0 cuda compute capability #6470

take5v · 2022-08-22T23:50:25Z

### 🐛 Describe the bug

I have successfully built PyTorch from source for my legacy hardware on Windows 10. I'm now building torchvision for CUDA 10.2 and compute capability 3.0 (sm_30) with the following environment variables: `USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1 TORCH_CUDA_ARCH_LIST="3.0" NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_30,code=sm_30" TORCH_NVCC_FLAGS="-Xfatbin -compress-all"`.
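For reproducibility, the build configuration above can be captured in a small script instead of being typed into the shell. This is a minimal sketch, not the exact procedure used here. It assumes the usual `python setup.py install` flow run from a torchvision source checkout. `gencode_flags`, `build_env` and `run_build` are hypothetical helper names. The arch-list parser only handles plain `major.minor` entries and not suffixes such as `+PTX`.

```python
import os
import subprocess
import sys


def gencode_flags(arch_list):
    """Turn a TORCH_CUDA_ARCH_LIST-style string ("3.0" or "3.0;3.5") into
    nvcc -gencode flags such as "-gencode=arch=compute_30,code=sm_30".
    Hypothetical helper: plain "major.minor" entries only, no "+PTX"."""
    flags = []
    for arch in arch_list.replace(" ", ";").split(";"):
        if not arch:
            continue
        num = arch.replace(".", "")
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags


def build_env(arch_list="3.0", base=None):
    """Return a copy of the environment with the build variables from the report set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "USE_CUDA": "1",
        "USE_CUDNN": "1",
        "USE_MKLDNN": "1",
        "TORCH_CUDA_ARCH_LIST": arch_list,
        "NVCC_FLAGS": " ".join(
            ["-D__CUDA_NO_HALF_OPERATORS__", "--expt-relaxed-constexpr"]
            + gencode_flags(arch_list)
        ),
        "TORCH_NVCC_FLAGS": "-Xfatbin -compress-all",
    })
    return env


def run_build(source_dir, arch_list="3.0"):
    """Run `python setup.py install` in a torchvision checkout (assumed layout)."""
    subprocess.run(
        [sys.executable, "setup.py", "install"],
        cwd=source_dir,
        env=build_env(arch_list),
        check=True,
    )
```

To start the build, call `run_build(".")` from the root of the torchvision checkout. Building the environment as a dict avoids the Windows quoting issues that come with setting `NVCC_FLAGS` by hand.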

When I trace a detectron2 GeneralizedRCNN model, I get the error `RuntimeError: default_program(21): error: identifier "__ldg" is undefined`.

Full stack trace:

```
  File "<repo>/detectron/export_model.py", line 194, in <module>
    exported_model = export_tracing(torch_model, sample_inputs, "model.ts")
  File "<repo>/detectron/export_model.py", line 120, in export_tracing
    ts_model = torch.jit.trace(traceable_model, (image,))  # type: ignore
  File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 733, in trace
    return trace_module(
  File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 934, in trace_module
    module._c._create_method_from_trace(
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
    result = self._slow_forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\detectron2\export\flatten.py", line 294, in forward
    outputs = self.inference_func(self.model, *inputs_orig_format)
  File "<repo>/detectron/export_model.py", line 111, in inference_func
    inst = model.inference(inputs, do_postprocess=False)[0]
  File "<py3.8>\lib\site-packages\detectron2\modeling\meta_arch\rcnn.py", line 213, in inference
    results, _ = self.roi_heads(images, features, proposals, None)
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
    result = self._slow_forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\roi_heads.py", line 747, in forward
    pred_instances = self._forward_box(features, proposals)
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\roi_heads.py", line 815, in _forward_box
    pred_instances, _ = self.box_predictor.inference(predictions, proposals)
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 479, in inference
    return fast_rcnn_inference(
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 79, in fast_rcnn_inference
    result_per_image = [
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 80, in <listcomp>
    fast_rcnn_inference_single_image(
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 162, in fast_rcnn_inference_single_image
    keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh)
  File "<py3.8>\lib\site-packages\detectron2\layers\nms.py", line 20, in batched_nms
    return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
  File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 1094, in wrapper
    return compiled_fn(*args, **kwargs)
RuntimeError: default_program(21): error: identifier "__ldg" is undefined

1 error detected in the compilation of "default_program".

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)


template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_unsqueeze_add(float* t0, float* t1, float* aten_add) {
{
  if (512 * blockIdx.x + threadIdx.x<17808 ? 1 : 0) {
    float v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
    float v_1 = __ldg(t1 + (512 * blockIdx.x + threadIdx.x) / 4);
    aten_add[512 * blockIdx.x + threadIdx.x] = v + v_1;
  }
}
}
```
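The failing kernel appears to be generated at run time by the TorchScript fuser and compiled with NVRTC, rather than being code from the torchvision build itself. It reads both inputs through `__ldg`, the read-only data cache load intrinsic. That intrinsic is only available on devices of compute capability 3.5 and higher. The GTX 680 is compute capability 3.0, which would explain the undefined identifier. Below is a minimal, hypothetical check of that threshold. `LDG_MIN_CAPABILITY`, `parse_capability` and `supports_ldg` are illustrative names, not PyTorch APIs.

```python
# Assumed minimum compute capability for the __ldg intrinsic.
LDG_MIN_CAPABILITY = (3, 5)


def parse_capability(arch):
    """Parse a "major.minor" string such as "3.0" into a (major, minor) tuple."""
    major, minor = arch.split(".")
    return int(major), int(minor)


def supports_ldg(capability):
    """Return True if a device with this (major, minor) capability has __ldg."""
    return tuple(capability) >= LDG_MIN_CAPABILITY
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` tuple to pass to `supports_ldg`. For the GTX 680 in this report, that tuple should be `(3, 0)`.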


### Versions

```
PyTorch version: 1.8.2a0+e0495a7
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: N/A

Python version: 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: GeForce GTX 680
Nvidia driver version: 441.22
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.2
[pip3] torch==1.8.2a0+e0495a7
[pip3] torchvision==0.9.0a0+761d09f
[conda] Could not collect
```


take5v · 2022-08-23T07:24:28Z

The detectron2 GeneralizedRCNN model was trained and exported on another machine:

```
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.0
Libc version: N/A

Python version: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 516.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] torch==1.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect
```
