Building torchvision for 3.0 cuda compute capability #6470

Open
take5v opened this issue Aug 22, 2022 · 1 comment

take5v commented Aug 22, 2022

🐛 Describe the bug

I have successfully built PyTorch from source for my legacy hardware on Windows 10. I am now building torchvision for CUDA 10.2 and compute capability 3.0 (sm_30) with the following environment variables set:

  USE_CUDA=1
  USE_CUDNN=1
  USE_MKLDNN=1
  TORCH_CUDA_ARCH_LIST="3.0"
  NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_30,code=sm_30"
  TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
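
For reference, the relationship between TORCH_CUDA_ARCH_LIST and the -gencode flag passed in NVCC_FLAGS can be sketched in a few lines of Python. This is only an illustration: gencode_flags is a hypothetical helper written for this post, not a function from PyTorch or torchvision, and it assumes entries of the form "major.minor" with an optional "+PTX" suffix.

```python
def gencode_flags(arch_list):
    """Turn a TORCH_CUDA_ARCH_LIST-style string into nvcc -gencode flags.

    Hypothetical helper for illustration; not part of PyTorch or torchvision.
    """
    flags = []
    # Entries may be separated by ';' or whitespace, e.g. "3.0;3.5" or "3.0 3.5".
    for entry in arch_list.replace(";", " ").split():
        ptx = entry.endswith("+PTX")
        version = entry[:-4] if ptx else entry
        major, minor = version.split(".")
        num = major + minor
        # Native machine code (SASS) for this architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if ptx:
            # Also embed PTX so newer GPUs can JIT-compile the kernels.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


print(gencode_flags("3.0"))
```

With the value used here, "3.0", the helper produces the same single flag that appears in NVCC_FLAGS above, which suggests the two settings are at least consistent with each other.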

When I trace the detectron2 GeneralizedRCNN model, I get this error: RuntimeError: default_program(21): error: identifier "__ldg" is undefined.

Full stack trace:

  File "<repo>/detectron/export_model.py", line 194, in <module>
    exported_model = export_tracing(torch_model, sample_inputs, "model.ts")
  File "<repo>/detectron/export_model.py", line 120, in export_tracing
    ts_model = torch.jit.trace(traceable_model, (image,))  # type: ignore
  File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 733, in trace
    return trace_module(
  File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 934, in trace_module
    module._c._create_method_from_trace(
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
    result = self._slow_forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\detectron2\export\flatten.py", line 294, in forward
    outputs = self.inference_func(self.model, *inputs_orig_format)
  File "<repo>/detectron/export_model.py", line 111, in inference_func
    inst = model.inference(inputs, do_postprocess=False)[0]
  File "<py3.8>\lib\site-packages\detectron2\modeling\meta_arch\rcnn.py", line 213, in inference
    results, _ = self.roi_heads(images, features, proposals, None)
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
    result = self._slow_forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\roi_heads.py", line 747, in forward
    pred_instances = self._forward_box(features, proposals)
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\roi_heads.py", line 815, in _forward_box
    pred_instances, _ = self.box_predictor.inference(predictions, proposals)
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 479, in inference
    return fast_rcnn_inference(
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 79, in fast_rcnn_inference
    result_per_image = [
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 80, in <listcomp>
    fast_rcnn_inference_single_image(
  File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 162, in fast_rcnn_inference_single_image     
    keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh)
  File "<py3.8>\lib\site-packages\detectron2\layers\nms.py", line 20, in batched_nms
    return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
  File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 1094, in wrapper
    return compiled_fn(*args, **kwargs)
RuntimeError: default_program(21): error: identifier "__ldg" is undefined

1 error detected in the compilation of "default_program".

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)


template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_unsqueeze_add(float* t0, float* t1, float* aten_add) {
{
  if (512 * blockIdx.x + threadIdx.x<17808 ? 1 : 0) {
    float v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
    float v_1 = __ldg(t1 + (512 * blockIdx.x + threadIdx.x) / 4);
    aten_add[512 * blockIdx.x + threadIdx.x] = v + v_1;
  }
}
}
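
A note on the failure above: the __ldg read-only cache load intrinsic is only available on devices of compute capability 3.5 and higher, so a runtime-generated kernel that calls it cannot compile for an sm_30 card such as the GTX 680. The sketch below shows one possible workaround, which is to detect this case and turn off GPU kernel fusion in the TorchScript JIT before tracing. It rests on assumptions: supports_ldg and disable_gpu_fusion_if_needed are hypothetical helpers, the two torch._C calls are internal, undocumented switches that may differ between PyTorch versions, and whether disabling them avoids this particular kernel has not been verified here.

```python
# Minimum compute capability that provides the __ldg intrinsic.
LDG_MIN_CAPABILITY = (3, 5)


def supports_ldg(capability):
    """Return True if a (major, minor) compute capability provides __ldg."""
    return tuple(capability) >= LDG_MIN_CAPABILITY


def disable_gpu_fusion_if_needed(device_index=0):
    """Turn off JIT GPU fusion when the device lacks __ldg.

    Hypothetical helper. torch is imported lazily so the check above can be
    used without it. The torch._C functions are internal switches (assumed
    present in the PyTorch build in use), not a documented public API.
    Returns True if fusion was disabled.
    """
    import torch

    if not torch.cuda.is_available():
        return False
    if supports_ldg(torch.cuda.get_device_capability(device_index)):
        return False
    torch._C._jit_override_can_fuse_on_gpu(False)
    torch._C._jit_set_texpr_fuser_enabled(False)
    return True


print(supports_ldg((3, 0)))  # GTX 680 (sm_30)
print(supports_ldg((6, 1)))  # GTX 1080 Ti (sm_61)
```

The idea would be to call disable_gpu_fusion_if_needed() once, before torch.jit.trace, so that the traced graph runs through the regular prebuilt kernels instead of kernels compiled at runtime by nvrtc.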


### Versions

PyTorch version: 1.8.2a0+e0495a7
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: N/A

Python version: 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: GeForce GTX 680
Nvidia driver version: 441.22
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.2
[pip3] torch==1.8.2a0+e0495a7
[pip3] torchvision==0.9.0a0+761d09f
[conda] Could not collect

take5v commented Aug 23, 2022

The Detectron2 GeneralizedRCNN model was trained and exported on another machine, which has the following environment:

Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.0
Libc version: N/A

Python version: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 516.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] torch==1.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect
