### 🐛 Describe the bug
I have successfully built PyTorch from source for my legacy hardware on Windows 10. I built torchvision for CUDA 10.2 and sm_30 with the following environment variables: USE_CUDA=1 USE_CUDNN=1 USE_MKLDNN=1 TORCH_CUDA_ARCH_LIST="3.0" NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_30,code=sm_30" TORCH_NVCC_FLAGS="-Xfatbin -compress-all".
When I trace the detectron2 GeneralizedRCNN model, I get an error: RuntimeError: default_program(21): error: identifier "__ldg" is undefined.
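For readers unfamiliar with the build settings above: `TORCH_CUDA_ARCH_LIST="3.0"` and the explicit `-gencode=arch=compute_30,code=sm_30` entry in `NVCC_FLAGS` name the same target architecture. The helper below is a minimal sketch of that naming convention. The function `arch_list_to_gencode` is hypothetical and written only for illustration; it is not the build logic used by PyTorch or torchvision.

```python
def arch_list_to_gencode(arch_list):
    """Expand an arch list such as "3.0 6.1+PTX" into nvcc -gencode flags.

    Illustrative only: handles plain "major.minor" entries and the "+PTX"
    suffix, separated by spaces or semicolons.
    """
    flags = []
    for entry in arch_list.replace(";", " ").split():
        wants_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if wants_ptx else entry
        num = version.replace(".", "")  # "3.0" -> "30"
        # Native machine code (SASS) for this architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if wants_ptx:
            # Additionally embed PTX so newer GPUs can JIT-compile it.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    print(arch_list_to_gencode("3.0"))
```

For the value used in this report, the helper yields the same `-gencode` entry that appears in `NVCC_FLAGS`.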
Full stack trace:
File "<repo>/detectron/export_model.py", line 194, in <module>
exported_model = export_tracing(torch_model, sample_inputs, "model.ts")
File "<repo>/detectron/export_model.py", line 120, in export_tracing
ts_model = torch.jit.trace(traceable_model, (image,)) # type: ignore
File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 733, in trace
return trace_module(
File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 934, in trace_module
module._c._create_method_from_trace(
File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "<py3.8>\lib\site-packages\detectron2\export\flatten.py", line 294, in forward
outputs = self.inference_func(self.model, *inputs_orig_format)
File "<repo>/detectron/export_model.py", line 111, in inference_func
inst = model.inference(inputs, do_postprocess=False)[0]
File "<py3.8>\lib\site-packages\detectron2\modeling\meta_arch\rcnn.py", line 213, in inference
results, _ = self.roi_heads(images, features, proposals, None)
File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "<py3.8>\lib\site-packages\torch\nn\modules\module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\roi_heads.py", line 747, in forward
pred_instances = self._forward_box(features, proposals)
File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\roi_heads.py", line 815, in _forward_box
pred_instances, _ = self.box_predictor.inference(predictions, proposals)
File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 479, in inference
return fast_rcnn_inference(
File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 79, in fast_rcnn_inference
result_per_image = [
File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 80, in <listcomp>
fast_rcnn_inference_single_image(
File "<py3.8>\lib\site-packages\detectron2\modeling\roi_heads\fast_rcnn.py", line 162, in fast_rcnn_inference_single_image
keep = batched_nms(boxes, scores, filter_inds[:, 1], nms_thresh)
File "<py3.8>\lib\site-packages\detectron2\layers\nms.py", line 20, in batched_nms
return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
File "<py3.8>\lib\site-packages\torch\jit\_trace.py", line 1094, in wrapper
return compiled_fn(*args, **kwargs)
RuntimeError: default_program(21): error: identifier "__ldg" is undefined
1 error detected in the compilation of "default_program".
nvrtc compilation failed:
#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)
template<typename T>
__device__ T maximum(T a, T b) {
return isnan(a) ? a : (a > b ? a : b);
}
template<typename T>
__device__ T minimum(T a, T b) {
return isnan(a) ? a : (a < b ? a : b);
}
extern "C" __global__
void fused_unsqueeze_add(float* t0, float* t1, float* aten_add) {
{
if (512 * blockIdx.x + threadIdx.x<17808 ? 1 : 0) {
float v = __ldg(t0 + 512 * blockIdx.x + threadIdx.x);
float v_1 = __ldg(t1 + (512 * blockIdx.x + threadIdx.x) / 4);
aten_add[512 * blockIdx.x + threadIdx.x] = v + v_1;
}
}
}
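A note on the likely cause: the `__ldg` intrinsic is only available on devices of compute capability 3.5 and higher, while the GTX 680 is a compute capability 3.0 device, so the runtime-generated `fused_unsqueeze_add` kernel above cannot compile for it. One possible workaround is to turn off the TorchScript GPU fusers before tracing so that no such kernel is generated. The sketch below assumes the internal toggles `torch._C._jit_set_texpr_fuser_enabled` and `torch._C._jit_override_can_fuse_on_gpu` are present in the build in use; they are underscore-prefixed internals and may change between versions. The helper names `supports_ldg` and `disable_gpu_fusion_if_needed` are hypothetical.

```python
# Lowest compute capability on which CUDA provides the __ldg intrinsic.
LDG_MIN_CAPABILITY = (3, 5)


def supports_ldg(capability):
    """Return True if a (major, minor) compute capability provides __ldg."""
    return tuple(capability) >= LDG_MIN_CAPABILITY


def disable_gpu_fusion_if_needed(capability):
    """Disable TorchScript GPU fusers when the device cannot compile __ldg.

    Returns True if fusion was disabled, False if nothing was changed.
    """
    if supports_ldg(capability):
        return False
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed, nothing to configure
    # Assumption: both internal toggles exist in the installed build.
    torch._C._jit_set_texpr_fuser_enabled(False)
    torch._C._jit_override_can_fuse_on_gpu(False)
    return True
```

It would be called before `torch.jit.trace`, for example as `disable_gpu_fusion_if_needed(torch.cuda.get_device_capability(0))`. Whether this avoids the failure on this particular source build has not been verified here.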
### Versions
PyTorch version: 1.8.2a0+e0495a7
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: N/A
Python version: 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: GeForce GTX 680
Nvidia driver version: 441.22
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.2
[pip3] torch==1.8.2a0+e0495a7
[pip3] torchvision==0.9.0a0+761d09f
[conda] Could not collect
The Detectron2 GeneralizedRCNN model was trained and exported on another machine, which has the following environment:
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.0
Libc version: N/A
Python version: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 516.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] torch==1.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[conda] Could not collect