Can not run test case and inference #260

ghost · 2018-03-09T06:11:42Z

I installed Detectron, and everything seemed to be fine until I ran the test case:

=============================
python test_spatial_narrow_as_op.py

It failed with the following message:

Found Detectron ops lib: /home/xxx/anaconda2/lib/libcaffe2_detectron_ops_gpu.so
I0308 22:07:29.731431 29562 operator.cc:173] Operator with engine CUDNN is not available for operator SpatialNarrowAs.
FI0308 22:07:30.126979 29562 operator.cc:173] Operator with engine CUDNN is not available for operator SpatialNarrowAs.
I0308 22:07:30.382014 29562 operator.cc:173] Operator with engine CUDNN is not available for operator SpatialNarrowAs.
.I0308 22:07:30.383860 29562 operator.cc:173] Operator with engine CUDNN is not available for operator SpatialNarrowAs.
I0308 22:07:30.384814 29562 operator.cc:173] Operator with engine CUDNN is not available for operator SpatialNarrowAs.
I0308 22:07:30.385095 29562 operator.cc:173] Operator with engine CUDNN is not available for operator SpatialNarrowAsGradient.
E

ERROR: test_small_forward_and_gradient (__main__.SpatialNarrowAsOpTest)

Traceback (most recent call last):
File "test_spatial_narrow_as_op.py", line 59, in test_small_forward_and_gradient
self._run_test(A, B, check_grad=True)
File "test_spatial_narrow_as_op.py", line 49, in _run_test
res, grad, grad_estimated = gc.CheckSimple(op, [A, B], 0, [0])

success = RunOperatorOnce(op)

File "/home/xxxx/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 179, in RunOperatorOnce
return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true

======================================================================
FAIL: test_large_forward (__main__.SpatialNarrowAsOpTest)

Traceback (most recent call last):
File "test_spatial_narrow_as_op.py", line 68, in test_large_forward
self._run_test(A, B)
File "test_spatial_narrow_as_op.py", line 54, in _run_test
np.testing.assert_allclose(C, C_ref, rtol=1e-5, atol=1e-08)
File "/home/xxx/anaconda2/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 1396, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)

raise AssertionError(msg)

AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-08

(mismatch 100.0%)
x: array([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],...
y: array([[[[ 3.099715e-01, -1.291913e+00, -2.825952e-01, ...,
-2.258663e-01, -8.814982e-01, 4.408140e-01],
[ 1.377446e+00, 1.170039e+00, 1.164714e-01, ...,...

Ran 3 tests in 1.078s

FAILED (failures=1, errors=1)

=======================================================

If I run inference:

python2 tools/infer_simple.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    demo

I got the following error:

I0308 22:03:05.297256 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.297796 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.298099 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.298406 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.298660 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.298704 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.299007 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.299317 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.299623 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.299666 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.299965 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.300297 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.300607 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.300649 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.300714 31934 operator.cc:173] Operator with engine CUDNN is not available for operator StopGradient.
I0308 22:03:05.300990 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.301300 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.301609 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.301867 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.301910 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.302211 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.302521 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.302832 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.302876 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.303180 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.303493 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.303802 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.303844 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.304164 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.304476 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.304787 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.304831 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sum.
I0308 22:03:05.305145 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.
I0308 22:03:05.305461 31934 operator.cc:173] Operator with engine CUDNN is not available for operator AffineChannel.

I0308 22:03:05.460626 31934 operator.cc:173] Operator with engine CUDNN is not available for operator Sigmoid.
I0308 22:03:05.460695 31934 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 1.714e-05 secs
I0308 22:03:05.460738 31934 net_dag.cc:61] Number of parallel execution chains 5 Number of operators = 18
INFO infer_simple.py: 111: Processing demo/16004479832_a748d55f21_k.jpg -> /tmp/detectron-visualizations/16004479832_a748d55f21_k.jpg.pdf
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "gpu_0/res2_0_branch2c_bn" input: "gpu_0/res2_0_branch1_bn" output: "gpu_0/res2_0_branch2c_bn" name: "" type: "Sum" device_option { device_type: 1 cuda_gpu_id: 0 } debug_info: " File "tools/infer_simple.py", line 147, in \n main(args)\n File
*** Aborted at 1520575410 (unix time) try "date -d @1520575410" if you are using GNU date ***
PC: @ 0x7f1d09bad428 gsignal
*** SIGABRT (@0x3e800007cbe) received by PID 31934 (TID 0x7f1cba292700) from PID 31934; stack trace: ***
@ 0x7f1d0a663390 (unknown)
@ 0x7f1d09bad428 gsignal
@ 0x7f1d09baf02a abort
@ 0x7f1d031bdb39 __gnu_cxx::__verbose_terminate_handler()
@ 0x7f1d031bc1fb __cxxabiv1::__terminate()
@ 0x7f1d031bc234 std::terminate()
@ 0x7f1d031d7c8a execute_native_thread_routine_compat
@ 0x7f1d0a6596ba start_thread
@ 0x7f1d09c7f41d clone
Aborted

Operating system: Ubuntu
Compiler version: gcc
CUDA version: 9.0
cuDNN version: 7.0
NVIDIA driver version: ?
GPU models (for all devices if they are not all the same): TITAN
PYTHONPATH environment variable: ?
python --version output: ?
Anything else that seems relevant: ?


mlprt · 2018-03-12T17:26:04Z

Similar errors:

On python tests/test_spatial_narrow_as_op.py:

Found Detectron ops lib: /home/xxxx/anaconda3/envs/detectron/lib/libcaffe2_detectron_ops_gpu.so
F.E

ERROR: test_small_forward_and_gradient (__main__.SpatialNarrowAsOpTest)

Traceback (most recent call last):
File "tests/test_spatial_narrow_as_op.py", line 59, in test_small_forward_and_gradient
self._run_test(A, B, check_grad=True)
File "tests/test_spatial_narrow_as_op.py", line 49, in _run_test
res, grad, grad_estimated = gc.CheckSimple(op, [A, B], 0, [0])
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/gradient_checker.py", line 284, in CheckSimple
outputs_with_grads
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/gradient_checker.py", line 201, in GetLossAndGrad
workspace.RunOperatorsOnce(grad_ops)
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 184, in RunOperatorsOnce
success = RunOperatorOnce(op)
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 179, in RunOperatorOnce
return C.run_operator_once(StringifyProto(operator))
RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true

======================================================================
FAIL: test_large_forward (__main__.SpatialNarrowAsOpTest)

Traceback (most recent call last):
File "tests/test_spatial_narrow_as_op.py", line 68, in test_large_forward
self._run_test(A, B)
File "tests/test_spatial_narrow_as_op.py", line 54, in _run_test
np.testing.assert_allclose(C, C_ref, rtol=1e-5, atol=1e-08)
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 1396, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-08

(mismatch 100.0%)
x: array([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],...
y: array([[[[-1.243985, -2.407127, 1.165339, ..., -0.023202, -0.096644,
-0.096511],
[-0.640857, -0.977031, 0.745425, ..., -0.049333, -1.520961,...

Ran 3 tests in 0.519s

FAILED (failures=1, errors=1)

On python2 tools/infer_simple.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --output-dir /tmp/detectron-visualizations --image-ext jpg --wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl demo

WARNING cnn.py: 40: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py: 57: Loading weights from: /tmp/detectron-download-cache/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
I0312 13:09:24.344396 378 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000140145 secs
I0312 13:09:24.344605 378 net_dag.cc:61] Number of parallel execution chains 63 Number of operators = 402
I0312 13:09:24.362937 378 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 0.000125812 secs
I0312 13:09:24.363134 378 net_dag.cc:61] Number of parallel execution chains 30 Number of operators = 358
I0312 13:09:24.364900 378 net_dag_utils.cc:118] Operator graph pruning prior to chain compute took: 8.807e-06 secs
I0312 13:09:24.364929 378 net_dag.cc:61] Number of parallel execution chains 5 Number of operators = 18
INFO infer_simple.py: 111: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
E0312 13:09:24.806742 393 net_dag.cc:203] Exception from operator '' (type 'Sum'): caffe2::EnforceNotMet: [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "gpu_0/res2_0_branch2c_bn" input: "gpu_0/res2_0_branch1_bn" output: "gpu_0/res2_0_branch2c_bn" name: "" type: "Sum" device_option { device_type: 1 cuda_gpu_id: 0 } debug_info: " File "tools/infer_simple.py", line 147, in \n main(args)\n File "tools/infer_simple.py", line 99, in main\n model = infer_engine.initialize_model_from_cfg()\n File "/home/xxxx/opt/detectron/lib/core/test_engine.py", line 266, in initialize_model_from_cfg\n model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 124, in create\n return get_func(model_type_func)(model)\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 89, in generalized_rcnn\n freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 229, in build_generic_detection_model\n optim.build_data_parallel_model(model, _single_gpu_build_func)\n File "/home/xxxx/opt/detectron/lib/modeling/optimizer.py", line 54, in build_data_parallel_model\n single_gpu_build_func(model)\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 169, in _single_gpu_build_func\n blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)\n File "/home/xxxx/opt/detectron/lib/modeling/FPN.py", line 62, in add_fpn_ResNet101_conv5_body\n model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5\n File "/home/xxxx/opt/detectron/lib/modeling/FPN.py", line 103, in add_fpn_onto_conv_body\n conv_body_func(model)\n File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 46, in add_ResNet101_conv5_body\n return add_ResNet_convX_body(model, (3, 4, 23, 3))\n File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 101, in add_ResNet_convX_body\n s, dim_in = add_stage(model, 'res2', p, n1, dim_in, 256, dim_bottleneck, 1)\n File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 83, in add_stage\n inplace_sum=i < n - 1\n File 
"/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 187, in add_residual_block\n s = model.net.Sum([tr, sc], tr)\n File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/core.py", line 2047, in \n op_type, *args, **kwargs)\n File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/core.py", line 2024, in _CreateAndAddToSelf\n op = CreateOperator(op_type, inputs, outputs, **kwargs)\n"
Original python traceback for operator 14 in network generalized_rcnn in exception above (most recent call last):
File "tools/infer_simple.py", line 147, in <module>
File "tools/infer_simple.py", line 99, in main
File "/home/xxxx/opt/detectron/lib/core/test_engine.py", line 266, in initialize_model_from_cfg
File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 124, in create
File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 89, in generalized_rcnn
File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 229, in build_generic_detection_model
File "/home/xxxx/opt/detectron/lib/modeling/optimizer.py", line 54, in build_data_parallel_model
File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 169, in _single_gpu_build_func
File "/home/xxxx/opt/detectron/lib/modeling/FPN.py", line 62, in add_fpn_ResNet101_conv5_body
File "/home/xxxx/opt/detectron/lib/modeling/FPN.py", line 103, in add_fpn_onto_conv_body
File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 46, in add_ResNet101_conv5_body
File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 101, in add_ResNet_convX_body
File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 83, in add_stage
File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 187, in add_residual_block
Traceback (most recent call last):
File "tools/infer_simple.py", line 147, in <module>
main(args)
File "tools/infer_simple.py", line 117, in main
model, im, None, timers=timers
File "/home/xxxx/opt/detectron/lib/core/test.py", line 65, in im_detect_all
scores, boxes, im_scales = im_detect_bbox(model, im, box_proposals)
File "/home/xxxx/opt/detectron/lib/core/test.py", line 154, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 230, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/workspace.py", line 192, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "gpu_0/res2_0_branch2c_bn" input: "gpu_0/res2_0_branch1_bn" output: "gpu_0/res2_0_branch2c_bn" name: "" type: "Sum" device_option { device_type: 1 cuda_gpu_id: 0 } debug_info: " File "tools/infer_simple.py", line 147, in \n main(args)\n File "tools/infer_simple.py", line 99, in main\n model = infer_engine.initialize_model_from_cfg()\n File "/home/xxxx/opt/detectron/lib/core/test_engine.py", line 266, in initialize_model_from_cfg\n model = model_builder.create(cfg.MODEL.TYPE, train=False, gpu_id=gpu_id)\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 124, in create\n return get_func(model_type_func)(model)\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 89, in generalized_rcnn\n freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 229, in build_generic_detection_model\n optim.build_data_parallel_model(model, _single_gpu_build_func)\n File "/home/xxxx/opt/detectron/lib/modeling/optimizer.py", line 54, in build_data_parallel_model\n single_gpu_build_func(model)\n File "/home/xxxx/opt/detectron/lib/modeling/model_builder.py", line 169, in _single_gpu_build_func\n blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)\n File "/home/xxxx/opt/detectron/lib/modeling/FPN.py", line 62, in add_fpn_ResNet101_conv5_body\n model, ResNet.add_ResNet101_conv5_body, fpn_level_info_ResNet101_conv5\n File "/home/xxxx/opt/detectron/lib/modeling/FPN.py", line 103, in add_fpn_onto_conv_body\n conv_body_func(model)\n File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 46, in add_ResNet101_conv5_body\n return add_ResNet_convX_body(model, (3, 4, 23, 3))\n File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 101, in add_ResNet_convX_body\n s, dim_in = add_stage(model, 'res2', p, n1, dim_in, 256, dim_bottleneck, 1)\n File "/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 83, in add_stage\n inplace_sum=i < n - 1\n File 
"/home/xxxx/opt/detectron/lib/modeling/ResNet.py", line 187, in add_residual_block\n s = model.net.Sum([tr, sc], tr)\n File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/core.py", line 2047, in \n op_type, *args, **kwargs)\n File "/home/xxxx/anaconda3/envs/detectron/lib/python2.7/site-packages/caffe2/python/core.py", line 2024, in _CreateAndAddToSelf\n op = CreateOperator(op_type, inputs, outputs, **kwargs)\n"

OS: Ubuntu 17.04
CUDA 9.0
cuDNN 7
NVIDIA Driver 390.12
GPU: TITAN Xp
$PYTHONPATH: empty
python --version: Python 2.7.14 :: Anaconda, Inc.

gecong · 2018-03-13T23:32:01Z

same here

anatlin · 2018-03-18T22:12:27Z

any update on this?

xmengli · 2018-03-19T04:35:25Z

@gecong @anatlin

RuntimeError: [enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
input: "A" input: "B" input: "C_grad" output: "A_grad" name: "" type: "SpatialNarrowAsGradient" device_option { device_type: 1 cuda_gpu_id: 0 } is_gradient_op: true

I solved this by adding export PYTHONPATH=$PYTHONPATH:/home/user/caffe2/build to my .bashrc file.
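
To see whether a PYTHONPATH change like the one above actually affects which Caffe2 build Python picks up, a small check along the following lines may help. This is only a sketch: it assumes Python 3 and the standard library, the helper name describe_module is made up for illustration, and the module name caffe2 is taken from the tracebacks in this thread.

```python
import importlib.util
import os


def describe_module(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper: it only asks the import system where the module
    lives and does not import it, so no CUDA code is executed.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Directories listed in PYTHONPATH are searched before site-packages.
    print("PYTHONPATH:", os.environ.get("PYTHONPATH", "(not set)"))
    print("caffe2 would be loaded from:", describe_module("caffe2"))
```

If the printed location is inside an Anaconda package directory rather than a source build directory, the PYTHONPATH entry is not the copy being used.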

ghost · 2018-03-20T04:06:32Z

@xmengli999

I do not seem to have a caffe2/build directory on my machine.

I installed Caffe2 with Anaconda, and I have the following directories under the package folder:

~/anaconda2/pkgs/caffe2-cuda9.0-cudnn7-0.8.dev-py27h4e2c0f2_0$ ls -ltr
total 24
drwxrwxr-x 3 gcong gcong 4096 Mar 7 22:04 share
drwxrwxr-x 6 gcong gcong 4096 Mar 7 22:04 include
drwxrwxr-x 2 gcong gcong 4096 Mar 7 22:04 bin
drwxrwxr-x 2 gcong gcong 4096 Mar 7 22:04 test
drwxrwxr-x 4 gcong gcong 4096 Mar 7 22:04 lib
drwxrwxr-x 4 gcong gcong 4096 Mar 7 22:04 info

Could you let me know how I can change the PYTHONPATH?

Thanks a lot

anatlin · 2018-03-21T21:21:42Z

@rbgirshick is this a caffe2 issue?

mihaifieraru · 2018-03-24T23:16:51Z

I experience the same issue, any update?

ghost · 2018-04-03T21:10:15Z

same issue here!

xmengli · 2018-04-04T00:35:54Z

@deeprun I built from source. You can give that a try.

olegantonyan · 2018-04-06T06:18:19Z

Same problem. I haven't tried building Caffe2 from source.
openSUSE Tumbleweed, CUDA 9.0, cuDNN 7.1

gzaripov · 2018-04-10T17:56:03Z

The same. cuda 9.0 cudnn 7.1

rafagjordana · 2018-04-18T13:25:08Z

Same here, cuda 9.0 and cudnn 7.1.2

AgrawalAmey · 2018-04-19T20:25:15Z

Facing the same issue on an Azure Data Science VM running Ubuntu 16.04, Anaconda 2, and a Tesla P40. The build directory is included in PYTHONPATH.

apli · 2018-04-20T07:23:49Z

Same problem

wuharvey · 2018-05-02T16:19:44Z

Bump

fengyicoder · 2018-05-03T00:48:17Z

Same problem, any update?

macsermkiat · 2018-05-03T06:57:03Z

Me too.
EDIT: I have since fixed this. Here is what I did.

I uninstalled CUDA 9.1, because my GPU is a Quadro M4000, which supports only CUDA 8.0:

sudo apt-get remove cuda-9.1
sudo apt-get install cuda-8.0

Make sure to install the matching cuDNN version (7.1.2) and NCCL for CUDA 8.0.
Uninstall Caffe2 and install it again with conda install -c caffe2 caffe2_cuda8.0_cudnn7
Fix .bashrc so that it points to the right directories:

export PATH="$HOME/anaconda3/bin:/usr/local/cuda-8.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH"

In short, you have to make sure your GPU supports the CUDA version you install.
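
The fix above depends on the installed CUDA toolkit matching the one the Caffe2 package was built against; the conda package names in this thread encode that version (for example caffe2_cuda8.0_cudnn7). As a rough sketch, the toolkit release can be read from the text that nvcc --version prints and compared with the package name. The function names and the sample text below are illustrative assumptions; the sample mirrors the usual "release X.Y" line and was not captured from any machine in this thread.

```python
import re

# Illustrative sample of the last line that nvcc --version normally prints.
SAMPLE_NVCC_OUTPUT = "Cuda compilation tools, release 8.0, V8.0.61"


def parse_cuda_release(nvcc_output):
    """Extract the 'X.Y' toolkit release from nvcc --version text, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return "{}.{}".format(match.group(1), match.group(2))


def matches_package(nvcc_output, package_name):
    """Check whether a package name such as 'caffe2_cuda8.0_cudnn7'
    mentions the same CUDA release that nvcc reports."""
    release = parse_cuda_release(nvcc_output)
    return release is not None and ("cuda" + release) in package_name


if __name__ == "__main__":
    print(parse_cuda_release(SAMPLE_NVCC_OUTPUT))
    print(matches_package(SAMPLE_NVCC_OUTPUT, "caffe2_cuda8.0_cudnn7"))
```

A matching toolkit version is necessary but may not be sufficient: the "no kernel image is available" message generally means the binary contains no code for the GPU's architecture, so a prebuilt package can still fail on a card it was not compiled for.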

BanuSelinTosun · 2018-07-11T16:46:03Z

@AgrawalAmey
I have been circling around the same issue for almost 1-1.5 weeks now on Azure Linux Ubuntu 16.04 DSVM. Did you come up with a resolution?

gadcam · 2018-07-11T23:29:52Z

It looks like PR pytorch/pytorch#7062 is trying to solve this problem (or at least part of it). Can you still reproduce it with a version of Caffe2/PyTorch that includes this commit?

Another lead to follow could be fireice-uk/xmr-stak-nvidia#159 (comment) (see http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ to get the correct CUDA_ARCH number).
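
For the GPUs named in this thread, the CUDA_ARCH number follows from the card's compute capability. The lookup below is a small sketch, not an exhaustive table: the dictionary and function names are made up for illustration, and the values are the compute capabilities NVIDIA publishes for these three cards, which are worth double-checking against the official list for your exact model.

```python
# Compute capabilities (major, minor) for the GPUs mentioned in this thread.
# Assumption: values taken from NVIDIA's published tables; verify for your card.
COMPUTE_CAPABILITY = {
    "TITAN Xp": (6, 1),
    "Tesla P40": (6, 1),
    "Quadro M4000": (5, 2),
}


def cuda_arch_number(gpu_name):
    """Return the two-digit architecture number (e.g. '61') for a known GPU.

    Raises KeyError for cards that are not in the small table above.
    """
    major, minor = COMPUTE_CAPABILITY[gpu_name]
    return "{}{}".format(major, minor)


if __name__ == "__main__":
    for name in sorted(COMPUTE_CAPABILITY):
        print(name, "->", cuda_arch_number(name))
```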

gadcam · 2018-07-11T23:36:31Z

@BanuSelinTosun Just as a side question: did you try this? https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro#caffe2

AgrawalAmey · 2018-07-12T04:45:22Z

@BanuSelinTosun Sorry, I couldn't find any solution for the issue.

BanuSelinTosun · 2018-07-12T16:42:38Z

@gadcam
Yes, that was the very first thing I tried. The problem with Azure DSVMs is that they already come with CUDA 9 and cuDNN 7, whereas Detectron wants CUDA 8 and cuDNN 6. There is a Caffe2 installation with CUDA 9 and cuDNN 7, but (a) it does not work with Detectron because of the version, and it is installed for Python 3 rather than Python 2, and (b) it conflicts with new Caffe2 installations when CUDA 8 and cuDNN 6 are installed, even if I create everything in a new environment.

@AgrawalAmey
I tried to file an issue about this with Azure computing before July 4th, and they are not taking it very seriously. I even talked to one of the Principal Managers at Azure. He just suggested new approaches, which did not work.

gadcam · 2018-07-12T22:36:44Z

@BanuSelinTosun I had to install Detectron in a very similar environment.

What I would do (I do not know if it is possible in your environment):

- Uninstall Caffe2
- Uninstall CUDA & cuDNN
- Reinstall the correct versions of CUDA & cuDNN (you may also have to switch the graphics card driver version in some setups, if I recall correctly)
- Build Caffe2 again, specifying a CUDA_ARCH (you may also have to check that it links against the correct Python via PYTHON_EXECUTABLE / PYTHON_INCLUDE_DIR / PYTHON_LIBRARY)
- Run the tests
- If everything goes well up to this point, then you should be able to install Detectron

I think it will not be that easy but maybe it will raise some new errors which will give us some new hints to go further.

BTW I do not think Detectron needs CUDA 8: my install is with CUDA Version 9.0.176 & cuDNN 7.0.5.

EDIT : maybe this can be of some help https://docs.microsoft.com/en-US/azure/virtual-machines/linux/n-series-driver-setup#ubuntu-1604-lts

BanuSelinTosun · 2018-07-13T06:17:00Z

@gadcam, thank you for helping with this issue. I had a working Caffe2 on the Azure DSVMs with CUDA 9 and cuDNN 7, so it should work with those if it worked for you. I can install Detectron (run the Makefile) without any problem. But when I run the Detectron test, I get the "failures=1, errors=1" result. It always seems to circle back to the same problem. :-( I have not tried running inference; should I try that first?

BanuSelinTosun · 2018-07-13T17:03:03Z

OK, I have a working Detectron now.
My solution was to go the Docker image route. It works: it does not matter whether you have CUDA 9 or 8, cuDNN 7 or 6, or whatever Caffe2 version.
And @AgrawalAmey, it works on the Azure Linux DSVM. :D

remcova · 2018-10-10T07:47:12Z

> Ok. I have a working detectron now.
> My solution was using the docker image path. It works. does not matter whether you have cuda 9 or 8, cudnn 7 or 6, whatever caffe2 version... it works!
> And @AgrawalAmey it works in Azure Linux DSVM. :D

Could you explain your solution in a bit more detail? I have this problem and I am having a hard time solving it.

paritoshgote · 2018-10-31T22:57:58Z

@BanuSelinTosun : Could you please explain your solution using docker in more detail?

yfzon · 2018-11-20T02:55:23Z

I hit a similar error when using a TensorFlow op compiled with nvcc: Could not launch cub::DeviceSegmentedRadixSort::SortPairsDescending to sort input, temp_storage_bytes: 599295, status: no kernel image is available for execution on the device. I found this issue and learned that the error is caused by the GPU compute capability. I use a Tesla P40 and added -gencode arch=compute_61,code=compute_61 to my compile file, which finally solved it. Hope this helps.
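
The -gencode flag in the comment above can be generated from one or more architecture numbers. The helper below is a minimal sketch with a made-up name (gencode_flags); it only builds the flag strings in the arch=compute_XX,code=... form that nvcc accepts and does not invoke the compiler. Whether a given build system passes such flags through unchanged is an assumption to verify for your setup.

```python
def gencode_flags(arch_numbers, include_ptx=True):
    """Build nvcc -gencode flags for a list of architecture numbers.

    arch_numbers is a list of strings such as ["52", "61"].
    """
    flags = []
    for arch in arch_numbers:
        # Native machine code for exactly this architecture.
        flags.append("-gencode arch=compute_{0},code=sm_{0}".format(arch))
    if include_ptx and arch_numbers:
        # PTX for the newest listed architecture, which the driver can
        # JIT-compile for that architecture and for newer GPUs.
        newest = max(arch_numbers, key=int)
        flags.append("-gencode arch=compute_{0},code=compute_{0}".format(newest))
    return flags


if __name__ == "__main__":
    print(" ".join(gencode_flags(["61"])))
```

Including both forms trades a larger binary for wider coverage: the sm_ entries run without any JIT step, while the compute_ entry is what lets GPUs newer than the listed ones load the kernels.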

rbgirshick mentioned this issue Mar 9, 2018

"No kernel image is available for execution on the device Error" when running test_spatial_narrow_as_op.py #263

Closed

shirishr mentioned this issue Apr 19, 2018

Post-install issue: Anaconda3, Ubuntu 16.04 Python 3.6 Caffe2 pytorch/pytorch#6562

Closed

gadcam mentioned this issue Jul 11, 2018

OSError: /usr/local/lib/libcaffe2_detectron_ops_gpu.so: undefined symbol: #502

Closed

PuyaYavari mentioned this issue Aug 24, 2018

detectron issue ( test_spatial_narrow_as_op.py error) #634

Closed

vacancy mentioned this issue Apr 26, 2019

Does this work for 0.4.1 / how to install? vacancy/PreciseRoIPooling#12

Open
