cudaCheckError() failed : invalid device function on AMI #55

VitaliKaiser · 2017-01-07T20:08:16Z

Hi,

I tried to get the experiment working on an Amazon GPU cloud machine with a K520 graphics card and CUDA 8. I got quite a lot of warnings, but I think the real problem is that some CUDA function does not work on this GPU. Here is some of the output:

assign pretrain model weights to conv2_1
assign pretrain model biases to conv2_1
Faster-RCNN_TF/tools/../lib/rpn_msr/proposal_target_layer_tf.py:89: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
Faster-RCNN_TF/tools/../lib/rpn_msr/proposal_target_layer_tf.py:90: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
cudaCheckError() failed : invalid device function
E tensorflow/stream_executor/stream.cc:272] Error recording event in stream: error recording CUDA event on stream 0x4cae120: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:671] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3775] stream 0x4caea80 did not memcpy device-to-host; source: 0x723f3cf00
./experiments/scripts/faster_rcnn_end2end.sh: line 57: 10679 Aborted                 (core dumped) python ./tools/train_net.py --device ${DEV} --device_id ${DEV_ID} --weights data/pretrain_model/VGG_imagenet.npy --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train ${EXTRA_ARGS}

Can you give me a hint as to what the problem could be?

Thanks in advance

VitaliKaiser · 2017-01-11T14:42:03Z

I found my problem :) It was a problem with TensorFlow itself. The official binaries won't work on the AWS AMI because they are built for NVIDIA compute capability 3.5, while the K520 only supports compute capability 3.0. Building TensorFlow from source with the correct settings solved the problem for me.
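
For anyone hitting the same error, here is a minimal sketch of why a binary built for compute capability 3.5 fails on a 3.0 card with "invalid device function". It only models the general CUDA compatibility rules (precompiled GPU code runs on the same major version with an equal or higher minor version; embedded PTX can be JIT-compiled for equal or newer capabilities). The function name `can_run` and the `DEFAULT_BUILD` list are hypothetical and for illustration only; the 5.2 entry is an assumption about the prebuilt binaries, since only 3.5 is mentioned above.

```python
def can_run(device_cc, built_sass=(), built_ptx=()):
    """Return True if a GPU with compute capability `device_cc` can run a binary.

    `device_cc` and every entry of `built_sass` / `built_ptx` are
    (major, minor) tuples. This is an illustrative model, not a CUDA API.
    """
    # Precompiled GPU code (SASS) runs only on the same major version
    # with an equal or higher minor version.
    for major, minor in built_sass:
        if device_cc[0] == major and device_cc[1] >= minor:
            return True
    # Embedded PTX can be JIT-compiled for any equal or newer capability.
    for cc in built_ptx:
        if device_cc >= cc:
            return True
    return False


K520 = (3, 0)                     # GRID K520 on the AWS GPU instance
DEFAULT_BUILD = [(3, 5), (5, 2)]  # assumed targets of the prebuilt binaries

# Binary built only for newer capabilities: no usable kernel for the K520.
print(can_run(K520, built_sass=DEFAULT_BUILD, built_ptx=DEFAULT_BUILD))  # False

# Binary rebuilt from source with 3.0 included.
print(can_run(K520, built_sass=[(3, 0)]))  # True
```

When building from source, the "correct settings" here presumably means including 3.0 in the list of compute capabilities that the TensorFlow configure step asks for (it can also be supplied through the TF_CUDA_COMPUTE_CAPABILITIES environment variable).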

VitaliKaiser closed this as completed Jan 11, 2017
