Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudaCheckError() failed : invalid device function on AMI #55

Closed
VitaliKaiser opened this issue Jan 7, 2017 · 1 comment
Closed

cudaCheckError() failed : invalid device function on AMI #55

VitaliKaiser opened this issue Jan 7, 2017 · 1 comment

Comments

@VitaliKaiser
Copy link

VitaliKaiser commented Jan 7, 2017

Hi,

I tried to get the experiment working on Amazon GPU Cloud machine with a K520 graphic card with cuda 8. I got pretty much warnings, but I think the problem is some cuda function not working on the GPU. Here is some of the output:

assign pretrain model weights to conv2_1
assign pretrain model biases to conv2_1
Faster-RCNN_TF/tools/../lib/rpn_msr/proposal_target_layer_tf.py:89: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
Faster-RCNN_TF/tools/../lib/rpn_msr/proposal_target_layer_tf.py:90: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
cudaCheckError() failed : invalid device function
E tensorflow/stream_executor/stream.cc:272] Error recording event in stream: error recording CUDA event on stream 0x4cae120: CUDA_ERROR_DEINITIALIZED; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_DEINITIALIZED
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:671] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3775] stream 0x4caea80 did not memcpy device-to-host; source: 0x723f3cf00
./experiments/scripts/faster_rcnn_end2end.sh: line 57: 10679 Aborted                 (core dumped) python ./tools/train_net.py --device ${DEV} --device_id ${DEV_ID} --weights data/pretrain_model/VGG_imagenet.npy --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train ${EXTRA_ARGS}

Can you give me hint what the problem could be?

Thanks in advance

@VitaliKaiser
Copy link
Author

I found my problem :) It was a problem with Tensorflow itself. Official binaries wont work on AWS AMI because the use the 3.5 compute ability of NVV. Building Tensorflow from source with correct settings solved this problem for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant