
RuntimeWarning: invalid value encountered in log targets_dw = np.log(gt_widths / ex_widths) Command terminated by signal 11 #107

Closed
xzy295461445 opened this issue May 26, 2017 · 16 comments


@xzy295461445

I used my own dataset in place of VOC2007 and ran into an issue. Can you please suggest solutions?

Here is the log.

+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-05-26_14-23-40
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-05-26_14-23-40

+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/trainval_net.py --weight data/imagenet_weights/vgg16.ckpt --imdb voc_2007_trainval --imdbval voc_2007_test --iters 70000 --cfg experiments/cfgs/vgg16.yml --net vgg16 --set ANCHOR_SCALES '[8,16,32]' ANCHOR_RATIOS '[0.5,1,2]' TRAIN.STEPSIZE 50000
    Called with args:
    Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=70000, net='vgg16', set_cfgs=['ANCHOR_SCALES', '[8,16,32]', 'ANCHOR_RATIOS', '[0.5,1,2]', 'TRAIN.STEPSIZE', '50000'], tag=None, weight='data/imagenet_weights/vgg16.ckpt')
    Using config:
    {'ANCHOR_RATIOS': [0.5, 1, 2],
    'ANCHOR_SCALES': [8, 16, 32],
    'DATA_DIR': '/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/data',
    'DEDUP_BOXES': 0.0625,
    'EPS': 1e-14,
    'EXP_DIR': 'vgg16',
    'GPU_ID': 0,
    'MATLAB': 'matlab',
    'PIXEL_MEANS': array([[[ 102.9801, 115.9465, 122.7717]]]),
    'POOLING_MODE': 'crop',
    'POOLING_SIZE': 7,
    'RESNET': {'BN_TRAIN': False, 'FIXED_BLOCKS': 1, 'MAX_POOL': False},
    'RNG_SEED': 3,
    'ROOT_DIR': '/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn',
    'TEST': {'BBOX_REG': True,
    'HAS_RPN': True,
    'MAX_SIZE': 1000,
    'MODE': 'nms',
    'NMS': 0.3,
    'PROPOSAL_METHOD': 'gt',
    'RPN_NMS_THRESH': 0.7,
    'RPN_POST_NMS_TOP_N': 300,
    'RPN_PRE_NMS_TOP_N': 6000,
    'RPN_TOP_N': 5000,
    'SCALES': [600],
    'SVM': False},
    'TRAIN': {'ASPECT_GROUPING': False,
    'BATCH_SIZE': 256,
    'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
    'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
    'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
    'BBOX_NORMALIZE_TARGETS': True,
    'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
    'BBOX_REG': True,
    'BBOX_THRESH': 0.5,
    'BG_THRESH_HI': 0.5,
    'BG_THRESH_LO': 0.0,
    'BIAS_DECAY': False,
    'DISPLAY': 20,
    'DOUBLE_BIAS': True,
    'FG_FRACTION': 0.25,
    'FG_THRESH': 0.5,
    'GAMMA': 0.1,
    'HAS_RPN': True,
    'IMS_PER_BATCH': 1,
    'LEARNING_RATE': 0.001,
    'MAX_SIZE': 1000,
    'MOMENTUM': 0.9,
    'PROPOSAL_METHOD': 'gt',
    'RPN_BATCHSIZE': 256,
    'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
    'RPN_CLOBBER_POSITIVES': False,
    'RPN_FG_FRACTION': 0.5,
    'RPN_NEGATIVE_OVERLAP': 0.3,
    'RPN_NMS_THRESH': 0.7,
    'RPN_POSITIVE_OVERLAP': 0.7,
    'RPN_POSITIVE_WEIGHT': -1.0,
    'RPN_POST_NMS_TOP_N': 2000,
    'RPN_PRE_NMS_TOP_N': 12000,
    'SCALES': [600],
    'SNAPSHOT_ITERS': 5000,
    'SNAPSHOT_KEPT': 3,
    'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
    'STEPSIZE': 50000,
    'SUMMARY_INTERVAL': 180,
    'TRUNCATED': False,
    'USE_ALL_GT': True,
    'USE_FLIPPED': True,
    'USE_GT': False,
    'WEIGHT_DECAY': 0.0005},
    'USE_GPU_NMS': False}
    Loaded dataset voc_2007_trainval for training
    Set proposal method: gt
    Appending horizontally-flipped training examples...
    voc_2007_trainval gt roidb loaded from /media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
    done
    Preparing training data...
    done
    1528 roidb entries
    Output will be saved to /media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default
    TensorFlow summaries will be saved to /media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default
    Loaded dataset voc_2007_test for training
    Set proposal method: gt
    Preparing training data...
    voc_2007_test gt roidb loaded from /media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
    done
    328 validation roidb entries
    Filtered 0 roidb entries: 1528 -> 1528
    Filtered 0 roidb entries: 328 -> 328
    2017-05-26 14:24:11.316553: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-26 14:24:11.316569: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-26 14:24:11.316572: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-26 14:24:11.316575: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-26 14:24:11.316577: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
    Solving...
    /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
    "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    Loading initial model weights from data/imagenet_weights/vgg16.ckpt
    Varibles restored: vgg_16/conv1/conv1_1/biases:0
    Varibles restored: vgg_16/conv1/conv1_2/weights:0
    Varibles restored: vgg_16/conv1/conv1_2/biases:0
    Varibles restored: vgg_16/conv2/conv2_1/weights:0
    Varibles restored: vgg_16/conv2/conv2_1/biases:0
    Varibles restored: vgg_16/conv2/conv2_2/weights:0
    Varibles restored: vgg_16/conv2/conv2_2/biases:0
    Varibles restored: vgg_16/conv3/conv3_1/weights:0
    Varibles restored: vgg_16/conv3/conv3_1/biases:0
    Varibles restored: vgg_16/conv3/conv3_2/weights:0
    Varibles restored: vgg_16/conv3/conv3_2/biases:0
    Varibles restored: vgg_16/conv3/conv3_3/weights:0
    Varibles restored: vgg_16/conv3/conv3_3/biases:0
    Varibles restored: vgg_16/conv4/conv4_1/weights:0
    Varibles restored: vgg_16/conv4/conv4_1/biases:0
    Varibles restored: vgg_16/conv4/conv4_2/weights:0
    Varibles restored: vgg_16/conv4/conv4_2/biases:0
    Varibles restored: vgg_16/conv4/conv4_3/weights:0
    Varibles restored: vgg_16/conv4/conv4_3/biases:0
    Varibles restored: vgg_16/conv5/conv5_1/weights:0
    Varibles restored: vgg_16/conv5/conv5_1/biases:0
    Varibles restored: vgg_16/conv5/conv5_2/weights:0
    Varibles restored: vgg_16/conv5/conv5_2/biases:0
    Varibles restored: vgg_16/conv5/conv5_3/weights:0
    Varibles restored: vgg_16/conv5/conv5_3/biases:0
    Varibles restored: vgg_16/fc6/biases:0
    Varibles restored: vgg_16/fc7/biases:0
    Loaded.
    Fix VGG16 layers..
    /media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:26: RuntimeWarning: invalid value encountered in log
    targets_dw = np.log(gt_widths / ex_widths)
    Command terminated by signal 11
    62.03user 5.27system 0:57.96elapsed 116%CPU (0avgtext+0avgdata 3723648maxresident)k
    382896inputs+16outputs (296major+3462186minor)pagefaults 0swaps
@HTLife

HTLife commented May 28, 2017

I also encountered the same error.

Before iteration 55, everything goes fine.

Fix VGG16 layers..
Fixed.
iter: 20 / 7000, total loss: 0.315415
 >>> rpn_loss_cls: 0.120561
 >>> rpn_loss_box: 0.016272
 >>> loss_cls: 0.137040
 >>> loss_box: 0.041542
 >>> lr: 0.001000
speed: 1.417s / iter
iter: 40 / 7000, total loss: 0.740965
 >>> rpn_loss_cls: 0.077266
 >>> rpn_loss_box: 0.005625
 >>> loss_cls: 0.416012
 >>> loss_box: 0.242062
 >>> lr: 0.001000
speed: 1.133s / iter
/notebooks/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:31: RuntimeWarning: invalid value encountered in log
  targets_dh = np.log(gt_heights / ex_heights)
iter: 60 / 7000, total loss: nan
 >>> rpn_loss_cls: 0.681259
 >>> rpn_loss_box: nan
 >>> loss_cls: 2.784790
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 0.925s / iter

After iter=55, rpn_loss_box becomes nan, caused by a wrong value in bbox_transform.

lib/model/bbox_transform.py, line 20:
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
gt_rois[:, 1] becomes greater than gt_rois[:, 3], which makes gt_heights negative:

gt_rois[:, 3]  188.75
gt_rois[:, 1]  81918.8
gt_heights  -81729.0
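
A minimal reproduction of the warning (with made-up values matching the numbers above):

import numpy as np

gt_heights = np.array([-81729.0])  # negative because ymin > ymax in the annotation
ex_heights = np.array([188.0])     # a typical anchor height
# np.log of a negative ratio emits "RuntimeWarning: invalid value
# encountered in log" and yields nan
targets_dh = np.log(gt_heights / ex_heights)
print(targets_dh)  # [nan], which then propagates into rpn_loss_box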

@HTLife

HTLife commented May 28, 2017

My temporary workaround is to ignore boxes with incorrect values (ymin > ymax).

Check the gt_boxes values as follows:

lib/model/train_val.py
https://github.com/endernewton/tf-faster-rcnn/blob/master/lib/model/train_val.py#L219
Line 219:

blobs = self.data_layer.forward()
# Skip this image if the first ground-truth box has ymin > ymax
if blobs['gt_boxes'][0][1] > blobs['gt_boxes'][0][3]:
    iter += 1
    continue

This modification lets the program run without producing nan.
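
A broader variant of the same check (an untested sketch) skips the image inside the training loop if any ground-truth box is degenerate on either axis, not just the first one:

import numpy as np

blobs = self.data_layer.forward()
gt = blobs['gt_boxes']  # each row is (x1, y1, x2, y2, class)
# Skip the image if any box has x2 < x1 or y2 < y1
if np.any(gt[:, 2] < gt[:, 0]) or np.any(gt[:, 3] < gt[:, 1]):
    iter += 1
    continue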


Besides this temporary workaround, I started looking into the root cause of this error.
One could guess that the ymin and ymax values of the bounding-box ground truth are wrong.
However, after examining my bounding-box data, ymin is always smaller than ymax.

@endernewton What would you suggest to track down the source of the nan problem?

@endernewton
Owner

I am not sure about your setup or your application, so it is hard to help. Sorry.

@xzy295461445
Author

I ran it again and the error changed. Could you please tell me what happened?

Fix VGG16 layers..
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in call
ret = func(*args)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/layer_utils/anchor_target_layer.py", line 90, in anchor_target_layer
bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/layer_utils/anchor_target_layer.py", line 163, in _compute_targets
return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/model/bbox_transform.py", line 26, in bbox_transform
targets_dw = np.log(gt_widths / ex_widths) if ex_widths != 0 else 0
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2017-05-30 20:17:20.974332: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_2: see error log.
Traceback (most recent call last):
File "./tools/trainval_net.py", line 136, in
max_iters=args.max_iters)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/model/train_val.py", line 386, in train_net
sw.train_model(sess, max_iters)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/model/train_val.py", line 285, in train_model
self.net.train_step(sess, blobs, train_op)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/nets/network.py", line 374, in train_step
feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_2: see error log.
[[Node: vgg_16/anchor/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/cpu:0"](vgg_16/rpn_cls_score/BiasAdd, _recv_Placeholder_2_0, _recv_Placeholder_1_0, vgg_16/anchor/PyFunc/input_3, vgg_16/ANCHOR_default/generate_anchors, vgg_16/anchor/PyFunc/input_5)]]

Caused by op u'vgg_16/anchor/PyFunc', defined at:
File "./tools/trainval_net.py", line 136, in
max_iters=args.max_iters)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/model/train_val.py", line 386, in train_net
sw.train_model(sess, max_iters)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/model/train_val.py", line 105, in train_model
anchor_ratios=cfg.ANCHOR_RATIOS)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/nets/network.py", line 305, in create_architecture
rois, cls_prob, bbox_pred = self.build_network(sess, training)
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 68, in build_network
rpn_labels = self._anchor_target_layer(rpn_cls_score, "anchor")
File "/media/y/B0AAA15CAAA11FB8/linux/tf-faster-rcnn/tools/../lib/nets/network.py", line 149, in _anchor_target_layer
[tf.float32, tf.float32, tf.float32, tf.float32])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_2: see error log.
[[Node: vgg_16/anchor/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/cpu:0"](vgg_16/rpn_cls_score/BiasAdd, _recv_Placeholder_2_0, _recv_Placeholder_1_0, vgg_16/anchor/PyFunc/input_3, vgg_16/ANCHOR_default/generate_anchors, vgg_16/anchor/PyFunc/input_5)]]

Command exited with non-zero status 1
18.59user 3.10system 0:24.67elapsed 87%CPU (0avgtext+0avgdata 3754036maxresident)k
659096inputs+16outputs (377major+1686078minor)pagefaults 0swaps
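
Note: the ValueError above is NumPy semantics: ex_widths is an array, so "if ex_widths != 0" has an ambiguous truth value. An elementwise guard (a sketch, not the repository's code) avoids both the ambiguity and the log of a non-positive ratio:

import numpy as np

ex_widths = np.array([16.0, 0.0, 32.0])   # example anchor widths (one zero)
gt_widths = np.array([20.0, 12.0, -5.0])  # example gt widths (one invalid)

# Clamp elementwise to a small positive epsilon instead of testing the
# whole array in an `if` statement
eps = 1e-14
targets_dw = np.log(np.maximum(gt_widths, eps) / np.maximum(ex_widths, eps))
print(targets_dw)  # finite everywhere, no RuntimeWarning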

@xzy295461445
Author

In bbox_transform, the gt_widths value was odd, so I altered it. I can't be sure it's right, but it works.
Closing.

@abhiML

abhiML commented Jun 15, 2017

I am getting a similar error. My ex_widths becomes nan after the 100th iteration. It gives a runtime warning and then exits after a few more iterations. Any clues?
@HTLife

@lonlonago

@xzy295461445, how did you alter it? Did you solve the problem?

@lonlonago

@xzy295461445 , @HTLife , @abhiML , I got the same problem training on my own data: rpn_loss_box becomes nan. After some research, I found it's because in the file pascal_voc.py the function _load_pascal_annotation makes pixel indexes 0-based. The code is:
x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
But if your data is not 1-based (mine is 0-based), this produces -1 in the data. You can try deleting the -1 operation. Hope this helps!
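
An alternative to deleting the subtraction entirely (an untested sketch) is to clamp at zero in _load_pascal_annotation, so both 1-based and 0-based annotations stay non-negative:

# Shift 1-based coordinates to 0-based, but never below zero
x1 = max(float(bbox.find('xmin').text) - 1, 0)
y1 = max(float(bbox.find('ymin').text) - 1, 0)
x2 = max(float(bbox.find('xmax').text) - 1, 0)
y2 = max(float(bbox.find('ymax').text) - 1, 0)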

@VisintZJ

@xzy295461445 how did you alter it? Did you solve this problem?

@xzy295461445
Author

@VisintZJ Can you train with the VOC datasets?

@VisintZJ

@xzy295461445 Yes, there is no problem when I train with the VOC datasets.

@xzy295461445
Author

When I made the XML files for my own dataset, the width and height were swapped.

@VisintZJ

@xzy295461445 Thank you! I solved my problem after checking my training data set and found the reason: there are some wrong entries in my data. :(

@Site1997

Site1997 commented Apr 1, 2018

It is perhaps due to errors in the bbox coordinates (x < 0 or x > img_width) in your Annotations. (At least that was my case.)
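
A quick way to scan for such boxes (a sketch; the annotation path and field names assume the standard VOC2007 layout):

import glob
import xml.etree.ElementTree as ET

# Flag annotations whose boxes fall outside the image or are degenerate
for xml_path in glob.glob('data/VOCdevkit2007/VOC2007/Annotations/*.xml'):
    root = ET.parse(xml_path).getroot()
    w = float(root.find('size/width').text)
    h = float(root.find('size/height').text)
    for obj in root.findall('object'):
        b = obj.find('bndbox')
        x1 = float(b.find('xmin').text)
        y1 = float(b.find('ymin').text)
        x2 = float(b.find('xmax').text)
        y2 = float(b.find('ymax').text)
        if x1 < 0 or y1 < 0 or x2 > w or y2 > h or x1 >= x2 or y1 >= y2:
            print(xml_path, (x1, y1, x2, y2), 'image size:', (w, h))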

@liangxiaotian

liangxiaotian commented Apr 25, 2018

If your dataset's bboxes have xmin = 0 or ymin = 0, you should change this code in pascal_voc.py:

x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1

to

x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text)
y2 = float(bbox.find('ymax').text)

If your dataset's bboxes have xmax = width or ymax = height, you should change this code in imdb.py:

boxes[:, 0] = widths[i] - oldx2 - 1
boxes[:, 2] = widths[i] - oldx1 - 1

to

boxes[:, 0] = widths[i] - oldx2
boxes[:, 2] = widths[i] - oldx1

If your dataset's bboxes have xmax > width or ymax > height, you should delete or relabel them.
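
As an extra safety net (an untested sketch), you can assert right after the flip in imdb.py that the boxes are still valid, so bad annotations fail fast instead of surfacing later as nan:

boxes[:, 0] = widths[i] - oldx2
boxes[:, 2] = widths[i] - oldx1
# Fail fast on degenerate boxes instead of letting nan appear during training
assert (boxes[:, 2] >= boxes[:, 0]).all(), 'flipped box has x2 < x1'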

@TianChenone

If you have checked xmin, ymin, xmax, and ymax and ensured that xmin > 0, xmax < width, ymin > 0, and ymax < height, but the problem is still there, you can try deleting the files in data/cache and rerunning the code; otherwise a stale cached roidb from an earlier dataset may be reloaded, as in the "gt roidb loaded from ... cache" lines in the log above.
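
For example (a sketch; the cache location follows the DATA_DIR layout shown in the config above):

import glob
import os

# Remove cached roidb pickles so they are rebuilt from the new annotations
for f in glob.glob('data/cache/*.pkl'):
    print('removing', f)
    os.remove(f)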
