
Error When Training Yolov3 with Image Size 1280x720 #204

Open
tristochief opened this issue Aug 2, 2018 · 4 comments

Comments

@tristochief

When I train on a dataset with an image size of 1280x720, with the batch size set to 1, 10, or 16, I get the following error:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]

Here is the code where this error happens (the failing line is the first `Concatenate` call):

def yolo_body(inputs, num_anchors, num_classes):
    """Create YOLO_V3 model CNN body in Keras."""
    darknet = Model(inputs, darknet_body(inputs))
    x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(256, (1,1)),
            UpSampling2D(2))(x)
    print('got here')
    x = Concatenate()([x,darknet.layers[152].output])  # <-- the InvalidArgumentError is raised here
    x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x)
    x = Concatenate()([x,darknet.layers[92].output])
    x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))

    return Model(inputs, [y1,y2,y3])

I am certain it has to do with the image size, because with 740x416 images training ran for several epochs before hitting a completely different error.
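The 44 vs 45 in the error can be reproduced with a little arithmetic. The sketch below uses hypothetical helpers (not part of keras-yolo3) that trace one image side through Darknet-53's five stride-2 downsamples. It assumes each downsample is built as in keras-yolo3's `resblock_body`: a one-pixel `ZeroPadding2D(((1,0),(1,0)))` followed by a 3x3 'valid' convolution with stride 2, which floors odd sizes. It then compares the upsampled coarse map with the skip connection at each `Concatenate` in `yolo_body`. In the log the tensors appear to be in channels-first order (note the `TransposeNHWCToNCHW` node), so 80 corresponds to the 1280 side and 44/45 to the 720 side.

```python
# Sketch: trace one spatial dimension through five stride-2 downsamples and
# check the two Concatenate layers in yolo_body.
# ASSUMPTION: each downsample is ZeroPadding2D(((1, 0), (1, 0))) followed by a
# 3x3 'valid' conv with stride 2, as in keras-yolo3's resblock_body.

def darknet_grid_sizes(side):
    """Return the size of `side` after each downsample (strides 2, 4, 8, 16, 32)."""
    sizes = []
    n = side
    for _ in range(5):
        padded = n + 1                 # one row/column of zero padding
        n = (padded - 3) // 2 + 1      # 3x3 valid conv with stride 2
        sizes.append(n)
    return sizes


def concat_mismatches(side):
    """Return (upsampled, skip) size pairs that cannot be concatenated."""
    s = darknet_grid_sizes(side)
    stride8, stride16, stride32 = s[2], s[3], s[4]
    # yolo_body upsamples by 2 and concatenates with the next-finer feature map
    pairs = [(2 * stride32, stride16), (2 * stride16, stride8)]
    return [p for p in pairs if p[0] != p[1]]


for side in (720, 736, 1280):
    print(side, darknet_grid_sizes(side), concat_mismatches(side))
```

For a side of 720 this reports `[(44, 45)]`, the same pair as in the ConcatOp error, while 736 and 1280 report no mismatches. Any side divisible by 32 halves exactly five times, so both concatenations line up; that is a sufficient condition, not the only one.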

Here is the full output from the terminal:

/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Using TensorFlow backend.
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
2018-08-03 06:39:11.817243: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:65:00.0
totalMemory: 10.91GiB freeMemory: 10.46GiB
2018-08-03 06:39:11.817284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-08-03 06:39:12.008818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-03 06:39:12.008854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-08-03 06:39:12.008859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-08-03 06:39:12.009089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10129 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
got here
Create YOLOv3 model with 9 anchors and 80 classes.
Load weights model_data/yolo.h5.
Freeze the first 249 layers of total 252 layers.
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Train on 2296 samples, val on 255 samples, with batch size 1.
Epoch 1/50
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]
	 [[Node: concatenate_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concatenate_1/concat-0-TransposeNHWCToNCHW-LayoutOptimizer, add_19/add, concatenate_1/concat-2-LayoutOptimizer)]]
	 [[Node: yolo_loss/while_2/strided_slice_1/stack_1/_2925 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3886_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_2819)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 190, in <module>
    _main()
  File "train.py", line 65, in _main
    callbacks=[logging, checkpoint])
  File "/home/tris/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1415, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/training_generator.py", line 213, in fit_generator
    class_weight=class_weight)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1215, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2672, in __call__
    return self._legacy_call(inputs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2654, in _legacy_call
    **self.session_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]
	 [[Node: concatenate_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concatenate_1/concat-0-TransposeNHWCToNCHW-LayoutOptimizer, add_19/add, concatenate_1/concat-2-LayoutOptimizer)]]
	 [[Node: yolo_loss/while_2/strided_slice_1/stack_1/_2925 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3886_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopyolo_loss/while_2/strided_slice_1/stack_2/_2819)]]

Caused by op 'concatenate_1/concat', defined at:
  File "train.py", line 190, in <module>
    _main()
  File "train.py", line 33, in _main
    freeze_body=2, weights_path='model_data/yolo.h5') # make sure you know what you freeze
  File "train.py", line 116, in create_model
    model_body = yolo_body(image_input, num_anchors//3, num_classes)
  File "/home/tris/Documents/beacohealth/HHCM/OD/Software/yolo/keras-yolo3/yolo3/model.py", line 79, in yolo_body
    x = Concatenate()([x,darknet.layers[152].output])
  File "/home/tris/.local/lib/python3.5/site-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/layers/merge.py", line 155, in call
    return self._merge_function(inputs)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/layers/merge.py", line 357, in _merge_function
    return K.concatenate(inputs, axis=self.axis)
  File "/home/tris/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1934, in concatenate
    return tf.concat([to_dense(x) for x in tensors], axis)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1181, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 949, in concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,256,80,44] vs. shape[1] = [1,512,80,45]
	 [[Node: concatenate_1/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](concatenate_1/concat-0-TransposeNHWCToNCHW-LayoutOptimizer, add_19/add, concatenate_1/concat-2-LayoutOptimizer)]]
	 [[Node: yolo_loss/while_2/strided_slice_1/stack_1/_2925 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3886_yolo_loss/while_2/strided_slice_1/stack_1", tensor_type=DT_INT3

@MiVaVo

MiVaVo commented Sep 3, 2018

Did you manage to solve this problem?

@l33tl4bs

l33tl4bs commented Oct 2, 2018

@tristochief The 720x1280 image size is not a multiple of 32 (see the code). You could add two 8px black bars to get to 736x1280 and it'll work! ;)
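The black-bar fix suggested above can be sketched with NumPy. This is a minimal sketch, not code from keras-yolo3: `pad_to_multiple` is a hypothetical helper that assumes images are HxWxC arrays, and any bounding boxes must afterwards be shifted by the returned `(top, left)` offsets. For a 720x1280 frame it adds 8 rows at the top and 8 at the bottom.

```python
import numpy as np


def pad_to_multiple(image, multiple=32, value=0):
    """Pad an HxW(xC) image with constant bars so H and W are multiples of `multiple`.

    Hypothetical helper. Padding is split as evenly as possible between the two
    sides. Returns the padded image and the (top, left) offsets that box
    coordinates must be shifted by.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % multiple            # rows needed to reach the next multiple
    pad_w = (-w) % multiple            # columns needed to reach the next multiple
    top, left = pad_h // 2, pad_w // 2
    # Leave any channel dimensions unpadded
    pad_spec = [(top, pad_h - top), (left, pad_w - left)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_spec, mode="constant", constant_values=value)
    return padded, (top, left)


frame = np.zeros((720, 1280, 3), dtype=np.uint8)
padded, (top, left) = pad_to_multiple(frame)
print(padded.shape, top, left)
```

Splitting the padding evenly keeps the content centered. If a training pipeline instead resizes every image to a configured input shape, choosing that shape as multiples of 32 (for example 736x1280) should avoid the mismatch without modifying the images themselves.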

@tristochief
Author

> @tristochief The 720x1280 image size is not a multiple of 32 (see the code). You could add two 8px black bars to get to 736x1280 and it'll work! ;)

thanks!

@Pari-singh

Hey, my image size is the same, and I solved it by cropping. However, I am now encountering a separate problem:

Here is the complete error I got:
2019-06-27 20:42:22.389956: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-27 20:42:22.393907: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_1_4: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-27 20:42:22.393964: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tensor_array_ops.cc:661 : Invalid argument: TensorArray replica_0/model_3/yolo_loss/TensorArray_2_5: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
2019-06-27 20:42:23.419646: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Traceback (most recent call last):
  File "train.py", line 521, in <module>
    _main()
  File "train.py", line 178, in _main
    initial_epoch=0
  File "/opt/conda/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/opt/conda/lib/python3.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/opt/conda/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/root/.local/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: TensorArray replica_0/model_3/yolo_loss/TensorArray_3: Could not read from TensorArray index 0. Furthermore, the element shape is not fully defined: [?,?,3]. It is possible you are working with a resizeable TensorArray and stop_gradients is not allowing the gradients to be written. If you set the full element_shape property on the forward TensorArray, the proper all-zeros tensor will be returned instead of incurring this error.
	 [[{{node replica_0/model_3/yolo_loss/TensorArrayStack/TensorArrayGatherV3}}]]
	 [[{{node replica_1/model_3/yolo_loss/add_17}}]]

Any leads toward a solution, or any other help, are welcome!
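Cropping, which this comment mentions, is the other way to reach multiples of 32. The commenter does not say how they cropped, so the following is only one assumed approach: a center crop via a hypothetical `center_crop_to_multiple` helper, plus a hypothetical `crop_boxes` helper that shifts boxes (assumed to be `[x_min, y_min, x_max, y_max]` pixel rows) into the cropped frame and drops any that end up with zero area. Unlike padding, a crop discards pixels (16 rows of a 720-row frame), so objects near the edges can be lost.

```python
import numpy as np


def center_crop_to_multiple(image, multiple=32):
    """Center-crop an HxW(xC) image so H and W are multiples of `multiple`.

    Hypothetical helper. Returns the cropped image and the (top, left) offset
    of the crop within the original image.
    """
    h, w = image.shape[:2]
    new_h, new_w = h - h % multiple, w - w % multiple
    top, left = (h - new_h) // 2, (w - new_w) // 2
    return image[top:top + new_h, left:left + new_w], (top, left)


def crop_boxes(boxes, offset, crop_shape):
    """Shift (N, 4) [x_min, y_min, x_max, y_max] boxes into the cropped frame.

    Hypothetical helper. Coordinates are clipped to the crop, and boxes with
    zero width or height after clipping are dropped.
    """
    top, left = offset
    h, w = crop_shape
    shifted = boxes.astype(np.float64) - np.array([left, top, left, top])
    shifted[:, [0, 2]] = shifted[:, [0, 2]].clip(0, w)   # clip x coordinates
    shifted[:, [1, 3]] = shifted[:, [1, 3]].clip(0, h)   # clip y coordinates
    keep = (shifted[:, 2] > shifted[:, 0]) & (shifted[:, 3] > shifted[:, 1])
    return shifted[keep]


frame = np.zeros((720, 1280, 3), dtype=np.uint8)
cropped, offset = center_crop_to_multiple(frame)
boxes = np.array([[10, 0, 50, 20], [0, 715, 10, 719]])
print(cropped.shape, offset, crop_boxes(boxes, offset, cropped.shape[:2]))
```

Here the second box lies entirely in the discarded bottom rows and is dropped. This only addresses the input-size mismatch; it is not a fix for the TensorArray error above.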


4 participants