Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid Argument error for train_shapes.ipynb #265

Open
yasiemir opened this issue Feb 20, 2018 · 6 comments
Open

Invalid Argument error for train_shapes.ipynb #265

yasiemir opened this issue Feb 20, 2018 · 6 comments

Comments

@yasiemir
Copy link

yasiemir commented Feb 20, 2018

running train_shapes with the following config on a machine with CPU only:

class ShapesConfig(Config):
    
    NAME = "shapes"  
    GPU_COUNT = 1
    IMAGES_PER_GPU = 8
    NUM_CLASSES = 1 + 3  # background + 3 shapes   
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128    
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels
    TRAIN_ROIS_PER_IMAGE = 32
    STEPS_PER_EPOCH = 100
    VALIDATION_STEPS = 3
    POST_NMS_ROIS_TRAINING = 200
    POST_NMS_ROIS_INFERENCE = 100
    BACKBONE_STRIDES = [4,8,16]
    RPN_ANCHOR_SCALES = (32, 64)
    MAX_GT_INSTANCES = 10

and and training with layers='heads':

model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE, 
            epochs=1, 
            layers='heads')

generates the following InvalidArgumentError:

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: indices[1] = 3949 is not in [0, 3840)
	 [[Node: ROI/Gather_22 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/Gather_16/params, ROI/strided_slice_39)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-10-2101606c7d8e> in <module>()
      6             learning_rate=config.LEARNING_RATE,
      7             epochs=1,
----> 8             layers='heads')

/home/meer/Documents/dsb/Mask_RCNN-master/model.py in train(self, train_dataset, val_dataset, learning_rate, epochs, layers)
   2208             max_queue_size=100,
   2209             workers=workers,
-> 2210             use_multiprocessing=True,
   2211         )
   2212         self.epoch = max(self.epoch, epochs)

/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2175                     outs = self.train_on_batch(x, y,
   2176                                                sample_weight=sample_weight,
-> 2177                                                class_weight=class_weight)
   2178 
   2179                     if not isinstance(outs, list):

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1847             ins = x + y + sample_weights
   1848         self._make_train_function()
-> 1849         outputs = self.train_function(ins)
   1850         if len(outputs) == 1:
   1851             return outputs[0]

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2473         session = get_session()
   2474         updated = session.run(fetches=fetches, feed_dict=feed_dict,
-> 2475                               **self.session_kwargs)
   2476         return updated[:len(self.outputs)]
   2477 

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1118     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1119       results = self._do_run(handle, final_targets, final_fetches,
-> 1120                              feed_dict_tensor, options, run_metadata)
   1121     else:
   1122       results = []

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1315     if handle is None:
   1316       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317                            options, run_metadata)
   1318     else:
   1319       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1334         except KeyError:
   1335           pass
-> 1336       raise type(e)(node_def, op, message)
   1337 
   1338   def _extend_graph(self):

InvalidArgumentError: indices[1] = 3949 is not in [0, 3840)
	 [[Node: ROI/Gather_22 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/Gather_16/params, ROI/strided_slice_39)]]

Caused by op 'ROI/Gather_22', defined at:
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.5/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/usr/lib/python3/dist-packages/tornado/ioloop.py", line 866, in start
    handler_func(fd_obj, events)
  File "/usr/lib/python3/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.5/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.5/dist-packages/ipykernel/zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-ae82d0e47c39>", line 3, in <module>
    model_dir=MODEL_DIR)
  File "/home/meer/Documents/dsb/Mask_RCNN-master/model.py", line 1735, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/home/meer/Documents/dsb/Mask_RCNN-master/model.py", line 1851, in build
    config=config)([rpn_class, rpn_bbox])
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 617, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/meer/Documents/dsb/Mask_RCNN-master/model.py", line 271, in call
    names=["pre_nms_anchors"])
  File "/home/meer/Documents/dsb/Mask_RCNN-master/utils.py", line 678, in batch_slice
    output_slice = graph_fn(*inputs_slice)
  File "/home/meer/Documents/dsb/Mask_RCNN-master/model.py", line 269, in <lambda>
    anchors = utils.batch_slice(ix, lambda x: tf.gather(anchors, x),
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 2486, in gather
    params, indices, validate_indices=validate_indices, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1834, in gather
    validate_indices=validate_indices, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): indices[1] = 3949 is not in [0, 3840)
	 [[Node: ROI/Gather_22 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/Gather_16/params, ROI/strided_slice_39)]]
@woodm1979
Copy link

I'm also running into this issue. I'll try and debug it tonight/tomorrow.

@yasiemir
Copy link
Author

I think its related to #211

@woodm1979
Copy link

Somehow we're getting more Scores than we are anchors. ... which seems to be impossible, so there may be a mismatch in combining some things somehow.

@FruVirus
Copy link

Are you using the default ResNet-101 with FPN structure? If so, you have a mismatch in the number of elements between RPN_ANCHOR_SCALES and BACKBONE_STRIDES. Also, you have RPN_ANCHOR_SCALES defined twice.

In general, if you are using ResNet-101 with FPN and taking the C4 backbone, there are 5 levels in the FPN. Thus, you should have 5 elements in RPN_ANCHOR_SCALES and 5 elements in BACKBONE_STRIDES.

@dxs66
Copy link

dxs66 commented Oct 7, 2018

When i run the train_shapes.ipynb, I met the result is that FileNotFoundError: [Errno 2] Could not find weight files in /home/ubuntu66/logs/shapes20181007T1338. How can i solve this problem?

@PROFRAGU
Copy link

did anybody fix this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants