Problem with AdamOptimizer? #11

gangeshwark · 2017-03-22T09:39:37Z

Hi,
You have a great tutorial. Thanks!
While running your tutorial 2 code, I ran into this error:

InvalidArgumentError Traceback (most recent call last)
in ()
5 for batch in range(max_batches):
6 fd = next_feed()
----> 7 _, l = sess.run([train_op, loss], fd)
8 loss_track.append(l)
9

InvalidArgumentError: Cannot assign a device to node 'Adam/update_Variable/sub_3': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
NoOp: GPU CPU
AssignSub: GPU CPU
ScatterAdd: GPU CPU
StridedSlice: GPU CPU
Shape: GPU CPU
Unique: CPU
Sub: GPU CPU
Const: GPU CPU
VariableV2: GPU CPU
UnsortedSegmentSum: GPU CPU
Identity: GPU CPU
Gather: GPU CPU
Mul: GPU CPU
RealDiv: GPU CPU
Assign: GPU CPU
Sqrt: GPU CPU
Enter: GPU CPU
Add: GPU CPU
Switch: GPU CPU
[[Node: Adam/update_Variable/sub_3 = Sub[T=DT_FLOAT, _class=["loc:@variable"]](Adam/update_Variable/sub_3/x, Adam/beta2)]]

Caused by op u'Adam/update_Variable/sub_3', defined at:
File "/home/gangeshwark/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/gangeshwark/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py", line 3, in <module>
app.launch_new_instance()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 7, in
train_op = tf.train.AdamOptimizer(lr).minimize(loss)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 289, in minimize
name=name)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 413, in apply_gradients
update_ops.append(processor.update_op(self, grad))
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 66, in update_op
return optimizer._apply_sparse_duplicate_indices(g, self._v)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 557, in _apply_sparse_duplicate_indices
return self._apply_sparse(gradient_no_duplicate_indices, var)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 156, in _apply_sparse
v_scaled_g_values = (grad.values * grad.values) * (1 - beta2_t)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 808, in r_binary_op_wrapper
return func(x, y, name=name)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2775, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Adam/update_Variable/sub_3': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
NoOp: GPU CPU
AssignSub: GPU CPU
ScatterAdd: GPU CPU
StridedSlice: GPU CPU
Shape: GPU CPU
Unique: CPU
Sub: GPU CPU
Const: GPU CPU
VariableV2: GPU CPU
UnsortedSegmentSum: GPU CPU
Identity: GPU CPU
Gather: GPU CPU
Mul: GPU CPU
RealDiv: GPU CPU
Assign: GPU CPU
Sqrt: GPU CPU
Enter: GPU CPU
Add: GPU CPU
Switch: GPU CPU
[[Node: Adam/update_Variable/sub_3 = Sub[T=DT_FLOAT, _class=["loc:@variable"]](Adam/update_Variable/sub_3/x, Adam/beta2)]]

I think it has something to do with AdamOptimizer(), since the same code works with GradientDescentOptimizer.
Any idea how to solve this?

Thanks in advance!

ematvey · 2017-03-22T14:14:13Z

Hi! The traceback you've posted indicates that you're running ops on the GPU. I guess you've changed the device manually (unless TF has some environment override for the default device that I'm not aware of). Could you post the code you modified?

Also, do you by any chance run the optimizer like tf.train.AdamOptimizer().minimize(loss, colocate_gradients_with_ops=True)?

gangeshwark · 2017-03-22T15:09:31Z

Hi,
I did not modify your code at all. A plain run of the unmodified notebook gave me that error, so I don't think it was caused by any changes to the code.
Also, no, I didn't run the optimizer the way you mentioned.

ematvey · 2017-03-22T18:28:11Z

This is extremely strange. The trace clearly shows you are running the tutorial on the GPU, while nothing in it changes the device (and the device doesn't change for me).

Please try running with soft device placement: sess = tf.InteractiveSession(config=tf.ConfigProto(allow_soft_placement=True))

ematvey · 2017-03-25T13:29:33Z

I assume the issue was resolved. Closing it now.

ravikg · 2017-07-14T12:10:19Z

I had a similar issue, and after looking into it, the problem seems to be with the embedding: it seems there is no GPU implementation for it. If you declare the embedding variable on the CPU, it works fine, e.g.:

with tf.device("/cpu:0"):
    embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)

Ref: tensorflow/tensorflow#5117
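
For anyone reading the error later: in the "Colocation Debug Info" block above, every op type lists both GPU and CPU except `Unique`, which lists only CPU. The traceback goes through `_apply_sparse_duplicate_indices`, which suggests the sparse Adam update for the embedding variable needs an op with no GPU kernel in this TensorFlow version. That would also be consistent with GradientDescentOptimizer working. Below is a minimal sketch for pulling such CPU-only op types out of the error text. The function name `ops_without_gpu_kernel` is hypothetical (it is not part of TensorFlow), and the parsing assumes the log format shown in this thread.

```python
import re

# Hypothetical helper, not a TensorFlow API: scans the text of an
# InvalidArgumentError for the colocation table and reports op types
# whose device list does not include GPU.
_ROW = re.compile(r"(\w+):\s*((?:GPU|CPU)(?:\s+(?:GPU|CPU))*)")


def ops_without_gpu_kernel(error_text):
    """Return op types in the colocation group that list no GPU device."""
    missing = []
    in_block = False
    for raw in error_text.splitlines():
        line = raw.strip()
        if line.startswith("Colocation group had the following types and devices"):
            in_block = True
            continue
        if not in_block:
            continue
        match = _ROW.fullmatch(line)
        if match is None:
            # The first line that is not "OpType: DEVICE ..." ends the table.
            in_block = False
            continue
        op_type, devices = match.group(1), match.group(2).split()
        if "GPU" not in devices and op_type not in missing:
            missing.append(op_type)
    return missing


# Excerpt of the error message posted at the top of this thread.
sample = """Colocation Debug Info:
Colocation group had the following types and devices:
NoOp: GPU CPU
AssignSub: GPU CPU
ScatterAdd: GPU CPU
StridedSlice: GPU CPU
Shape: GPU CPU
Unique: CPU
Sub: GPU CPU
Gather: GPU CPU
[[Node: Adam/update_Variable/sub_3 = Sub[T=DT_FLOAT, _class=["loc:@variable"]](Adam/update_Variable/sub_3/x, Adam/beta2)]]"""

print(ops_without_gpu_kernel(sample))  # ['Unique']
```

If the list is non-empty, the colocation group cannot be placed entirely on the GPU, so either pinning the variable to the CPU (as in the snippet above) or enabling soft placement (as suggested earlier in the thread) are the workarounds mentioned here.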

ematvey closed this as completed Mar 25, 2017

felipk101 mentioned this issue Oct 29, 2018

Could not satisfy explicit device specification? fizyr/keras-maskrcnn#39

Closed
