
Using GPU produces error after upgrading to 0.4.0 #361

Closed
luke18 opened this issue Apr 24, 2019 · 2 comments

luke18 commented Apr 24, 2019

I updated TFF to 0.4.0 yesterday and found that the image classification tutorial no longer runs correctly (it worked in previous TFF versions). After executing

#@test {"timeout": 600, "output": "ignore"}
state, metrics = iterative_process.next(state, federated_train_data)
print('round 1, metrics={}'.format(metrics))

The following error message occurs. I believe this problem is also partly mentioned here. I am also interested in how to optimize GPU usage in TFF, possibly by looping over the clients in parallel (as discussed in the previous, closed issue). Is there any other advice on speeding up federated training? It currently takes a really long time with a large neural network. Thank you!


InvalidArgumentError Traceback (most recent call last)
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1333 try:
-> 1334 return fn(*args)
1335 except errors.OpError as e:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1318 return self._call_tf_sessionrun(
-> 1319 options, feed_dict, fetch_list, target_list, run_metadata)
1320

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1406 self._session, options, feed_dict, fetch_list, target_list,
-> 1407 run_metadata)
1408

InvalidArgumentError: Could not colocate node with its resource and reference inputs; devices /job:localhost/replica:0/task:0/device:CPU:0 and /job:localhost/replica:0/task:0/device:GPU:0 are not compatible.
[[{{node ReduceDataset}}]]
[[{{node subcomputation/StatefulPartitionedCall_1}}]]
[[{{node subcomputation/StatefulPartitionedCall_1}}]]

During handling of the above exception, another exception occurred:

InvalidArgumentError Traceback (most recent call last)
in ()
1 #@test {"timeout": 600, "output": "ignore"}
----> 2 state, metrics = iterative_process.next(state, federated_train_data)
3 print('round 1, metrics={}'.format(metrics))

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/function_utils.py in call(self, *args, **kwargs)
598 context = self._context_stack.current
599 arg = pack_args(self._type_signature.parameter, args, kwargs, context)
--> 600 return context.invoke(self, arg)
601
602

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in invoke(self, fn, arg)
698 else:
699 computed_arg = None
--> 700 result = computed_comp.value(computed_arg)
701 py_typecheck.check_type(result, ComputedValue)
702 type_utils.check_assignable_from(comp.type_signature.result,

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in (x)
841 return ComputationContext(context, {comp.parameter_name: arg})
842
--> 843 return ComputedValue(lambda x: self._compute(comp.result, _wrap(x)),
844 comp.type_signature)
845

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
782 computation_types.FunctionType)
783 if comp.argument is not None:
--> 784 computed_arg = self._compute(comp.argument, context)
785 type_utils.check_assignable_from(computed_fn.type_signature.parameter,
786 computed_arg.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
--> 746 return self._compute_tuple(comp, context)
747 elif isinstance(comp, computation_building_blocks.Reference):
748 return self._compute_reference(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_tuple(self, comp, context)
800 result_type_elements = []
801 for k, v in anonymous_tuple.to_elements(comp):
--> 802 computed_v = self._compute(v, context)
803 type_utils.check_assignable_from(v.type_signature,
804 computed_v.type_signature)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute(self, comp, context)
742 return self._compute_compiled(comp, context)
743 elif isinstance(comp, computation_building_blocks.Call):
--> 744 return self._compute_call(comp, context)
745 elif isinstance(comp, computation_building_blocks.Tuple):
746 return self._compute_tuple(comp, context)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _compute_call(self, comp, context)
789 else:
790 computed_arg = None
--> 791 result = computed_fn.value(computed_arg)
792 py_typecheck.check_type(result, ComputedValue)
793 type_utils.check_assignable_from(computed_fn.type_signature.result,

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in (x)
869 arg_type = comp.type_signature.parameter
870 return ComputedValue(
--> 871 lambda x: my_method(fit_argument(x, arg_type, context)),
872 comp.type_signature)
873 else:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in _federated_map(self, arg)
912 fn = arg.value[0]
913 result_val = [
--> 914 fn(ComputedValue(x, mapping_type.parameter)).value for x in arg.value[1]
915 ]
916 result_type = computation_types.FederatedType(mapping_type.result,

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in (.0)
912 fn = arg.value[0]
913 result_val = [
--> 914 fn(ComputedValue(x, mapping_type.parameter)).value for x in arg.value[1]
915 ]
916 result_type = computation_types.FederatedType(mapping_type.result,

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in (x)
773 'but found '{}' instead.'.format(computation_oneof))
774 else:
--> 775 return ComputedValue(lambda x: run_tensorflow(comp, x),
776 comp.type_signature)
777

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/reference_executor.py in run_tensorflow(comp, arg)
342 if init_op:
343 sess.run(init_op)
--> 344 result_val = graph_utils.fetch_value_in_session(sess, result)
345 return capture_computed_value_from_graph(result_val,
346 comp.type_signature.result)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/graph_utils.py in fetch_value_in_session(sess, value)
759 if not tf.contrib.framework.is_tensor(v):
760 raise ValueError('Unsupported value type {}.'.format(str(v)))
--> 761 flattened_results = sess.run(flattened_value)
762
763 def _to_unicode(v):

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
927 try:
928 result = self._run(None, fetches, feed_dict, options_ptr,
--> 929 run_metadata_ptr)
930 if run_metadata:
931 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1150 if final_fetches or final_targets or (handle and feed_dict_tensor):
1151 results = self._do_run(handle, final_targets, final_fetches,
-> 1152 feed_dict_tensor, options, run_metadata)
1153 else:
1154 results = []

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1326 if handle is None:
1327 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328 run_metadata)
1329 else:
1330 return self._do_call(_prun_fn, handle, feeds, fetches)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1346 pass
1347 message = error_interpolation.interpolate(message, self._graph)
-> 1348 raise type(e)(node_def, op, message)
1349
1350 def _extend_graph(self):

InvalidArgumentError: Could not colocate node with its resource and reference inputs; devices /job:localhost/replica:0/task:0/device:CPU:0 and /job:localhost/replica:0/task:0/device:GPU:0 are not compatible.
[[{{node ReduceDataset}}]]
[[node subcomputation/StatefulPartitionedCall_1 (defined at /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/tensorflow_deserialization.py:122) ]]
[[node subcomputation/StatefulPartitionedCall_1 (defined at /home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow_federated/python/core/impl/tensorflow_deserialization.py:122) ]]
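
On the question of looping over clients in parallel: the `_federated_map` frames in the traceback show the reference executor calling `fn` on each client's value in a plain list comprehension, i.e. strictly one client after another. The sketch below contrasts that shape with a thread-pool version using only the Python standard library. It illustrates the idea and is not TFF's API; `local_train`, `federated_map_sequential`, `federated_map_parallel` and `federated_mean` are hypothetical names, and real client work would be a TensorFlow computation rather than a multiplication.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def federated_map_sequential(fn: Callable[[float], float],
                             client_values: Sequence[float]) -> List[float]:
    # Same shape as the list comprehension in the traceback: one client at a time.
    return [fn(x) for x in client_values]


def federated_map_parallel(fn: Callable[[float], float],
                           client_values: Sequence[float],
                           max_workers: int = 4) -> List[float]:
    # Runs fn on up to max_workers clients concurrently. executor.map yields
    # results in input order, so the output lines up with client_values just
    # like the sequential version.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fn, client_values))


def local_train(client_value: float) -> float:
    # Hypothetical stand-in for one client's local training; a real client
    # update would run a TensorFlow computation here.
    return client_value * 2.0


def federated_mean(values: Sequence[float]) -> float:
    # Unweighted server-side average of the client results.
    return sum(values) / len(values)


clients = [1.0, 2.0, 3.0, 4.0]
updates = federated_map_parallel(local_train, clients)
print(updates)                  # [2.0, 4.0, 6.0, 8.0]
print(federated_mean(updates))  # 5.0
```

Threads only help if the per-client work releases Python's GIL, which TensorFlow generally does while running ops in C++; pure-Python client work would need a `ProcessPoolExecutor` instead. On a single GPU, concurrent clients also compete for device memory, so `max_workers` may need to stay small.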

luke18 changed the title from "Using GPU produces error upgrading to 0.4.0" to "Using GPU produces error after upgrading to 0.4.0" on Apr 24, 2019
ZacharyGarrett (Collaborator) commented

We believe GPU support may be working again in release 0.5.0. Would you try upgrading?
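
Before upgrading, it can help to confirm which TFF release is actually installed. A minimal sketch, assuming the pip distribution name `tensorflow-federated` and Python 3.8+ for `importlib.metadata` (the Python 3.6 environment in the traceback would need the `importlib_metadata` backport instead). The helper names are hypothetical:

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    # Take the leading dotted-integer part: "0.5.0" -> (0, 5, 0).
    # Pre-release suffixes such as "rc1" are ignored, so "0.5.0rc1" counts as 0.5.0.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, required: str) -> bool:
    # Pad with zeros so "0.5" and "0.5.0" compare equal.
    a, b = parse_release(installed), parse_release(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_tff_version() -> Optional[str]:
    # Assumes the distribution is registered as "tensorflow-federated".
    try:
        return metadata.version("tensorflow-federated")
    except metadata.PackageNotFoundError:
        return None


version = installed_tff_version()
if version is None:
    print("tensorflow-federated is not installed")
elif not is_at_least(version, "0.5.0"):
    print(f"TFF {version} found; upgrade with: pip install --upgrade tensorflow-federated")
else:
    print(f"TFF {version} is 0.5.0 or newer")
```

Comparing integer tuples rather than raw strings matters here: as strings, `"0.10.1"` sorts before `"0.5.0"`.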

jkr26 (Collaborator) commented Dec 6, 2019

Closing, as GPU execution now seems to be handled by the local_executor with relatively few issues.

jkr26 closed this as completed Dec 6, 2019