Moving to opset 11 is causing issues #886

Closed
ttdd11 opened this issue Apr 15, 2020 · 36 comments

@ttdd11

ttdd11 commented Apr 15, 2020

Trying to build a model for opset 11. With tf2onnx 1.5.1, I get an error about inferring shapes and dtypes. With tf2onnx 1.5.6, the optimizer doesn't seem to work and fails because of deepcopy. Any help would be greatly appreciated.

Trace version 1.5.1:

2020-04-15 05:32:20,772 - WARNING - From C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\verbose_logging.py:71: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2020-04-15 05:32:37,748 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.5.1/0c735a
2020-04-15 05:32:37,756 - INFO - Using opset <onnx, 11>
2020-04-15 05:32:42,706 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__219, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,789 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__242, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,810 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__247, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,867 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__264, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,873 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__269, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,913 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__280, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:42,924 - WARNING - ONNX Failed to infer shapes and dtypes for [Resize__285, type: Resize]
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\schemas.py", line 157, in infer_onnx_shape_dtype
inferred_model = shape_inference.infer_shapes(model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 2 is out of bounds
2020-04-15 05:32:43,345 - INFO -
2020-04-15 05:32:44,187 - WARNING - Failed to optimize model proto
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\graph.py", line 1167, in optimize_model_proto
graph = GraphUtil.create_graph_from_onnx_model(onnx_model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\graph.py", line 1206, in create_graph_from_onnx_model
inferred_model = shape_inference.infer_shapes(onnx_model_proto)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\onnx\shape_inference.py", line 35, in infer_shapes
inferred_model_str = C.infer_shapes(model_str)
RuntimeError: input 1 is out of bounds
2020-04-15 05:32:44,218 - INFO -
2020-04-15 05:32:44,218 - INFO - Successfully converted TensorFlow model C:/Users/tmp/net.pb to ONNX
2020-04-15 05:32:45,539 - INFO - ONNX model is saved at C:/Users/tmp/net.onnx

Trace version 1.5.6:

2020-04-15 05:33:53,665 - WARNING - From C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\verbose_logging.py:72: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2020-04-15 05:34:08,464 - INFO - Using tensorflow=1.14.0, onnx=1.6.0, tf2onnx=1.5.6/80edd7
2020-04-15 05:34:08,464 - INFO - Using opset <onnx, 11>
2020-04-15 05:34:11,773 - INFO - Optimizing ONNX model
2020-04-15 05:34:11,873 - WARNING - Failed to apply optimize_transpose
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer\__init__.py", line 50, in optimize_graph
current = copy.deepcopy(graph)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy
copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
2020-04-15 05:34:12,103 - WARNING - Failed to apply fold_constants
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer\__init__.py", line 50, in optimize_graph
current = copy.deepcopy(graph)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy
copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
2020-04-15 05:34:12,215 - WARNING - Failed to apply loop_optimizer
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer\__init__.py", line 50, in optimize_graph
current = copy.deepcopy(graph)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy
copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
2020-04-15 05:34:12,323 - WARNING - Failed to apply merge_duplication
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer\__init__.py", line 50, in optimize_graph
current = copy.deepcopy(graph)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy
copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
2020-04-15 05:34:12,540 - WARNING - Failed to apply remove_identity
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer\__init__.py", line 50, in optimize_graph
current = copy.deepcopy(graph)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy
copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
2020-04-15 05:34:12,649 - WARNING - Failed to apply remove_back_to_back
Traceback (most recent call last):
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\site-packages\tf2onnx\optimizer\__init__.py", line 50, in optimize_graph
current = copy.deepcopy(graph)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 220, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\tmp.conda\envs\tensorflow_gpu2\lib\copy.py", line 159, in deepcopy
copier = getattr(x, "__deepcopy__", None)
ReferenceError: weakly-referenced object no longer exists
2020-04-15 05:34:12,689 - INFO - After optimization: no change
2020-04-15 05:34:12,826 - INFO -
2020-04-15 05:34:12,827 - INFO - Successfully converted TensorFlow model C:/Users/tmp/net.pb to ONNX
2020-04-15 05:34:14,159 - INFO - ONNX model is saved at C:/Users/tmp/net.onnx

@guschmue
Collaborator

tf2onnx-1.5.1 did not support opset 11, but it accepts --opset 11, so the model gets tagged with opset 11 (lame excuse: we don't fail because it makes it easier for us when we are in the middle of adding a new opset). We'll discuss changing that to fail when the opset is not fully implemented.

If you upgrade to tf2onnx-1.5.6 (pip install tf2onnx -U) things should work.
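The `input 2 is out of bounds` warnings in the 1.5.1 trace are consistent with this explanation. In the ONNX operator spec, Resize at opset 10 takes two inputs (X, scales), while Resize at opset 11 takes at least three (X, roi, scales, and optionally sizes). A node laid out the opset 10 way but tagged as opset 11 has no input at index 2. The sketch below illustrates that mismatch with plain Python data; `RESIZE_MIN_INPUTS` and `check_resize_inputs` are hypothetical names made up for illustration and are not part of tf2onnx or onnx.

```python
# Minimum input count for Resize per the ONNX operator spec:
#   opset 10: (X, scales)
#   opset 11: (X, roi, scales[, sizes])
RESIZE_MIN_INPUTS = {10: 2, 11: 3}


def check_resize_inputs(opset, input_names):
    """Raise ValueError if a Resize node has too few inputs for its opset."""
    required = RESIZE_MIN_INPUTS[opset]
    if len(input_names) < required:
        raise ValueError(
            f"Resize at opset {opset} needs at least {required} inputs, "
            f"got {len(input_names)}"
        )
    return True


# A node emitted in the opset 10 layout is fine when tagged opset 10 ...
check_resize_inputs(10, ["X", "scales"])

# ... but is rejected when the model is tagged opset 11.
try:
    check_resize_inputs(11, ["X", "scales"])
except ValueError as err:
    print(err)
```

A check of this kind is one way a converter could fail early instead of emitting a model that only breaks later in shape inference.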

guschmue reopened this Apr 15, 2020
@ttdd11
Author

ttdd11 commented Apr 15, 2020

@guschmue Did you take a look at the trace for 1.5.6? I can't seem to get the optimizer working for that version either.

@guschmue
Collaborator

Got it. There is another bug like this, something new that we have not been able to reproduce.
What Python version is this? Anaconda or system Python?

@ttdd11
Author

ttdd11 commented Apr 15, 2020

Anaconda, I think version 3.6. What do you recommend? I can try some things if that helps.

@guschmue
Collaborator

I'm looking for some way of reproducing this, but so far anaconda/3.6 is happy on both linux and windows.

@ttdd11
Author

ttdd11 commented Apr 15, 2020 via email

@guschmue
Collaborator

Sure, since it is failing for you I hope it would fail for me.

@ttdd11
Author

ttdd11 commented Apr 15, 2020

Do you have an email I can send this to? I probably shouldn't post the model here.

@ttdd11
Author

ttdd11 commented Apr 16, 2020

@guschmue I tried many variants of onnx, tensorflow and tf2onnx, and all versions are unhappy with opset 11.

I also built the master branch and tried that, same issues regardless of the tensorflow version.

@guschmue
Collaborator

This deepcopy issue seems unrelated to the opset; something is wrong in the optimizer. Two other deepcopy issues came in recently and they all look the same, failing in the back_to_back optimizer, which makes me think that the identity optimizer that runs before back_to_back must have some issue. Going to review that code.
If you can share the model you can send a link to guschmue@microsoft.com.

@ttdd11
Author

ttdd11 commented Apr 16, 2020

Just sent, thanks for taking a look.

@guschmue
Collaborator

So I tried your model on tf-1.14, tf-2.2, tf2onnx-1.5.6 and tf2onnx-master with python3.6 and python3.7 on both windows and linux ... all working for me.
The only thing in my env that might be different is that I only use anaconda.
Let me check with some teammates if they have ever seen the deepcopy error.

@ttdd11
Author

ttdd11 commented Apr 16, 2020

I'm not sure what you mean by "I only use anaconda". That's also what I am using.

What version of anaconda are you using? I may re-install and send all my instructions. What version of cuda are you using?

@guschmue
Collaborator

I'm using cuda-10.1 on linux and used a cpu build on windows. Don't think cuda would impact tf2onnx except in a few cases where the graph is a little different if tensorflow finds cuda.
Some people use the system python, that is why I mention that I only use anaconda.

@ttdd11
Author

ttdd11 commented Apr 16, 2020

I'm going to re-install anaconda and re-build my environments. Can you email me back the .onnx export so I can try it further down? I'm just moving to opset 11 to address some downstream issues.

@jignparm
Contributor

@ttdd11, can you run the packages without creating a conda environment to see if that makes any difference?

@ttdd11
Author

ttdd11 commented Apr 16, 2020

@jignparm as in run the same without calling activate env?

@jignparm
Contributor

Yes -- BTW, this is not likely to be the issue, just trying to isolate the differences.

@ttdd11
Author

ttdd11 commented Apr 17, 2020

@jignparm This is a bit tricky, my anaconda environment isn't happy without calling an activate. I'll have to change some path variables to test this out.

@ttdd11
Author

ttdd11 commented Apr 17, 2020

@jignparm would it be just as good of a test if I ran this using system python and packages?

@jignparm
Contributor

> my anaconda environment isn't happy without calling an activate.

That's odd. You should be able to install Anaconda multiple times in separate folders (i.e. have secondary installations).

> system python and packages?

Not sure what configuration the system Python is. Like I mentioned above, this is not likely to be the root cause (simply a difference), so if it ends up being too difficult to test, feel free to skip it (I assumed it would be a quick test, and hence proposed it).

Other users have seen similar errors, but so far we have not been able to reproduce them, which was the reason for the far-fetched test, to rule out Python environment issues.

@ttdd11
Author

ttdd11 commented Apr 17, 2020

It's a known issue with numpy and anaconda. I'll see how the afternoon plays out (it's just a path issue that's pretty easily resolved).

Just a thought, are you guys building onnx from source?

@jignparm
Contributor

The ONNX package is not built from source. It's the released version from PyPI.

@buddhapuneeth
Contributor

@jignparm my issue is also linked to this, so I am commenting here. I was able to narrow the issue down: for me it happens only with CPython 3.6 (internal version), not with the Anaconda Python 3.6. I compared the copy.py files in both and there is no difference. I'm not sure of the exact point of failure.
Are there any alternative mechanisms to deepcopy() you can think of?

@jignparm
Contributor

@buddhapuneeth, it looks like Anaconda 3.6 works for you, but CPython 3.6 throws this error.

For @ttdd11 , Anaconda is throwing an error as well.

I installed CPython 3.6 on windows and converted a large ssd_resnet101_v1_fpn_shared_box_predictor_oid_512x512_sync_2019_01_20 model, but still could not reproduce the deepcopy error.

Even if we use an alternative to deepcopy, it will be difficult to verify without being able to reproduce the error.

I'll investigate to see if there are any dangling/bad references after conversion.

Could you disable the optimizers (need to modify code) to isolate it to a particular optimizer? One suspicion is the back_to_back optimizer in optimizer\__init__.py. If you remove it from the dictionary in that file, it'll disable it.

@buddhapuneeth
Contributor

I removed BackToBackOptimizer and tried, still the same issue for other optimizers.

@jignparm
Contributor

Thanks for the quick check! Any idea if disabling all optimizers still results in this issue?

@buddhapuneeth
Contributor

If I disable all optimizers, then there is no issue.
One basic question: we copy the graph here as a fallback mechanism in case optimization fails, right? So I should not assume optimizations will be successful every time?

@jignparm
Contributor

@buddhapuneeth, yes that's correct -- if an optimizer fails, then the 'current' graph will not be updated to the 'new' graph (i.e. the one the optimizer keeps modifying until it succeeds or hits an error and exits). So optimizers are not required to succeed at every iteration -- if any of them throws an error, only that optimizer is aborted.
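The copy-then-optimize fallback described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not tf2onnx's actual code: `apply_optimizers`, `drop_zeros` and `broken` are hypothetical names, and a plain list stands in for the graph.

```python
import copy


def apply_optimizers(graph, optimizers):
    """Run each optimizer on a deep copy; keep the previous graph on failure."""
    failed = []
    for name in sorted(optimizers):
        try:
            candidate = copy.deepcopy(graph)  # work on a copy, not the original
            result = optimizers[name](candidate)
            graph = result if result is not None else graph
        except Exception:
            failed.append(name)  # only this optimizer is skipped
    return graph, failed


def drop_zeros(nodes):
    """Toy optimizer: remove zero entries."""
    return [n for n in nodes if n != 0]


def broken(nodes):
    """Toy optimizer that damages its input and then fails."""
    nodes.clear()  # mutates the copy only
    raise RuntimeError("optimizer bug")


final, failed = apply_optimizers(
    [1, 0, 2], {"a_drop": drop_zeros, "b_broken": broken}
)
print(final, failed)
```

Because `broken` only ever sees a copy, the graph produced by `drop_zeros` survives its failure. Note that in this sketch a failure of `copy.deepcopy` itself is also recorded as a failure of that optimizer, which matches the "Failed to apply ..." warnings in the 1.5.6 trace.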

It's interesting that disabling all the optimizers solves the issue. It means most likely one of them is the culprit, and probably not the back_to_back_optimizer.

IdentityOptimizer is another suspicious source of the deepcopy error. If you enable only that one optimizer and still observe the deepcopy error, it should be a good enough hint for us to look for a fix for it.

@buddhapuneeth
Contributor

@jignparm I tried all combinations and it fails in all of them. It also fails for one specific model only; other models I tried work fine. I am assuming some node type causes the issue when deepcopy resolves references, as the final error says:
'ReferenceError: weakly-referenced object no longer exists'
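For context, here is one way this exact exception can arise from `copy.deepcopy`. Whether the tf2onnx graph actually held a dead weak proxy is not established in this thread; the sketch only shows that a container holding a `weakref.proxy` whose referent has been collected fails at the `getattr(x, "__deepcopy__", None)` lookup seen at the bottom of the tracebacks. `Node` and `holder` are hypothetical stand-ins.

```python
import copy
import gc
import weakref


class Node:
    """Stand-in for a graph node (hypothetical)."""


node = Node()
holder = {"node": weakref.proxy(node)}  # holder keeps only a weak proxy

del node      # drop the last strong reference
gc.collect()  # not needed on CPython, harmless elsewhere

try:
    copy.deepcopy(holder)
    error = None
except ReferenceError as err:
    # getattr() with a default only swallows AttributeError, so the
    # ReferenceError from the dead proxy propagates out of deepcopy.
    error = err

print(type(error).__name__, error)
```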

As mentioned in the first ticket, I am using a temporary workaround:

    logger.verbose("Apply %s", name)
    try:
        current = copy.deepcopy(graph)
    except Exception:
        # fall back to optimizing the original graph in place
        logger.verbose("Failed to do deepcopy")
        current = graph
    opt = factory()
    graph = opt.optimize(current) or graph

These are the logs for the transpose optimization. The model has about 400 nodes of ~30 types.
2020-04-28 16:45:58,627 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2020-04-28 16:45:58,651 - DEBUG - tf2onnx.optimizer: Failed to do deepcopy
2020-04-28 16:45:58,744 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: Add -3 (..), Const -36 (..), Identity -3 (..), Reshape +1 (..), Transpose -16 (..)
2020-04-28 16:45:58,744 - VERBOSE - tf2onnx.optimizer: Apply fold_constants..........

I am not able to narrow down the culprit node, as there are a lot of them.
Any suggestions from your side?

@buddhapuneeth
Contributor

@jignparm the issue is resolved for me after upgrading from cpython36 to cpython37.

@jignparm
Contributor

jignparm commented May 1, 2020

Thanks @buddhapuneeth for trying out several combinations, and great that cpython37 is working for you without any errors.

It might still be helpful to debug on 3.6 to see if there's a true bug in any of the optimizers.

Each optimizer runs independently from the others, but it's possible that the graph is corrupted by optimizer A and we don't see the error until optimizer B runs.

There are 6 of them activated (see below), and since the dictionary is sorted by key values, they will run in alphabetical order until no more optimizations can be performed.

It should be possible to comment out all 6 of them, which makes the error disappear, and then activate only 1 of them at a time, to see which optimizer is causing the error. There is still likely to be a subtle bug in these, and it's less likely that deep-copy is buggy -- so isolating it down to 1 optimizer would be very good information.

https://github.com/onnx/tensorflow-onnx/blob/master/tf2onnx/optimizer/__init__.py#L22-L29
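The one-at-a-time isolation described above could be scripted roughly as follows. This is a sketch under assumptions: `isolate_failures`, `fake_convert`, `opt_a` and `opt_b` are hypothetical names, and `convert` stands for whatever callable runs a conversion with a given optimizer dictionary.

```python
def isolate_failures(optimizers, convert):
    """Enable one optimizer at a time and report which runs raise."""
    failing = []
    for name in sorted(optimizers):  # same alphabetical order as described
        enabled = {name: optimizers[name]}  # only this optimizer is active
        try:
            convert(enabled)
        except Exception as err:
            failing.append((name, type(err).__name__))
    return failing


def fake_convert(enabled):
    """Toy conversion: just invoke every enabled optimizer."""
    for func in enabled.values():
        func()


def opt_a():
    return None


def opt_b():
    raise ReferenceError("weakly-referenced object no longer exists")


print(isolate_failures({"opt_a": opt_a, "opt_b": opt_b}, fake_convert))
```

If every single-optimizer run fails the same way, that points away from any one optimizer and toward the graph handed to them.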

@buddhapuneeth
Contributor

Hi @jignparm, as I mentioned in my second-to-last comment, I tried all the combinations of optimizers. If I comment out all the optimizers, there is no error. With any one of the optimizers enabled, the error happens.
I also tried to find the exact node at which deepcopy fails, but with ~400 nodes being copied recursively I was not able to narrow it down. I believe it has something to do with nodes in the graph and nothing to do with the optimizers.

@jignparm
Contributor

jignparm commented May 2, 2020

Thanks for clarifying -- I didn't realize that <1> with ALL optimizers disabled, there's no error, and <2> with ANY one optimizer enabled, you see this error.

> I believe, it is something to do with nodes in the graph and nothing to do with optimizers.

That sounds reasonable to me as well. If some node is corrupted before starting any of the optimizers -- that would explain <2> above.

In that case, a good way to isolate the exact node (or rewriter) is to put a debug line just below the 'try' at 2 locations:

Something like the snippet below. The print statement just before the failure is probably the operator that is causing the corruption for your model.

try:
   print (func)                # print what op or rewriter is being called....
   dontuse = deepcopy(g)       # if deepcopy fails, then the previous func() caused corruption
   ...
except ... :

@jignparm
Contributor

@ttdd11, @buddhapuneeth PR #972 should resolve this issue. Let me know if you see any errors. It took a while before we got a model to reproduce the error systematically -- hopefully this resolves it finally.

@guschmue
Collaborator

guschmue commented Jul 7, 2020

I assume this is fixed.

guschmue closed this as completed Jul 7, 2020