Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve mapping and add more tests for make_tensor #4270

Merged
merged 44 commits into from
Sep 28, 2022

Conversation

jcwchen
Copy link
Member

@jcwchen jcwchen commented Jun 13, 2022

Describe your changes

  • Refactor mapping
  • Deprecate TENSOR_TYPE_TO_STORAGE_TENSOR_TYPE
  • Use functions instead of more maps to prevent confusions and save binary size
  • Add more tests (test_make_tensor) for more tensor types

Motivation and Context

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@jcwchen jcwchen requested a review from a team as a code owner June 13, 2022 20:53
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
onnx/mapping.py Outdated Show resolved Hide resolved
onnx/mapping.py Outdated Show resolved Hide resolved
onnx/mapping.py Outdated Show resolved Hide resolved
onnx/mapping.py Outdated Show resolved Hide resolved
Copy link
Contributor

@garymm garymm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go much further and make only a small number of very useful, user-friendly things public and hide the rest.

Everything currently inside mapping.py should be made private, since I think it's all implemtnation details that clients shouldn't rely on.

I propose:

  • Add helper.tensor_field_for_data_type(TensorProto, DataType), helper.tensor_field_for_numpy_dtype(TensorProto, np.dtype). These do all of the mapping lookups and the getattr call.
  • Move mapping.py to _mapping.py
  • Add a new mapping.py that does something like:
import warnings
warnings.warn("onnx.mapping is deprecated and will be removed in a future release. Use functions in onnx.helper instead.", ...)

# for temporary backwards compatibility
from _mapping import *

If there's something else in mapping.py that is truly needed by users, we can expose that in helper.py as well.

WDYT?

onnx/test/hub_test.py Outdated Show resolved Hide resolved
onnx/test/parser_test.py Outdated Show resolved Hide resolved
onnx/test/helper_test.py Outdated Show resolved Hide resolved
onnx/test/helper_test.py Outdated Show resolved Hide resolved
onnx/test/helper_test.py Outdated Show resolved Hide resolved
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
onnx/test/helper_test.py Outdated Show resolved Hide resolved
onnx/test/helper_test.py Outdated Show resolved Hide resolved
onnx/helper.py Outdated Show resolved Hide resolved
onnx/mapping.py Outdated Show resolved Hide resolved
@jcwchen jcwchen marked this pull request as draft June 17, 2022 16:53
@jcwchen jcwchen changed the title Improve mapping and add more tests for make_tensor [WIP] Improve mapping and add more tests for make_tensor Jun 17, 2022
…o jcw/improve-raw

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

# Conflicts:
#	onnx/mapping.py
#	onnx/test/helper_test.py
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
…o jcw/improve-raw

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

# Conflicts:
#	onnx/numpy_helper.py
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@lgtm-com
Copy link

lgtm-com bot commented Jul 14, 2022

This pull request introduces 1 alert when merging 8df7a7a into 0fc92e4 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@jcwchen jcwchen added run release CIs Use this label to trigger release tests in CI labels Jul 15, 2022
@jcwchen jcwchen changed the title [WIP] Improve mapping and add more tests for make_tensor Improve mapping and add more tests for make_tensor Jul 15, 2022
@jcwchen jcwchen marked this pull request as ready for review July 15, 2022 00:47
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@@ -36,10 +36,9 @@ def to_array(tensor: TensorProto, base_dir: str = "") -> np.ndarray:
raise TypeError("The element type in the input tensor is not defined.")

tensor_dtype = tensor.data_type
np_dtype = mapping.TENSOR_TYPE_TO_NP_TYPE[tensor_dtype]
storage_type = mapping.TENSOR_TYPE_TO_STORAGE_TENSOR_TYPE[tensor_dtype]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PR #4510, I remove the use of dictionary TENSOR_TYPE_TO_STORAGE_TENSOR_TYPE. It think use int8 or uint8 to store bool should not be allowed.

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@@ -156,13 +155,9 @@ def to_list(sequence: SequenceProto) -> List[Any]:
"""
lst: List[Any] = []
elem_type = sequence.elem_type
value_field = mapping.STORAGE_ELEMENT_TYPE_TO_FIELD[elem_type]
values = getattr(sequence, value_field)
values = helper.get_attr_from_sequence_elem_type(sequence, elem_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, I think it would be simpler to rewrite the code as below:

   if elem_type == SequenceProto.TENSOR:
      return [to_array(v) for v in sequence.tensor_values]
   elif elem_type == SequenceProto.SPARSE_TENSOR:
      return [to_array(v) for v in sequence.sparse_tensor_values]
   elif elem_type == SequenceProto.SEQUENCE:
      return [to_list(v) for v in sequence.sequence_values]
   etc.

I don't think we need to define the helper methods like get_attr_from_seqience_elem_type.

onnx/helper.py Outdated
return cast(str, mapping.STORAGE_ELEMENT_TYPE_TO_FIELD[elem_type])


def get_attr_from_sequence_elem_type(tensor: SequenceProto, elem_type: int) -> Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these utility/helper methods. Please see my comment in numpy_helper.py on rewriting the code there. I think splitting the code into these two files actually makes the logic of either one harder to understand. (I understand this is a problem with the pre-existing code introduced by someone else previously, and not this PR.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, yes I think keeping if-else statement makes more sense, but for maintainability, having a common function for the same mapping seems not too bad (we don't need to change the mapping in several places if mapping is changed). However, if the attribute mapping for protos is quite stable, I am fine with having if-else statements. I have removed attribute map for SequenceProto and OptionalProto and used if-else statements instead.

Not sure whether it is a concern -- that also means that previously the attribute mappings for SequenceProto and OptionalProto are public used functions and in the future they will be removed.

@@ -318,10 +313,9 @@ def to_optional(optional: OptionalProto) -> Optional[Any]:
elem_type = optional.elem_type
if elem_type == OptionalProto.UNDEFINED:
return opt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to return None

# TODO: create a map and replace conditional branches
if elem_type == OptionalProto.TENSOR or elem_type == OptionalProto.SPARSE_TENSOR:
if elem_type in (OptionalProto.TENSOR, OptionalProto.SPARSE_TENSOR):
opt = to_array(value)
elif elem_type == OptionalProto.SEQUENCE:
opt = to_list(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to return to_list(optional.tensor_value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise for all the cases. We don't need the helper utility method, whose purpose is only to do this switch/case anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

onnx/helper.py Outdated
return mapping.TENSOR_TYPE_MAP[int(tensor_dtype)].name


def tensor_dtype_to_storage_numpy_type(tensor_dtype: int) -> np.dtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can omit this function. Users can call the two functions themselves. It is not easy to document/describe, and too many such functions in the API can confuse a reader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. Just omitted.

TensorProto.COMPLEX128,
}
],
ids=lambda tensor_dtype: helper.tensor_dtype_to_string(tensor_dtype),

Check notice

Code scanning / CodeQL

Unnecessary lambda

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
for t in helper.get_all_tensor_types()
if t not in {TensorProto.BFLOAT16, TensorProto.STRING}
],
ids=lambda tensor_dtype: helper.tensor_dtype_to_string(tensor_dtype),

Check notice

Code scanning / CodeQL

Unnecessary lambda

This 'lambda' is just a simple wrapper around a callable object. Use that object directly.
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@@ -148,6 +148,16 @@ Runnable IPython notebooks:
- [make_model.ipynb](/onnx/examples/make_model.ipynb)
- [Protobufs.ipynb](/onnx/examples/Protobufs.ipynb)

## Conversion utilities for mapping attributes in ONNX IR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: suggest moving this to end of the existing sub-section on Manipulating TensorProto and Numpy Array instead of this new sub-section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@jcwchen jcwchen linked an issue Sep 27, 2022 that may be closed by this pull request
@jcwchen jcwchen merged commit de24599 into onnx:main Sep 28, 2022
@jcwchen jcwchen deleted the jcw/improve-raw branch September 28, 2022 04:22
broune pushed a commit to broune/onnx that referenced this pull request May 6, 2023
* Improve mapping and add more tests for make_tensor

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix mypy typecheck failures

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix issues

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* first fix by reviews

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* use double quotes instead of single one

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix mypy typecheck

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* move to function from mapping to helper

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* remove duplicate import

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix flake8

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix getattr error due to np.bool

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* use helper instead of mapping in all places

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* update test coverage

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix typecheck

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* update test coverage

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* update operators.md

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix warnings

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* introduce get_attr_from_sequence_elem_type and optional

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* remove wrong import

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* use namedtuple for better readability

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix warnings and conflicts

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* import KeysView from typing instead of collections

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix mypy failures

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* isort and black

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* add docs

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* use dobule quote and f-string

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* remove space

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* dtype instead of type

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* more dtype

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix isort and remove tensor_dtype_to_storage_numpy_type

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* remove get_attr_from_sequence_elem_type and optional

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix mypy

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* move section in PythonAPI

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix lint

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* fix lint part 2

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* cover call examples in PythonAPI

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run release CIs Use this label to trigger release tests in CI test utility
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

provide a nicer API for accessing the proper member of TensorProto
4 participants