-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve mapping and add more tests for make_tensor #4270
Conversation
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go much further and make only a small number of very useful, user-friendly things public and hide the rest.
Everything currently inside mapping.py should be made private, since I think it's all implemtnation details that clients shouldn't rely on.
I propose:
- Add
helper.tensor_field_for_data_type(TensorProto, DataType)
,helper.tensor_field_for_numpy_dtype(TensorProto, np.dtype)
. These do all of the mapping lookups and thegetattr
call. - Move
mapping.py
to_mapping.py
- Add a new
mapping.py
that does something like:
import warnings
warnings.warn("onnx.mapping is deprecated and will be removed in a future release. Use functions in onnx.helper instead.", ...)
# for temporary backwards compatibility
from _mapping import *
If there's something else in mapping.py that is truly needed by users, we can expose that in helper.py
as well.
WDYT?
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
…o jcw/improve-raw Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> # Conflicts: # onnx/mapping.py # onnx/test/helper_test.py
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
…o jcw/improve-raw Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> # Conflicts: # onnx/numpy_helper.py
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
This pull request introduces 1 alert when merging 8df7a7a into 0fc92e4 - view on LGTM.com new alerts:
|
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
@@ -36,10 +36,9 @@ def to_array(tensor: TensorProto, base_dir: str = "") -> np.ndarray: | |||
raise TypeError("The element type in the input tensor is not defined.") | |||
|
|||
tensor_dtype = tensor.data_type | |||
np_dtype = mapping.TENSOR_TYPE_TO_NP_TYPE[tensor_dtype] | |||
storage_type = mapping.TENSOR_TYPE_TO_STORAGE_TENSOR_TYPE[tensor_dtype] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PR #4510, I remove the use of dictionary TENSOR_TYPE_TO_STORAGE_TENSOR_TYPE
. It think use int8 or uint8 to store bool should not be allowed.
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
onnx/numpy_helper.py
Outdated
@@ -156,13 +155,9 @@ def to_list(sequence: SequenceProto) -> List[Any]: | |||
""" | |||
lst: List[Any] = [] | |||
elem_type = sequence.elem_type | |||
value_field = mapping.STORAGE_ELEMENT_TYPE_TO_FIELD[elem_type] | |||
values = getattr(sequence, value_field) | |||
values = helper.get_attr_from_sequence_elem_type(sequence, elem_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, I think it would be simpler to rewrite the code as below:
if elem_type == SequenceProto.TENSOR:
return [to_array(v) for v in sequence.tensor_values]
elif elem_type == SequenceProto.SPARSE_TENSOR:
return [to_array(v) for v in sequence.sparse_tensor_values]
elif elem_type == SequenceProto.SEQUENCE:
return [to_list(v) for v in sequence.sequence_values]
etc.
I don't think we need to define the helper methods like get_attr_from_seqience_elem_type
.
onnx/helper.py
Outdated
return cast(str, mapping.STORAGE_ELEMENT_TYPE_TO_FIELD[elem_type]) | ||
|
||
|
||
def get_attr_from_sequence_elem_type(tensor: SequenceProto, elem_type: int) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need these utility/helper methods. Please see my comment in numpy_helper.py
on rewriting the code there. I think splitting the code into these two files actually makes the logic of either one harder to understand. (I understand this is a problem with the pre-existing code introduced by someone else previously, and not this PR.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability, yes I think keeping if-else statement makes more sense, but for maintainability, having a common function for the same mapping seems not too bad (we don't need to change the mapping in several places if mapping is changed). However, if the attribute mapping for protos is quite stable, I am fine with having if-else statements. I have removed attribute map for SequenceProto and OptionalProto and used if-else statements instead.
Not sure whether it is a concern -- that also means that previously the attribute mappings for SequenceProto and OptionalProto are public used functions and in the future they will be removed.
onnx/numpy_helper.py
Outdated
@@ -318,10 +313,9 @@ def to_optional(optional: OptionalProto) -> Optional[Any]: | |||
elem_type = optional.elem_type | |||
if elem_type == OptionalProto.UNDEFINED: | |||
return opt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to return None
onnx/numpy_helper.py
Outdated
# TODO: create a map and replace conditional branches | ||
if elem_type == OptionalProto.TENSOR or elem_type == OptionalProto.SPARSE_TENSOR: | ||
if elem_type in (OptionalProto.TENSOR, OptionalProto.SPARSE_TENSOR): | ||
opt = to_array(value) | ||
elif elem_type == OptionalProto.SEQUENCE: | ||
opt = to_list(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to return to_list(optional.tensor_value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise for all the cases. We don't need the helper utility method, whose purpose is only to do this switch/case anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks!
onnx/helper.py
Outdated
return mapping.TENSOR_TYPE_MAP[int(tensor_dtype)].name | ||
|
||
|
||
def tensor_dtype_to_storage_numpy_type(tensor_dtype: int) -> np.dtype: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can omit this function. Users can call the two functions themselves. It is not easy to document/describe, and too many such functions in the API can confuse a reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Just omitted.
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
TensorProto.COMPLEX128, | ||
} | ||
], | ||
ids=lambda tensor_dtype: helper.tensor_dtype_to_string(tensor_dtype), |
Check notice
Code scanning / CodeQL
Unnecessary lambda
for t in helper.get_all_tensor_types() | ||
if t not in {TensorProto.BFLOAT16, TensorProto.STRING} | ||
], | ||
ids=lambda tensor_dtype: helper.tensor_dtype_to_string(tensor_dtype), |
Check notice
Code scanning / CodeQL
Unnecessary lambda
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
docs/PythonAPIOverview.md
Outdated
@@ -148,6 +148,16 @@ Runnable IPython notebooks: | |||
- [make_model.ipynb](/onnx/examples/make_model.ipynb) | |||
- [Protobufs.ipynb](/onnx/examples/Protobufs.ipynb) | |||
|
|||
## Conversion utilities for mapping attributes in ONNX IR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: suggest moving this to end of the existing sub-section on Manipulating TensorProto and Numpy Array
instead of this new sub-section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved.
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
* Improve mapping and add more tests for make_tensor Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix mypy typecheck failures Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix issues Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * first fix by reviews Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use double quotes instead of single one Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix mypy typecheck Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * move to function from mapping to helper Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * remove duplicate import Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix flake8 Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix getattr error due to np.bool Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use helper instead of mapping in all places Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * update test coverage Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix typecheck Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * update test coverage Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * update operators.md Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix warnings Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * introduce get_attr_from_sequence_elem_type and optional Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * remove wrong import Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use namedtuple for better readability Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix warnings and conflicts Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * import KeysView from typing instead of collections Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix mypy failures Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * isort and black Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * add docs Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * use dobule quote and f-string Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * remove space Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * dtype instead of type Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * more dtype Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix isort and remove tensor_dtype_to_storage_numpy_type Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * remove get_attr_from_sequence_elem_type and optional Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix mypy Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * move section in PythonAPI Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix lint Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * fix lint part 2 Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> * cover call examples in PythonAPI Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com> Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
Describe your changes
Motivation and Context