Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print utility extension #4246

Merged
merged 30 commits into from Jun 21, 2022
Merged

Print utility extension #4246

merged 30 commits into from Jun 21, 2022

Conversation

gramalingam
Copy link
Contributor

@gramalingam gramalingam commented Jun 2, 2022

Description

Extend the printer utilities and expose them in Python.

  • Create a class to encapsulate printer
  • Add support for indentation for more readable output for sub-graphs
  • Add support for printing ModelProto
  • Expose top-level functionality to Python

Motivation and Context

#4231

Limitations

Need support for external tensors. A textual-representation is not suitable for large tensor constants.

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@gramalingam gramalingam requested review from a team as code owners June 2, 2022 18:56
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
onnx/defs/printer.cc Outdated Show resolved Hide resolved
onnx/defs/printer.cc Outdated Show resolved Hide resolved
break;
case TensorProto::DataType::TensorProto_DataType_STRING: {
const char* sep = "{";
for (auto& elt : tensor.string_data()) {
os << sep << "\"" << elt << "\"";
output << sep << "\"" << elt << "\"";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escaping "?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support for escape character.

onnx/defs/printer.cc Outdated Show resolved Hide resolved
onnx/printer.py Outdated Show resolved Hide resolved
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>


def to_text(proto: Union[onnx.ModelProto, onnx.FunctionProto, onnx.GraphProto]) -> Text:
if isinstance(proto, onnx.ModelProto):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better this way. There should be a documentation for this function. I still think the call to SerializeToString() should be avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we avoid the serialization? (Of course, we can reimplement the pretty-printer in Python, but I assume that's not what you mean? That would lead to a potential divergence of behavior between C++ and Python. Since this usage is mostly for debugging and understanding etc., I think the efficiency issue is not that critical.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model is serialized here and then deserialized in C++. I think these steps should be avoided. The only change to make that happen is to change the C++ signature to take FunctionProto, ModelProto or a GraphProto and remove the call to ParseFromString. Users may call this function on deep learning models to compare two graphs. Avoiding serialization could make it a lot faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But a Python ModelProto and a C++ ModelProto are different types, with different representations. How do you convert between those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I don't think that Python serialization and C++ deserialization are inverses of each other (at least, it is not documented to be so, and I doubt it is). So, their composition is not a "no-op", but it serves to convert from the python-representation to C++-representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some measurements. I was hoping to see that serialization is independant of the size but it does not. So call SerializeToString and then ParseFromString is really inefficient.

Figure_1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Python ModelProto and a C++ ModelProto are different: the Python ModelProto wraps the C++ ModelProto, probably in a capsule. I expect that the Python and C++ SerializeToString to produce the exact same sequence of bytes. That's what pybind11 does. They use swig to do it for ModelProto. It should work in a similar way.

Of course, SerializeToString will produce the same sequence of bytes, since that is defined by the Protobuf semantics. But that is not the question. The question is whether the C++ in-memory-representation is available in a Python ModelProto (as you imply). If so, what is the API/call we can use to get the C++ in-memory-representation from a Python ModelProto?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The question is about the Deserialize part, that is: creating the in-memory-representation used by the two languages).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why the in-memory-representation would be different. I found this: https://github.com/pybind/pybind11_protobuf but I have some trouble to link it with onnx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #4274.

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
onnx/defs/parser.h Outdated Show resolved Hide resolved
onnx/defs/printer.cc Outdated Show resolved Hide resolved
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Copy link
Contributor

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now. Thank you!

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@gramalingam
Copy link
Contributor Author

@liqunfu : could you take a look? This requires approval from archinfra sig too. Thanks!

Copy link
Member

@jcwchen jcwchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one nit comment. Exciting to see Python printer has more supports!

# SPDX-License-Identifier: Apache-2.0
import onnx
import unittest
from onnx import helper, parser, printer, GraphProto
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: import alphabetically. Justin have an ongoing PR for it so going forward it would be great if we can always keep the order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix this manually, but out of curiosity, what will happen with/after Justins PR? Is there a specific auto-format tool/command recommended for people to apply? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think all these format tools will be included into style.sh as mentioned here. Then developer can simply run single script to check whether there is any error before submitting CIs. For auto-apply, perhaps we can also introduce another script if needed.

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@gramalingam gramalingam enabled auto-merge (squash) June 21, 2022 20:30
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@gramalingam gramalingam merged commit b064f0a into onnx:main Jun 21, 2022
@gramalingam gramalingam deleted the python-print-2 branch June 21, 2022 22:47
broune pushed a commit to broune/onnx that referenced this pull request May 6, 2023
* Expose printing methods via Python

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Use indentation in printing

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Remove extra space

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Encapsulate printer logic as a class

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Delete VS config files

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add printer test

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add print support for ModelProto

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add to-string template

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Cleanup duplicate code

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Change Text to str

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix template specialization compile error

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Remove unused variable

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Add more print tests

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Flake8 warnings

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Address PR feedback

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* FIx mypy error

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix mypy errors

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix typing

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Format printer.cc

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Format code

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Support for escape char in strings

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Address PR comments

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Format file

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

* Fix python imports

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants