New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Print utility extension #4246
Print utility extension #4246
Conversation
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
onnx/defs/printer.cc
Outdated
break; | ||
case TensorProto::DataType::TensorProto_DataType_STRING: { | ||
const char* sep = "{"; | ||
for (auto& elt : tensor.string_data()) { | ||
os << sep << "\"" << elt << "\""; | ||
output << sep << "\"" << elt << "\""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escaping "
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added support for escape character.
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
|
||
|
||
def to_text(proto: Union[onnx.ModelProto, onnx.FunctionProto, onnx.GraphProto]) -> Text: | ||
if isinstance(proto, onnx.ModelProto): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better this way. There should be a documentation for this function. I still think the call to SerializeToString() should be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we avoid the serialization? (Of course, we can reimplement the pretty-printer in Python, but I assume that's not what you mean? That would lead to a potential divergence of behavior between C++ and Python. Since this usage is mostly for debugging and understanding etc., I think the efficiency issue is not that critical.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model is serialized here and then deserialized in C++. I think these steps should be avoided. The only change to make that happen is to change the C++ signature to take FunctionProto, ModelProto or a GraphProto and remove the call to ParseFromString. Users may call this function on deep learning models to compare two graphs. Avoiding serialization could make it a lot faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a Python ModelProto and a C++ ModelProto are different types, with different representations. How do you convert between those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, I don't think that Python serialization and C++ deserialization are inverses of each other (at least, it is not documented to be so, and I doubt it is). So, their composition is not a "no-op", but it serves to convert from the python-representation to C++-representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Python ModelProto and a C++ ModelProto are different: the Python ModelProto wraps the C++ ModelProto, probably in a capsule. I expect that the Python and C++ SerializeToString to produce the exact same sequence of bytes. That's what pybind11 does. They use swig to do it for ModelProto. It should work in a similar way.
Of course, SerializeToString will produce the same sequence of bytes, since that is defined by the Protobuf semantics. But that is not the question. The question is whether the C++ in-memory-representation is available in a Python ModelProto (as you imply). If so, what is the API/call we can use to get the C++ in-memory-representation from a Python ModelProto?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The question is about the Deserialize part, that is: creating the in-memory-representation used by the two languages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why the in-memory-representation would be different. I found this: https://github.com/pybind/pybind11_protobuf but I have some trouble to link it with onnx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #4274.
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. Thank you!
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@liqunfu : could you take a look? This requires approval from archinfra sig too. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one nit comment. Exciting to see Python printer has more supports!
onnx/test/printer_test.py
Outdated
# SPDX-License-Identifier: Apache-2.0 | ||
import onnx | ||
import unittest | ||
from onnx import helper, parser, printer, GraphProto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: import alphabetically. Justin have an ongoing PR for it so going forward it would be great if we can always keep the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can fix this manually, but out of curiosity, what will happen with/after Justins PR? Is there a specific auto-format tool/command recommended for people to apply? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think all these format tools will be included into style.sh as mentioned here. Then developer can simply run single script to check whether there is any error before submitting CIs. For auto-apply, perhaps we can also introduce another script if needed.
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
…into python-print-2
* Expose printing methods via Python Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Use indentation in printing Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Remove extra space Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Encapsulate printer logic as a class Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Delete VS config files Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add printer test Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add print support for ModelProto Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add to-string template Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Cleanup duplicate code Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Change Text to str Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix template specialization compile error Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Remove unused variable Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add more print tests Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Flake8 warnings Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Address PR feedback Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * FIx mypy error Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix mypy errors Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix typing Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Format printer.cc Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Format code Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Support for escape char in strings Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Address PR comments Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Format file Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Fix python imports Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Description
Extend the printer utilities and expose them in Python.
Motivation and Context
#4231
Limitations
Need support for external tensors. A textual-representation is not suitable for large tensor constants.