
Expose a Python interface for inference functions #4409

Merged
merged 19 commits into from Aug 15, 2022

Conversation


@jbachurski jbachurski commented Aug 4, 2022

Exposes a node-level API for performing type/shape inference through the Python bindings.

Description
(Updated)

  • Adds a low-level function OpSchema._infer_node_outputs, which takes serialised protobuf bytes. Arguments: the schema, the node, input types, and input data/other inference-context arguments; it returns the inferred output types.
  • Adds a public interface shape_inference.infer_node_outputs:
def infer_node_outputs(
    schema: onnx.defs.OpSchema,
    node: onnx.NodeProto,
    input_types: Dict[str, onnx.TypeProto],
    input_data: Optional[Dict[str, onnx.TensorProto]] = None,
    input_sparse_data: Optional[Dict[str, onnx.SparseTensorProto]] = None,
) -> Dict[str, onnx.TypeProto]:  # output_types
  • Under the hood this validates the passed-in node and types; ValidationError and InferenceError are raised on failure.
  • This should be the only function used directly; OpSchema._infer_node_outputs is internal to the implementation.
  • The keys of input_types are the input names as the node declares them.
  • input_data and input_sparse_data are used for passing in known constant inputs (as per C++ usage).
  • Passing in partial data-propagation results and a GraphInferenceContext could be implemented in the future.
  • Adds test cases exercising some inference calls from the Python side.

Motivation and Context

@jbachurski (Member Author):

@gramalingam Would you mind reviewing the proposal and checking whether this sort of interface would work? In the future it could be extended to support the other arguments; making them optional would keep it backward compatible.

onnx/defs/schema.h (review thread; outdated, resolved)
@gramalingam (Contributor):

Updating the pyi definition for OpSchema would be useful.

onnx/cpp2py_export.cc (review thread; outdated, resolved)
@jbachurski (Member Author):

I think it might be useful to promote an API like this as a wrapper around the raw C++ method (that is, make it part of the public API instead of something that lives only in this test file). But, for that, I don't think we would want node to be optional. I am not sure about num_outputs either.

When it comes to the Python-side interface: a node has input/output names, an operator name and domain, and attributes. Its own name and docstring don't matter. It seems we are missing basically only the operator version and the input types. I see some variations of the possible signature:

  • Inputs & outputs
    • list[TypeProto] - we already have an ordering of fields inside the node. Node inputs/outputs are positional anyway, so making the ordering explicit with a list seems okay.
    • dict[str, TypeProto] - since the node names the fields in its definition. On the other hand, since the node has to be created explicitly first in this approach anyway, we might as well use the names within the node. I think this shouldn't interfere with generics etc., since if a name is used multiple times it has to have the same type - we get a very simple consistency check for free.
  • Operator version
    • version: int - look up the current operator schema at this version based on what is in the node.
    • OpSchema - make the user provide the schema explicitly themselves. This is more redundant, since the node already defines the domain and identifier.

Also, where would we put this function? shape_inference would be fine, though really it should have been called inference (as it also infers types and other things). We could also put it in helper. Any ideas?

Let me know your thoughts and I'll get started on some improvements in the Python-interface area.
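The consistency-check argument for the dict-based variant can be illustrated with a small hypothetical helper (plain strings stand in for TypeProto; none of these names come from the PR):

```python
from typing import Dict, List

def positional_types(node_inputs: List[str], input_types: Dict[str, str]) -> List[str]:
    """Map a node's (possibly repeated) input names to positional types.

    A repeated input name necessarily resolves to the same type each time,
    which is the "very simple consistency check" the dict variant gives us.
    """
    missing = [name for name in node_inputs if name not in input_types]
    if missing:
        raise KeyError(f"missing types for inputs: {missing}")
    return [input_types[name] for name in node_inputs]

# "x" appears twice in the node's input list, but can only have one type.
print(positional_types(["x", "x", "b"], {"x": "tensor(float)", "b": "tensor(int64)"}))
# -> ['tensor(float)', 'tensor(float)', 'tensor(int64)']
```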

@jbachurski (Member Author):

@gramalingam I pushed some changes:

std::unordered_map<std::string, py::bytes> CallNodeInferenceFunction(
    OpSchema* schema,
    const py::bytes& nodeBytes,
    std::unordered_map<std::string, py::bytes> valueTypesByNameBytes,
    std::unordered_map<std::string, py::bytes> inputDataByNameBytes,
    std::unordered_map<std::string, py::bytes> inputSparseDataByNameBytes) {
  • Now returns a map, since the inputs are already a map. Entries where the output TypeProto was not initialised (inference failed implicitly?) are not returned.
  • Added OpSchema::Verify before the call to the inference (checks if the count of inputs/outputs is correct, etc.), and OpSchema::CheckInputOutputType afterwards (checks if the types are consistent with the signature). Both should throw ValidationError.
  • At the same time, inference itself may throw InferenceError. Interestingly, this may also throw errors related to attributes (so things like Cast will check here if exactly one of the right attributes is set).
  • Let me know if I should handle exceptions there in some more involved way. I think what is thrown already reflects most of what check_model and infer_shapes do.
# class OpSchema:
    def infer_node_outputs(self, node_proto: bytes, value_types: Dict[str, bytes],
                           input_data: Dict[str, bytes], input_sparse_data: Dict[str, bytes]
                           ) -> Dict[str, bytes]: ...
  • This is just a binding for the C++ function. I added it to defs.pyi as above.
  • My general feeling is that the arguments after value_types should have empty defaults so that in the future generatedShapeData and graphInferenceContext may be supported as well (though this involves, I think, significant work).
# shape_inference.py
def infer_node_outputs(
    schema: onnx.defs.OpSchema, node: onnx.NodeProto, input_types: Dict[str, onnx.TypeProto],
    input_data: Optional[Dict[str, onnx.TensorProto]] = None,
    input_sparse_data: Optional[Dict[str, onnx.SparseTensorProto]] = None
) -> Dict[str, onnx.TypeProto]: ...
  • The public-facing Python function does all of the conversions and expects proper types.
  • It already has default arguments, which we may change in the future; they reflect the actual underlying implementation.
  • It checks that all the expected keys are present and only copies over what is actually required on the C++ side, for performance.
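The "only copy what is required" step can be sketched as follows (a hypothetical helper, not the PR's actual code; bytes values stand in for serialized protos):

```python
from typing import Dict, Iterable, Optional

def select_for_node(node_inputs: Iterable[str], data: Optional[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Keep only the entries of an optional data map that the node's
    declared inputs actually reference, before handing it to C++."""
    if not data:
        return {}
    wanted = set(node_inputs)
    return {name: blob for name, blob in data.items() if name in wanted}

# "unused" is dropped: the node never refers to it, so there is no point
# paying the serialization/copy cost for it.
print(select_for_node(["x", "b"], {"x": b"\x01", "unused": b"\x02"}))
```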

Let me know what you think of these changes! I also added some more tests.

@JakubBachurskiQC JakubBachurskiQC force-pushed the python-inference-function branch 2 times, most recently from 4363146 to da9b92a Compare August 11, 2022 09:26
@@ -85,6 +136,10 @@ PYBIND11_MODULE(onnx_cpp2py_export, onnx_cpp2py_export) {
return py::bytes(bytes);
})
.def_property_readonly("has_context_dependent_function", &OpSchema::HasContextDependentFunction)
.def("infer_node_outputs", CallNodeInferenceFunction,
py::arg("nodeBytes"), py::arg("valueTypesByNameBytes"),
py::arg("inputDataByNameBytes") = std::unordered_map<std::string, py::bytes>{},
Contributor:
None is better than an empty container as a default value.

Member Author:
I don't think I know what you mean here. This is the C++ side, so there's no None. Do you mean making it a pointer and using nullptr, or using std::optional?
I don't think that would be much better: the map isn't mutated, and here the natural default really is "no additional information" (the inference context needs some map to be passed in, and an empty one makes sense when there is no known input data).
Could you provide a code example for the change?

@jbachurski jbachurski marked this pull request as ready for review August 11, 2022 21:49
@jbachurski jbachurski requested a review from a team as a code owner August 11, 2022 21:49
@gramalingam (Contributor) left a comment:
Looks great, thanks very much!

Signed-off-by: KubinGH <kbachurski@gmail.com>
@jbachurski (Member Author):

Thanks!
Should be ready for merge, then?

@gramalingam gramalingam merged commit 1f3cecc into onnx:main Aug 15, 2022
def infer_node_outputs(
schema: onnx.defs.OpSchema,
node: onnx.NodeProto,
input_types: Dict[str, onnx.TypeProto],
Contributor:
Do we explicitly require a dictionary for these params? If not, prefer a more generic type like Mapping.

---

Use more generic types for input parameters and specific types for return values. For example, a function may take a Sequence and return a List.

Diagram for reference: https://gist.github.com/justinchuby/4021cebe9e093f636759a88de325c85f
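The guideline can be illustrated with two small hypothetical functions (not part of the PR): parameters use the abstract typing protocols, return values use the concrete containers.

```python
from typing import Dict, List, Mapping, Sequence

def double_each(values: Sequence[int]) -> List[int]:
    # Accepts a list, tuple, or range -- anything that is a Sequence --
    # but always hands back a concrete List.
    return [v * 2 for v in values]

def invert(mapping: Mapping[str, int]) -> Dict[int, str]:
    # Accepts a dict or any other read-only Mapping, returns a concrete Dict.
    return {v: k for k, v in mapping.items()}

print(double_each(range(3)))       # [0, 2, 4]
print(invert({"a": 1, "b": 2}))    # {1: 'a', 2: 'b'}
```

Callers then aren't forced to build a dict or list just to satisfy the signature, while the return types stay fully specified.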

broune pushed a commit to broune/onnx that referenced this pull request May 6, 2023
* Add an experimental infer_types implementation
* Create infers_.py
* Fix naming
* Replace example script with tests
* Use unittest
* Use unittest
* Fix wrapper function signature
* Get rid of existing getter
* Run C++ formatter
* Improve C++ implementation, Python interface, tests
* Fix CI
* Use explicit enums for tensor elements
* Run black and isort for new linter
* Specify reshape vector dtype
* Make binding-side attributes optional
* Run clang-format
* Make OpSchema method explicitly protected
* Missed stub file protected