Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tracking] Eliminate duplicate string in operator document to prevent binary size increasing #4138

Open
jcwchen opened this issue Apr 14, 2022 · 5 comments
Labels
enhancement Request for new feature or operator good first issue Good for getting started tracking Tracking issues

Comments

@jcwchen
Copy link
Member

jcwchen commented Apr 14, 2022

Feature Request

System information

ONNX version (you are using): 1.11

What is the problem that this feature solves?

Prevent binary size increasing.

Describe the feature

While bumping version for certain operator, currently people just created another string for the new document for the new version. Sometimes the document is very similar to previous document with few additional sentences or even is exactly the same as the previous one (for instance, add more type support does not influence the document content). It's a potential issue for binary size increasing if ONNX keeps using different string for the same/similar content. Also, using the same string is easier to maintain if we would like to update the same content for different versions. We should go through all operators under all defs.cc and old.cc to fix this. Open this issue to track this work item.

Furthermore, going forward we should try our best to share the string when bumping versions.

Will this influence the current api?

No

Feature Area

Package binary size.

Are you willing to contribute it (Y/N):

Y

@jcwchen jcwchen added the enhancement Request for new feature or operator label Apr 14, 2022
@jcwchen jcwchen changed the title Eliminate duplicate string in operator document to prevent binary size increasing [Tracking] Eliminate duplicate string in operator document to prevent binary size increasing Apr 14, 2022
@jcwchen jcwchen added the tracking Tracking issues label Apr 14, 2022
@jcwchen jcwchen added the good first issue Good for getting started label Jun 9, 2022
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 1, 2023
@jcwchen jcwchen reopened this Jul 1, 2023
@rajveer43
Copy link

It is still open?

@jcwchen
Copy link
Member Author

jcwchen commented Aug 26, 2023

Thanks for catching this. Yes, it is still open and anyone interested feel free to pick this up.

@rajveer43
Copy link

Can you give some more insights where what needs to be changed. bit explanation will help me understand the task better.
Then a Quick PR will be done to review and work faster

Thanks for catching this. Yes, it is still open and anyone interested feel free to pick this up.

@jcwchen
Copy link
Member Author

jcwchen commented Aug 27, 2023

Sure. Thanks for your interest. Typically operators in different opset_versions have duplicate string, which can be found from onnx/defs/xxx/defs.cc or onnx/defs/xxx/old.cc. Take Reshape-14 and Reshape-13 as an example:

onnx/onnx/defs/tensor/old.cc

Lines 5002 to 5016 in b8482d7

static const char* Reshape_ver14_doc = R"DOC(
Reshape the input tensor similar to numpy.reshape.
First input is the data tensor, second input is a shape tensor which specifies the output shape. It outputs the reshaped tensor.
At most one dimension of the new shape can be -1. In this case, the value is
inferred from the size of the tensor and the remaining dimensions. A dimension
could also be 0, in which case the actual dimension value is unchanged (i.e. taken
from the input tensor). If 'allowzero' is set, and the new shape includes 0, the
dimension will be set explicitly to zero (i.e. not taken from input tensor).
Shape (second input) could be an empty shape, which means converting to a scalar.
The input tensor's shape and the output tensor's shape are required to have the same number of elements.
If the attribute 'allowzero' is set, it is invalid for the specified shape to
contain both a zero value and -1, as the value of the dimension corresponding
to -1 cannot be determined uniquely.
)DOC";

and

static const char* Reshape_ver13_doc = R"DOC(
Reshape the input tensor similar to numpy.reshape.
First input is the data tensor, second input is a shape tensor which specifies the output shape. It outputs the reshaped tensor.
At most one dimension of the new shape can be -1. In this case, the value is
inferred from the size of the tensor and the remaining dimensions. A dimension
could also be 0, in which case the actual dimension value is unchanged (i.e. taken
from the input tensor). Shape (second input) could be an empty shape, which means converting to a scalar.
The input tensor's shape and the output tensor's shape are required to have the same number of elements.)DOC";

The documents are very similar and only have few different sentences. Ideally we should make the duplicate part as a common string. You can directly modify the C++ files (defs.cc or old.cc).

@rajveer43
Copy link

OKay Got it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Request for new feature or operator good first issue Good for getting started tracking Tracking issues
Projects
Status: In progress
Development

Successfully merging a pull request may close this issue.

2 participants