Add custom ops Rotary #738
Maybe we should do `LeftRotary` and `RightRotary` so that the compute function becomes stateless.
I think this is done later here: https://github.com/microsoft/onnxruntime-extensions/pull/738/files#diff-643fdcb552aafbcb9a86bc8d48cab50dea3723b20b3105880e081b2e5b1b8da9R24. I hope the compiler is able to remove the unnecessary code at compilation time.
It is more reasonable to split them from the beginning instead of relying on the compiler. If they are two ops, we define two ops. There is no need to make a mega op to support several cases; that will make future improvements harder and more error-prone.
That's not the only place I did that. There is only one op from ORT's point of view, and it is Rotary, but we have different implementations depending on the argument value; the same goes for the input types. Rotary(X, side=LEFT) = Neg(Rotary(X, side=RIGHT)). Do you want to have distinct ONNX names as well?
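For readers following the thread, the identity Rotary(X, side=LEFT) = Neg(Rotary(X, side=RIGHT)) can be checked with a small reference sketch. This is plain Python, not the kernel from the PR. The function name `rotary` and the sign convention (right gives `[-x2, x1]`, left gives `[x2, -x1]`) are assumptions made for illustration; the identity holds under either convention as long as the two sides differ only by sign.

```python
def rotary(x, side):
    """Hypothetical 1-D reference for the Rotary op discussed above.

    Assumed convention (not confirmed by the thread):
      side="right": [x1, x2] -> [-x2, x1]
      side="left":  [x1, x2] -> [x2, -x1]
    where x1 and x2 are the two equal halves of x.
    """
    if len(x) % 2 != 0:
        raise ValueError("the length of x must be even")
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    if side == "right":
        return [-v for v in x2] + x1
    if side == "left":
        return x2 + [-v for v in x1]
    raise ValueError("side must be 'left' or 'right'")


x = [1.0, 2.0, 3.0, 4.0]
left = rotary(x, "left")
right = rotary(x, "right")

# The two sides differ only by a global sign.
assert left == [-v for v in right]
```

Whether this is exposed as one op with a `side` attribute or as two separately named ops is exactly the design question raised in the comments; the sketch takes no position on it.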
How is `split` related to the formula above?
The Rotary replaces something like:
When I implemented this kernel, I found the splits were equal on Llama, but I wondered whether they could be unequal. I chose to keep this parameter so that I could still use the optimized kernel and let it fail when the splits are not equal; then I would know that unequal splits must be implemented. I can remove it if you know this case never happens.
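The behavior described in this comment can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the code from the PR: the names `rotary_reference` and `rotary_fast` are hypothetical, the replaced pattern is assumed to be a Split, Neg, Concat sequence, and the sign convention `[-x2, x1]` is assumed. The reference version accepts any two-way split; the fast version keeps the `split` parameter but deliberately fails when the two sizes differ.

```python
def rotary_reference(x, split):
    """Assumed Split -> Neg -> Concat pattern on a 1-D list.

    `split` holds the two chunk sizes; they do not have to be equal.
    """
    if len(split) != 2 or sum(split) != len(x):
        raise ValueError("split must hold two sizes that sum to len(x)")
    x1, x2 = x[:split[0]], x[split[0]:]
    return [-v for v in x2] + x1


def rotary_fast(x, split):
    """Stand-in for an optimized kernel that only supports equal halves."""
    if len(split) != 2 or sum(split) != len(x):
        raise ValueError("split must hold two sizes that sum to len(x)")
    if split[0] != split[1]:
        # Fail loudly so that a model with unequal splits is noticed.
        raise NotImplementedError("unequal splits are not implemented")
    half = split[0]
    return [-v for v in x[half:]] + x[:half]


x = [1.0, 2.0, 3.0, 4.0]
# On equal splits both versions agree.
assert rotary_fast(x, [2, 2]) == rotary_reference(x, [2, 2])
```

Keeping the parameter and raising on the unsupported case makes the limitation visible at run time instead of silently producing a wrong result.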
Please make sure the `exact` behavior is described in comments for this op. I don't see the use and explanation of `split` in the equations. In addition to math symbols, you can use an ONNX sub-graph to describe them.
I think we can assume an even split and throw otherwise. With `split`, you will need to introduce a sub-graph to compute the right `split` from the shape of `X`. That will slow down ORT.
I don't think so. The formula remains the same whether or not the splits are equal.