Implement CELU node as a Function #2575
Conversation
onnx/defs/nn/defs.cc (Outdated)

```cpp
{// nodes: {outputs, op, inputs, attributes}
 {{"X_alpha"},
  "Div",
  {"X", "alpha"}
```
I can't figure out how to reference the attribute of the `Celu` instruction as an argument of the `Div` node. I had a look at the helpers `FunctionBodyHelper::BuildNodes`, `FunctionBodyHelper::Const` and `MakeRefAttribute`, without success. Could you show me some documentation or an example of this usage?
Thank you for your answer. :)

Unfortunately, I did read this line and the usage in `MeanVarianceNormalization`, but I am still confused. I tried different syntaxes that compile, but since the shape inference test fails, it is probably not the right graph.

As I understand it, I can create a graph equivalent to `Div(X, alpha=alpha)` using

```cpp
{{"X_alpha"},
 "Div",
 {"X"},
 {MakeRefAttribute("alpha", AttributeProto::FLOAT)}
},
```

But it doesn't seem to be what I am looking for, probably because `Div` has two inputs and no attributes.

How can I write the equivalent of `Div(X, alpha)` (i.e. use this reference as the second argument of `Div`)? I would like to write

```cpp
{{"X_alpha"},
 "Div",
 {"X", MakeRefAttribute("alpha", AttributeProto::FLOAT)}
}
```

but obviously this is not possible, since `std::string != AttributeProtoWrapper` 😅

In `MeanVarianceNormalization`, it seems all the usages of `axis` simply forward the attribute to the underlying ops, right?
Ah okay, I see, you are quite right. I think you should be able to move from an attribute to a value by adding a `Constant` node. I'm not sure if it will be okay with providing a scalar for a tensor though 😬.

Also, the current helper doesn't allow you to use different names for the attr (`value`) and the ref_attr (`alpha`), so you'd need to add that.

```cpp
FunctionBodyHelper::BuildNodes(
    {// nodes: {outputs, op, inputs, attributes}
     {{"alpha"}, "Constant", {}, {MakeRefAttribute("value", AttributeProto::FLOAT, "alpha")}},
     {{"X_alpha"}, "Div", {"X", "alpha"}},
     {{"Y"}, "Elu", {"X_alpha"}}})
```
onnx/defs/operator_sets.h (Outdated)

```cpp
@@ -635,6 +635,7 @@ class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, Pad);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, Gemm);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, If);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, NonMaxSuppression);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, Celu);
```
This should be in version 12, as the latest released version is 11 🙂
onnx/defs/nn/defs.cc (Outdated)

```cpp
  {MakeRefAttribute("alpha", AttributeProto::FLOAT)}
 },
 {{"Y"}, "Elu", {"X_alpha"}}})));
```
This function body is NOT a correct "sub-graph" representing the formula you described. A function body is a graph that represents the math formula you mentioned using other ops; in this case, it should be "Constant", "Div", "Elu".
Yes, I need some information on how to convert the scalar `Attribute` into a `Constant` tensor. Do you know how to accomplish this?
Thanks @TMVector and @linkerzhang for your feedback.

I made some attempts to create a constant node that recovers the `alpha` attribute. If I run the shape inference test with the following body, I get a nice "Y" of empty shape, but if I try to run the shape inference test with the attribute (see code below), then no "Y" is inferred at all.

On my side I am stuck. Looking at …
onnx/defs/nn/defs.cc (Outdated)

```cpp
 "Div",
 {"X", "alpha"}
},
{{"U"}, "Elu", {"X_alpha"}}})));
```
From Pytorch, the CELU equation is

CELU(x) = max(0, x) + min(0, α∗(exp(x/α)−1))

while ELU uses

ELU(x) = max(0, x) + min(0, α∗(exp(x)−1))

In addition, here the function body is computing

ONNX_CELU(x) = max(0, x/α) + min(0, exp(x/α)−1)

which doesn't exactly match Pytorch CELU. Is this expected? Or am I missing something?

Do we have a numpy reference implementation for generating tests? We should also check if that implementation matches Pytorch CELU.

[Update] I saw your numpy reference implementation. Nice! Can you please provide a short comparison to show it performs the same as Pytorch CELU?
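As a quick numeric check of the mismatch, take $\alpha = 2$ and $x = -2$ in the formulas above:

$$\mathrm{CELU}(-2) = \max(0, -2) + \min\!\left(0,\ 2\,(e^{-1} - 1)\right) = 2\,(e^{-1} - 1) \approx -1.264,$$
$$\max(0, -1) + \min\!\left(0,\ e^{-1} - 1\right) = e^{-1} - 1 \approx -0.632,$$

so the current function body is off by a factor of $\alpha$ on the negative branch.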
😱

You are completely right, the ELU implementation of Pytorch is different from ONNX Elu, and it is not possible to express CELU from ONNX's ELU. Thank you for noticing it; I am working on a fix.
Here is some code testing the difference between Pytorch CELU and the implementation (with corrected parentheses) I provided.

```python
import numpy as np
import torch


def onnx_celu(input_data, alpha=1.0):
    positive_input = np.maximum(0, input_data)
    negative_input = np.minimum(0, alpha * (np.exp(input_data / alpha) - 1))
    output_data = positive_input + negative_input
    return output_data


def torch_celu(input_data, alpha=1.0):
    return torch.nn.CELU(alpha=alpha)(torch.Tensor(input_data)).numpy()


input_data = np.random.randn(1, 2, 3).astype('float32')
alpha = 2
assert (onnx_celu(input_data, alpha) == torch_celu(input_data, alpha)).all()
```
onnx/defs/nn/defs.cc (Outdated)

```cpp
        "Constrain input and output types to floating-point tensors.")
    .FunctionBody(FunctionBodyHelper::BuildNodes(
        {// nodes: {outputs, op, inputs, attributes}
         //FunctionBodyHelper::NodeDef{{"alpha"}, "Constant", {}, {{"value", 1.f}}},
```
Are the comments here left in intentionally?
As described here, the …

Unfortunately, after hours digging through documentation and code, I can't figure out a way to convert a scalar (from `MakeRefAttribute("alpha", AttributeProto::FLOAT)`) to a tensor constant.

PS: If this is not possible, then maybe there is still a way to cheat with the …
I will try something on my side. In the meantime, what do you think if we make `alpha` an input?
I think "I can't figure a way to convert a scalar (from MakeRefAttribute("alpha", Attribute\ Proto::FLOAT)) to a tensor constant" needs to be fixed. Logically, the body graph is referring an attribute outside (which should be an AttributeProto) and the "Constant" OP will use the attribute and output a Tensor. |
@wschin Personally, I really don't mind moving `alpha` to an input. But it may be very confusing for both developers and users if CELU and ELU have completely different interfaces. If this approach gets merged, it also means supporting it for a long time. 😅

@linkerzhang Would be awesome. I think anyone who implements a new function op that does not directly forward its arguments will end up having the exact same problem, and a clean way to move the scalars into the graph would solve this. Do you have any idea how this could be achieved?
@jeremycochoy The `AttributeProto` itself was designed to support this kind of reference already, though the utility function is missing, I guess. PR #2583 should resolve it. Use `MakeRefAttribute("value", "alpha", AttributeProto::FLOAT)` for the `Constant` node in the function body.
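At the proto level, such a reference is expressed through the `ref_attr_name` field of `AttributeProto`, which is what `MakeRefAttribute` sets. A minimal Python sketch of the resulting `Constant` node, for illustration only (names follow the discussion above; whether `Constant` accepts a plain FLOAT `value` is exactly the float-vs-tensor question raised in this thread):

```python
from onnx import AttributeProto, helper

# Sketch: a Constant node whose "value" attribute is not a literal but a
# reference (ref_attr_name) to the "alpha" attribute of the calling Celu node.
value_ref = AttributeProto()
value_ref.name = "value"            # the attribute the Constant op reads
value_ref.ref_attr_name = "alpha"   # resolved from the function's own attributes
value_ref.type = AttributeProto.FLOAT

const_node = helper.make_node("Constant", inputs=[], outputs=["alpha"])
const_node.attribute.extend([value_ref])
print(const_node)
```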
@linkerzhang I think that will work if the …
@linkerzhang …
We might need to support … @TMVector, @jeremycochoy, any comments?
To me it sounds like the best solution, and it definitely makes sense to convert both `float` and `floats` to their corresponding tensors.
@jeremycochoy Yep, it's the same as the one in the current PR (I missed that part).

One more solution is removing `AttributeProto` and having `TensorProto` be used to store attributes, to unify the two type systems (one tensor type system and one attribute type system). `AttributeProto` and its type system were designed at the very beginning to introduce a simpler way of having "scalar" attribute data. However, it introduces many troubles when specifying operator specs. For example, it's really hard to specify when an attribute needs to share the same type as an input or output (now most cases use the "Tensor" attribute type).

This PR reminds me again that the benefit of `AttributeProto` and its type system is not that much, while it introduces many troubles. I'd suggest removing them and having only one type system in ONNX. @gramalingam @wschin @jeremycochoy @TMVector What do you think, please?
onnx/defs/attr_proto_util.cc (Outdated)

```cpp
@@ -58,4 +58,15 @@ AttributeProto MakeRefAttribute(
  return a;
```
Change this function to call the overloaded one added below, please.
Oh, I think you can just merge your PR and I can rebase this branch on top of it. I remember you introduced documentation, your ordering of arguments sounds more natural to me, and I have nothing against splitting PRs into smaller pieces.

Edit: I rebased on top of your commit. 🙂
Aha, I abandoned my PR this morning (I realized it duplicates changes in your PR). Let me get it back and merge it that way :)
onnx/defs/nn/defs.cc (Outdated)

```cpp
         {{"Elu_Result"}, "Elu", {"X_alpha"}, {{"alpha", 1.f}}},
         {{"Y"}, "Mul", {"alpha", "Elu_Result"}}})));
```
Wouldn't it be equivalent to pass Celu.alpha to Elu.alpha instead of setting Elu.alpha=1 and then multiplying by Celu.alpha?
It is not. 😅 (Because the alpha is applied only to the second term of the `+` in the Elu equation.)

Explanation from wschin: #2575 (comment)
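Spelled out, with $\mathrm{Elu}(z, \alpha) = \max(0, z) + \min(0, \alpha(e^{z} - 1))$ and $\alpha > 0$:

$$\alpha \cdot \mathrm{Elu}\!\left(\tfrac{x}{\alpha},\ \alpha{=}1\right)
  = \alpha\max\!\left(0, \tfrac{x}{\alpha}\right) + \alpha\min\!\left(0,\ e^{x/\alpha} - 1\right)
  = \max(0, x) + \min\!\left(0,\ \alpha\left(e^{x/\alpha} - 1\right)\right)
  = \mathrm{CELU}(x, \alpha),$$

whereas forwarding `alpha` to `Elu` would give $\mathrm{Elu}(x/\alpha, \alpha) = \max(0, x/\alpha) + \min(0, \alpha(e^{x/\alpha} - 1))$, whose positive branch is $x/\alpha$ instead of $x$.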
Ah, quite right.
Btw, should Celu be defined in `onnx/defs/math/defs.cc`? That's where Relu, Elu, etc. are.
Right, it would make more sense to place it next to relu / elu / leaky relu. I will change it this evening.

Done :)
@jeremycochoy: This is an excellent description for a new operator (explaining why it's being added, where it came from, the actual equation used, and even an alternate Python implementation), and I'll point to it as a good example in the future. 👍
Thanks, I had no idea about the useful CELU. Is there a reason to use …

Yes, only 20% faster on the other side, and maybe with values all over the place (ca. 50-50% split?) it's not as useful an optimization as I would think?
To be honest, I am afraid that if you want something really optimized, you'd need the backend to provide a specific implementation for celu that replaces this graph function (this is 100% possible, and any backend can choose the implementation that fits its specific targeted hardware).
Yes, and FYI, I found an even better activation function (in part since it's also continuously differentiable, which is why it's better than ReLU): Mish: A Self Regularized Non-Monotonic Neural Activation Function

It's also compared to ELU and some variants (while not CELU). Also interesting, with supposed advantages contrary to those supposed above (such as bounded at low): PLU: The Piecewise Linear Unit Activation Function. I implemented it like this for fewer assembly instructions (and only one branch): …

And if you're interested, a very simple idea here (using two "opposite", but similar, functions; I wonder if you could do similar for two dissimilar ones, e.g. those above?): https://arxiv.org/pdf/1709.04054.pdf
* Implement CELU node as a Function
* Add shape inference test
* Update onnx/defs/nn/defs.cc Co-Authored-By: Jonny Shipton <tmvector@gmail.com>
* Update onnx/test/shape_inference_test.py Co-Authored-By: Jonny Shipton <tmvector@gmail.com>
* Set operator version to 12
* ?
* WIP. But the constant node can't be shape infered.
* Rewrite correct implementation based on equation instead of Elu
* Fix parentesis in formula
* wschin suggestions from onnx#2583 PR
* Fix a bug in inferene code and simplify graph
* Fix typo in Celu test
* Udapte docs
* Move Celu operator next to Elu (math/defs.cc)

Co-authored-by: Jonny Shipton <tmvector@gmail.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
I had a look at your guidelines and tutorial for adding a missing op, according to #1121 (comment).
Description:

The CELU operator was requested in issue #1121 and is now part of the new operator request list #1646.

First introduced in Continuously Differentiable Exponential Linear Units, CELU is similar to the ELU operation. Given the attribute alpha, CELU is a pointwise application of the following formula:

CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))

and allows leakage of the gradient for negative values, while its derivative remains continuous for any value of alpha (which is not the case for ELU). It is implemented in Pytorch based on the Pytorch ELU operation. A similar approach for ONNX Elu is `alpha * Elu(x / alpha, alpha=1)`.

An alternative implementation in numpy, not requiring the Pytorch ELU operator, is given in the tests.
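A sketch of that numpy reference, matching the `onnx_celu` helper from the comparison snippet earlier in this thread (the exact name and signature used in the test file may differ):

```python
import numpy as np

def celu(x, alpha=1.0):
    # CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    positive = np.maximum(0, x)
    negative = np.minimum(0, alpha * (np.exp(x / alpha) - 1))
    return positive + negative
```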
A first implementation was attempted in #1676 but never merged.
Graph

The CELU function is implemented using the expression `alpha * Elu(x / alpha, alpha=1)`. This makes the graph smaller (and easier to read) than using the individual functions (sum, sub, div, exp, mul) present in the expression of the operator. It also leverages the good support of `Elu` in most ONNX backend implementations.
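A minimal numpy sketch of this decomposition (illustrative names, not the identifiers used in the ONNX function body):

```python
import numpy as np

def elu_alpha1(z):
    # ONNX Elu with alpha=1: z for z >= 0, exp(z) - 1 otherwise
    return np.maximum(0, z) + np.minimum(0, np.exp(z) - 1)

def celu_via_elu(x, alpha):
    # Mirrors the function body: Div (x / alpha), Elu (alpha=1), Mul (by alpha)
    return alpha * elu_alpha1(x / alpha)

x = np.random.randn(3, 4).astype(np.float32)
expected = np.maximum(0, x) + np.minimum(0, 2.0 * (np.exp(x / 2.0) - 1))
assert np.allclose(celu_via_elu(x, 2.0), expected)
```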
Tests

A unit test and a shape inference test are available, following the tests of the `MeanVarianceNormalization` function.