Implement CELU node as a Function #2575

Merged
merged 16 commits into onnx:master on Feb 16, 2020

Conversation

Contributor

@jeremycochoy commented Jan 30, 2020

I had a look at your guidelines and tutorial for adding a missing op, following #1121 (comment).

Description:

The CELU operator was requested in issue #1121 and is now part of the new operator request list #1646.

First introduced in Continuously Differentiable Exponential Linear Units, CELU is similar to the ELU operation.

Given the attribute α, CELU is a pointwise application of the following formula:

CELU(x)=max(0,x)+min(0,α*(exp(x/α)−1))

and allows leakage of the gradient for negative values, while keeping the derivative continuous for any value of alpha (which is not the case for ELU).

It is implemented in Pytorch based on the Pytorch-ELU operation:

Tensor celu(const Tensor & self, Scalar alpha) {
  double inv_alpha = 1. / alpha.to<double>();
  return at::elu(self, alpha, Scalar(1.0), Scalar(inv_alpha));
}

A similar approach for ONNX-ELU is alpha * ELU(x / alpha, alpha=1).
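This identity can be checked numerically; the sketch below (an editorial sketch, not part of the PR) compares alpha * ELU(x / alpha, alpha=1) against the CELU formula directly:

import numpy as np

def onnx_elu(x, alpha=1.0):
    # ONNX Elu: max(0, x) + min(0, alpha * (exp(x) - 1))
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

def celu(x, alpha):
    # CELU: max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x / alpha) - 1))

x = np.random.randn(1, 2, 3)
alpha = 2.0
assert np.allclose(alpha * onnx_elu(x / alpha, alpha=1.0), celu(x, alpha))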

An alternative implementation in numpy, not requiring the Pytorch ELU operator, is given in the tests:

import numpy as np

input_data = np.random.randn(1, 2, 3)
alpha = 2

positive_input = np.maximum(0, input_data)
negative_input = np.minimum(0, alpha * (np.exp(input_data / alpha) - 1))
output_data = positive_input + negative_input

A first implementation was attempted in #1676 but never merged.

Graph

The CELU function is implemented using the expression alpha * ELU(x / alpha, alpha=1). This makes the graph smaller (and easier to read) than building it from the individual operations (add, sub, div, exp, mul) present in the operator's expression. It also leverages the good support for Elu in most ONNX backend implementations.

Tests

A unit test and a shape inference test are available, following the tests of the MeanVarianceNormalization function.

{// nodes: {outputs, op, inputs, attributes}
{{"X_alpha"},
"Div",
{"X", "alpha"}
Contributor Author

I can't figure out how to reference the attribute of the Celu instruction as an argument of the node Div.

I had a look at the helpers FunctionBodyHelper::BuildNodes, FunctionBodyHelper::Const and MakeRefAttribute without success.

Could you show me some documentation / example of this usage?

Contributor

There is a comment about this on FunctionBodyHelper::BuildNodes here, and it's used in the MeanVarianceNormalization operator function here. Shout if you have trouble :)

Contributor Author

Thank you for your answer. :)

Unfortunately, I did read that line and the usage in MeanVarianceNormalization, but I am still confused. I tried different syntaxes that compile, but since the shape inference test fails, it is probably not the right graph.

As I understand it, I can create a graph equivalent to Div(X, alpha=alpha) using

           {{"X_alpha"},
             "Div",
             {"X"},
             {MakeRefAttribute("alpha", AttributeProto::FLOAT)}
            },

But it doesn't seem to be what I am looking for, probably because Div has two inputs and zero attributes.

How can I write the equivalent of Div(X, alpha) (i.e. use this reference as the second argument of Div)? I would like to write

           {{"X_alpha"},
             "Div",
             {"X", MakeRefAttribute("alpha", AttributeProto::FLOAT)}
            }

but obviously this is not possible since std::string != AttributeProtoWrapper 😅

In MeanVarianceNormalization, it seems all the usages of axis simply forward the attribute to the underlying ops, right?

Contributor

@TMVector Jan 30, 2020

Ah okay, I see, you are quite right. I think you should be able to move from an attribute to a value by adding a Constant node. I'm not sure if it will be okay with providing a scalar instead of a tensor, though 😬.

Also, the current helper doesn't allow you to use different names for the attr (value) and the ref_attr (alpha), so you'd need to add that.

FunctionBodyHelper::BuildNodes(
           {// nodes: {outputs, op, inputs, attributes}
            {{"alpha"}, "Constant", {}, {MakeRefAttribute("value", AttributeProto::FLOAT, "alpha")}},
            {{"X_alpha"}, "Div", {"X", "alpha"}},
            {{"Y"}, "Elu", {"X_alpha"}}})

@jeremycochoy marked this pull request as ready for review January 30, 2020 18:23
@jeremycochoy requested review from a team as code owners January 30, 2020 18:23
@@ -635,6 +635,7 @@ class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, Pad);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, Gemm);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, If);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, NonMaxSuppression);
class ONNX_OPERATOR_SET_SCHEMA_CLASS_NAME(Onnx, 11, Celu);
Contributor

This should be in version 12, as the latest released version is 11 🙂

onnx/defs/nn/defs.cc (outdated, resolved)
{MakeRefAttribute("alpha", AttributeProto::FLOAT)}
},
{{"Y"}, "Elu", {"X_alpha"}}})));

Member

This function body is NOT a correct "sub-graph" representing the formula you described. A function body is a graph that represents the math formula you mentioned using other ops; in this case, it should be "Constant", "Div", "Elu".

Contributor Author

Yes, I need some information on how to convert the Scalar Attribute into a Constant Tensor. Do you know how to accomplish this?

@jeremycochoy
Contributor Author

Thanks @TMVector and @linkerzhang for your feedback.

  • Regarding the tensor/scalar issue raised by TMVector: the following code passes the shape inference test, but I don't know whether that is enough to say everything is fine when the second argument is a scalar and not a tensor (I don't know whether this 1.f is implicitly converted to a tensor).
          {// nodes: {outputs, op, inputs, attributes}                                                           
            FunctionBodyHelper::NodeDef{{"alpha"}, "Constant", {}, {{"value", 1.f}}},                                        
            {{"X_alpha"},
             "Div",
             {"X", "alpha"}
            },
            {{"Y"}, "Elu", {"X_alpha"}}})));
  • Regarding the second problem (using the actual attribute):

I made some attempts to create a constant node that recovers the alpha attribute from the Celu operator using AttributeProto. Although the code compiles, the shape inference test fails badly. In order to understand what is happening, I simplified the body of the function.

If I run the shape inference test with the following body, I get a nice "Y" of empty shape.

FunctionBodyHelper::Const<float>("Y", 1.0f)
E             name: "Y"
E             type {
E               tensor_type {
E                 elem_type: 1
E                 shape {
E                 }
E               }
E             }

but if I try to run the shape inference test with the attribute (see code below), then no "Y" is inferred at all.

FunctionBodyHelper::NodeDef{{"Y"}, "Constant", {}, {MakeRefAttribute("value", AttributeProto::FLOAT, "alpha")}}
E       AssertionError: ({'X', 'Y'}, {'X'})
E       assert {'X', 'Y'} == {'X'}
E         Extra items in the left set:
E         'Y'
E         Use -v to get the full diff

On my side I am stuck. Looking at the Const and ToVector implementations didn't give me any new ideas to test. Do you have any idea of what is happening? Is it related to this tensor/scalar problem? 🙃

@prasanthpul added the "operator" label (Issues related to ONNX operators) Feb 1, 2020
onnx/defs/nn/defs.cc (outdated, resolved)
"Div",
{"X", "alpha"}
},
{{"U"}, "Elu", {"X_alpha"}}})));
Contributor

@wschin Feb 1, 2020

From Pytorch, CELU equation is

CELU(x)=max(0,x)+min(0,α∗(exp(x/α)−1))

while ELU uses

ELU(x)=max(0,x)+min(0,α∗(exp(x)−1))

In addition, here the function body is doing

ONNX_CELU(x)=max(0,x/α)+min(0,(exp(x/α)−1))

which doesn't exactly match Pytorch CELU. Is this expected? Or did I miss something?

Do we have a numpy reference implementation for generating tests? We should also check if that implementation matches Pytorch CELU.

[Update] I saw your numpy reference implementation. Nice! Can you please provide a short comparison to show it performs the same as Pytorch CELU?
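To make the mismatch concrete (an editorial sketch, not from the PR): the body as currently written computes Elu(X / alpha) with alpha=1 and no multiplication back by alpha, which is off by a factor of alpha on the negative side:

import numpy as np

x, alpha = np.float32(-2.0), np.float32(2.0)

# PyTorch CELU: max(0, x) + min(0, alpha * (exp(x / alpha) - 1))  ->  about -1.2642
pytorch_celu = max(0.0, x) + min(0.0, alpha * (np.exp(x / alpha) - 1))

# Body as written: max(0, x/alpha) + min(0, exp(x/alpha) - 1)     ->  about -0.6321
body_as_written = max(0.0, x / alpha) + min(0.0, np.exp(x / alpha) - 1)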

Contributor Author

😱

You are completely right: the ELU implementation of Pytorch is different from ONNX Elu, and it is not possible to express CELU from ONNX's ELU. Thank you for noticing it; I am working on a fix.

Contributor Author

Here is some code testing the difference between Pytorch CELU and the implementation (with corrected parentheses) I provided.

import numpy as np
import torch

def onnx_celu(input_data, alpha=1.0):
    positive_input = np.maximum(0, input_data)
    negative_input = np.minimum(0, alpha * (np.exp(input_data / alpha) - 1))
    output_data = positive_input + negative_input
    return output_data

def torch_celu(input_data, alpha=1.0):
    return torch.nn.CELU(alpha=alpha)(torch.Tensor(input_data)).numpy()

input_data = np.random.randn(1, 2, 3).astype('float32')
alpha = 2

assert (onnx_celu(input_data, alpha) == torch_celu(input_data, alpha)).all()

"Constrain input and output types to floating-point tensors.")
.FunctionBody(FunctionBodyHelper::BuildNodes(
{// nodes: {outputs, op, inputs, attributes}
//FunctionBodyHelper::NodeDef{{"alpha"}, "Constant", {}, {{"value", 1.f}}},
Contributor

Are the comments here left intentionally?

@wschin
Contributor

wschin commented Feb 1, 2020

> (quoting @jeremycochoy's comment above about the tensor/scalar issue and the Constant shape inference failure)

As described here, the value attribute should be a tensor, not a float.

@jeremycochoy
Contributor Author

jeremycochoy commented Feb 1, 2020

> (quoting the exchange above: the tensor/scalar issue and @wschin's note that the value attribute should be a tensor, not a float)

Unfortunately, after hours of digging through documentation and code, I can't figure out a way to convert a scalar (from MakeRefAttribute("alpha", AttributeProto::FLOAT)) to a tensor constant.
I left a comment pointing to the problematic line in the body of the function.

PS: If this is not possible, then maybe there is still a way to cheat with the Gemm instruction (it is the only instruction I found which takes a scalar attribute and multiplies it with a tensor). But I would need some help to create the 1x1 tensor input matrices.

@jeremycochoy force-pushed the feature/add-celu-function branch 3 times, most recently from cb02fd5 to 8068de6 on February 1, 2020 13:14
@wschin
Contributor

wschin commented Feb 1, 2020

> (quoting @jeremycochoy's reply above)

I will try something on my side. In the meantime, what do you think about making alpha an input?

@linkerzhang
Member

I think "I can't figure a way to convert a scalar (from MakeRefAttribute("alpha", Attribute\ Proto::FLOAT)) to a tensor constant" needs to be fixed. Logically, the body graph is referring an attribute outside (which should be an AttributeProto) and the "Constant" OP will use the attribute and output a Tensor.

@jeremycochoy
Contributor Author

@wschin Personally, I really don't mind making alpha an input. But it may be very confusing for both developers and users if CELU and ELU have completely different interfaces. If this approach gets merged, it also means supporting it for a long time. 😅

@linkerzhang That would be awesome. I think anyone who implements a new function op that does not directly forward its arguments will end up hitting the exact same problem, and a clean way to move the scalars into the graph would solve this. Do you have any idea how this could be achieved?

@linkerzhang
Member

@jeremycochoy The AttributeProto itself was designed to support this kind of reference already, though the utility function is missing, I guess.

PR #2583 should resolve it.

MakeRefAttribute("value", "alpha", AttributeProto::FLOAT) for the Constant node in the function body.

@TMVector
Contributor

TMVector commented Feb 3, 2020

@linkerzhang I think that will work if the alpha attribute is a tensor, but ideally it would be a naked float. Maybe Constant should be changed to promote non-tensor values to scalar tensors?

@jeremycochoy
Contributor Author

jeremycochoy commented Feb 3, 2020

@jeremycochoy The AttributeProto itself was designed to support this kind of reference already, though the utility function is missing, I guess.

PR #2583 should resolve it.

MakeRefAttribute("value", "alpha", AttributeProto::FLOAT) for the Constant node in the function body.

@linkerzhang
Isn't it essentially the same thing as https://github.com/onnx/onnx/pull/2575/files#diff-8073bde925403bcdfa7d23c68d914d97, which is already present in the current PR? (although your ordering of arguments feels more natural to me)

@wschin
Contributor

wschin commented Feb 3, 2020

@linkerzhang I think that will work if the alpha attribute is a tensor, but ideally it would be a naked float. Maybe Constant should be changed to promote non-tensor values to scalar tensors?

We might need to support floats in addition to float. The fundamental cause here is that Attribute and Graph use different numerical type systems. Attribute has float, floats, and tensor. Graph only has tensor. I think changing Constant will be a nice and small change to bridge these two systems -- because Attribute is always a constant in graphs.

@TMVector, @jeremycochoy, any comments?

@jeremycochoy
Contributor Author

@linkerzhang I think that will work if the alpha attribute is a tensor, but ideally it would be a naked float. Maybe Constant should be changed to promote non-tensor values to scalar tensors?

We might need to support floats in addition to float. The fundamental cause here is that Attribute and Graph use different numerical type systems. Attribute has float, floats, and tensor. Graph only has tensor. I think changing Constant will be a nice and small change to bridge these two systems -- because Attribute is always a constant in graphs.

@TMVector, @jeremycochoy, any comments?

To me it sounds like the best solution, and it definitely makes sense to convert both float and floats to their corresponding tensors.
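For illustration, the two forms look roughly like this with onnx.helper (an editorial sketch; the value_float spelling is what the proposed change to Constant would enable, not something generated by this PR):

import numpy as np
from onnx import helper, numpy_helper

# Current form: Constant's "value" attribute must be a TensorProto,
# so a scalar alpha has to be wrapped in a rank-0 tensor.
tensor_form = helper.make_node(
    "Constant", inputs=[], outputs=["alpha"],
    value=numpy_helper.from_array(np.array(2.0, dtype=np.float32), name="alpha"),
)

# Proposed form: Constant accepts a plain float attribute and promotes it
# to a scalar tensor inside the graph.
scalar_form = helper.make_node(
    "Constant", inputs=[], outputs=["alpha"], value_float=2.0,
)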

@linkerzhang
Member

linkerzhang commented Feb 4, 2020

@jeremycochoy Yep, it's the same as the one in the current PR (I missed that part).

One more solution is removing AttributeProto and using TensorProto to store attributes, to unify the two type systems (one tensor type system and one attribute type system).

AttributeProto and its type system were designed at the very beginning to introduce a simpler way of having "scalar" attribute data. However, it introduces many troubles when specifying operator specs. For example, it's really hard to specify that an attribute needs to share the same type as an input or output (most cases now use the "Tensor" attribute type).

This PR reminds me again that the benefit of AttributeProto and its type system is not that great, while it introduces many troubles.

I'd suggest removing them and having only one type system in ONNX.

@gramalingam @wschin @jeremycochoy @TMVector What do you think please?

@@ -58,4 +58,15 @@ AttributeProto MakeRefAttribute(
return a;
Member

Please change this function to call the overloaded one added below.

Contributor Author

Oh, I think you can just merge your PR and I can rebase this branch on top of it. I remember you introduced documentation, your ordering of arguments sounds more natural to me, and I have nothing against splitting PRs into smaller pieces.

Edit: I rebased on top of your commit. 🙂

Member

Aha, I abandoned my PR this morning (I realized it duplicates changes in your PR). Let me bring it back and merge it this way :)

Comment on lines 2221 to 2222
{{"Elu_Result"}, "Elu", {"X_alpha"}, {{"alpha", 1.f}}},
{{"Y"}, "Mul", {"alpha", "Elu_Result"}}})));
Contributor

Wouldn't it be equivalent to pass Celu.alpha to Elu.alpha instead of setting Elu.alpha=1 and then multiplying by Celu.alpha?

Contributor Author

It is not. 😅 (Because alpha is applied only to the second term of the + operator in the Elu equation.)

Explanations from wschin: #2575 (comment)
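Concretely (an editorial sketch, not from the PR): Elu applies alpha only to the exp(x) - 1 term and does not rescale x inside the exponential, so passing Celu.alpha straight to Elu.alpha gives a different curve:

import numpy as np

def onnx_elu(x, alpha=1.0):
    # ONNX Elu: max(0, x) + min(0, alpha * (exp(x) - 1))
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x) - 1))

def celu(x, alpha):
    # CELU: max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    return np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x / alpha) - 1))

x = np.array([-3.0, -1.0, 0.5, 2.0])
alpha = 2.0

print(onnx_elu(x, alpha))                      # differs from celu on the negative entries
print(celu(x, alpha))
print(alpha * onnx_elu(x / alpha, alpha=1.0))  # matches celu(x, alpha)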

Contributor

Ah, quite right.

Btw should Celu be defined in onnx/defs/math/defs.cc? -- that's where Relu, Elu, etc. are.

Contributor Author

Right, it would make more sense to place it next to relu / elu / leaky relu. I will change it this evening.

Done :)

@linkerzhang merged commit c978d10 into onnx:master Feb 16, 2020
@chinhuang007 added this to the 1.7 milestone Feb 19, 2020
@codemzs mentioned this pull request Apr 26, 2020
@fdwr
Contributor

fdwr commented May 26, 2020

@jeremycochoy : This is an excellent description for a new operator (explaining why it's being added, where it came from, the actual equation used, and even an alternate Python implementation), and I'll point to it as a good example in the future. 👍

@PallHaraldsson

PallHaraldsson commented Jun 14, 2020

Thanks, I had no idea about CELU; it looks useful. Is there a reason to use min? Using max only (as in ReLU) is natural and implies one test, while using both implies two, and I wouldn't trust a compiler to realize that only one test and branch is needed. If you don't exclude the negative branch, you always have to calculate the slow exp (I'm sure both implementations could be optimized further), and it can be 583 times slower:

julia> x = 1.1; α = 1.0

julia> ONNX_CELU(x, α)=max(0,x/α)+min(0,(exp(x/α)-1))

julia> @btime ONNX_CELU($x, $α);  # only add $ for @btime (not @time) that requires: using BenchmarkTools
  13.999 ns (0 allocations: 0 bytes)

julia> ONNX_CELU(x, α)=if x >= zero(x) x/α else exp(x/α)-1 end

julia> @btime ONNX_CELU($x, $α);
  0.024 ns (0 allocations: 0 bytes)

julia> x = -1.1

julia> @btime ONNX_CELU($x, $α);
  11.627 ns (0 allocations: 0 bytes)

Yes, it's only 20% faster on the other (negative) side, and maybe with values all over the place (ca. a 50-50 split?) it's not as useful an optimization as I would think?
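For what it's worth, a numpy reference can also skip the exp on the positive entries with a mask (an editorial sketch, assuming the reference implementation is free to branch; backends can of course still substitute their own kernels):

import numpy as np

def celu_masked(x, alpha=1.0):
    # For x >= 0, CELU(x) = x, so the exp term only needs to be evaluated where x < 0.
    out = np.maximum(x, 0.0)
    neg = x < 0
    out[neg] = alpha * (np.exp(x[neg] / alpha) - 1.0)
    return out

x = np.random.randn(1, 2, 3)
alpha = 2.0
reference = np.maximum(0, x) + np.minimum(0, alpha * (np.exp(x / alpha) - 1))
assert np.allclose(celu_masked(x, alpha), reference)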

@jeremycochoy
Contributor Author

Hi @PallHaraldsson

To be honest, I am afraid that if you want something really optimized, you'd need the backend to provide a specific implementation for Celu that replaces this graph function (this is 100% possible, and any backend can choose the implementation that fits its specific targeted hardware).

@PallHaraldsson

PallHaraldsson commented Jun 16, 2020

Yes, and FYI, I found an even better activation function (in part because it's also continuously differentiable, which is why it's better than ReLU):

Mish: A Self Regularized Non-Monotonic Neural Activation Function
https://arxiv.org/pdf/1908.08681v1.pdf

In Tensorflow[11], the function definition of Mish can be written as x * tf.math.tanh(tf.softplus(x)) while in Torch[12] it is x * torch.tanh(F.softplus(x)). For improved results over ReLU, it is advised to use a slightly lower learning rate for Mish.

It's also compared to ELU and some variants (though not CELU).

Also interesting, with claimed advantages contrary to those claimed above (such as being bounded below):

PLU: The Piecewise Linear Unit Activation Function
https://arxiv.org/abs/1809.09534

I implemented it like this for fewer assembly instructions (and only one branch):

julia> function PLU(x)
         stripped=abs(x)
         s=sign(x)
         if stripped <= 1.0
           return x
         else
           return 0.1*(x-s)+s
         end
       end

julia> @code_native PLU(1.0)  # to see assembly. I always get 0.024 ns by timing with @btime

And if you're interested, a very simple idea here (using two "opposite" but similar activations; I wonder if you could do similar for two dissimilar ones, e.g. those above?):

https://arxiv.org/pdf/1709.04054.pdf

We propose a simple extension to the ReLU-family of activation functions that allows them to shift the mean activation across a layer towards zero. Combined with proper weight initialization, this alleviates the need for normalization layers. We explore the training of deep vanilla recurrent neural networks (RNNs) with up to 144 layers, and show that bipolar activation functions help learning in this setting. On the Penn Treebank and Text8 language modeling tasks we obtain competitive results, improving on the best reported results for non-gated networks.

jcwchen pushed a commit to jcwchen/onnx that referenced this pull request Sep 23, 2020
* Implement CELU node as a Function

* Add shape inference test

* Update onnx/defs/nn/defs.cc

Co-Authored-By: Jonny Shipton <tmvector@gmail.com>

* Update onnx/test/shape_inference_test.py

Co-Authored-By: Jonny Shipton <tmvector@gmail.com>

* Set operator version to 12

* ?

* WIP. But the constant node can't be shape infered.

* Rewrite correct implementation based on equation instead of Elu

* Fix parentesis in formula

* wschin suggestions from onnx#2583 PR

* Fix a bug in inferene code and simplify graph

* Fix typo in Celu test

* Udapte docs

* Move Celu operator next to Elu (math/defs.cc)

Co-authored-by: Jonny Shipton <tmvector@gmail.com>
Co-authored-by: Ke Zhang <kezhan@microsoft.com>
Labels: operator (Issues related to ONNX operators)
9 participants