
New and better implementation of activations #6

Merged Jun 19, 2020 · 4 commits

Conversation

PallHaraldsson
Contributor

No description provided.

@PallHaraldsson
Contributor Author

PallHaraldsson commented Jun 17, 2020

[max (for ReLU) compiles to surprisingly short (fast) assembly; I'm a bit puzzled why the equivalent if is longer code, though @btime showed the same speed. Not important, as I'm not changing that.]

But when min and max are used together (as for CELU), I timed my implementation at least 20% faster, and for inputs on one side the original was 583 times slower:

onnx/onnx#2575 (comment)

Maybe I'm missing something, and min and max are better on, say, GPUs for some dataflow reason? Maybe you can time it? I'm not even sure you support GPUs, and even then mine shouldn't be slower. Mish could be optimized; I just have the definition there.
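To make the comparison concrete, here is a hedged sketch of the two styles being discussed (names are mine, not the package's). The min/max form always evaluates exp, while the branch form skips it entirely for x ≥ 0, which would explain the one-sided slowdown:

```julia
# min/max form: exp is computed even when x ≥ 0 and its result is discarded
celu_minmax(x; α=1.0) = max(0, x) + min(0, α * (exp(x / α) - 1))

# branch form: exp is only evaluated on the negative side
celu_branch(x; α=1.0) = x ≥ 0 ? float(x) : α * (exp(x / α) - 1)
```

Both return the same values; only the cost per input regime differs.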

@PallHaraldsson
Contributor Author

Is Mish completely useless without a derivative? It's here on page 2:

https://arxiv.org/pdf/1908.08681v1.pdf
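For what it's worth, writing out mish(x) = x·tanh(softplus(x)) and differentiating by hand (product rule plus chain rule; this is my sketch, not the package's code) gives:

```julia
# Sketch: softplus and its derivative, the logistic sigmoid
sp(x) = log1p(exp(x))          # softplus(x) = log(1 + exp(x))
sigm(x) = 1 / (1 + exp(-x))    # d/dx softplus(x)
mish(x) = x * tanh(sp(x))
# product rule + chain rule, using d/du tanh(u) = sech(u)^2
dmish(x) = tanh(sp(x)) + x * sech(sp(x))^2 * sigm(x)
```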

I just added the most basic plu (not parameterized, though that could easily be added):
https://arxiv.org/pdf/1809.09534.pdf
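A parameterised version could be sketched as follows (this is my reading of the piecewise form in the PLU paper, with its defaults α = 0.1 and c = 1; the package may pick different names or defaults):

```julia
# PLU: identity for |x| ≤ c, slope α outside, clipped continuously
plu(x; α=0.1, c=1.0) = max(α * (x + c) - c, min(α * (x - c) + c, x))
```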

Maybe the credits need to be in the code, or is it ok to have them only here in the PR? Move them to the top comment before you merge?

@PallHaraldsson
Contributor Author

PallHaraldsson commented Jun 17, 2020

You know, I was thinking of ways to speed up these functions: are extreme values common as inputs to activation functions?

They don't even have to be that common. I can approximate with the identity function for high values, which is even exact for values higher than 9.0 (and use 0.0 for all low values):

julia> mish(9f0)-9f0
0.0f0

julia> mish(19.1)-19.1
0.0
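The cutoff idea could be sketched like this (names and thresholds are mine and would need checking per precision; for large x, tanh(softplus(x)) rounds to exactly 1 in Float32, so mish(x) == x there, while the negative cutoff is an approximation, not exact):

```julia
mish(x) = x * tanh(log1p(exp(x)))

# skip the transcendentals outside the "interesting" range
fast_mish(x::Float32) = x ≥ 9f0 ? x : (x ≤ -30f0 ? 0f0 : mish(x))
```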

@sylvaticus
Owner

Thank you!
I think it would be interesting to put the references as comments in the code and parametrise the plu function with α and c (using 0.1 and 1 as defaults). As people can use the defaults, it doesn't add any complexity, while being more flexible for interested users.
It would also be great to have dcelu, dmish and dsoftplus, as it would allow people to use them without having to rely on AD.
Let me know if you prefer doing this yourself, or I can merge this pull request and then do it, as you prefer.
By the way, I want to change the name of the derivatives from dfunction to ∇function, but I'll do this later for all the functions at the same time.
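The two easy derivatives among those could be sketched as follows (names and signatures are guesses matching the current dfunction style, not the package's final API):

```julia
# softplus(x) = log(1 + exp(x))  ⇒  softplus'(x) = 1/(1 + exp(-x)), the sigmoid
dsoftplus(x) = 1 / (1 + exp(-x))

# celu(x; α) = max(0, x) + min(0, α*(exp(x/α) - 1))
# ⇒ derivative is 1 for x ≥ 0 and exp(x/α) for x < 0
dcelu(x; α=1.0) = x ≥ 0 ? 1.0 : exp(x / α)
```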

@PallHaraldsson
Contributor Author

without having to rely on AD

OK, good to know it works as is, with a downside. I mean, I know what AD is, somewhat, just not exactly. I assume there's a runtime cost, and a symbolic derivative is better if you have it.

I tried for fun:
https://www.derivative-calculator.net/

and it doesn't handle softplus(x) and beyond, but it e.g. gave sech(x)^2 for the derivative of tanh, so I timed it (for some inputs there's no change; hopefully it's never slower, or I assume it wouldn't be in the standard library):

julia> x = 10.0; @btime dtanh($x);
  35.581 ns (0 allocations: 0 bytes)

julia> f(x)=sech(x)^2; @btime f($x);
  30.643 ns (0 allocations: 0 bytes)
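Side note (a standard identity, not package code): sech(x)^2 equals 1 − tanh(x)^2, so when tanh(x) has already been computed in the forward pass, the derivative is almost free:

```julia
# 1 - t^2 == sech(x)^2 when t = tanh(x); reuses the forward-pass value
dtanh_from_tanh(t) = 1 - t^2
```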

You didn't answer whether this is used on GPUs. I don't know about tanh and sech there, but neither is ever in hardware that I know of, while tanh is pretty common for ANNs, so who knows for that one? I'd hate to be optimizing for CPUs while making things slower on GPUs.

I'm just busy right now, so more, like dmish, will have to wait.

@PallHaraldsson
Contributor Author

Your call what to do about casing, e.g. Plus, and other upper-casing inconsistencies: softMax, dSoftMax, [..] softplus

I also added spaces myself where I thought it clearer, and changed dSoftMax (including in the visible docs), but then had second thoughts about that, in case you skip spaces intentionally.

@sylvaticus
Owner

Hello, thank you for your contribution.
I am accepting this pull request, but I will then make some changes:

  • in softmax the default parameter should be β=one(x[1]) or β=one.(x) (it doesn't compile as it is now)
  • I prefer optional parameters to remain keyword arguments
  • celu doesn't return the same output for an α parameter different from 1, e.g. celu(2, α=2)
  • plu doesn't return the same output for an α parameter different from 0.1, e.g. plu(4, α=10)
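The β fix from the first bullet could look like this (a sketch only: the keyword default follows the suggestion above, while the max-shift for numerical stability is my own addition, not necessarily what the package will do):

```julia
# β-parametrised softmax with a keyword default of one(x[1]);
# subtracting maximum(x) avoids overflow in exp for large inputs
function softmax(x; β=one(x[1]))
    e = exp.(β .* (x .- maximum(x)))
    return e ./ sum(e)
end
```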
