New and better implementation of activations #6
Conversation
max (for ReLU) compiles to surprisingly short (fast) assembly code; I'm a bit puzzled why the equivalent if is longer code. But when you have min and max together (for CELU), I timed my implementation at least 20% faster, and on one side the original was 583 times slower. Maybe I'm missing something, and min and max are better for, say, GPUs, for some dataflow reasons? Maybe you can time it? I'm not even sure you support GPUs, and if not, mine shouldn't be slower. Mish could be optimized; I just have the definition there.
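For concreteness, a minimal sketch of the two styles being compared, written here as Python stand-in pseudocode (the PR's actual code and timings are not reproduced; the function names and the alpha default are illustrative):

```python
from math import exp

def relu_minmax(x: float) -> float:
    return max(0.0, x)               # single max; branch-free in many backends

def relu_branch(x: float) -> float:
    return x if x > 0.0 else 0.0     # equivalent, but written as an explicit branch

# CELU per its standard definition: max(0, x) + min(0, alpha*(exp(x/alpha) - 1))
def celu_minmax(x: float, alpha: float = 1.0) -> float:
    return max(0.0, x) + min(0.0, alpha * (exp(x / alpha) - 1.0))

def celu_branch(x: float, alpha: float = 1.0) -> float:
    # branchy form: exp is only evaluated on the negative side
    return x if x >= 0.0 else alpha * (exp(x / alpha) - 1.0)
```

The trade-off under discussion: the min/max form is data-parallel friendly (no divergent branches), while the branch form skips the exp entirely for non-negative inputs.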
Is Mish completely useless without a derivative? It's here on page 2: https://arxiv.org/pdf/1908.08681v1.pdf I just did the most basic PLU (not parameterized; that could easily be added). Maybe the credits need to be in the code, or are they OK here, only in the PR? Move them to the top comment before you merge?
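For reference, the definition and a chain-rule form of the derivative equivalent to the closed form on page 2 of the linked paper, as a hedged Python sketch (the names softplus/mish/dmish are illustrative, not necessarily the repo's):

```python
from math import exp, log, tanh, cosh

def softplus(x: float) -> float:
    # naive softplus; overflows for large x (see the approximation discussed below)
    return log(1.0 + exp(x))

def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x))  (arXiv:1908.08681)
    return x * tanh(softplus(x))

def dmish(x: float) -> float:
    # chain rule: tanh(sp(x)) + x * sigmoid(x) * sech(sp(x))^2,
    # algebraically equivalent to the derivative given on page 2 of the paper
    sp = softplus(x)
    sigmoid = 1.0 / (1.0 + exp(-x))
    return tanh(sp) + x * sigmoid / cosh(sp) ** 2
```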
Do you know, I was thinking of ways to speed up these functions: are extreme values common as inputs to activation functions? They don't even have to be that common; I can approximate with the identity function for high values, even exactly for values higher than 9.0 (and use 0.0 for all low values):
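A minimal sketch of that idea, assuming the target is softplus and using the 9.0 cutoff mentioned above (the low-side threshold and the function name are assumptions):

```python
from math import exp, log

def softplus_clamped(x: float) -> float:
    # For x > 9 the correction log(1 + exp(-x)) ~ exp(-x) < 1.3e-4 and
    # shrinks exponentially, so the identity is a close, cheap stand-in.
    if x > 9.0:
        return x
    # For very negative x, softplus(x) -> 0 (absolute error ~ exp(x)).
    if x < -9.0:
        return 0.0
    return log(1.0 + exp(x))
```

Besides the speed-up, the high-side branch also sidesteps exp overflow for large x, which the naive form does not.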
Thank you!
OK, good to know; it works as is, with a downside. I mean, I know what AD is, somewhat, just not exactly. Assume a runtime cost; symbolic is better if you have it. I tried it for fun: it doesn't handle softplus(x) and more, but for tanh(x) it e.g. gave sech(x)^2, so I timed it (for some inputs no change, and hopefully it's never slower, or I assume it wouldn't be in the standard library):
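As an illustration of the symbolic point (a sketch only; the timing output above is not preserved, and both function names are mine): d/dx tanh(x) = sech(x)^2, which can equally be written 1 - tanh(x)^2.

```python
from math import tanh, cosh

def dtanh_sech(x: float) -> float:
    # the symbolic derivative as given: sech(x)^2 = 1/cosh(x)^2
    return 1.0 / cosh(x) ** 2

def dtanh_reuse(x: float) -> float:
    # algebraically equal form 1 - tanh(x)^2; it can reuse a tanh(x)
    # already computed in the forward pass, which is often the cheaper choice
    t = tanh(x)
    return 1.0 - t * t
```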
You didn't answer whether this is used on GPUs. I don't know about tanh and sech there; these are never in hardware that I know of, while tanh is pretty common for ANNs, so who knows, maybe for that one? I hate to be optimizing for the CPU while making things slower for GPUs. I'm just busy right now, so more, like dmish, will have to wait.
Your call what to do for casing, e.g. Plus, and other upper-casing inconsistencies: softMax, dSoftMax, [..] softplus. I also added spaces myself where I thought it clearer, and changed dSoftMax (including visible docs), but then had second thoughts about that, in case you skip spaces intentionally.
Hello, thank you for your contribution.