# Reproducing the ACON Activation Function


<center><img src="https://ai-studio-static-online.cdn.bcebos.com/3c3ae675554c4c949aaf6222326550107568506ac3384080b022663b48969f03"></center>
<br></br>

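In this paper, the authors propose ACON, a simple and effective activation function that learns whether or not to activate a neuron. Building on ACON, they further propose Meta-ACON, which introduces a switching factor to learn the switch between non-linear (activated) and linear (not activated) behavior. Experiments show that these activations bring significant improvements to deep models on image classification, object detection, and semantic segmentation.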



[Paper](https://arxiv.org/abs/2009.04759) [Code](https://github.com/nmaac/acon)

### Smooth Maximum



Most commonly used activation functions are essentially MAX functions. Taking ReLU as an example, it can be written as:
<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/0915cc5b0abe4526a162ac83c4bf380aef53002d35914eabbba1a63ae33ec5e0"></center>
<br></br>

The smooth, differentiable variant of the MAX function is called the Smooth Maximum, defined as follows:

<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/9b3cb6dc975946b889d949404827e8028258c6f382e1413fa151c74da6946f8a"></center>
<br></br>
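As a quick numerical illustration, assuming the paper's definition $S_\beta(x_1,\dots,x_n)=\frac{\sum_i x_i e^{\beta x_i}}{\sum_i e^{\beta x_i}}$, the NumPy sketch below (the helper name `smooth_max` is hypothetical) shows the limiting cases: $\beta = 0$ gives the arithmetic mean, a large positive $\beta$ approaches the maximum, and a large negative $\beta$ approaches the minimum.

```python
import numpy as np

def smooth_max(xs, beta):
    """Smooth maximum S_beta(x_1..x_n) = sum(x_i * e^{beta x_i}) / sum(e^{beta x_i})."""
    xs = np.asarray(xs, dtype=np.float64)
    z = beta * xs
    # Shift by max(z) before exponentiating to avoid overflow; the shift cancels in the ratio
    w = np.exp(z - z.max())
    return float(np.sum(xs * w) / np.sum(w))

xs = [1.0, 2.0, 3.0]
print(smooth_max(xs, 0.0))     # beta = 0: arithmetic mean, 2.0
print(smooth_max(xs, 100.0))   # large beta: close to max(xs) = 3
print(smooth_max(xs, -100.0))  # large negative beta: close to min(xs) = 1
```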


Here we consider only the two-input case of the Smooth Maximum, i.e., n = 2, which gives:

<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/6f7ba323c5f34205b9e230e54f0de47101388f54944247debabf1fc59361ea2c"></center>
<br></br>
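The two-input form follows by rewriting the numerator as $\eta_b(e^{\beta\eta_a}+e^{\beta\eta_b}) + (\eta_a-\eta_b)e^{\beta\eta_a}$, so $S_\beta = \eta_b + (\eta_a-\eta_b)\frac{e^{\beta\eta_a}}{e^{\beta\eta_a}+e^{\beta\eta_b}}$, and that fraction equals $\sigma(\beta(\eta_a-\eta_b))$. The sketch below (hypothetical helper names, arbitrary test inputs) checks numerically that this closed form matches the general definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def smooth_max2_general(a, b, beta):
    # General definition with n = 2: (a e^{beta a} + b e^{beta b}) / (e^{beta a} + e^{beta b})
    ea, eb = np.exp(beta * a), np.exp(beta * b)
    return (a * ea + b * eb) / (ea + eb)

def smooth_max2_closed(a, b, beta):
    # Closed form: (a - b) * sigmoid(beta * (a - b)) + b
    return (a - b) * sigmoid(beta * (a - b)) + b

a = np.linspace(-3.0, 3.0, 13)
b = 0.5 * a + 1.0  # arbitrary second input, used only for this check
for beta in (0.5, 1.0, 4.0):
    assert np.allclose(smooth_max2_general(a, b, beta), smooth_max2_closed(a, b, beta))
print("closed form matches the general definition")
```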

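Now consider ReLU in this smooth form, ![](https://ai-studio-static-online.cdn.bcebos.com/4e5a2b7b626d4e1d8d119b537a651facb2a9e5a0c7cf4ad791416e55df7a885a). Substituting it into the formula gives ![](https://ai-studio-static-online.cdn.bcebos.com/91d853a5d25b46558f73469d821ce01ca27f89cf4e824c7ca7a1822467607287), which is exactly the Swish activation function! Swish is therefore a smooth approximation of ReLU; we call it ACON-A. The paper extends the idea with other choices of the two inputs: $\eta_a(x)=x, \eta_b(x)=px$ gives ACON-B, a smooth approximation of PReLU, and $\eta_a(x)=p_1x, \eta_b(x)=p_2x$ gives ACON-C, the variant implemented in this tutorial: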


<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/8d099a4825a34b5e8f7769eadec68eefa6b15b8423434496849f4c954d28e800"></center>

<center><img src="https://ai-studio-static-online.cdn.bcebos.com/1b3c3236d9b54f5eb42ef3ab675122d56fd21b60ffac47f6b207fe499969fc1f"></center>

<center><img src="https://ai-studio-static-online.cdn.bcebos.com/b2ab1f5b070540f4b9efaf81524c40cd044b66dd34c448599d26e62cffca05cb"></center>
<br></br>
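A NumPy sketch of the three variants, assuming the paper's input pairs ($\eta_a=x, \eta_b=0$ for ACON-A; $\eta_a=x, \eta_b=px$ for ACON-B; $\eta_a=p_1x, \eta_b=p_2x$ for ACON-C); the function names are hypothetical. It checks that ACON-A with $\beta=1$ equals Swish, that a large $\beta$ recovers ReLU and PReLU, and that $\beta=0$ gives the linear function $\frac{p_1+p_2}{2}x$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def acon_c(x, p1, p2, beta):
    # ACON-C: (p1*x - p2*x) * sigmoid(beta * (p1*x - p2*x)) + p2*x
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x

def acon_a(x, beta):
    # ACON-A: eta_a(x) = x, eta_b(x) = 0, i.e. x * sigmoid(beta * x) (Swish)
    return acon_c(x, 1.0, 0.0, beta)

def acon_b(x, p, beta):
    # ACON-B: eta_a(x) = x, eta_b(x) = p * x (smooth counterpart of PReLU)
    return acon_c(x, 1.0, p, beta)

x = np.linspace(-5.0, 5.0, 11)
assert np.allclose(acon_a(x, 1.0), x * sigmoid(x))                    # Swish
assert np.allclose(acon_a(x, 50.0), np.maximum(x, 0.0), atol=1e-6)    # ~ReLU
assert np.allclose(acon_b(x, 0.25, 50.0), np.maximum(x, 0.25 * x), atol=1e-6)  # ~PReLU
assert np.allclose(acon_c(x, 1.5, 0.5, 0.0), 1.0 * x)                 # beta = 0: linear
print("ACON-A/B/C checks passed")
```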


The first derivative of ACON-C is:



<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/53c6d63f17e14e6abc8156dde49d667e18d049e2a0e745128533dbf83a422e4b"></center>
<br></br>
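Applying the product rule to $(p_1-p_2)x\,\sigma(\beta(p_1-p_2)x) + p_2x$ gives the expression coded below, which should correspond to the formula above. The NumPy sketch (hypothetical helper names, arbitrary parameter values) compares it with a central finite difference; note that at $x=0$ the slope is $(p_1+p_2)/2$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def acon_c(x, p1, p2, beta):
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x

def acon_c_grad(x, p1, p2, beta):
    # [(p1-p2)(1 + e^{-z}) + beta (p1-p2)^2 e^{-z} x] / (1 + e^{-z})^2 + p2, with z = beta (p1-p2) x
    d = p1 - p2
    e = np.exp(-beta * d * x)
    return (d * (1 + e) + beta * d ** 2 * e * x) / (1 + e) ** 2 + p2

p1, p2, beta = 1.2, -0.3, 0.8  # arbitrary values for the check
x = np.linspace(-4.0, 4.0, 81)
h = 1e-5
numeric = (acon_c(x + h, p1, p2, beta) - acon_c(x - h, p1, p2, beta)) / (2 * h)
assert np.allclose(numeric, acon_c_grad(x, p1, p2, beta), atol=1e-6)
print("analytic and numerical derivatives agree")
```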



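Setting the second derivative to zero and solving, the paper obtains the upper and lower bounds of this first derivative: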


<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/1d23a25b348a4b4397ee040ea3c2c8a0bb9c0ef3c07148b7a96d895f5170e912"></center>
<br></br>
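These constants can be reproduced numerically. Writing $y = \beta(p_1-p_2)x$, the first derivative is $(p_1-p_2)\,g(y) + p_2$ with $g(y) = \sigma(y) + y\,\sigma(y)(1-\sigma(y))$, and $g'(y) = 0$ reduces to $(y-2)e^y = y+2$. The NumPy sketch below (hypothetical helper names, plain bisection as the root finder) solves this and evaluates $g$ at the roots $\pm y^*$; the results should give bounds of the form $1.0998p_1 - 0.0998p_2$ and $1.0998p_2 - 0.0998p_1$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bisect(f, lo, hi, tol=1e-12):
    # Plain bisection; assumes f(lo) and f(hi) have opposite signs
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Second derivative vanishes where (y - 2) e^y = y + 2, y = beta (p1 - p2) x
y_star = bisect(lambda y: (y - 2) * np.exp(y) - (y + 2), 1.0, 4.0)

def g(y):
    # First derivative = (p1 - p2) * g(y) + p2 = g(y) * p1 + (1 - g(y)) * p2
    s = sigmoid(y)
    return s + y * s * (1 - s)

c_max, c_min = float(g(y_star)), float(g(-y_star))
print(round(y_star, 5), round(c_max, 4), round(c_min, 4))  # 2.39936 1.0998 -0.0998
```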

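These bounds depend on the learnable parameters $p_1$ and $p_2$, so the upper and lower bounds of the gradient are themselves learnable. The paper argues that such learnable bounds are essential for easing optimization and are key to the improved results.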

## Reproduction with the PaddlePaddle Framework



* **Some APIs**


`paddle.create_parameter`


This op creates a parameter: a learnable variable that has gradients and can be optimized.


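Following the official ACON code, I reproduced ACON-C in Paddle as shown below. By comparison, the PaddlePaddle API is slightly more concise, and the initializer can be passed directly to the API call. Many other initializers are available in `paddle.nn.initializer`; feel free to look them up.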

```python
import paddle
from paddle import nn
import paddle.nn.functional as F


class AconC(nn.Layer):
    """ACON activation (activate or not).

    AconC: (p1*x - p2*x) * sigmoid(beta*(p1*x - p2*x)) + p2*x, where beta is a learnable parameter,
    according to "Activate or Not: Learning Customized Activation" <https://arxiv.org/pdf/2009.04759.pdf>.
    """

    def __init__(self, width):
        super().__init__()
        # One p1, p2 and beta per channel, shaped to broadcast over [N, C, H, W] inputs
        self.p1 = paddle.create_parameter([1, width, 1, 1], dtype='float32',
                                          default_initializer=nn.initializer.Normal())
        self.p2 = paddle.create_parameter([1, width, 1, 1], dtype='float32',
                                          default_initializer=nn.initializer.Normal())
        # beta is initialized to ones, as in the official code
        self.beta = paddle.create_parameter([1, width, 1, 1], dtype='float32',
                                            default_initializer=nn.initializer.Constant(1.0))

    def forward(self, x):
        d = self.p1 * x - self.p2 * x
        return d * F.sigmoid(self.beta * d) + self.p2 * x
```


### Building the Network

```python
class dcn2(paddle.nn.Layer):
    """Small CNN for 3x32x32 inputs, with ACON-C after conv3."""

    def __init__(self, num_classes=1):
        super(dcn2, self).__init__()
        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1)
        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.acon1 = AconC(64)  # per-channel ACON-C on conv3's 64 output channels
        self.conv4 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=1)
        self.flatten = paddle.nn.Flatten()
        # 64 channels * 4 * 4 positions = 1024 features after conv4
        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))      # [N, 32, 32, 32]
        x = F.relu(self.conv2(x))      # [N, 64, 15, 15]
        x = self.acon1(self.conv3(x))  # [N, 64, 7, 7], ACON-C instead of ReLU
        x = F.relu(self.conv4(x))      # [N, 64, 4, 4]
        x = self.flatten(x)            # [N, 1024]
        x = F.relu(self.linear1(x))
        return self.linear2(x)
```

### Visualizing the Network Structure

```python
cnn3 = dcn2()
model3 = paddle.Model(cnn3)
model3.summary((64, 3, 32, 32))
```

```text
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
   Conv2D-1      [[64, 3, 32, 32]]     [64, 32, 32, 32]         896      
   Conv2D-2      [[64, 32, 32, 32]]    [64, 64, 15, 15]       18,496     
   Conv2D-3      [[64, 64, 15, 15]]     [64, 64, 7, 7]        36,928     
    AconC-1       [[64, 64, 7, 7]]      [64, 64, 7, 7]          192      
   Conv2D-4       [[64, 64, 7, 7]]      [64, 64, 4, 4]        36,928     
   Flatten-1      [[64, 64, 4, 4]]        [64, 1024]             0       
   Linear-1         [[64, 1024]]           [64, 64]           65,600     
   Linear-2          [[64, 64]]            [64, 1]              65       
Total params: 159,105
Trainable params: 159,105
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.75
Forward/backward pass size (MB): 27.13
Params size (MB): 0.61
Estimated Total Size (MB):

{'total_params': 159105, 'trainable_params': 159105}
```

For comparison, `dcn3` is the same network with ReLU in place of ACON-C after `conv3`:

```python
class dcn3(paddle.nn.Layer):
    """Baseline: identical to dcn2 but with ReLU after conv3."""

    def __init__(self, num_classes=1):
        super(dcn3, self).__init__()
        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1)
        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv4 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=1)
        self.flatten = paddle.nn.Flatten()
        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))  # [N, 32, 32, 32]
        x = F.relu(self.conv2(x))  # [N, 64, 15, 15]
        x = F.relu(self.conv3(x))  # [N, 64, 7, 7]
        x = F.relu(self.conv4(x))  # [N, 64, 4, 4]
        x = self.flatten(x)        # [N, 1024]
        x = F.relu(self.linear1(x))
        return self.linear2(x)
```

```python
cnn4 = dcn3()
model4 = paddle.Model(cnn4)
model4.summary((64, 3, 32, 32))
```

```text
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
   Conv2D-1      [[64, 3, 32, 32]]     [64, 32, 32, 32]         896      
   Conv2D-2      [[64, 32, 32, 32]]    [64, 64, 15, 15]       18,496     
   Conv2D-3      [[64, 64, 15, 15]]     [64, 64, 7, 7]        36,928     
   Conv2D-4       [[64, 64, 7, 7]]      [64, 64, 4, 4]        36,928     
   Flatten-1      [[64, 64, 4, 4]]        [64, 1024]             0       
   Linear-1         [[64, 1024]]           [64, 64]           65,600     
   Linear-2          [[64, 64]]            [64, 1]              65       
Total params: 158,913
Trainable params: 158,913
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 0.75
Forward/backward pass size (MB): 25.59
Params size (MB): 0.61
Estimated Total Size (MB): 26.95
-------------------------------------------------------------------

{'total_params': 158913, 'trainable_params': 158913}
```
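The output shapes and parameter counts in the two summaries can be reproduced by hand. This pure-Python sketch (helper names are hypothetical) applies the standard convolution output-size formula and counts weights plus biases per layer; ACON-C adds $3 \times 64 = 192$ parameters ($p_1$, $p_2$, $\beta$ per channel).

```python
def conv_out(size, kernel, stride, padding):
    # Output size of a 2-D convolution without dilation: floor((size + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, k):
    # k x k weights for every (input, output) channel pair plus one bias per output channel
    return k * k * c_in * c_out + c_out

def linear_params(n_in, n_out):
    return n_in * n_out + n_out

h = 32
h = conv_out(h, 3, 1, 1)  # conv1 -> 32
h = conv_out(h, 3, 2, 0)  # conv2 -> 15
h = conv_out(h, 3, 2, 0)  # conv3 -> 7
h = conv_out(h, 3, 2, 1)  # conv4 -> 4
flat = 64 * h * h         # features entering linear1

base = (conv_params(3, 32, 3) + conv_params(32, 64, 3) + conv_params(64, 64, 3)
        + conv_params(64, 64, 3) + linear_params(flat, 64) + linear_params(64, 1))
acon = 3 * 64             # p1, p2 and beta for each of the 64 channels
print(h, flat, base, base + acon)  # 4 1024 158913 159105
```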

## Meta-ACON



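ACON-family activations use $\beta$ to control whether a neuron is activated: as $\beta \to \infty$, ACON-C approaches the nonlinear $\max(p_1x, p_2x)$, while $\beta = 0$ reduces it to the linear function $\frac{p_1+p_2}{2}x$, i.e., the neuron is not activated. Meta-ACON therefore designs an adaptive function that computes $\beta$ from the input: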




<br></br>
<center><img src="https://ai-studio-static-online.cdn.bcebos.com/8e9dc6405c8e4d598b322cbcdc40f9354e0b9e0489914e8a8099fa951f1d444e"></center>
<br></br>


![](https://ai-studio-static-online.cdn.bcebos.com/aad65672f1c842d78307506774cbc4839a0dde295f2f483f8c040159a61ff8e3)
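A minimal NumPy sketch of the channel-wise design, assuming (as in the `MetaAconC` layer of this tutorial) global average pooling over $H, W$, two 1x1 convolutions (equivalent to fully connected layers on the pooled vector) and a sigmoid. BatchNorm is omitted and all weights are random, so the helper names `meta_acon_beta` and `meta_acon_c` and every value here are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_acon_beta(x, w1, b1, w2, b2):
    """Channel-wise beta (sketch, BatchNorm omitted).
    x: (N, C, H, W); w1: (C_hidden, C); w2: (C, C_hidden)."""
    pooled = x.mean(axis=(2, 3))                    # (N, C): global average pool
    hidden = pooled @ w1.T + b1                     # (N, C_hidden): first 1x1 conv
    return sigmoid(hidden @ w2.T + b2)[:, :, None, None]  # (N, C, 1, 1), values in (0, 1)

def meta_acon_c(x, p1, p2, beta):
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x

rng = np.random.default_rng(0)
N, C, H, W, r = 2, 64, 7, 7, 16
hidden_dim = max(r, C // r)  # same rule as MetaAconC: 16 for C = 64
x = rng.standard_normal((N, C, H, W))
w1, b1 = 0.1 * rng.standard_normal((hidden_dim, C)), np.zeros(hidden_dim)
w2, b2 = 0.1 * rng.standard_normal((C, hidden_dim)), np.zeros(C)
p1, p2 = rng.standard_normal((1, C, 1, 1)), rng.standard_normal((1, C, 1, 1))

beta = meta_acon_beta(x, w1, b1, w2, b2)
y = meta_acon_c(x, p1, p2, beta)
print(beta.shape, y.shape)  # (2, 64, 1, 1) (2, 64, 7, 7)
```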


```python
import paddle
from paddle import nn
import paddle.nn.functional as F


class MetaAconC(nn.Layer):
    r"""ACON activation (activate or not).

    MetaAconC: (p1*x - p2*x) * sigmoid(beta*(p1*x - p2*x)) + p2*x, where beta is generated by a small network,
    according to "Activate or Not: Learning Customized Activation" <https://arxiv.org/pdf/2009.04759.pdf>.
    """

    def __init__(self, width, r=16):
        super().__init__()
        hidden = max(r, width // r)  # bottleneck width of the beta-generating network
        self.fc1 = nn.Conv2D(width, hidden, kernel_size=1, stride=1)
        self.bn1 = nn.BatchNorm2D(hidden)
        self.fc2 = nn.Conv2D(hidden, width, kernel_size=1, stride=1)
        self.bn2 = nn.BatchNorm2D(width)

        self.p1 = paddle.create_parameter([1, width, 1, 1], dtype='float32',
                                          default_initializer=nn.initializer.Normal())
        self.p2 = paddle.create_parameter([1, width, 1, 1], dtype='float32',
                                          default_initializer=nn.initializer.Normal())

    def forward(self, x):
        # Global average pooling over H and W -> [N, C, 1, 1], then fc1 -> bn1 -> fc2 -> bn2 -> sigmoid
        pooled = x.mean(axis=[2, 3], keepdim=True)
        beta = F.sigmoid(self.bn2(self.fc2(self.bn1(self.fc1(pooled)))))
        d = self.p1 * x - self.p2 * x
        return d * F.sigmoid(beta * d) + self.p2 * x
```

```python
class dcn2(paddle.nn.Layer):
    """Same small CNN as before, now with Meta-ACON after conv3."""

    def __init__(self, num_classes=1):
        super(dcn2, self).__init__()
        self.conv1 = paddle.nn.Conv2D(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1)
        self.conv2 = paddle.nn.Conv2D(in_channels=32, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.conv3 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=0)
        self.acon1 = MetaAconC(64)  # beta is computed from the input by a small network
        self.conv4 = paddle.nn.Conv2D(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=2, padding=1)
        self.flatten = paddle.nn.Flatten()
        self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
        self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))      # [N, 32, 32, 32]
        x = F.relu(self.conv2(x))      # [N, 64, 15, 15]
        x = self.acon1(self.conv3(x))  # [N, 64, 7, 7], Meta-ACON instead of ReLU
        x = F.relu(self.conv4(x))      # [N, 64, 4, 4]
        x = self.flatten(x)            # [N, 1024]
        x = F.relu(self.linear1(x))
        return self.linear2(x)
```

```python
cnn3 = dcn2()
model3 = paddle.Model(cnn3)
model3.summary((64, 3, 32, 32))
```

![](https://ai-studio-static-online.cdn.bcebos.com/6b93219550d8468bbd74e44bd647e16f73f0c81c586f4f659dd27e05d13f223a)


## Summary

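This tutorial focuses on reproducing the code; the effectiveness of the activation function has not been verified here on any computer vision task, so use it according to your own needs. Comparing the two networks, ACON-C increases the parameter count from 158,913 to 159,105, i.e., 192 extra parameters (p1, p2 and β for each of the 64 channels). Meta-ACON, also proposed in the paper, is only briefly implemented above; a fuller explanation is left for the next tutorial. This tutorial has shown how to reproduce the activation function and how to use it in a network; it can be used as a plug-and-play module.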