This is a Keras implementation of "Gradient Acceleration in Activation Functions" (GAAF). The repository includes ResNet_v1 and ResNet_v2 implementations for testing and comparing GAAF against the original activation functions.
"Gradient Acceleration in Activation Functions" proposes a new technique for activations functions, gradient acceeration in activation function (GAAF), that accelerates gradients to flow even in the saturation area. Then, input to the activation function can climb onto the saturation area which makes the network more robust because the model converges on a flat region.
- Python 3.x
- Keras (TensorFlow backend)
This repository uses the CIFAR-10 dataset. When you run the training script, the dataset is downloaded automatically.
This repository supports two GAAF variants: `GAAF_relu` and `GAAF_tanh`. `GAAF_relu` uses a shifted sigmoid as its shape function, and you can set the shift parameter. `GAAF_tanh` uses a modified Gaussian function with its peak at y=1.
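For reference, here is a minimal sketch of what `GAAF_relu` might look like, assuming a shifted sigmoid shape function and the `mut - tf.floor(mut)` sawtooth mentioned in the experiments below; the scaling constant, the sign convention of `shift`, and the function name are assumptions, not the repository's exact code:

```python
import tensorflow as tf
from tensorflow.keras import backend as K

C = 1e5  # scaling constant (assumed value)

def gaaf_relu(x, shift=-1.0):
    # Shape function s(x): shifted sigmoid. The exact sign/shift convention is
    # an assumption; the intent is to accelerate gradients where ReLU saturates.
    s = K.sigmoid(shift - x)
    mut = C * x
    # Sawtooth term: forward value is at most 1/C, but its gradient w.r.t. x
    # is ~1 almost everywhere, because tf.floor has zero gradient.
    saw = (mut - tf.floor(mut)) / C
    return K.relu(x) + s * saw
```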
To change the activation function, set `activation` in `main.py` to one of `GAAF_relu`, `GAAF_tanh`, or `relu`.
You can train and test with the base CNN models listed below.
- ResNet_v1 (e.g. ResNet20, ResNet32, ResNet44, ResNet56, ResNet110, ResNet164, ResNet1001)
- ResNet_v2 (e.g. ResNet20, ResNet56, ResNet110, ResNet164, ResNet1001)
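If this repository follows the usual Keras CIFAR-10 ResNet example convention (an assumption, though that example is listed in the references below), valid depths are `6n+2` for ResNet_v1 and `9n+2` for ResNet_v2:

```python
# Valid depths under the standard Keras CIFAR-10 ResNet convention (assumed)
n = 3
depth_v1 = 6 * n + 2  # 20 for n=3 (ResNet20_v1); 32, 44, 56, ... for larger n
depth_v2 = 9 * n + 2  # 29 for n=3; n=2 gives 20 (ResNet20_v2)
```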
You can simply train a model with `main.py`.

- Set the depth for the ResNet model
  - e.g. `depth=20`
- Define the model you want to train
  - e.g. `model = resnet_v1.resnet_v1(input_shape=input_shape, depth=depth, activation=activation)`
- Set other parameters such as `batch_size`, `epochs`, `data_augmentation`, and so on (see the consolidated sketch after this list)
- Run the `main.py` file
  - e.g. `python main.py`
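For orientation, a consolidated sketch of these steps (the module name `resnet_v1` and the `activation` argument follow this README; everything else is an assumption about how `main.py` is wired, not its exact contents):

```python
from tensorflow import keras
import resnet_v1  # module from this repository (assumed import path)

# CIFAR-10 is downloaded automatically on first use
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

depth = 20
activation = 'GAAF_relu'  # or 'GAAF_tanh', 'relu'
model = resnet_v1.resnet_v1(input_shape=x_train.shape[1:], depth=depth,
                            activation=activation)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=200,
          validation_data=(x_test, y_test))
```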
I conducted some experiments on ResNet20_v1 by replacing the original activation (relu) with GAAF_relu; the results are described below.
num | data | backbone | activation | shift | steps | acc | batch_size | optimizer | lr |
---|---|---|---|---|---|---|---|---|---|
baseline | cifar10 | resnet20_v1 | relu | - | 200 | 0.8084 | 128 | adam | 0.001 |
ex1 | cifar10 | resnet20_v1 | GAAF_relu | 5 | 200 | 0.6517 | 128 | adam | 0.001 |
ex2 | cifar10 | resnet20_v1 | GAAF_relu | 4 | 200 | 0.6924 | 128 | adam | 0.001 |
ex3 | cifar10 | resnet20_v1 | GAAF_relu | 3 | 200 | 0.7042 | 128 | adam | 0.001 |
ex4 | cifar10 | resnet20_v1 | GAAF_relu | 2 | 200 | 0.7441 | 128 | adam | 0.001 |
ex5 | cifar10 | resnet20_v1 | GAAF_relu | 1 | 200 | 0.78 | 128 | adam | 0.001 |
ex6 | cifar10 | resnet20_v1 | GAAF_relu | 0 | 200 | 0.7886 | 128 | adam | 0.001 |
ex7 | cifar10 | resnet20_v1 | GAAF_relu | -0.5 | 200 | 0.7945 | 128 | adam | 0.001 |
ex8 | cifar10 | resnet20_v1 | GAAF_relu | -1 | 200 | 0.7948 | 128 | adam | 0.001 |
ex9 | cifar10 | resnet20_v1 | GAAF_relu | -2 | 200 | 0.78 | 128 | adam | 0.001 |
ex10 | cifar10 | resnet20_v1 | GAAF_relu | -3 | 200 | 0.7768 | 128 | adam | 0.001 |
ex11 | cifar10 | resnet20_v1 | GAAF_relu | -4 | 200 | 0.7733 | 128 | adam | 0.001 |
ex12 | cifar10 | resnet20_v1 | GAAF_relu | no s(x) | 200 | 0.6603 | 128 | adam | 0.001 |
ex13 | cifar10 | resnet20_v1 | GAAF_relu(K.round) | -1 | 200 | 0.8054 | 128 | adam | 0.001 |
GAAF_relu did not give any improvement in any of the experiments.
At the beginning of the experiments, I expected `shift=4` to give the best performance, since the shifted sigmoid then looks very similar to the shape function suggested in the GAAF paper. However, `shift=-1` gives the best performance, though still lower than the baseline. Interestingly, when I replaced `mut - tf.floor(mut)` with `K.abs(mut - K.round(mut))`, it gave the best performance of all the GAAF variants, though still below the baseline. (The Keras backend does not have a floor operation, so I used `K.round` instead.)
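To make the substitution concrete, here is a hedged sketch of the two terms (variable names follow the snippet above; `C` is an assumed scaling constant, and the division by `C` is my normalization, not necessarily the repository's):

```python
import tensorflow as tf
from tensorflow.keras import backend as K

def saw_floor(x, C=1e5):
    mut = C * x
    # frac(C*x)/C: value < 1/C; gradient w.r.t. x is ~1 almost everywhere,
    # since tf.floor contributes zero gradient.
    return (mut - tf.floor(mut)) / C

def saw_round(x, C=1e5):
    mut = C * x
    # |C*x - round(C*x)|/C: a triangle wave with value <= 1/(2C); its gradient
    # w.r.t. x alternates between +1 and -1, so only its magnitude is constant.
    return K.abs(mut - K.round(mut)) / C
```

Note that the `K.round` version is a triangle wave whose gradient flips sign every half period, unlike the one-sided sawtooth; whether that explains the accuracy difference in ex13 is speculation.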
If there is any implementation error in this repository, please let me know.
- Blog: Gradient Acceleration in Activation Functions
- Repository: CBAM-keras
- Paper: Gradient Acceleration in Activation Functions
- Repository: Cifar10 ResNet example in Keras
Byung Soo Ko / kobiso62@gmail.com