In this notebook, I will try to cover the basic activation functions used in neural networks. Without activation functions, a neural network collapses into a single linear model, no matter how many layers it has.

**The functions covered are the following:**

* Binary Step
* Linear
* Sigmoid
* Tanh
* ReLU
* Leaky ReLU
* Parameterised ReLU
* Exponential Linear Unit
* Swish
* Softmax
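
As a quick check of the claim that a network without activation functions stays linear, here is a minimal sketch. The weights `W1`, `W2` and the input `x` are made-up values for illustration only.

```python
import numpy as np

# Hypothetical weights for two stacked layers with no activation in between.
W1 = np.array([[1.0, 2.0], [0.5, -1.0]])
W2 = np.array([[3.0, 0.0], [1.0, 1.0]])
x = np.array([2.0, -1.0])

two_layers = W2 @ (W1 @ x)   # pass x through both layers in turn
one_layer = (W2 @ W1) @ x    # a single layer with the combined weights

print(np.allclose(two_layers, one_layer))
```

Both expressions give the same result, so the second layer adds no expressive power unless a non-linear function sits between the two.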


In [1]:
# 1. Binary step
# If the input to the activation function reaches a threshold (0 here), the neuron is activated; otherwise it is not.
# Useful only for building a binary classifier, because its gradient is zero everywhere except at the threshold.

def binary_func(x):
    
    if x < 0:
        return 0
    else:
        return 1



In [2]:
binary_func(-2)

0

In [3]:
binary_func(5)

1
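
`binary_func` only accepts a single number. Below is a minimal sketch of an array version, assuming NumPy is available. The name `binary_step_array` is mine and is not used elsewhere in this notebook.

```python
import numpy as np

def binary_step_array(x):
    # Hypothetical helper: 1 where the input is >= 0, else 0.
    return np.where(np.asarray(x) >= 0, 1, 0)

print(binary_step_array([-2, 0, 5]))
```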

In [4]:
# 2. Linear function
# It can be used as an alternative to the binary step: its gradient is not zero, but it is the constant k, so it does not depend on the input.

def Linear_func(k,x):
    
    return k*x 

In [5]:
Linear_func(2,3)

6

In [6]:
# 3. Sigmoid function
# It maps any real value into the range (0, 1) and is given by
# f(x) = 1/(1 + e^-x)

import numpy as np

def Sigmoid_func(x):
    
    return 1/(1 + np.exp(-x))


In [7]:
Sigmoid_func(9)

0.9998766054240137
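
The sigmoid's gradient has a convenient closed form, f'(x) = f(x)(1 - f(x)). Here is a small sketch that compares it with a finite-difference estimate. The names `sigmoid` and `sigmoid_grad` are illustrative, and the test point and step size are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Closed-form derivative: f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1 - s)

x, h = 0.5, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
print(np.isclose(sigmoid_grad(x), numeric))
```

The gradient peaks at 0.25 for x = 0 and shrinks towards zero for large positive or negative inputs.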

In [8]:
# 4. Tanh function
# It is similar to sigmoid but is symmetric around the origin.
# It maps any real value into the range (-1, 1) and is given by
# tanh(x) = 2*sigmoid(2x) - 1

def Tanh_func(x):
    
    return 2*Sigmoid_func(2*x)-1



In [9]:
Tanh_func(0.5)

0.4621171572600098
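
To double-check the identity tanh(x) = 2*sigmoid(2x) - 1, here is a short sketch that compares it with NumPy's built-in `np.tanh` on a few points. The helper name `tanh_via_sigmoid` and the sample grid are my own choices.

```python
import numpy as np

def tanh_via_sigmoid(x):
    # tanh(x) = 2*sigmoid(2x) - 1
    return 2 / (1 + np.exp(-2 * x)) - 1

xs = np.linspace(-3, 3, 13)
print(np.allclose(tanh_via_sigmoid(xs), np.tanh(xs)))
# Odd symmetry around the origin: tanh(-x) == -tanh(x)
print(np.allclose(tanh_via_sigmoid(-xs), -tanh_via_sigmoid(xs)))
```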

In [10]:
# 5. ReLU (Rectified Linear Unit) function
# It takes the maximum of 0 and the input value, so a neuron is deactivated (outputs 0) when its input is negative.

def Relu_func(x):
    
    return max(0,x)


In [11]:
Relu_func(4)

4

In [12]:
Relu_func(-1)

0
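
The built-in `max` works on one number at a time. For whole arrays, a minimal sketch using `np.maximum` could look like this (`relu_array` is an illustrative name):

```python
import numpy as np

def relu_array(x):
    # Element-wise maximum of 0 and each input value.
    return np.maximum(0, x)

print(relu_array(np.array([-1.0, 0.0, 4.0])))
```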

In [13]:
# 6. Leaky ReLU
# It is similar to ReLU, except that negative inputs are scaled by a small slope (0.01 here) instead of being set to zero.

def Leaky_ReLU_func(x):
    
    if x < 0:
        return 0.01*x
    else:
        return x


In [14]:
Leaky_ReLU_func(3)

3

In [15]:
Leaky_ReLU_func(-3)

-0.03

In [16]:
# 7. Parameterised ReLU
# It handles negative inputs by scaling them with a parameter a, which keeps the gradient from becoming zero.
# The parameter is trainable, so the negative slope is learned along with the other weights.

def parameterised_ReLU(a,x):
    
    if x<0:
        return a*x
    else:
        return x

In [17]:
parameterised_ReLU(3,-2)

-6
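
Leaky ReLU is the same function as Parameterised ReLU with the slope fixed at 0.01, so one array-friendly helper can cover both. This is only a sketch: `prelu_array` is a hypothetical name, and here `a` is passed in by hand rather than learned.

```python
import numpy as np

def prelu_array(x, a=0.01):
    x = np.asarray(x, dtype=float)
    # Negative inputs are scaled by a; non-negative inputs pass through.
    return np.where(x < 0, a * x, x)

print(prelu_array([-3, 3]))        # default slope behaves like Leaky ReLU
print(prelu_array([-2, 1], a=3))   # explicit slope behaves like Parameterised ReLU
```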

In [18]:
# 8. Exponential Linear Unit
# It uses an exponential curve for negative values, which saturates smoothly towards -a.

import numpy as np

def elu_function(x, a):
    if x<0:
        return a*(np.exp(x)-1)
    else:
        return x

In [19]:
elu_function(5, 0.1)

5

In [20]:
elu_function(-5, 0.1)

-0.09932620530009145
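
Here is a sketch of an array version of ELU, assuming NumPy. The name `elu_array` and the default `a=1.0` are my own choices. It also shows that very negative inputs level off at -a.

```python
import numpy as np

def elu_array(x, a=1.0):
    x = np.asarray(x, dtype=float)
    # np.expm1(x) computes exp(x) - 1 accurately for small x.
    # np.minimum keeps the exponent non-positive so large inputs cannot overflow.
    return np.where(x < 0, a * np.expm1(np.minimum(x, 0)), x)

print(elu_array([-50.0, -5.0, 0.0, 5.0], a=0.1))
```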

In [21]:
# 9. Swish function
# Defined as f(x) = x * sigmoid(x) = x/(1 + e^-x); it is smooth and often reported to perform on par with ReLU.
# Its output is bounded below (minimum of about -0.278) and unbounded above.

def swish_function(x):
    return x/(1+np.exp(-x))



In [22]:
round(swish_function(4), 6)

3.928055

In [23]:
round(swish_function(-3.5), 6)

-0.102593
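
Swish is commonly defined as x * sigmoid(x). Here is a rough sketch that evaluates it on a grid to see where its smallest value lies. `swish` is an illustrative name, and the grid range and resolution are arbitrary.

```python
import numpy as np

def swish(x):
    # x * sigmoid(x), written as a single fraction
    return x / (1 + np.exp(-x))

xs = np.linspace(-10, 10, 20001)
ys = swish(xs)
i = np.argmin(ys)
print(xs[i], ys[i])
```

On this grid the minimum is roughly -0.278 near x = -1.28, so negative outputs stay small while positive outputs grow almost linearly.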

In [24]:
# 10. Softmax function
# It generalises the sigmoid to multiple classes: every output lies between 0 and 1, and the outputs sum to 1.
# It is usually used to represent the probability of a data point belonging to each class.

def softmax_function(x):
    z = np.exp(x)
    res = z/z.sum()
    return res

In [25]:
softmax_function([1.8,0.3,4])

array([0.09757865, 0.02177274, 0.88064861])
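
For large inputs `np.exp` can overflow. A common workaround is to subtract the maximum before exponentiating, which leaves the result unchanged. Here is a minimal sketch (`stable_softmax` is a hypothetical name, not something defined earlier in this notebook):

```python
import numpy as np

def stable_softmax(x):
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max())   # shifting by the max does not change the ratios
    return z / z.sum()

print(stable_softmax([1.8, 0.3, 4]))
print(stable_softmax([1000.0, 1001.0, 1002.0]))  # np.exp(1000) alone would overflow
```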

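Conclusions:
* ReLU is a good default choice for hidden layers.
* If dead neurons (units stuck at zero output) become a problem, Leaky ReLU is a common alternative.
* Sigmoid works well in the output layer of binary classifiers; softmax plays the same role for multi-class classifiers.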