## Activation Function

__What is an Activation Function?__

An activation function is the function applied to a node's input (the weighted sum of its inputs plus a bias) to produce that node's output. It is also known as a transfer function.
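Here is a minimal plain-Python sketch of a single node. The helper name `neuron_output` and the example inputs, weights, and bias are hypothetical. The node first forms a weighted sum of its inputs plus a bias, then passes that number through the activation function. The sigmoid, described next, is used here.

```python
import math


def sigmoid(z):
    # Logistic activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    # Pre-activation: weighted sum of the inputs plus the bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The activation (transfer) function maps the pre-activation to the node's output
    return activation(z)


# Hypothetical inputs, weights, and bias chosen only for illustration
out = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, -0.1], 0.1, sigmoid)
print(out)
```

Passing a different function as `activation` changes only the final step. The weighted sum is the same for every activation function in this section.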

### 1. Sigmoid or Logistic Activation Function

The sigmoid function's curve has an S shape.

$$ f(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{e^x+1} = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{x}{2}\right) $$



<img src = "https://miro.medium.com/max/728/1*Xu7B5y9gp0iL5ooBj7LtWw.png">
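As a quick sanity check, the three expressions for $f(x)$ in the equation above can be compared numerically. This sketch uses Python's standard `math` module rather than NumPy. The function names `sigmoid_a`, `sigmoid_b`, and `sigmoid_c` are illustrative only.

```python
import math


def sigmoid_a(x):
    # 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_b(x):
    # e^x / (e^x + 1)
    return math.exp(x) / (math.exp(x) + 1.0)


def sigmoid_c(x):
    # 1/2 + 1/2 * tanh(x / 2)
    return 0.5 + 0.5 * math.tanh(x / 2.0)


for x in [-5.0, -1.0, 0.0, 0.5, 3.0]:
    a, b, c = sigmoid_a(x), sigmoid_b(x), sigmoid_c(x)
    # All three forms agree up to floating-point rounding
    assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    assert math.isclose(a, c, rel_tol=1e-9, abs_tol=1e-12)
    print(f"x={x:5.1f}  sigmoid={a:.6f}")
```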

- The range of the sigmoid function is (0, 1).
- Therefore, it is especially __used for models where we have to predict a probability as the output.__
- The function is differentiable.
- The function is monotonic, but its derivative is not.
- The logistic sigmoid function can cause a neural network to get stuck during training: for large positive or negative inputs the function saturates and its gradient is close to zero.
- The softmax function generalizes the logistic function to more than two classes and is used for multiclass classification:

$$ f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} $$
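Two claims in the list above can be illustrated with a short plain-Python sketch that uses illustrative function names. It relies on the standard identity $f'(x) = f(x)\,(1 - f(x))$ for the logistic function. The derivative peaks at 0.25 for $x = 0$ and falls toward zero on both sides, so it is not monotonic. It is nearly zero for large $|x|$, which is one reason saturated sigmoid units learn slowly. The sketch also checks that softmax over the two inputs $[x, 0]$ gives exactly the sigmoid of $x$ for the first class.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the logistic function: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)


def softmax(xs):
    # Plain softmax: exponentiate, then normalize so the outputs sum to 1
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# The derivative rises to 0.25 at x = 0 and then falls again
for x in [-10, -2, 0, 2, 10]:
    print(x, round(sigmoid_grad(x), 6))

# Two-class softmax with logits [x, 0] reduces to sigmoid(x) for the first class
print(softmax([1.5, 0.0])[0], sigmoid(1.5))
```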

### 2. Tanh or Hyperbolic Tangent Activation Function

tanh is also sigmoidal (S-shaped).

$$ \tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$



<img src = "https://miro.medium.com/max/893/1*f9erByySVjTjohfFdNkJYQ.jpeg">

- The range of the tanh function is from (-1 to 1). 
- The function is differentiable.
- The function is monotonic while its derivative is not monotonic.
- The tanh function is mainly used for classification between two classes.
- Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
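tanh is closely related to the logistic sigmoid. It satisfies $\tanh(x) = 2\sigma(2x) - 1$, where $\sigma$ is the sigmoid from the previous section, and its derivative is $1 - \tanh^2(x)$. The plain-Python sketch below uses illustrative function names. It checks the identity and shows that the derivative peaks at 1 for $x = 0$ and shrinks on both sides, so the derivative is not monotonic.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    # tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, largest (1.0) at x = 0
    t = math.tanh(x)
    return 1.0 - t * t


for x in [-3.0, -0.5, 0.0, 1.0]:
    print(x, math.tanh(x), tanh_via_sigmoid(x), tanh_grad(x))
```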

### 3. ReLU (Rectified Linear Unit) Activation Function

$$ R(z) =  max(z,0) = \begin{cases} 
                        0 & \mbox{for } z < 0\\
                        z & \mbox {for } z \ge 0   
             \end{cases} $$ 
 
                    
<img src = "https://miro.medium.com/max/1050/1*XxxiA0jJvPrHEJHD4z893g.png">


- It is __used in almost all convolutional neural networks.__
- Range: [0, $\infty$)
- Both the function and its derivative are monotonic.
- The drawback is that all negative inputs are mapped to zero, and the gradient there is zero as well. A unit that only receives negative inputs stops learning (the "dying ReLU" problem), which reduces the model's ability to fit the data properly.
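The dying-ReLU issue follows from the derivative, which is 0 for $z < 0$ and 1 for $z > 0$. The minimal plain-Python sketch below applies the chain rule to one weight of a unit whose pre-activation is negative. The derivative at exactly $z = 0$ is undefined, so returning 0 there is an assumed convention. The weight, input, and bias values are hypothetical.

```python
def relu(z):
    return max(z, 0.0)


def relu_grad(z):
    # 0 for z < 0, 1 for z > 0; at z == 0 we return 0 by convention (an assumption)
    return 1.0 if z > 0 else 0.0


zs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(z) for z in zs])
print([relu_grad(z) for z in zs])

# Chain rule for one weight: dLoss/dw = dLoss/da * relu_grad(z) * x
x, w, b = 1.0, -2.0, 0.5   # hypothetical values giving a negative pre-activation
z = w * x + b              # z = -1.5
upstream = 1.0             # assume dLoss/da = 1 for illustration
print(upstream * relu_grad(z) * x)  # 0.0, so w receives no update
```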

### 4. Leaky ReLU

It is an attempt to solve the dying ReLU problem.

$$ f(x) =  max(ax,x) = \begin{cases} 
                        ax & \mbox{for } x < 0\\
                        x & \mbox {for } x \ge 0   
             \end{cases} $$ 

The leak extends the range of the ReLU function to negative values. Usually, the value of $a$ is about 0.01.
When $a$ is instead sampled at random from a range during training, the function is called __Randomized ReLU__.

<img src = "https://d1zx6djv3kb1v7.cloudfront.net/wp-content/media/2019/09/Deep-learning-25-i2tutorials.png">

- Range: ($-\infty$, $\infty$)
- Both Leaky and Randomized ReLU functions are monotonic.
- Their derivatives are also monotonic.
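The sketch below implements Leaky ReLU and its derivative in plain Python. It assumes the default $a = 0.01$ mentioned above, and the function names are illustrative. Unlike ReLU, a negative input still passes back a gradient of $a$ instead of 0, which is how the leak keeps units from dying. The $\max(ax, x)$ form of the equation matches the piecewise form only when $0 < a < 1$.

```python
def leaky_relu(z, a=0.01):
    # x for x >= 0, a * x for x < 0 (matches the piecewise definition)
    return z if z >= 0 else a * z


def leaky_relu_grad(z, a=0.01):
    # Derivative: a for z < 0, 1 for z > 0 (value at exactly 0 chosen as 1 here)
    return 1.0 if z >= 0 else a


zs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([leaky_relu(z) for z in zs])
print([leaky_relu_grad(z) for z in zs])  # negative inputs keep a gradient of a
```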


### Why Are Derivatives Used?

When fitting a model (updating the curve), the slope tells us in which direction and by how much to change the parameters. That is why differentiation is used in almost every part of machine learning and deep learning.

<img src= "https://miro.medium.com/max/1050/1*p_hyqAtyI8pbt2kEl6siOQ.png">


<img src = "https://miro.medium.com/max/1050/1*n1HFBpwv21FCAzGjmWt1sg.png">
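A minimal sketch of how the slope is used is gradient descent on a hypothetical one-parameter loss $(w - 3)^2$, whose derivative is $2(w - 3)$. The sign of the derivative says which direction to move $w$. Its magnitude, scaled by an assumed learning rate of 0.1, says how far.

```python
def loss(w):
    # Hypothetical 1-D loss with its minimum at w = 3
    return (w - 3.0) ** 2


def loss_grad(w):
    # Derivative of (w - 3)^2
    return 2.0 * (w - 3.0)


w = 0.0
learning_rate = 0.1  # assumed step-size factor
for step in range(50):
    # Move against the slope: positive slope -> decrease w, negative slope -> increase w
    w -= learning_rate * loss_grad(w)
print(w)
```

Each step multiplies the distance to the minimum by $1 - 2 \times 0.1 = 0.8$, so after 50 steps $w$ is within about $10^{-4}$ of 3.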

```python
import numpy as np

v = [1, -3, 3, 0, -0.5]
```

__sigmoid or logistic__ activation function

```python
# calculate the sigmoid of a vector: 1 / (1 + e^(-x))
def np_sigmoid(x):
    x = np.asarray(x)  # accept plain Python lists as well as arrays
    s = 1 / (1 + np.exp(-x))
    return s

print(np_sigmoid(v))
```


```
[0.73105858 0.04742587 0.95257413 0.5        0.37754067]
```


__tanh or hyperbolic tangent__ activation function

```python
def tanh(x):
    s = np.tanh(x)
    return s

print(tanh(v))
```

```
[ 0.76159416 -0.99505475  0.99505475  0.         -0.46211716]
```


__ReLU__ activation function

```python
def relu(x):
    s = np.maximum(x, 0)
    return s

print(relu(v))
```

```
[1. 0. 3. 0. 0.]
```


__Leaky ReLU__ activation function

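```python
def leaky_ReLU(x):
    # keep positive inputs unchanged, scale the others by a = 0.01
    s = np.where(x > 0, x, x * 0.01)
    return s

vv = np.array(v)  # np.where and x * 0.01 need an array, not a list
print(leaky_ReLU(vv))
```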

```
[ 1.    -0.03   3.     0.    -0.005]
```


__softmax__ activation function

```python
# calculate the softmax of a vector
def softmax(vector):
    e = np.exp(vector)
    return e / e.sum()

# define data
data = [1, 3, 2]
# convert list of numbers to a list of probabilities
result = softmax(data)
# report the probabilities
print(result)
# report the sum of the probabilities
print(sum(result))
```

```
[0.09003057 0.66524096 0.24472847]
1.0
```
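The `softmax` above exponentiates its inputs directly, so large inputs overflow. With NumPy, `np.exp(1000.0)` evaluates to `inf`, and the division then produces `nan`. A common remedy, sketched here in plain Python with an illustrative function name, is to subtract the maximum input first. The shift cancels in the ratio, so the probabilities are unchanged.

```python
import math


def stable_softmax(xs):
    # Subtracting the max cancels in the ratio e^(x - m) / sum(e^(x_j - m)),
    # but keeps every exponent <= 0 so math.exp cannot overflow
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(stable_softmax([1, 3, 2]))
print(stable_softmax([1000, 1002, 1001]))
```

Both calls print the same probabilities, because the second input is the first shifted by the constant 999.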
