## Activation Function 
- An activation function in a neural network determines the output of a neuron given a set of inputs.
- It introduces non-linearity to the network, enabling it to learn complex relationships and patterns in data.
- A neural network without an activation function is essentially just a linear regression model, because a stack of linear layers collapses into a single linear transformation.
- We therefore apply a non-linear transformation to the inputs of each neuron, and the activation function is what provides it.
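
The point about linearity can be checked numerically. The sketch below stacks two linear layers with no activation in between and compares the result with a single linear layer. The weights, biases, and shapes (`W1`, `b1`, `W2`, `b2`) are arbitrary values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with arbitrary (illustrative) weights and biases
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers with no activation in between
two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

However many linear layers are stacked, the result can always be rewritten this way, which is why a non-linear activation is needed between them.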


## 1. Binary Step Function
- The first thing that comes to mind for an activation function is a threshold-based classifier, i.e. whether or not the neuron is activated depends on whether the value from the linear transformation reaches a threshold.
>
f(x) = 1, x>=0
     = 0, x<0

In [14]:
def binary_step(x):
    if x < 0:
        return 0 
    else:
        return 1 
    

In [15]:
binary_step(5)

1

In [16]:
binary_step(-1)

0
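
The function above handles one scalar at a time. A possible element-wise variant, assuming NumPy is available, is sketched below. The name `binary_step_vec` is hypothetical and not part of the original notebook.

```python
import numpy as np

def binary_step_vec(x):
    # 1 where x >= 0, 0 elsewhere, applied element-wise
    return np.where(np.asarray(x) >= 0, 1, 0)

print(binary_step_vec([-2.0, 0.0, 3.5]))  # [0 1 1]
```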

## 2. Sigmoid function 
- The sigmoid activation squashes the input into the range between 0 and 1. 
- It is often used in the output layer of binary classification problems.
- However, it suffers from the vanishing gradient problem: for inputs of large magnitude its gradient is close to zero, which limits its effectiveness in deep networks.
>
f(x) = 1/(1+e^-x)

In [24]:
import numpy as np 
def sigmoid_func(x):
    z = (1 / (1 + np.exp(-x)))
    
    return z


In [25]:
sigmoid_func(212)

1.0

In [26]:
sigmoid_func(-23)

1.0261879630648827e-10

In [27]:
sigmoid_func(-22)

2.7894680920908113e-10
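
The vanishing gradient issue mentioned above can be seen from the derivative of the sigmoid, which is sigmoid(x) * (1 - sigmoid(x)). The sketch below is a minimal illustration. The helper names `sigmoid` and `sigmoid_grad` are introduced here and are not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))   # 0.25, the largest value the gradient can take
print(sigmoid_grad(10.0))  # about 4.5e-05
```

Because the gradient never exceeds 0.25, multiplying many such factors together during backpropagation shrinks the signal quickly.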

## 3. Tanh 
- tanh is similar to the sigmoid function.
- The main difference is that it is symmetric around the origin.
- Its output ranges from -1 to 1.
>
tanh(x)=2sigmoid(2x)-1
>
tanh(x) = 2/(1+e^(-2x)) -1

In [28]:
def tanh_function(x):
    z = (2 / (1 +np.exp(-2*x))) -1
    
    return z 

In [29]:
tanh_function(0.4)

0.379948962255225

In [30]:
tanh_function(-1)

-0.7615941559557649

>> Usually tanh is preferred over the sigmoid function because it is zero-centered, so the gradients are not restricted to move in a certain direction.
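
The identity tanh(x) = 2*sigmoid(2x) - 1 used above can be checked against NumPy's built-in `np.tanh`. This is a small sanity check. The helper `sigmoid` and the grid `xs` are choices made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

xs = np.linspace(-5, 5, 11)

# tanh written in terms of the sigmoid
via_sigmoid = 2 * sigmoid(2 * xs) - 1

print(np.allclose(via_sigmoid, np.tanh(xs)))  # True
```

In practice `np.tanh` can be called directly instead of hand-writing the formula.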

## 4. ReLU 
- ReLU stands for Rectified Linear Unit.
- It is one of the most widely used activation functions today.
- It outputs the input if it is positive and zero otherwise.
- It doesn't suffer from vanishing gradient issues for positive inputs, making it a good choice for training deep networks.
>
f(x) = max(0, x)

In [31]:
def relu_function(x):
    return max(0, x)


In [32]:
relu_function(24)

24

> If you look at the negative side of the graph, you will notice that the gradient is zero.
> For this reason, during backpropagation the weights and biases of some neurons are not updated.
> This can create dead neurons that never get activated. The Leaky ReLU function addresses this.
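
The zero gradient on the negative side can be made concrete with a small element-wise sketch. The names `relu` and `relu_grad` are introduced here for illustration, and treating the gradient at exactly x = 0 as 0 is an assumption (it is undefined there).

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 for negative inputs.
    # At exactly x = 0 it is undefined; using 0 there is an assumption.
    return (np.asarray(x) > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))       # [0. 0. 0. 2.]
print(relu_grad(x))  # [0. 0. 0. 1.]
```

A neuron whose inputs are always negative receives a gradient of zero on every step, so its weights stop changing.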

## 5. Leaky ReLU
- Leaky ReLU is a variation of ReLU that allows a small, non-zero gradient for negative inputs, mitigating the dead neuron problem.
>
f(x)= 0.01x, x<0
    =   x, x>=0

In [33]:
def leaky_relu(x):
    if x < 0:
        return 0.01 * x 
    else:
        return x 
    

In [34]:
leaky_relu(343)

343

In [35]:
leaky_relu(-2424)

-24.240000000000002
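
A possible element-wise version with a configurable slope is sketched below, together with its gradient. The names `leaky_relu_vec` and `leaky_relu_grad` and the parameter `alpha` are hypothetical. The default of 0.01 simply mirrors the formula above.

```python
import numpy as np

def leaky_relu_vec(x, alpha=0.01):
    x = np.asarray(x, dtype=float)
    # alpha * x for negative inputs, x otherwise
    return np.where(x < 0, alpha * x, x)

def leaky_relu_grad(x, alpha=0.01):
    x = np.asarray(x, dtype=float)
    # The gradient is alpha (not 0) on the negative side, 1 elsewhere
    return np.where(x < 0, alpha, 1.0)

print(leaky_relu_vec([-100.0, 2.0]))
print(leaky_relu_grad([-5.0, 3.0]))
```

Since the gradient for negative inputs is `alpha` rather than zero, the neuron can still receive updates.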

## 6. Swish 
- Swish is a lesser-known activation function proposed by researchers at Google.
- It is reported to be about as computationally efficient as ReLU and to perform better than ReLU on deeper models.
- Its output is unbounded above but bounded below: it ranges from roughly -0.278 to infinity.
>
f(x) = x*sigmoid(x)
f(x) = x/(1+e^-x)

In [36]:
def swish_function(x):
    # swish(x) = x * sigmoid(x) = x / (1 + e^-x)
    z = x / (1 + np.exp(-x))
    
    return z

In [37]:
swish_function(234)

234.0

In [38]:
print(f"{swish_function(-234):.4e}")

-5.5502e-100
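
As a quick numerical check of the lower end of swish's output, the sketch below evaluates x*sigmoid(x) on a grid and locates its minimum. The helper name `swish` and the grid bounds are choices made for this illustration.

```python
import numpy as np

def swish(x):
    # x * sigmoid(x), written as x / (1 + e^-x)
    return x / (1 + np.exp(-x))

xs = np.linspace(-10, 10, 20001)  # grid with step 0.001
ys = swish(xs)

i = np.argmin(ys)
# The minimum sits near x = -1.28 with a value of roughly -0.278
print(round(xs[i], 2), round(ys[i], 3))
```

For large positive inputs swish behaves like the identity, and for large negative inputs it approaches zero from below.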


## 7. Softmax 
- Softmax is often described as a combination of multiple sigmoids.
- It is used for multiclass classification problems, where it turns a vector of scores into probabilities that sum to 1.
>
softmax(x_i) = exp(x_i) / sum(exp(x_j) for j in all classes)


In [41]:
def softmax_function(x):
    z = np.exp(x)
    z_ = z/z.sum()
    
    return z_

In [42]:
softmax_function([0.8, 1.2, 3.1])

array([0.08021815, 0.11967141, 0.80011044])
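
The version above calls `np.exp` directly on the inputs, which can overflow for large scores. A common variant subtracts the maximum before exponentiating. It is sketched here under the hypothetical name `stable_softmax` and handles 1-D input only.

```python
import numpy as np

def stable_softmax(x):
    x = np.asarray(x, dtype=float)
    # Subtracting the max leaves the result unchanged
    # but keeps exp() from overflowing
    z = np.exp(x - x.max())
    return z / z.sum()

print(stable_softmax([0.8, 1.2, 3.1]))   # same values as the output above
print(stable_softmax([1000.0, 1001.0]))  # finite, whereas exp(1000) alone overflows
```

The shift works because multiplying numerator and denominator by the same constant exp(-max) cancels out.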