# Activation Functions in Neural Networks


- In neural networks, non-linear activation functions such as tanh, sigmoid, and ReLU are used to introduce non-linearity into the model, which allows the network to learn complex, non-linear relationships between the input and output variables.
- Without non-linear activation functions, a neural network would simply be a linear model.
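
The second point can be checked with a small sketch. The scalar weights below are made-up values chosen only for illustration; the idea is that two stacked layers with no activation collapse into one linear layer, while placing a ReLU between them breaks that equivalence.

```python
# Two stacked "layers" with no activation function.
# Scalar weights are hypothetical values, used only to keep the arithmetic visible.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    hidden = w1 * x + b1
    return w2 * hidden + b2

# The same mapping collapsed into a single linear layer.
w_combined = w2 * w1
b_combined = w2 * b1 + b2

def one_linear_layer(x):
    return w_combined * x + b_combined

def two_layers_with_relu(x):
    hidden = max(0.0, w1 * x + b1)  # ReLU between the two layers
    return w2 * hidden + b2

for x in (-2.0, 0.0, 3.0):
    print(x, two_linear_layers(x), one_linear_layer(x), two_layers_with_relu(x))
```

For inputs where the hidden value is negative, the ReLU version no longer matches the collapsed linear layer, which is the extra expressiveness the activation provides.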


![image.png](attachment:image.png)

![image.png](attachment:image.png)

## Activation functions for Classification in Neural Network


*The net input to a neuron (the weighted sum of its inputs plus a bias) is linear in nature, but linear functions alone cannot solve most classification or regression problems, so the activation function adds non-linearity to the model.*

1. **<font color=blue>Sigmoid</font>** is a non-linear function that maps any input value to a value between 0 and 1. It is non-linear because its rate of change is not constant: the curve is steepest near 0 and flattens as the input becomes large in either direction.
2. **<font color=blue>Step function</font>** is a piecewise constant function that jumps from one constant value to another at specific points. It is not linear, because a linear function has a constant rate of change (slope).
3. **<font color=blue>ReLU</font>** is a non-linear function that returns the input if it is positive and returns zero otherwise. It is piecewise linear, with a gradient of 0 for negative inputs and a constant gradient of 1 for positive inputs, and the kink at zero is what makes it non-linear overall. ReLU is used in the initial layers to preserve information: sigmoid and tanh squash values toward their limits (0 and 1 for sigmoid, -1 and 1 for tanh), so information can start clustering at those extremes in the initial layers themselves, whereas ReLU passes positive values through unchanged.
    
4. **<font color=blue>Tanh</font>** is a non-linear function that maps any input value to a value between -1 and 1. Like sigmoid, it is non-linear because its rate of change is not constant.
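
The four functions above can be sketched in plain Python as follows. This is a minimal illustration rather than a library implementation: the `threshold` parameter of `step` and the sample inputs are assumptions made here, and `sigmoid` is not hardened against overflow for very large negative inputs.

```python
import math

def sigmoid(x):
    """Map x to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x, threshold=0.0):
    """Return 1.0 at or above the threshold, else 0.0 (threshold is an assumption)."""
    return 1.0 if x >= threshold else 0.0

def relu(x):
    """Return x if it is positive, otherwise 0.0."""
    return max(0.0, x)

def tanh(x):
    """Map x to a value in (-1, 1)."""
    return math.tanh(x)

# Evaluate each function on a few sample inputs.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  step={step(x):.0f}  "
          f"relu={relu(x):.1f}  tanh={tanh(x):+.3f}")
```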

## Sigmoid vs Softmax 

1. The output of softmax is a probability distribution over all possible classes, whereas the sigmoid function outputs a single real number between 0 and 1, which can be interpreted as the probability of the positive class. Softmax ensures that the probabilities of all classes sum to 1; sigmoid outputs do not have this property, since each one is computed independently. This means that softmax outputs can be read directly as class probabilities, while a sigmoid output needs to be thresholded to obtain a binary prediction (by default, a threshold of 0.5 separates class 0 from class 1).


2. In summary, sigmoid is used in binary classification problems, where it outputs a single probability value, while softmax is used in multi-class classification problems, where it outputs a probability distribution over all classes that sums to 1.


3. Sigmoid pushes its outputs toward 0 and 1, which makes it less flexible as a general-purpose activation but well suited to binary classification, where the desired output is either 0 or 1.
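
A short sketch of the difference, assuming made-up logits (raw scores) for illustration. It shows that softmax outputs sum to 1 across classes, while sigmoid produces one probability that is thresholded at 0.5.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Multi-class case: hypothetical raw scores for three classes.
logits = [2.0, 1.0, 0.1]
class_probs = softmax(logits)
predicted_class = class_probs.index(max(class_probs))
print(class_probs, sum(class_probs), predicted_class)

# Binary case: one hypothetical logit, thresholded at 0.5.
positive_prob = sigmoid(0.8)
predicted_label = 1 if positive_prob >= 0.5 else 0
print(positive_prob, predicted_label)

# Applying sigmoid to each logit independently does not give a distribution.
independent = [sigmoid(z) for z in logits]
print(sum(independent))
```

The last line prints a sum greater than 1, which is why per-unit sigmoid outputs cannot be read as a probability distribution over mutually exclusive classes.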



## Activation functions for Regression in Neural Network

- One of the most commonly used activation functions for this purpose is the **<font color=blue>Linear activation function</font>**. The linear activation function is simply the identity function: it returns its input unchanged, without any transformation.
- The linear activation function is useful for regression tasks because it can output any real-valued number, which is necessary for predicting continuous variables. When it is used in the output layer and the network has no non-linear hidden layers, the network reduces to ordinary linear regression; with non-linear hidden layers, the same output layer lets the network fit non-linear regression functions.
- Another commonly used activation function for regression tasks is the **<font color=blue>ReLU (Rectified Linear Unit) function</font>**.
- The ReLU function can also be used in the output layer for regression tasks, but because it cannot output negative values, it only fits targets that are non-negative. As with a linear output, the network is typically trained with a regression loss such as mean squared error or mean absolute error.
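
The sketch below compares the two output activations on the same output-layer values and scores them with mean squared error and mean absolute error. The pre-activation values and targets are hypothetical numbers chosen for illustration, and the helper names are not taken from any particular library.

```python
def linear(z):
    """Identity activation: return the input unchanged."""
    return z

def relu(z):
    """Clip negative values to zero."""
    return max(0.0, z)

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical output-layer values and regression targets.
pre_activations = [-1.5, 0.5, 3.0]
targets = [-1.0, 1.0, 2.0]

linear_preds = [linear(z) for z in pre_activations]
relu_preds = [relu(z) for z in pre_activations]

print("linear:", linear_preds,
      mean_squared_error(targets, linear_preds),
      mean_absolute_error(targets, linear_preds))
print("relu:  ", relu_preds,
      mean_squared_error(targets, relu_preds),
      mean_absolute_error(targets, relu_preds))
```

Because the first target is negative, the ReLU output cannot reach it and incurs a larger error there, which illustrates why a linear output is the safer default when targets can take any real value.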