In [1]:
import tensorflow



In [2]:
import numpy as np

# 1. Sigmoid:
Pros:
The sigmoid activation function maps any real input to the open interval (0, 1), so its output can be read as a probability in binary classification problems.
The sigmoid function is differentiable, making it suitable for use in backpropagation.


Cons:
The sigmoid function saturates for inputs of large magnitude (positive or negative), where its gradient approaches zero; this contributes to the vanishing gradient problem.
The sigmoid function outputs are not zero-centered, which can lead to slow convergence when using optimization algorithms such as gradient descent.

In [3]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

In [4]:
sigmoid(-2)

0.11920292202211755
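
The saturation described above can be checked numerically. The following is a minimal sketch that uses the standard identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); the helper name `sigmoid_grad` is introduced here for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x).
    s = sigmoid(x)
    return s * (1 - s)

# The gradient peaks at x = 0 and shrinks as |x| grows.
xs = np.array([0.0, 2.0, 5.0, 10.0])
grads = sigmoid_grad(xs)
print(grads)
```

The gradient is 0.25 at x = 0 and falls below 1e-4 by x = 10, which is why stacked sigmoid layers can pass back very small gradients.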

# 2. ReLU:
Pros:
ReLU is computationally efficient as it simply involves thresholding the input.
ReLU does not saturate for positive inputs (its gradient there is exactly 1), which mitigates the vanishing gradient problem encountered with the sigmoid activation function.


Cons:
ReLU can produce dead neurons: a neuron whose input is negative for every training example outputs zero and receives zero gradient, so it stops learning.
ReLU is not differentiable at 0. In practice, implementations pick a fixed value for the gradient at that point (commonly 0), so this rarely causes problems for gradient descent.

In [5]:
def relu(x):
    return np.maximum(0, x)

In [6]:
relu(3455)

3455
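
A minimal sketch of the ReLU gradient helps illustrate both points above. The helper `relu_grad` is hypothetical, and using 0 as the gradient at x = 0 is an assumption (a common convention, not the only possible one).

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise.
    # Assumption: the value at exactly x == 0 is taken to be 0.
    return (np.asarray(x) > 0).astype(float)

xs = np.array([-3.0, 0.0, 2.0])
outputs = relu(xs)
grads = relu_grad(xs)
print(outputs)
print(grads)
```

A neuron that only ever sees negative inputs gets a gradient of 0 on every example, which is the dead neuron case.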

# 3. Tanh:
Pros:
The tanh function maps inputs to the interval (-1, 1) and is zero-centered (tanh(0) = 0 and the function is odd), which often helps optimization compared with the sigmoid function.
The tanh function is differentiable, making it suitable for use in backpropagation.


Cons:
The tanh function saturates for inputs of large magnitude (positive or negative), where its gradient approaches zero; this contributes to the vanishing gradient problem.

In [7]:
def tanh(x):
    return np.tanh(x)

In [8]:
tanh(1)

0.7615941559557649
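
The relationship between tanh and the sigmoid can be sketched as follows. It relies on the identity tanh(x) = 2 * sigmoid(2x) - 1 and the derivative 1 - tanh(x)^2; the variable names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

xs = np.linspace(-3, 3, 7)

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
tanh_vals = np.tanh(xs)
rescaled = 2 * sigmoid(2 * xs) - 1

# Derivative of tanh: 1 - tanh(x)^2, largest at x = 0.
tanh_grad = 1 - tanh_vals ** 2

print(tanh_vals)
print(tanh_grad)
```

Over inputs that are symmetric around zero, the tanh outputs average to zero, while the sigmoid outputs average to 0.5.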

# 4. Softmax:
Pros:
Softmax activation is ideal for multiclass classification problems as it maps its inputs to a probability distribution over multiple classes.
Softmax is differentiable, making it suitable for use in backpropagation.


Cons:
Softmax is more expensive than element-wise activations because it exponentiates every input and normalizes over all classes, which becomes costly when the number of classes is large.
Softmax is sensitive to the scale of the inputs: one large input can dominate the output distribution, and a naive implementation can overflow in np.exp for large values.

In [9]:
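def softmax(x):
    # Shift by the max along the class axis for numerical stability;
    # softmax is unchanged by adding a constant to every input.
    x = np.asarray(x, dtype=float)
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)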
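
The scale sensitivity mentioned in the cons can be illustrated with a small sketch. It assumes a max-subtraction variant, named `stable_softmax` here for illustration, and applies it to a batch of two rows that differ only by a constant offset.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    # Subtracting the max does not change the result, because softmax is
    # invariant to adding a constant to all inputs, but it keeps np.exp
    # from overflowing.
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

logits = np.array([
    [1.0, 2.0, 3.0],
    [1000.0, 1001.0, 1002.0],  # would overflow np.exp without the shift
])
probs = stable_softmax(logits)
print(probs)
print(probs.sum(axis=1))
```

Both rows yield the same distribution, and each row sums to 1.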

# 5. Leaky ReLU:
Pros:
Leaky ReLU can avoid the dying neurons problem encountered with the ReLU activation function.
Leaky ReLU has a nonzero gradient for negative inputs, so gradients keep flowing during backpropagation (like ReLU, it is not differentiable at exactly 0).


Cons:
Leaky ReLU requires the choice of a hyperparameter, which determines the slope of the function for negative inputs.
Leaky ReLU is slightly more expensive than ReLU because negative inputs require an extra multiplication, although the difference is usually negligible.

In [10]:
def leaky_relu(x, alpha=0.01):
    # Returns x for x > 0 and alpha * x otherwise; valid for 0 <= alpha <= 1.
    return np.maximum(alpha * x, x)
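
To see why Leaky ReLU avoids dead neurons, it helps to look at its gradient. This is a minimal sketch; `leaky_relu_grad` is a hypothetical helper, and treating x = 0 as part of the negative branch is an assumption.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # np.where form: x for positive inputs, alpha * x otherwise.
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is 1 for positive inputs and alpha otherwise
    # (assumption: x == 0 is treated as the negative branch).
    return np.where(x > 0, 1.0, alpha)

xs = np.array([-100.0, -1.0, 0.5])
outputs = leaky_relu(xs)
grads = leaky_relu_grad(xs)
print(outputs)
print(grads)
```

Even for strongly negative inputs the gradient stays at `alpha` rather than dropping to 0, so the neuron can still be updated.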

# 6. Swish:
Pros:
Swish has been shown to outperform ReLU on some tasks.
Swish is differentiable, making it suitable for use in backpropagation.


Cons:
Swish requires evaluating a sigmoid (an exponential) and a multiplication for every input, making it computationally more expensive than ReLU.
In its general form, x * sigmoid(beta * x), Swish has a hyperparameter beta that must be chosen or learned; the implementation here fixes beta = 1.

In [11]:
def swish(x):
    return x * sigmoid(x)
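
The effect of the Swish hyperparameter can be sketched with a parameterized variant. The function name `swish_beta` and the parameter name `beta` are chosen here for illustration; the formula x * sigmoid(beta * x) is the usual general form of Swish.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def swish_beta(x, beta=1.0):
    # General Swish: x * sigmoid(beta * x).
    # beta = 1 matches the swish(x) defined above.
    return x * sigmoid(beta * x)

xs = np.array([-2.0, 0.0, 2.0])
linear_like = swish_beta(xs, beta=0.0)   # sigmoid(0) = 0.5, so this is x / 2
default = swish_beta(xs, beta=1.0)
relu_like = swish_beta(xs, beta=20.0)    # approaches ReLU as beta grows
print(linear_like)
print(default)
print(relu_like)
```

With beta = 0 the function is the linear map x / 2, and as beta grows it approaches ReLU, so beta interpolates between the two.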