#### Sigmoid Function

#### ReLU and Leaky ReLU

#### Softmax Function

Given a real-valued vector $\mathbf{z} = [z_1, z_2, \dots, z_K]$, the Softmax function converts it into a probability distribution $\mathbf{p} = [p_1, p_2, \dots, p_K]$, where each $p_i$ is the probability assigned to class $i$.

The Softmax function is defined as:

$$
p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, 2, \dots, K
$$

Where:

- $K$ is the number of classes.
- $z_i$ is the $i$-th component of the input vector.
- $p_i$ is the probability corresponding to $z_i$.
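
As a worked instance of the definition, take the input $\mathbf{z} = [1, 2, 3]$ (so $K = 3$). The values below are rounded to four decimal places:

$$
e^{1} \approx 2.7183, \quad e^{2} \approx 7.3891, \quad e^{3} \approx 20.0855, \quad \sum_{j=1}^{3} e^{z_j} \approx 30.1929
$$

$$
p_1 \approx \frac{2.7183}{30.1929} \approx 0.0900, \quad p_2 \approx \frac{7.3891}{30.1929} \approx 0.2447, \quad p_3 \approx \frac{20.0855}{30.1929} \approx 0.6652
$$

The three probabilities sum to 1, up to rounding.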

#### Explanation and Properties

1. **Exponential Function**: The Softmax function first applies an exponential transformation to each input $z_i$, which maps every input, including negative ones, to a positive number. Because the exponential function is strictly increasing, the ordering of the inputs is preserved. Because it grows faster than linearly, gaps are widened: a difference of $d$ between two inputs becomes a ratio of $e^{d}$ between their exponentiated values.

2. **Normalization**: All exponentiated values are summed to create a normalization factor, and each exponentiated value is divided by this sum. This step ensures that the outputs sum to 1.

3. **Output Range**: The output of the Softmax function is a probability vector in which each element lies strictly between 0 and 1. This makes Softmax well-suited for representing class probabilities in multi-class classification problems.

4. **Class Competition**: A critical property of the Softmax function is its ability to amplify the differences between input values. If one input $z_i$ is significantly larger than the others, then $p_i$ will approach 1, while the other $p_j$ values will approach 0. This property makes Softmax effective for selecting the most likely class in multi-class classification problems.
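
The properties listed above can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper name `softmax_sketch` and the two example score lists are assumptions chosen for illustration.

```python
import math

def softmax_sketch(scores):
    """Plain softmax (hypothetical helper), used only to illustrate the properties."""
    exps = [math.exp(s) for s in scores]   # exponential step: all values positive
    total = sum(exps)                      # normalization factor
    return [e / total for e in exps]

close_scores = softmax_sketch([1.0, 2.0, 3.0])
spread_scores = softmax_sketch([1.0, 2.0, 10.0])

# Normalization: each output vector sums to 1 (up to floating-point rounding).
print(sum(close_scores), sum(spread_scores))

# Output range: every entry lies strictly between 0 and 1.
print(all(0.0 < p < 1.0 for p in close_scores + spread_scores))

# Class competition: raising the largest score from 3 to 10 pushes its
# probability toward 1 and the other probabilities toward 0.
print(close_scores[-1], spread_scores[-1])
```

In the second list the largest score dominates, so its probability is very close to 1 while the other two are close to 0.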


In [3]:
import math

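def softmax(scores):
    """Return the softmax probabilities for a list of scores."""
    # Exponentiate each score once and reuse the results
    exp_scores = [math.exp(score) for score in scores]
    # The sum of the exponentials is the normalization factor
    sum_exp_scores = sum(exp_scores)
    return [exp_score / sum_exp_scores for exp_score in exp_scores]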

scores = [1, 2, 3]
# output: [0.0900, 0.2447, 0.6652]
print(softmax(scores))

[0.09003057317038046, 0.24472847105479767, 0.6652409557748219]
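
The function above exponentiates the raw scores, so a large input such as 1000 makes `math.exp` raise an `OverflowError`. A common remedy is to subtract the largest score from every score before exponentiating: the factor this removes cancels between numerator and denominator, so the result is mathematically unchanged. The sketch below assumes a hypothetical helper name, `stable_softmax`, and illustrative inputs.

```python
import math

def stable_softmax(scores):
    """Softmax with the maximum subtracted first (hypothetical helper name)."""
    shift = max(scores)
    # After the shift every exponent is <= 0, so math.exp cannot overflow.
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

small = stable_softmax([1, 2, 3])
large = stable_softmax([1000, 1001, 1002])

# Both inputs have the same pairwise differences, so the outputs agree.
print(small)
print(large)
```

Softmax depends only on the differences between scores, which is why shifting all of them by the same constant does not change the probabilities.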


#### Log Softmax

In machine learning and statistics, the softmax function is a generalization of the logistic function that converts a vector of scores into probabilities. The log-softmax function is the logarithm of the softmax function. It is often computed directly, rather than by taking the logarithm of a softmax output, because the direct form is more numerically stable: it avoids overflow when scores are large and avoids taking the logarithm of a probability that has underflowed to 0.

Given a real-valued vector $\mathbf{z} = [z_1, z_2, \dots, z_K]$, the Log Softmax is defined as:

$$
\text{LogSoftmax}(z_i) = \log \left( \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \right)
$$

This can be further simplified as:

$$
\text{LogSoftmax}(z_i) = z_i - \log \left( \sum_{j=1}^{K} e^{z_j} \right)
$$
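
The simplification follows from two logarithm rules, $\log(a/b) = \log a - \log b$ and $\log e^{x} = x$:

$$
\log \left( \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \right) = \log e^{z_i} - \log \left( \sum_{j=1}^{K} e^{z_j} \right) = z_i - \log \left( \sum_{j=1}^{K} e^{z_j} \right)
$$

The same expression is unchanged when a constant $c$ is subtracted from every input, because the factor $e^{-c}$ comes out of the sum and cancels:

$$
(z_i - c) - \log \left( \sum_{j=1}^{K} e^{z_j - c} \right) = z_i - c - \log \left( e^{-c} \sum_{j=1}^{K} e^{z_j} \right) = z_i - \log \left( \sum_{j=1}^{K} e^{z_j} \right)
$$

Choosing $c = \max_j z_j$ makes every exponent at most 0 and the sum at least 1, which is the reason for subtracting the maximum in an implementation.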

In [None]:
import numpy as np

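def log_softmax(scores: list) -> np.ndarray:
    # Convert to a float array so the arithmetic below is element-wise
    scores = np.asarray(scores, dtype=float)
    # Subtract the maximum value for numerical stability
    shifted = scores - np.max(scores)
    # log-softmax: z_i - log(sum_j exp(z_j)), computed on the shifted scores
    return shifted - np.log(np.sum(np.exp(shifted)))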
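
To see why the log-probabilities are computed directly, consider two scores that are far apart. The sketch below uses only the standard library; the helper name `log_softmax_sketch` and the input `[0.0, 1000.0]` are assumptions chosen for illustration.

```python
import math

def log_softmax_sketch(scores):
    """Log-softmax with the maximum subtracted first (hypothetical helper name)."""
    shift = max(scores)
    shifted = [s - shift for s in scores]
    # Every shifted score is <= 0, so the exponentials cannot overflow,
    # and the largest one contributes exp(0) = 1, so the sum is at least 1.
    log_norm = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_norm for s in shifted]

scores = [0.0, 1000.0]
log_probs = log_softmax_sketch(scores)
print(log_probs)

# Converting back to probabilities loses information: the first probability
# underflows to 0.0, so math.log could not have been applied to it.
probs = [math.exp(lp) for lp in log_probs]
print(probs)
```

The log-probability of the first class stays finite (about -1000), whereas the corresponding probability is too small to represent as a float.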