## Sigmoid

Reference: [Jurafsky SLP (2025)](https://web.stanford.edu/~jurafsky/slp3/) p. 79

Also called the logistic function, $\mathrm{sigmoid}$ squashes any real-valued input into the range $(0,1)$. It is named for its S-shaped curve.

$$\sigma(z) = \frac{1}{1 + \exp(-z)}$$
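The formula translates directly into Python. This is a minimal sketch rather than code from the reference: the name `sigmoid` is my own, and the branch on the sign of `z` is a common trick (an assumption here, not part of the definition) that rewrites the formula as $e^{z} / (1 + e^{z})$ for negative inputs so that `math.exp` never overflows.

```python
import math

def sigmoid(z: float) -> float:
    # Logistic function sigma(z) = 1 / (1 + exp(-z)).
    if z >= 0:
        # exp(-z) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(-z) could overflow a float, so use the
    # algebraically equivalent form exp(z) / (1 + exp(z)).
    e = math.exp(z)
    return e / (1.0 + e)

print(sigmoid(0.0))  # 0.5
print([round(sigmoid(z), 4) for z in (-4, -1, 0, 1, 4)])
```

Mathematically the output is strictly inside $(0,1)$, but in floating point very large or very small inputs round to exactly `1.0` or `0.0`.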

## Softmax

Reference: [Jurafsky SLP (2025)](https://web.stanford.edu/~jurafsky/slp3/) p. 140

Let $\mathbf{z} \in \mathbb{R}^d$ be an intermediate output vector, where $d$ is its dimensionality (number of components). $\mathrm{softmax}$ normalizes $\mathbf{z}$ into a probability distribution: every output lies in $(0,1)$ and the outputs sum to 1.

$$\mathrm{softmax}(\mathbf{z}_i) = \frac{\exp(\mathbf{z}_i)}{\sum^{d}_{j=1}\exp(\mathbf{z}_j)} \qquad 1 \le i \le d$$

```python
import math

def softmax(z):
    # Exponentiate each component once, then divide by the total
    # so the outputs are positive and sum to 1.
    exps = [math.exp(z_j) for z_j in z]
    denominator = sum(exps)
    return [e / denominator for e in exps]

softmax([1, 2, 3])
```

```
[0.09003057317038045, 0.24472847105479764, 0.6652409557748219]
```
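The function above exponentiates the raw inputs, so a vector such as `[1000, 1001, 1002]` makes `math.exp` raise `OverflowError`. A common fix, sketched below under the name `stable_softmax` (my own, hypothetical), subtracts $\max_j \mathbf{z}_j$ from every component first. This does not change the result, because the factor $\exp(-\max_j \mathbf{z}_j)$ appears in both the numerator and the denominator and cancels, but it keeps every exponent $\le 0$.

```python
import math

def stable_softmax(z):
    # Shift by the maximum: exp(z_i - m) / sum_j exp(z_j - m)
    # equals exp(z_i) / sum_j exp(z_j), but no exponent is positive,
    # so math.exp cannot overflow.
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = stable_softmax([1000, 1001, 1002])
print(probs)
print(sum(probs))  # should be 1 up to floating-point rounding
```

Because softmax is invariant to adding a constant to every input, `stable_softmax([1000, 1001, 1002])` gives the same distribution as `softmax([1, 2, 3])`.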

## Perplexity

## Cross-entropy

## P, R, F1

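Given counts of true positives ($tp$), false positives ($fp$), and false negatives ($fn$) for the positive class, precision $P$ is the fraction of predicted positives that are correct, recall $R$ is the fraction of actual positives that are found, and $F_1$ is the harmonic mean of $P$ and $R$.

$$P = \frac{tp}{tp + fp}$$

$$R = \frac{tp}{tp + fn}$$

$$F_1 = \frac{2PR}{P + R}$$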
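The three formulas can be computed from paired gold and predicted labels. This is a minimal sketch for binary labels; the function name `precision_recall_f1` and its `positive` parameter are hypothetical, and returning `0.0` when a denominator is zero is a common convention I am assuming, not part of the definitions above.

```python
def precision_recall_f1(gold, pred, positive=1):
    # Count true positives, false positives and false negatives
    # with respect to the chosen positive class.
    tp = sum(1 for g, y in zip(gold, pred) if y == positive and g == positive)
    fp = sum(1 for g, y in zip(gold, pred) if y == positive and g != positive)
    fn = sum(1 for g, y in zip(gold, pred) if y != positive and g == positive)
    # Assumed convention: a score whose denominator is zero is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 0, 0, 1, 0]
# tp = 2, fp = 1, fn = 2, so P = 2/3, R = 1/2, F1 = 4/7
print(precision_recall_f1(gold, pred))
```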