# Activation Functions in Neural Networks

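An activation function is applied to each neuron's weighted sum plus bias to introduce non-linearity. Without it, any stack of layers would collapse into a single linear transformation. The sections below show where the activation fits in a neuron and how the most common functions are defined and used.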

## Where the Activation Function Is Applied

In each neuron:
```
output = activation(weighted_sum + bias)
```
Where:
- `weighted_sum` = w1*x1 + w2*x2 + ... + wn*xn
- `bias` = b
- `activation(...)` = the non-linear function applied to the result
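
As a minimal sketch of the formula above, the snippet below computes one neuron's output in plain Python. The names `neuron_output`, `inputs`, and `weights` are illustrative and do not come from any particular library; the activation is passed in as an ordinary function.

```python
def neuron_output(inputs, weights, bias, activation):
    """Return activation(w1*x1 + ... + wn*xn + b) for a single neuron."""
    # Weighted sum of the inputs (hypothetical helper, for illustration only).
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum + bias)


# Example: two inputs, with ReLU passed in as the activation.
result = neuron_output(
    inputs=[1.0, 2.0],
    weights=[0.5, -0.25],
    bias=0.1,
    activation=lambda z: max(0.0, z),
)
print(result)
```

Here the weighted sum is 0.5*1.0 + (-0.25)*2.0 = 0.0, so the activation receives only the bias.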

## ReLU (Rectified Linear Unit)
```
activation(x) = max(0, x)
```
**Use:** Hidden layers in most deep neural networks
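
A minimal Python sketch of the definition above, assuming scalar inputs; the function name `relu` is chosen here for illustration.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)


# Negative inputs are clipped to zero; positive inputs pass through unchanged.
outputs = [relu(v) for v in [-2.0, 0.0, 3.5]]
print(outputs)
```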

## Leaky ReLU
```
activation(x) = x if x > 0, otherwise α * x  (α is a small constant like 0.01)
```
**Use:** Hidden layers, as an alternative to ReLU. Mitigates the “dying ReLU” issue by keeping a small, non-zero gradient for negative inputs
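
A minimal Python sketch of the definition above, assuming scalar inputs. The name `leaky_relu` and the default `alpha=0.01` are illustrative choices that follow the example constant in the formula.

```python
def leaky_relu(x, alpha=0.01):
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x


# Negative inputs are scaled down rather than zeroed out.
negative_case = leaky_relu(-2.0)
positive_case = leaky_relu(3.0)
print(negative_case, positive_case)
```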

## Sigmoid
```
activation(x) = 1 / (1 + exp(-x))
```
**Use:** Output layer for binary classification, where the output in (0, 1) is read as the probability of the positive class
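
The formula above can overflow when evaluated directly: with Python's `math.exp`, a large negative `x` makes `exp(-x)` raise `OverflowError`. The sketch below is one common way to avoid that by branching on the sign of `x`; the name `sigmoid` is illustrative, and both branches are algebraically equal to the definition.

```python
import math


def sigmoid(x):
    """Numerically stable sigmoid for a scalar input."""
    if x >= 0:
        # exp(-x) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)).
    z = math.exp(x)
    return z / (1.0 + z)


print(sigmoid(0.0), sigmoid(-1000.0), sigmoid(1000.0))
```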

## Tanh (Hyperbolic Tangent)
```
activation(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
```
**Use:** Hidden layers where zero-centered output is helpful
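
As a sketch, the definition above can be written out directly and compared with the standard library's `math.tanh`. The helper name `tanh_from_definition` is illustrative. The direct form overflows for inputs of large magnitude, so `math.tanh` is the safer choice in practice.

```python
import math


def tanh_from_definition(x):
    """Evaluate tanh using the exponential formula (moderate |x| only)."""
    e_pos, e_neg = math.exp(x), math.exp(-x)
    return (e_pos - e_neg) / (e_pos + e_neg)


# The two should agree to within floating-point rounding.
difference = abs(tanh_from_definition(0.5) - math.tanh(0.5))
print(difference)
```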

## Softmax
```
activation(xi) = exp(xi) / sum over j of exp(xj)   (j ranges over every element of the input vector)
```
**Use:** Output layer for multi-class classification

Converts a vector of logits into a probability distribution.
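
A minimal Python sketch of softmax over a list of logits. It subtracts the largest logit before exponentiating, a common stabilization step that leaves the result unchanged mathematically but avoids overflow in `exp`. The function name `softmax` is illustrative.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Shift by the maximum so the largest exponent is exp(0) = 1.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

The largest logit receives the highest probability, and equal logits receive equal probabilities.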