## Activation/Kernel Functions Cheat Sheet

### Radial Basis Function (RBF)
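- Use when: You need a non-linear decision boundary in an SVM and have no strong prior about its shape, or you are building an RBF network (a universal function approximator) in a neural-network setting.
- Pros: Corresponds to an implicit feature space of infinite dimension, so it can separate classes that no linear boundary can.
- Cons: Requires tuning the width parameter (often written gamma or sigma) and is sensitive to feature scaling. A gamma that is too large tends to overfit; one that is too small tends to underfit.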
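
To make the role of the spread parameter concrete, here is a minimal sketch of the Gaussian RBF kernel in its common form `exp(-gamma * ||x - y||^2)`. The function name `rbf_kernel` and the sample points are illustrative assumptions, not part of any particular library.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical sample points, used only to show how gamma changes similarity.
a, b = [0.0, 0.0], [1.0, 1.0]
for gamma in (0.1, 1.0, 10.0):
    # Larger gamma makes the kernel more local: similarity decays faster.
    print(f"gamma={gamma}: k(a, b)={rbf_kernel(a, b, gamma):.6f}")
```

Identical points always give a similarity of 1, and similarity falls toward 0 as points move apart. How quickly it falls is what gamma controls.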

### Linear Function
- Use when: Your data is linearly separable (for SVMs) or you need a simple transformation of the input (for neural networks).
- Pros: Simple and computationally efficient.
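- Cons: Cannot model non-linear relationships. In a neural network, stacking layers with linear activations collapses to a single linear map, so extra depth adds no expressive power.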
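
The sketch below illustrates both uses under simple assumptions: the linear kernel as a plain dot product, and two stacked one-dimensional linear "layers" that reduce to a single linear function. The helper names (`linear_kernel`, `affine`) and the weights are hypothetical, chosen only for illustration.

```python
def linear_kernel(x, y):
    """Linear kernel: the ordinary dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def affine(w, b):
    """Return a one-dimensional linear layer f(x) = w * x + b."""
    return lambda x: w * x + b

# Two stacked layers with no non-linearity in between (illustrative weights).
layer1 = affine(2.0, 1.0)
layer2 = affine(-3.0, 0.5)

def stacked(x):
    return layer2(layer1(x))

# The same function written as ONE layer: w = -3 * 2, b = -3 * 1 + 0.5.
collapsed = affine(-6.0, -2.5)

for x in (-2.0, 0.0, 3.25):
    print(x, stacked(x), collapsed(x))
```

The two columns printed for `stacked` and `collapsed` agree, which is why a non-linear activation is needed between layers.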

### Polynomial Function
- Use when: You need a non-linear transformation that models interactions between features (for SVMs) or, less commonly, a polynomial activation function (for neural networks).
- Pros: Can handle non-linear data and captures feature interactions up to the chosen degree.
- Cons: More expensive than a linear kernel, adds degree and offset hyperparameters to tune, and high degrees tend to overfit and can be numerically unstable because kernel values grow or shrink rapidly.
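
As a small illustration of what "interactions between features" means, the sketch below checks that a degree-2 polynomial kernel with zero offset on 2-D inputs equals a dot product in an explicit feature space containing the cross term `x1 * x2`. The parameter names `gamma` and `coef0` follow a common convention but are assumptions here, as is the helper `phi2`.

```python
import math

def poly_kernel(x, y, degree=2, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def phi2(x):
    """Explicit feature map for degree=2, gamma=1, coef0=0 on 2-D input."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, y = [1.0, 2.0], [3.0, 4.0]
implicit = poly_kernel(x, y)                              # kernel trick
explicit = sum(a * b for a, b in zip(phi2(x), phi2(y)))   # explicit features
print(implicit, explicit)
```

Both values match up to floating-point rounding, so the kernel computes the interaction features without ever building them.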

### Sigmoid Function
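- Use when: You need a smooth, differentiable function that maps input to the range (0, 1), such as the output of a binary classifier (for neural networks), or you want an SVM kernel that mimics a neural-network unit. As a kernel it takes the form tanh(gamma * x·y + c); unlike the RBF kernel, it does not correspond to an infinite-dimensional feature space.
- Pros: Smooth and differentiable; outputs can be interpreted as probabilities (for neural networks).
- Cons: Saturates for large positive or negative inputs, which causes vanishing gradients, and its outputs are not zero-centered (for neural networks). As a kernel it is not positive semi-definite for every parameter choice (for SVMs).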
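
The vanishing-gradient issue follows from the derivative `s(z) * (1 - s(z))`, which peaks at 0.25 and shrinks toward zero as the input moves away from the origin. The sketch below is one way to write the activation so it does not overflow for extreme inputs; the function names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) is at most 1 and cannot overflow
    return e / (1.0 + e)

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (0.0, 2.0, 5.0, 10.0):
    # The gradient is largest at z = 0 and decays quickly in the tails.
    print(f"z={z}: sigmoid={sigmoid(z):.6f}, grad={sigmoid_grad(z):.6f}")
```

Because each sigmoid layer scales the backpropagated signal by at most 0.25, the gradient can shrink quickly as depth increases.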

### ReLU (Rectified Linear Unit)
- Use when: You need a simple, efficient non-linear function for a neural network.
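- Pros: Simple and computationally efficient; its gradient is 1 for positive inputs, which helps mitigate the vanishing gradients problem.
- Cons: Can suffer from "dead neurons": a unit whose input is negative for every example outputs zero, receives zero gradient, and stops learning.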
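
A minimal sketch of ReLU and its gradient, with a hypothetical single neuron whose bias has drifted far negative so that it is "dead" on every sample input. The weight, bias, and inputs are made up for illustration.

```python
def relu(z):
    """ReLU: pass positive values through, clamp the rest to zero."""
    return max(0.0, z)

def relu_grad(z):
    """Gradient of ReLU (taking the value at z == 0 to be 0)."""
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron: weight 1.0 and a large negative bias.
inputs = [-2.0, -0.5, 0.0, 1.5]
w, b = 1.0, -5.0
pre_activations = [w * x + b for x in inputs]

outputs = [relu(p) for p in pre_activations]
grads = [relu_grad(p) for p in pre_activations]
print(outputs)  # all zeros: the neuron never fires on these inputs
print(grads)    # all zeros: no gradient flows back to update w or b
```

Variants such as Leaky ReLU keep a small non-zero slope for negative inputs so that such a unit can still recover.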

### Tanh (Hyperbolic Tangent)
- Use when: You need a smooth, differentiable function that maps input to the range (-1, 1) for a neural network.
- Pros: Output is zero-centered, which tends to make optimization easier than with sigmoid.
- Cons: Saturates for inputs of large magnitude, so it can still suffer from the vanishing gradients problem.
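
The sketch below contrasts tanh with the logistic sigmoid on a symmetric set of sample inputs (chosen arbitrarily) to show the zero-centering, and evaluates the derivative `1 - tanh(z)^2` to show the saturation. The helper names are illustrative.

```python
import math

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

def logistic(z):
    """Plain logistic sigmoid, included only for comparison."""
    return 1.0 / (1.0 + math.exp(-z))

inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # symmetric around zero
tanh_mean = sum(math.tanh(z) for z in inputs) / len(inputs)
logistic_mean = sum(logistic(z) for z in inputs) / len(inputs)
print(tanh_mean, logistic_mean)  # tanh averages near 0, logistic near 0.5

for z in (0.0, 2.0, 5.0):
    # The gradient is 1 at the origin and decays toward 0 in the tails.
    print(f"z={z}: tanh={math.tanh(z):.6f}, grad={tanh_grad(z):.6f}")
```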

### Softmax Function
- Use when: You need to output a probability distribution for a multi-class classification problem in a neural network.
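- Pros: Outputs are non-negative and sum to 1, so they can be read as class probabilities; pairs naturally with cross-entropy loss.
- Cons: Requires an exponential per class plus a normalization over all classes, which becomes costly when the number of classes is very large. A naive implementation can overflow, and it assumes the classes are mutually exclusive.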
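
A minimal sketch of softmax using the common max-subtraction trick, which leaves the result unchanged mathematically but keeps the exponentials from overflowing. The function name and the sample logits are illustrative.

```python
import math

def softmax(logits):
    """Softmax over a list of scores, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# Large logits would overflow math.exp without the shift; with it they work.
big = softmax([1000.0, 1001.0, 1002.0])
print(big)
```

Shifting all logits by the same constant does not change the output, so `softmax([1000, 1001, 1002])` matches `softmax([0, 1, 2])`.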