### Cost / Loss Functions

**Mean Squared Error (MSE):**  
$$
C = \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2
$$  
$$
\frac{\partial C}{\partial \hat{y}_i} = -\frac{2}{n}(y_i - \hat{y}_i)
$$
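
As a quick check of the two formulas above, here is a minimal sketch in plain Python. The names `mse` and `mse_grad` are illustrative, not taken from any library, and plain lists are used only to keep the example dependency-free.

```python
def mse(y, y_hat):
    """Mean squared error C = (1/n) * sum_i (y_i - y_hat_i)^2."""
    n = len(y)
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / n


def mse_grad(y, y_hat):
    """Gradient dC/dy_hat_i = -(2/n) * (y_i - y_hat_i), one entry per prediction."""
    n = len(y)
    return [-2.0 / n * (yi - pi) for yi, pi in zip(y, y_hat)]


# Hypothetical targets and predictions, chosen only for illustration.
y = [1.0, 2.0, 3.0]
y_hat = [1.5, 2.0, 2.0]
cost = mse(y, y_hat)
grad = mse_grad(y, y_hat)
```

A finite-difference comparison against `mse` is a cheap way to confirm the sign and the $2/n$ factor of the gradient.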

**MSE with $L_2$ regularization:**  
$$
C = \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j w_j^2
$$

**MSE with $L_1$ regularization:**  
$$
C = \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j |w_j|
$$
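
The penalty terms add to the weight gradient independently of the data term: $2\lambda w_j$ for $L_2$ and $\lambda\,\mathrm{sign}(w_j)$ for $L_1$. The sketch below assumes the common convention of using $0$ as the $L_1$ subgradient at $w_j = 0$; the function names are illustrative.

```python
def l2_penalty(w, lam):
    """L2 term: lam * sum_j w_j^2."""
    return lam * sum(wj ** 2 for wj in w)


def l2_penalty_grad(w, lam):
    """d/dw_j of the L2 term: 2 * lam * w_j."""
    return [2.0 * lam * wj for wj in w]


def l1_penalty(w, lam):
    """L1 term: lam * sum_j |w_j|."""
    return lam * sum(abs(wj) for wj in w)


def l1_penalty_grad(w, lam):
    """Subgradient of the L1 term: lam * sign(w_j), taken as 0 at w_j = 0 (assumed convention)."""
    return [lam * ((wj > 0) - (wj < 0)) for wj in w]


# Hypothetical weights and regularization strength.
w = [0.5, -2.0, 0.0]
lam = 0.1
l2_value, l2_grad = l2_penalty(w, lam), l2_penalty_grad(w, lam)
l1_value, l1_grad = l1_penalty(w, lam), l1_penalty_grad(w, lam)
```

The constant-magnitude $L_1$ gradient is what tends to push small weights to exactly zero, while the $L_2$ gradient shrinks weights in proportion to their size.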

---

**Binary Cross-Entropy (Log Loss):**  
$$
C = -\frac{1}{n} \sum_i [y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)]
$$  
$$
\frac{\partial C}{\partial \hat{y}_i} = -\frac{1}{n} \left(\frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i}\right)
$$
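
Both expressions above are undefined when $\hat{y}_i$ is exactly 0 or 1, so implementations usually clip the predictions first. The following is a minimal sketch under that assumption; the clipping constant `eps` and the function names are choices made for this example.

```python
import math


def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy averaged over n samples, with predictions clipped to [eps, 1 - eps]."""
    n = len(y)
    total = 0.0
    for yi, pi in zip(y, y_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        total += yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
    return -total / n


def bce_grad(y, y_hat, eps=1e-12):
    """Gradient dC/dy_hat_i = -(1/n) * (y_i / y_hat_i - (1 - y_i) / (1 - y_hat_i))."""
    n = len(y)
    grad = []
    for yi, pi in zip(y, y_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid division by zero
        grad.append(-(yi / pi - (1.0 - yi) / (1.0 - pi)) / n)
    return grad


# Hypothetical labels and predicted probabilities.
y = [1.0, 0.0]
y_hat = [0.9, 0.2]
cost = bce(y, y_hat)
grad = bce_grad(y, y_hat)
```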

---

**Softmax Cross-Entropy (Multiclass):**  
$$
\hat{y}_{ij} = \frac{e^{z_{ij}}}{\sum_k e^{z_{ik}}}
$$  
$$
C = -\frac{1}{n} \sum_i \sum_j y_{ij} \log(\hat{y}_{ij})
$$  
$$
\frac{\partial C}{\partial z_{ij}} = \frac{1}{n}\left(\hat{y}_{ij} - y_{ij}\right)
$$
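
A minimal sketch of the softmax cross-entropy and its gradient with respect to the logits, assuming one-hot targets and one row of logits per sample. Subtracting the row maximum before exponentiating is a standard stability trick and does not change the softmax output. Because the cost is averaged over $n$ samples, the gradient returned here carries the same $1/n$ factor. Function names are illustrative.

```python
import math


def softmax(z):
    """Softmax over one row of logits, shifted by max(z) to avoid overflow."""
    m = max(z)
    exps = [math.exp(zk - m) for zk in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(Z, Y):
    """Return (cost, grad) for logits Z and one-hot targets Y, both n rows of equal length.

    cost = -(1/n) * sum_i sum_j y_ij * log(y_hat_ij)
    grad[i][j] = (y_hat_ij - y_ij) / n
    """
    n = len(Z)
    cost = 0.0
    grad = []
    for z_row, y_row in zip(Z, Y):
        p_row = softmax(z_row)
        # Skip zero targets so log is only evaluated where it contributes.
        cost -= sum(yj * math.log(pj) for yj, pj in zip(y_row, p_row) if yj > 0)
        grad.append([(pj - yj) / n for pj, yj in zip(p_row, y_row)])
    return cost / n, grad


# Hypothetical logits; the second row would overflow exp() without the max shift.
Z = [[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]
Y = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
cost, grad = softmax_cross_entropy(Z, Y)
```

Each row of `grad` sums to zero, since both the softmax outputs and the one-hot targets sum to one.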

---

### Activation Functions

**Sigmoid (Logistic):**  
$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \quad
\sigma'(z) = \sigma(z)(1 - \sigma(z))
$$
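
Evaluating $e^{-z}$ directly overflows for large negative $z$, so a common approach is to branch on the sign of $z$. The sketch below assumes scalar inputs; the function names are illustrative.

```python
import math


def sigmoid(z):
    """Sigmoid 1 / (1 + e^{-z}), written so that exp() never receives a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def sigmoid_prime(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

The derivative peaks at $z = 0$ with value $0.25$ and decays toward zero on both sides, which is the source of vanishing gradients in deep sigmoid networks.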

**ReLU:**  
$$
f(z) = \max(0, z), \quad
f'(z) =
\begin{cases}
1, & z > 0 \\
0, & z \le 0
\end{cases}
$$
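
ReLU is not differentiable at $z = 0$; the piecewise form above assigns the value $0$ there, and this scalar sketch follows the same convention. The function names are illustrative.

```python
def relu(z):
    """ReLU f(z) = max(0, z)."""
    return max(0.0, z)


def relu_prime(z):
    """Derivative: 1 for z > 0, otherwise 0 (including z = 0, by the convention above)."""
    return 1.0 if z > 0 else 0.0
```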

**Leaky ReLU:**  
$$
f(z) =
\begin{cases}
z, & z > 0 \\
\alpha z, & z \le 0
\end{cases},
\quad
f'(z) =
\begin{cases}
1, & z > 0 \\
\alpha, & z \le 0
\end{cases}
$$
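
A scalar sketch of Leaky ReLU follows. The default `alpha=0.01` is an assumption made for this example (a commonly used small slope), not a value given above.

```python
def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: z for z > 0, alpha * z otherwise. alpha=0.01 is an assumed default."""
    return z if z > 0 else alpha * z


def leaky_relu_prime(z, alpha=0.01):
    """Derivative: 1 for z > 0, alpha otherwise."""
    return 1.0 if z > 0 else alpha
```

Unlike ReLU, the gradient for negative inputs is $\alpha$ rather than zero, so units with negative pre-activations still receive a learning signal.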