---
author: Zeel B Patel
badges: true
categories:
  - ML
date: "2025-04-24"
description: Preparing for ML Breadth and Depth for Interviews
title: Machine Learning Cheat Sheet
toc: true
---


## Classification


### Logistic Regression


| Entity   | Math             | Shape  | Type       |
| :------- | :--------------- | :----- | :--------- |
| Features | $X$              | (n, d) | Continuous |
| Weights  | $W$              | (d, 1) | Continuous |
| Bias     | $b$              | (1, 1) | Continuous |
| Output   | $\boldsymbol{y}$ | (n, 1) | Binary     |

\begin{align*}
\hat{\boldsymbol{y}} &= \sigma(XW + b) \\
&= \sigma(\underbrace{X}_{n \times d} \underbrace{W}_{d \times 1} + \underbrace{b}_{1}) \\
&= \sigma(\underbrace{z}_{n \times 1}) \\
\text{where } \sigma(z) &= \frac{1}{1 + e^{-z}}
\end{align*}

\begin{align*}
\text{Loss} = -\frac{1}{n} \sum_{i=1}^{n}
\begin{cases}
\log(\hat{y}_i) & \text{if } y_i = 1 \\
\log(1 - \hat{y}_i) & \text{if } y_i = 0
\end{cases}
\end{align*}


Another way to write the loss function is:

\begin{align*}
\text{Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
\end{align*}
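
A minimal sketch of the forward pass and the compact loss above, in plain Python. The helper names (`sigmoid`, `predict_proba`, `bce_loss`) and the toy data are made up for illustration. The sigmoid is the naive form and is not protected against overflow for very negative $z$.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(X, W, b):
    """Return y_hat = sigmoid(XW + b) for X given as n rows of d features."""
    return [sigmoid(sum(x_j * w_j for x_j, w_j in zip(row, W)) + b) for row in X]

def bce_loss(y, y_hat):
    """Mean binary cross-entropy, matching the compact form of the loss."""
    n = len(y)
    return -sum(
        y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
        for y_i, p in zip(y, y_hat)
    ) / n

# Toy data (made up for illustration): n = 4 samples, d = 2 features.
X = [[0.5, 1.0], [-1.0, 2.0], [1.5, -0.5], [0.0, 0.0]]
W = [0.0, 0.0]  # zero weights give y_hat = 0.5 for every sample
b = 0.0
y = [1, 0, 1, 0]

y_hat = predict_proba(X, W, b)
loss = bce_loss(y, y_hat)  # equals log(2) when every prediction is 0.5
```

With zero weights every prediction is 0.5, so the loss is $\log 2 \approx 0.693$, a useful sanity value for an untrained binary classifier.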


### Multi-Class Logistic Regression
| Entity   | Math             | Shape  | Type       |
| :------- | :--------------- | :----- | :--------- |
| Features | $X$              | (n, d) | Continuous |
| Weights  | $W$              | (d, k) | Continuous |
| Bias     | $b$              | (1, k) | Continuous |
| Output   | $\boldsymbol{y}$ | (n, k) | Categorical (one-hot) |

\begin{align*}
\hat{\boldsymbol{y}} &= \text{softmax}(XW + b) \\
&= \text{softmax}(\underbrace{X}_{n \times d} \underbrace{W}_{d \times k} + \underbrace{b}_{1 \times k}) \\
&= \text{softmax}(\underbrace{z}_{n \times k}) \\
\text{where } \text{softmax}(z)_{ij} &= \frac{e^{z_{ij}}}{\sum_{l=1}^{k} e^{z_{il}}}
\end{align*}

\begin{align*}
\text{Loss} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} \log(\hat{y}_{ij})
\end{align*}
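
A minimal sketch of row-wise softmax and the cross-entropy loss above, assuming one-hot labels. The function names and toy logits are made up for illustration. Subtracting the row maximum before exponentiating is a standard stability trick that leaves the result unchanged.

```python
import math

def softmax(z):
    """Softmax over one row of k logits, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(Y, Y_hat):
    """Mean categorical cross-entropy for one-hot Y and row-wise probabilities Y_hat."""
    n = len(Y)
    return -sum(
        y_ij * math.log(p_ij)
        for y_row, p_row in zip(Y, Y_hat)
        for y_ij, p_ij in zip(y_row, p_row)
        if y_ij  # terms with y_ij = 0 contribute nothing
    ) / n

# Toy logits z = XW + b (made up): n = 2 samples, k = 3 classes.
Z = [[0.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
Y = [[1, 0, 0], [0, 1, 0]]

Y_hat = [softmax(row) for row in Z]
loss = cross_entropy(Y, Y_hat)
```

Equal logits give a uniform distribution, so the first sample contributes $\log k = \log 3$ to the sum.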

### Metrics
| Metric         | Formula                                                                 |
| :------------- | :--------------------------------------------------------------------- |
| Accuracy       | $\frac{TP + TN}{TP + TN + FP + FN}$                                   |
| Precision      | $\frac{TP}{TP + FP}$                                                   |
| Recall         | $\frac{TP}{TP + FN}$                                                   |
| F1 Score       | $2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ |
| False Positive Rate | $\frac{FP}{FP + TN}$                                                   |
| True Positive Rate | $\frac{TP}{TP + FN}$                                                   |
| ROC curve | Plot of True Positive Rate vs False Positive Rate (X-axis is FPR, Y-axis is TPR) |
| AUC-ROC | Area under the ROC curve (1 is a perfect ranking, 0.5 is random guessing, 0 is a perfectly inverted ranking) |
| Matthews Correlation Coefficient | $\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ |
| Cohen's Kappa | $\frac{P_o - P_e}{1 - P_e}$ where $P_o$ is the observed agreement (accuracy) and $P_e$ is the agreement expected by chance |
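
The ROC and AUC rows can be sketched by sweeping a threshold over the predicted scores and integrating with the trapezoidal rule. The function names and toy scores below are made up for illustration, and the sketch assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold over every score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Toy labels and scores (made up): one negative is ranked above one positive.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

roc = roc_points(y_true, scores)
area = auc(roc)  # 8/9: 8 of the 9 (positive, negative) pairs are ranked correctly
```

The result matches the ranking interpretation of AUC: the fraction of (positive, negative) pairs in which the positive receives the higher score.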

#### Accuracy

\begin{align*}
\text{Accuracy} &= \frac{\text{Correct Predictions}}{\text{Total Predictions}} \\
&= \frac{TP + TN}{TP + TN + FP + FN}
\end{align*}

#### Precision

\begin{align*}
\text{Precision} &= \frac{\text{True Positives}}{\text{Predicted Positives}} \\
&= \frac{TP}{TP + FP}
\end{align*}

#### Recall

\begin{align*}
\text{Recall} &= \frac{\text{True Positives}}{\text{Actual Positives}} \\
&= \frac{TP}{TP + FN}
\end{align*}

#### F1 Score

Harmonic mean of precision and recall.

\begin{align*}
\text{F1 Score} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
&= \frac{2TP}{2TP + FP + FN}
\end{align*}
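
The count-based formulas above can be checked with a small helper. This is a sketch with hypothetical counts; `classification_metrics` is a made-up name, and zero denominators (for example, no predicted positives) are not handled.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the count-based metrics from confusion-matrix entries."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,  # same as the true positive rate
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }

# Hypothetical counts, chosen only for illustration.
m = classification_metrics(tp=40, fp=10, fn=20, tn=30)
# accuracy = 0.7, precision = 0.8, recall = 2/3, f1 = 8/11, fpr = 0.25
```

Here F1 from the harmonic-mean form agrees with $\frac{2TP}{2TP + FP + FN} = \frac{80}{110}$.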

#### Matthews Correlation Coefficient
Similar to the Pearson correlation coefficient, but for binary classification.

> The Matthews correlation coefficient (MCC) is the Pearson correlation coefficient between the observed and predicted binary classifications.

Derivation: (Homework)
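
Short of a derivation, a quick numerical check on toy labels shows the two quantities agree. The function names and labels are made up for illustration. Returning 0 when the denominator is zero is a common convention assumed here, not part of the formula.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # assumed convention for a zero denominator

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Toy binary labels (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = pairs.count((1, 1))
fp = pairs.count((0, 1))
fn = pairs.count((1, 0))
tn = pairs.count((0, 0))

mcc_value = mcc(tp, fp, fn, tn)          # from the confusion-matrix formula
pearson_value = pearson(y_true, y_pred)  # from the 0/1 label vectors
```

Both values come out to 0.5 for these labels ($TP = 3$, $FP = 1$, $FN = 1$, $TN = 3$).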

#### Cohen's Kappa
Definition: Cohen's Kappa is a statistic that measures agreement between two raters (here, the classifier and the ground truth) while correcting for agreement expected by chance.

Range: $-1$ to $1$

- $-1$: Total disagreement
- $0$: Agreement no better than chance
- $1$: Perfect agreement

Formula:

\begin{align*}
\text{Cohen's Kappa} &= \frac{P_o - P_e}{1 - P_e} \\
P_o &= \text{Accuracy} \\
P_e &= \sum_{i=1}^{k} \frac{\text{predicted}_i}{\text{total samples}} \cdot \frac{\text{actual}_i}{\text{total samples}}
\end{align*}
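
A minimal sketch of the formula above for $k$ classes, computed directly from label lists. `cohens_kappa` and the toy labels are made up for illustration. The degenerate case $P_e = 1$ (both label lists contain a single, identical class) is not handled.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (P_o - P_e) / (1 - P_e) for two equal-length label lists."""
    n = len(y_true)
    # Observed agreement P_o is plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    actual = Counter(y_true)
    predicted = Counter(y_pred)
    # Chance agreement P_e: sum over classes of the product of marginal frequencies.
    classes = set(y_true) | set(y_pred)
    p_e = sum((predicted[c] / n) * (actual[c] / n) for c in classes)
    return (p_o - p_e) / (1 - p_e)

# Toy 3-class labels (made up for illustration).
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

kappa = cohens_kappa(y_true, y_pred)
# P_o = 4/6, P_e = (2*2 + 3*2 + 1*2) / 36 = 1/3, so kappa = 0.5
```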