# ML Cheat Sheet

| Algorithm Category | Algorithm | Key Techniques | Loss / Objective Function (LaTeX Formula) | Evaluation Metrics | Key Hyperparameters | Data Preparation/Scaling Techniques | Key Assumptions | Target Types | Example Use Cases |
|--------------------|-----------|----------------|-------------------------------|------------------|---------------------|-------------------------------------|-----------------|--------------|-------------------|
| **Traditional ML** | **Linear Regression** | - Ordinary Least Squares<br>- Gradient Descent<br>- Regularization (L1/Lasso, L2/Ridge) | $$\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$$ | - R² Score<br>- Mean Absolute Error (MAE)<br>- Mean Squared Error (MSE) | - Learning rate (gradient descent only)<br>- Regularization strength (α)<br>- Fit intercept | - Handle missing values (imputation)<br>- Feature scaling (StandardScaler, MinMaxScaler), important with regularization or gradient descent<br>- Remove or cap outliers (squared loss is sensitive to them)<br>- Encode categorical variables (OneHotEncoder) | - Linear relationship between features and target<br>- Independent errors<br>- Homoscedasticity (constant error variance)<br>- Little or no multicollinearity<br>- Normally distributed residuals (for inference) | Numeric | - House price prediction<br>- Sales forecasting<br>- Energy consumption estimation |
| | **Logistic Regression** | - Maximum Likelihood Estimation<br>- Gradient Descent<br>- Regularization (L1, L2) | $$-\frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right]$$ | - Accuracy<br>- Precision/Recall<br>- F1 Score<br>- ROC-AUC | - Learning rate<br>- Regularization strength (C, an inverse: smaller C = stronger regularization)<br>- Max iterations<br>- Solver (e.g., lbfgs, saga) | - Handle missing values<br>- StandardScaler for numerical features<br>- Encode categorical variables<br>- Address class imbalance (class weights, SMOTE) | - Linear relationship between features and the log-odds of the outcome<br>- Independent observations<br>- Little or no multicollinearity<br>- Binary outcome | Binary Class | - Spam email detection<br>- Customer churn prediction<br>- Disease diagnosis |
| | **Multinomial Logistic Regression** | - Softmax Function<br>- Gradient Descent<br>- Regularization (L1, L2) | $$-\frac{1}{n} \sum_{i=1}^n \sum_{c=1}^C y_{i,c} \log(\hat{y}_{i,c})$$ | - Accuracy<br>- Macro/Micro F1 Score<br>- Confusion Matrix | - Learning rate<br>- Regularization strength (C, an inverse: smaller C = stronger regularization)<br>- Max iterations<br>- Solver | - Handle missing values<br>- StandardScaler or MinMaxScaler<br>- OneHotEncoder for categorical variables<br>- Balance multi-class data | - Linear relationship between features and each class's log-odds<br>- Independent observations<br>- Little or no multicollinearity<br>- Mutually exclusive classes | Multi-class | - Handwritten digit recognition<br>- Sentiment analysis (positive/neutral/negative)<br>- Iris flower classification |
| | **Decision Trees** | - Gini/Entropy Splitting<br>- Pruning<br>- Feature Importance | Split criterion, classification (Gini impurity):<br>$$\sum_{c=1}^C p_c (1-p_c)$$<br>or entropy:<br>$$-\sum_{c=1}^C p_c \log_2 p_c$$<br>Regression (MSE):<br>$$\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$$ | - Accuracy<br>- Precision/Recall<br>- F1 Score<br>- RMSE (regression) | - Criterion (gini, entropy)<br>- Max depth<br>- Min samples split<br>- Min samples leaf<br>- Max features | - Handle missing values<br>- No scaling needed (tree-based)<br>- Encode categorical variables<br>- Optionally drop highly correlated features (they share importance scores) | - No feature-independence assumption<br>- Axis-aligned, hierarchical decision boundaries<br>- No normality or linearity assumption | Numeric, Binary Class, Multi-class | - Credit risk assessment<br>- Customer segmentation<br>- Medical diagnosis |
| | **Support Vector Machine (SVM)** | - Kernel Trick<br>- Margin Maximization<br>- Regularization | Soft-margin objective with hinge loss (labels $y_i = \pm 1$):<br>$$\frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^n \max(0, 1 - y_i (\mathbf{w} \cdot \mathbf{x}_i + b))$$ | - Accuracy<br>- Precision/Recall<br>- F1 Score<br>- ROC-AUC | - Regularization parameter (C; smaller C = wider margin, stronger regularization)<br>- Kernel type (linear, rbf)<br>- Gamma (for rbf kernel)<br>- Max iterations | - Impute missing values<br>- StandardScaler (essential for SVM)<br>- Encode categorical variables<br>- Handle outliers | - Classes separable by a margin, possibly after a kernel mapping (the soft margin tolerates some overlap)<br>- Works well in high-dimensional feature spaces<br>- Balanced classes preferred | Binary Class, Multi-class | - Text classification<br>- Image classification<br>- Fraud detection |
| | **Naive Bayes** | - Bayes’ Theorem<br>- Conditional Independence Assumption<br>- Probability Estimation | Log Loss:<br>$$-\sum_{i=1}^n \log P(y_i \mid \mathbf{x}_i)$$ | - Accuracy<br>- Precision/Recall<br>- F1 Score<br>- Confusion Matrix | - Smoothing parameter (alpha)<br>- Prior probabilities | - Handle missing values<br>- No scaling needed (probability-based); MultinomialNB requires non-negative features<br>- Encode categorical variables<br>- Convert numerical to categorical if needed | - Features are conditionally independent given the class<br>- Features follow a specific distribution (e.g., Gaussian, multinomial)<br>- Performs well even on small datasets | Binary Class, Multi-class | - Spam filtering<br>- Document classification<br>- Sentiment analysis |
| **Ensemble Models** | **Random Forest** | - Bagging<br>- Feature Randomness<br>- Decision Trees | No single global loss: each tree is grown with Gini/entropy (classification) or MSE (regression) split criteria, and predictions are combined by majority vote or averaging | - Accuracy<br>- Precision/Recall<br>- F1 Score<br>- ROC-AUC | - Number of trees (n_estimators)<br>- Max depth<br>- Min samples split<br>- Max features | - Handle missing values (imputation)<br>- No scaling needed (tree-based)<br>- Encode categorical variables<br>- Optionally drop highly correlated features (they dilute importance scores) | - No feature-independence or distributional assumptions<br>- Independent observations<br>- Robust to noise<br>- Captures non-linear relationships | Numeric, Binary Class, Multi-class | - Fraud detection<br>- Stock price prediction<br>- Customer churn prediction |
| | **XGBoost** | - Gradient Boosting<br>- Regularization<br>- Early Stopping | Training loss plus a tree-complexity penalty $\sum_k \Omega(f_k)$.<br>MSE (Regression):<br>$$\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$$<br>Log Loss (Classification):<br>$$-\frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right]$$<br>Penalty per tree ($T$ leaves, leaf weights $w_j$):<br>$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2$$ | - Accuracy<br>- F1 Score<br>- ROC-AUC<br>- RMSE (regression) | - Learning rate (eta)<br>- Max depth<br>- Number of estimators<br>- Subsample ratio<br>- Regularization (lambda, alpha)<br>- Min split loss (gamma) | - Handle missing values (XGBoost handles natively)<br>- No scaling needed<br>- Encode categorical variables<br>- Feature selection | - Non-linear relationships<br>- Robust to outliers in features (outliers in the target still inflate squared-error loss)<br>- No feature-independence assumption | Numeric, Binary Class, Multi-class | - Kaggle competitions<br>- Customer lifetime value prediction<br>- Disease prediction |
| **Deep Learning** | **Deep Neural Network (DNN)** | - Backpropagation<br>- Activation Functions (ReLU, Sigmoid)<br>- Dropout | MSE (Regression):<br>$$\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$$<br>Categorical Cross-Entropy (Classification):<br>$$-\frac{1}{n} \sum_{i=1}^n \sum_{c=1}^C y_{i,c} \log(\hat{y}_{i,c})$$ | - Accuracy<br>- F1 Score<br>- ROC-AUC<br>- MSE (regression) | - Number of layers<br>- Neurons per layer<br>- Learning rate<br>- Dropout rate<br>- Batch size | - Impute missing values<br>- Scale inputs (StandardScaler or MinMaxScaler)<br>- Encode categorical variables (one-hot or embeddings) | - Large labeled datasets<br>- Complex non-linear relationships<br>- Sufficient computational resources (often GPUs) | Numeric, Binary Class, Multi-class | - Fraud detection<br>- Recommendation systems<br>- Predictive maintenance |
| | **Convolutional Neural Network (CNN)** | - Convolution Layers<br>- Pooling (Max, Average)<br>- Dropout | Categorical Cross-Entropy:<br>$$-\frac{1}{n} \sum_{i=1}^n \sum_{c=1}^C y_{i,c} \log(\hat{y}_{i,c})$$ | - Accuracy<br>- Precision/Recall<br>- F1 Score | - Number of filters<br>- Kernel size<br>- Learning rate<br>- Dropout rate<br>- Batch size | - Resize images<br>- Normalize pixel values (0-1)<br>- Data augmentation (rotation, flip)<br>- Remove corrupt or missing images | - Grid-structured spatial data (e.g., images)<br>- Large labeled datasets (or transfer learning from pretrained models)<br>- Approximate translation invariance from convolution and pooling; rotation invariance must be learned via augmentation | Multi-class, Binary Class | - Image classification<br>- Object detection<br>- Facial recognition |
| **Unsupervised** | **K-Means Clustering** | - Centroid Initialization<br>- Elbow Method<br>- Distance Metrics (Euclidean) | Within-Cluster Sum of Squares (inertia):<br>$$\sum_{i=1}^k \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$ | - Silhouette Score<br>- Inertia<br>- Davies-Bouldin Index | - Number of clusters (k)<br>- Max iterations<br>- Initialization method (k-means++)<br>- Number of initializations (n_init) | - Impute missing values<br>- StandardScaler for numerical features<br>- Remove outliers<br>- Encode categorical variables | - Roughly spherical clusters<br>- Similar cluster variances and sizes<br>- Sensitive to outliers and noise | N/A (Unsupervised) | - Customer segmentation<br>- Image compression<br>- Anomaly detection |
| **Forecasting** | **ARIMA** | - Autoregression (AR)<br>- Moving Average (MA)<br>- Differencing (I) | Fitted by Maximum Likelihood Estimation (minimizes the negative log-likelihood of Gaussian residuals) | - Mean Absolute Error (MAE)<br>- Mean Squared Error (MSE)<br>- AIC/BIC (for model selection) | - Order (p, d, q)<br>- Seasonal order (P, D, Q, s) for SARIMA | - Handle missing values<br>- Ensure stationarity (differencing; check with a unit-root test such as ADF)<br>- Stabilize variance (log or Box-Cox transform) if needed<br>- Remove or model trend and seasonality (d, D) | - Stationary data (after differencing)<br>- Linear relationships<br>- No abrupt structural changes<br>- Residuals are uncorrelated (white noise) | Numeric | - Stock price forecasting<br>- Weather prediction<br>- Sales forecasting |
| **NLP** | **Semantic Analysis (e.g., BERT)** | - Transformer Architecture<br>- Attention Mechanisms<br>- Transfer Learning | Cross-Entropy (Classification):<br>$$-\frac{1}{n} \sum_{i=1}^n \sum_{c=1}^C y_{i,c} \log(\hat{y}_{i,c})$$ | - Accuracy<br>- F1 Score<br>- Precision/Recall<br>- Exact Match (question answering) | - Learning rate (small for fine-tuning)<br>- Number of epochs<br>- Batch size<br>- Max sequence length<br>- Dropout rate<br>- Number of layers/attention heads (fixed by the pretrained checkpoint) | - Tokenization with the model's own subword tokenizer (WordPiece for BERT)<br>- Padding/truncation to the max sequence length<br>- Minimal text normalization (uncased models lowercase inside the tokenizer; keep punctuation)<br>- Handle missing text | - Pretrained on large text corpora; fine-tuning needs far less labeled data<br>- Meaning depends on context<br>- Substantial computational resources (GPUs) | Binary Class, Multi-class | - Sentiment analysis<br>- Question answering<br>- Text classification |
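The Linear Regression row names Ordinary Least Squares as the fitting technique and R², MAE, and MSE as metrics. The sketch below shows the closed-form OLS fit and all three metrics. It assumes a single feature and plain Python lists, and `fit_simple_ols`, `mse`, `mae`, and `r2_score` are hypothetical helpers, not a library API.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for one feature: slope = cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_i - yhat_i)^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum(|y_i - yhat_i|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (1.0 is a perfect fit)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.5, 6.5, 9.0]
b0, b1 = fit_simple_ols(x, y)          # intercept ~1.25, slope ~1.9
y_hat = [b0 + b1 * xi for xi in x]
print(mse(y, y_hat), mae(y, y_hat), r2_score(y, y_hat))
```

With several features, OLS solves the normal equations instead. scikit-learn's `LinearRegression` and `sklearn.metrics` provide production versions.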
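StandardScaler and MinMaxScaler apply simple per-feature formulas. Their statistics should be learned on the training split only and then reused on test data. Here is a pure-Python sketch of both; scikit-learn's StandardScaler also uses the population standard deviation (`ddof=0`). The helper names are hypothetical.

```python
import math


def fit_standard_scaler(train_column):
    """Learn mean and population std (ddof=0) from training data only."""
    n = len(train_column)
    mean = sum(train_column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in train_column) / n)
    return mean, std


def standard_scale(column, mean, std):
    """z = (x - mean) / std; a zero std (constant feature) maps to 0.0."""
    return [(x - mean) / std if std else 0.0 for x in column]


def fit_min_max(train_column):
    """Learn the training minimum and maximum."""
    return min(train_column), max(train_column)


def min_max_scale(column, lo, hi):
    """x' = (x - min) / (max - min); test values outside the training range leave [0, 1]."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [3.0, 11.0]
mean, std = fit_standard_scaler(train)          # 5.0, 2.0
print(standard_scale(test, mean, std))          # [-1.0, 3.0]
lo, hi = fit_min_max(train)
print(min_max_scale(test, lo, hi))
```

Fitting on the training split alone avoids leaking test-set statistics into the model.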
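Both logistic-regression rows minimize cross-entropy over predicted probabilities. This pure-Python illustration computes the sigmoid, the binary log loss, a numerically stable softmax, and the multi-class cross-entropy, each averaged over samples. The helper names are hypothetical. Clipping probabilities with a small `eps` is an added assumption to avoid `log(0)`.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score w·x + b to a probability (overflow-safe)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def softmax(logits):
    """Softmax over one row of class scores (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def categorical_cross_entropy(y_onehot, probs, eps=1e-15):
    """Mean multi-class cross-entropy over one-hot labels."""
    total = 0.0
    for y_row, p_row in zip(y_onehot, probs):
        total += sum(y * math.log(max(p, eps)) for y, p in zip(y_row, p_row))
    return -total / len(y_onehot)


print(binary_log_loss([1, 0], [0.9, 0.2]))
print(categorical_cross_entropy([[0, 1, 0]], [softmax([1.0, 2.0, 0.5])]))
```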
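Most classification rows report accuracy, precision, recall, and F1. This sketch computes them from label lists and adds macro-averaged F1 for the multi-class case. The helper names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def binary_counts(y_true, y_pred, positive):
    """Count true positives, false positives and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class (0.0 when a denominator is zero)."""
    tp, fp, fn = binary_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in classes) / len(classes)


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred), macro_f1(y_true, y_pred))
```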
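Decision trees choose the split that most reduces impurity. This sketch shows the Gini and entropy criteria and the size-weighted impurity of a candidate split. It uses hypothetical helper names and string class labels.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: sum_c p_c * (1 - p_c) over class proportions p_c."""
    n = len(labels)
    return sum((k / n) * (1 - k / n) for k in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum_c p_c * log2(p_c)."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def split_impurity(left, right, criterion=gini):
    """Size-weighted impurity of a candidate split (lower is better)."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)


parent = ["spam", "spam", "ham", "ham"]
print(gini(parent), entropy(parent))              # 0.5 1.0
pure = split_impurity(["spam", "spam"], ["ham", "ham"])
mixed = split_impurity(["spam", "ham"], ["spam", "ham"])
print(gini(parent) - pure, gini(parent) - mixed)  # impurity decrease: 0.5 vs 0.0
```

The tree greedily keeps the split with the largest impurity decrease, so the pure split wins here.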
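The hinge loss is usually paired with an L2 penalty on the weights, which gives the soft-margin objective. The sketch below trains a linear SVM with full-batch subgradient descent and assumes labels in {-1, +1}. It is illustrative only: real SVM libraries use dual or specialized solvers rather than this loop. `train_linear_svm`, the learning rate, and the epoch count are hypothetical choices.

```python
def decision(w, b, x):
    """Linear decision function w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin primal: 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w·x_i + b)))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))
    return reg + C * hinge


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Full-batch subgradient descent on the primal objective (labels must be +1/-1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of 0.5*||w||^2 is w
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:   # margin violated: hinge term is active
                grad_w = [g - C * yi * xij for g, xij in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
preds = [1 if decision(w, b, xi) >= 0 else -1 for xi in X]
print(preds, svm_objective(w, b, X, y))
```

Only points inside the margin contribute to the gradient. That is why the solution depends on a few "support vectors".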
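Naive Bayes can sum per-feature log-likelihoods because it assumes features are conditionally independent given the class. The smoothing parameter alpha keeps an unseen word from zeroing out a class. Below is a toy multinomial Naive Bayes text classifier that sketches both ideas in log space. The documents and helper names are invented for illustration, and words missing from the training vocabulary are simply skipped, which is an assumed convention.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and additive-smoothed log likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(labels)
    log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        # alpha > 0 keeps unseen (word, class) pairs from getting probability 0
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, tokens):
    """Pick the class maximizing log P(c) + sum_w log P(w | c)."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


docs = [["win", "money", "now"], ["win", "prize"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels, alpha=1.0)
print(predict_nb(model, ["win", "money"]), predict_nb(model, ["meeting", "notes"]))  # spam ham
```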
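XGBoost's `lambda` and `gamma` enter through a second-order approximation of the loss. This gives an optimal leaf weight $w^* = -G/(H+\lambda)$ and a split gain penalized by $\gamma$, as described in the XGBoost documentation's introduction to boosted trees. The sketch below applies this to squared-error loss, where G and H are the sums of gradients and hessians in a leaf. The starting prediction of 0 and the helper names are assumptions for illustration.

```python
def squared_error_grad_hess(y_true, y_pred):
    """For loss 0.5 * (y_pred - y)^2: gradient g = y_pred - y, hessian h = 1."""
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def leaf_weight(g, h, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -sum(g) / (sum(h) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain = 0.5 * [G_L^2/(H_L+l) + G_R^2/(H_R+l) - (G_L+G_R)^2/(H_L+H_R+l)] - gamma."""
    def score(G, H):
        return G * G / (H + reg_lambda)

    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


y = [1.0, 1.0, 5.0, 5.0]
y_hat = [0.0] * 4                       # hypothetical starting prediction
g, h = squared_error_grad_hess(y, y_hat)
print(leaf_weight(g[:2], h[:2]), leaf_weight(g[2:], h[2:]))   # shrunk toward 0 vs. means 1 and 5
print(split_gain(g[:2], h[:2], g[2:], h[2:]))
```

Larger `lambda` pulls leaf values toward zero, and a split is only worth making when its gain exceeds `gamma`.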
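K-means minimizes the within-cluster sum of squares by alternating an assignment step and an update step. The sketch below implements Lloyd's algorithm. For reproducibility it takes fixed starting centroids instead of k-means++ initialization, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, inertia = kmeans(points, init_centroids=[(1, 1), (8, 8)])
print(centroids, inertia)
```

Each step can only lower inertia, so the loop converges. It may reach a local minimum, which is why libraries restart from several initializations.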
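The `d` in ARIMA's order (p, d, q) is the number of times the series is differenced to remove trend. Seasonal differencing uses a lag equal to the season length `s`. The sketch below uses hypothetical helpers and includes the inverse step needed to turn differenced forecasts back into the original scale.

```python
def difference(series, lag=1):
    """Return y_t - y_{t-lag}; lag=1 is ARIMA's d step, lag=s is seasonal differencing."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(first_values, diffs):
    """Invert one round of differencing; first_values are the first `lag` original values."""
    out = list(first_values)
    for i, d in enumerate(diffs):
        out.append(out[i] + d)  # out[i] is the value `lag` steps before the new point
    return out


series = [10, 12, 15, 19, 24, 30]
d1 = difference(series)        # [2, 3, 4, 5, 6]: still trending
d2 = difference(d1)            # [1, 1, 1, 1]: constant, so d = 2 removes the quadratic trend
print(d1, d2, undifference(series[:1], d1))
```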