# [Linear and Quadratic Discriminant Analysis](https://scikit-learn.org/stable/modules/lda_qda.html#estimation-algorithms)

* Classifiers with closed-form solutions
* Simple: inherently multiclass & no hyperparameters to tune
* Linear (LDA) and quadratic (QDA) decision boundaries
* `LinearDiscriminantAnalysis` can be used for supervised dimensionality reduction via its `.transform` method
* `QuadraticDiscriminantAnalysis` is equivalent to `naive_bayes.GaussianNB` if the inputs are conditionally independent in each class (i.e. the class covariance matrices are diagonal)
* Regularization/Shrinkage can be used to improve the generalization performance of the classifier
    * set solver to `lsqr` or `eigen`
    * set shrinkage to `auto`
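
The points above can be sketched as follows. This is a minimal example, assuming the iris dataset as a stand-in for real data; the variable names are illustrative only. It uses the `eigen` solver because, in scikit-learn, `transform` is not available with `lsqr`.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Iris is only a convenient toy dataset (4 features, 3 classes).
X, y = load_iris(return_X_y=True)

# LDA with automatic shrinkage; shrinkage requires solver "lsqr" or "eigen".
lda = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto").fit(X, y)
lda_train_accuracy = lda.score(X, y)

# Dimensionality reduction: at most n_classes - 1 components (2 for iris).
X_reduced = lda.transform(X)

# QDA fits one covariance matrix per class, giving quadratic boundaries.
qda = QuadraticDiscriminantAnalysis().fit(X, y)
qda_train_accuracy = qda.score(X, y)

print(X_reduced.shape, lda_train_accuracy, qda_train_accuracy)
```

Training accuracy is printed only as a sanity check; a held-out split or cross-validation would be needed to judge whether shrinkage actually helps generalization.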


## [Support Vector Machines](https://scikit-learn.org/stable/modules/svm.html)
* Classification (`SVC`, `NuSVC`, `LinearSVC`), regression, outlier detection
* Support vectors -> subset of training points on or within the margin (`.support_vectors_`). The cost function for building the model does not care about training points that lie beyond the margin.
    * Margin: distance between the decision boundary and the closest data points of each class. Goal -> maximize the margin (larger margin, better generalization)
    * Decision boundary: hyperplane separating the different classes
* Good where n features > n samples
* They can handle non-linear decision boundaries (via kernels), so they are good for classifying complex data
* Binary and Multi-class classification: one-versus-one for multi-class classification - binary classifier for every possible pair of classes
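* `SVC`, `NuSVC`: one-vs-one internally; `decision_function_shape="ovr"` maps the result to a one-vs-rest `decision_function` with a per-class score
* With `probability=True`, scores are calibrated with the Platt method (logistic regression on the SVM scores, fit with internal cross-validation), but this has *limitations*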
    * Computationally expensive -> Kernel choice for optimization
    * probability estimates can be inconsistent with the scores (`predict` can give the positive class even if `.predict_proba` < 0.5)
    * "theoretical issues"
* Has the `class_weight` parameter and the `sample_weight` argument of `fit` to deal with imbalance
* Regularization is present
* *Data has to be scaled!*
* Outlier detection `OneClassSVM` -> unsupervised learning
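
A minimal sketch of the scaling, class weighting and decision-function points above, assuming a synthetic 4-class dataset from `make_blobs`; the helper `fit_svc` is hypothetical and exists only for this illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy 4-class dataset (an assumption for illustration only).
X, y = make_blobs(n_samples=200, centers=4, random_state=0)


def fit_svc(decision_function_shape):
    """Scale the features, then fit an RBF-kernel SVC on the toy data."""
    model = make_pipeline(
        StandardScaler(),  # SVMs are sensitive to feature scale
        SVC(
            kernel="rbf",
            C=1.0,  # regularization strength (smaller C = stronger)
            class_weight="balanced",  # reweight classes inversely to frequency
            decision_function_shape=decision_function_shape,
        ),
    )
    return model.fit(X, y)


ovr_model = fit_svc("ovr")
ovo_model = fit_svc("ovo")

# Typically only a subset of the training points become support vectors.
n_support_vectors = ovr_model.named_steps["svc"].support_vectors_.shape[0]

# "ovr": one score per class; "ovo": one score per pair of classes.
ovr_scores = ovr_model.decision_function(X[:5])
ovo_scores = ovo_model.decision_function(X[:5])

print(n_support_vectors, ovr_scores.shape, ovo_scores.shape)
```

With 4 classes there are 4 * 3 / 2 = 6 pairwise classifiers, so the `"ovo"` scores have 6 columns while the `"ovr"` scores have 4.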


## [Stochastic Gradient Descent](https://scikit-learn.org/stable/modules/sgd.html#stochastic-gradient-descent)
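* A way to fit a model, such as linear classifiers & regressors (e.g. linear SVM, logistic regression)
* Efficient (fast) on large-scale (>10k samples) and sparse ML problems, like text classification or NLP
* Easy to implement, with lots of room for code tuning, but requires hyperparameters (e.g. regularization parameter, number of iterations)
* **Requires feature scaling!**
* Training data needs to be shuffled (`shuffle=True`, the default, reshuffles after each epoch)
* Regression (`SGDRegressor`): good for >10k samples, otherwise use `Ridge`, `Lasso`, `ElasticNet`
* `early_stopping` -> True: holds out a validation split and stops on its score, False: fits on all data and stops on the training loss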
* For multi-class classification, a “one versus all” approach is used.
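
The settings listed above might be combined as in the sketch below. The synthetic dataset and all hyperparameter values are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem (illustrative only).
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=5,
    n_classes=3,
    random_state=0,
)

sgd_clf = make_pipeline(
    StandardScaler(),  # SGD is sensitive to feature scale
    SGDClassifier(
        loss="hinge",  # hinge loss gives a linear SVM
        penalty="l2",  # regularization term
        alpha=1e-4,  # regularization strength
        shuffle=True,  # reshuffle the training data after each epoch
        early_stopping=True,  # hold out part of the data for validation
        validation_fraction=0.1,
        n_iter_no_change=5,
        max_iter=1000,
        random_state=0,
    ),
)
sgd_clf.fit(X, y)

sgd = sgd_clf.named_steps["sgdclassifier"]
# One weight vector per class: the multi-class model is a set of
# binary "one versus all" classifiers.
print(sgd.coef_.shape, sgd.n_iter_)
```

Putting the scaler inside the pipeline keeps the scaling statistics tied to the training data, which matters once the model is evaluated on a separate test set.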


## [Nearest Neighbors](https://scikit-learn.org/stable/modules/neighbors.html)
* unsupervised and supervised neighbors-based learning
* classification & regression
* Find n closest instances to the new point and predict label from these
    * kNN -> k nearest, user defined
    * Number based on local density -> radius-based
* Classification: non-generalizing (instance-based) ML model -> "stores" all training data & does not build a general internal model. A new point is then classified by a majority vote of its nearest neighbors
* Often successful when the decision boundary is very irregular
* **Unsupervised** NN:
    * brute-force (`sklearn.metrics.pairwise`): computes the distances between all pairs of points
    * `KDTree`: if point A is very distant from point B, and point B is very close to point C, then we know that points A and C are very distant, without computing their distance explicitly.
    * `BallTree`: recursively divides data into nodes defined by a centroid C and radius r
* Best for small or medium-sized data
* Nearest Neighbors Regression use-cases
    * Impute missing values
    * time series forecasting
    * geographical/spatial data
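
A short sketch of the neighbor search, classification and imputation ideas above. The iris dataset and the small array with missing values are assumptions used only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = load_iris(return_X_y=True)

# Unsupervised search: the algorithms differ in speed, not in the
# neighbor distances they return.
neighbor_distances = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=3, algorithm=algorithm).fit(X)
    distances, _ = nn.kneighbors(X[:5])
    neighbor_distances[algorithm] = distances

# Supervised: predict the label by majority vote among the k nearest points.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
knn_train_accuracy = knn.score(X, y)

# Imputation: fill each missing value with the mean of that feature
# over the 2 nearest rows that have it.
X_missing = np.array(
    [
        [1.0, 2.0, np.nan],
        [3.0, 4.0, 3.0],
        [np.nan, 6.0, 5.0],
        [8.0, 8.0, 7.0],
    ]
)
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X_missing)

print(knn_train_accuracy)
print(X_imputed)
```

On a dataset this small the choice of `algorithm` makes no practical difference; the tree-based structures are meant to pay off as the number of samples grows.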


## SVM vs kNN
- KNN can handle <span style="color:#1df5b4;">sparse data</span> since it works directly with the distances between data points. SVM can overfit or produce suboptimal decision boundaries when the number of features is much greater than the number of samples, unless the kernel and regularization term are chosen carefully.
- KNN's instance-based approach allows it to leverage <span style="color:#1df5b4;">local</span> structures in sparse data effectively. It works particularly well when local neighborhoods carry significant information about class membership. In image recognition, slight variations in pixel values may lead to different classes, and KNN can capture these local nuances. SVM, in such cases, might form more global boundaries that overlook fine-grained local patterns.
- KNN can effectively <span style="color:#1df5b4;">impute</span> missing values based on the nearest neighbors, allowing for flexible data handling. 
- SVM is effective in <span style="color:#1df5b4;">high-dimensional</span> spaces due to its focus on maximizing the margin between classes. KNN can struggle with the curse of dimensionality, where all points seem equidistant.
- SVM works on the principle of maximizing the margin and has well-defined theoretical foundations that help with <span style="color:#1df5b4;">generalization</span>.
- SVM can be adapted to <span style="color:#1df5b4;">multi-class</span> classification effectively through strategies like “one-vs-one” or “one-vs-all.” These adaptations can lead to good performance even in multi-class settings. KNN handles multi-class problems natively, but prediction can become computationally expensive on large training sets because each query has to search the stored training data for its neighbors.
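
Which of the two models works better is ultimately an empirical question. As a rough sketch, assuming the digits dataset (64 features, 10 classes) and default hyperparameters, both can be compared with the same cross-validation setup; the names in `candidates` are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Both models depend on distances or margins, so both get the same scaling.
candidates = {
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "knn_5": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean accuracy over 5 cross-validation folds for each model.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, accuracy in cv_accuracy.items():
    print(f"{name}: {accuracy:.3f}")
```

The ranking on one dataset says little about another, so the comparison is best repeated on the data of interest, ideally with tuned hyperparameters for both models.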



NEXT: https://scikit-learn.org/stable/modules/gaussian_process.html