k-Nearest Neighbors (Non-parametric Model)

1) kNN Algorithm
![knn_classify](knn_classification.png)
* kNN Classification steps:
    1. For each data point in the training set, calculate the distance from that point to the new data point
    2. Sort the distances in increasing order and keep the first $k$ (the $k$ nearest neighbors)
    3. Assign the label with the most votes among those $k$ neighbors
* **kNN Regression** - predict the mean value of the k nearest neighbors
    * the neighbors' values are averaged (or combined with another function, e.g. a distance-weighted mean) to give a continuous prediction
* **kNN Imputation** - fill in a missing value using the corresponding values of the k nearest neighbors
* **kNN Anomaly Detection** - an outlier has large distances to its nearest neighbors
    * if even the nearest neighbor of a new data point is far away, the point is likely an outlier
* Determining K:
    * use k-fold cross-validation to pick the $k$ with the best validation performance (the number of folds is unrelated to the $k$ in kNN)
![kfold_cv_knn](kfold_cv_knn.png)
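
The steps and variants above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the toy data, and the interleaved fold assignment in `cv_accuracy` are all assumptions made for the example, and `math.dist` requires Python 3.8+.

```python
import math
from collections import Counter
from statistics import mean


def k_nearest(train_X, train_y, query, k):
    """Return the k (distance, target) pairs closest to `query`."""
    # Step 1: distance from every training point to the new point
    pairs = [(math.dist(x, query), t) for x, t in zip(train_X, train_y)]
    # Step 2: sort by increasing distance and keep the first k
    return sorted(pairs, key=lambda p: p[0])[:k]


def knn_classify(train_X, train_y, query, k):
    # Step 3: the label with the most votes among the k neighbors wins
    votes = Counter(t for _, t in k_nearest(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k):
    # Regression: average the targets of the k nearest neighbors
    return mean(t for _, t in k_nearest(train_X, train_y, query, k))


def knn_anomaly_score(train_X, query, k):
    # Anomaly detection: mean distance to the k nearest neighbors
    # (targets are not needed, so placeholders are passed instead)
    placeholders = [None] * len(train_X)
    return mean(d for d, _ in k_nearest(train_X, placeholders, query, k))


def cv_accuracy(X, y, k, n_folds=3):
    """Accuracy of kNN with a given k, estimated by n_folds-fold CV."""
    correct = 0
    for fold in range(n_folds):
        # Hypothetical fold assignment: every n_folds-th point is held out
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        for i in test_idx:
            correct += knn_classify(train_X, train_y, X[i], k) == y[i]
    return correct / len(X)


# Toy data invented for illustration: two well-separated clusters
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

print(knn_classify(X, labels, (1.1, 1.0), k=3))  # red
print(knn_regress(X, values, (1.1, 1.0), k=3))
print(knn_anomaly_score(X, (10.0, 10.0), k=3))

# Pick the candidate k with the highest cross-validated accuracy
best_k = max([1, 3], key=lambda k: cv_accuracy(X, labels, k))
print(best_k)
```

On real data the candidate list for `k` would be wider, and ties in the vote would need an explicit rule; here `Counter.most_common` simply returns the label it saw first.
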
* Feature Scaling:
![knn_feature_scaling](knn_feature_scaling.png)
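    * Scale the features (e.g. standardize them) before computing distances; otherwise features with large numeric ranges dominate the distance
    * Similar to Ridge Regression, where the penalty term is also sensitive to the scale of each feature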
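
A small sketch of why scaling matters, using made-up (age, income) rows. The helper names and the data are assumptions for illustration; z-score standardization is only one possible scaling choice, and this version assumes no feature has zero variance.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std. dev."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # assumes every std > 0
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]


def nearest_index(points, query_index):
    """Index of the point closest (Euclidean) to points[query_index]."""
    q = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    return min(others, key=lambda i: math.dist(points[i], q))


# Hypothetical rows: (age in years, income in dollars)
rows = [(25, 50000), (60, 50500), (27, 52000), (45, 80000)]

# Unscaled: income dominates, so the 60-year-old looks "closest" to row 0
print(nearest_index(rows, 0))               # 1
# Scaled: age counts too, so the 27-year-old becomes the nearest neighbor
print(nearest_index(standardize(rows), 0))  # 2
```
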

2) Distance Metrics:
1. **Euclidean distance** - straight-line distance between two points
![euclid_dist](http://rosalind.info/media/Euclidean_distance.png)
    * equation: $dist(a,b) = \Vert a-b \Vert = \sqrt{(a_1-b_1)^2+(a_2-b_2)^2+\cdots+(a_n-b_n)^2} = \sqrt{\sum_{i=1}^n (a_i-b_i)^2}$
2. **Cosine similarity** - cosine of the angle between two non-zero vectors; it depends only on their directions, not their magnitudes
![cosine_sim](https://engineering.aweber.com/wp-content/uploads/2013/02/4AUbj.png)
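    * equation: $sim(a,b) = \frac{a \cdot b}{\Vert a \Vert \Vert b \Vert} = \frac{\sum_{i=1}^n a_i b_i}{\sqrt{\sum_{i=1}^n a_i^2}\sqrt{\sum_{i=1}^n b_i^2}}$
    * this is a similarity (larger means closer), so use $dist(a,b) = 1 - sim(a,b)$ when a distance is needed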
3. **Manhattan distance** - sum of the absolute differences along each coordinate axis (the lengths of the projections of the line segment between the points onto the axes)
![manhat_dist](https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Manhattan_distance.svg/1200px-Manhattan_distance.svg.png)   
    * equation: $dist(a,b) = \Vert a-b \Vert_1 = \sum_{i=1}^n |a_i-b_i|$
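
The three formulas above translate directly into code. This is a minimal sketch with illustrative function names; it assumes both vectors have the same length and, for the cosine, that neither is the zero vector.

```python
import math


def euclidean_distance(a, b):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def manhattan_distance(a, b):
    # Sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = (3.0, 4.0), (6.0, 8.0)
print(euclidean_distance(a, b))  # 5.0
print(cosine_similarity(a, b))   # 1.0 (same direction)
print(manhattan_distance(a, b))  # 7.0
```

The example vectors point in the same direction, so their cosine similarity is 1 even though their Euclidean and Manhattan distances are non-zero.
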

3) Curse of Dimensionality / Pros and Cons
* **Curse of Dimensionality** - as the dimensionality increases, performance of kNN commonly decreases
    * In high dimensions, even the nearest neighbors are no longer nearby
    * Adding useful features (truly associated with the response) is generally helpful, but noise features increase dimensionality without the upside
    * Intuition: consider a particular red hypercube inside a unit-hypercube space (black)
        * how long would we need to make one side of the red hypercube in order to capture 10% of the volume of the space (black)?
![curs_dim_knn](curse_of_dim_knn.png) 
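
The hypercube question has a closed form: a cube with side $s$ in $d$ dimensions has volume $s^d$, so capturing a fraction $p$ of the unit hypercube requires $s = p^{1/d}$. A short sketch (the function name is made up for this example) shows how quickly that side length approaches 1:

```python
def side_length(volume_fraction, n_dims):
    """Side of a hypercube holding `volume_fraction` of a unit hypercube."""
    return volume_fraction ** (1.0 / n_dims)


for d in (1, 2, 3, 10, 100):
    print(f"d={d:>3}: side = {side_length(0.10, d):.3f}")
```

To capture 10% of the volume, the side must span about 0.32 of each axis in 2 dimensions, about 0.79 in 10 dimensions, and about 0.98 in 100 dimensions, so a "local" neighborhood ends up covering almost the full range of every feature.
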
* kNN Strengths and Weaknesses:
    * (+) really easy to train, store and update the model (it just saves all the training data)
    * (+) easily works with any number of classes
    * (+) easy to add new training data points
    * (+) can learn a very complex function
    * (+) no demands on relationships between variables (e.g. linearity assumption)
    * (+) only a few hyperparameters to tune (e.g. k, distance metric)
    * (-) really slow to predict: every query is compared against all stored training points (especially costly with a lot of features)
    * (-) I/O bound - the cost is reading through the data, not the calculation
    * (-) noise can affect results
        * particularly irrelevant dimensions
        * correlated variables are acceptable, but it is best to remove redundant ones; otherwise the dimensionality price will be high
    * (-) feature interpretation can be tricky
        * categorical: how to calculate distance?
        * feature scaling makes variables less interpretable
