# Unsupervised Learning

## Clustering

### Types of Unsupervised Learning
There are two popular types of unsupervised machine learning:

1. Clustering - which groups data together based on similarities

2. Dimensionality Reduction - which condenses a large number of features into a (usually much) smaller set of features.

## K-Means

The K-Means algorithm is used to cluster all sorts of data.

It can group together, for example:

1. Books of similar genres or by the same author.
2. Similar movies.
3. Similar music.
4. Similar groups of customers.

These clusters can then drive product, movie, music, and other types of recommendations.

In the K-means algorithm, __'k' represents the number of clusters you want to find in your dataset__. You choose k before running the algorithm.

### Elbow Method
When you have no idea how many clusters exist in your dataset, a common strategy for choosing __k__ is the __elbow method__. In the elbow method, you plot the number of clusters (on the x-axis) against the average distance from each point to its cluster center (on the y-axis). This plot is called a __scree plot__.

The average distance generally decreases with each additional cluster center, but the decreases are largest when there are only a few clusters. At some point, adding new clusters no longer produces a substantial decrease in the average distance. This point is known as the __elbow__.

![elbowMethod](./img/elbowMethod.png)
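To produce the numbers behind a plot like the one above, you can run k-means for each candidate k and record the average distance from each point to its cluster center. The sketch below is a minimal illustration, not a reference implementation: the helper names (`kmeans`, `average_distance`, `elbow_curve`), the use of random data points as starting centroids, the five seeded restarts per k, and the nine sample points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, seed=None, max_iter=100):
    """Minimal k-means (hypothetical helper): random data points as starting centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its closest centroid.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids will not move either
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def average_distance(points, centroids, labels):
    """Mean Euclidean distance from each point to the centroid of its cluster."""
    return sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)


def elbow_curve(points, max_k, n_runs=5):
    """Lowest average distance found for each k = 1..max_k (n_runs seeded starts per k)."""
    curve = []
    for k in range(1, max_k + 1):
        scores = []
        for seed in range(n_runs):
            centroids, labels = kmeans(points, k, seed=seed)
            scores.append(average_distance(points, centroids, labels))
        curve.append(min(scores))
    return curve


# Three illustrative groups of 2-D points (made-up values).
data = [(0.0, 0.0), (0.5, 0.3), (0.2, 0.6),
        (10.0, 0.0), (10.4, 0.5), (9.7, 0.2),
        (0.0, 10.0), (0.3, 10.5), (0.6, 9.8)]

for k, score in enumerate(elbow_curve(data, max_k=5), start=1):
    print(f"k={k}: average distance {score:.3f}")
```

Plotting these (k, average distance) pairs gives the scree plot; for data like this you would look for the bend near k = 3.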

### How Does K-Means Work?

Here is one method for computing k-means:

1. Randomly place k centroids amongst your data.

Then repeat the following two steps until convergence (that is, until no point changes cluster):

2. Assign each point to the closest centroid.

3. Move each centroid to the center (mean) of the points assigned to it.

At the end of this process, you should have k clusters of points.
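The three steps above can be sketched in plain Python using only the standard library. This is a minimal sketch under stated assumptions rather than a reference implementation: "randomly place k centroids" is interpreted here as picking k distinct data points at random, convergence means no point changed cluster, and an empty cluster simply keeps its previous centroid. The function names and the sample points are hypothetical.

```python
import math
import random


def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, seed=None, max_iter=100):
    """Hypothetical helper: basic k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    # Step 1 (assumption): use k distinct, randomly chosen data points as centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        new_labels = [closest_centroid(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean_point(members)
    return centroids, labels


# Two well-separated groups of 2-D points (illustrative values).
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(data, k=2, seed=42)
print(centroids)
print(labels)
```

The `max_iter` cap is a safety net; on small, well-separated data like this the loop stops after a few passes.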

The [blog by Naftali Harris](https://www.naftaliharris.com/blog/visualizing-k-means-clustering/) is spectacular at showing you how k-means works for a number of situations.

The starting positions of the centroids can change the final clusters that the k-means algorithm produces.

To improve your chances of finding the "best" set of clusters, the algorithm above is usually run several times from different starting points. The best set of clusters is then the one with the smallest average distance from each point to its corresponding centroid.
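The restart strategy can be sketched as a loop over seeds that keeps the run with the lowest average distance. As before, this is a self-contained illustration with hypothetical helper names and made-up points; seeding each run with `0..n_runs-1` is an assumption chosen so the result is reproducible. For comparison, scikit-learn's `KMeans` exposes a similar idea through its `n_init` parameter, though it ranks runs by inertia (the sum of squared distances) rather than the average distance used here.

```python
import math
import random


def kmeans(points, k, seed=None, max_iter=100):
    """Minimal k-means (hypothetical helper): one run from one random start."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random data points as starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def average_distance(points, centroids, labels):
    """Mean Euclidean distance from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)


def best_of_n_runs(points, k, n_runs=10):
    """Run k-means with seeds 0..n_runs-1 and keep the lowest-average-distance result."""
    best = None
    for seed in range(n_runs):
        centroids, labels = kmeans(points, k, seed=seed)
        score = average_distance(points, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best


data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
score, centroids, labels = best_of_n_runs(data, k=2, n_runs=5)
print(f"best average distance: {score:.3f}")
print(labels)
```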

## Feature Scaling
For any machine learning algorithm that uses distances as a part of its optimization, it is important to scale your features.

You saw this earlier in regularized forms of regression like Ridge and Lasso, but it is also true for k-means. In future sections on PCA and ICA, feature scaling will again be important for the successful optimization of your machine learning algorithms.
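A small worked example shows why unscaled features distort distances. The three customers below, with their ages and incomes, are made-up values for illustration only. Because income is measured in dollars and age in years, income differences dominate the Euclidean distance.

```python
import math

# Hypothetical customers: (age in years, annual income in dollars).
customer_a = (25, 50_000)
customer_b = (65, 50_500)  # 40 years older, almost the same income
customer_c = (26, 58_000)  # 1 year older, $8,000 more income

d_ab = math.dist(customer_a, customer_b)
d_ac = math.dist(customer_a, customer_c)

print(f"a-b: {d_ab:.1f}")  # → a-b: 501.6
print(f"a-c: {d_ac:.1f}")  # → a-c: 8000.0
```

Without scaling, k-means would treat customer A as far more similar to B than to C, even though A and B differ by 40 years of age; the age feature barely affects the result.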

There are many ways to scale your features, but two are most common:

1. __Normalizing__ or __Max-Min Scaling__ - this type of scaling transforms each variable's values to lie between 0 and 1 (the minimum becomes 0 and the maximum becomes 1).
2. __Standardizing__ or __Z-Score Scaling__ - this type of scaling transforms variable values so they have a mean of 0 and standard deviation of 1.
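Both scalings are applied to one feature (column) at a time. Max-min scaling computes (x - min) / (max - min), and z-score scaling computes (x - mean) / std. The sketch below uses only the standard library; the function names are hypothetical, the use of the population standard deviation (`statistics.pstdev`) is one common choice among several, and mapping a constant column to all zeros is an assumption made to avoid dividing by zero.

```python
import statistics


def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Standardize to mean 0 and standard deviation 1: (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(v - mean) / std for v in values]


print(min_max_scale([2, 4, 6, 10]))  # → [0.0, 0.25, 0.5, 1.0]
print(z_score_scale([2, 4, 4, 4, 5, 5, 7, 9]))
```

In practice you would compute the min, max, mean, and standard deviation on your training data and reuse them when scaling new data, so both are transformed the same way.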
