# Unsupervised Learning

The most common unsupervised learning tasks are:
- Dimensionality Reduction
- Clustering
- Anomaly Detection
- Density Estimation
- Association Rule Learning

## Clustering Algorithms

Clustering is the task of identifying similar instances and grouping them into clusters.

### K-means Clustering

In [5]:
import numpy as np

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# extra code – the exact arguments of make_blobs() are not important
blob_centers = np.array([[ 0.2,  2.3], [-1.5 ,  2.3], [-2.8,  1.8],
                         [-2.8,  2.8], [-2.8,  1.3]])
blob_std = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
# Make the blobs. y contains the cluster IDs, but we won't use it,
# because that's what we want to predict.
X, y = make_blobs(n_samples=2000, centers=blob_centers, cluster_std=blob_std,
                  random_state=7)

k = 5
kmeans = KMeans(n_clusters=k, random_state=42)
y_pred = kmeans.fit_predict(X)

Each instance is assigned a label, but that label is only the index of the cluster the instance belongs to.
<div class="alert alert-block alert-danger">
It is not the true class label, as it would be in classification.<br> Remember, this is unsupervised learning!
</div>
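Because the cluster index is only a name, renumbering the clusters leaves the grouping unchanged. The sketch below illustrates this on a small synthetic dataset. The blob centers, the permutation, and all variable names are assumptions made up for the example, and the true IDs are used only to make the comparison. A naive element-wise comparison with the true IDs depends on the arbitrary numbering, while a permutation-invariant measure such as `adjusted_rand_score` does not.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated blobs (illustrative values, not the dataset above)
demo_centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
X_demo, y_true = make_blobs(n_samples=300, centers=demo_centers,
                            cluster_std=0.5, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X_demo)

# Rename the clusters: same grouping, different index for each cluster
permutation = np.array([2, 0, 1])
renamed_ids = permutation[cluster_ids]

# Element-wise agreement with the true IDs depends on the arbitrary numbering
naive_match = np.mean(cluster_ids == y_true)
naive_match_renamed = np.mean(renamed_ids == y_true)

# The adjusted Rand index compares groupings, so renaming does not affect it
ari = adjusted_rand_score(y_true, cluster_ids)
ari_renamed = adjusted_rand_score(y_true, renamed_ids)

print("naive match:", naive_match, "after renaming:", naive_match_renamed)
print("ARI:", ari, "after renaming:", ari_renamed)
```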

In [7]:
y_pred is kmeans.labels_

True

In [8]:
y_pred

array([2, 2, 4, ..., 1, 4, 2], dtype=int32)

In [9]:
kmeans.cluster_centers_

array([[-0.066884  ,  2.10378803],
       [-2.79290307,  2.79641063],
       [-2.80214068,  1.55162671],
       [-1.47468607,  2.28399066],
       [ 0.47042841,  2.41380533]])

In [11]:
# Assign new instances to the cluster whose centroid is closest
X_new = np.array([[0, 2], [3, 2], [-3, 3], [-3, 2.5]])
kmeans.predict(X_new)

array([0, 4, 1, 1], dtype=int32)
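To make the assignment rule explicit, the following sketch recomputes the predictions by hand: each new instance goes to the cluster whose centroid is nearest in Euclidean distance. It refits a small model on its own synthetic blobs so that it runs standalone. The data and the names `X_demo`, `km`, and `manual_labels` are assumptions for illustration, not the model fitted above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Small illustrative dataset and model (not the one fitted above)
X_demo, _ = make_blobs(n_samples=500, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

X_new = np.array([[0, 2], [3, 2], [-3, 3], [-3, 2.5]])

# Pairwise differences via broadcasting: (4, 1, 2) - (1, 3, 2) -> (4, 3, 2)
diffs = X_new[:, np.newaxis, :] - km.cluster_centers_[np.newaxis, :, :]
dists = np.linalg.norm(diffs, axis=2)  # Euclidean distances, shape (4, 3)

# Index of the nearest centroid for each new instance
manual_labels = dists.argmin(axis=1)

print(manual_labels)
print(km.predict(X_new))
```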

Hard clustering assigns each instance to exactly one cluster.<br>
Soft clustering instead gives each instance a score per cluster that indicates how well it fits that cluster, for example its distance to each centroid.
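For `KMeans`, one such per-cluster score is the distance from an instance to every centroid, which the `transform()` method returns. The sketch below is self-contained and uses its own synthetic blobs. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Small illustrative dataset and model (not the one fitted above)
X_demo, _ = make_blobs(n_samples=500, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

X_new = np.array([[0, 2], [3, 2], [-3, 3], [-3, 2.5]])

# One row per instance, one column per cluster:
# entry [i, j] is the distance from instance i to centroid j
soft_scores = km.transform(X_new)
print(soft_scores.round(2))

# Taking the smallest distance in each row recovers the hard assignment
hard_labels = soft_scores.argmin(axis=1)
print(hard_labels)
```

Smaller values mean a better fit here, since the scores are distances rather than similarities.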