# Unsupervised Learning, Recommenders, Reinforcement Learning

## Unsupervised Learning
- Clustering
- Anomaly detection

## Recommender systems
- For example, advertising platforms use recommender systems to show ads based on audience preferences

## Reinforcement Learning
- Relatively new, with fewer commercial applications so far


## Clustering
- In unsupervised learning we only have inputs but no output labels
- Used to find interesting structure in the data, such as whether the data can be grouped into clusters
- Applications
  - Grouping similar news
  - Market segmentation
  - DNA Analysis
  - Astronomical data analysis

## K-means clustering algorithm
- Start with a random guess for the centers of the clusters, called cluster centroids
- Go through the data and assign each point to the centroid it is closest to
- Recompute each centroid as the average (mean) of all points assigned to it
- Reassign each point to its closest centroid
- Repeat until the assignments no longer change, i.e. the algorithm has converged

#### Pseudocode of the K-means algorithm

    centroids = kMeans_init_centroids(X, K)

    for iter in range(iterations):
        idx = find_closest_centroids(X, centroids)
        centroids = compute_centroids(X, idx, K)

The inner loop of the algorithm repeatedly carries out two steps:
- Assigning each training example $x^{(i)}$ to its closest centroid, and
- Recomputing the mean of each centroid using the points assigned to it.
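
The pseudocode above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names follow the pseudocode, but the extra `old_centroids` argument (used to keep a centroid in place when its cluster is empty), the `run_kmeans` wrapper, the convergence check, and the `seed` parameter are assumptions added here.

```python
import numpy as np

def kmeans_init_centroids(X, K, rng):
    """Pick K distinct training examples as the initial centroids."""
    chosen = rng.choice(X.shape[0], size=K, replace=False)
    return X[chosen].copy()

def find_closest_centroids(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # dists[i, k] = squared Euclidean distance from X[i] to centroids[k]
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def compute_centroids(X, idx, K, old_centroids):
    """Move each centroid to the mean of the points assigned to it."""
    centroids = old_centroids.copy()
    for k in range(K):
        members = X[idx == k]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            centroids[k] = members.mean(axis=0)
    return centroids

def run_kmeans(X, K, iterations=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = kmeans_init_centroids(X, K, rng)
    for _ in range(iterations):
        idx = find_closest_centroids(X, centroids)
        new_centroids = compute_centroids(X, idx, K, centroids)
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    return centroids, find_closest_centroids(X, centroids)

# Two small, well-separated groups of points (made-up data).
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centroids_demo, idx_demo = run_kmeans(X_demo, K=2)
print(centroids_demo, idx_demo)
```

Which cluster ends up with index 0 or 1 depends on the random initialization, so only the grouping is meaningful, not the label values.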


#### Actual implementation steps
- Randomly initialize K cluster centroids $μ_1, μ_2, ..., μ_K$
- Repeat {
  - Assign points to cluster centroids
  - for i = 1 to m
    - $c^{(i)}$ := index (from 1 to K) of the cluster centroid closest to $x^{(i)}$, i.e. $c^{(i)} := \arg\min_k ||x^{(i)} - μ_k||^2$
  - Move cluster centroids
  - for k = 1 to K
    - $μ_k$ := average (mean) of points assigned to cluster k
}


#### Optimization
- Cost function : Distortion function
  - $c^{(i)}$ = index of cluster (1, 2, ..., K) to which example $x^{(i)}$ is currently assigned
  - $μ_k$ = cluster centroid k
  - $μ_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
  - $J(c^{(1)}, ..., c^{(m)}, μ_1, ..., μ_K) = \frac{1}{m}∑_{i=1}^m ||x^{(i)}- μ_{c^{(i)}} ||^2$
  - Repeat {
    - Assign points to cluster centroids
    - for i=1 to m
        - $c^{(i)}$ := index of cluster centroid closest to $x^{(i)}$
    - Move cluster centroids
    - for k=1 to K
      - $μ_k$ := average of points in cluster k
  }
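
The two steps relate to the cost function as follows: the assignment step minimizes J over the $c^{(i)}$ with the centroids held fixed, and the move step minimizes J over the $μ_k$ with the assignments held fixed, so J should never increase from one iteration to the next. The sketch below computes the distortion for a tiny made-up dataset and shows one move step lowering it. The helper name `distortion` and the data are assumptions for illustration.

```python
import numpy as np

def distortion(X, idx, centroids):
    """J = (1/m) * sum_i ||x_i - mu_{c_i}||^2."""
    diffs = X - centroids[idx]  # centroids[idx][i] is the centroid assigned to example i
    return float((diffs ** 2).sum(axis=1).mean())

# Made-up data: three points on a line, two centroids.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
centroids = np.array([[0.0, 0.0], [2.0, 0.0]])
idx = np.array([0, 1, 1])  # each point already assigned to its closest centroid

j_before = distortion(X, idx, centroids)

# Move step: each centroid becomes the mean of its assigned points.
moved = np.array([X[idx == k].mean(axis=0) for k in range(len(centroids))])
j_after = distortion(X, idx, moved)

print(j_before, j_after)  # j_after is no larger than j_before
```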

#### Initializing K-means
  - How to choose the initial centroid locations
  - Randomly pick K training examples and set the centroids equal to those examples
  - Repeat with several different random initializations, running K-means each time
  - Compute the cost function J for each run and keep the clustering with the lowest cost
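
A minimal sketch of the multiple-initialization procedure, assuming NumPy. The names `kmeans_once` and `kmeans_best_of`, the default of 20 restarts, and the fixed iteration count are illustrative choices, not values prescribed by these notes.

```python
import numpy as np

def kmeans_once(X, K, rng, iterations=50):
    """One K-means run from a random initialization; returns (centroids, idx, cost)."""
    # Initialize centroids at K distinct training examples.
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iterations):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        for k in range(K):
            if np.any(idx == k):  # leave an empty cluster's centroid where it is
                centroids[k] = X[idx == k].mean(axis=0)
    # Final assignment and distortion J for the final centroids.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)
    cost = float(d[np.arange(len(X)), idx].mean())
    return centroids, idx, cost

def kmeans_best_of(X, K, n_init=20, seed=0):
    """Run K-means n_init times and keep the run with the lowest cost."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, K, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])
```

A call such as `centroids, idx, cost = kmeans_best_of(X, K=3)` returns the best of the random restarts. More restarts cost more compute but make it less likely that a poor local optimum is kept.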


#### Choosing the number of clusters
- Elbow method
  - Run K-means for various values of K (number of clusters) and calculate the cost function for each
  - Plot the cost function against the number of clusters
  - Pick the value of K at the elbow of the curve
  - Don't choose K just to minimize the cost function, since the cost generally keeps decreasing as K increases
- Other technique
  - Evaluate K-means based on how well it performs for the later (downstream) purpose the clusters are used for
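
The elbow method can be sketched as below, assuming NumPy. The helper names `kmeans_cost` and `elbow_costs`, the number of restarts per K, and the demo data are assumptions for illustration. The function only returns the cost for each K; plotting those values against K and reading off the elbow is left to whichever plotting tool is in use.

```python
import numpy as np

def kmeans_cost(X, K, rng, iterations=50):
    """Run K-means once from a random initialization and return the final distortion J."""
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iterations):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        for k in range(K):
            if np.any(idx == k):  # leave an empty cluster's centroid where it is
                centroids[k] = X[idx == k].mean(axis=0)
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())

def elbow_costs(X, k_values, n_init=10, seed=0):
    """Map each K to the lowest cost found over n_init random restarts."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    return {K: min(kmeans_cost(X, K, rng) for _ in range(n_init)) for K in k_values}

# Made-up data with two obvious groups; the cost should drop sharply at K = 2.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
for K, J in elbow_costs(X_demo, range(1, 7)).items():
    print(K, round(J, 3))
```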



# Anomaly Detection
- Finding unusual events
- Density Estimation
- Example use cases
  - Fraud detection
  - Manufacturing
  - Monitoring computers in a data center


#### Gaussian distribution / Normal Distribution
- If x is a real-valued random variable, its probability density p(x) is given by a Gaussian with mean μ and variance $σ^2$
- $p(x)=\frac{1}{\sqrt{2\pi}σ}e^{\frac{-(x-μ)^2}{2σ^2}}$
- $μ = \frac{1}{m}∑_{i=1}^mx^{(i)}$
- $σ^2 = \frac{1}{m}∑_{i=1}^m(x^{(i)}-μ)^2$
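
The formulas above translate directly into NumPy. This is a small sketch with made-up sample values; the function names `estimate_gaussian_1d` and `gaussian_pdf` are assumptions for illustration. Note that the variance uses the 1/m form from the notes rather than 1/(m-1).

```python
import numpy as np

def estimate_gaussian_1d(x):
    """Estimate mu and sigma^2 from samples using the 1/m formulas."""
    mu = x.mean()
    var = ((x - mu) ** 2).mean()
    return mu, var

def gaussian_pdf(x, mu, var):
    """Gaussian density p(x) with mean mu and variance var."""
    return np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

samples = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
mu, var = estimate_gaussian_1d(samples)
print(mu, var, gaussian_pdf(mu, mu, var))  # the density is highest at x = mu
```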

## Anomaly detection algorithm
- Training set: {$\vec x^{(1)}, \vec x^{(2)}, ..., \vec x^{(m)}$}
- Each example $\vec x^{(i)}$ has n features
- $p(\vec x) = p(x_1; μ_1, σ_1^2)*p(x_2; μ_2, σ_2^2)*p(x_3; μ_3, σ_3^2)* ... * p(x_n; μ_n, σ_n^2)$
- Also written as: $p(\vec x) = ∏_{j=1}^n p(x_j; μ_j, σ_j^2)$
- Algorithm
  - Choose n features $x_j$ that you think might be indicative of anomalous examples
  - Fit parameters $μ_1, ..., μ_n, σ_1^2, ..., σ_n^2$
    - $μ_j = \frac{1}{m}∑_{i=1}^mx_j^{(i)}$
    - $σ_j^2 = \frac{1}{m}∑_{i=1}^m(x_j^{(i)}-μ_j)^2$
    - Vectorized: $\vec μ = \frac{1}{m}∑_{i=1}^m\vec x^{(i)}$
  - Given a new example $\vec x$, compute $p(\vec x)$
    - $p(\vec x) = ∏_{j=1}^n p(x_j; μ_j, σ_j^2) = ∏_{j=1}^n\frac{1}{\sqrt{2\pi}σ_j}e^{\frac{-(x_j-μ_j)^2}{2σ_j^2}}$
  - Flag $\vec x$ as an anomaly if $p(\vec x)<ϵ$
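
The algorithm above can be sketched in NumPy as follows. The function names, the synthetic training data, and the threshold value `epsilon = 1e-4` are assumptions for illustration only; these notes do not say how ϵ should be chosen, so the value here is arbitrary.

```python
import numpy as np

def fit_gaussians(X):
    """Fit a per-feature mean and variance (1/m form) from training data X of shape (m, n)."""
    mu = X.mean(axis=0)
    var = ((X - mu) ** 2).mean(axis=0)
    return mu, var

def density(X, mu, var):
    """p(x) = product over features j of the Gaussian density p(x_j; mu_j, var_j)."""
    per_feature = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return per_feature.prod(axis=1)

def is_anomaly(X, mu, var, epsilon):
    """Flag each row of X whose density falls below epsilon."""
    return density(X, mu, var) < epsilon

# Synthetic "normal" training data: 500 examples, 2 features (made up for this sketch).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
mu, var = fit_gaussians(X_train)

X_new = np.array([[0.0, 0.0],     # close to the training data
                  [10.0, 10.0]])  # far from the training data
epsilon = 1e-4                    # arbitrary threshold, an assumption
print(is_anomaly(X_new, mu, var, epsilon))
```

With many features the product of densities can underflow to zero, so summing log-densities and comparing against log ϵ is a common variation on the same idea.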