# LLE (2000)
LLE = Locally Linear Embedding

http://www.robots.ox.ac.uk/~az/lectures/ml/lle.pdf

The algorithm consists of 3 steps:
1. Global Geometry => we compute distances between all pairs of data points to find the K nearest neighbors of each point
2. Local Geometry => we describe each data point as a linear combination of its K nearest neighbors
3. Preserving Projection => we look for projections that preserve those linear combinations in a lower-dimensional space

Why is it Locally Linear Embedding?

It's an **embedding** because we are looking for a description of the data in a reduced latent space.

It's **local** because it uses only the local structure of the data (each point and its nearest neighbors).

It's **linear** because we try to preserve linear approximations of data points.

Here is the visualization by the authors
<img src="lle1.png" width=500>

To make the preserved structure density-independent, for each point they:
- convert neighbors to barycentric coordinates: subtract the point from its neighbors
- normalize the linear combination coefficients: make them sum to one

### In more detail

Suppose we know the nearest neighbors of each point.
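If the neighbors are not known yet, they can be found by brute force from the pairwise distances. The sketch below is only an illustration: the function name `knn_indices` and the toy data are assumptions, and Euclidean distance is assumed as the metric.

```python
import numpy as np

def knn_indices(X, k):
    """Return an (n, k) array with the indices of the k nearest neighbors of each row of X."""
    # Squared Euclidean distances between all pairs of points.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # A point must not be selected as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Sort each row by distance and keep the k closest indices.
    return np.argsort(d2, axis=1)[:, :k]

# Toy data (assumption): four points in 2D.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
neighbors = knn_indices(X, k=2)
```

This costs O(n^2) memory and time, which is fine for small datasets; for larger ones a tree-based or approximate neighbor search would usually be preferred.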

#### Step 2 - Finding W

First, we need to find the corresponding linear combination for each point $x_i$, where $\eta_j$ are its neighbors:

<img src="img/lle1.png" width=250>

$$ x_i \approx \sum_{j \in N(x_i)} w_{j}\eta_j $$

This task can be formulated as a least squares problem:

$$\epsilon_i = \left| x_i - \sum_{j \in N(x_i)} w_{j} \cdot \eta_j \right|^2 \rightarrow min $$

These individual tasks can be formulated as one optimization problem with weight matrix $W = \{ w_{i,j} \}$:

$$E = \sum_i \epsilon_i = \sum_i \big| x_i - \sum_{j \in N(x_i)} w_{ij} \cdot \eta_j \big |^2 \rightarrow min $$

We also apply 2 constraints:
- weights must sum to one in each row: $\sum_j w_{ij} = 1$
- non-zero weights may correspond only to neighbors: $w_{ij} = 0$ if $j \notin N(x_i)$

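Since the weights in each row sum to one, $x_i = \sum_{j \in N(x_i)} w_{ij} \cdot x_i$, and the error can be rewritten as:

$$E = \sum_i \bigg| \sum_{j \in N(x_i)} w_{ij} \cdot x_i - \sum_{j \in N(x_i)} w_{ij} \cdot \eta_j \bigg| ^2 = \sum_i \bigg|  \sum_{j \in N(x_i)} w_{ij}(x_i-\eta_j) \bigg|^2 \rightarrow min $$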

The squared norm is the dot product of the linear combination with itself, so each term is a quadratic form in the weights:

$$\epsilon_i = \sum_j \sum_k w_{ij} \cdot w_{ik} \cdot \langle x_i-\eta_j, x_i-\eta_k \rangle $$

We can rewrite it using the local covariance matrix $C^{(i)}$ with entries $C^{(i)}_{jk} = \langle x_i-\eta_j, x_i-\eta_k \rangle$:

$$\epsilon_i = \sum_j \sum_k w_{ij} \cdot w_{ik} \cdot C^{(i)}_{jk}  $$

It's a minimization problem with a constraint, so we can use Lagrange multipliers to solve it.

<img src="img/lle2.png" width=550>
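A minimal NumPy sketch of Step 2, assuming the neighbor indices are already known. Minimizing $\epsilon_i$ under the sum-to-one constraint with a Lagrange multiplier gives weights proportional to $C^{-1}\mathbf{1}$, so the sketch solves $C w = \mathbf{1}$ and rescales the result. The function name `lle_weights`, the `reg` parameter, and the toy data are assumptions for illustration. The regularization term is not part of the derivation above; it is a common safeguard for the case where the local matrix $C$ is singular, for example when the number of neighbors exceeds the input dimension. The sketch also assumes there are no duplicate points.

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """Reconstruction weights W (n x n) for points X (n x d) and neighbor indices (n x k)."""
    n, k = neighbors.shape
    W = np.zeros((n, n))
    ones = np.ones(k)
    for i in range(n):
        # Differences x_i - eta_j for the k neighbors, shape (k, d).
        Z = X[i] - X[neighbors[i]]
        # Local matrix C_jk = <x_i - eta_j, x_i - eta_k>.
        C = Z @ Z.T
        # Assumed regularization: keeps C invertible when it is rank-deficient.
        C += reg * np.trace(C) * np.eye(k)
        # Solve C w = 1, then rescale so the weights sum to one.
        w = np.linalg.solve(C, ones)
        W[i, neighbors[i]] = w / w.sum()
    return W

# Toy data (assumption): four points on a line, two neighbors each.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
neighbors = np.array([[1, 2], [0, 2], [1, 3], [2, 1]])
W = lle_weights(X, neighbors)
```

In this toy example the point `X[1]` lies exactly halfway between its two neighbors, so its two weights come out equal. Every row of `W` sums to one and is zero outside the neighbor columns, which matches the two constraints.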


#### Step 3 - Finding Y

Then we move to the latent space $Y$. The weight matrix $W$ is now fixed, and we optimize the projections $y_i$ by minimizing the total loss function:

$$\sum_i {\left( y_i - \sum_{j} w_{ij} y_j \right)}^2 \rightarrow min$$

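Notice that the loss function above is a quadratic form $y^TMy$ with $M = (I - W)^T(I - W)$.

To get optimal projections, we find the eigenvectors of $M$ with the smallest eigenvalues. The very smallest eigenvalue is zero and its eigenvector is constant (because the rows of $W$ sum to one), so it is discarded; for a 2D embedding we take the next two eigenvectors.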

Recall the Rayleigh quotient theorem for a symmetric matrix $M$:

$$\frac{y^TMy}{y^Ty} \in [\lambda_{min}, \lambda_{max}]$$

To get the minimum value, we need the eigenvector that corresponds to $\lambda_{min}$.
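A minimal NumPy sketch of Step 3, assuming a weight matrix $W$ whose rows sum to one. It builds $M = (I - W)^T(I - W)$, computes its eigenvectors, skips the constant eigenvector that belongs to the zero eigenvalue, and keeps the next ones. The function name `lle_embedding` and the ring-shaped toy weights are assumptions for illustration.

```python
import numpy as np

def lle_embedding(W, n_components=2):
    """Embedding Y (n x n_components) that minimizes sum_i |y_i - sum_j w_ij y_j|^2."""
    n = W.shape[0]
    I_W = np.eye(n) - W
    # The loss is the quadratic form y^T M y with this symmetric matrix.
    M = I_W.T @ I_W
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(M)
    # Column 0 is the constant eigenvector (eigenvalue ~ 0); skip it.
    return eigvecs[:, 1:n_components + 1]

# Toy weights (assumption): six points on a ring, each reconstructed
# as the average of its two ring neighbors.
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
Y = lle_embedding(W, n_components=2)
```

Because the eigenvectors are orthonormal and orthogonal to the constant vector, the resulting coordinates are centered and have uncorrelated, equally scaled columns. For large datasets `M` is sparse, so a sparse eigensolver would usually replace the dense `eigh` call.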





Original dataset and PCA

Kernel PCA and LLE



