# LLE (2000)
---

LLE stands for "Locally Linear Embedding".

The algorithm consists of 3 steps:
1. for each point find its neighbourhood
2. encode each point's local geometry => express the point as a linear combination of its neighbors
3. try to preserve those combinations in the target space

Schematic visualization of the process
<img src="img/lle.jpg" width=300>

Name explanation<br>
\- Embedding = the output is a condensed latent representation of each data point<br>
\- Local = the method tries to preserve only the local structure (the conditions are expressed for neighbours only)<br>
\- Linear = we try to preserve linear approximations of the data points


## Algorithm
---

#### Step 1 
Find the nearest neighbors of each point
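
A minimal sketch of this step, assuming Euclidean distance and a brute-force search over all pairs. The function name `nearest_neighbors` and the toy data are illustrative and not part of the original notes.

```python
import numpy as np

def nearest_neighbors(X, k):
    """Return an (n, k) array with the indices of the k nearest neighbours of each row of X."""
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # A point must not be counted as its own neighbour.
    np.fill_diagonal(dist2, np.inf)
    # Indices of the k smallest distances in each row.
    return np.argsort(dist2, axis=1)[:, :k]

# Toy data (made up): five points on a line.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
neighbors = nearest_neighbors(X, k=2)
```

The brute-force search costs O(n²) memory, which is fine for small data sets; for larger ones a tree-based search is the usual choice.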

#### Step 2 

Assumption = each point $x_i$ can be expressed as a linear combination of its neighbours, where $\eta_j$ denotes the $j$-th neighbour of $x_i$ and $w_j$ its weight

$x_i \approx \sum_{j \in N(x_i)} w_{j}\eta_j $

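<br>For example, for k=3 and weights that sum to one (a constraint introduced below), the weight vector defines a point in the plane of the triangle spanned by the three neighbours. The point lies inside the triangle when all weights are non-negative.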

<img src="img/lle1.png" width=250>
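
The triangle picture can be checked numerically. The sketch below uses made-up neighbour coordinates in 3-D and assumes the weights sum to one: the reconstructed point then always stays in the plane of the triangle, and it falls outside the triangle itself once a weight becomes negative.

```python
import numpy as np

# Three neighbours (triangle vertices) lying in the plane z = 1; values are made up.
eta = np.array([[0.0, 0.0, 1.0],
                [4.0, 0.0, 1.0],
                [0.0, 4.0, 1.0]])

# Non-negative weights that sum to one: the point lies inside the triangle.
w_inside = np.array([0.5, 0.25, 0.25])
x_inside = w_inside @ eta

# Weights that still sum to one but include a negative entry:
# the point leaves the triangle yet stays in the plane z = 1.
w_outside = np.array([-0.5, 0.75, 0.75])
x_outside = w_outside @ eta
```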

The error for a single point

$\epsilon_i = \left| x_i - \sum_{j \in N(x_i)} w_{j} \cdot \eta_j \right|^2 \rightarrow \min_{w} $<br><br>

The total error:

$E = \sum_i \epsilon_i = \sum_i \big| x_i - \sum_{j \in N(x_i)} w_{ij} \cdot \eta_j \big |^2 \rightarrow \min_{w} $<br><br>


<img src="img/lle2.png" width=550>

We also apply 2 constraints:
- the weights must sum to one in each row: $\sum_j w_{ij} = 1$
- non-zero weights must correspond only to neighbours: $w_{ij} = 0$ if $j \notin N(x_i)$

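Because the weights sum to one, we can write $x_i = \sum_j w_{ij} \cdot x_i$ and pull $x_i$ inside the sum:

$$E = \sum_i \bigg| \sum_{j \in N(x_i)} w_{ij} \cdot x_i - \sum_{j \in N(x_i)} w_{ij} \cdot \eta_j \bigg| ^2 = \sum_i \bigg|  \sum_{j \in N(x_i)} w_{ij}(x_i-\eta_j) \bigg|^2 \rightarrow \min_{w} $$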

The squared norm is the dot product of the combination with itself, so we get a quadratic form in the weights

$$\epsilon_i = \sum_j \sum_k w_{ij} \cdot w_{ik} \cdot \langle x_i-\eta_j, x_i-\eta_k \rangle $$

We can rewrite it in terms of the local covariance matrix $C^{(i)}$ of the neighbourhood of $x_i$.

$$\epsilon_i = \sum_j \sum_k w_{ij} \cdot w_{ik} \cdot C^{(i)}_{jk}, \qquad C^{(i)}_{jk} = \langle x_i - \eta_j, x_i - \eta_k \rangle $$

It's a minimization problem with a constraint, so we can use Lagrange multipliers to solve it.
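
Working through the Lagrange condition for a single point gives the closed form $w = C^{-1}\mathbf{1} / (\mathbf{1}^T C^{-1} \mathbf{1})$, where $C$ is the matrix of dot products $\langle x - \eta_j, x - \eta_k \rangle$. The sketch below is one way to compute it. The function name `reconstruction_weights` and the `reg` parameter are assumptions for illustration; the small ridge term is a common safeguard for the case where $C$ is singular (more neighbours than input dimensions) and is not described in the notes above.

```python
import numpy as np

def reconstruction_weights(x, eta, reg=1e-3):
    """Weights that best reconstruct x from the rows of eta, subject to sum(w) == 1."""
    k = eta.shape[0]
    Z = x - eta            # rows are x - eta_j, shape (k, D)
    C = Z @ Z.T            # C[j, k] = <x - eta_j, x - eta_k>
    # Assumption: add a small ridge term so that C is invertible when k > D.
    trace = np.trace(C)
    C = C + reg * (trace if trace > 0 else 1.0) * np.eye(k)
    w = np.linalg.solve(C, np.ones(k))   # solve C w = 1
    return w / w.sum()                   # rescale so the weights sum to one

# Made-up example: a point inside a triangle of three neighbours.
eta = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
x = np.array([1.0, 1.0])
w = reconstruction_weights(x, eta, reg=1e-9)
x_reconstructed = w @ eta
```

Solving the linear system and rescaling avoids forming $C^{-1}$ explicitly, which is the usual numerically safer choice.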




#### Step 3 
In the target space, choose positions $y_i$ that preserve these linear combinations
$$\sum_i \left| y_i - \sum_{j} w_{ij} y_j \right|^2 \rightarrow \min_{y}$$

Notice that the loss function above is a quadratic form $y^TMy$, where $M = (I - W)^T(I - W)$.

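To get the optimal 2-dimensional embedding, we take the two eigenvectors of this matrix with the smallest non-zero eigenvalues. The eigenvector with eigenvalue zero is the constant vector (every row of the weight matrix sums to one); it would map all points to the same coordinate, so it is discarded.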

Recall the Rayleigh quotient theorem:

$$\frac{y^TMy}{y^Ty} \in [\lambda_{min}, \lambda_{max}]$$

To get the minimum value, we need to find the eigenvector that corresponds to $\lambda_{min}$
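
The eigenvector step can be sketched as follows, assuming the weights are stored in a dense $n \times n$ matrix `W` whose rows sum to one and taking $M = (I - W)^T(I - W)$. The function name `embed` and the ring-shaped toy weights are illustrative only. The first eigenvector (constant, eigenvalue close to zero) is skipped and the next `n_components` are kept.

```python
import numpy as np

def embed(W, n_components=2):
    """Embedding coordinates from an (n, n) weight matrix W whose rows sum to one."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)                # symmetric, positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue ~ 0), keep the next n_components.
    return eigvecs[:, 1:n_components + 1], eigvals

# Toy weights (made up): points on a ring, each the average of its two ring neighbours.
n = 8
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

Y, eigvals = embed(W, n_components=2)
```

`np.linalg.eigh` is used because $M$ is symmetric. For large $n$ a sparse eigensolver would be the more practical option, since only a few of the smallest eigenvectors are needed.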


## References
---
http://www.robots.ox.ac.uk/~az/lectures/ml/lle.pdf



## Example in Scikit-Learn
---

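```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Assumption: X is undefined in the original, so a swiss roll stands in as sample data
X, _ = make_swiss_roll(n_samples=1000, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X_transformed = lle.fit_transform(X)  # shape (1000, 2)
```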