
# Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss (JL) lemma states that any set of \( n \) points in a high-dimensional space can be embedded into a space of much lower dimension \( k \) while preserving every pairwise distance up to a factor of \( 1 \pm \epsilon \). The required \( k \) grows only logarithmically in \( n \) and does not depend on the original dimension. This is particularly useful in machine learning, where reducing dimensionality cuts computation and storage costs while approximately preserving the distance structure of the data.

In [1]:
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics.pairwise import euclidean_distances

# Generate random data: 10 points in 100-dimensional space
np.random.seed(42)
original_data = np.random.rand(10, 100)

# Compute pairwise distances in the original space
original_distances = euclidean_distances(original_data)

# Use Gaussian Random Projection to reduce dimensions to 5
projection = GaussianRandomProjection(n_components=5, random_state=42)
reduced_data = projection.fit_transform(original_data)

# Compute pairwise distances in the reduced space
reduced_distances = euclidean_distances(reduced_data)

# Compare distances before and after projection
print("Original pairwise distances (first 3):\n", original_distances[:3, :3])
print("Reduced pairwise distances (first 3):\n", reduced_distances[:3, :3])

# Check relative errors between original and reduced distances
relative_errors = np.abs(original_distances - reduced_distances) / (original_distances + 1e-9)
print("Relative errors (first 3):\n", relative_errors[:3, :3])


Original pairwise distances (first 3):
 [[0.         4.23450514 4.26156823]
 [4.23450514 0.         4.42274646]
 [4.26156823 4.42274646 0.        ]]
Reduced pairwise distances (first 3):
 [[0.         3.97758741 4.16120501]
 [3.97758741 0.         2.93491154]
 [4.16120501 2.93491154 0.        ]]
Relative errors (first 3):
 [[0.         0.06067243 0.02355077]
 [0.06067243 0.         0.3364052 ]
 [0.02355077 0.3364052  0.        ]]
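
The relative error of about 34% for one pair above reflects how small the target dimension is: with only 5 components the distortion is large. The sketch below illustrates how the worst-case error shrinks as \( k \) grows. It makes several assumptions that are not part of the cell above: it uses plain NumPy instead of scikit-learn, 10 uniform random points in 1000 dimensions (so that every tested \( k \) is a genuine reduction), and the helper names `pairwise_distances` and `max_relative_error` are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((10, 1000))  # assumption: 10 points in 1000-dimensional space


def pairwise_distances(A):
    """Euclidean distances between all rows of A (fine for small inputs)."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def max_relative_error(X, k, rng):
    """Project X to k dimensions and return the worst relative distance error."""
    d = X.shape[1]
    # Entries drawn from N(0, 1/k) so that distances keep their original scale
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    orig = pairwise_distances(X)
    red = pairwise_distances(X @ R)
    iu = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    return float(np.max(np.abs(red[iu] - orig[iu]) / orig[iu]))


errors = {k: max_relative_error(X, k, rng) for k in (5, 50, 500)}
for k, err in errors.items():
    print(f"k = {k:3d}: max relative error = {err:.3f}")
```

The exact numbers depend on the random draw, but the error should fall roughly like \( 1/\sqrt{k} \).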


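The lemma guarantees that, with high probability, a random projection \( f \) into \( k \)-dimensional space approximately preserves the pairwise distances between \( n \) points in \( \mathbb{R}^d \). That is, for every pair of points \( u, v \),

$$ (1 - \epsilon)\,\|u - v\| \leq \|f(u) - f(v)\| \leq (1 + \epsilon)\,\|u - v\|, $$

provided that

$$ k \geq \frac{384 \ln(n)}{\epsilon^2}. $$

The constant 384 comes from one particular version of the proof; other statements of the lemma use different constants.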
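
To get a feel for the size of this bound, the sketch below evaluates it for a few values of \( n \) and \( \epsilon \). The function name `jl_dim_bound` and its `constant` parameter are illustrative, not part of any library. The formula does not involve the original dimension \( d \) at all.

```python
import math


def jl_dim_bound(n_points, eps, constant=384):
    """Smallest integer k with k >= constant * ln(n_points) / eps**2."""
    return math.ceil(constant * math.log(n_points) / eps**2)


for n_points in (100, 10_000, 1_000_000):
    for eps in (0.5, 0.1):
        k = jl_dim_bound(n_points, eps)
        print(f"n = {n_points:>9,}, eps = {eps}: k >= {k:,}")
```

For \( n = 100 \) and \( \epsilon = 0.5 \) this gives \( k \geq 7074 \). Squaring \( n \) only doubles the bound, whereas halving \( \epsilon \) quadruples it.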


In [4]:
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics.pairwise import euclidean_distances

# Define parameters
n = 100  # Number of points
d = 500  # Original dimensionality
epsilon = 0.5  # Approximation factor

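# Calculate the required number of dimensions for the projection, capped at d.
# For n = 100 and epsilon = 0.5 the bound is 7074 > d, so k = d and no
# actual reduction takes place with these parameters.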
k = min(d, int(np.ceil(384 * np.log(n) / epsilon**2)))
print(f"Reducing dimension from {d} to {k} to satisfy the theorem.")

# Generate random high-dimensional data
np.random.seed(42)
data = np.random.rand(n, d)

# Compute pairwise distances in the original space
original_distances = euclidean_distances(data)

# Apply random projection
projection = GaussianRandomProjection(n_components=k, random_state=42)
reduced_data = projection.fit_transform(data)

# Compute pairwise distances in the reduced space
reduced_distances = euclidean_distances(reduced_data)

# Check if the distances satisfy the JL guarantee.
# GaussianRandomProjection draws its entries from N(0, 1/k), so the projected
# distances are already on the original scale and need no sqrt(k) factor.
violations = 0
for i in range(n):
    for j in range(i + 1, n):
        orig_dist = original_distances[i, j]
        reduced_dist = reduced_distances[i, j]
        lower_bound = (1 - epsilon) * orig_dist
        upper_bound = (1 + epsilon) * orig_dist
        if not (lower_bound <= reduced_dist <= upper_bound):
            violations += 1

# Report results
print(f"Total number of points: {n}")
print(f"Number of violations: {violations}")
print(f"Violation rate: {violations / (n * (n - 1) / 2):.4f}")
print("The JL Lemma holds with high probability if the violation rate is small.")


Reducing dimension from 500 to 500 to satisfy the theorem.
Total number of points: 100
Number of violations: 0
Violation rate: 0.0000
The JL Lemma holds with high probability if the violation rate is small.
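
Because the bound with constant 384 exceeds \( d \) for these parameters, the run above projects from 500 dimensions to 500 dimensions and performs no real reduction. The sketch below repeats the experiment with a smaller target dimension. It assumes the tighter bound \( k \geq 4 \ln(n) / (\epsilon^2/2 - \epsilon^3/3) \), which is the formula scikit-learn documents for `johnson_lindenstrauss_min_dim`. That bound is stated for squared distances, so the check compares squared distances. The code uses plain NumPy, and the names `jl_min_dim` and `pairwise_sq_dists` are illustrative.

```python
import math

import numpy as np


def jl_min_dim(n_points, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n_points) / (eps**2 / 2 - eps**3 / 3))


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


n, d, epsilon = 100, 500, 0.5
k = jl_min_dim(n, epsilon)

rng = np.random.default_rng(42)
data = rng.random((n, d))
# Entries drawn from N(0, 1/k): squared lengths are preserved in expectation,
# so no extra scaling factor is needed in the check below.
R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
reduced = data @ R

iu = np.triu_indices(n, k=1)  # each unordered pair once
ratio = pairwise_sq_dists(reduced)[iu] / pairwise_sq_dists(data)[iu]
violations = int(np.sum((ratio < 1 - epsilon) | (ratio > 1 + epsilon)))
violation_rate = violations / len(ratio)

print(f"Reducing dimension from {d} to {k}")
print(f"Violations: {violations} of {len(ratio)} pairs ({violation_rate:.4f})")
```

With these parameters the target dimension is 222, less than half of the original 500, and the violation rate should be at or very close to zero.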
