A pure Rust implementation of UMAP (Uniform Manifold Approximation and Projection) for dimension reduction, based on the paper by Leland McInnes, John Healy, and James Melville.
- Complete UMAP Implementation: All core algorithms from the original paper
- Multiple Distance Metrics: Euclidean, Manhattan, Cosine, Hamming, Chebyshev, Minkowski, and Jaccard
- Approximate Nearest Neighbors: NN-Descent algorithm for efficient k-NN search
- Spectral Initialization: Power iteration method for robust spectral embedding
- Stochastic Gradient Descent: Optimized SGD with negative sampling
- Configurable Parameters: Full control over all UMAP hyperparameters
- Algorithm 1: Main UMAP algorithm
- Algorithm 2: Local fuzzy simplicial set construction
- Algorithm 3: Smooth k-NN distance computation
- Algorithm 4: Spectral embedding initialization
- Algorithm 5: SGD optimization with negative sampling
Add this to your Cargo.toml:
[dependencies]
rag-umap = "0.1.0"use rag_umap::{convert_to_2d, convert_to_3d};
// Your high-dimensional embeddings data
let embeddings = vec![
vec![1.0, 2.0, 3.0, 4.0, 5.0],
vec![2.0, 3.0, 4.0, 5.0, 6.0],
vec![3.0, 4.0, 5.0, 6.0, 7.0],
// ... more data points
];
// Convert to 2D using UMAP
let embedding_2d = convert_to_2d(embeddings.clone())?;
println!("2D embedding: {:?}", embedding_2d);
// Convert to 3D using UMAP
let embedding_3d = convert_to_3d(embeddings)?;
println!("3D embedding: {:?}", embedding_3d);The conversion functions accept any type that implements Into<f64> + Copy:
// Works with integers
let int_embeddings = vec![vec![1, 2, 3], vec![4, 5, 6]];
let result = convert_to_2d(int_embeddings)?;
// Works with floats
let float_embeddings = vec![vec![1.0f32, 2.0f32], vec![3.0f32, 4.0f32]];
let result = convert_to_2d(float_embeddings)?;The library uses optimized default parameters internally:
| Parameter | Description | Default |
|---|---|---|
n_neighbors |
Number of nearest neighbors | 15 (or data size - 1 if smaller) |
n_components |
Target embedding dimension | 2 for convert_to_2d, 3 for convert_to_3d |
min_dist |
Minimum distance between points in embedding | 0.1 |
n_epochs |
Number of optimization epochs | 200 |
negative_sample_rate |
Negative samples per positive sample | 5 |
spread |
Spread parameter for low-dimensional representation | 1.0 |
local_connectivity |
Local connectivity parameter | 1.0 |
repulsion_strength |
Repulsion strength parameter | 1.0 |
- Local Structure: Preserves fine-grained neighborhood relationships
- Global Structure: Maintains overall data topology
- Automatic Scaling: Adapts neighbor count to dataset size
- Euclidean Distance: Uses L2 distance metric internally
See the examples/ directory for complete examples:
cargo run --example basic_usageThis implementation includes several optimizations:
- NN-Descent: O(N^1.14) approximate nearest neighbor search
- Power Iteration: Robust spectral embedding for large graphs
- Negative Sampling: Efficient repulsive force computation
- Sparse Representations: Memory-efficient fuzzy simplicial sets
UMAP is based on:
- Riemannian Geometry: Local manifold approximation
- Algebraic Topology: Fuzzy simplicial set representations
- Category Theory: Functors between metric spaces and topological structures
The algorithm constructs a high-dimensional fuzzy topological representation of the data, then optimizes a low-dimensional representation to match this structure using cross-entropy minimization.
| Method | Local Structure | Global Structure | Scalability | Embedding Dimension |
|---|---|---|---|---|
| UMAP | ✓ | ✓ | High | Any |
| t-SNE | ✓ | Limited | Medium | Typically 2-3 |
| PCA | Limited | ✓ | High | Any |
| Isomap | ✓ | ✓ | Low | Any |
- Assumes data lies on a manifold
- May find structure in noise (constellation effect)
- Local distance prioritized over global distance
- Requires parameter tuning for optimal results
This implementation is based on the UMAP paper:
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
Contributions are welcome!