# Uniform Manifold Approximation and Projection (UMAP)
UMAP is a non-linear dimensionality reduction algorithm. It can also be used to visualize a dataset.
The UMAP model implemented in cuML allows the user to set the following parameter values:
1. n_neighbors: number of neighboring samples used for manifold approximation. Larger values give a more global view of the manifold, while smaller values preserve more local structure
2. n_components: the dimension of the space to embed into
3. n_epochs: number of training epochs to be used in optimizing the low dimensional embedding
4. learning_rate: initial learning rate for the embedding optimization.
5. init: how to initialize the low dimensional embedding (string, optional, default 'spectral'). Options are:
    * 'spectral': use a spectral embedding of the fuzzy 1-skeleton
    * 'random': assign initial embedding positions at random
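To make the effect of n_neighbors concrete, the sketch below builds a k-nearest-neighbor graph on toy data with plain NumPy and counts its connected components. This is only an illustration of the local-versus-global trade-off, not cuML's implementation; the helper name `knn_graph_components` and the toy clusters are assumptions made for the example.

```python
import numpy as np

def knn_graph_components(points, n_neighbors):
    """Count connected components of the symmetrized k-nearest-neighbor graph."""
    n = points.shape[0]
    # Pairwise squared Euclidean distances; inf on the diagonal excludes self-matches.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)
    # Indices of each point's n_neighbors closest points.
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]
    # Undirected adjacency: i and j are linked if either lists the other.
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = True
    adj |= adj.T
    # Depth-first search to count components.
    seen = np.zeros(n, dtype=bool)
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in np.flatnonzero(adj[node] & ~seen):
                seen[nxt] = True
                stack.append(nxt)
    return components

rng = np.random.default_rng(0)
# Three well-separated toy clusters of 20 points each in 2-D (hypothetical data).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.concatenate([c + 0.5 * rng.normal(size=(20, 2)) for c in centers])

# Small neighborhoods stay inside each cluster; large ones must reach across clusters.
small_k = knn_graph_components(points, n_neighbors=5)
large_k = knn_graph_components(points, n_neighbors=30)
print(small_k, large_k)
```

With 5 neighbors every link stays inside a cluster, so the graph has at least three separate pieces. With 30 neighbors each point runs out of same-cluster neighbors and links to other clusters, which joins everything into a single component.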


In [1]:
import numpy as np
import pandas as pd

import cudf
import os

from sklearn import datasets
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import KMeans
from sklearn.manifold import trustworthiness

from cuml.manifold.umap import UMAP

In [2]:
data, labels = datasets.make_blobs(
    n_samples=500, n_features=10, centers=5)

In [3]:
embedding = UMAP().fit_transform(data)

In [4]:
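# Cluster the embedding with k-means and compare against the true blob labels
score = adjusted_rand_score(labels,
            KMeans(5).fit_predict(embedding))

# A perfect score means the embedding kept the five blobs fully separated
assert score == 1.0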

In [5]:
iris = datasets.load_iris()
data = iris.data

In [6]:
embedding = UMAP(
    n_neighbors=10, min_dist=0.01, init="random"
).fit_transform(data)

In [7]:
trust = trustworthiness(iris.data, embedding, n_neighbors=10)
assert trust >= 0.95
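The trustworthiness score checked above measures how well local neighborhoods survive the embedding. For each sample it looks at its k nearest neighbors in the embedded space and adds a penalty for any that ranked worse than k in the original space. Below is a minimal NumPy sketch of that standard definition, assuming Euclidean distances. It is not scikit-learn's implementation, and the name `trustworthiness_sketch` is hypothetical.

```python
import numpy as np

def trustworthiness_sketch(X, X_emb, n_neighbors=5):
    """Trustworthiness in [0, 1]; 1.0 means no false neighbors in the embedding."""
    n = X.shape[0]

    def pairwise_sq_dist(A):
        d = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)
        np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
        return d

    d_orig = pairwise_sq_dist(X)
    d_emb = pairwise_sq_dist(X_emb)

    # ranks[i, j] = 1-based position of j among i's neighbors in the original space
    order_orig = np.argsort(d_orig, axis=1)
    rows = np.arange(n)[:, None]
    ranks = np.empty_like(order_orig)
    ranks[rows, order_orig] = np.arange(1, n + 1)

    # k nearest neighbors of each point in the embedded space
    nn_emb = np.argsort(d_emb, axis=1)[:, :n_neighbors]

    # Penalize embedded neighbors whose original rank exceeds k
    penalty = np.maximum(ranks[rows, nn_emb] - n_neighbors, 0).sum()
    norm = 2.0 / (n * n_neighbors * (2 * n - 3 * n_neighbors - 1))
    return 1.0 - norm * penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))          # hypothetical data, not the iris set

same = trustworthiness_sketch(X, X, n_neighbors=5)              # identical neighborhoods
projected = trustworthiness_sketch(X, X[:, :2], n_neighbors=5)  # drop three features
print(same, projected)
```

An embedding identical to the input has no false neighbors and scores exactly 1.0, while discarding features pulls unrelated points together and lowers the score.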

In [8]:
iris_selection = np.random.choice(
    [True, False], 150, replace=True, p=[0.75, 0.25])
data = iris.data[iris_selection]

In [9]:
fitter = UMAP(n_neighbors=10, min_dist=0.01, verbose=True)
fitter.fit(data)

new_data = iris.data[~iris_selection]
embedding = fitter.transform(new_data)

In [10]:
trust = trustworthiness(new_data, embedding, n_neighbors=10)
assert trust >= 0.90