Metric = "precomputed" is not implemented #22

Closed
rach226a opened this issue Feb 25, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@rach226a

rach226a commented Feb 25, 2019

Metric = "precomputed" is not implemented

I would like to run uwot::umap() with metric = 'pearson'. However, 'pearson' is not an option within this package, and I got the following error:

Error in match.arg(metric, c("euclidean", "cosine", "manhattan", "hamming", : 'arg' should be one of “euclidean”, “cosine”, “manhattan”, “hamming”, “precomputed”

This error suggests that I can use a "precomputed" distance matrix. So I tried to run uwot::umap() with metric = 'precomputed' and got the following error:

Error in create_ann(metric, nc) : BUG: unknown Annoy metric 'precomputed'

This error suggests precomputed is not implemented within this package.

PS. The original umap package allows for metric = 'pearson'. It would be nice to see this added to this package!

@jlmelville
Owner

metric = "precomputed" is for use with nearest neighbor data, so it requires a list of two matrices, the nearest neighbor indices and the distances. There are some details at https://github.com/jlmelville/uwot#nearest-neighbor-data-format.

It may be the case that uwot can already do what you want. If you have created a full distance matrix yourself, then if you convert it to a dist object, you can pass it directly to the X parameter of uwot without specifying metric = "precomputed", e.g.:

iris_dist <- dist(iris[, -5])
iris_umap <- umap(iris_dist)
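
For the Pearson case raised in the question, the same route could look roughly like the sketch below. It assumes that "Pearson distance" means 1 minus the correlation between rows (observations); `X` is a hypothetical stand-in for your own data matrix, and the final `umap()` call is left commented out because it needs uwot installed.

```r
# Hypothetical data: 20 observations (rows) by 6 features (columns)
set.seed(1)
X <- matrix(rnorm(120), nrow = 20)

# cor() correlates columns, so transpose to get row-by-row correlations,
# then turn the similarity into a distance (assumption: distance = 1 - r)
pearson_dm <- 1 - cor(t(X))

# Convert the full symmetric matrix into a dist object
pearson_dist <- as.dist(pearson_dm)

# Pass the dist object directly as X, without metric = "precomputed":
# X_umap <- uwot::umap(pearson_dist)
```

A full distance matrix needs memory that grows quadratically with the number of observations, so this is only practical for modestly sized data sets.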

I do see that metric = "precomputed" causes an error in the above case, so I will fix that. If you can provide an example of the input you were trying to use, I will try to improve the error reporting for this code path.

Thank you for the suggestion about other metrics and the vote for Pearson. I would also like to see more, but uwot relies on the metrics that Annoy supports. It's possible that I will get more of the neighbor search part of PyNNDescent implemented in R and then more metrics will be available.

@rach226a
Author

Thank you for the suggestions. I have successfully run uwot::umap() with Pearson correlation via nn_method = list(idx = index_matrix, dist = dist_matrix) and via uwot::umap(dist(dist_matrix), metric = "precomputed"). My dist_matrix and index_matrix were created with Pearson correlation. Unfortunately, I wanted to do metric learning which isn't possible through this implementation.

@jlmelville
Owner

Although I suspect that this is way too late for @rach226a's purposes, I am temporarily re-opening to note that:

  1. transform nn #64 now allows for transforming new data with precomputed nearest neighbor data, and metric learning works as part of that:

    devtools::install_github("jlmelville/vizier")
    devtools::install_github("jlmelville/snedata")
    fashion <- snedata::download_fashion_mnist()
    fashion_train <- head(fashion, 60000)
    fashion_test <- tail(fashion, 10000)
    
    # calculate the nearest neighbors outside of uwot (pretend the function isn't the implementation in uwot)
    fashion_train.nn <- uwot:::annoy_nn(X = as.matrix(fashion_train[, 1:784]), k = 15, metric = "cosine", ret_index = TRUE)
    # return umap map with annoy_nn input
    set.seed(1337)
    fashion_umap <- uwot::umap(X = NULL, nn_method = fashion_train.nn, ret_model = TRUE, y = fashion_train$Label)
    
    # compute the query-reference annoy_nn
    query_ref.nn <- uwot:::annoy_search(X = as.matrix(fashion_test[, 1:784]), k = 15, ann = fashion_train.nn$index  )
    
    # use the query-reference annoy_nn to transform query to reference
    fashion_umap_test <- uwot::umap_transform(X = NULL,  model = fashion_umap, nn_method = query_ref.nn)
    
    vizier::embed_plot(fashion_umap$embedding, fashion_train, cex = 0.5, title = "Fashion UMAP", alpha_scale = 0.075)
    vizier::embed_plot(fashion_umap_test, fashion_test, cex = 0.5, title = "Fashion Test UMAP", alpha_scale = 0.075)
  2. Pearson correlation distance is the same as using cosine distance with each row normalized to zero mean, so it's already available in uwot at the cost of a little work up front:

    devtools::install_github("jlmelville/vizier")
    devtools::install_github("jlmelville/snedata")
    fashion <- snedata::download_fashion_mnist()
    fashion_train <- head(fashion, 60000)
    fashion_test <- tail(fashion, 10000)
    
    # subtract mean from each row
    fashion_trainm <- as.matrix(fashion_train[, 1:784])
    fashion_trainm <- fashion_trainm - apply(fashion_trainm, 1, mean)
    fashion_testm <- as.matrix(fashion_test[, 1:784])
    fashion_testm <- fashion_testm - apply(fashion_testm, 1, mean)
    
    fashion_umap <- uwot::umap(fashion_trainm, metric = "cosine", ret_model = TRUE, y = fashion_train$Label, verbose = TRUE)
    fashion_umap_test <- uwot::umap_transform(fashion_testm, model = fashion_umap)
    
    vizier::embed_plot(fashion_umap$embedding, fashion_train, cex = 0.5, title = "Fashion UMAP (Correlation)", alpha_scale = 0.075)
    vizier::embed_plot(fashion_umap_test, fashion_test, cex = 0.5, title = "Fashion Test UMAP (Correlation)", alpha_scale = 0.075)

But it would be better for uwot to do this work internally, and add a metric = "correlation" option.
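
The equivalence in point 2 can be checked numerically on a small example. This is a base-R sketch on random data, not part of uwot; `X`, `Xc`, and the other names are illustrative only.

```r
# Small random data: 5 observations (rows), 8 features (columns)
set.seed(42)
X <- matrix(rnorm(40), nrow = 5)

# Pearson correlation distance between rows
pearson_dist <- 1 - cor(t(X))

# Center each row to zero mean (row i has rowMeans(X)[i] subtracted)
Xc <- X - rowMeans(X)

# Cosine distance between the centered rows
row_norms <- sqrt(rowSums(Xc^2))
cosine_dist <- 1 - (Xc %*% t(Xc)) / (row_norms %o% row_norms)

# The two matrices should agree up to floating-point tolerance
all.equal(pearson_dist, cosine_dist, check.attributes = FALSE)
```

The check works because the Pearson correlation of two vectors is, by definition, the cosine of the angle between them after each has had its own mean subtracted.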

@jlmelville jlmelville reopened this Jul 19, 2020
@jlmelville jlmelville added the enhancement New feature or request label Jul 19, 2020
@dkatztibco

dkatztibco commented May 20, 2021

I have a distance matrix calculated with a non-supported metric (earth mover distance). How can I get it into the required format?
I tried str(fashion_train.nn) from your first example in an attempt to reverse-engineer the format, but it is complex enough that it is not obvious how to get from a square symmetric matrix to that format.
Thanks in advance for any help.

@jlmelville
Owner

To carry out UMAP successfully, your NN data should be in the form of a list of two N x k matrices, where N is the number of points in the data set and k is the number of nearest neighbors. Matrix idx contains the indices of the neighbors of point i in row i. Matrix dist contains the corresponding distances.

If you have a full dense N x N distance matrix, then there is an internal function you can use, uwot:::dist_nn, that will carry out the conversion for you, e.g.:

iris10 <- as.matrix(iris[1:10, -5])
iris10_dm <- as.matrix(dist(iris10))
# get 4 nearest neighbors
iris10_nn <- uwot:::dist_nn(iris10_dm, k = 4)
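
Because `uwot:::dist_nn` is internal and could change between versions, the same conversion can also be sketched in base R. The code below is an illustration, not uwot's own implementation. It assumes that each point counts as its own first neighbor (distance 0), which is worth confirming against the nearest neighbor data format documentation; exact duplicate points may break that ordering.

```r
# Full N x N distance matrix for the first 10 iris observations
dm <- as.matrix(dist(iris[1:10, -5]))
k <- 4

# For each row, take the indices of the k smallest distances.
# apply() returns one column per row of dm, hence the transpose to N x k.
idx <- t(apply(dm, 1, function(d) order(d)[1:k]))

# The matching distances, in the same sorted order
dist_k <- t(apply(dm, 1, function(d) sort(d)[1:k]))

# Drop row/column names so both are plain N x k matrices
dimnames(idx) <- NULL
dimnames(dist_k) <- NULL

nn <- list(idx = idx, dist = dist_k)

# The list can then be supplied through nn_method, e.g.:
# res <- uwot::umap(X = NULL, nn_method = nn)
```

The same approach applies to any precomputed square symmetric matrix, such as one built from earth mover's distance, by substituting it for `dm`.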

@dkatztibco

dkatztibco commented May 21, 2021 via email

@jlmelville
Owner

Using precomputed nearest neighbors is covered at https://jlmelville.github.io/uwot/articles/hnsw-umap.html and https://jlmelville.github.io/uwot/articles/rnndescent-umap.html. Pearson correlation is now supported with metric = "correlation".
