precomputed knn #214

Closed
jlmelville opened this issue Sep 17, 2022 · 2 comments
Labels
question Further information is requested

Comments

@jlmelville

Hello, is there a way to use k-nearest neighbors data created externally? My current strategy is to create a dummy class of the form:

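class PrecomputedKNNIndex:
    """Dummy KNN index that wraps externally computed neighbors."""

    def __init__(self, indices, distances):
        # Both arrays are expected to have shape (n_samples, k).
        self.indices = indices
        self.distances = distances
        self.k = indices.shape[1]

    def build(self):
        # Nothing to build: return the precomputed neighbors as-is.
        return self.indices, self.distances

    def query(self, query, k):
        raise NotImplementedError("No query with a pre-computed knn")

    def check_metric(self, metric):
        # The metric is never used here, so strings and callables are both accepted.
        return metric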

and use it like:

import openTSNE

perplexity = 30
data = get_data_from_somewhere()

n_neighbors = min(data.shape[0] - 1, int(3 * perplexity))
# assume this doesn't return the "self" neighbor as the first item in the knn
indices, dists = get_nn_from_somewhere(data, n_neighbors)
knn = PrecomputedKNNIndex(indices, dists)

affinities = openTSNE.affinity.PerplexityBasedNN(
    perplexity=perplexity,
    knn_index=knn,
)
embedder = openTSNE.TSNE(n_components=2)
embedded = embedder.fit(data, affinities=affinities)

This seems to work perfectly well, just wondered if I am missing a more obvious approach.
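
For anyone who wants to try this without an external neighbor library, a stand-in for `get_nn_from_somewhere` might look like the sketch below. It is an assumption-laden illustration rather than anything from openTSNE: `brute_force_knn` is a hypothetical name, it uses exact Euclidean distances via NumPy, and it builds the full pairwise distance matrix, so it only makes sense for small data sets. It does follow the convention assumed above that the "self" neighbor is not returned.

```python
import numpy as np


def brute_force_knn(data, n_neighbors):
    """Exact Euclidean kNN that excludes each point's "self" neighbor.

    Hypothetical stand-in for get_nn_from_somewhere. Uses O(n^2) memory,
    so it is only suitable for small data sets.
    """
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0]
    if not 1 <= n_neighbors <= n_samples - 1:
        raise ValueError("n_neighbors must be between 1 and n_samples - 1")

    # Pairwise squared Euclidean distances, shape (n_samples, n_samples).
    diffs = data[:, None, :] - data[None, :, :]
    sq_dists = np.einsum("ijk,ijk->ij", diffs, diffs)

    # Put each point infinitely far from itself so it is never selected.
    np.fill_diagonal(sq_dists, np.inf)

    # Sort each row and keep the n_neighbors closest points.
    indices = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    distances = np.sqrt(np.take_along_axis(sq_dists, indices, axis=1))
    return indices, distances


perplexity = 30
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 5))  # toy data, for illustration only

# Same neighbor-count rule as in the snippet above.
n_neighbors = min(data.shape[0] - 1, int(3 * perplexity))
indices, dists = brute_force_knn(data, n_neighbors)
```

Both returned arrays have shape `(n_samples, n_neighbors)` with each row sorted by increasing distance, so they can be passed straight to the dummy class as `PrecomputedKNNIndex(indices, dists)`.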

@pavlin-policar
Owner

Hey, I think this is currently the only approach that would work. Your dummy class is actually included here, and I think it has already been released (it's been a while since I looked at this).

It's a convoluted solution, I know, but currently the only supported one. I need to return to this and think about how I would allow something like the standard metric="precomputed" without cluttering the API further.

@pavlin-policar pavlin-policar added the question Further information is requested label Sep 30, 2022
@jlmelville
Author

Oh yes, it looks like I missed the built-in precomputed class. Works for me.

Although I am sure you are not looking for API suggestions, maybe you could allow the neighbors parameter on the TSNE constructor to take a tuple containing the indices and distances, and then either create the affinities via the perplexity parameter or use the Uniform version if perplexity=None?

Anyway, thank you for the help.
