
Custom losses, coherent embeddings #35

Open

znah opened this issue Jan 4, 2018 · 6 comments

znah commented Jan 4, 2018

A nice property of t-SNE that is not exploited in most implementations is that it can be treated as a combination of two orthogonal components: a loss function and an optimization algorithm. For example, one may visualize a set of temporally varying vectors with a sequence of coherent embeddings by adding a loss term that penalizes unnecessary movement of each vector between those embeddings. Would it be possible to provide this kind of flexibility, i.e. the ability to add extra constraints, in UMAP?
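To make the idea concrete, here is a minimal sketch of such a penalty (Y_t and Y_prev are placeholder names for consecutive embeddings, not anything from UMAP's code):

import numpy as np

def coherence_penalty(Y_t, Y_prev):
    # Sum of squared displacements of each vector between two consecutive
    # embeddings; adding this term to the embedding loss discourages points
    # from moving more than necessary from one frame to the next.
    return np.sum((Y_t - Y_prev) ** 2)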


lmcinnes commented Jan 4, 2018

It is a little hard to do that and still maintain efficiency; potentially it could be added as an extra code path on the side that is slower but more flexible. That would be a more significant project, however.


znah commented Jan 4, 2018

Thank you for the prompt reply!
I'm actually looking forward to seeing your write-up of the UMAP algorithm, hoping to reimplement it in a flexible way. For example, the most naive implementation of the t-SNE loss in TensorFlow boils down to something like this:

import tensorflow as tf  # TF1-style API, to match tf.log below

def tsne_kl_loss(points, P):
    # pdist2 is assumed to return the (n, n) matrix of pairwise squared distances
    n = tf.shape(points)[0]
    Q = 1.0 / (1.0 + pdist2(points))                 # unnormalized Student-t affinities
    sQ = tf.reduce_sum(Q) - tf.cast(n, tf.float32)   # normalizer, excluding the diagonal
    return tf.reduce_sum(P * tf.log(P / Q)) + tf.log(sQ)  # KL(P || Q/sQ), since sum(P) == 1

Then one can combine this loss with others and use one of the standard optimizers.
Do you think this kind of approach could be adapted to UMAP?
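For instance, a rough TF1-style sketch of combining the loss above with the coherence penalty (Y_prev, n_points and the 0.1 weight are made-up names for this example, not part of any existing API):

Y = tf.Variable(tf.random_normal([n_points, 2]))      # embedding being optimized
Y_prev = tf.placeholder(tf.float32, [n_points, 2])    # embedding of the previous frame
P = tf.placeholder(tf.float32, [n_points, n_points])  # high-dimensional affinities

coherence = tf.reduce_sum(tf.square(Y - Y_prev))      # penalize per-point movement
total_loss = tsne_kl_loss(Y, P) + 0.1 * coherence
train_op = tf.train.AdamOptimizer(1e-2).minimize(total_loss, var_list=[Y])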


lmcinnes commented Jan 4, 2018

At that level, yes, almost certainly; if you are willing to do N^2 work then you can certainly have a custom loss -- I was generally seeking to avoid that. On that front you might be interested in smallvis, which implements t-SNE, LargeVis and UMAP in a common framework that I suspect would be easily adaptable to custom loss functions. The catch is that it only supports small datasets, for exactly the reason cited above. As a way to experiment, however, it is quite powerful.


znah commented Jan 4, 2018

Thank you for pointing me to smallvis! I think that's exactly what I needed.
N^2 can actually go surprisingly far with an efficient GPU implementation. For example, here is a random YouTube video showing an N^2 n-body simulation with 60k particles at 30 fps.

Still, I like to think of algorithmic optimizations, like employing Barnes-Hut or something else, as yet another, partially orthogonal component.

@vanhoan310

How can one obtain the matrix P from UMAP? Is it self.graph_, i.e. the output of the fuzzy_simplicial_set function?

Thanks!


lmcinnes commented Aug 7, 2020

That is the equivalent of it, yes.
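For reference, a minimal sketch of getting at it through a fitted model, assuming the standard umap-learn attributes (X here is just a placeholder for your data matrix):

import umap

reducer = umap.UMAP().fit(X)   # X: your data matrix
P = reducer.graph_             # the fuzzy simplicial set, as a sparse (n, n) matrix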
