Validation and clarification of MNIST visualization #196

Closed
farleylai opened this issue Jan 29, 2019 · 1 comment

farleylai commented Jan 29, 2019

Hi,

I just installed the latest releases of UMAP (0.3.7) and scikit-learn (0.20) through conda.
The MNIST example no longer works because it relies on the deprecated fetch_mldata().
The API seems to have changed to fetch_openml(), so I modified the example slightly, as attached:

example.py.txt
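
The attachment is not reproduced in this thread. As a rough sketch of the kind of change described, assuming the OpenML dataset name `mnist_784` and that its labels arrive as strings, the loading step might look like the following. The helper names `labels_to_int` and `load_full_mnist` are made up for illustration and are not taken from the attached file.

```python
import numpy as np


def labels_to_int(target):
    """Convert string labels such as '0'..'9' to integers.

    fetch_openml() returns the MNIST targets as strings, which matplotlib
    cannot map onto a numeric colormap directly.
    """
    return np.asarray(target).astype(int)


def load_full_mnist():
    """Fetch the 70,000-sample MNIST dataset from OpenML (needs network access).

    Assumption: the dataset is published on OpenML as "mnist_784", version 1.
    """
    # Imported here so that labels_to_int() is usable without scikit-learn.
    from sklearn.datasets import fetch_openml

    mnist = fetch_openml("mnist_784", version=1)
    # np.asarray() covers both return types: plain arrays (older releases)
    # and pandas objects (newer releases).
    return np.asarray(mnist.data), labels_to_int(mnist.target)
```

Calling `load_full_mnist()` should return the pixel matrix and an integer label vector that can be passed as `c=` to a scatter plot.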

The main changes are removing the s param, so that the point size falls back to the default, and converting the string-typed targets so that they match the type expected for colors, for comparison with t-SNE later on.
The resulting figures for small MNIST from load_digits() and full MNIST from fetch_openml() are shown below:

[Figure: Small MNIST from load_digits() (n_neighbors=15, min_dist=0.1, metric=euclidean)]

[Figure: Full MNIST from fetch_openml() (n_neighbors=15, min_dist=0.1, metric=euclidean)]

I made sure the parameters match the defaults of the original example:

  • n_neighbors = 15
  • min_dist = 0.1
  • metric = 'euclidean'
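
For anyone trying to reproduce the small-MNIST figure, here is a minimal sketch that passes the three parameters above to UMAP, times the fit, and plots with the default point size. The `random_state` value and the helper names `embed_digits` and `plot_embedding` are assumptions added for illustration; they are not part of the original example.

```python
import time

# The three parameters listed above.
UMAP_PARAMS = {"n_neighbors": 15, "min_dist": 0.1, "metric": "euclidean"}


def embed_digits(params=UMAP_PARAMS, random_state=42):
    """Embed the small digits dataset with UMAP and time the fit.

    Returns (embedding, labels, elapsed_seconds).
    random_state is an assumption, added only to make runs repeatable.
    """
    # Third-party imports are local so this module loads without them.
    import umap  # provided by the umap-learn package
    from sklearn.datasets import load_digits

    digits = load_digits()
    reducer = umap.UMAP(random_state=random_state, **params)
    start = time.perf_counter()
    embedding = reducer.fit_transform(digits.data)
    elapsed = time.perf_counter() - start
    return embedding, digits.target, elapsed


def plot_embedding(embedding, labels, title):
    """Scatter plot colored by class, leaving the `s` argument at its default."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 8))
    points = ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="Spectral")
    fig.colorbar(points, ax=ax, ticks=range(10))
    ax.set_title(title)
    return fig
```

A typical call would be `embedding, labels, seconds = embed_digits()` followed by `plot_embedding(embedding, labels, "Small MNIST")`.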

The running times were 7.99 s and 357.24 s, respectively, on an i7-6850K CPU @ 3.60GHz.
However, the figures no longer look as clean as those in the paper: the clusters contain many points from other classes.
Am I missing something, or is this expected?
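
One way to put a number on "clean" instead of judging by eye is to measure, for each embedded point, what fraction of its nearest neighbors in the 2-D embedding share its label. This is only an illustrative sketch: the function name `neighbor_label_purity` is made up, and the brute-force distance matrix uses O(n²) memory, so it suits a subsample of a few thousand points rather than all 70,000.

```python
import numpy as np


def neighbor_label_purity(embedding, labels, k=15):
    """Mean fraction of each point's k nearest embedded neighbors sharing its label."""
    emb = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Squared Euclidean distances between all pairs of points.
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    # A point must not count as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    same = labels[nn] == labels[:, None]
    return float(same.mean())
```

A value close to 1 means that few out-of-class points sit inside each cluster. Computing it for both the UMAP and the t-SNE embedding of the same subsample gives a like-for-like comparison.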

For comparison, the visualization by Multicore t-SNE is attached below:

[Figure: MNIST by Multicore t-SNE]

The s param is also left at its default there.
Although there are more points between the classes, each class looks cleaner: the out-of-class points seem to come in fewer colors, mostly from nearby classes.
Any explanation would be appreciated.

Last but not least, is there any guarantee that UMAP preserves more of each point's nearest neighbors from the original high-dimensional space in the low-dimensional embedding than t-SNE does?
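
Whatever the answer on guarantees, the question can at least be checked empirically. The sketch below computes the average overlap between each point's k nearest neighbors in the original space and in the embedding. The function names are made up for illustration, and the brute-force search assumes a subsample small enough for an n × n distance matrix. scikit-learn's `sklearn.manifold.trustworthiness` is a related, ready-made score.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row, excluding the row itself."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def knn_preservation(X_high, X_low, k=15):
    """Mean fraction of high-dimensional k-NN that are also k-NN in the embedding."""
    high = knn_indices(X_high, k)
    low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(high, low)]
    return float(np.mean(overlap)) / k
```

Running `knn_preservation(X, embedding)` on the same subsample for a UMAP embedding and a t-SNE embedding gives two comparable numbers between 0 and 1. The result will depend on k and on the parameters of each method.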

sleighsoft (Collaborator) commented

The current, up-to-date MNIST example can be found here: https://github.com/lmcinnes/umap/blob/master/examples/plot_mnist_example.py
I will close the issue for now. Feel free to open a new issue if you have any further problems or if the one described here persists.
