I just installed the latest release (0.3.7) of UMAP and scikit-learn (0.20) through conda.
The MNIST example does not work because fetch_mldata() is deprecated.
The API appears to have changed to fetch_openml(), so I modified the example a bit, as attached (example.py.txt):
The major changes are restoring the default point size by removing the s param, and converting the string targets to match the color type for the comparison with t-SNE later on.
The resulting figures of small MNIST from load_digits() and full MNIST from fetch_openml() are shown below:
Small MNIST from load_digits()
Full MNIST from fetch_openml()
I made sure the default parameters match the original example:
n_neighbors = 15
min_dist = 0.1
metric = 'euclidean'
The running times are 7.99 s and 357.24 s, respectively, on an i7-6850K CPU @ 3.60 GHz.
However, the figures no longer look as clean as those in the paper; many points from other classes are mixed in.
Am I missing something, or is this expected?
For comparison, the visualization by Multicore t-SNE is attached below:
Their s param is also left at its default.
Though there are more points between different classes, it looks cleaner within each class: the out-of-class points seem to come in fewer colors, mostly from nearby classes.
Any explanation would be appreciated.
Last but not least, is there any guarantee that UMAP retains more of each point's nearest neighbors from the original high-dimensional space in the low-dimensional embedding than t-SNE does?
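As far as I know there is no formal guarantee either way, but neighbor retention can be measured empirically. A sketch of a k-NN recall score (the helper name knn_recall is mine; PCA stands in here for whichever embedding you want to evaluate):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def knn_recall(X_high, X_low, k=15):
    """Mean fraction of each point's k nearest neighbors (in the original
    space) that are also among its k nearest neighbors in the embedding."""
    # kneighbors() without an argument queries the fitted points themselves,
    # excluding each point from its own neighbor list.
    idx_high = NearestNeighbors(n_neighbors=k).fit(X_high).kneighbors(return_distance=False)
    idx_low = NearestNeighbors(n_neighbors=k).fit(X_low).kneighbors(return_distance=False)
    overlap = [len(set(a) & set(b)) for a, b in zip(idx_high, idx_low)]
    return float(np.mean(overlap)) / k

digits = load_digits()
emb = PCA(n_components=2).fit_transform(digits.data)
score = knn_recall(digits.data, emb, k=15)
print(score)  # 1.0 means all k-NN relationships are preserved
```

Running this on both the UMAP and t-SNE embeddings of the same data would answer the question for a given dataset; scikit-learn's sklearn.manifold.trustworthiness is a related ready-made measure.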