Spots disappear when zoomed out #77

Closed
ravwojdyla opened this issue Apr 15, 2023 · 6 comments

@ravwojdyla

Thanks for open sourcing, so far works great. Have noticed a small artefact, see:

artefact_vid.mov

Notice 3 spots that disappear when we zoom out and appear when zoomed in. There's roughly 10M points there. Default quadfeather flags. Latest deepscatter.

@bmschmidt
Collaborator

bmschmidt commented Apr 15, 2023

This is likely a result of the insert order--the three spots are probably clusters that were in the last 50% inserted. At initial zoom, not all 10 million points are shown to reduce network transfer/improve render performance. So if--let's say--these are word2vec embeddings which pop out by default in frequency order, these may be clusters of rarely used words. Etc.

To confirm you could run

plot.plotAPI({"encoding": {"color": {"field": "ix", "range": "viridis", domain: [1, 10e6]}}})

which will make the color on the chart reflect input order.
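The insert-order effect described above can be illustrated with a toy example. This is only a sketch under stated assumptions: `visibleClusterCounts`, the `cluster` field, and the tiny record list are hypothetical and not part of deepscatter or quadfeather. Each record gets an index from its insertion position, and only records at or below a cutoff index count as visible.

```javascript
// Toy illustration (hypothetical names, not deepscatter code):
// count how many points of each cluster survive an index cutoff.
function visibleClusterCounts(records, cutoff) {
  const counts = {};
  records.forEach((record, i) => {
    const ix = i + 1; // insertion index, starting at 1
    if (ix <= cutoff) {
      counts[record.cluster] = (counts[record.cluster] || 0) + 1;
    }
  });
  return counts;
}

// Six "common" points inserted first, four "rare" points inserted last.
const records = [
  ...Array.from({ length: 6 }, () => ({ cluster: "common" })),
  ...Array.from({ length: 4 }, () => ({ cluster: "rare" })),
];

// With a cutoff at half the data, the late-inserted cluster is absent.
const counts = visibleClusterCounts(records, 5);
console.log(counts); // only "common" appears; "rare" has no visible points
```

In this sketch, a cluster made up entirely of late-inserted records has no visible points until the cutoff rises, which matches the spots that only appear after zooming in.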

Easiest solutions are:

  1. Just plot with {"max_points": 10e6}; for local exploration that's fine.
  2. Randomly shuffle the data before insertion (that is, before running quadfeather).
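The second option can be sketched with a Fisher-Yates shuffle. This is a minimal illustration that assumes the rows are already in memory as an array; `shuffleRows` and its optional `random` parameter are hypothetical names, and for a 10M-row file the shuffle would more likely happen in whatever tool writes the input for quadfeather.

```javascript
// Fisher-Yates shuffle: returns a new array with the rows in random order.
// `random` defaults to Math.random and can be replaced for reproducibility.
function shuffleRows(rows, random = Math.random) {
  const out = rows.slice(); // leave the caller's array untouched
  for (let i = out.length - 1; i > 0; i--) {
    // Pick a position in [0, i] and swap it into slot i.
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Hypothetical usage: shuffle before writing the file that is tiled.
const shuffled = shuffleRows([{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }]);
console.log(shuffled.length);
```

After shuffling, every cluster should be represented roughly in proportion to its size among the low index numbers, so no cluster is systematically held back until deeper zoom levels.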

@bmschmidt
Collaborator

Also, just out of curiosity, are you able to share what the data is? There aren't that many 10m point t-sne embeddings in the world yet!

@ravwojdyla
Author

@bmschmidt ah, that makes sense and you were totally right. Thank you so much! Random shuffle fixed the issue. Regarding data - sure this is actually a subset of PubMed.

@ravwojdyla
Author

> Just plot with {"max_points": 10e6}; for local exploration that's fine.

@bmschmidt I have a follow-up question about this recommendation. max_points controls how many points are displayed at the current zoom level, right? So, for example, if I have 10M points and set max_points to 500K, at any given time there will be up to 500K displayed, and if there are more than 500K points at the current zoom they will be uniformly sampled?

@bmschmidt
Collaborator

As currently implemented, max_points sets the number of points that would be displayed at each zoom level if the data were uniformly distributed across the full bounding box. This means that there may be more or fewer than max_points points displayed at any given time. If you zoom into a region of high density with max_points: 500000, you may actually render 1.5 million points in some areas.

The reason is that it's not actually uniform sampling. At insert, every point is assigned an index number from 1 to (in your case) 10 million. At zoom level 1, all points with an index below 500K will be shown; if you zoom in to show only a quarter of the data, all points with an index below 2M will be shown; if you zoom in to a quarter of that region, all points with an index below 8M will be shown; etc.
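The arithmetic above can be written out as a short sketch. It is not deepscatter's actual implementation: `indexCutoff` and `estimateRendered` are hypothetical helpers, the clamp to the total point count is an assumption, and the estimate assumes insert order is random so that index numbers are spread evenly over the points in view.

```javascript
// Index cutoff for a viewport covering `viewportFraction` of the full
// bounding box (1 = fully zoomed out, 0.25 = a quarter of the area).
function indexCutoff(maxPoints, viewportFraction, totalPoints) {
  // Assumption: the cutoff can never exceed the number of points that exist.
  return Math.min(totalPoints, maxPoints / viewportFraction);
}

// Rough estimate of rendered points when `pointsInView` of the
// `totalPoints` points fall inside the viewport.
function estimateRendered(maxPoints, viewportFraction, pointsInView, totalPoints) {
  const cutoff = indexCutoff(maxPoints, viewportFraction, totalPoints);
  return (pointsInView * cutoff) / totalPoints;
}

const total = 10e6; // 10 million points, as in this thread
console.log(indexCutoff(500000, 1, total));      // fully zoomed out
console.log(indexCutoff(500000, 1 / 4, total));  // a quarter of the area
console.log(indexCutoff(500000, 1 / 16, total)); // a quarter of that
// Hypothetical dense region: a quarter of the area holding 7.5M points.
console.log(estimateRendered(500000, 1 / 4, 7.5e6, total));
```

Under these assumptions the three cutoffs come out at 500K, 2M, and 8M, and the dense-region estimate is 1.5 million rendered points, three times the nominal max_points.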

@ravwojdyla
Author

@bmschmidt ah, that makes sense, thank you!
