New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinmap with colored pins and clustring #2737

Closed
wants to merge 15 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@shahinism
Copy link

shahinism commented Jun 11, 2016

In this branch we've updated Pinmap data visualization component, to reflect our needs, and since the result is promising enough we though it's a good idea to share it with you here.

Changes:

  1. Using leafletjs for maps. It has a reach set of helper plugins, which makes it easier to handle data on the map.
  2. Using OSM map tiles instead of google map. It is also possible to use other map tiles, but for now we are using OSM. Google map has some known performance problem.
  3. Removed 2000 threshold for showing pins with popup data. Instead we are using pin clustering layer.
  4. Settings map center to middle point of pins on the map.
  5. Using SVG pins instead of images to make us able to change pin colors. (thanks to leaflet.vector-marker)
  6. Define pin colors based on category special type. So if a dataset, provides category type together with lat, lon fields, each pin will get a color related to it's category value. Colors for category values are calculated by color hash package. Also for category types with conventional green/red colors (like yes/no) it will use green or red instead of using color-hash.
  7. Clustering markers when there are a lot of markers in a map view. (thanks to prunecluster)
  8. Display the number of pins from each category with category's color, on cluster icon.

Screenshots:

Normal data with clustering (no category):

selection_150

data with clustering and categories:

selection_152

data with clustering and conventional (yes/no) category colors:

selection_151

Possible issues

Two of the libraries we used here (leaflet.vector-market and prunecluster) are not providing npm packages or serving this packages over a reliable CDN. Since we couldn't find any document on this subject in repository or any use case of such library in Metabase's code base, we are using rawgit temporarily.

@salsakran

This comment has been minimized.

Copy link
Contributor

salsakran commented Jun 11, 2016

Nice!

@kdoh does this play nicely with our color palate and the work you're doing to unify it?

@salsakran

This comment has been minimized.

Copy link
Contributor

salsakran commented Jun 12, 2016

@shahinism, before we'd be able to merge this into the project, we would need you to sign our Contributor License Agreement - https://docs.google.com/forms/d/1oV38o7b9ONFSwuzwmERRMi9SYrhYeOrkbmNaq9pOJ_E/viewform

We'll review it and see if we can get it in the 0.19 release if things look good. We're code freezing tomorrow for 0.18, and I think this will probably need a bit of bake-in time.

@salsakran

This comment has been minimized.

Copy link
Contributor

salsakran commented Jun 12, 2016

This looks like it sidesteps our server-side tile generation - https://github.com/metabase/metabase/blob/master/src/metabase/api/tiles.clj

We had originally done that to speed up tile rendering. @shahinism how many points have you been able to render using the approach in this PR?

Also, many of our dataset endpoints are limited from returning more than 2k rows, so we should make sure this isn't broken by that limit.

@shahinism

This comment has been minimized.

Copy link

shahinism commented Jun 12, 2016

@salsakran I've already signed contributor license agreement. (Shahin Azad)

About the number of points, we in Foundersbuddy need a lot more than 2k of points, so I increased the threshold to 20k for test and it works with no problem as normal. On the other hand, prune cluster have done some tests up to 1 million points on a map, which in my opinion has some good results (links and results in its repo).

@salsakran

This comment has been minimized.

Copy link
Contributor

salsakran commented Jun 12, 2016

my read of https://github.com/metabase/metabase/blob/master/src/metabase/api/tiles.clj#L122 is that we are effectively capping any given tile by absolute-max-results defined at https://github.com/metabase/metabase/blob/master/src/metabase/query_processor.clj#L24 . Hence the tiles should be accurate for anyone not plotting 100s of millions of rows or more.

If we do this, then the max result cap will directly affect the accuracy of these pin maps. So either we should have the pin marker end up being something like "2000+" instead of "2000".

I think for the general case of "bare rows" anything past 2k isn't going to be navigated by a human anyway, but in the case of pin maps it's not unreasonable to be trying to create a map of your entire use base. Lots of companies using us have way more than 2k users in a single region (eg, NYC or SF).

Btw, @shahinism I'm a huge fan of this approach (and leafletjs), and think removing a dependency on a commercial api is a huge win for the project. Just trying to think through corner cases and make if we ship this no one else's usage breaks. Bear with our OCD tendancies =)

@MrMauricioLeite

This comment has been minimized.

Copy link

MrMauricioLeite commented Jun 13, 2016

WOW.... Really, I'm amazed!
Great work @shahinism!!! You are doing an amazing job with that plugin.

Can't wait for this to be merged!! maybe in 0.19.0 milestone? =)

@agilliland

This comment has been minimized.

Copy link
Contributor

agilliland commented Jun 14, 2016

I'd say that overall this looks like a sound approach. The couple things I would tweak are ...

  1. I would prefer not to see single letter aliases for 3rd party library imports. It will ultimately make the code harder to read. We allow a couple special cases if a library is used a lot (like underscore), but in this case I don't think leaflet should be import L from "leaflet"; and instead would prefer to see a more natural name used such as leaflet.
  2. We'll definitely want to find an alternative to using the CDN to get the libs and css. Maybe @tlrobinson has some ideas on how to get this to integrate with our normal webpack build process?

Regarding the 2k limit on rows, that is definitely more of an issue now, but it's a bit orthogonal to this specific PR really. It's not terribly difficult to come up with some other situations where a :rows type query would justifiably need to allow more than 2k rows to be returned (e.g. a pre-aggregated timeseries), so I'm not sure I would hold up the PR on that basis. It is something that needs to be dealt with though. It may make sense to add in the capability of picking specific fields from a table and simply not enforce the 2k limit in that situation. That way you could do something like select lat, long from geo_table and it wouldn't be restricted to 2k rows the way that select * from geo_table would.

@shahinism shahinism force-pushed the FoundersBuddy:pinmap-with-clustring branch from c32d2e3 to 72895da Jun 19, 2016

@shahinism

This comment has been minimized.

Copy link

shahinism commented Jun 19, 2016

I'm agree with @agilliland about single letter aliases for library imports. I've already tried to use leaflet as library import name. But those leaflet plugins (which I mentioned on CDN issue), are using L to access it (it's like an unwritten standard in leaflet community and most of the plugins out there are using the same approach).

@MrMauricioLeite

This comment has been minimized.

Copy link

MrMauricioLeite commented Jun 29, 2016

I would love to see this merged

@tlrobinson

This comment has been minimized.

Copy link
Member

tlrobinson commented Jun 29, 2016

As far as moving off the CDN, it looks like all of those packages are on NPM now (not sure if they're the right versions)

https://www.npmjs.com/package/leaflet
https://www.npmjs.com/package/Leaflet.vector-markers
https://www.npmjs.com/package/prunecluster

@salsakran

This comment has been minimized.

Copy link
Contributor

salsakran commented Jun 29, 2016

@shahinism Sorry for the delay.

We've discussed this a bunch, and I think this should be merged after #612

We'll have the same issue with the 2k limit in charting arbitrary columns in a raw result set, and we'll set up a way for the client to pull down all values for charted rows. This will remove the correctness issue (i.e. we can't expect admins to directly manage the max rows limit based on their data size).

Thanks for you patience, and hard work.

@MrMauricioLeite MrMauricioLeite referenced this pull request Jul 14, 2016

Merged

Chart settings #2931

5 of 5 tasks complete
@shahinism

This comment has been minimized.

Copy link

shahinism commented Jul 18, 2016

@salsakran No problem ;)
@tlrobinson Thanks for packages links. I'm already using leaflet package from npm. The reason I didn't used prunecluster from npm is that this package is in beta version and as far as I tested it is unstable, so I used the latest stable version.

@tlrobinson

This comment has been minimized.

Copy link
Member

tlrobinson commented Jul 18, 2016

@shahinism Makes sense. If the git repo has the package.json you can also just specify the repo URL and hash/branch. Could you try that? Otherwise we can either fork the repos and add the package.json or just vendor the files in frontend/src/metabase/vendor/*

@tlrobinson

This comment has been minimized.

Copy link
Member

tlrobinson commented Jul 24, 2016

Regarding the 2k row limitation:

One possibility is adding an endpoint similar to the server-side tile generation endpoint, except it would returns a compact list of coordinates (and maybe entity id so we could link to an object detail page). It still wouldn't be as scalable as the server-rendered tiles since I don't think we want to send absolute-max-results rows to the client (1048576 * 2 floats = 8MB). The next step would be compute the clustering on the server, but that seems like overkill.

We could still offer the server-rendered tiles mode as a chart setting, but I think limiting to 10,000-100,000 rows and showing a warning would be fine. Remember that limit is the number of points within your current viewport (we filter by lat/lon bounds in the tile endpoint), so we can just tell the user to zoom in.

@salsakran Note that showing "2000+" on a cluster isn't sufficient since the 2000 rows may be split across more than one cluster. It would have to be a top-level warning like we have for tables.

@tlrobinson tlrobinson referenced this pull request Aug 5, 2016

Closed

Leaflet pinmaps #3133

3 of 14 tasks complete
@tlrobinson

This comment has been minimized.

Copy link
Member

tlrobinson commented Aug 5, 2016

@shahinism I've started merging this with our latest changes in #3133. Thanks for the great work! Let me know if you'd like to continue helping, otherwise I'm happy to finish it up.

@tlrobinson tlrobinson referenced this pull request Aug 8, 2016

Closed

Funnel visualization #3144

3 of 8 tasks complete
@shahinism

This comment has been minimized.

Copy link

shahinism commented Aug 8, 2016

Thank you @tlrobinson for taking care of this. I was busy last couple weeks, and couldn't test suggested solutions.

@kdoh

This comment has been minimized.

Copy link
Member

kdoh commented Feb 22, 2017

Hey @shahinism. Want to give you a big thank you for your work on this, it got us excited about maps and charting improvements in general. Since there hasn't been any activity on this for a while I'm going to close this for the moment. If you feel like working more on it, we're more than happy to re-open it.

@kdoh kdoh closed this Feb 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment