Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support property aggregation on clustered features #2412

Closed
vibze opened this issue Apr 8, 2016 · 49 comments · Fixed by #7584
Closed

Support property aggregation on clustered features #2412

vibze opened this issue Apr 8, 2016 · 49 comments · Fixed by #7584
Assignees

Comments

@vibze
Copy link

vibze commented Apr 8, 2016

Hello, I've started exploring this lib and have a feature request:
Provide reduce function(s) to clustering configuration, so it's possible to filter clusters by aggregated properties rather than just point_count.

@mourner
Copy link
Member

mourner commented Apr 8, 2016

Interesting feature. Can you describe your use case in detail? How should it aggregate the properties? E.g. suppose we cluster a million points where each has different property keys and values — how should the top cluster properties look like then?

@vibze
Copy link
Author

vibze commented Apr 9, 2016

User must provide an aggregation function, so in a case of points belonging to a cluster have different property keys and values or value types everything should be handled in that function. Mapbox-gl just applies it to the dataset.

Let's say I want to build a population heatmap using population data for every human settlement in area.

Feature example:

{
    "type": "Feature", 
    "properties": { 
        "osm_id": "26544289", 
        "name": "Алматы", 
        "type": "city", 
        "population": 1475400 
    }, 
    "geometry": { 
        "type": "Point", 
        "coordinates": [ 76.9458522, 43.2355947 ] 
    } 
}

Now, I want every cluster to have a sum population of all included points as a property. I configure the source like this:

map.addSource("points", { 
    "type": "geojson", 
    "data": "data/points.geojson",
    "cluster": true, 
    "clusterMaxZoom": 14, 
    "clusterRadius": 20,
    "clusterAggregations": {
        "population_sum": function(prevValue, feature, i, arr) {
            if (!('population' in feature) || isNaN(parseFloat(feature.population))) {
                return prevValue;  // handle the case of missing keys or wrong type
            }

            return prevValue + parseFloat(feature.population);
        }
    }
});

... and every cluster from this source receives the population_sum property to work with.

@mourner
Copy link
Member

mourner commented Apr 9, 2016

I believe we won't be able to accept a function, since the mapbox style specification only allows valid JSON. Would it be enough to have something like this?

clusterAggregate: ["population"]

Then it would just do a sum for all given properties.

@vibze
Copy link
Author

vibze commented Apr 15, 2016

It would be enough for certain cases, but definitely not as flexible. Do you think something like this http://bl.ocks.org/gisminister/10001728 could be possible with Mapbox GL API?

@lavishmantri1
Copy link

Can also have some common mathematical functions
clusterAggregateFunction: "SUM" or clusterAggregateFunction: "AVG"?

That would be helpful.

@ryanbaumann
Copy link
Contributor

ryanbaumann commented May 1, 2016

+1 for this feature request. Specifically:

  1. Ability to specify one or multiple property columns to aggregate
  2. Ability to select the aggregation caclulation type.

Requested aggregate calculations for circle-cluster property data to support would be min, max, average, and sum.

As an example, here is the map visual I want to create. To achieve this today, I aggregate the geojson data on the server to return one point with the aggregated "property" value I want.

I'd like to completely port my heatmap calculations over to the new circle-cluster and circle data-driven style. The ask would be to be able to return the raw geojson point data from my data set, and cluster it using the new circle-cluster / geojson property aggregation feature.

adv_dds_example

One challenge may be enforcing data types - geojson property data could be strings, floats, ints, etc. Some aggregate calcs may apply to certain data types but not others.

@idlikesometea
Copy link

Hi, is there any news on this feature? It would be awesome to have it on GL.

@lucaswoj lucaswoj changed the title Custom aggregation for clustering Support property aggregation on clustered features Jul 29, 2016
@davecranwell
Copy link

davecranwell commented Aug 30, 2016

I'd love this functionality too. I'm attempting to display several million points on a map, retrieved by an Elasticsearch geohash aggregate query (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html). This returns data where each row already includes a sum of the points within a certain area.

As it stands the mapbox gl clustering code would only sum the number of results returned not the already-summed values within each result, which totals several million.

@davecranwell
Copy link

davecranwell commented Sep 1, 2016

I note this is already an issue on supercluster: mapbox/supercluster#12

@redbmk
Copy link

redbmk commented Sep 14, 2016

I've been playing around with this and have something working with supercluster to pass in an arbitrary function that will let you aggregate properties.

Then I was thinking for mapbox you could pass in something like clusterAggregates: { population: 'sum', minimum_wage: 'min' }, then mapbox would pass a function to supercluster based on that. In this case the cluster would get a population property with the sum of all the population properties, and a minimum_wage property with the minimum of all minimum_wage properties in the cluster.

It would be simple enough to have sum, max, and min functions to start, which could expand from there.

I have a feeling there will be a lot of red tape around this though - I pretty much have it coded already but it requires changes to supercluster, mapbox-gl, and mapbox-gl-style-spec.

@lucaswoj
Copy link
Contributor

pass in an arbitrary function that will let you aggregate properties

As stated above we cannot accept arbitrary functions for technical reasons.

I have a feeling there will be a lot of red tape around this though - I pretty much have it coded already but it requires changes to supercluster, mapbox-gl, and mapbox-gl-style-spec.

If you're serious about working on this project (:heart:), please open a ticket in the mapbox/mapbox-gl repo and we'll walk you through the changes step-by-step! It's really quite straightforward.

@davecranwell
Copy link

I had a stab at this too and am using this fork in production:
https://github.com/davecranwell/mapbox-gl-js

It uses a PR made by a kind person on supercluster (mapbox/supercluster#12) and my own fork of mapbox-gl-style-spec (https://github.com/davecranwell/mapbox-gl-style-spec) which adds support for the field added in the PR.

Was unsure whether to submit this PRs since the one on supercluster has been lurking for a while now.

@redbmk
Copy link

redbmk commented Sep 14, 2016

As stated above we cannot accept arbitrary functions for technical reasons.

@lucaswoj My understanding is that we can't pass a function into the mapbox styles because they only take JSON - I'd imagine part of that reasoning is because data passed to a worker needs to be serializable. However, supercluster could take any function and mapbox-gl could be the one to pass in the function (from the worker) based on the JSON you give it. Please correct me if I'm understanding something wrong.

I'd be happy to work on the project - I'll get everything together soon and open a ticket.

@davecranwell The mapbox/supercluster#12 pr is really similar to what I have, but the problem I have with it is you can only work with one property (metric). I think it would be handy to be able to aggregate multiple properties, so for example you can get the sum of one property, and the max of another, for example. I guess in theory you could have a property that is a hash and use that as the metric, but then with just adding clusterMetric you're stuck with the default reducer, which is a summing function.

@lucaswoj
Copy link
Contributor

However, supercluster could take any function and mapbox-gl could be the one to pass in the function (from the worker) based on the JSON you give it.

Can you provide a concrete example (pretend Javascript) of how this would work?

I think it would be handy to be able to aggregate multiple properties, so for example you can get the sum of one property, and the max of another, for example.

Agreed. Whatever solution we end up merging will most likely work the way you describe.

@redbmk
Copy link

redbmk commented Sep 15, 2016

Sure,

I still need to write tests and make changes to the spec. Also I figure it will need to be tested and code reviewed for style, browser support, error checking, safety/security, speed, etc. Regardless here's what I have so far:

supercluster changes
mapbox-gl-js changes

Then in your mapbox style you could provide an option like this:

clusterAggregates: {
  property1: 'sum',
  property2: 'min',
  property3: 'min',
  property4: 'max'
}

For now there's no error checking, so if you don't have those properties you'll just get some concatted strings or some NaN's or something else weird probably. It just assumes those properties will be on each point, and carries the accumulated values over to the clusters. So your clusters will end up with the same properties, but with aggregated values. You could still filter the clusters and style them differently if you want by using point_count like normal.

@lamuertepeluda
Copy link

Any news on this topic?

I'd like to have a behavior for MGL similar to https://github.com/SINTEF-9012/PruneCluster which works great with Leaflet.

@lucaswoj
Copy link
Contributor

@lamuertepeluda per https://www.mapbox.com/mapbox-gl-js/roadmap/ this feature is under active development right now 😄

@lamuertepeluda
Copy link

@lucaswoj that's great news. I started contributing to the clustering engine, "supercluster". See
https://github.com/redbmk/supercluster/pull/2
and
mapbox/supercluster#12 (comment)
I hope they get merged soon.

I'll also try to fork mapbox-gl and add the accumulator parameter to the geojson source. I think it should have good enough performances.

@redbmk
Copy link

redbmk commented Jan 27, 2017

Hey, I've been pretty busy but I'll look into that pull request hopefully tonight. I'd actually been working off the cluster-agg-hash-array branch after some discussion about the style spec in mapbox/DEPRECATED-mapbox-gl#26. I'll merge that into master (in redbmk/supercluster) as well.

@lucaswoj are we still depending on mapbox/mapbox-gl-style-spec#47 (it looks like the discussion for this has moved around a bit, but I think that's the latest). I don't see it on the roadmap.

@redbmk
Copy link

redbmk commented Jan 27, 2017

Alright I updated the master and cluster-agg-hash-array branches of redbmk/supercluster.

We still need to come to some consensus on the syntax and I feel like the style spec needs to be fleshed out a bit. Right now it just accepts any object and has a comment suggesting how to use it, but it seems like there should be a way to specify exactly what options are allowed.

@lamuertepeluda
Copy link

I think the aggregator function should just be a parameter of the geojson source (worker options), see the workerOptions configuration rather than the style object.

Once you do this, you should get the custom properties defined in the aggregator function together with {point_count} for each cluster like in the cluster example.

I tried upgrading supercluster to the current version, and this makes {point_count} and {cluster_id} available for each cluster. So I think it would be enough to add an aggregator function parameter to the addSource that get passed down to the supercluster constructor as modified by @redbmk and me to get the custom aggregations for each point.

This wouldn't require any changes to the current style specs but just to add some other layers, that like the "cluster-count" layer of the example, only represent the cluster population information.
It would be easy to get a representation like in the left picture of PruneCluster categories example, while to get the representation on the right, there should be some additional border specs parameter.

I already managed to do this but I was missing the information about the cluster population, that's why I switched to the mod of supercluster.

@redbmk
Copy link

redbmk commented Jan 28, 2017

That's how I have it coded here. The code is over a month behind from master now, but it's the same idea you mentioned, going through workerOptions.

Ideally we would be able to add an aggregator function via the style spec or via the addSource function. That way you could add it to the original style and/or via addSource.

I believe it's technically possible to add arbitrary functions to a web worker by creating a Blob. If that's true, maybe we could pass in a function as a string that would be parsed in the worker. Maybe the string could be either a url to a function (which could be a blob url or an external script), or a string representation of a function (e.g. "" + function () { return 'hello world' }) which would be converted to a blob before being passed into the worker. That might also help solve mapbox/mapbox-gl-style-spec#47 by just letting the user pass in any function as a string.

@ryanbaumann
Copy link
Contributor

ryanbaumann commented Aug 18, 2017

Here is an interim solution to property aggregation using Supercluster in GL JS.

https://bl.ocks.org/ryanbaumann/01b2c7fc0ddb7b27f6a72217bd1461ad

ezgif com-gif-maker

@benderlidze
Copy link

Any updates on this?

@redbmk
Copy link

redbmk commented Aug 26, 2017

@benderlidze there is a lot of activity happening at #4777 to allow for arbitrary expressions. Once that is complete, it will be possible to use those expressions to add support for property aggregation.

@kkaefer
Copy link
Contributor

kkaefer commented Aug 28, 2017

In addition to that v0.39, which was released a few weeks ago, ships with an updated version of supercluster that supports property clustering.

@mourner
Copy link
Member

mourner commented Aug 28, 2017

@kkaefer there's no way to take advantage of that supercluster feature (without forking GL JS) until we expose it through the style spec.

@joeyramoni
Copy link

Any news on this feature? Alternative workaround?

@lamuertepeluda
Copy link

Any e.t.a for this feature?
It's a long time since this issue (and other similar ones) has been opened, and the problem has been already resolved on the superclusterjs side...

@lamuertepeluda
Copy link

I attempted a new workaround approach (which is a bit complex and I'd like to publish as an example on dedicated github repo as soon as I have time) but mainly works like this:

  • have a worker thread which fetches the data (xhr) and clusterizes them with supercluster@3.x and custom map/reduce
    • the worker threads stores the clusterized GeoJSON feature collection data in a blob URL (URL.createObjectURL)
  • when it receives a getClusters message with bbox and zoom, it invokes the index.getClusters returns the new blob URL of the clusterized data
  • the main thread instantiates a source (cluster: false) which receives the blob URL as data option, and on each moveend (debounced) it posts a getClusters message with bbox and zoom
    • when it receives the getClusters message from the worker, it invokes source.setData('clusteredFeaturesUrl')

Each cluster gets the custom stats defined in the worker.
It seems to be fluid enough and waste less memory even with ~300 point features.

Of course it is suboptimal and I want to verify some other details, but it basically works and it's maybe better than having a copy of supercluster in the main thread, which sucks I don't like at all.

@JoJ123
Copy link

JoJ123 commented Jul 9, 2018

@redbmk

@benderlidze there is a lot of activity happening at #4777 to allow for arbitrary expressions. Once that is complete, it will be possible to use those expressions to add support for property aggregation.

Is there an update on this? We would also like to add this feature in our project.

@tvler
Copy link

tvler commented Jul 12, 2018

This would be a great feature for something we're working on at Open Listings

Previously we've just been displaying homes on our map, which has been working fine with clustering. But we're currently working on coalescing homes that are all in the same building into a singular geojson feature, so there'll be less markers on our maps in densely populated areas.

With the way clusters are working now, the point_count won't be correct if building's are clustered in –– it'll just count as 1, rather than the total number of homes in the building.

I think if we store the building home count in our geojson metadata, then there should be a way to render the cluster value we want if a feature like this gets added!

@redbmk
Copy link

redbmk commented Jul 17, 2018

I've been really busy but I would love to work on this again now that Arbex is ready. Maybe I can make some time this weekend to take a look.

@senoj
Copy link

senoj commented Jul 19, 2018

This is something I have been patiently waiting to be implemented so its good to see this topic looked at again.

Im working on a project that needs this feature pretty badly. It is intended to monitor stores that are online/offline/mobile backup. Clusters would need to follow 3 rules:

  1. If all stores are online in a cluster it will be green
  2. If one store is offline in a cluster it will be red
  3. If one store is on mobile backup it will be orange

Right now it looks like this, unclustered points are working as expected:
image

@foundryspatial-duncan
Copy link

@senoj you can do that right now with the external supercluster library, see ryanbaumann's post above. I successfully used that technique to render pie charts showing the ratios of different properties in the clusters. It'll definitely be simpler when that's part of the mapbox-api, but if your business needs are urgent, you can do it now 👍

@senoj
Copy link

senoj commented Jul 19, 2018

@foundryspatial-duncan its not urgent, if there is no traction on this over the next month ill look at this. Id hate to put the work in and shortly after have it added to the mapbox-api :)

@JoJ123
Copy link

JoJ123 commented Jul 20, 2018

@senoj @foundryspatial-duncan I also started to implement it with supercluster. This is working quite well. But I would also prefer, when it's integrated in the mapbox-api.

@redbmk
Copy link

redbmk commented Jul 23, 2018

@kkaefer @Qwervo @lamuertepeluda @JoJ123 @tvler @senoj

I created a PR for this (#7004). I'm sure the syntax will change slightly after it's reviewed by the maintainers, but if you want to test it out now, that would help find any bugs or issues that may arise. Any feedback is welcome. Thanks!

Also, if anybody is interested in helping out, I still need to do some benchmarking, make sure the documentation is up to date everywhere, etc. Any help would be much appreciated!

@mourner
Copy link
Member

mourner commented Nov 12, 2018

Picking this up again, here's a PR proposing a different syntax/implementation for this feature #7584 — check it out!

@mourner
Copy link
Member

mourner commented Jan 4, 2019

Last call for comments on #7584! It's ready for review and likely to be merged soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment