-
-
Notifications
You must be signed in to change notification settings - Fork 396
ST_ClusterIntersectingWin and ST_ClusterWithinWin #721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Your NEWs note should reference GH721 |
doc/reference_cluster.xml
Outdated
<funcsynopsis> | ||
<funcprototype> | ||
<funcdef>integer <function>ST_ClusterIntersectingWin</function></funcdef> | ||
<paramdef><type>geometry</type> <parameter>g</parameter></paramdef> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<paramdef><type>geometry</type> <parameter>g</parameter></paramdef> | |
<paramdef><type>geometry winset </type> <parameter>geom</parameter></paramdef> |
I'd change to geom to be consistent with the other functions, unless you plan to at some point in time make this work for geography, might be a good time to experiment with anyelement :)
The winset is fluff to flag it as a window function so the garden test can test it and it also appears in the 15.2. PostGIS Window Functions section.
doc/reference_cluster.xml
Outdated
<funcsynopsis> | ||
<funcprototype> | ||
<funcdef>integer <function>ST_ClusterWithinWin</function></funcdef> | ||
<paramdef><type>geometry </type> <parameter>g</parameter></paramdef> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<paramdef><type>geometry </type> <parameter>g</parameter></paramdef> | |
<paramdef><type>geometry winset </type> <parameter>geom</parameter></paramdef> |
Same as mentioned in other
NEWS
Outdated
@@ -15,6 +15,7 @@ xxxx/xx/xx | |||
- #5286, RenameTopoGeometryColumn (Sandro Santilli) | |||
- GH703, Add min/max resampling as options (Christian Schroeder) | |||
- #5336, topogeometry cast to topoelement support (Regina Obe) | |||
- New window-based ST_ClusterWithinWin and ST_ClusterIntersectingWin (Paul Ramsey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- New window-based ST_ClusterWithinWin and ST_ClusterIntersectingWin (Paul Ramsey) | |
- GH721, New window-based ST_ClusterWithinWin and ST_ClusterIntersectingWin (Paul Ramsey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more changes - the availability should be 3.4.0 not 3.4
postgis/postgis.sql.in
Outdated
LANGUAGE 'c' IMMUTABLE STRICT WINDOW PARALLEL SAFE | ||
_COST_HIGH; | ||
|
||
-- Availability: 3.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- Availability: 3.4 | |
-- Availability: 3.4.0 |
<refsection> | ||
<title>Description</title> | ||
|
||
<para>ST_ClusterWithinWin is a window function that returns an integer for each input geometry, where the integer the cluster number the geometry is a member of. Clusters represent a set of input geometries separated by no more than the specified distance. (Distances are Cartesian distances in the units of the SRID.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth mentioning that this function is equivalent to ST_ClusterDBSCAN(eps, minPoints:=0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they fully equivalent? In behaviour as well as performance? If so, perhaps only one is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ST_ClusterWithin
just calls DBSCAN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's equivalent to ST_ClusterDBSCan, then yah we should not bother adding it. One less function and then just add an addendum to the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth mentioning that this function is equivalent to
ST_ClusterDBSCAN(eps, minPoints:=0)
?
Is there a reason you didn't set minPoints as default arg with value 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, seems to me that there could be a faster implementation for ClusterWithin (since unlike DBSCAN it is a deterministic function which doesn't have to be interative). So providing an specific API for it will allow a future optimized implementation, and lets users be specific about what they want to compute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DBSCAN with minPoints=0 is deterministic. I'm not sure what you would propose changing about it? I have no opinion on adding a new function or not, I just thought it would be good to point out the equivalency in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(It's already special-cased, if that's what you're getting at)
postgis/liblwgeom/lwgeom_geos_cluster.c
Lines 514 to 520 in 1a004f4
int union_dbscan(LWGEOM** geoms, uint32_t num_geoms, UNIONFIND* uf, double eps, uint32_t min_points, char** in_a_cluster_ret) | |
{ | |
if (min_points <= 1) | |
return union_dbscan_minpoints_1(geoms, num_geoms, uf, eps, in_a_cluster_ret); | |
else | |
return union_dbscan_general(geoms, num_geoms, uf, eps, min_points, in_a_cluster_ret); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already special-cased
Well, that's a good optimization. And seems like it might actually be the optimal implementation for Cluster-Within? (A bit hard to tell, since the generality makes the algorithm somewhat obscure).
Anyway, the documentation for ST_ClusterDBSCAN
has a complex description which does not make it easy to deduce that it is also useful for a deterministic Cluster-Within functionality. So having a separate function and doc page seems ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my suggestion was only to document the equivalence.
</para> | ||
|
||
<para>Availability: 3.4.0</para> | ||
</refsection> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
</refsection> | |
<para>&curve_support;</para> | |
</refsection> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the same token should ClusterIntersecting have this marker? I'm not sure if it means "native arc support" (like distance and area have) or "stroked arc support" (like all the GEOS backed functions have).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I would interpret it to mean native arc support, otherwise all functions would have arc support, no? (But I agree, the wording isn't clear one way or the other.) In this case ClusterWithin would have native arc support but ClusterIntersecting uses GEOS so it would not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DBScan is not currently marked as having arc support. add there too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup
<title>Description</title> | ||
|
||
<para>ST_ClusterIntersectingWin is a windowing function that builds inter-connecting clusters of geometries that intersect. It is possible to traverse all geometries in a cluster without leaving the cluster. The return value is the cluster number that the geometry argument participates in, or null for null inputs.</para> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be expected to be equivalent to ST_ClusterDBSCAN(0, 0)
, if not for cases where ST_Distance(a,b) == 0
but ST_Intersects(a, b)
is false.
@pramsey I investigated the error cirrus errors. It's happening when Cirrus CI is testing dumprestore. I run into the same issue if I do:
But works fine
I'm going to flip Cirrus CI to test against extension based install, since that's really what we care about these days anyway. I'll ticket this as a separate issue in postgis_restore.pl |
Was doing script installed dumprestore check which causes a failure in #721 Dumprestore Issue references #5348
Could you please rebase against current master branch, resolving conflicts, to see if it fixes the --dumprestore test ? |
96e718d
to
b741b22
Compare
@pramsey Can we close this out since you've already committed it or is there a reason it is still open? |
Access the functionality of ST_ClusterIntersecting and ST_ClusterWithin, but using a window function instead of getting back a big lump of GeometryCollection.