-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sp-gist index for BOX3D #251
Conversation
postgis/postgis.sql.in
Outdated
COMMUTATOR = <-> | ||
); | ||
|
||
CREATE FUNCTION temporal_left(box3D, box3D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporal -> box3d?
Hi, We've fixed all the build issues on the CI but are left with a problem in the regression tests which we're not sure how to fix. This is the problematic query (in
Adding an explicit cast to the_geom doesn't fix it. We could change the name of our && operator, but that is obviously not ideal. Any ideas on how to fix this would be very much appreciated :-) Thanks, EDIT: actually it seems to work when explicitly casting the_geom to box3d (I was trying geometry :-) |
Pourquoi tu n'essaies pas d'enlever le cast implicit, c-a-d, remplacer dans
le fichier postgis.sql.in
CREATE CAST (box3d AS geometry) WITH FUNCTION geometry(box3d) AS IMPLICIT;
par
CREATE CAST (box3d AS geometry) WITH FUNCTION geometry(box3d);
et lance les regression tests. Mon intuition est que cela devrait marcher.
Je ne vois pas d'autre solution pour le moment
…On Thu, May 24, 2018 at 11:41 AM, alesuiss ***@***.***> wrote:
Hi,
We've fixed all the build issues on the CI but are left with a problem in
the regression tests which we're not sure how to fix. This is the
problematic query (in regress/regress_index.sql):
select num,ST_astext(the_geom) from test where the_geom && 'BOX3D(125
125,135 135)'::box3d order by num;
ERROR: operator is not unique: geometry && box3d
Adding an explicit cast to the_geom doesn't fix it. We could change the
name of our && operator, but that is obviously not ideal.
Any ideas on how to fix this would be very much appreciated :-)
Thanks,
Arthur
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#251 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AZp2BpJYHIbmTZpF1I185ATlaUAmLbznks5t1oBYgaJpZM4UKq2o>
.
--
------------------------------------------------------------
Prof. Esteban Zimanyi
Department of Computer & Decision Engineering (CoDE) CP 165/15
Universite Libre de Bruxelles
Avenue F. D. Roosevelt 50
B-1050 Brussels, Belgium
fax: + 32.2.650.47.13
tel: + 32.2.650.31.85
e-mail: ezimanyi@ulb.ac.be
Internet: http://code.ulb.ac.be/
------------------------------------------------------------
|
Now we get a weird error:
I don't see how this could be from our changes, so is this a bug in postgis' test system? |
@alesuiss such error means the issue is reported one test run above. for your case:
|
@Komzpa oops! Thanks, we're a little new to this :-) I still don't understand the error though. I've added our SQL declarations to postgis/postgis.sql.in, did we need to do anything else? EDIT: never mind, I'm just stupid :-) |
postgis/spgist_box3d.h
Outdated
#define RTOverFrontStrategyNumber 28 /* for &</ */ | ||
#define RTFrontStrategyNumber 29 /* for <</ */ | ||
#define RTBackStrategyNumber 30 /* for />> */ | ||
#define RTOverBackStrategyNumber 31 /* for /&> */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename these to use a different prefix than "RT" which to me expands to "r-tree".
AS 'MODULE_PATHNAME', 'spgist_box3D_octree_leaf_consistent' | ||
LANGUAGE C IMMUTABLE STRICT _PARALLEL; | ||
|
||
-- Availability: 2.5.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure I want to see all the different operators? My experience has been that nothing but "overlaps", "contains" and "within" are used in practice, and if we define all the operators then we get to carry/maintain them for the rest of time (like we did for the initial gist-rtree binding) even though they aren't used.
@@ -34,14 +34,14 @@ set enable_bitmapscan = off; | |||
set enable_seqscan = on; | |||
|
|||
SELECT 'scan_idx', qnodes('select * from test where the_geom && ST_MakePoint(0,0)'); | |||
select num,ST_astext(the_geom) from test where the_geom && 'BOX3D(125 125,135 135)'::box3d order by num; | |||
select num,ST_astext(the_geom) from test where the_geom::box3d && 'BOX3D(125 125,135 135)'::box3d order by num; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other large scale issue I have w/ this patch is defining the index on box3d rather than geometry. There's a convenience associated w/ having the index defined against geometry, because then people can use it directly w/o casting. Did you do it against box3d because of issues of dealing with large objects? The pg11 support for an spgist compress API hook should allow you to do an index against geometry while using boxes as the keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this is to get around the spgist storage issue; the plan was to generalize the index once pg11 is out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant released (we're currently basing our work on the last released version).
Should we work with pg11 instead for this patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is not too much trouble for you. :)
Postgres 11 beta1 was released just a couple of days ago: https://www.postgresql.org/about/news/1855/ and is aimed at developers.
Basically, a released non-beta Postgres 11 means a released non-beta PostGIS 2.5, and feature freeze is going to come soon.
I'll see if I can get it running on Travis to make testing easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we can start with only a few operators.
However, please notice that all this pull request originates from working with trajectories, i.e., temporal points and thus we have developed an extension based on PostGIS in which we defined temporal points, as well as temporal floats (e.g., for expressing the distance between moving cars) and temporal booleans (e.g., for determining whether a moving car is located within a province). In that case we used the z dimension for encoding timestamps and thus the operators such as <</ and />>correspond to before and after in time and these are heavily used in that scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Did you know that there is some trajectory work in PostGIS already, encoding the timestamps on the M?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes of course, but didn't scale up well, in particular because time is ordered by definition while M dimension is not and therefore we cannot leverage on that for having a much faster implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, please notice that the bug I reported
https://trac.osgeo.org/postgis/ticket/4088#ticket
has not been corrected in the beta version of postgis 2.5.0
I am also aiming to develop spgist 2D indexes on geometries using boxd2f bounding boxes. However, the fact that the over* (e.g. overleft) operators have erroneously defined a commutative operator prevents me to have correct results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have the generalized index ready soon. Should we use a new branch & new pull request for this?
* This module provides SP-GiST implementation for boxes using oct tree | ||
* analogy in 4-dimensional space. SP-GiST doesn't allow indexing of | ||
* overlapping objects. We are making 3D objects never-overlapping in | ||
* 6D space. This technique has some benefits compared to traditional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's nice from a completionist point-of-view to support 3D inputs, the reality of general use is that most indexes are against 2D objects. How does this index degrade if I feed it 2D objects? Is it happy? Also, note that most 3D objects have a very narrow range in Z compared to the range in X/Y -- is the index happy in that case? Is there any argument for having a 2D opclass be the default, with an optional 3D opclass for people who really have that problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the behaviour for indexing points (which end up represented by boxes that have zero-length edges? (which end up represented by boxes that have zero-length edges?) This is another common use case. Does it need special handling for good performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a quick test with 1e4 3D points. For higher numbers of points the scale up is much bigger.
CREATE OR REPLACE FUNCTION random_float(low float, high float)
RETURNS float AS $$
BEGIN
RETURN random()* (high-low + 1) + low;
END;
$$ LANGUAGE 'plpgsql' STRICT;
CREATE OR REPLACE FUNCTION random_geompoint3D(lowx float, highx float,
lowy float, highy float, lowz float, highz float)
RETURNS geometry AS $$
BEGIN
RETURN st_makepoint(random_float(lowx, highx), random_float(lowy, highy),
random_float(lowz, highz));
END;
$$ LANGUAGE 'plpgsql' STRICT;
drop table if exists tbl_point3Dboxes;
create table tbl_point3Dboxes as
select k, random_geompoint3D(0, 1000, 0, 1000, 0, 1000)::box3D as b
from generate_series(1,1e4) k;
select * from tbl_point3Dboxes limit 3;
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
-- 10000
explain analyze
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
"Aggregate (cost=1027824.45..1027824.46 rows=1 width=8) (actual time=15421.243..15421.243 rows=1 loops=1)"
" -> Nested Loop (cost=0.00..1010981.64 rows=6737126 width=0) (actual time=0.022..15419.443 rows=10000 loops=1)"
" Join Filter: (t1.b && t2.b)"
" Rows Removed by Join Filter: 99990000"
" -> Seq Scan on tbl_point3dboxes t1 (cost=0.00..196.08 rows=8208 width=52) (actual time=0.011..1.988 rows=10000 loops=1)"
" -> Materialize (cost=0.00..237.12 rows=8208 width=52) (actual time=0.000..0.507 rows=10000 loops=10000)"
" -> Seq Scan on tbl_point3dboxes t2 (cost=0.00..196.08 rows=8208 width=52) (actual time=0.004..1.683 rows=10000 loops=1)"
"Planning Time: 0.067 ms"
"Execution Time: 15421.441 ms"
create index tbl_point3Dboxes_spgist_idx on tbl_point3Dboxes using spgist(b);
-- Query returned successfully with no result in 94 msec.
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
-- 10000
explain analyze
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
"Aggregate (cost=303928.00..303928.01 rows=1 width=8) (actual time=183.226..183.226 rows=1 loops=1)"
" -> Nested Loop (cost=0.28..278928.00 rows=10000000 width=0) (actual time=0.049..181.660 rows=10000 loops=1)"
" -> Seq Scan on tbl_point3dboxes t1 (cost=0.00..214.00 rows=10000 width=52) (actual time=0.011..1.343 rows=10000 loops=1)"
" -> Index Only Scan using tbl_point3dboxes_spgist_idx on tbl_point3dboxes t2 (cost=0.28..17.87 rows=1000 width=52) (actual time=0.017..0.017 rows=1 loops=10000)"
" Index Cond: (b && t1.b)"
" Heap Fetches: 10000"
"Planning Time: 0.109 ms"
"Execution Time: 183.267 ms"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same test with 1e6 points
drop table if exists tbl_point3Dboxes;
create table tbl_point3Dboxes as
select k, random_geompoint3D(0, 1000, 0, 1000, 0, 1000)::box3D as b
from generate_series(1,1e6) k;
-- Query returned successfully: 1000000 rows affected, 8.3 secs execution time.
select * from tbl_point3Dboxes limit 3;
create index tbl_point3Dboxes_spgist_idx on tbl_point3Dboxes using spgist(b);
-- Query returned successfully with no result in 8.8 secs.
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
-- 1000000
explain analyze
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
"Finalize Aggregate (cost=1250227629.38..1250227629.39 rows=1 width=8) (actual time=15313.482..15313.482 rows=1 loops=1)"
" -> Gather (cost=1250227629.17..1250227629.38 rows=2 width=8) (actual time=15313.474..15313.476 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1250226629.17..1250226629.18 rows=1 width=8) (actual time=15240.598..15240.598 rows=1 loops=3)"
" -> Nested Loop (cost=0.41..1146059962.50 rows=41666666667 width=0) (actual time=0.211..15154.174 rows=333333 loops=3)"
" -> Parallel Seq Scan on tbl_point3dboxes t1 (cost=0.00..15530.67 rows=416667 width=52) (actual time=0.038..124.000 rows=333333 loops=3)"
" -> Index Only Scan using tbl_point3dboxes_spgist_idx on tbl_point3dboxes t2 (cost=0.41..1750.50 rows=100000 width=52) (actual time=0.044..0.044 rows=1 loops=1000000)"
" Index Cond: (b && t1.b)"
" Heap Fetches: 1000000"
"Planning Time: 0.199 ms"
"Execution Time: 15349.058 ms"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally equivalent test with 1e6 points using gist_geometry_ops_nd operator class
drop table if exists tbl_point3Dgeoms;
create table tbl_point3Dgeoms as
select k, random_geompoint3D(0, 1000, 0, 1000, 0, 1000) as p
from generate_series(1,1e6) k;
select * from tbl_point3Dgeoms limit 3;
create index tbl_point3Dgeoms_spgist_idx on tbl_point3Dgeoms using gist(p gist_geometry_ops_nd);
-- Query returned successfully with no result in 22.0 secs.
select count() from tbl_point3Dgeoms t1, tbl_point3Dgeoms t2 where t1.p &&& t2.p
-- 1000000
explain analyze
select count() from tbl_point3Dgeoms t1, tbl_point3Dgeoms t2 where t1.p &&& t2.p
"Finalize Aggregate (cost=1381495.95..1381495.96 rows=1 width=8) (actual time=340992.040..340992.040 rows=1 loops=1)"
" -> Gather (cost=1381495.73..1381495.94 rows=2 width=8) (actual time=340990.711..340992.032 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1380495.73..1380495.74 rows=1 width=8) (actual time=340973.307..340973.307 rows=1 loops=3)"
" -> Nested Loop (cost=0.29..1308292.04 rows=28881479 width=0) (actual time=1.255..340868.353 rows=333333 loops=3)"
" -> Parallel Seq Scan on tbl_point3dgeoms t1 (cost=0.00..13512.67 rows=416667 width=40) (actual time=0.049..169.917 rows=333333 loops=3)"
" -> Index Scan using tbl_point3dgeoms_spgist_idx on tbl_point3dgeoms t2 (cost=0.29..2.11 rows=100 width=40) (actual time=0.731..1.021 rows=1 loops=1000000)"
" Index Cond: (t1.p &&& p)"
"Planning Time: 14.149 ms"
"Execution Time: 340994.130 ms"
bool | ||
BOX3D_contains_internal(BOX3D *box1, BOX3D *box2) | ||
{ | ||
return (FPge(box1->xmax, box2->xmax) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure FPge is a fit here. A box that is unbuffered by 9E-7 will be containing the original one? It's also strange that there's 1e-8 SQLMM epsilon, and FPge uses a 100 times bigger one.
#define EPSILON 1.0E-06
#define FPge(A,B) ((B) - (A) <= EPSILON)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, we could use float8_cmp_internal instead of those FP** comparisons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be just >=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference with float8_cmp_internal is that it takes care of NaN. Please tell me which is better.
We have finished implementing the sp-gist index on geometry, for both 2D and 3D; we have pushed our changes to a new branch. The index seems to work well, but of course it doesn't currently build on Travis. Kindly let us know what to do next. |
Geometry version of this PR was merged, closing this one. |
Implementation of the SP-GIST indexes for the BOX3D type provided by PostGIS. It implements a 6-dimensional oct-tree over 3D boxes that generalizes and improves the 4-dimensional quad-tree over 2D boxes provided by PostgreSQL in the file geo_spgist.c.