sp-gist index for BOX3D #251

estebanzimanyi · 2018-05-23T15:19:59Z

Implementation of the SP-GIST indexes for the BOX3D type provided by PostGIS. It implements a 6-dimensional oct-tree over 3D boxes that generalizes and improves the 4-dimensional quad-tree over 2D boxes provided by PostgreSQL in the file geo_spgist.c.

Komzpa · 2018-05-23T15:34:27Z

postgis/postgis.sql.in

+	COMMUTATOR = <->
+);
+
+CREATE FUNCTION temporal_left(box3D, box3D)


temporal -> box3d?

…into spgist-box3d

alesuiss · 2018-05-24T09:41:41Z

Hi,

We've fixed all the build issues on the CI but are left with a problem in the regression tests which we're not sure how to fix. This is the problematic query (in regress/regress_index.sql):

select num,ST_astext(the_geom) from test where the_geom && 'BOX3D(125 125,135 135)'::box3d order by num;
ERROR: operator is not unique: geometry && box3d

Adding an explicit cast to the_geom doesn't fix it. We could change the name of our && operator, but that is obviously not ideal.

Any ideas on how to fix this would be very much appreciated :-)

Thanks,
Arthur

EDIT: actually it seems to work when explicitly casting the_geom to box3d (I was trying geometry :-)

estebanzimanyi · 2018-05-24T09:46:02Z

Pourquoi tu n'essaies pas d'enlever le cast implicit, c-a-d, remplacer dans le fichier postgis.sql.in CREATE CAST (box3d AS geometry) WITH FUNCTION geometry(box3d) AS IMPLICIT; par CREATE CAST (box3d AS geometry) WITH FUNCTION geometry(box3d); et lance les regression tests. Mon intuition est que cela devrait marcher. Je ne vois pas d'autre solution pour le moment

…

On Thu, May 24, 2018 at 11:41 AM, alesuiss ***@***.***> wrote: Hi, We've fixed all the build issues on the CI but are left with a problem in the regression tests which we're not sure how to fix. This is the problematic query (in regress/regress_index.sql): select num,ST_astext(the_geom) from test where the_geom && 'BOX3D(125 125,135 135)'::box3d order by num; ERROR: operator is not unique: geometry && box3d Adding an explicit cast to the_geom doesn't fix it. We could change the name of our && operator, but that is obviously not ideal. Any ideas on how to fix this would be very much appreciated :-) Thanks, Arthur — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#251 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AZp2BpJYHIbmTZpF1I185ATlaUAmLbznks5t1oBYgaJpZM4UKq2o> .

-- ------------------------------------------------------------ Prof. Esteban Zimanyi Department of Computer & Decision Engineering (CoDE) CP 165/15 Universite Libre de Bruxelles Avenue F. D. Roosevelt 50 B-1050 Brussels, Belgium fax: + 32.2.650.47.13 tel: + 32.2.650.31.85 e-mail: ezimanyi@ulb.ac.be Internet: http://code.ulb.ac.be/ ------------------------------------------------------------

alesuiss · 2018-05-24T10:01:31Z

Now we get a weird error:

Checking for shp2pgsql ... found
Checking for pgsql2shp ... found
TMPDIR is /tmp/pgis_reg
Database postgis_reg already exists.
Drop it, or run with the --nocreate flag to use it.
make[2]: *** [check] Error 1

I don't see how this could be from our changes, so is this a bug in postgis' test system?

Komzpa · 2018-05-24T10:08:51Z

@alesuiss such error means the issue is reported one test run above. for your case:

psql:/home/travis/build/postgis/postgis/regress/00-regress-install/share/contrib/postgis/postgis_upgrade.sql:6157: ERROR:  function "box3d_contains" already exists with same argument types

alesuiss · 2018-05-24T10:16:06Z

@Komzpa oops! Thanks, we're a little new to this :-)

I still don't understand the error though. I've added our SQL declarations to postgis/postgis.sql.in, did we need to do anything else?

EDIT: never mind, I'm just stupid :-)

pramsey · 2018-05-30T12:58:28Z

postgis/spgist_box3d.h

+#define RTOverFrontStrategyNumber		28		/* for &</ */
+#define RTFrontStrategyNumber			29		/* for <</ */
+#define RTBackStrategyNumber			30		/* for />> */
+#define RTOverBackStrategyNumber		31		/* for /&> */


Maybe rename these to use a different prefix than "RT" which to me expands to "r-tree".

pramsey · 2018-05-30T12:59:53Z

postgis/postgis.sql.in

+	AS 'MODULE_PATHNAME', 'spgist_box3D_octree_leaf_consistent'
+	LANGUAGE C IMMUTABLE STRICT _PARALLEL;
+
+-- Availability: 2.5.0


I'm not 100% sure I want to see all the different operators? My experience has been that nothing but "overlaps", "contains" and "within" are used in practice, and if we define all the operators then we get to carry/maintain them for the rest of time (like we did for the initial gist-rtree binding) even though they aren't used.

pramsey · 2018-05-30T13:04:19Z

regress/regress_index.sql

@@ -34,14 +34,14 @@ set enable_bitmapscan = off;
 set enable_seqscan = on;

 SELECT 'scan_idx', qnodes('select * from test where the_geom && ST_MakePoint(0,0)');
- select num,ST_astext(the_geom) from test where the_geom && 'BOX3D(125 125,135 135)'::box3d order by num;
+ select num,ST_astext(the_geom) from test where the_geom::box3d && 'BOX3D(125 125,135 135)'::box3d order by num;


The other large scale issue I have w/ this patch is defining the index on box3d rather than geometry. There's a convenience associated w/ having the index defined against geometry, because then people can use it directly w/o casting. Did you do it against box3d because of issues of dealing with large objects? The pg11 support for an spgist compress API hook should allow you to do an index against geometry while using boxes as the keys.

Indeed, this is to get around the spgist storage issue; the plan was to generalize the index once pg11 is out.

@alesuiss Pg11 is out: https://apt.postgresql.org/pub/repos/apt/pool/11/p/postgresql-11/ or https://wiki.postgresql.org/wiki/Apt/FAQ#I_want_to_try_the_beta_version_of_the_next_PostgreSQL_release

I meant released (we're currently basing our work on the last released version).

Should we work with pg11 instead for this patch?

If that is not too much trouble for you. :)
Postgres 11 beta1 was released just a couple of days ago: https://www.postgresql.org/about/news/1855/ and is aimed at developers.

Basically, a released non-beta Postgres 11 means a released non-beta PostGIS 2.5, and feature freeze is going to come soon.

I'll see if I can get it running on Travis to make testing easier.

Indeed we can start with only a few operators.

However, please notice that all this pull request originates from working with trajectories, i.e., temporal points and thus we have developed an extension based on PostGIS in which we defined temporal points, as well as temporal floats (e.g., for expressing the distance between moving cars) and temporal booleans (e.g., for determining whether a moving car is located within a province). In that case we used the z dimension for encoding timestamps and thus the operators such as <</ and />>correspond to before and after in time and these are heavily used in that scenario.

Interesting. Did you know that there is some trajectory work in PostGIS already, encoding the timestamps on the M?

Yes of course, but didn't scale up well, in particular because time is ordered by definition while M dimension is not and therefore we cannot leverage on that for having a much faster implementation

BTW, please notice that the bug I reported
https://trac.osgeo.org/postgis/ticket/4088#ticket
has not been corrected in the beta version of postgis 2.5.0
I am also aiming to develop spgist 2D indexes on geometries using boxd2f bounding boxes. However, the fact that the over* (e.g. overleft) operators have erroneously defined a commutative operator prevents me to have correct results.

We should have the generalized index ready soon. Should we use a new branch & new pull request for this?

pramsey · 2018-05-30T13:08:18Z

postgis/spgist_box3d.c

+ * This module provides SP-GiST implementation for boxes using oct tree
+ * analogy in 4-dimensional space.  SP-GiST doesn't allow indexing of
+ * overlapping objects.  We are making 3D objects never-overlapping in
+ * 6D space.  This technique has some benefits compared to traditional


While it's nice from a completionist point-of-view to support 3D inputs, the reality of general use is that most indexes are against 2D objects. How does this index degrade if I feed it 2D objects? Is it happy? Also, note that most 3D objects have a very narrow range in Z compared to the range in X/Y -- is the index happy in that case? Is there any argument for having a 2D opclass be the default, with an optional 3D opclass for people who really have that problem?

What is the behaviour for indexing points (which end up represented by boxes that have zero-length edges? (which end up represented by boxes that have zero-length edges?) This is another common use case. Does it need special handling for good performance?

Here is a quick test with 1e4 3D points. For higher numbers of points the scale up is much bigger.

CREATE OR REPLACE FUNCTION random_float(low float, high float)
RETURNS float AS $$
BEGIN
RETURN random()* (high-low + 1) + low;
END;
$$ LANGUAGE 'plpgsql' STRICT;

CREATE OR REPLACE FUNCTION random_geompoint3D(lowx float, highx float,
lowy float, highy float, lowz float, highz float)
RETURNS geometry AS $$
BEGIN
RETURN st_makepoint(random_float(lowx, highx), random_float(lowy, highy),
random_float(lowz, highz));
END;
$$ LANGUAGE 'plpgsql' STRICT;

drop table if exists tbl_point3Dboxes;
create table tbl_point3Dboxes as
select k, random_geompoint3D(0, 1000, 0, 1000, 0, 1000)::box3D as b
from generate_series(1,1e4) k;

select * from tbl_point3Dboxes limit 3;

select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
-- 10000
explain analyze
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b

"Aggregate (cost=1027824.45..1027824.46 rows=1 width=8) (actual time=15421.243..15421.243 rows=1 loops=1)"
" -> Nested Loop (cost=0.00..1010981.64 rows=6737126 width=0) (actual time=0.022..15419.443 rows=10000 loops=1)"
" Join Filter: (t1.b && t2.b)"
" Rows Removed by Join Filter: 99990000"
" -> Seq Scan on tbl_point3dboxes t1 (cost=0.00..196.08 rows=8208 width=52) (actual time=0.011..1.988 rows=10000 loops=1)"
" -> Materialize (cost=0.00..237.12 rows=8208 width=52) (actual time=0.000..0.507 rows=10000 loops=10000)"
" -> Seq Scan on tbl_point3dboxes t2 (cost=0.00..196.08 rows=8208 width=52) (actual time=0.004..1.683 rows=10000 loops=1)"
"Planning Time: 0.067 ms"
"Execution Time: 15421.441 ms"

create index tbl_point3Dboxes_spgist_idx on tbl_point3Dboxes using spgist(b);
-- Query returned successfully with no result in 94 msec.

select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
-- 10000
explain analyze
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b

"Aggregate (cost=303928.00..303928.01 rows=1 width=8) (actual time=183.226..183.226 rows=1 loops=1)"
" -> Nested Loop (cost=0.28..278928.00 rows=10000000 width=0) (actual time=0.049..181.660 rows=10000 loops=1)"
" -> Seq Scan on tbl_point3dboxes t1 (cost=0.00..214.00 rows=10000 width=52) (actual time=0.011..1.343 rows=10000 loops=1)"
" -> Index Only Scan using tbl_point3dboxes_spgist_idx on tbl_point3dboxes t2 (cost=0.28..17.87 rows=1000 width=52) (actual time=0.017..0.017 rows=1 loops=10000)"
" Index Cond: (b && t1.b)"
" Heap Fetches: 10000"
"Planning Time: 0.109 ms"
"Execution Time: 183.267 ms"

Same test with 1e6 points

drop table if exists tbl_point3Dboxes;
create table tbl_point3Dboxes as
select k, random_geompoint3D(0, 1000, 0, 1000, 0, 1000)::box3D as b
from generate_series(1,1e6) k;
-- Query returned successfully: 1000000 rows affected, 8.3 secs execution time.

select * from tbl_point3Dboxes limit 3;

create index tbl_point3Dboxes_spgist_idx on tbl_point3Dboxes using spgist(b);
-- Query returned successfully with no result in 8.8 secs.

select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b
-- 1000000
explain analyze
select count() from tbl_point3Dboxes t1, tbl_point3Dboxes t2 where t1.b && t2.b

"Finalize Aggregate (cost=1250227629.38..1250227629.39 rows=1 width=8) (actual time=15313.482..15313.482 rows=1 loops=1)"
" -> Gather (cost=1250227629.17..1250227629.38 rows=2 width=8) (actual time=15313.474..15313.476 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1250226629.17..1250226629.18 rows=1 width=8) (actual time=15240.598..15240.598 rows=1 loops=3)"
" -> Nested Loop (cost=0.41..1146059962.50 rows=41666666667 width=0) (actual time=0.211..15154.174 rows=333333 loops=3)"
" -> Parallel Seq Scan on tbl_point3dboxes t1 (cost=0.00..15530.67 rows=416667 width=52) (actual time=0.038..124.000 rows=333333 loops=3)"
" -> Index Only Scan using tbl_point3dboxes_spgist_idx on tbl_point3dboxes t2 (cost=0.41..1750.50 rows=100000 width=52) (actual time=0.044..0.044 rows=1 loops=1000000)"
" Index Cond: (b && t1.b)"
" Heap Fetches: 1000000"
"Planning Time: 0.199 ms"
"Execution Time: 15349.058 ms"

Finally equivalent test with 1e6 points using gist_geometry_ops_nd operator class

drop table if exists tbl_point3Dgeoms;
create table tbl_point3Dgeoms as
select k, random_geompoint3D(0, 1000, 0, 1000, 0, 1000) as p
from generate_series(1,1e6) k;

select * from tbl_point3Dgeoms limit 3;

create index tbl_point3Dgeoms_spgist_idx on tbl_point3Dgeoms using gist(p gist_geometry_ops_nd);
-- Query returned successfully with no result in 22.0 secs.

select count() from tbl_point3Dgeoms t1, tbl_point3Dgeoms t2 where t1.p &&& t2.p
-- 1000000
explain analyze
select count() from tbl_point3Dgeoms t1, tbl_point3Dgeoms t2 where t1.p &&& t2.p

"Finalize Aggregate (cost=1381495.95..1381495.96 rows=1 width=8) (actual time=340992.040..340992.040 rows=1 loops=1)"
" -> Gather (cost=1381495.73..1381495.94 rows=2 width=8) (actual time=340990.711..340992.032 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=1380495.73..1380495.74 rows=1 width=8) (actual time=340973.307..340973.307 rows=1 loops=3)"
" -> Nested Loop (cost=0.29..1308292.04 rows=28881479 width=0) (actual time=1.255..340868.353 rows=333333 loops=3)"
" -> Parallel Seq Scan on tbl_point3dgeoms t1 (cost=0.00..13512.67 rows=416667 width=40) (actual time=0.049..169.917 rows=333333 loops=3)"
" -> Index Scan using tbl_point3dgeoms_spgist_idx on tbl_point3dgeoms t2 (cost=0.29..2.11 rows=100 width=40) (actual time=0.731..1.021 rows=1 loops=1000000)"
" Index Cond: (t1.p &&& p)"
"Planning Time: 14.149 ms"
"Execution Time: 340994.130 ms"

Komzpa · 2018-06-03T12:06:27Z

postgis/lwgeom_box3d.c

+bool
+BOX3D_contains_internal(BOX3D *box1, BOX3D *box2)
+{
+	return (FPge(box1->xmax, box2->xmax) &&


I'm not quite sure FPge is a fit here. A box that is unbuffered by 9E-7 will be containing the original one? It's also strange that there's 1e-8 SQLMM epsilon, and FPge uses a 100 times bigger one.

#define EPSILON 1.0E-06 #define FPge(A,B) ((B) - (A) <= EPSILON)

Indeed, we could use float8_cmp_internal instead of those FP** comparisons

can it be just >= ?

The difference with float8_cmp_internal is that it takes care of NaN. Please tell me which is better.

alesuiss · 2018-06-07T12:00:25Z

We have finished implementing the sp-gist index on geometry, for both 2D and 3D; we have pushed our changes to a new branch. The index seems to work well, but of course it doesn't currently build on Travis.

Kindly let us know what to do next.

Komzpa · 2018-06-07T12:08:19Z

Hi @alesuiss,

Only PRs are being built on travis automatically - I've made a new PR for you (don't be afraid to make them too! :) )

#258

Komzpa · 2018-07-10T11:28:11Z

Geometry version of this PR was merged, closing this one.

sp-gist index for BOX3D

82dbeb4

Komzpa reviewed May 23, 2018

View reviewed changes

postgis/postgis.sql.in Outdated

COMMUTATOR = <->

);

CREATE FUNCTION temporal_left(box3D, box3D)

Copy link

Member

Komzpa May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporal -> box3d?

estebanzimanyi and others added 5 commits May 23, 2018 17:47

name fix

0d98149

comment cleanup

f33ef02

build fix

b4201a5

Merge branch 'spgist-box3d' of https://github.com/ULB-CoDE-WIT/postgis …

a74317a

…into spgist-box3d

dead code cleanup

ea3b5f6

fixed regression tests :-p

23c5cbe

estebanzimanyi and others added 5 commits May 24, 2018 14:14

sql fix

67d0c21

spgist-box3d regression tests

c52e000

name fix

f269422

regression cleanup code

e20159a

name fix

8103e16

pramsey reviewed May 30, 2018

View reviewed changes

name fix

6640c3b

Komzpa reviewed Jun 3, 2018

View reviewed changes

Komzpa mentioned this pull request Jun 7, 2018

SP-GiST index on geometry #258

Closed

Komzpa closed this Jul 10, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sp-gist index for BOX3D #251

sp-gist index for BOX3D #251

estebanzimanyi commented May 23, 2018

Komzpa May 23, 2018

alesuiss commented May 24, 2018 •

edited

Loading

estebanzimanyi commented May 24, 2018 via email

alesuiss commented May 24, 2018

Komzpa commented May 24, 2018

alesuiss commented May 24, 2018 •

edited

Loading

pramsey May 30, 2018

pramsey May 30, 2018

pramsey May 30, 2018

alesuiss May 30, 2018

Komzpa May 30, 2018

alesuiss May 30, 2018

Komzpa May 30, 2018

estebanzimanyi Jun 5, 2018 •

edited

Loading

pramsey Jun 5, 2018

estebanzimanyi Jun 5, 2018

estebanzimanyi Jun 6, 2018

alesuiss Jun 6, 2018

pramsey May 30, 2018 •

edited

Loading

pramsey Jun 5, 2018 •

edited

Loading

estebanzimanyi Jun 5, 2018

estebanzimanyi Jun 5, 2018

estebanzimanyi Jun 5, 2018

Komzpa Jun 3, 2018

estebanzimanyi Jun 5, 2018 •

edited

Loading

Komzpa Jun 5, 2018

estebanzimanyi Jun 5, 2018

alesuiss commented Jun 7, 2018

Komzpa commented Jun 7, 2018

Komzpa commented Jul 10, 2018

sp-gist index for BOX3D #251

sp-gist index for BOX3D #251

Conversation

estebanzimanyi commented May 23, 2018

Choose a reason for hiding this comment

alesuiss commented May 24, 2018 • edited Loading

estebanzimanyi commented May 24, 2018 via email

alesuiss commented May 24, 2018

Komzpa commented May 24, 2018

alesuiss commented May 24, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

estebanzimanyi Jun 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pramsey May 30, 2018 • edited Loading

Choose a reason for hiding this comment

pramsey Jun 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

estebanzimanyi Jun 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alesuiss commented Jun 7, 2018

Komzpa commented Jun 7, 2018

Komzpa commented Jul 10, 2018

alesuiss commented May 24, 2018 •

edited

Loading

alesuiss commented May 24, 2018 •

edited

Loading

estebanzimanyi Jun 5, 2018 •

edited

Loading

pramsey May 30, 2018 •

edited

Loading

pramsey Jun 5, 2018 •

edited

Loading

estebanzimanyi Jun 5, 2018 •

edited

Loading