GiST penalty optimization for point strips #129

Komzpa · 2017-05-10T22:02:54Z

2D improvement for #3753

Inspired by work of @x4m on improving penalty function for cubes.

2D improvement for #3753

Komzpa · 2017-05-10T22:06:16Z

Build failed, but it seems it failed not because of issues in the code but some builder configuration issues.

robe2 · 2017-05-11T13:03:13Z

@Komzpa I have a question, might be more my ignorance since I rarely look at gist code.

If the issue you describe was fixed for cube, why do you not also have a patch for?

gserialized_gist_nd.c

x4m · 2017-05-11T16:19:08Z

I'll drop some info here.

Here is scientific paper on this
https://yadi.sk/i/viaQVGgD3J43DC
@Komzpa 's code slightly differs, but in right direction.
Here's cube commitfest entry
https://commitfest.postgresql.org/10/782/
patch was not accepted, because it was broking compatibility with VAC (not in production since 2005, without IEEE 754 floats). This compatibility(of Postgres) will be broken anyway, due to LLVM usage. Than, may be, I'll try to drop in this patch one more time.
@robe2 indeed, this technique is applicable to N-dimensional case
In the paper we provide statistical proof that this approach is benificial from performance point of view for cube extension. But cube uses Guttman's split (1984), PostGIS uses Korotkov split(2009). Odds are that this patch won't bring that much of performance (x1.5 on searches over randomly inserted data, 3x on data that was inserted in order)

robe2 · 2017-05-11T17:47:30Z

@x4m thanks for the feedback. 1.5 or 3x still seems worthwhile though debating if I should just hold this for 2.4 (not backport to 2.3 since that should be focused more on bug fixes).

So sounds like you think there is still worthwhile to have an nd fix too. @Komzpa you thinking of adding to this an nd? I'll test with this 2d first to make sure no surprises.

robe2 · 2017-05-11T18:05:37Z

@Komzpa I'll try to fix travis's build issues. In meantime, can you check on this issue.

I get this when trying to compile against trunk.

`
gserialized_gist_2d.c:1:0: warning: -fPIC ignored for target (all code is position independent)
/**********************************************************************
^
gserialized_gist_2d.c: In function 'gserialized_gist_penalty_2d':
gserialized_gist_2d.c:1353:4: error: 'edge_union' undeclared (first use in this function)
edge_union = box2df_union_edge(gbox_index_orig, gbox_index_new);
^
gserialized_gist_2d.c:1353:4: note: each undeclared identifier is reported only once for each function it appears in
gserialized_gist_2d.c:1354:4: error: 'edge_orig' undeclared (first use in this function)
edge_orig = box2df_edge(gbox_index_orig);
^
: recipe for target 'gserialized_gist_2d.o' failed
make[1]: *** [gserialized_gist_2d.o] Error 1
make[1]: Leaving directory '/projects/postgis/branches/2.4/postgis'
GNUmakefile:16: recipe for target 'all' failed
make: *** [all] Error 1

`

x4m · 2017-05-11T18:21:33Z

@robe2 1.5x and 3x was an improvement with old Guttman split algorithm. Since PostGIS is using one of the most advanced split algorithms, that split algorithm may be fixing data issues being solved by this patch.
If you can benchmark I'd recommend something like this

create extension if not exists cube;

begin transaction;
SELECT setseed(.43);

create table dataTable as select cube(array[x/1000,y/1000]) c from generate_series(1,1e3,1)x, generate_series(1,1e3,1) y;

\timing
create index idx on dataTable using gist(c);

select q,(select count(*) from dataTable dt where dt.c<@q) from (select cube(array[x,y],array[x+0.1,y+0.1]) q from (select random() x,random() y from generate_series(1,1e4,1) s0) s1) s2 ;

select pg_size_pretty(pg_relation_size('idx'));

drop table datatable;

Replace cube with appropriate PostGIS type. Here we create grid of 1M points and do 10K searches in this grid. If search time is not reduced significantly, may be this trick is not usefull.

robe2 · 2017-05-11T18:29:00Z

@Komzpa after the compile error patch, things look good. Regress still passing. I'll try to commit later perhaps after I have given @x4m example a try.

Komzpa · 2017-05-11T18:46:43Z

Here's postgis version of @x4m 's code:

create extension if not exists postgis;

begin transaction;
SELECT setseed(.43);

create table dataTable as select ST_MakePoint(x/1000,y/1000) c from generate_series(1,1e3,1)x, generate_series(1,1e3,1) y;

\timing on
create index idx on dataTable using gist(c);

select q,(select count(*) from dataTable dt where dt.c && q) from (select ST_Expand(ST_MakePoint(x,y), 0.1) q from (select random() x, random() y from generate_series(1,1e4,1) s0) s1) s2 ;

select pg_size_pretty(pg_relation_size('idx'));

drop table datatable;

Komzpa · 2017-05-11T19:53:55Z

@robe2 ND committed, Travis builds successfully.

robe2 · 2017-05-11T23:47:04Z

@Komzpa I committed for trunk. I left the ticket open until I've done some speed testing. https://trac.osgeo.org/postgis/ticket/3753

Unfornately seems our github mirroring is broken at moment too.

robe2 · 2017-05-14T09:37:23Z

This trick doesn't seem useful based on my cursory tests. The index building took longer, size of index is the same, and no gain in performance.

I tested PostGIS 2.3.2 on 9.6 64-bit window 7 and got these timings using @Komzpa tests and then repeated using PostGIS 2.4.0dev that has this patch. The index size are the same and don't see much of a difference in speed, but 2.4 seems slightly worse. Hard to tell absolutely with random data and I won't rule out cause is other changes in 2.4 code base, though I don't think we've touched the gist section except for this.

Time 1: index build
Time 2: query
Time 3: index size check
Time 4: drop table

`PostGIS 2.3 test results:

First run:

Time: 14299.226 ms
Time: 63453.484 ms
Time: 12.544 ms
Time: 4.672 ms

Second run:

Time: 12983.263 ms
Time: 62485.045 ms
Time: 0.454 ms
Time: 7.881 ms

Third run:

Time: 13205.724 ms
Time: 63193.318 ms
Time: 3.722 ms
Time: 8.986 ms
Time: 62.018 ms

`

PostGIS 2.4 results:

`
First run:

Time: 18914.431 ms
Time: 64785.811 ms
Time: 7.872 ms
Time: 7.698 ms

Second run:

Time: 14776.264 ms
Time: 64393.529 ms
Time: 7.124 ms
Time: 6.232 ms

Third run:

Time: 14909.494 ms
Time: 64252.608 ms
Time: 11.676 ms
Time: 6.020 ms

`

@Komzpa did you get a chance to test this?

Komzpa · 2017-05-15T08:45:44Z

For visualization of what happens in index:

Before patch - lines overlap heavily - on the red line there are 4 overlapping lines:

After patch - no overlapping lines, you can see gaps between them:

robe2 · 2017-05-15T13:54:20Z

I'm reopening this since though I committed already seems to need further analysis and changes. @pramsey can you take this one over. Also see @x4m notes on the ticket - https://trac.osgeo.org/postgis/ticket/3753

robe2 · 2017-08-10T21:27:05Z

@Komzpa I'm going to discuss this one with @pramsey at the FOSS4G code sprint on Saturday. feel free to join us on IRC.

x4m · 2017-08-16T15:52:15Z

Hi! Were there any decisions made?

Komzpa · 2017-08-17T10:31:09Z

@x4m there were no discussions in IRC that I can see.

robe2 · 2017-08-20T14:06:03Z

We discussed at the sprint. @pramsey said he'd check it out further.

robe2 · 2017-09-05T19:29:28Z

I'm closing this out per x4m comment on - https://trac.osgeo.org/postgis/ticket/3753

that what has been committed is the best that can be done for now.

GiST penalty optimization for point strips

a1783ad

2D improvement for #3753

Komzpa mentioned this pull request May 10, 2017

Option to skip clustering/sorting on ST_GeoHash osm2pgsql-dev/osm2pgsql#757

Closed

Fix compile errors

c0eb3fe

Komzpa added 2 commits May 11, 2017 22:20

ND GiST penalty

52c8586

Update .travis.yml

093e7d3

robe2 closed this May 11, 2017

robe2 reopened this May 15, 2017

Merge branch 'svn-trunk' into patch-3

24950ba

robe2 requested a review from pramsey August 20, 2017 15:48

robe2 closed this Sep 5, 2017

Komzpa deleted the patch-3 branch October 25, 2018 15:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GiST penalty optimization for point strips #129

GiST penalty optimization for point strips #129

Komzpa commented May 10, 2017

Komzpa commented May 10, 2017

robe2 commented May 11, 2017

x4m commented May 11, 2017

robe2 commented May 11, 2017

robe2 commented May 11, 2017

x4m commented May 11, 2017

robe2 commented May 11, 2017

Komzpa commented May 11, 2017

Komzpa commented May 11, 2017

robe2 commented May 11, 2017

robe2 commented May 14, 2017

Komzpa commented May 15, 2017

robe2 commented May 15, 2017

robe2 commented Aug 10, 2017

x4m commented Aug 16, 2017

Komzpa commented Aug 17, 2017

robe2 commented Aug 20, 2017

robe2 commented Sep 5, 2017

GiST penalty optimization for point strips #129

GiST penalty optimization for point strips #129

Conversation

Komzpa commented May 10, 2017

Komzpa commented May 10, 2017

robe2 commented May 11, 2017

x4m commented May 11, 2017

robe2 commented May 11, 2017

robe2 commented May 11, 2017

x4m commented May 11, 2017

robe2 commented May 11, 2017

Komzpa commented May 11, 2017

Komzpa commented May 11, 2017

robe2 commented May 11, 2017

robe2 commented May 14, 2017

Komzpa commented May 15, 2017

robe2 commented May 15, 2017

robe2 commented Aug 10, 2017

x4m commented Aug 16, 2017

Komzpa commented Aug 17, 2017

robe2 commented Aug 20, 2017

robe2 commented Sep 5, 2017