Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GiST penalty optimization for point strips #129

Closed
wants to merge 5 commits into from

Conversation

Komzpa
Copy link
Member

@Komzpa Komzpa commented May 10, 2017

2D improvement for #3753

Inspired by work of @x4m on improving penalty function for cubes.

@Komzpa
Copy link
Member Author

Komzpa commented May 10, 2017

Build failed, but it seems it failed not because of issues in the code but some builder configuration issues.

@robe2
Copy link
Member

robe2 commented May 11, 2017

@Komzpa I have a question, might be more my ignorance since I rarely look at gist code.

If the issue you describe was fixed for cube, why do you not also have a patch for?

gserialized_gist_nd.c

@x4m
Copy link

x4m commented May 11, 2017

I'll drop some info here.

  1. Here is scientific paper on this
    https://yadi.sk/i/viaQVGgD3J43DC
    @Komzpa 's code slightly differs, but in right direction.
  2. Here's cube commitfest entry
    https://commitfest.postgresql.org/10/782/
    patch was not accepted, because it was broking compatibility with VAC (not in production since 2005, without IEEE 754 floats). This compatibility(of Postgres) will be broken anyway, due to LLVM usage. Than, may be, I'll try to drop in this patch one more time.
  3. @robe2 indeed, this technique is applicable to N-dimensional case
  4. In the paper we provide statistical proof that this approach is benificial from performance point of view for cube extension. But cube uses Guttman's split (1984), PostGIS uses Korotkov split(2009). Odds are that this patch won't bring that much of performance (x1.5 on searches over randomly inserted data, 3x on data that was inserted in order)

@robe2
Copy link
Member

robe2 commented May 11, 2017

@x4m thanks for the feedback. 1.5 or 3x still seems worthwhile though debating if I should just hold this for 2.4 (not backport to 2.3 since that should be focused more on bug fixes).

So sounds like you think there is still worthwhile to have an nd fix too. @Komzpa you thinking of adding to this an nd? I'll test with this 2d first to make sure no surprises.

@robe2
Copy link
Member

robe2 commented May 11, 2017

@Komzpa I'll try to fix travis's build issues. In meantime, can you check on this issue.

I get this when trying to compile against trunk.

`
gserialized_gist_2d.c:1:0: warning: -fPIC ignored for target (all code is position independent)
/**********************************************************************
^
gserialized_gist_2d.c: In function 'gserialized_gist_penalty_2d':
gserialized_gist_2d.c:1353:4: error: 'edge_union' undeclared (first use in this function)
edge_union = box2df_union_edge(gbox_index_orig, gbox_index_new);
^
gserialized_gist_2d.c:1353:4: note: each undeclared identifier is reported only once for each function it appears in
gserialized_gist_2d.c:1354:4: error: 'edge_orig' undeclared (first use in this function)
edge_orig = box2df_edge(gbox_index_orig);
^
: recipe for target 'gserialized_gist_2d.o' failed
make[1]: *** [gserialized_gist_2d.o] Error 1
make[1]: Leaving directory '/projects/postgis/branches/2.4/postgis'
GNUmakefile:16: recipe for target 'all' failed
make: *** [all] Error 1

`

@x4m
Copy link

x4m commented May 11, 2017

@robe2 1.5x and 3x was an improvement with old Guttman split algorithm. Since PostGIS is using one of the most advanced split algorithms, that split algorithm may be fixing data issues being solved by this patch.
If you can benchmark I'd recommend something like this

create extension if not exists cube;

begin transaction;
SELECT setseed(.43);

create table dataTable as select cube(array[x/1000,y/1000]) c from generate_series(1,1e3,1)x, generate_series(1,1e3,1) y;

\timing
create index idx on dataTable using gist(c);

select q,(select count(*) from dataTable dt where dt.c<@q) from (select cube(array[x,y],array[x+0.1,y+0.1]) q from (select random() x,random() y from generate_series(1,1e4,1) s0) s1) s2 ;

select pg_size_pretty(pg_relation_size('idx'));

drop table datatable;

Replace cube with appropriate PostGIS type. Here we create grid of 1M points and do 10K searches in this grid. If search time is not reduced significantly, may be this trick is not usefull.

@robe2
Copy link
Member

robe2 commented May 11, 2017

@Komzpa after the compile error patch, things look good. Regress still passing. I'll try to commit later perhaps after I have given @x4m example a try.

@Komzpa
Copy link
Member Author

Komzpa commented May 11, 2017

Here's postgis version of @x4m 's code:

create extension if not exists postgis;

begin transaction;
SELECT setseed(.43);

create table dataTable as select ST_MakePoint(x/1000,y/1000) c from generate_series(1,1e3,1)x, generate_series(1,1e3,1) y;

\timing on
create index idx on dataTable using gist(c);

select q,(select count(*) from dataTable dt where dt.c && q) from (select ST_Expand(ST_MakePoint(x,y), 0.1) q from (select random() x, random() y from generate_series(1,1e4,1) s0) s1) s2 ;

select pg_size_pretty(pg_relation_size('idx'));

drop table datatable;

@Komzpa
Copy link
Member Author

Komzpa commented May 11, 2017

@robe2 ND committed, Travis builds successfully.

@robe2
Copy link
Member

robe2 commented May 11, 2017

@Komzpa I committed for trunk. I left the ticket open until I've done some speed testing. https://trac.osgeo.org/postgis/ticket/3753

Unfornately seems our github mirroring is broken at moment too.

@robe2 robe2 closed this May 11, 2017
@robe2
Copy link
Member

robe2 commented May 14, 2017

This trick doesn't seem useful based on my cursory tests. The index building took longer, size of index is the same, and no gain in performance.

I tested PostGIS 2.3.2 on 9.6 64-bit window 7 and got these timings using @Komzpa tests and then repeated using PostGIS 2.4.0dev that has this patch. The index size are the same and don't see much of a difference in speed, but 2.4 seems slightly worse. Hard to tell absolutely with random data and I won't rule out cause is other changes in 2.4 code base, though I don't think we've touched the gist section except for this.

Time 1: index build
Time 2: query
Time 3: index size check
Time 4: drop table

`PostGIS 2.3 test results:

First run:

Time: 14299.226 ms
Time: 63453.484 ms
Time: 12.544 ms
Time: 4.672 ms

Second run:

Time: 12983.263 ms
Time: 62485.045 ms
Time: 0.454 ms
Time: 7.881 ms

Third run:

Time: 13205.724 ms
Time: 63193.318 ms
Time: 3.722 ms
Time: 8.986 ms
Time: 62.018 ms

`

PostGIS 2.4 results:

`
First run:

Time: 18914.431 ms
Time: 64785.811 ms
Time: 7.872 ms
Time: 7.698 ms

Second run:

Time: 14776.264 ms
Time: 64393.529 ms
Time: 7.124 ms
Time: 6.232 ms

Third run:

Time: 14909.494 ms
Time: 64252.608 ms
Time: 11.676 ms
Time: 6.020 ms

`

@Komzpa did you get a chance to test this?

@Komzpa
Copy link
Member Author

Komzpa commented May 15, 2017

For visualization of what happens in index:

Before patch - lines overlap heavily - on the red line there are 4 overlapping lines:

image

After patch - no overlapping lines, you can see gaps between them:
image

@robe2
Copy link
Member

robe2 commented May 15, 2017

I'm reopening this since though I committed already seems to need further analysis and changes. @pramsey can you take this one over. Also see @x4m notes on the ticket - https://trac.osgeo.org/postgis/ticket/3753

@robe2 robe2 reopened this May 15, 2017
@robe2
Copy link
Member

robe2 commented Aug 10, 2017

@Komzpa I'm going to discuss this one with @pramsey at the FOSS4G code sprint on Saturday. feel free to join us on IRC.

@x4m
Copy link

x4m commented Aug 16, 2017

Hi! Were there any decisions made?

@Komzpa
Copy link
Member Author

Komzpa commented Aug 17, 2017

@x4m there were no discussions in IRC that I can see.

@robe2
Copy link
Member

robe2 commented Aug 20, 2017

We discussed at the sprint. @pramsey said he'd check it out further.

@robe2 robe2 requested a review from pramsey August 20, 2017 15:48
@robe2
Copy link
Member

robe2 commented Sep 5, 2017

I'm closing this out per x4m comment on - https://trac.osgeo.org/postgis/ticket/3753

that what has been committed is the best that can be done for now.

@robe2 robe2 closed this Sep 5, 2017
@Komzpa Komzpa deleted the patch-3 branch October 25, 2018 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants