Bit-setting to optimize geoms for compression (precision-based) #225

Closed: wants to merge 14 commits into svn-trunk (3 participants)

dbaston (Member) commented Feb 28, 2018

This is a rework of #223 that is based on "digits of precision" (the number of digits to the right of the decimal point). I think this is more useful than the "significant digits" approach of #223, and more in line with what users expect from other PostGIS functions (for example, the precision argument to ST_AsText).

I also did some real-world testing using the TIGER MCD layer. This dataset is provided with six digits of precision, and its geometry column occupies 499 MB. Running postgis_optimize_geometry with a precision of 6 reduces this to 256 MB. Joining the updated table back to the original shows no differences when the geometries are compared with ST_AsText(geom, 6). Here is an example of what the modified coordinates look like:

Original:

MULTIPOLYGON(((-96.915638 37.214606,-96.915629 37.215194,-96.915626 37.21539,-96.914793 37.215383,

Modified:

MULTIPOLYGON(((-96.9156379699707 37.2146058082581,-96.9156289100647 37.2151939868927,-96.91562557220

Interestingly, a quick performance check (point-in-polygon) runs about 25% slower with the storage-optimized geometry.

codecov (bot) commented Feb 28, 2018

Codecov Report

Merging #225 into svn-trunk will decrease coverage by 0.12%.
The diff coverage is 97.18%.


@@              Coverage Diff              @@
##           svn-trunk     #225      +/-   ##
=============================================
- Coverage      79.22%   79.09%   -0.13%     
=============================================
  Files            203      217      +14     
  Lines          63807    67832    +4025     
=============================================
+ Hits           50551    53654    +3103     
- Misses         13256    14178     +922

Impacted Files                    Coverage Δ
liblwgeom/cunit/cu_algorithm.c    98.92% <100%> (+0.03%) ⬆️
postgis/lwgeom_functions_basic.c  81.07% <94.44%> (+0.18%) ⬆️
liblwgeom/lwgeom.c                81.2% <96.29%> (+0.93%) ⬆️
liblwgeom/lwcollection.c          84.48% <0%> (-0.54%) ⬇️
postgis/lwgeom_geos_clean.c       28.2% <0%> (ø)
raster/test/cunit/cu_tester.c     52.87% <0%> (ø)
libpgcommon/lwgeom_transform.c    86.05% <0%> (ø)
postgis/lwgeom_geos.c             81.5% <0%> (ø)
raster/test/cunit/cu_misc.c       100% <0%> (ø)
liblwgeom/lwgeom_transform.c      87.23% <0%> (ø)
... and 11 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b48938...02a7f41.

return 0;

int digits_left_of_decimal = (int) (1 + log10(fabs(d)));
uint32_t bits_needed = bits_for_precision(abs(decimal_digits) + digits_left_of_decimal);

dbaston (Author, Member) commented Feb 28, 2018

@mguzelevich Could you elaborate on how to use that here?

Komzpa (Member) commented Mar 2, 2018

> Interestingly, a quick performance check (point-in-polygon) gets about 25% worse with the storage-optimized geometry.

Can you share how you performed the check? Did you CLUSTER / VACUUM FULL the table after running the UPDATE?

dbaston (Author, Member) commented Mar 5, 2018

There was no UPDATE involved; the procedure was basically:


-- Baseline copy with a synthetic key, and an optimized copy at 6 decimal digits
CREATE TABLE cousub_a AS SELECT row_number() OVER () AS gid, geom FROM cousub;
CREATE TABLE cousub_b AS SELECT gid, postgis_optimize_geometry(geom, 6) AS geom FROM cousub_a;
CREATE INDEX ON cousub_a USING gist (geom);
CREATE INDEX ON cousub_b USING gist (geom);

-- Point-in-polygon count against each table
SELECT count(*) FROM testpoints, cousub_a WHERE ST_Intersects(testpoints.geom, cousub_a.geom);
SELECT count(*) FROM testpoints, cousub_b WHERE ST_Intersects(testpoints.geom, cousub_b.geom);

Komzpa (Member) commented Mar 5, 2018

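@dbaston A possible scenario: before, the values were large enough to be moved out of line into the TOAST table; after compression they fit in the heap itself, leaving fewer tuples per page :)

Easy enough to peek at with select max((ctid::text::point)[1]) from cousub_a, and the same for cousub_b. The second component of a ctid is the tuple's offset within its page, so the maximum approximates the number of tuples per page.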
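To put rough numbers on this hypothesis, here is a back-of-the-envelope C sketch of how many rows fit on a default 8 kB heap page. The layout constants (24-byte page header, 4-byte line pointer per tuple, 23-byte tuple header MAXALIGNed to 24, 8-byte MAXALIGN) are assumptions based on a typical 64-bit PostgreSQL build. Fill factor, null bitmaps, and padding between columns are ignored, and tuples_per_page is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout constants for a default 64-bit build. */
enum {
    HEAP_PAGE_BYTES = 8192,  /* default BLCKSZ */
    PAGE_HEADER_BYTES = 24,  /* page header */
    LINE_POINTER_BYTES = 4,  /* one item pointer per tuple */
    TUPLE_HEADER_BYTES = 24, /* 23-byte header, MAXALIGNed */
    MAXALIGN_BYTES = 8
};

/* Round n up to the next multiple of MAXALIGN_BYTES. */
static size_t maxalign(size_t n)
{
    return (n + MAXALIGN_BYTES - 1) & ~(size_t) (MAXALIGN_BYTES - 1);
}

/* Rough count of tuples per page when each row stores `data_bytes` inline. */
size_t tuples_per_page(size_t data_bytes)
{
    size_t per_tuple = maxalign(TUPLE_HEADER_BYTES + data_bytes) + LINE_POINTER_BYTES;
    return (HEAP_PAGE_BYTES - PAGE_HEADER_BYTES) / per_tuple;
}
```

For example, a row holding the bigint gid (row_number() returns bigint) plus an 18-byte out-of-line TOAST pointer gives tuples_per_page(8 + 18) = 136, while the same row with a compressed geometry of, say, 1500 bytes stored inline (an illustrative size, not a measurement from this dataset) gives tuples_per_page(8 + 1500) = 5. A drop of that kind is what the ctid query above would reveal.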

strk closed this in c54ccbb on Mar 12, 2018
