Add BRIN index support for geometry and geography. trac #3591 #106

Closed
wants to merge 4 commits into
from

Projects

None yet

5 participants

@rjuju
rjuju commented Jun 24, 2016

Patch by Giuseppe Broccolo, Ronan Dunklau and Julien Rouhaud.

@rjuju rjuju Add BRIN index support for geometry and geography.
Patch by Giuseppe Broccolo, Ronan Dunklau and Julien Rouhaud.
7f48284
@rjuju rjuju changed the title from Add BRIN index support for geometry and geography. to Add BRIN index support for geometry and geography. trac #3591 Jun 24, 2016
@strk
Member
strk commented Jun 24, 2016

On Fri, Jun 24, 2016 at 09:28:53AM -0700, Julien Rouhaud wrote:

A regress/regress_lots_of_3dpoints.sql (50033)
A regress/regress_lots_of_geographies.sql (50023)

Could this be generated instead ?

@rjuju
rjuju commented Jun 24, 2016
A regress/regress_lots_of_3dpoints.sql (50033)
A regress/regress_lots_of_geographies.sql (50023)

Could this be generated instead ?

Sure. Data would be a little more uniform but that's not a problem for the tests. We can also add some less data if it's causing too much overhead in the installcheck process.

@strk
Member
strk commented Jun 25, 2016
@rjuju
rjuju commented Jun 25, 2016

What is the test focused on, exactly ?

Validating that a BRIN index can be created without error, and that an index scan will return the same rows as a sequential scan. Are there some known corner-cases (we're using box2df and gidx internally) we should also test?

@strk
Member
strk commented Jun 25, 2016
@dbaston
Contributor
dbaston commented Jun 29, 2016 edited

It doesn't seem like EMPTY geometries aren't supported, currently:

CREATE TABLE test (geom geometry); INSERT INTO test (geom) VALUES ('POLYGON EMPTY'); CREATE INDEX ON test USING brin(geom);

produces:
ERROR: Error while extracting the box2df from the geom

@rjuju
rjuju commented Jun 29, 2016

@strk: Agreed, we'll instead generate a more lightweight test case, testing both NULL and empty geometries

@dbaston (and @strk): Indeed, empty geometries fail. I didn't dig too much on this yet but it shouldn't be a big problem since the BRIN inclusion infrastructure allows storing an emptiness information.

How would you prefer to work for this patch, should we fix problems as soon as they're identified or should we wait for a more thorough review and then fix all of them at once? (assuming that BRIN support is a feature you'd like to add into postgis).

@dbaston
Contributor
dbaston commented Jun 29, 2016 edited

Sure. Data would be a little more uniform but that's not a problem for the tests. We can also add some less data if it's causing too much overhead in the installcheck process.

Do these serve the purpose?

For 3D points:

SELECT ST_Translate(ST_LineInterpolatePoint('LINESTRING (-5000 -5000, -2000 -2000, 2000 3000, 5000 5000)', i::float/50000), random(), random(), -1 + 2*random()) 
FROM generate_series(1, 50000) i;

For geographies:

SELECT
  i AS num, 
  ST_MakePolygon(ST_MakeLine(array_append(geoms, geoms[1])))::geography AS the_geog 
FROM 
  (SELECT i, array_agg(ST_LineInterpolatePoint(ST_Boundary(ST_Expand(geom, random())), random())) AS geoms FROM generate_series(1, 3), 
  (SELECT i, ST_Translate(ST_LineInterpolatePoint('LINESTRING (-13 49, 0 0, 0 -13, 16 -51)', i::float/1000), random(), random(), -1 + 2*random()) geom 
   FROM generate_series(1, 1000) i) centers GROUP BY i) points

@rjuju
rjuju commented Jun 29, 2016

Sorry I was not clear. I don't think that this test case can use any randomly generated data, since it needs to check that the returned rows are the one we expect. From a BRIN point of view, having randomness should only affect performance in index scans. I was thinking that a simple st_makepoint based on a generate_series (with some NULLS and empty geometries added every x entries, so that they don't end up in the same range) is a naive dataset but should be good enough for testing purpose.

gbroccolo added some commits Jul 3, 2016
@gbroccolo gbroccolo Add support for empty 2D geometries - it's a prototype, need more tes…
…ting
8ad5c75
@gbroccolo gbroccolo Revert "Add support for empty 2D geometries - it's a prototype, need …
…more testing"

The idea is to add a new branch, "brin_support_dev", rebased on the original commit
and used for the changes suggested during the review phase by the PostGIS team.

This reverts commit 8ad5c75.
82f689b
@gbroccolo @rjuju gbroccolo Fix issues raised during first review
- Handle empty geometries
- Generate test dataset instead of keeping huge file in source tree
- Use less data for testing
- Test suites now includes NULL and empty data

This commit introduce new brin_common.(c|h) files, for code shared between
box2df and gidx files. It for now only contains a
is_gserialized_from_datum_empty(Datum) function, wrapper around
gserialized_is_empty(GSERIALIZED).
501387e
@rjuju
rjuju commented Jul 24, 2016 edited

Hello,

I just pushed a patch that fixes all the issues raised for now. We waited a little before doing this in case more issues would be raised to avoid too many commits and ease reviewing.

Also, I just noticed that travis CI only runs regression tests on a 9.4 database. Since BRIN support has been added in 9.5, the "successful check" only means that configure steps does successfully ignore BRIN code in pre 9.5 clusters. Regression tests run successfully on my laptop on 9.5 and 9.6beta3 clusters though.

@robe2
Member
robe2 commented Jul 28, 2016

I tested this patch on my PostgreSQL 9.5.2 VC++ building (compiled under mingw but tested under VC++).

It seems to fail during uninstall, not sure why.

Here is my output:

`PostgreSQL 9.5.2, compiled by Visual C++ build 1800, 64-bit
Postgis 2.3.0dev - r15025 - 2016-07-28 16:41:06
scripts 2.3.0dev r15025
GEOS: 3.5.0-CAPI-1.9.0 r4090
PROJ: Rel. 4.9.1, 04 March 2015
SFCGAL: 1.3.0

Running tests

loader/Point .............. ok
loader/PointM .............. ok
loader/PointZ .............. ok
loader/MultiPoint .............. ok
loader/MultiPointM .............. ok
loader/MultiPointZ .............. ok
loader/Arc .............. ok
loader/ArcM .............. ok
loader/ArcZ .............. ok
loader/Polygon .............. ok
loader/PolygonM .............. ok
loader/PolygonZ .............. ok
loader/TSTPolygon ......... ok
loader/TSIPolygon ......... ok
loader/TSTIPolygon ......... ok
loader/PointWithSchema ..... ok
loader/NoTransPoint ......... ok
loader/NotReallyMultiPoint ......... ok
loader/MultiToSinglePoint ......... ok
loader/ReprojectPts ........ ok
loader/ReprojectPtsGeog ........ ok
loader/Latin1 .... ok
loader/Latin1-implicit .... ok
loader/mfile .... ok
dumper/literalsrid ....... ok
dumper/realtable ....... ok
affine .. ok
bestsrid .. ok
binary .. ok
boundary .. ok
cluster .. ok
concave_hull .. ok
ctors .. ok
dump .. ok
dumppoints .. ok
empty .. ok
estimatedextent .. ok
forcecurve .. ok
geography .. ok
geometric_median .. ok
in_geohash .. ok
in_gml .. ok
in_kml .. ok
in_encodedpolyline .. ok
iscollection .. ok
legacy .. ok
long_xact .. ok
lwgeom_regress .. ok
measures .. ok
minimum_bounding_circle .. ok
normalize .. ok
operators .. ok
out_geometry .. ok
out_geography .. ok
polygonize .. ok
polyhedralsurface .. ok
postgis_type_name .. ok
regress .. ok
regress_bdpoly .. ok
regress_index .. ok
regress_index_nulls .. ok
regress_management .. ok
regress_selectivity .. ok
regress_lrs .. ok
regress_ogc .. ok
regress_ogc_cover .. ok
regress_ogc_prep .. ok
regress_proj .. ok
relate .. ok
remove_repeated_points .. ok
removepoint .. ok
setpoint .. ok
simplify .. ok
simplifyvw .. ok
size .. ok
snaptogrid .. ok
split .. ok
sql-mm-serialize .. ok
sql-mm-circularstring .. ok
sql-mm-compoundcurve .. ok
sql-mm-curvepoly .. ok
sql-mm-general .. ok
sql-mm-multicurve .. ok
sql-mm-multisurface .. ok
swapordinates .. ok
summary .. ok
temporal .. ok
tickets .. ok
twkb .. ok
typmod .. ok
wkb .. ok
wkt .. ok
wmsservers .. ok
knn_recheck .. ok
temporal_knn .. ok
hausdorff .. ok
regress_buffer_params .. ok
offsetcurve .. ok
relatematch .. ok
isvaliddetail .. ok
sharedpaths .. ok
snap .. ok
node .. ok
unaryunion .. ok
clean .. ok
relate_bnr .. ok
delaunaytriangles .. ok
interrupt .. ok
interrupt_relate .. ok
interrupt_buffer .. ok
clipbybox2d .. ok
subdivide .. ok
voronoi .. ok
in_geojson .. ok
regress_brin_index .. ok
regress_brin_index_3d .. ok
regress_brin_index_geography .. ok
regress_sfcgal .. ok
sfcgal/empty .. ok
sfcgal/geography .. ok
sfcgal/legacy .. ok
sfcgal/measures .. ok
sfcgal/regress_ogc_prep .. ok
sfcgal/regress_ogc .. ok
sfcgal/regress .. ok
sfcgal/tickets .. ok
sfcgal/concave_hull .. ok
sfcgal/wmsservers .. ok
sfcgal/approximatemedialaxis .. ok
uninstall . failed (Error encountered loading /projects/postgis/branches/2.3brin/regress/00-regress-install/share/contrib/postgis/uninstall_postgis.sql: /projects/postgis/tmp/2.3_pg9.5w64/regress_lo g)
Makefile:284: recipe for target 'check' failed
make[1]: *** [check] Error 1
make[1]: Leaving directory '/projects/postgis/branches/2.3brin/regress'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
`

Error in regress log shows:

uninstall_postgis.sql:48: ERROR: operator family "brin_geography_inclusion_ops" does not exist for access method "brin"

Are you seeing same thing or its something with my environment wrong?

@robe2
Member
robe2 commented Jul 28, 2016

Looking at the code, I see family:

CREATE OPERATOR FAMILY brin_geography_inclusion_family USING brin; CREATE OPERATOR CLASS brin_geography_inclusion_ops

The operator class seems to be called that and not the family or is the error misleading?

@robe2
Member
robe2 commented Jul 28, 2016 edited

okay I think our uninstall script generator makes the assumption that operator class and family have the same name (in utils\create_undef.pl)

I'd rather not muck with that code and it's irrelevant for extension support anyway which handles uninstall by default already. I do like keeping conventions though. Would you have an issue with renaming the family:

brin_geography_inclusion_ops
brin_geometry_inclusion_ops_2d
brin_geometry_inclusion_ops_3d
brin_geometry_inclusion_ops_4d

To keep with our convention?

I'm testing now to confirm that fixes things, but if you are amenable to that, I'm ready to accept the patch, barring no one has any issues with that, after I test in my 9.6beta3 instance, so we can move on to stress testing.

@strk
Member
strk commented Jul 28, 2016

I think the "same name" convention was also due to "familes" being
created implicitly by some version of PostgreSQL thus making it
impossible to figure out the name from the input .sql file

So, I'm all for keeping the convention, if possible.

@rjuju
rjuju commented Jul 28, 2016

That's strange, I never hit this issue on my environment:

$ make installcheck
RUNTESTFLAGS=" --extension" make check
[...]
 in_geojson .. ok 
 regress_brin_index .. ok 
 regress_brin_index_3d .. ok 
 regress_brin_index_geography .. ok 
 uninstall .. ok (4578)

Run tests: 118
Failed: 0

It'd be surprising to have a different behavior with linux and windows. I don't have sfcgal configured and therefore no related test launched, but it also shouldn't be an issue. Anyway, I'm fine with using the same name for opclass and opfamily. I'll send a patch for this if needed.

@strk
Member
strk commented Jul 28, 2016
@robe2
Member
robe2 commented Jul 28, 2016

@rjuju you didn't run into an issue since you passed in --extension.

Like I said for extension install, uninstall is built into the extension machinery, so you'd only run into issue if you left out the --extension part.

I just tested on 9.6beta3 and works fine with renaming family.

@robe2
Member
robe2 commented Jul 28, 2016

So are we okay with keeping convention and I'll commit to trunk?

@gbroccolo

Hi Regina,

Yes, you can rename the OpFamily - is this issue present just for the geography OpFamily?

@strk strk pushed a commit that closed this pull request Jul 31, 2016
@robe2 robe2 Add support for BRIN indexes (2nd Quadrant, Giuseppe Broccolo, Julie…
…n Rouhaud)

Closes #3591
Closes #106

git-svn-id: http://svn.osgeo.org/postgis/trunk@15028 b70326c6-7e19-0410-871a-916f4a2858ee
f49d428
@strk strk closed this in f49d428 Jul 31, 2016
@strk strk pushed a commit that referenced this pull request Jul 31, 2016
@robe2 robe2 Add support for BRIN indexes - missed new files on last commit
Closes #3591
Closes #106

git-svn-id: http://svn.osgeo.org/postgis/trunk@15029 b70326c6-7e19-0410-871a-916f4a2858ee
2b3c01b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment