Skip to content
Permalink
Browse files

Apply spatial filter when && is in query (#176)

When deparsing the filter, note cases with a Var && Const pattern and use of the geometry type, and turn those into an OGR spatial filter call. Allows pushdown of spatial filters, from FDW to OGR, and for many OGR drivers down to the source system as well (ie, FDW turns && into OGR spatial filter, and OGR turns spatial filter back into an appropriate SQL function on the remote end).
  • Loading branch information
pramsey committed Nov 18, 2019
1 parent 2747194 commit a7d277b0e4a18a4a526344430958cd6d211c4d25
Showing with 682 additions and 220 deletions.
  1. +2 −2 .editorconfig
  2. +2 −1 .travis.yml
  3. +34 −13 README.md
  4. +30 −0 input/file.source
  5. +37 −10 input/pgsql.source
  6. +89 −0 input/postgis.source
  7. +66 −43 ogr_fdw.c
  8. +31 −25 ogr_fdw.h
  9. +3 −4 ogr_fdw_common.h
  10. +163 −108 ogr_fdw_deparse.c
  11. +36 −0 output/file.source
  12. +63 −14 output/pgsql.source
  13. +126 −0 output/postgis.source
@@ -11,8 +11,8 @@ insert_final_newline = true
trim_trailing_whitespace = true

# Test files need kid gloves
[*.source]
insert_final_newline = false
[*.{source,out}]
insert_final_newline = true
trim_trailing_whitespace = false

# C files want tab indentation
@@ -18,7 +18,8 @@ env:
- PG_VERSION=9.6 GDAL_VERSION=2.2.4
- PG_VERSION=10 GDAL_VERSION=2.3.3
- PG_VERSION=11 GDAL_VERSION=2.4.3
# - PG_VERSION=12 GDAL_VERSION=3.0.2 # gdal 3 support in ogr_fdw is tbd
- PG_VERSION=12 GDAL_VERSION=2.4.3
# TODO: add gdal 3 support

before_script:
- sudo /etc/init.d/postgresql stop
@@ -4,16 +4,16 @@ Travis: [![Build Status](https://secure.travis-ci.org/pramsey/pgsql-ogr-fdw.png)

## Motivation

OGR is the vector half of the [GDAL](http://www.gdal.org/) spatial data access library. It allows access to a [large number of GIS data formats](http://www.gdal.org/ogr_formats.html) using a [simple C API](http://www.gdal.org/ogr__api_8h.html) for data reading and writing. Since OGR exposes a simple table structure and PostgreSQL [foreign data wrappers](https://wiki.postgresql.org/wiki/Foreign_data_wrappers) allow access to table structures, the fit seems pretty perfect.
OGR is the **vector** half of the [GDAL](http://www.gdal.org/) spatial data access library. It allows access to a [large number of GIS data formats](http://www.gdal.org/ogr_formats.html) using a [simple C API](http://www.gdal.org/ogr__api_8h.html) for data reading and writing. Since OGR exposes a simple table structure and PostgreSQL [foreign data wrappers](https://wiki.postgresql.org/wiki/Foreign_data_wrappers) allow access to table structures, the fit seems pretty perfect.

## Limitations

This implementation currently has the following limitations:

* **PostgreSQL 9.3+** This wrapper does not support the FDW implementations in older versions of PostgreSQL.
* **Only non-spatial query restrictions are pushed down to the OGR driver.** PostgreSQL foreign data wrappers support delegating portions of the SQL query to the underlying data source, in this case OGR. This implementation currently pushes down only non-spatial query restrictions, and only for the small subset of comparison operators (>, <, <=, >=, =) supported by OGR.
* **Spatial restrictions are not pushed down.** OGR can handle basic bounding box restrictions and even (for some drivers) more explicit intersection restrictions, but those are not passed to the OGR driver yet.
* **OGR connections every time** Rather than pooling OGR connections, each query makes (and disposes of) two new ones, which seems to be the largest performance drag at the moment for restricted (small) queries.
* **PostgreSQL 9.3 or higher.** This wrapper does not support the FDW implementations in older versions of PostgreSQL.
* **Limited non-spatial query restrictions are pushed down to OGR.** OGR only supports a minimal set of SQL operators (>, <, <=, >=, =).
* **Only bounding box filters (&&) are pushed down.** Spatial filtering is possible, but only bounding boxes, and only using the && operator.
* **OGR connections every time** Rather than pooling OGR connections, each query makes (and disposes of) **two** new ones, which seems to be the largest performance drag at the moment for restricted (small) queries.
* **All columns are retrieved every time.** PostgreSQL foreign data wrappers don't require all columns all the time, and some efficiencies can be gained by only requesting the columns needed to fulfill a query. This would be a minimal efficiency improvement, but can be removed given some development time, since the OGR API supports returning a subset of columns.

## Download
@@ -92,13 +92,34 @@ Copy the `CREATE SERVER` and `CREATE FOREIGN SERVER` SQL commands into the datab

And you can query the table directly, even though it's really just a shape file.

> SELECT * FROM pt_two;
fid | geom | name | age | height | birthday
-----+--------------------------------------------+-------+-----+--------+------------
0 | 0101000000C00497D1162CB93F8CBAEF08A080E63F | Peter | 45 | 5.6 | 1965-04-12
1 | 010100000054E943ACD697E2BFC0895EE54A46CF3F | Paul | 33 | 5.84 | 1971-03-25

```sql
SELECT * FROM pt_two;
```
```
fid | geom | name | age | height | birthday
-----+--------------------------------------------+-------+-----+--------+------------
0 | 0101000000C00497D1162CB93F8CBAEF08A080E63F | Peter | 45 | 5.6 | 1965-04-12
1 | 010100000054E943ACD697E2BFC0895EE54A46CF3F | Paul | 33 | 5.84 | 1971-03-25
```

You can also apply filters, and see the portions that will be pushed down to the OGR driver, by setting the debug level to `DEBUG1`.

```sql
SET client_min_messages = debug1;
SELECT name, age, height
FROM pt_two
WHERE height < 5.7
AND geom && ST_MakeEnvelope(0, 0, 1, 1);
```
```
DEBUG: OGR SQL: (height < 5.7)
DEBUG: OGR spatial filter (0 0, 1 1)
name | age | height
-------+-----+--------
Peter | 45 | 5.6
(1 row)
```

## Examples

@@ -371,5 +392,5 @@ Once you've figured out your issue, don't forget to remove the `CPL_DEBUG` optio
SET client_min_messages = notice;
ALTER SERVER myserver_latin1 OPTIONS (SET config_options 'SHAPE_ENCODING=LATIN1');
@@ -116,4 +116,34 @@ SELECT c.*
JOIN no_geom c
ON (c.fid = g.g);



------------------------------------------------
-- FGDB test

CREATE SERVER fgdbserver
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource '/vsizip/@abs_srcdir@/data/Querying.zip/Querying.gdb',
format 'OpenFileGDB' );

CREATE FOREIGN TABLE cities (
fid bigint,
shape bytea,
city_fips varchar(5),
city_name varchar(40),
state_fips varchar(2),
state_name varchar(25),
state_city varchar(7),
type varchar(25),
capital varchar(1),
elevation integer,
pop1990 integer,
popcat integer
) SERVER "fgdbserver"
OPTIONS (layer 'Cities');

SET client_min_messages = DEBUG1;

SELECT fid, city_name, pop1990 FROM cities WHERE pop1990 = 17710;
SELECT fid, city_name, pop1990 FROM cities WHERE city_name = 'Williston';
@@ -1,9 +1,6 @@
CREATE SERVER pgserver
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource 'PG:dbname=contrib_regression host=localhost',
format 'PostgreSQL' );

----------------------------------------------------------------------
-- Create local table

CREATE TABLE bytea_local (
fid serial primary key,
geom bytea,
@@ -20,6 +17,7 @@ CREATE TABLE bytea_local (
);

----------------------------------------------------------------------
-- Populate local table

INSERT INTO bytea_local (name, geom, age, size, value, num, dt, tm, dttm, varch, yn)
VALUES ('Jim', '14232'::bytea, 23, 1, 4.3, 5.5, '2010-10-10'::date, '13:23:21'::time, '2010-10-10 13:23:21'::timestamp, 'this', 'y' );
@@ -29,6 +27,13 @@ INSERT INTO bytea_local (name, geom, age, size, value, num, dt, tm, dttm, varch,
VALUES (NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

----------------------------------------------------------------------
-- Create remote table

CREATE SERVER pgserver
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource 'PG:dbname=contrib_regression host=localhost',
format 'PostgreSQL' );

CREATE FOREIGN TABLE bytea_fdw (
fid integer,
@@ -57,15 +62,37 @@ EXPLAIN VERBOSE
FROM bytea_fdw;

----------------------------------------------------------------------
-- Remote Query and OGR SQL pushdown

INSERT INTO bytea_fdw (name, geom, age, size, value, num, dt, tm, dttm, varch, yn)
VALUES ('Margaret', '2222'::bytea, 12, 5, 1.4, 19.13, '2001-11-23'::date, '9:12:34'::time, '2001-02-11 09:23:11'::timestamp, 'them', 'y' )
RETURNING fid, name, geom, age, size, value, num, dt, tm, dttm, varch, yn;
SET client_min_messages = DEBUG1;

SELECT fid, name, geom, age, size, value, num, dt, tm, dttm, varch, yn
FROM bytea_fdw
WHERE fid = 4;

SELECT fid, name, dt
FROM bytea_fdw
WHERE name IS NULL;

SELECT fid, name
FROM bytea_fdw
WHERE name = 'Jim' AND age <= 30;

SELECT fid, name, dt
FROM bytea_fdw
WHERE name = 'Jim' AND age <= 30 AND dt > '2010-10-1'::date;

SELECT fid, name
FROM bytea_fdw
WHERE name = 'Jim' OR name IS NULL;

----------------------------------------------------------------------
-- Remote Update

INSERT INTO bytea_fdw (name, geom, age, size, value, num, dt, tm, dttm, varch, yn)
VALUES ('Margaret', '2222'::bytea, 12, 5, 1.4, 19.13, '2001-11-23'::date, '9:12:34'::time, '2001-02-11 09:23:11'::timestamp, 'them', 'y' )
RETURNING fid, name, geom, age, size, value, num, dt, tm, dttm, varch, yn;

UPDATE bytea_fdw
SET name = 'Maggie', num = 45.34, yn = 'n'
WHERE age = 12;
@@ -89,4 +116,4 @@ SELECT a.fid, a.name, b.name
FROM bytea_local a
JOIN bytea_fdw b
USING (fid);


@@ -0,0 +1,89 @@
CREATE EXTENSION postgis;

CREATE TABLE geometry_local (
fid serial primary key,
geom geometry(Point, 4326),
name varchar,
age bigint,
size integer,
value float8,
num numeric(6,2),
dt date,
tm time,
dttm timestamp,
varch char(8),
yn char
);

----------------------------------------------------------------------
-- Populate local table

INSERT INTO geometry_local (name, geom, age, size, value, num, dt, tm, dttm, varch, yn)
VALUES ('Jim', 'SRID=4326;POINT(0 0)', 23, 1, 4.3, 5.5, '2010-10-10'::date, '13:23:21'::time, '2010-10-10 13:23:21'::timestamp, 'this', 'y' );
INSERT INTO geometry_local (name, geom, age, size, value, num, dt, tm, dttm, varch, yn)
VALUES ('Marvin', 'SRID=4326;POINT(100 0)', 34, 2, 5.4, 10.13, '2011-11-11'::date, '15:21:45'::time, '2011-11-11 15:21:45'::timestamp, 'that', 'n' );
INSERT INTO geometry_local (name, geom, age, size, value, num, dt, tm, dttm, varch, yn)
VALUES (NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);

----------------------------------------------------------------------
-- Create remote table

CREATE SERVER pgservergeom
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource 'PG:dbname=contrib_regression host=localhost',
format 'PostgreSQL' );

CREATE FOREIGN TABLE geometry_fdw (
fid integer,
geom geometry(point, 4326),
name varchar,
age bigint,
size integer,
value float8,
num numeric(6,2),
dt date,
tm time,
dttm timestamp,
varch char(8),
yn char
) SERVER pgservergeom OPTIONS (layer 'geometry_local');


SELECT fid, name, geom, age, size, value, num, dt, tm, dttm, varch, yn FROM geometry_fdw;

SELECT a.name, b.name
FROM geometry_local a
JOIN geometry_fdw b
USING (fid);

EXPLAIN VERBOSE
SELECT fid, name, geom, age, size, value, num, dt, tm, dttm, varch, yn
FROM geometry_fdw;

----------------------------------------------------------------------
-- Remote Query and OGR SQL pushdown

SET client_min_messages = DEBUG1;

SELECT name, age, ST_AsText(geom)
FROM geometry_fdw
WHERE name = 'Jim' AND age <= 30
AND geom && ST_MakeEnvelope(-1, -1, 1, 1, 4326);

SELECT name, ST_AsText(geom)
FROM geometry_fdw
WHERE ST_Intersects(geom, ST_MakeEnvelope(-1, -1, 1, 1, 4326));

SELECT name, ST_AsText(geom)
FROM geometry_fdw
WHERE geom && ST_MakeEnvelope(-180, -90, 180, 90, 4326);

SELECT name, ST_AsText(geom)
FROM geometry_fdw
WHERE ST_MakeEnvelope(-180, -90, 180, 90, 4326) && geom;

SELECT name, ST_AsText(geom)
FROM geometry_fdw
WHERE ST_MakeEnvelope(-180, -90, 180, 90, 4326) &&
ST_MakeEnvelope(-180, -90, 180, 90, 4326);

0 comments on commit a7d277b

Please sign in to comment.
You can’t perform that action at this time.