Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request to add PostGIS support #86

Closed
zross opened this issue May 5, 2016 · 11 comments
Closed

Request to add PostGIS support #86

zross opened this issue May 5, 2016 · 11 comments

Comments

@zross
Copy link

zross commented May 5, 2016

I'm following up on a tweet exchange with Hadley. It would be spectacular RPostgres offered PostGIS support (support for PostgreSQL tables that have a geom).

I can't give a perfectly reproducible example since it will depend on your versions of PostGIS etc, but here is an attempt

Create a spatial DB from the command line

createdb --host localhost --port 5433 --username postgres --template postgis_21_sample  mypostgis 

Create a new table, add a geometry column and fill with one polygon (use psql or pgAdmin)

DROP TABLE IF EXISTS buildings;
CREATE TABLE buildings ( ID int4, name varchar(25) );
SELECT AddGeometryColumn('public', 'buildings', 'geom', 4326, 'MULTIPOLYGON', 2);
INSERT INTO buildings VALUES(1, 'some name', ST_GeomFromText('MULTIPOLYGON(((-76.5088 42.454, -76.4577 42.45921,  -76.448 42.415889, 
-76.5157 42.4199, -76.5088 42.454)))', 4326));
CREATE INDEX bldg_gist ON buildings USING gist(geom);

In R show what happens with RPostgres

You can read in the non-spatial fields just fine, but you get a warning and you get the raw geom data when your query includes the spatial variable (geom). Ideally, if you query against a spatial file it would return some kind of sp object like, in this case, a SpatialPolygonsDataFrame.

library(RPostgres)
con <- dbConnect(RPostgres::Postgres(),dbname = 'mypostgis', 
                 host = 'localhost', 
                 port = 5433,
                 user = 'postgres',
                 password = 'xxx')

res <- dbSendQuery(con, "SELECT id, name FROM buildings")
dbFetch(res)
#   id      name
#1  1 some name

res <- dbSendQuery(con, "SELECT * FROM buildings")
# Warning: Unknown field type (16400) in column geom
dat <- dbFetch(res)
dat
#   id      name
#1  1 some name
#                                                                                                                                                                                                                   geom
#1 0106000020E610000001000000010300000001000000050000001B0DE02D902053C0
# 273108AC1C3A454045D8F0F44A1D53C082FFAD64C73A4540E9263108AC1C53C011E2CAD93B
# 35454005A3923A012153C092CB7F48BF3545401B0DE02D902053C0273108AC1C3A4540
class(dat)
# data.frame

The postGIStools package offers some support

With the postGIStools package you can read in PostGIS data

con <- dbConnect(RPostgreSQL::PostgreSQL(),dbname = 'mypostgis', 
                 host = 'localhost', 
                 port = 5433,
                 user = 'postgres',
                 password = 'spatial')
library(postGIStools)

# SELECT * FROM buildings does not work for some reason
dat <- get_postgis_query(con, "SELECT id, name, geom FROM buildings", 
                         geom_name = "geom")
class(dat)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"

This is perfect except I found it to be slow on a medium size table. With 400k points, it took a pretty long time.

Speed

The image below shows that it takes almost 5 minutes to read in 400k points as a spatial file, but it can be read in faster if I write to shapefile first and then use readOGR. To me this suggests that there is likely a faster way to read this data directly into R.

image

Additional discussion on this

I'm not sure what the speediest way to bring in spatial data from PostGIS but I think bringing in a table of just 400k points should take less than 5 minutes as it seems to with postGIStools. And, given the example above where I write to shapefile and read in in half the time, it's clear that there should be better alternatives. There is some discussion about this at the postGIStools GitHub site.

@zross zross changed the title Please add PostGIS support Request to add PostGIS support May 5, 2016
@mdsumner
Copy link

mdsumner commented May 5, 2016

You are right about the speed, and I know there's a lot of options here - it's just that communicating remotely like this makes it hard to elucidate many things. I don't have functional DBs to summon, I should sort that out but it's just another thing.

Please try, after this "res <- " line, in R

res <- dbSendQuery(con, "SELECT * FROM buildings")
## please try
g <- wkb::readWKB(res$geom)
class(g)

and report details.

There are other options / details / points to note:

rgdal::readOGR (via GDAL, with the right drivers built in) can read natively from Postgres or PostGIS, there are gdal.org pages about the details

wkb::readWKB or rgeos::readWKT can re-compose Spatial geometry lists from geometry columns, but

the author of postGIStools reported that readWKB was slower than the internals of the workflow used here: SESYNC-ci/postGIStools#3

@zross
Copy link
Author

zross commented May 5, 2016

Thanks for input. I believe you can't use a query with readOGR, (just the whole table?). The other downside there is that there is not a driver for Windows. I use readOGR to grab data from PostGIS if I'm using a Mac, but not on Windows. Roger Bivand has said he does not think there will ever be a Windows driver for this.

Regarding the wkb perhaps I'm missing something but I get the following:

res <- dbSendQuery(con, "SELECT * FROM buildings")
# Warning: Unknown field type (16400) in column geom
dat <- dbFetch(res)
g <- wkb::readWKB(res$geom)
#Error in res$geom : $ operator not defined for this S4 class
g <- wkb::readWKB(dat$geom)
#Error in wkb::readWKB(dat$geom) : wkb must be a list

In postGIStools: he uses a variation on:

res <- dbSendQuery(con, "SELECT ST_AsText(geom) geom FROM buildings")
dat <- dbFetch(res)
final<-rgeos::readWKT(dat$geom)
# [1] "SpatialPolygons"
# attr(,"package")
# [1] "sp"

@wildintellect
Copy link

It's not that there isn't a windows driver, it's that they (rgdal) don't want to bundle the additional OGR drivers in the CRAN package build. If you build the package against a different OGR then it can work, there are quite a few OGR/GDAL drivers in this boat.

Yes I can confirm that rgdal does not support queries through ogr. Pull requests(Patches) to add query support to readOGR is an option (has been mentioned on r-sig-geo several times), if anyone wants to contribute it.

My workaround is that I create Views, then use readOGR on the view...

@pmarchand1
Copy link

As discussed in the linked ticket, I think the main reason readWKB is slower than readWKT at the moment is the R vs. C implementation.

On the PostgreSQL side, one additional complication is that neither ST_AsText nor ST_AsBinary import the projection information, so another term must be added to the query: ST_SRID (geom).

@zross
Copy link
Author

zross commented May 6, 2016

For reference, here is what I use to get the proj4

SELECT proj4text FROM spatial_ref_sys WHERE srid = (SELECT ST_SRID(geom) FROM the_table);

@eduardszoecs
Copy link

In postGIStools: he uses a variation on:

res <- dbSendQuery(con, "SELECT ST_AsText(geom) geom FROM buildings")
dat <- dbFetch(res)
final<-rgeos::readWKT(dat$geom)

That's what I am using most of the time...
Didn't know about postGIStools - looks promising ;)

@dnbucklin
Copy link

For those coming across this now, we've recently released on CRAN the rpostgis package, which implements a function rpostgis::pgGetGeom in a similar manner to dbReadTable from RPostgreSQL, but with the ability to add addtional SQL using clauses:

conn<-dbConnect("PostgreSQL")
pts<-pgGetGeom(conn,name=c("schema","table"),geom="geom",clauses="limit 10000")

We also found that the wkb::readWKB implementation was slower than rgeos::readWKT and used the latter for all retrieval functions (with the exception I mention below), though with writing back to the DB (with rpostgis::pgInsert), we used wkb::writeWKB in some cases, and it is generally faster than rgeos::writeWKT.

@zross , for large point tables (single point per record) in particular, simply extracting the X and Y coordinates in the SQL query (e.g., ST_X(geom), ST_Y(geom))and then creating a SpatialPoints from the two columns in R proved to be faster than rgeos::readWKT, and this is how we implemented it in rpostgis::pgGetGeom. I just tested it with 1M points (execution time was 10s for a SpatialPoints, and 50s for a SpatialPointsDataFrame with 9 columns of data).

@zross
Copy link
Author

zross commented Sep 8, 2016

This is exciting, thanks for letting me know!

@etiennebr
Copy link
Contributor

Related: #114

@krlmlr
Copy link
Member

krlmlr commented Dec 4, 2017

Closing in favor of https://github.com/r-dbi/DBI/issues/203.

@krlmlr krlmlr closed this as completed Dec 4, 2017
@github-actions
Copy link

github-actions bot commented Dec 7, 2020

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants