Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable database connection for st_read and st_write #558

Merged
merged 119 commits into from
Apr 11, 2018

Conversation

etiennebr
Copy link
Member

This PR enables database (i.e. postgis) connection for st_read and st_write. It also updates tests and documentation.

I've moved st_*_db() to a new deprecated.R. I didn't make any effort to reconcile the arguments for st_read.character and st_read.PostgreSQLConnection), which might be a little sloppy. I'm open to trying to reconcile arguments as much as possible since this is an opportunity, if you have any preference, please let me know.

R/db.R Outdated
#' @details st_write_db was written with help of Josh London, see \url{https://github.com/r-spatial/sf/issues/285}
st_write_db = function(conn = NULL, obj, table = deparse(substitute(obj)), geom_name = "wkb_geometry",
#' @details database reading was written with help of Josh London, see \url{https://github.com/r-spatial/sf/issues/285}
st_write.PostgreSQLConnection = function(conn = NULL, obj, table = deparse(substitute(obj)), geom_name = "wkb_geometry",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the arguments before ... in the generic need to have identical names to that of the generic, which is: obj, dsn, layer; similar for st_read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that if we put the connection (dsn) in second, then we'll need to use S4 generic. Otherwise we'll only catch st_write.sf(). It also seems more common to see the connection as the first argument. S4?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to branch in st_write.sf - then I'm pragmatic and would like to avoid S4.

For write_xxx function it's more common to see the object to write as first argument, I'd say.

R/deprecated.R Outdated
st_read(...)
}

st_write_db <- function(...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, the argument order needs to change: st_write_db(conn, obj, layer) should become st_write(obj, conn, layer, ...) but then with correct names obj, dsn, layer.

@etiennebr
Copy link
Member Author

There is something I can't explain yet when reading the crs from the database.It might be a problem of version, but on my personal computer EPSG 28992 is different than from postgis for some decimals of the towgs84parameter.

#  crs from the database
st_crs(st_read("PG:dbname=postgis", "meuse"))  
#> Coordinate Reference System:
#>   EPSG: 28992 
#>   proj4string: "+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 
#> +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.4171,50.3319,465.5524,
#> -0.398957,0.343988,-1.87740,4.0725 +units=m +no_defs"
# local crs
st_crs(st_read(pg, "meuse"))
#> Coordinate Reference System:
#>   EPSG: 28992 
#>   proj4string: "+proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 
#> +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.2369,50.0087,465.658,
#> -0.406857,0.350733,-1.87035,4.0812 +units=m +no_defs"

Where pg is a connection to a postgis database. See tests for details.bh

@edzer
Copy link
Member

edzer commented Dec 3, 2017

Yes, this indicates changes to the EPSG database, present in different proj.4 versions; this particular one is well known. National geodesy institutes update this when they improve the mapping from their national system to (in this case) the global datum WGS84.

@etiennebr etiennebr force-pushed the generic-st_read branch 2 times, most recently from c410cce to 0dac8dc Compare December 10, 2017 03:37
@etiennebr
Copy link
Member Author

etiennebr commented Dec 10, 2017

To clarify the PR:

  • Uses st_write() and st_read() rather than st_read_db() and st_write_db() (both deprecated)
  • Uses generic dbWriteTable(), which, thanks to DBI package makes it very simple to add support for other (postgis enabled) databases.
  • Adds support for DBIObjects in addition to RPostgreSQL
  • Makes RPostgreSQL required in the Imports 👎 . Maybe there's another way I'm not aware of?
  • Now that RPostgres is on CRAN, it would probably be possible to remove RPostgreSQL support., but speaking for myself, I'd need a little transition period.
  • Adds tests, namely support for odbc connections (on postgres database). It works in most ordinary cases, but there are some glitches such as overwrite and append outside of default (public) schema fails. But it seems to relate to the odbc package.
  • Provides an option to create invalid crs with st_crs(valid = FALSE) to synch with database (basically to replace structure(list(epsg=..., proj4string=...), class="crs")).

To do:

Probably other things. Happy to discuss any of these and modify the pull request before it gets merged.

@edzer
Copy link
Member

edzer commented Dec 10, 2017

Cool that RPostgres is out! Let me know when you have travis running here; looking good. Release sf 0.5-6 planned early Jan.

@dpprdan
Copy link
Contributor

dpprdan commented Dec 13, 2017

Err, sorry to barge in here, but #592 might be a problem? Or would that be solved with the dbWriteTable? Still there would be an inconsistency w.r.t. allowed field types on a PostgreSQL backend with st_write vis-a-vis dbWriteTable?

UPDATE: Sorry, I should have read st_read.R more carefully. I guess this (from st_write) actually solves #592:

if (inherits(dsn, c("DBIObject", "PostgreSQLConnection"))) {
    	if(is.null(layer)) layer <- deparse(substitute(obj))
return(dbWriteTable(dsn, name = layer, value = obj, ..., factorsAsCharacter = factorsAsCharacter))

@etiennebr
Copy link
Member Author

@dpprdan, I'll add explicit test for logical type. It's the best way to convince everyone :)

@etiennebr
Copy link
Member Author

This PR is ready to merge 🎉 ! however the commits are messy. @edzer, would you like me to clean up the commits before merging?

@edzer
Copy link
Member

edzer commented Apr 8, 2018

When testing the RPostgres tests, I see

> pg <- DBI::dbConnect(RPostgres::Postgres(), host = "localhost", dbname = "postgis")
Error in connection_create(names(opts), as.vector(opts)) : 
  fe_sendauth: no password supplied

any idea? the postgis db doesn't have a password set.

@edzer
Copy link
Member

edzer commented Apr 8, 2018

When I remove the host = "localhost" argument, things work. Should I modify my postgres configuration? Do you need the host setting for your tests?

@etiennebr
Copy link
Member Author

etiennebr commented Apr 9, 2018

Check your ~/.pgpass. Your credentials should be there, if not add them. It should solve your problem.

I thought I could do without the host, but it seems more robust to provide it. For instance to connect locally to a Docker instance it needs the localhost specification, and it seems to work fine on Travis. Same thing with PG:host=localhost.

@edzer
Copy link
Member

edzer commented Apr 9, 2018

Thanks, that works now. Using localhost, the users is apparently forced to have a password.

@etiennebr
Copy link
Member Author

It seems to be driver dependent. For instance, ODBC was less demanding.

By the way, ODBC is not fully supported, there are still some issues, but it works for most cases. So the test is just skipped for now.

suppressWarnings(could_schema <- DBI::dbGetQuery(pg, q) %>% nrow() > 0)

skip_if_not(could_schema, "Could not create schema (might need to run 'GRANT CREATE ON DATABASE postgis TO <user>')")
expect_error(st_write(pts, pg, Id(schema = "public", table = "sf_meuse__")), "exists")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting

Error: `st_write(pts, pg, Id(schema = "public", table = "sf_meuse__"))` threw an error with unexpected message.
Expected match: "exists"
Actual message: "unable to find an inherited method for function ‘dbWriteTable’ for signature ‘\"PqConnection\", \"SQL\", \"missing\""

here -- any idea?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you are running the latest DBI and RPostgres.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, but that didn't resolve it; sessionInfo below; postgresql 9.6.8.

When I outcomment these test, next thing I get stuck with is

> st_write(pts, pg, Id(schema = "CAP__", table = "Meuse_tbl"))
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for functiondbWriteTablefor signature"PqConnection", "SQL", "missing"

What am I missing?

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sp_1.2-7       testthat_2.0.0 DBI_0.8       

loaded via a namespace (and not attached):
 [1] bit_1.1-12      compiler_3.4.3  magrittr_1.5    R6_2.2.2       
 [5] hms_0.4.2       tools_3.4.3     Rcpp_0.12.16    bit64_0.9-7    
 [9] grid_3.4.3      blob_1.1.1      pkgconfig_2.0.1 rlang_0.2.0    
[13] RPostgres_1.1.0 lattice_0.20-35

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is because the test uses the new way of naming tables in DBI. It uses DBI::Id() (prior was DBI::SQL()). Also Rpostgres was just recently updated (r-dbi/RPostgres#179) could you try devtools::install_github("r-dbi/rpostgres"). I was surprised that travis ran the test without complaining.

Looking at the travis logs (devtools::session_info())

DBI              0.8       2018-03-02 cran (@0.8)

Can't find RPostgres, but it must be 1.1.0. (I'm using RPostgres 1.1.0 2018-04-05 from Github (r-dbi/RPostgres@ef2ab88))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not causing this, but in general, shouldn't RPostgres be in Suggests in DESCRIPTION, (preferably with minimal version number)? Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I don't know. I think the whole point of using DBI is to only require DBI, but it's true that we sould still suggest RPostgres, but then should we suggest other drivers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are tests for them, they ought to be added, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our own education, I searched a bit for other examples and I found that dplyr uses data.table in tests, but it's nowhere in the DESCRIPTION. I still think it's a good idea to suggest it, though. Maybe @krlmlr knows the reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We moved the data.table backend to dtplyr a while ago. The tests are gone now: https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-group-by.r.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks (I got confused in the commits)! Makes sense. So if I understand correctly we should Import DBI and Suggest all drivers we explicitly test.

@edzer edzer merged commit e6e9615 into r-spatial:master Apr 11, 2018
edzer added a commit that referenced this pull request Apr 11, 2018
@edzer
Copy link
Member

edzer commented Apr 11, 2018

Cool -- now merged in master!! I had to install DBI from github to get the thing going, in the end, and remove one host=localhost from an st_read test that used the gdal driver.

@etiennebr
Copy link
Member Author

Yay! Writing the code was nothing compared to having the tests run successfully in another environment :)

@edzer
Copy link
Member

edzer commented Apr 12, 2018

This now also requires DBI and RPostgres versions from gitnub to pass travis checking.

@etiennebr could you please restore code coverage here with either adding tests or nocovs? And could you pls complete the NEWS entries? Also: would you be tempted to draft an r-spatial.org blog post on all the new capabilities? I'd be happy to help here.

@etiennebr
Copy link
Member Author

@edzer yes and yes!

@etiennebr
Copy link
Member Author

When is the next release?

@edzer
Copy link
Member

edzer commented Apr 18, 2018

A month from now (2 months between releases is required by CRAN), or perhaps earlier if 3.5.0 burns the whole thing down before that (see here).

@dpprdan
Copy link
Contributor

dpprdan commented Apr 24, 2018

May I humbly suggest that two minor but significant details be mentioned in the blog post?

  1. The name of geometry column does not default to wkb_geom anymore when writing to DB.
  2. The geometry column is not necessarily the last column (can that become a problem?) in the DB table.

@edzer
Copy link
Member

edzer commented Apr 24, 2018

I think 2. was caused by #718. Can that be a problem? We could, in principle, move it to last before writing.

@dpprdan
Copy link
Contributor

dpprdan commented Apr 24, 2018

@edzer I cannot say for sure, but I think I noticed that already earlier than two days ago, i.e. before you merged the fix for #718.

Re: 1. To be more precise, GDAL renames the geometry column to wkb_geom, DBI does not and leaves the name as-is (right?).

@etiennebr
Copy link
Member Author

Thanks @dpprdan, indeed, st_write copies the table to the database as closely as it can. It uses the column names and order from the data frame. I'll make sure to mention that, but I don't think it's an issue (or a surprise). GDAL creates more surprises and indeed shouldn't be encouraged.

etiennebr added a commit to etiennebr/sf that referenced this pull request Jun 5, 2018
etiennebr added a commit to etiennebr/sf that referenced this pull request Jun 5, 2018
etiennebr added a commit to etiennebr/sf that referenced this pull request Jul 14, 2018
* `sf::` is now a data source
* `st_example` allows to load files (fuzzy match)
* Use these sources for examples and vignettes

related: r-spatial#558
etiennebr added a commit to etiennebr/sf that referenced this pull request Jul 14, 2018
* `sf::` is now a data source
* `st_example` allows to load files (fuzzy match)
* Use these sources for examples and vignettes

related: r-spatial#558
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet