
Binary objects truncated to 255 characters when read #202

Closed
etiennebr opened this issue Jul 23, 2018 · 11 comments
Labels
bug postgres

Comments

@etiennebr commented Jul 23, 2018

Binary objects seem to be truncated to 255 characters somewhere during their conversion to character when reading from a table (executing the query directly works fine). It is probably an interaction between the database engine and odbc, since it works with RPostgres connections.

library(magrittr)
library(DBI)
odbc_con <- dbConnect(
    odbc::odbc(), 
    "PostgreSQL"
    )                    
pg_con <- dbConnect(
    RPostgres::Postgres(), 
    host = "localhost",             
    dbname = "postgis"
    )

# create a long binary object
query <- "SELECT ('[' || repeat('1,', 150) || '2]')::json;"

dbGetQuery(odbc_con, query)$json %>% nchar()
#> [1] 303
dbGetQuery(odbc_con, glue::glue("CREATE TABLE x AS {query}"))
#> data frame with 0 columns and 0 rows
dbReadTable(odbc_con, "x")$json %>% nchar()
#> [1] 255
dbReadTable(pg_con, "x")$json %>% nchar()
#> [1] 303

dbExecute(odbc_con, "DROP TABLE x;")
#> [1] -902937952

Created on 2018-07-23 by the reprex package (v0.2.0).

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_CA:en                    
#>  collate  en_CA.UTF-8                 
#>  tz       America/Toronto             
#>  date     2018-07-23
#> Packages -----------------------------------------------------------------
#>  package   * version date       source                     
#>  backports   1.1.2   2017-12-13 cran (@1.1.2)              
#>  base      * 3.4.4   2018-03-16 local                      
#>  bit         1.1-14  2018-05-29 CRAN (R 3.4.4)             
#>  bit64       0.9-7   2017-05-08 CRAN (R 3.4.2)             
#>  blob        1.1.1   2018-03-25 cran (@1.1.1)              
#>  compiler    3.4.4   2018-03-16 local                      
#>  datasets  * 3.4.4   2018-03-16 local                      
#>  DBI       * 1.0.0   2018-07-09 Github (r-dbi/dbi@40a7de4) 
#>  devtools    1.13.6  2018-06-27 CRAN (R 3.4.4)             
#>  digest      0.6.15  2018-01-28 cran (@0.6.15)             
#>  evaluate    0.10.1  2017-06-24 CRAN (R 3.4.2)             
#>  glue        1.2.0   2017-10-29 CRAN (R 3.4.2)             
#>  graphics  * 3.4.4   2018-03-16 local                      
#>  grDevices * 3.4.4   2018-03-16 local                      
#>  hms         0.4.2   2018-03-10 cran (@0.4.2)              
#>  htmltools   0.3.6   2017-04-28 CRAN (R 3.4.1)             
#>  knitr       1.20    2018-02-20 CRAN (R 3.4.4)             
#>  magrittr  * 1.5     2014-11-22 CRAN (R 3.4.0)             
#>  memoise     1.1.0   2017-04-21 CRAN (R 3.4.0)             
#>  methods   * 3.4.4   2018-03-16 local                      
#>  odbc        1.1.6   2018-07-23 Github (r-dbi/odbc@8cf2858)
#>  pkgconfig   2.0.1   2017-03-21 CRAN (R 3.4.2)             
#>  Rcpp        0.12.18 2018-07-23 cran (@0.12.18)            
#>  rlang       0.2.1   2018-05-30 CRAN (R 3.4.4)             
#>  rmarkdown   1.10    2018-06-11 CRAN (R 3.4.4)             
#>  RPostgres   1.1.1   2018-05-06 CRAN (R 3.4.4)             
#>  rprojroot   1.3-2   2018-01-03 cran (@1.3-2)              
#>  stats     * 3.4.4   2018-03-16 local                      
#>  stringi     1.2.3   2018-06-12 CRAN (R 3.4.4)             
#>  stringr     1.3.1   2018-05-10 CRAN (R 3.4.4)             
#>  tools       3.4.4   2018-03-16 local                      
#>  utils     * 3.4.4   2018-03-16 local                      
#>  withr       2.1.2   2018-03-15 cran (@2.1.2)              
#>  yaml        2.1.19  2018-05-01 CRAN (R 3.4.4)

Database

PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Related to r-spatial/sf#721 (I simplified the example by using json rather than postgis extension.)

@ellisvalentiner commented Aug 2, 2018

@etiennebr I think the issue is that nanodbc doesn't support the JSON type (probably because it isn't part of the ANSI SQL standard) and falls back to VARCHAR(255).

One solution is to cast the JSON as text before collecting the result from the database.

I think it would be a nice feature to be able to register custom types - e.g. automatically cast JSON columns as text and pass to jsonlite::fromJSON.
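A minimal sketch of that cast-to-text workaround, assuming the `x` table from the reprex above. The `::text` cast happens server-side, so the driver sees an ordinary text column instead of a JSON one:

```r
library(DBI)
con <- dbConnect(odbc::odbc(), "PostgreSQL")

# Cast the json column to text on the server before collecting it
res <- dbGetQuery(con, "SELECT json::text AS json FROM x")
nchar(res$json)

# Once the full string arrives, it can be parsed back into an R object
parsed <- jsonlite::fromJSON(res$json)
```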

@krlmlr (Member) commented Aug 22, 2018

Is this a problem across different databases? I've seen problems when accessing a geometry column from SQL Server, where using .STAsText() worked fine.
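For illustration, a sketch of that SQL Server approach; the table `spatial_table` and column `geom` are hypothetical names, not from this issue:

```r
library(DBI)
con <- dbConnect(odbc::odbc(), "SQLServer")

# Convert the geometry to well-known text server-side, so the driver
# receives a plain character column instead of an opaque binary type
wkt <- dbGetQuery(con, "SELECT geom.STAsText() AS wkt FROM spatial_table")
```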

@krlmlr (Member) commented Aug 22, 2018

@jimhester: Does this ring a bell? Is there anything specific in nanodbc or in ODBC itself that would trigger this? Or is this a parameter we could adjust?

@jimhester (Contributor) commented Aug 23, 2018

I think there likely is a parameter that could be changed in the driver options, but I have not yet looked into how Postgres stores JSON data. Will certainly do so next time I am working on odbc, which should be in the next month or two.

@jimhester added the bug label Aug 23, 2018
@dpprdan commented Nov 16, 2018

I'd like to stress @etiennebr's last sentence: this issue is not limited to JSON data but affects binary data in general. Originally, it came up when reading a geometry column from PostgreSQL/PostGIS.

@jimhester (Contributor) commented Nov 19, 2019

A workaround is to convert json to jsonb, which is the recommended format in the PostgreSQL documentation.

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

library(magrittr)
library(DBI)
con <- dbConnect(odbc::odbc(), "PostgreSQL")

query <- "SELECT ('[' || repeat('1,', 150) || '2]')::json;"
dbGetQuery(con, query)$json
#> [1] "[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2]"
dbExecute(con, "DROP TABLE x")
#> [1] 154109836
dbExecute(con, glue::glue("CREATE TABLE x AS {query}"))
#> [1] 1
dbGetQuery(con, "SELECT to_jsonb(json) as json2 FROM x")$json2 %>% nchar()
#> [1] 453

Created on 2019-11-19 by the reprex package (v0.3.0)

Note the data is still a character vector when transmitted to R, but it does not have the 255 truncation limit. The last query is larger than the first due to added spaces between items in the output.

@jimhester (Contributor) commented Nov 19, 2019

But there still seem to be issues: notably, even to_json() works, yet naively reading the full table does not. I don't know what is different between the two approaches.

@ellisvalentiner commented Nov 19, 2019

Setting MaxVarcharSize to 0 in the connection fixes this for me. See my post.
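For reference, a sketch of that connection setup. `MaxVarcharSize` is a psqlODBC driver option passed through `dbConnect()`; the exact keyword may differ for other drivers:

```r
library(DBI)
con <- dbConnect(
  odbc::odbc(),
  "PostgreSQL",
  MaxVarcharSize = 0  # psqlODBC option: 0 lifts the default 255-character cap
)

# The json column should now come back at its full length
nchar(dbReadTable(con, "x")$json)
```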

@jimhester (Contributor) commented Nov 20, 2019

Since I don't think we will be adding Postgres-specific types anytime soon, and there is a decent workaround, I am going to close this issue.

@etiennebr (Author) commented Nov 23, 2019

I think @ellisvalentiner's solution works beautifully. I wonder how we could make it easier for users either to discover the source of the problem or to set a more robust default. I have no idea what's involved in making MaxVarcharSize = 0 the default (rather than 255, defined by nanodbc), but it could make odbc more robust to this kind of error.

Since there is no warning when a varchar is truncated, it is hard for the user to pinpoint the origin of the error, and it can require significant investigation to realize that characters are being truncated during the conversion of a binary object. Raising a warning when the maximum is reached could be a solution, but that is more complex than changing the default.
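In the absence of a driver-side warning, one stopgap on the R side might be a small hypothetical helper (not part of odbc) that flags character columns whose values all sit exactly at the limit, a telltale sign of silent truncation:

```r
# Hypothetical helper: warn when every non-NA value in a character column
# is exactly `limit` characters long, which usually indicates silent
# varchar truncation rather than genuine data.
warn_if_truncated <- function(df, limit = 255) {
  suspect <- vapply(df, function(col) {
    is.character(col) && any(!is.na(col)) &&
      all(nchar(col) == limit, na.rm = TRUE)
  }, logical(1))
  if (any(suspect)) {
    warning(
      "Columns possibly truncated at ", limit, " characters: ",
      paste(names(df)[suspect], collapse = ", ")
    )
  }
  invisible(df)
}

# Usage: warn_if_truncated(dbReadTable(con, "x"))
```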

@jimhester (Contributor) commented Nov 25, 2019

The default is set by the database, not by nanodbc or the odbc package. Also, I doubt that MaxVarcharSize = 0 would work for all databases, though it does work for PostgreSQL. I don't think there is much we can do about this.
