Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deal with pq__varchar columns #444

Open
dmkaplan2000 opened this issue Oct 10, 2023 · 11 comments
Open

How to deal with pq__varchar columns #444

dmkaplan2000 opened this issue Oct 10, 2023 · 11 comments

Comments

@dmkaplan2000
Copy link

I have some queries that return columns of type pq__varchar. I was wondering if there was some simple way to convert these data into a more R friendly format inside R?

Thanks,
David

@krlmlr
Copy link
Member

krlmlr commented Oct 10, 2023

Thanks. Could you please provide a reproducible example? Please see https://reprex.tidyverse.org/ for guidance.

@dmkaplan2000
Copy link
Author

Given any RPostgres connection to a postgresql database, con, try the following:

x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
class(x$v)

@dmkaplan2000
Copy link
Author

Ideally I would be able to treat the elements of x$v like vectors/arrays in R, extracting elements as needed, etc., but it isn't clear how to do this with a pq__varchar. I was wondering if there was some sort of straight forward function or method for transforming these into something more familiar to R.

@dmkaplan2000
Copy link
Author

dmkaplan2000 commented Oct 26, 2023

The less than ideal solution that I eventually used for this situation was to parse the string representation of the array in R. But this only works easily because I have 1-D arrays with strings that do not have quote characters or commas in them. A better solution working directly from the pq__varchar variable would be preferable. Here is an example of a solution:

x = dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")

x$v_list = lapply(x$v,\(.) eval(parse(text=sub("[{]","c('",sub("[}]","')",gsub(",","','",.))))))

@krlmlr
Copy link
Member

krlmlr commented Apr 1, 2024

Reprex:

con <- DBI::dbConnect(RPostgres::Postgres())

x <- DBI::dbGetQuery(con,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
x
#>               v
#> 1         {a,b}
#> 2 {abc,def,fgh}
class(x$v)
#> [1] "pq__varchar"

Created on 2024-04-01 with reprex v2.1.0

@paleolimbot: Is there support for nested data in adbcpostgresql?

@paleolimbot
Copy link

It seems to!

library(adbcdrivermanager)

con <- adbcpostgresql::adbcpostgresql() |> 
  adbc_database_init(uri = "postgresql://localhost:5432/postgres?user=postgres&password=password") |> 
  adbc_connection_init()

con |> 
  read_adbc(
    "WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a"
  ) |> 
  as.data.frame() |> 
  dplyr::pull()
#> <list_of<character>[2]>
#> [[1]]
#> [1] "a" "b"
#> 
#> [[2]]
#> [1] "abc" "def" "fgh"

Created on 2024-04-01 with reprex v2.1.0

I say this lightly, but I actually spent quite a bit of time getting this to work. It also works for postgres' "record" type, where it should give you back a nested data frame. There is a slight limitation on reading the array type...I believe multidimensional arrays will just be returned as a flat list_of and will (currently) loose the dimensions.

@krlmlr
Copy link
Member

krlmlr commented Apr 1, 2024

Thanks, this is wonderful! I suspect it will also work via adbi, https://adbi.r-dbi.org/. @dmkaplan2000: can you confirm?

@krlmlr
Copy link
Member

krlmlr commented Apr 1, 2024

This brings up memories of my work bringing nested data to the duckdb R client. I do appreciate the effort!

@dmkaplan2000
Copy link
Author

I tried getting this to work with adbi, but I must admit I can't figure out how to successfully connect to a database. First I tried opening a connection with:

con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432/DBNAME?user=USER&password=PASSWORD")

That worked without error, but when I tried to use the connection to execute a query I got:

> dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a")
Error in adbcdrivermanager::adbc_statement_get_parameter_schema(stmt) : 
  NOT_IMPLEMENTED

Based on what I could find online, I tried reducing the URI to uri="postgresql://localhost:5432". That also worked for starting the connection, but not for executing a query.

Do you know what I am doing wrong? The help on adbi and adbcpostgresql is pretty spartan...

@krlmlr
Copy link
Member

krlmlr commented Apr 15, 2024

It's a moonshot, but can you try dbGetQuery(immediate = TRUE) ? This is DBI's current way of telling "we don't want a prepared query, just execute". I wonder if adbi takes action based on that flag, and if not, whether it should.

CC @nbenn.

@dmkaplan2000
Copy link
Author

immediate=TRUE fixed the issue!

> con2 = dbConnect(adbi::adbi("adbcpostgresql"),uri="postgresql://localhost:5432")
> x=dbGetQuery(con2,"WITH a(v) AS (VALUES(ARRAY['a','b']::varchar[]),(ARRAY['abc','def','fgh']::varchar[])) SELECT * FROM a",immediate=TRUE)
> x$v
<list_of<character>[2]>
[[1]]
[1] "a" "b"

[[2]]
[1] "abc" "def" "fgh"

So this query works with adbi, but I am not sure this really solves my issue in the sense that it fixes one thing by breaking another. My real queries involve, in addition to character arrays, spatial data provided by the postgis extension to postgresql. RPostgres plays nicely with the sf package so the spatial data can be processed, but I do not think that adbi can be used with sf. At least my initial examination does not indicate that this is the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants