Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Think about dbListFields() vs. dbColumnInfo() #75

Closed
krlmlr opened this issue Jan 25, 2016 · 19 comments
Closed

Think about dbListFields() vs. dbColumnInfo() #75

krlmlr opened this issue Jan 25, 2016 · 19 comments

Comments

@krlmlr
Copy link
Member

krlmlr commented Jan 25, 2016

The latter provides more information, but requires a result set. Currently, dplyr seems to require it, too (r-dbi/RMySQL#130 (comment)).

I'd vote for including it again, perhaps with a default implementation that forwards to colnames(dbGetQuery(..., n = 0)) (#76).

@krlmlr
Copy link
Member Author

krlmlr commented Mar 11, 2016

  • Soft-deprecate dbListFields()
  • Add dbColumnInfo(Connection, character) to obtain column info for a table
  • Add default implementation for dbListFields() that forwards to dbColumnInfo(Connection)

@krlmlr krlmlr added this to the 0.5 milestone Jul 6, 2016
@russellpierce
Copy link

In working on this please keep an eye on what happens with the Postgres drivers. Neither the old RPostgreSQL or the new RPostgres seem to have methods for dbColumnInfo at the moment.

@krlmlr
Copy link
Member Author

krlmlr commented Feb 28, 2017

Postponed to a later release.

@krlmlr
Copy link
Member Author

krlmlr commented May 5, 2017

We can't add dbColumnInfo(Connection, character) because that would require altering the interface of the generic, and cause havoc downstream.

@russellpierce
Copy link

@krlmlr So, what is the plan then? I'm sorry I'm not following

@krlmlr
Copy link
Member Author

krlmlr commented May 5, 2017

Something else then, perhaps a new generic altogether -- dbGetColumnInfo()? Another option would be a Table class that encapsulates a DBIConnection and a name, so that this can be passed to dbColumnInfo() as one piece. A third option would be to pass the table name as part of ... and not declare it in the interface.

?

@russellpierce
Copy link

Passing as ... seems reasonable at first blush. What's the downside?

@krlmlr
Copy link
Member Author

krlmlr commented May 5, 2017

I don't particularly like the name of dbColumnInfo(), it's a noun. I'm still not sure if this alone warrants the definition of a new generic, given that dbColumnInfo() seems to be implemented by several backends already.

I'm also not sure if we should prefer two different generics, one for tables given by name, the other for result sets. What would these be called?

@russellpierce
Copy link

YMMV, but a verbed noun is sometimes challenging is it fetch, get, retrieve, load, etc? That dbColumnInfo works with existing backends does seem to recommend it. dbResultInfo? /shrug

@krlmlr
Copy link
Member Author

krlmlr commented May 6, 2017

I'd settle for "get" in this case; rows are "fetched", but rows affected, row number, info etc. are "got" in DBI. Yeah, back compatibility seems an issue. dbDataType() is another example that doesn't have a verb. Lets pass the table name in ... .

@krlmlr
Copy link
Member Author

krlmlr commented Mar 9, 2018

Current plan: Add default implementation for dbListFields() in DBI, using the new Id class. Keep current name for dbColumnInfo().

@krlmlr
Copy link
Member Author

krlmlr commented Apr 23, 2018

@hadley: The current implementation of dbColumnInfo() in our three backends returns a data frame with name and type columns. The latter contains "double" for numeric values and (for some backends) "string" for character values. This contradicts the description in ?DBI::dbColumnInfo:

A data.frame with one row per output field in res. Methods MUST include name, field.type (the SQL type), and data.type (the R data type) columns, and MAY contain other database specific information like scale and precision or whether the field can store NULLs.

What are the intended semantics of the "type" column? Do we specify that the SQL type should be returned too?

@hadley
Copy link
Member

hadley commented Apr 23, 2018

I think it would be better to consider dbColumnInfo() as informative, something that you'd look at interactively and not compute on. Otherwise we will be creating a huge amount of work for little gain.

@russellpierce
Copy link

I appreciate all of this is complex and y'all are balancing a number of considerations. I'm going to share my perspective as an outsider to see if it might be informative.

dbColumnInfo() operates on a result set whereas dbListFields() operates against a given table. As such dbColumnInfo() seems to provide a service somewhat distinct from dbListFields(). Shouldn't that be programable against / better specified than 'informative for interactive work'?

On a related note, right now I use dbColumnInfo() over dbListFields() for getting information about temp tables under RPostgreSQL, dbListFields() seems to not work as I might otherwise hope.

Error in [.data.frame(dbGetQuery(conn, paste("select a.attname from pg_attribute a, pg_class c, pg_tables t, pg_namespace nsp", :
undefined columns selected

Would a new generic that can be flexible across connection/table vs resultset calls and provide a consistent output, as krlmlr proposed in May last year, be a huge amount of work? [Which might be unrelated to the question of what dbColumnInfo() should return right now].

Thanks to both of you for your time and all of your efforts!

@hadley
Copy link
Member

hadley commented Apr 23, 2018

I think if we did want to provide some common way of returning the names and types of a table and query, we'd be better of starting from scratch rather than using the existing functions. IMO, the right data structure to return would be a zero-row data frame, as there's no standard way of representing R "types" as a string.

@krlmlr
Copy link
Member Author

krlmlr commented Apr 23, 2018

We already have dbFetch(res, n = 0).

Current dbColumnInfo() returns more information for RPostgres, thanks to @etiennebr. I wonder if these should be attributes for the columns returned by dbFetch() instead.

krlmlr added a commit to r-dbi/RPostgres that referenced this issue Apr 23, 2018
- The `dbListFields()` method now works correctly if the `name` argument is a quoted identifier or of class `Id`, and throws an error if the table is not found (r-dbi/DBI#75).
krlmlr added a commit that referenced this issue Apr 23, 2018
- Add default implementation for `dbListFields(DBIConnection, Id)`, this relies on `dbQuoteIdentifier(DBIConnection, Id)` (#75).
krlmlr added a commit that referenced this issue Apr 23, 2018
- The `dbListFields()` method is now fully specified (#75).
krlmlr added a commit that referenced this issue Apr 23, 2018
- The `dbColumnInfo()` method is now fully specified (#75).
krlmlr added a commit to r-dbi/DBItest that referenced this issue Apr 23, 2018
- Add specification for `dbListFields()` (r-dbi/DBI#75).
krlmlr added a commit to r-dbi/DBItest that referenced this issue Apr 23, 2018
- Add specification for `dbColumnInfo()` (r-dbi/DBI#75).
@hadley
Copy link
Member

hadley commented Apr 23, 2018

Yes, but dbFetch(n = 0) will still often cause the database to do a bunch of work, so will not be performant for many queries. We may want a wrapper for dplyr's subquery + WHERE 0 = 1 approach.

@krlmlr
Copy link
Member Author

krlmlr commented Apr 23, 2018

I'm not sure we can do much better than dbFetch(n = 0) in the general case, because dbSendQuery() already sends the query to the database. ?

Let's shelve this for now.

@krlmlr
Copy link
Member Author

krlmlr commented Apr 23, 2018

Specified dbListFields() and dbColumnInfo().

@krlmlr krlmlr closed this as completed Apr 23, 2018
krlmlr automation moved this from To Do to Done Apr 23, 2018
@ghost ghost removed the ready label Apr 23, 2018
@github-actions github-actions bot locked and limited conversation to collaborators Oct 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
krlmlr
  
Done
Development

No branches or pull requests

3 participants