Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema support #220

Merged
merged 15 commits into from
Feb 2, 2018
Merged

Schema support #220

merged 15 commits into from
Feb 2, 2018

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Jan 29, 2018

  • New dbListObject() generic, returns a data frame with columns table (list of Id objects), id (a SQL vector), and is_prefix (a logical, indicates if this is a schema prefix or a table/view)
  • New dbUnquoteIdentifier() generic, seems necessary e.g. for dbExistsTable()
  • Export Id()
    • Requires all arguments to be named
    • Does not depend on the connection
    • Only when calling dbQuoteIdentifier() the connection is used

Fixes #24. CC @zozlak @stephlocke

@codecov-io
Copy link

codecov-io commented Jan 29, 2018

Codecov Report

Merging #220 into master will decrease coverage by 2.26%.
The diff coverage is 9.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #220      +/-   ##
==========================================
- Coverage   40.71%   38.44%   -2.27%     
==========================================
  Files          16       16              
  Lines         560      593      +33     
==========================================
  Hits          228      228              
- Misses        332      365      +33
Impacted Files Coverage Δ
R/interpolate.R 99.25% <ø> (ø) ⬆️
R/deprecated.R 0% <ø> (ø) ⬆️
R/DBDriver.R 3.22% <ø> (ø) ⬆️
R/data.R 26.19% <ø> (ø) ⬆️
R/DBResult.R 0% <ø> (ø) ⬆️
R/transactions.R 0% <ø> (ø) ⬆️
R/DBConnection.R 0% <0%> (ø) ⬆️
R/table.R 0% <0%> (ø) ⬆️
R/quote.R 32.18% <4%> (-12.27%) ⬇️
R/data-types.R 41.17% <50%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13d4c55...1dd0ebc. Read the comment docs.

R/DBConnection.R Outdated
@@ -362,6 +362,36 @@ setGeneric("dbListTables",
valueClass = "character"
)

#' List remote objects
#'
#' Returns the names of remote tables accessible through this connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tables -> objects ?

R/DBConnection.R Outdated
#'
#' Returns the names of remote tables accessible through this connection
#' (possibly via a prefix) as a data frame.
#' This should, where possible, include temporary objects.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "This should include temporary objects, but not all database drivers support listing temporary objects"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps include a sentence describing how this is different to dbListTables()?

R/DBConnection.R Outdated
#' @inheritSection DBItest::spec_sql_list_objects Additional arguments
#'
#' @inheritParams dbGetQuery
#' @param prefix An optional prefix, passed to [dbUnquoteIdentifier()].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worded too generically for me to understand

R/quote.R Outdated
#' use in a query as a column name, to make sure that you
#' generate valid SQL and avoid SQL injection.
#' use in a query as a column or table name, to make sure that you
#' generate valid SQL and avoid SQL injection. The inverse operation is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

injection -> injection attacks.

R/quote.R Outdated
@@ -100,6 +106,9 @@ setGeneric("dbQuoteIdentifier",

quote_identifier <-
function(conn, x, ...) {
if (is.list(x)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think auto-vectorisation over lists is best avoided because it violates type-stability (and we generally avoid it elsewhere)

R/quote.R Outdated

#' Unquote identifiers
#'
#' Call this method to convert a [SQL] object created by [dbQuoteIdentifier()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading this documentation I don't know what this is going to return. It sounds like it's going to return table objects which doesn't seem right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return values are part of the specs in DBItest which I haven't adapted yet.

The method is returning a Table object. Why does this seem wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Root of my confusion here is that Table is a reference/identifier, not a table.

@@ -3,21 +3,30 @@ setClass("Table", slots = list(name = "character"))

#' Refer to a table nested in a hierarchy (e.g. within a schema)
#'
#' Objects of class `Table` have a single slot `name`, which is a named
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this a table seems slightly confusing since it's the name of a table, not a table itself. Maybe TableRef?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And doesn't every other object start with db?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table() doesn't need the connection, it's just a dumb object like SQL(). We could use Identifier() too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest Id()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Id() looks very prone to collisions with other packages...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This search doesn't find packages that export crossing: https://github.com/search?utf8=%E2%9C%93&q=user%3Acran+%22export+crossing%22+file%3ANAMESPACE&type=Code . I remember a recent discussion on Twitter about searching for packages that export a function?

SqlId() looks inconsistent to SQL(), and SQLId() looks ... interesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it needs to be filename: not file:. https://github.com/search?utf8=%E2%9C%93&q=user%3Acran+filename%3ANAMESPACE+%22export+Id%22&type= is the correct search, and again no conflicts for Id, the only functions are lowercase id.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but it seems like GitHub limits the length of time a query can run. Anyway I ran the search on my Local CRAN mirror, there are no CRAN packages that export Id.

You can call it Identifier if you like it better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed (from a scraping of my local CRAN mirror) that no package exports an Id class either. Renaming.

R/quote.R Outdated
if (is(x, "SQL")) {
. <- strsplit(as.character(x), '^"|"$|"[.]"')
. <- lapply(., `[`, -1L)
split <- .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you likely converted this from a magrittr pipeline, but I think this would be better as

split <- lapply(strsplit(as.character(x), '^"|"$|"[.]"'), `[`, -1L)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually I think it is more explicit to use tail than [ here, but that is largely a matter of taste.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like nested expressions, and the dot looks like the lesser of all evils here to me, because I don't need to invent names for intermediates I don't care about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I really don't like the use of . here. It feels so different to the style of the rest of the code.

@@ -3,21 +3,30 @@ setClass("Table", slots = list(name = "character"))

#' Refer to a table nested in a hierarchy (e.g. within a schema)
#'
#' Objects of class `Table` have a single slot `name`, which is a named
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest Id()

R/quote.R Outdated
#'
#' dbUnquoteIdentifier(
#' ANSI(),
#' SQL(c("Schema"."Table", "UnqualifiedTable"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something this does not parse. I think you are missing a set of quotes.

SQL(c("Schema"."Table", "UnqualifiedTable"))
#> Error: unexpected symbol in "SQL(c("Schema"."

R/table.R Outdated
Table <- function(...) {
new("Table", name = c(...))
components <- c(...)
if (is.null(names(components)) || any(names(components) == "")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we want to force naming, particularly for the most common case (schema & table).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which comes first -- schema or table? If we force naming, that confusion won't arise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it makes it much more cumbersome to use the function. If it is cumbersome to use no one is going to use it. Having Table("foo", "bar") map to 'foo'.'bar' seems like an easy solution, and if your backend uses different names you can instead use named arguments.

if (length(components) == 2) && (is.null(names(components)) || all(names(components) == ""))) {
  names(components) <- c("schema", "table")
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backends are free to implement their own wrappers with named arguments, and probably should too.

For instance, in BigQuery, "schema" means something entirely different. I'd rather keep things strict for the first implementation, we can always relax later but it's more difficult to impose restrictions on a function that has been exported.

@krlmlr krlmlr merged commit e59d1cd into master Feb 2, 2018
@krlmlr krlmlr deleted the f-#24-schema branch February 3, 2018 00:56
krlmlr added a commit that referenced this pull request Feb 10, 2018
…ListObjects()` and `dbUnquoteIdentifier()`, methods for `Id` that call `dbQuoteIdentifier()` and then forward (#220). - `dbListConnections()` is soft-deprecated by documentation. - Default implementations of `dbQuoteIdentifier()` and `dbQuoteLiteral()` preserve names, default implementation of `dbQuoteString()` strips names (#173). - Breaking change: If the `names` argument is unset, `SQL()` strips the names from the output.
krlmlr added a commit that referenced this pull request Mar 9, 2018
…ps the names from the output if the `names` argument is unset. - The `dbReadTable()`, `dbWriteTable()`, `dbExistsTable()`, `dbRemoveTable()`, and `dbListFields()` generics now specialize over the first two arguments to support implementations with the `Id` S4 class as type for the second argument. Some packages may need to update their documentation to satisfy R CMD check again. New generics ------------ - Schema support: Export `Id()`, new generics `dbListObjects()` and `dbUnquoteIdentifier()`, methods for `Id` that call `dbQuoteIdentifier()` and then forward (#220). - New `dbQuoteLiteral()` generic. The default implementation uses switchpatch to avoid dispatch ambiguities, and forwards to `dbQuoteString()` for character vectors. Backends may override methods that also dispatch on the second argument, but in this case also an override for the `"SQL"` class is necessary (#172). Default implementations ----------------------- - Default implementations of `dbQuoteIdentifier()` and `dbQuoteLiteral()` preserve names, default implementation of `dbQuoteString()` strips names (#173). - Specialized methods for `dbQuoteString()` and `dbQuoteIdentifier()` are available again, for compatibility with clients that use `getMethod()` to access them (#218). - Add default implementation of `dbListFields()`. - The default implementation of `dbReadTable()` now has `row.names = FALSE` as default and also supports `row.names = NULL` (#186). API changes ----------- - The `SQL()` function gains an optional `names` argument which can be used to assign names to SQL strings. Deprecated generics ------------------- - `dbListConnections()` is soft-deprecated by documentation. - `dbListResults()` is deprecated by documentation (#58). - `dbGetException()` is soft-deprecated by documentation (#51). - The deprecated `print.list.pairs()` has been removed. Bug fixes --------- - Fix `dbDataType()` for `AsIs` object (#198, @yutannihilation). - Fix `dbQuoteString()` and `dbQuoteIdentifier()` to ignore invalid UTF-8 strings (r-dbi/DBItest#156). Documentation ------------- - Help pages for generics now contain a dynamic list of methods implemented by DBI backends (#162). - `sqlInterpolate()` now supports both named and positional variables (#216, @hannesmuehleisen). - Point to db.rstudio.com (@wibeasley, #209). - Reflect new 'r-dbi' organization in `DESCRIPTION` (@wibeasley, #207). Internal -------- - Using switchpatch on the second argument for default implementations of `dbQuoteString()` and `dbQuoteIdentifier()`.
@github-actions github-actions bot locked and limited conversation to collaborators Oct 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Guidelines to handle schemes and views
5 participants