Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbSendQuery documentation unclear regarding when SQL request is executed vs prepared #268

Closed
tomcnolan opened this issue Nov 16, 2018 · 39 comments · Fixed by #282
Closed

dbSendQuery documentation unclear regarding when SQL request is executed vs prepared #268

tomcnolan opened this issue Nov 16, 2018 · 39 comments · Fixed by #282
Labels
Milestone

Comments

@tomcnolan
Copy link

@tomcnolan tomcnolan commented Nov 16, 2018

Hi folks, my team at Teradata is implementing a DBI Driver for the Teradata Database.

The documentation for dbSendQuery says:

The dbSendQuery() method only submits and synchronously executes the SQL query to the database engine.

The documentation for dbBind says:

For parametrized or prepared statements, the dbSendQuery() and dbSendStatement() functions can be called with statements that contain placeholders for values. The dbBind() function binds these placeholders to actual values ...

The documentation appears to be saying that the dbSendQuery method will execute a non-parameterized SQL request, and will prepare (but not execute) a parameterized SQL request.

The documentation does not indicate how a DBI Driver should distinguish between a parameterized versus non-parameterized SQL request, in order to decide how to transmit the SQL request to the backend database in the appropriate manner (prepare versus execute).

In other database APIs, such as JDBC, there are separate API calls for prepare versus execute, so expected driver behavior is obvious.

My expectation is that the dbSendQuery documentation can be augmented to explain the decision-making criteria for whether the specified SQL request is to be prepared versus executed.

In particular, the documentation should indicate whether this behavior is driver-specific. If there is a recommended way for the application to indicate to the driver whether to prepare versus execute, such as an extra named parameter for the dbSendQuery, then the documentation should mention that.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Nov 26, 2018

Thanks for the feedback.

RPostgres, RMariaDB, RSQLite and most likely odbc use the same code path both for parametrized and non-parametrized SQL. The SQL will be parsed for placeholders by the database library. Would that be an option for Teradata?

I remember problems with RMariaDB where some queries could not be executed as parametrized SQL, we had to work around and revert to a different code path after failure. Perhaps you could look for params = list() or params = NULL in the dbSendQuery() call?

Happy to adapt/enhance DBItest to this use case.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Nov 26, 2018

The SQL will be parsed for placeholders by the database library. Would that be an option for Teradata?

We would like to avoid that approach if at all possible. We try to avoid parsing SQL in our drivers.

some queries could not be executed as parametrized SQL, we had to work around and revert to a different code path after failure.

This is also true for the Teradata Database. Some kinds of SQL requests cannot be prepared. Unfortunately, this approach won't work for us, because the same error code is returned for syntax errors and errors due to a SQL request that can't be prepared.

Perhaps you could look for params = list() or params = NULL in the dbSendQuery() call?

Yes, this approach could work. This approach would have 3 use cases:

  1. dbSendQuery() with no params argument -- the driver tells the database to execute the non-parameterized SQL request.

  2. dbSendQuery() with params=NULL argument -- the driver tells the database to prepare (but not execute) the parameterized SQL request. The app must subsequently execute the prepared request with dbBind()

  3. dbSendQuery() with non-NULL params argument -- the driver tells the database to prepare and execute the parameterized SQL request with the specified parameter values.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Nov 26, 2018

Thanks. Unfortunately, the cases 1 and 2 are indistinguishable. How about:

  1. dbSendQuery(params = NULL) : Prepare but wait for dbBind()

  2. dbSendQuery(params = list()) : Execute right away (possibly using a different code path)

  3. dbSendQuery(params = list(...)) : Prepare and bind

This would be consistent with existing implementations and, I guess, achieve what you're looking for.?

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Nov 26, 2018

How about:

  1. dbSendQuery(params = NULL) : Prepare but wait for dbBind()
  2. dbSendQuery(params = list()) : Execute right away (possibly using a different code path)
  3. dbSendQuery(params = list(...)) : Prepare and bind

Yes, that should work.

The documentation does not currently talk about the expected behavior for non-parameterized SQL requests.

Can the documentation for dbSendQuery be augmented to discuss use case # 2 for immediately executing a non-parameterized SQL request?

  1. dbSendQuery(params = list()) : Execute right away (possibly using a different code path)

Thanks!

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Dec 30, 2018

Yes, we need to add test cases and implementations in the different backends. I'll consider this for the upcoming update in February.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Jan 8, 2019

This may be even more relevant for dbSendStatement().

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Mar 6, 2019

@krlmlr I have implemented dbSendQuery in the Teradata driver and have encountered a problem with our proposed solution.

Many other DBI methods such as dbExecute funnel down to calling dbSendQuery and do not explicitly specify a params argument and implicitly let params default to NULL. Those methods expect the SQL request to be executed, not just prepared without being executed.

Therefore, given how other DBI methods call dbSendQuery, the default behavior for dbSendQuery when params is not specified (meaning params is NULL) must be to execute the SQL request.

We should state generally that params must be NULL for a non-parameterized SQL request, and params must be non-NULL for a parameterized SQL request. That distinction actually makes sense, and is intuitive.

dbSendQuery behavior must be as follows:

  1. dbSendQuery(params = NULL) : Execute the non-parameterized SQL request. This is the default when params is not specified.
  2. dbSendQuery(params = list()) : Prepare but do not execute the parameterized SQL request. Assumes the parameterized SQL request will be executed with a subsequent call to dbBind().
  3. dbSendQuery(params = list(...)) : Prepare and execute the parameterized SQL request with the specified bind parameter values.

Thanks!

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Mar 17, 2019

A few loose thoughts:

dbExecute() should forward to dbSendStatement() -- if this is the same operation in Teradata, dbSendStatement() can just forward to dbSendQuery(). Do you really need to override dbExecute() ?

I skimmed the DBI specification, time of execution/query doesn't currently seem to be specified indeed. It can't be "too early", though: it's currently not mandatory for the caller to indicate whether the query contains placeholders. This is done with a call to dbBind() -- if that call is missing, dbFetch() or dbGetRowsAffected() work if the query doesn't contain placeholders, and throw an error otherwise.

NULL has the same semantics as "argument absent". You mentioned dbExecute() where the expectation is that the statement is executed immediately -- I'd second that, but dbExecute(params = NULL) could forward to dbSendStatement(params = list()) to achieve the desired behavior.

Agree to update the documentation once this is adopted.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Mar 18, 2019

@krlmlr

dbExecute() should forward to dbSendStatement() -- if this is the same operation in Teradata,

Yes dbExecute should forward to dbSendStatement
Yes they are the same operation for Teradata.

dbSendStatement() can just forward to dbSendQuery().

Yes.

Do you really need to override dbExecute() ?

No, we do not need to override dbExecute. We do not want to override dbExecute.
We want to be able to inherit the standard dbExecute implementation in DBI.

dbExecute(params = NULL) could forward to dbSendStatement(params = list()) to achieve the desired behavior.

It could, but it does not.
If that behavior was added in a future release of DBI, then it would produce different behavior than existing old releases of DBI.

Our goal for the Teradata driver is to be compatible with DBI >= 0.8

Therefore, we need the meaning of the params argument to be consistent across all releases of DBI >= 0.8

dbExecute, dbSendStatement, and dbSendQuery behavior all must treat the params argument in the same manner:

  1. params = NULL : Execute the non-parameterized SQL request. This is the default when params is not specified.

  2. params = list() : Prepare but do not execute the parameterized SQL request. Assumes the parameterized SQL request is prepared with dbSendQuery and will be executed with a subsequent call to dbBind. When params = list() is specified for dbExecute or dbSendStatement the parameterized SQL request is prepared but cannot subsequently be used with dbBind, effectively acting only as a SQL syntax validation check.

  3. params = list(...) : Prepare and execute the parameterized SQL request with the specified bind parameter values.

We have already implemented this behavior in the Teradata DBI driver, and it works well with DBI 0.8.

  • We inherit the standard dbExecute from DBI, which forwards the params argument to dbSendStatement.
  • We inherit the standard dbSendStatement from DBI, which forwards the params argument to dbSendQuery.
  • We implement dbSendQuery and we interpret the params argument as listed above.
@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Mar 19, 2019

Thanks. I hear you, but the suggested treatment of params = NULL will not work because NULL has the same semantics as "argument absent". If no params are given, dbSendQuery() and dbSendStatement() still accepts a query/statement with placeholders, and will (in the general case) create a result set that waits for a dbBind() call.

Does your implementation pass the tests in DBItest?

It would be great to achieve consensus here. Can we find a way that doesn't break current implementations of RMariaDB, RPostgres and RSQLite and achieves what you are looking for?

Compatibility with DBI 0.8 will require overriding a few methods, but can be done. Alternatively, you could require a recent version of DBI.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Mar 19, 2019

@krlmlr

If you are willing to change the default implementation for DBI methods, then I recommend that the distinction between preparing and executing a parameterized SQL request be made explicit by introducing two new methods: dbExecuteStatement and dbExecuteQuery.

In addition, the DBI documentation must be changed to clearly articulate the behavior of each of the DBI methods with respect to parameterized versus non-parameterized SQL.

Here is a proposal that I believe is compatible with existing DBI drivers...

(Existing method, modified documentation, modified default implementation)
dbExecute(conn, statement, ...) is intended for executing either non-parameterized or parameterized SQL requests.
dbExecute comes with a default implementation that calls dbExecuteStatement then dbGetRowsAffected followed by dbClearResult.
dbExecute has an optional params argument interpreted as follows:

  • params = NULL : Execute the non-parameterized SQL request. This is the default when params is not specified.
  • params = list(...) : Execute the parameterized SQL request with the specified bind parameter values.

(New method)
dbExecuteStatement(conn, statement, ...) is intended for executing either non-parameterized or parameterized SQL requests.
dbExecuteStatement comes with a default implementation that calls dbSendStatement, suitable for backends that do not distinguish between prepare and execute operations.
Backends with separate prepare versus execute operations should override dbExecuteStatement.
dbExecuteStatement has an optional params argument interpreted as follows:

  • params = NULL : Execute the non-parameterized SQL request. This is the default when params is not specified.
  • params = list(...) : Execute the parameterized SQL request with the specified bind parameter values.

(Existing method, modified documentation, unchanged default implementation)
dbSendStatement(conn, statement, ...) is intended for preparing or executing parameterized SQL requests.
dbSendStatement comes with a default implementation that calls dbSendQuery for backends that do not distinguish between DML statements and queries.

  • params = NULL : Prepare but do not execute the parameterized SQL request. This is the default when params is not specified. The request can be executed with a subsequent call to dbBind.
  • params = list(...) : Execute the parameterized SQL request with the specified bind parameter values.

(Existing method, modified documentation, modified default implementation)
dbGetQuery(conn, statement, ...) is intended for executing either non-parameterized or parameterized SQL requests.
dbGetQuery comes with a default implementation that calls dbExecuteQuery then dbFetch followed by dbClearResult.
dbGetQuery has an optional params argument interpreted as follows:

  • params = NULL : Execute the non-parameterized SQL request. This is the default when params is not specified.
  • params = list(...) : Execute the parameterized SQL request with the specified bind parameter values.

(New method)
dbExecuteQuery(conn, statement, ...) is intended for executing either non-parameterized or parameterized SQL requests.
dbExecuteQuery comes with a default implementation that calls dbSendQuery, suitable for backends that do not distinguish between prepare and execute operations.
Back ends with separate prepare versus execute operations should override dbExecuteQuery.
dbExecuteQuery has an optional params argument interpreted as follows:

  • params = NULL : Execute the non-parameterized SQL request. This is the default when params is not specified.
  • params = list(...) : Execute the parameterized SQL request with the specified bind parameter values.

(Existing method, modified documentation)
dbSendQuery(conn, statement, ...) is intended for preparing or executing parameterized SQL requests.

  • params = NULL : Prepare but do not execute the parameterized SQL request. This is the default when params is not specified. The request can be executed with a subsequent call to dbBind.
  • params = list(...) : Execute the parameterized SQL request with the specified bind parameter values.
@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Mar 19, 2019

Thanks. I was contemplating new interface methods, it's a bit of work but seems worth it.

I'm not sure I follow your proposal entirely, I'll draft mine and we see if we can meet halfway.

I'm proposing two new methods, dbSendQueryNow() and dbSendStatementNow().

  • Can accept parameters, but guarantee immediate execution. Failure if not enough parameters supplied.
  • Result set object may accept new/updated parameters through dbBind(), but isn't required to
  • Default implementation calls dbSendXXX() + dbBind() (or just dbSendXXX(params = ...) )
  • Default dbGetQuery() and dbExecute() are modified to call dbSendXXXNow()

Is the dbExecuteXXX() from your proposal the same as dbSendXXXNow() from mine? I'm confused by the naming scheme and the similarity to the existing dbExecute().

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Mar 19, 2019

@krlmlr Yes, your proposed methods sound similar to mine.
Your proposed dbSendQueryNow sounds like my proposed dbExecuteQuery.
Your proposed dbSendStatementNow sounds like my proposed dbExecuteStatement.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 15, 2019

@tomcnolan: I feel that new generics are a bit too much for this problem. Did we discuss an optional argument to dbSendQuery()? This design seems to work for {odbc}: r-dbi/odbc#272.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 16, 2019

@krlmlr
I read r-dbi/odbc issues 163 and 127 which indicate that the "prepare versus execute" problem occurs for other drivers also.

Clearly people expect dbExecute and dbSendStatement to actually execute the SQL request, not just prepare it.

I feel that new generics are a bit too much for this problem.

I disagree.

DBI lacks a distinction between preparing and executing a SQL request. Other database access APIs such as ODBC and JDBC offer separate API functions for those two operations.

Did we discuss an optional argument to dbSendQuery()?

No, we have not discussed that approach yet.

In my opinion, adding a non-default immediate = TRUE argument is problematic and will likely lead to behavior differences between DBI drivers.

We have implemented the following behavior in the DBI driver for Teradata:

dbExecute, dbSendStatement, dbGetQuery, and dbSendQuery all treat the params argument in the same manner:

  • params = NULL : Execute the non-parameterized SQL request. This is the default when params is not specified.
  • params = list() : Prepare but do not execute the parameterized SQL request. Assumes the parameterized SQL request is prepared with dbSendStatement or dbSendQuery and will be executed with a subsequent call to dbBind. When params = list() is specified for dbExecute or dbGetQuery the parameterized SQL request is prepared and no DBIResult is returned so dbBind cannot subsequently be used, effectively acting only as a SQL syntax validation check.
  • params = list(...) : Prepare and execute the parameterized SQL request with the specified bind parameter values.
@hadley

This comment has been minimized.

Copy link
Member

@hadley hadley commented Apr 19, 2019

@tomcnolan it's not feasible to change the meaning of NULL because it would break backward compatibility. Another option would be to use a sentinel object:

  • params = NULL: current behaviour: prepare but don't execute.
  • params = none(): execute immediately.
  • params = list(...): current behaviour: prepare, bind, then execute.

It's not clear that is better than adding new immediate argument, although it does make it impossible to both supply parameters and execute immediately.

I think adding new generics here is too heavy.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 19, 2019

@hadley @krlmlr People expect dbExecute and dbGetQuery to actually execute a non-parameterized SQL request.
People do not expect dbExecute and dbGetQuery to just prepare but not execute a non-parameterized SQL request.
dbExecute and dbGetQuery don't return a DBIResult object, so there is no way to do a subsequent dbBind.
It seems like a really bad workaround to require people to specify immediate = TRUE for calls to dbExecute and dbGetQuery with a non-parameterized SQL request.
If you are committed to the immediate parameter idea, then the default should be immediate = TRUE for dbExecute and dbGetQuery.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 19, 2019

@tomcnolan: dbGetQuery() and dbExecute() do run the query immediately (parametrized or not), it's just that the (internal) parametrized API is always used currently. I think we can adapt the DBI spec to explicitly state that dbGetQuery() and dbExecute() are allowed use the "immediate" internal API. This would be reflected by default values: immediate = NA for dbGetQuery() and dbExecute(), which means the implementation is free to choose, immediate = FALSE for dbSendQuery() and dbSendStatement(). How does that sound?

Summary

Different interfaces for prepared queries and immediate execution found in (at least) Teradata, ODBC, MSSQL and MySQL/MariaDB. The current default is to always use prepared queries. We need to offer a clear and unambiguous way to access the "immediate" API.

I agree with @hadley that we shouldn't make backward-incompatible changes at this point. I also agree with @tomcnolan that we should make it easy for backend implementers to support the new interface so that no behavioral differences occur. Tests will be one aspect of this, but these might be difficult to implement in a backend-agnostic way. I also think we need to differentiate between dbSend...() and dbGetQuery()/dbExecute(), but we also want to have a consistent interface for all functions, possibly with different defaults.

The "immediate" API is optional, but important for some queries/backends. I rather like the idea of an optional immediate argument now, this doesn't overload the params argument.

Most users will call dbGetQuery() or dbExecute() to access the database. Here, the query parameters are known up front, and the implementation can make its own choice of the interface to use (immediate = NA). The DBI spec should not constrain the backend to use a particular mode of operation here, also we cannot test it in a backend-agnostic way. In addition, the immediate argument is a hint to specify what API should be used internally, overridable by the user.

dbSendQuery() and dbSendStatement() are methods that are rarely called by end users. As stated, we can't change the default behavior, but we can support a hint to call the non-parametrized API.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 19, 2019

@krlmlr Yes, I agree with this proposal. Thanks!
Will this require a new version of DBI?
How soon can the official DBI documentation be updated to reflect this?

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

@krlmlr I have a follow-up question regarding this topic:

This would be reflected by default values: immediate = NA for dbGetQuery() and dbExecute(), which means the implementation is free to choose

What is the expected behavior from the driver when the app specifies immediate = FALSE for dbGetQuery or dbExecute? Is the driver supposed to stop with an error?

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 23, 2019

I'll add it to the spec very soon, there's a related PR in the {odbc} package.

The current implementation for {odbc}, {RSQLite}, {RMariaDB} and {RPostgres} always uses the corresponding "prepared query" API, even for queries/statements without parameters. I'd argue that immediate = FALSE shouldn't be an error but rather use the "prepared query" API as requested. @jimhester suggests that immediate = FALSE should remain the (internal) default for the {odbc} package even after the change (r-dbi/odbc#272 (comment)).

Can Teradata use the "prepared query" API for queries without parameters?

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

For the Teradata DBI Driver, we have added the immediate parameter for the dbExecute, dbGetQuery, dbSendStatement, and dbSendQuery methods as discussed above.

The Teradata DBI Driver will behave as follows for the dbExecute, dbGetQuery, dbSendStatement, and dbSendQuery methods:

  • When bound parameter values are specified with the params argument, the immediate argument is ignored, and the SQL request is executed immediately.
  • When no bound parameter values are specified, and immediate = NA or immediate = TRUE is specified, then the SQL request is executed immediately. immediate = NA is the default for the dbExecute and dbGetQuery methods.
  • When no bound parameter values are specified, and immediate = FALSE is specified, then the SQL request is prepared but not executed. immediate = FALSE is the default for the dbSendStatement and dbSendQuery methods.

The only remaining question is the expected behavior of the dbExecute and dbGetQuery methods when no bound parameter values are specified and immediate = FALSE is specified. Earlier you said:

dbGetQuery() and dbExecute() do run the query immediately (parametrized or not)

So it seems reasonable that dbExecute and dbGetQuery should stop with an error if immediate = FALSE is specified. Please provide some guidance on this issue. Thanks!

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

@krlmlr

Can Teradata use the "prepared query" API for queries without parameters?

The Teradata Database allows almost all non-parameterized SQL requests to be prepared.

Some (obviously non-parameterized) DDL commands cannot be prepared, so the Teradata DBI Driver cannot blindly attempt to prepare all SQL requests.

The Teradata DBI Driver must assume by default that an app intends to execute a SQL request, and only prepare a SQL request if it's unambiguously certain that the app intended to prepare a SQL request.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 23, 2019

I think the following implementation sketch of dbGetQuery() should do the job (ignoring S4 specifics):

dbGetQuery <- function(..., params = NULL, immediate = NA) {
  immediate <- decide_immediate(immediate, params, ...)
  res <- dbSendQuery(..., params = params, immediate = immediate)
  on.exit(dbClearResult(res))

  dbFetch(res, ...)
}

This doesn't differ much from the current implementation, only the immediate argument is new. dbSendQuery() is still supposed to return a result set object, only dbFetch() actually returns data.

Similarly for dbExecute() .

Ideally you shouldn't even need to override dbGetQuery(), but you may wish to do so if it's important to always run non-parametrized queries with the "direct" API. Depending on the backend this may be the riskier option (thinking about subtle issues like data type conversion), which is why I agree that the other backends should continue to use the "prepared" API by default.

Thank you for your patience and for your very valuable input!

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

@krlmlr
The Teradata DBI Driver implementation of the dbGetQuery method looks similar to your example code above; however, please note that if the app specifies immediate = FALSE when calling dbGetQuery then the request will only be prepared and not executed, which will cause the call to dbFetch to fail.

We still need guidance as to the expected behavior from dbGetQuery when no bound parameter values are specified and immediate = FALSE is specified. Thanks!

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

Here is what our implementation of dbGetQuery looks like:

setMethod ("dbGetQuery", signature ("TeradataConnection", "character"), function (conn, statement, n = -1, params = NULL, immediate = NA, ...) {

  # Optional arguments params and immediate appear before dots in order to accommodate partial argument matching.

  if (conn@m_bTraceLog) {
    cat (paste0 ("> enter dbGetQuery ", conn, " immediate=", immediate, " ", statement, "\n"))
    on.exit (cat (paste0 ("< leave dbGetQuery ", conn, " immediate=", immediate, " ", statement, "\n")))
  }

  res <- DBI::dbSendQuery (conn, statement, params, immediate)
  tryCatch ({

    return (DBI::dbFetch (res, n = n))

  }, finally = {

    DBI::dbClearResult (res)

  }) # end finally

}) # end method dbGetQuery
@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 23, 2019

What needs to be done to actually execute the request in the immediate = FALSE case? Why would the subsequent dbFetch() call fail?

Have you seen DBIlog, which is a generic logging wrapper for DBI connections? I'm planning to send it to CRAN soon.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

The DBI documentation says that dbFetch is supposed to stop with an error if dbFetch is called before dbBind. On this page https://dbi.r-dbi.org/reference/dbbind

Until dbBind() has been called, the returned result set object has the following behavior:

  • dbFetch() raises an error (for dbSendQuery())

In both your version of dbGetQuery and our version of dbGetQuery, when no params are specified and immediate = FALSE is specified, the dbGetQuery method calls dbSendQuery to just prepare (but not execute) the SQL request and then calls the dbFetch method without having called the dbBind method.

The dbFetch method is supposed to stop with an error in that situation.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 23, 2019

I see now that the text is unclear -- it's referring only to queries that contain placeholders. For queries that don't contain placeholders, the dbFetch() call is valid even without calling dbBind() :

  1. res <- dbSendQuery()
  2. dbFetch(res)
  3. dbClearResult(res)

We might need a basic documentation that describes these workflows, for both parametrized and non-parametrized queries.

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 23, 2019

OK, let's assume that the SQL request is parameterized. What is the expected behavior of the dbGetQuery method when called with a parameterized SQL request, but no bound parameter values, and immediate = FALSE?

In that situation, both your version of dbGetQuery and our version of dbGetQuery will fail when calling the dbFetch method.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 23, 2019

Yes, this is the expected behavior, regardless of the value of the immediate argument.

library(DBI)

conn <- dbConnect(RSQLite::SQLite())

dbGetQuery(conn, "SELECT ? AS a", params = list(1))
#>   a
#> 1 1

dbGetQuery(conn, "SELECT ? AS a")
#> Error: Query needs to be bound before fetching

Created on 2019-04-23 by the reprex package (v0.2.1.9000)

@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 24, 2019

@krlmlr @hadley
We have discovered that dbplyr expects dbSendQuery to immediately execute the SQL request and calls dbFetch after dbSendQuery -- please note function db_collect.DBIConnection:
https://github.com/tidyverse/dbplyr/blob/master/R/verb-compute.R#L118

We suspect that other existing libraries, like dbplyr, also expect dbSendQuery to immediately execute the SQL request and call dbFetch after dbSendQuery.

The Teradata DBI Driver must be able to interoperate with dbplyr and other existing libraries that expect immediate execution from dbSendQuery, so the Teradata implementation of dbSendStatement and dbSendQuery will have a default argument immediate = NA and the Teradata DBI Driver will assume immediate execution for the default immediate = NA.

The Teradata DBI Driver will behave as follows for the dbExecute, dbGetQuery, dbSendStatement, and dbSendQuery methods:

  • When bound parameter values are specified with the params argument, the immediate argument is ignored, and the SQL request is executed immediately.
  • Argument immediate = NA is the default for the dbExecute, dbGetQuery, dbSendStatement, and dbSendQuery methods.
  • When no bound parameter values are specified, and immediate = NA or immediate = TRUE is specified, then the SQL request is executed immediately.
  • When no bound parameter values are specified, and immediate = FALSE is specified, then the SQL request is prepared but not executed.
  • When no bound parameter values are specified, and immediate = FALSE is specified, then dbGetQuery will fail with an error due to fetch attempt before bind.
@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 24, 2019

dbFetch() can be run immediately after dbSendQuery() for non-parametrized queries. The "Examples" section in ?dbFetch gives a hint, but I agree it's not very explicit. The documentation needs improvement here.

The following reprex gives an overview over the expected behavior (irrespective of the immediate argument):

library(DBI)

con <- dbConnect(RSQLite::SQLite())

res <- dbSendQuery(con, "SELECT 1 AS a")
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

res <- dbSendQuery(con, "SELECT 1 AS a")
dbBind(res, params = list())
#> Error: Query does not require parameters.
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

res <- dbSendQuery(con, "SELECT ? AS a")
dbBind(res, params = list(1))
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

res <- dbSendQuery(con, "SELECT ? AS a")
dbFetch(res)
#> Error: Query needs to be bound before fetching
dbClearResult(res)

dbDisconnect(con)

Created on 2019-04-24 by the reprex package (v0.2.1.9000)

dbBind() is required if the query has parameters, and also throws an error if the query doesn't require parameters.

Currently, dbSendQuery() is required to raise an error if the query syntax is invalid: https://dbi.r-dbi.org/reference/dbsendquery#value. Would it help if that requirement is lifted?

Let me try to rephrase the problem, please correct me if I'm mistaken.

  1. The query and all parameters should be sent "in one packet" to Teradata.
  2. Currently, there's no way to tell from the dbSendQuery() call if the query will contain parameters. Analysis of the query text is not an option.
  3. Some non-parametrized queries even require a different "direct" API. However, in principle, the "parametrized" API can be used for non-parametrized queries.
@tomcnolan

This comment has been minimized.

Copy link
Author

@tomcnolan tomcnolan commented Apr 24, 2019

@krlmlr Yes, your points 1, 2, 3 are correct.

For the Teradata DBI Driver to provide this behavior from your example, dbSendQuery must attempt to execute the SQL request. The driver would not know before execution whether the request contains parameter markers or not.

res <- dbSendQuery(con, "SELECT 1 AS a")
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

Per your point 3, the driver cannot blindly attempt to prepare all SQL requests, because the Teradata Database supports a prepare operation for most, but not every, kind of SQL request. The error returned by the Teradata Database for an unsupported prepare operation is the same as the generic syntax error.

The Teradata DBI Driver needs explicit unambiguous direction from the app as to whether the app wants to prepare versus the app wants to execute. Note that this problem doesn't arise in other language database APIs such the JDBC API which offers non-parameterized SQL functionality via Connection.createStatement and Statement.execute separately from parameterized SQL functionality via Connection.prepareStatement and PreparedStatement.execute.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Apr 25, 2019

Thanks.

It feels that lifting the requirement that dbSendQuery() checks the query syntax would help, even without introducing an immediate argument (see further below).

When to send the query to the database?

The Teradata DBI Driver should send the query to the database on one of the following events:

  1. dbSendQuery() is called with the params argument set
    • if query parameters are inconsistent, the database throws an error
    • params = list() is a supported mode of operation
  2. dbFetch() is called on the result returned by dbSendQuery(): the query is sent to the database, indicating that no parameters are given
    • if parameters are missing, the database checks and throws an error
  3. dbBind() is called: the query and all parameters are sent to the database
    • if query parameters are inconsistent, the database throws an error

The following example illustrates these modes of operation:

library(DBI)

# Database communication occurs after <<<<< lines

con <- dbConnect(RSQLite::SQLite())

## <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
res <- dbSendQuery(con, "SELECT ? AS a", params = list(1))
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

res <- dbSendQuery(con, "SELECT 1 AS a")
## <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

res <- dbSendQuery(con, "SELECT ? AS a")
## <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
dbBind(res, params = list(1))
dbFetch(res)
#>   a
#> 1 1
dbClearResult(res)

Created on 2019-04-25 by the reprex package (v0.2.1.9000)

As per your requirement, no client-side checking occurs, and the query and all parameters are sent in at the same time. No need to introduce new arguments to dbSendQuery() for now. The only behavior that is not here supported would be the "eager error checking" currently in place, I'm happy to do away with it:

library(DBI)

con <- dbConnect(RSQLite::SQLite())

dbSendQuery(con, "SELECTT ? AS a")
#> Error: near "SELECTT": syntax error

Created on 2019-04-25 by the reprex package (v0.2.1.9000)

It feels like dbSendQuery() is a misnomer here, in the sense that it does not always sends the query to the database. We're bound to use that name and the behavior for compatibility reasons, but we should update documentation to reflect this detail.

Most users will/should use dbGetQuery() and dbExecute() anyway, most of these subtleties don't occur there.

"Prepared" vs. "direct" API

This would come in addition to the behavior described above. It would only hint at the choice of the API. In particular, dbSendQuery(immediate = TRUE) could also send the query to the database right away, because no parameters need to be waited for.

It would help if you please could point me to the upstream API the Teradata DBI Driver is using, and perhaps to the DBI Driver's source code.

@krlmlr krlmlr added this to the 1.1.0 milestone Aug 23, 2019
@hannesmuehleisen

This comment has been minimized.

Copy link
Contributor

@hannesmuehleisen hannesmuehleisen commented Sep 5, 2019

From my perspective, the duality of dbSendQuery to either execute or prepare a query was always strange. I think the cleanest solution would be to add a dbPrepare method, which could be called by a default dbSendQuery implementation if parameters are given.

@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Sep 7, 2019

Unfortunately, that ship has sailed now -- I'd rather not add new methods to DBI at this time.

@krlmlr krlmlr closed this in #282 Sep 23, 2019
krlmlr added a commit that referenced this issue Sep 23, 2019
- Specify `immediate` argument to `dbSendQuery()`, `dbGetQuery()`, `dbSendStatement()` and `dbExecute()` (#268).
krlmlr added a commit to r-dbi/DBItest that referenced this issue Sep 23, 2019
- Specify `immediate` argument (r-dbi/DBI#268).
@krlmlr

This comment has been minimized.

Copy link
Member

@krlmlr krlmlr commented Dec 16, 2019

DBI 1.1.0 is on CRAN now. Thanks again for your input!

@hannesmuehleisen

This comment has been minimized.

Copy link
Contributor

@hannesmuehleisen hannesmuehleisen commented Dec 17, 2019

What we plan to do in DuckDB is always internally prepare statements, even if no parameters are given.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.