Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specification request on Statement::add #229

Closed
lukaseder opened this issue Jun 4, 2021 · 10 comments · Fixed by #231
Closed

Specification request on Statement::add #229

lukaseder opened this issue Jun 4, 2021 · 10 comments · Fixed by #231
Assignees
Labels
type: documentation A documentation update
Milestone

Comments

@lukaseder
Copy link
Contributor

To create a statement batch from a prepared statement with bind variables, we can see an example in the documentation:
https://r2dbc.io/spec/0.9.0.M1/spec/html/#statements.batching

// connection is a Connection object
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ($1, $2)");
statement.bind(0, "John Doe").bind(1, "Happy Books LLC").add();
statement.bind(0, "Jane Doe").bind(1, "Scary Books Inc");
Publisher<? extends Result> publisher = statement.execute();

I've noticed that the MariaDB driver requires an additional trailing .add() call for the last set of bind values to be added to the batch, more like:

// connection is a Connection object
Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ($1, $2)");
statement.bind(0, "John Doe").bind(1, "Happy Books LLC").add();
statement.bind(0, "Jane Doe").bind(1, "Scary Books Inc").add();
Publisher<? extends Result> publisher = statement.execute();

Unfortunately, the spec isn't very clear about who's right. Both versions make sense in a way:

  • The documentation version is consistent with adding only one set of binds and then calling Statement::execute directly, without any call to Statement::add.
  • The MariaDB version is consistent with JDBC, where Statement::addBatch is called after each set of bind values, and finally the batch is executed with Statement::executeBatch. It is also more consistent with io.r2dbc.spi.Batch, where you need two calls to Batch::add to execute 2 statements, not 1 call.

I think that last bit hints at the real problem here. Unlike JDBC, the SPI is re-using Statement::execute, and as such, leaving room for interpretation to the semantics of Statement::add

I think there are two things to be improved here:

  1. Clearly specify whether Statement::add is
    1. required for the last set of binds (then, most drivers will have a bug)
    2. optional (then, the MariaDB driver has a bug, and another ambiguity arises, see below)
    3. wrong (then, the MariaDB driver has a bug)
    4. implementation specific (undesirable, as the callers would be left with the burden of writing driver-agnostic code, but then, no one has a bug)
  2. Optionally, refactor the API to remove this ambiguity, to require something more similar to JDBC and to io.r2dbc.spi.Batch

There's an ambiguity with respect to the optionality of the Statement::add call as suggested in 1ii). Consider this example, without binds:

Statement statement = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
statement.add(); // This adds 1 row
statement.add(); // This adds 1 row
Publisher<? extends Result> publisher = statement.execute(); // What happens here?

Without binds (which is an edge case, yes, but the SPI should specify what happens in this case), there may or may not be an additional, 3rd row added. I tend to think that there's a third row, consistent with:

Statement s1 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s1.execute(); // 1 row is inserted

Statement s2 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s2.add().execute(); // 2 rows are inserted

Statement s3 = connection.createStatement("INSERT INTO books (author, publisher) VALUES ('a', 'b')");
Publisher<? extends Result> publisher = s3.add().add().execute(); // 3 rows are inserted

So, all of this hints at 1iii) or 2) being the most desirable options here, with a strong favour of 1iii), because an API re-design is costly, and probably not really necessary here.

@mp911de
Copy link
Member

mp911de commented Jun 4, 2021

Statement.add() should semantically translate to createStatement(…) providing a fluent API to specify bindings. It's also in place to avoid the need for overhead for inspecting whether a statement is parametrized. Calling add() should record the current statement state of bindings and prepare the statement object for accepting the next set of bindings whereas no bound parameters represent an empty binding set.

We should refine the specification wording to reflect that intent.

cc @rusher @rhedgpeth

@mp911de mp911de added the type: documentation A documentation update label Jun 4, 2021
@rusher
Copy link

rusher commented Jun 22, 2021

Is it the best solution @mp911de ?

How framework expect using driver when batching, probably looping, then calling add, like https://github.com/micronaut-projects/micronaut-r2dbc/blob/master/data-r2dbc/src/main/java/io/micronaut/data/r2dbc/operations/DefaultR2dbcRepositoryOperations.java#L774

postgresql and mysql driver implementation when having parameter doesn't care about last add() :
i.e. in the following example the last .add() can either be present or remove :

        connection
                .createStatement("INSERT INTO batchStatement values ($1, $2)")
                .bind(0, 2)
                .bind(1, "test")
                .add()
                .bind(0, 3)
                .bind(1, "test2")
                .add()
                .execute()

I'll change mariadb driver to do likewise if spec indicate that is the way.

@mp911de
Copy link
Member

mp911de commented Jun 22, 2021

add is defined as:

Save the current binding and create a new one.

which translates to

      connection
                .createStatement("INSERT INTO batchStatement values ($1, $2)")
                .bind(0, 2)
                .bind(1, "test")
                .add()
                .bind(0, 3)
                .bind(1, "test2")
                .execute()

Calling add before execute without bindings should actually fail on the server-side so that the server reports missing bindings.

@rusher
Copy link

rusher commented Jun 22, 2021

It makes sense, this was initial implementation, changed after mariadb-corporation/mariadb-connector-r2dbc#8 after differences with other drivers that permits .add() i did not check that was optionnal in those connectors at that time. :/

anyway, at least clarifying this is nice.

@mp911de mp911de self-assigned this Jun 23, 2021
@mp911de
Copy link
Member

mp911de commented Jun 23, 2021

I currently struggle to find how we can improve the specification. The method is documented with Save the current binding and create a new one. and the docs show a correct example. Would extend the Javadoc to the following help:

    /**
     * Save the current binding and create a new one to indicate the statement should be executed again with new bindings provided through subsequent calls to {@code bind} and {@code bindNull}.
     * …
     */
    Statement add();

@lukaseder
Copy link
Contributor Author

I think the difficulty both for users and implementors of the SPI is to understand that the last binding must not be "added", but executed directly, and that this works differently from JDBC.

Perhaps this is better explained by example than by prose?

Also, the TCK seems to be wrong:

https://github.com/r2dbc/r2dbc-spi/blob/47e2517/r2dbc-spi-test/src/main/java/io/r2dbc/spi/test/TestKit.java#L329-L343

And

https://github.com/r2dbc/r2dbc-spi/blob/47e2517/r2dbc-spi-test/src/main/java/io/r2dbc/spi/test/TestKit.java#L577-L593

@mp911de
Copy link
Member

mp911de commented Jun 23, 2021

Good catch, need to fix that. I'll come up with a pull request.

@lukaseder
Copy link
Contributor Author

Good catch, need to fix that. I'll come up with a pull request.

Perhaps, how about two types of test cases:

  • A test that exposes the correct behaviour and runs
  • A test that has a trailing add() call and expects failure

@mp911de
Copy link
Member

mp911de commented Jun 23, 2021

A test that has a trailing add() call and expects failure

Makes sense for 0.9, introducing that kind of change in a bugfix release asks for trouble.

@lukaseder
Copy link
Contributor Author

Makes sense for 0.9, introducing that kind of change in a bugfix release asks for trouble.

Sure, I'm thinking that all of this issue here is going into 0.9 only, for the benefit of new, future drivers that will get it right from the beginning.

@mp911de mp911de added this to the 0.8.6.RELEASE milestone Jun 23, 2021
@mp911de mp911de linked a pull request Jul 2, 2021 that will close this issue
mp911de added a commit that referenced this issue Sep 9, 2021
Javadoc now contains an example and outlines the semantics. Also, revised TestKit to reflect the correct usage of add() and introduced a new test that is expected to fail on wrong Statement.add usage.

[resolves #229][#231]
Signed-off-by: Mark Paluch <mpaluch@vmware.com>
@mp911de mp911de closed this as completed in 3aaf4cc Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: documentation A documentation update
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants