Allow consuming arbitrary result segments (rows, out parameters, update count, messages) #27
That said, I also think we should include bits of this answer in the Javadoc/spec doc. |
So, we cannot probe the Result without consuming it? |
Exactly: if you consume rows, you can't get the update count, and vice versa. I'm not sure whether a result type would work, since reactive is all about deferring and streaming. We would need to consume the response to inspect it, but at the same time we do not want to buffer responses, so the response would again be consumed by peeking on the stream.

Can you shed a bit of light on the use case? In Spring Data R2DBC we basically have a similar situation: there are two cases where affected row count vs. rows plays a role.
|
Well, the use case is simple; see also this discussion: https://groups.google.com/forum/#!topic/r2dbc/QZpTpQtj1HA

Some databases (e.g. SQL Server) allow for arbitrary statement batches, e.g.:

DECLARE @t TABLE(i INT);
INSERT INTO @t VALUES (1),(2),(3);
RAISERROR('message 1', 16, 2, 3);
RAISERROR('message 2', 16, 2, 3);
SELECT * FROM @t
RAISERROR('message 3', 16, 2, 3);

It should produce a result like this: an update count of 3, two error messages, a result set containing the three rows, and a third error message, in that order.
Now, the behaviour of these batches is not necessarily constant or known in advance; for instance, parts of the batch could depend on conditional logic. There definitely needs to be some way to discover, without knowing in advance, what kind of result we're getting at any given position. I know this is an edge case, even for SQL Server users. But I do think it's worth thinking about this before releasing 1.0.0.GA. This is an SPI to be consumed by tools that are not necessarily in control of the SQL being executed, so those tools need to be able to discover the results entirely dynamically. |
I think we should be able to allow result consumption and then asking for affected row counts. The other way round would not work. So you could call the row-consuming method first, and ask for the update count afterwards. |
... in fact, if you do not want to peek on the stream, then the only solution I can see is to replace the current two methods on the Result interface by a single one:

public interface Result {

    <T> Publisher<T> map(
        Function<Integer, T> rowsUpdated,
        BiFunction<Row, RowMetadata, ? extends T> rowsFetched
    );
}

This will make it a bit harder to be forwards compatible, e.g. if stored procedure out parameter support will be added only in a later release. In that case, maybe a wrapper type for these functions might be useful:

public interface Result {

    interface Mapping<T> {
        Function<Integer, T> rowsUpdated();
        BiFunction<Row, RowMetadata, ? extends T> rowsFetched();
        Function<Object, T> outParameterFetched();
    }

    <T> Publisher<T> map(Mapping<T> mapping);
}
|
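To illustrate how such a Mapping-based dispatch could unify handling of heterogeneous results, here is a self-contained sketch in plain Java (no R2DBC or Reactive Streams dependency; all type and method names are illustrative, not part of any SPI):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MappingSketch {

    // A result segment is either an update count or a fetched row value.
    sealed interface Segment permits UpdateCount, RowValue {}
    record UpdateCount(int count) implements Segment {}
    record RowValue(String value) implements Segment {}

    // Analogue of the proposed Mapping<T>: one function per segment kind.
    interface Mapping<T> {
        Function<Integer, T> rowsUpdated();
        Function<String, T> rowsFetched();
    }

    // Analogue of map(Mapping<T>): dispatch each segment to the matching function,
    // so the caller handles mixed batches with a single method call.
    static <T> List<T> map(List<Segment> segments, Mapping<T> mapping) {
        return segments.stream()
            .map(s -> switch (s) {
                case UpdateCount u -> mapping.rowsUpdated().apply(u.count());
                case RowValue r -> mapping.rowsFetched().apply(r.value());
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A mixed batch: one update count followed by two rows.
        List<Segment> mixed = List.of(new UpdateCount(3), new RowValue("a"), new RowValue("b"));
        List<String> out = map(mixed, new Mapping<String>() {
            public Function<Integer, String> rowsUpdated() { return c -> "updated:" + c; }
            public Function<String, String> rowsFetched() { return v -> "row:" + v; }
        });
        System.out.println(out);
    }
}
```

The point of the sketch is that the consumer supplies all handlers up front, so no probing of the result type is needed before consumption.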
It would be possible, but complicated to get right. This is what makes JDBC so messy: the fact that it is such a stateful implementation of a stream. In order to get things right, the correct order of method calls has to be observed, and that's quite hard. Even if you say that this SPI shouldn't be used by clients directly, it will be, and there will be tons of questions from beginners. :) |
Due to the poor documentation it's not clear, but this is not true. It may be PostgreSQL-specific (again: better specification needed), but you absolutely can peek, and we do maintain a small amount of state to facilitate this. We do:

result
    .rowsUpdated()
    .flatMap(count -> doSomethingWithCount(count))
    .switchIfEmpty(result.rows()
        .flatMap(row -> doSomethingWithRow(row)))
|
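The fallback semantics of that pattern (consume the update count; if that publisher completes empty, consume rows instead) can be sketched without Reactor, using Optional as a stand-in for a possibly-empty publisher (helper and value names are illustrative):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class PeekFallbackSketch {

    // Analogue of switchIfEmpty: use the primary value if present,
    // otherwise fall back to the alternative source.
    static <T> T switchIfEmpty(Optional<T> primary, Supplier<T> fallback) {
        return primary.orElseGet(fallback);
    }

    public static void main(String[] args) {
        // A DML result: the update count is present, rows are never consumed.
        Optional<String> updateCount = Optional.of("updated 3");
        System.out.println(switchIfEmpty(updateCount, () -> "rows " + List.of()));

        // A SELECT result: no update count is emitted, so rows are consumed instead.
        Optional<String> empty = Optional.empty();
        System.out.println(switchIfEmpty(empty, () -> "rows " + List.of("a", "b")));
    }
}
```

The Supplier makes the fallback lazy, which mirrors the reactive version: the row stream is only subscribed to when the update-count path turns out to be empty.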
For SQL Server, we receive the count along with the Done message, which comes last (either with a direct response or after consumption of the entire cursor), so for SQL Server we can't easily cache and replay the entire stream. |
Have you considered having different APIs for select statements vs. DML statements? A result containing two almost mutually exclusive types of data (number of rows affected vs. rows returned) seems to point at separate concerns being handled in one interface, and separate methods for the two would make that explicit.

Also*, if R2DBC someday begins supporting streaming data into the database, it will need a separate method for that anyway.

*This is a complete hypothetical right now, as we have no way of supporting streaming into the database with Cloud Spanner. |
That may sound reasonable for most queries (on an 80/20 basis), but not for edge cases, which an SPI like R2DBC should support, IMO. Have you seen my comment here: #27 (comment)? RDBMS like SQL Server support mixed result type statement batches, where a mix of result sets, update counts, and exceptions/messages/signals can result from a query execution. On the SPI level, I'm not really a fan of distinguishing between "types" of statements. On a higher-level API, such convenience for the 80/20 case might definitely make sense. |
@lukaseder I was kind of inspired by your comment :) Looking at the stored-proc-like syntax got me to wonder: what would/could a client do with such mixed output, other than print it? This makes a separate result type that knows what it is more useful -- instead of getting back a dozen results and having to probe each one.
|
The main point of batching is always to reduce network traffic between client and server. If it is possible to send more "messages" in a single "envelope", we can improve throughput. Printing is a trivial use case, of course, but composing messages for other use cases than printing them is surely desirable in general. This was exceptionally difficult to get right with JDBC. I'm here to make sure it is less difficult with R2DBC - of course, without compromising on the much more popular use case of receiving only a single result set or a single update count.
I understand the wish for distinction (it would have to focus on the nature of the outcome, not the statement type, as DML statements, assuming you mean insert/update/delete/merge, can also produce result sets via returning/output/data change delta table syntax, or triggers), but I still doubt the SPI needs to do it. An API building on top of it: definitely. An example is this one: https://github.com/r2dbc/r2dbc-client

Perhaps what you really wish for is to be able to provide a filter/predicate already when executing a statement. This has been discussed elsewhere, among other places (I forgot where), here: https://groups.google.com/d/msg/r2dbc/12nq6d1l62Q/rPjStz5xBwAJ

That discussion on the group also revolves around out parameters (which can in turn be result sets, again). They're another way for a database to interleave a new type of result with the other ones. |
Like Lukas mentioned, this SPI is intended for driver/client library communication. Neither of these components is aware which type of statement is going to be executed when passing statements through. Having this kind of split would require the client side to be aware of what type of SQL is going to be executed.

Regarding streaming into the database: we are entirely missing the necessary infrastructure, because most database protocols are command-oriented. If you had a stream source today, you would do the following:

Connection c = …;
Flux<String> stream = …;

stream.flatMap(it -> {
    return c.createStatement("INSERT INTO person VALUES($1)").bind("$1", it).execute();
}).flatMap(Result::getRowsUpdated)
|
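Stripped of the reactive types, the per-element pattern above amounts to issuing one command per stream element. A plain-Java simulation (all names hypothetical, the "driver" is mocked) illustrates why this is round-trip-per-element rather than true streaming:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CommandPerElement {

    // Simulated statement execution: each element becomes its own INSERT command
    // and yields its own update count, mirroring command-oriented wire protocols.
    static int execute(String sql, String value, AtomicInteger commandsSent) {
        commandsSent.incrementAndGet();
        return 1; // one row inserted per command
    }

    public static void main(String[] args) {
        AtomicInteger commandsSent = new AtomicInteger();
        List<String> stream = List.of("alice", "bob", "carol");

        int totalRows = stream.stream()
            .mapToInt(it -> execute("INSERT INTO person VALUES($1)", it, commandsSent))
            .sum();

        // Three elements -> three commands: nothing is streamed inside a single command.
        System.out.println(commandsSent.get() + " commands, " + totalRows + " rows");
    }
}
```

Pipelining can hide some of the latency, but the protocol-level unit remains one command per element.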
In SQL Server, and since 12c in Oracle, yes, they can. |
Thanks for confirming. This is yet another reason to keep the class structure as-is. |
Thanks! Having the client-friendly layer above the driver makes sense. For Cloud Spanner, we'll have to do query type detection to handle DML, since there is no API that supports every type of query we'd have to handle. I'll stop hijacking this thread for now. |
I'm currently verifying what can be done with current versions of R2DBC. Looks like this is still an open issue?
System.out.println(
    Flux.from(connectionFactory.create())
        .flatMap(c -> c
            .createStatement("DECLARE @t TABLE(i INT);\n"
                + "INSERT INTO @t VALUES (1),(2),(3);\n"
                + "SELECT * FROM @t;\n")
            .execute())
        .flatMap(it -> {
            System.out.println("IT: " + it);
            return it.getRowsUpdated();
        })
        .collectList()
        .block()
);

The output is only a single update count.
I've created a new issue for this: r2dbc/r2dbc-mssql#196 |
This is still an open issue. A challenge, from what we've seen in drivers, is to actually know what type of result something is. To know whether it's an update count only, drivers need to consume the entire response for Postgres and SQL Server (making sure there are no data rows). To determine whether something is an error/warning result, drivers need to consume the entire result to ensure there are no update counts or rows associated with the response. In several cases, such an indicator would remove the streaming nature, and that isn't something we're looking for. |
For context, I'm looking at the current driver implementation:

public Mono<Integer> getRowsUpdated() {
    return this.messages
        .<Long>handle((message, sink) -> {
            if (message instanceof AbstractDoneToken) { ... }
            if (message instanceof ErrorToken) { ... }
            ...

public <T> Flux<T> map(BiFunction<Row, RowMetadata, ? extends T> f) {
    Assert.requireNonNull(f, "Mapping function must not be null");
    return this.messages
        .<T>handle((message, sink) -> {
            if (message.getClass() == ColumnMetadataToken.class) { ... }
            if (message.getClass() == RowToken.class || message.getClass() == NbcRowToken.class) { ... }
            if (message instanceof ErrorToken) { ... }
            ...

This looks exactly like what I'm thinking of here, except that these instanceof calls or Class comparisons are distributed across two methods. Yes, that earlier proposal (#27 (comment)) still sounds like the way to go here. It's not much more user-friendly than the JDBC version, but then again, this isn't something people do every day. In fact, it's hardly being done outside of SQL Server, Sybase, and very rarely MySQL, from what I've seen (I haven't seen Oracle's approach used much in the wild).

In the cases where jOOQ needs to be able to distinguish between update counts and results, we already know the order of generated statements. |
Do you want to come up with a pull request so we can discuss your proposal for R2DBC SPI 0.9? |
Sure, why not. Any guidelines on what a PR should include? Or is this for high-level discussion? |
Starting with the interface extensions and signal types would be a good approach. I think we agree that there's a need for such an extension, and design-wise we have a rough idea as well. |
Clients may not know in advance whether a Result contains update counts or row data. The current SPI design does not allow for checking the Result for its contents, nor for trying out both possibilities and reverting to the other if one fails. This suggestion offers a solution to this problem:

- A new Mapping<T> type is added containing functions for the individual mappings. This type is important for forwards compatibility, such that additional types of results (e.g. out parameters or signals, exceptions, warnings) can be added later
- The new Mapping<T> allows for mapping between 1 element (update count, row) to 0-N elements via a Publisher<T>. This is useful to allow for skipping values directly on the individual Result level
- A new map(Mapping<T>) method is added as a breaking change
- Default implementations delegating to the new methods are added for the existing methods
Signed-off-by: Lukas Eder <lukas.eder@gmail.com>
I've created a PR without signal types yet (assuming this means warnings/exceptions/etc.): #207. Looking forward to the discussion. |
While implementing a suggestion for #27 in #207, I couldn't help but wonder whether the existing design is the right one. Let's look at the inconsistency of this design:

public interface Result {

    Publisher<Integer> getRowsUpdated();

    <T> Publisher<T> map(BiFunction<Row, RowMetadata, ? extends T> mappingFunction);
}

Observations:
Regarding the continued evolution, I don't have a clear picture yet. |
The chance to break the API has been there ever since we introduced the 0.x versioning scheme. While we don't want to fully redesign R2DBC, we are free to change things.

The difference in design between both methods is based on how they are supposed to work. We didn't want to impose more implementation/consumption complexity than necessary. The row update count is a pure value that is emitted eventually; there's no streaming of multiple values. Row consumption is subject to streaming, and typically each row is associated with some resources (at least buffers), so handing these out works differently.

From what we've seen in drivers, consuming the updated row count is the most lightweight consumption of results. Row consumption (or avoiding consumption of rows) doesn't add significant overhead, as drivers need to decode metadata/row frames anyway to be able to parse the wire protocol. Additional overhead comes only with actually mapping rows.
Having a mapping/flat-mapping operator that can handle out parameters seems to be a way out. Getting more insight into how one would like to consume out params would indeed be beneficial to come up with a proper design. In any case, collection out parameters such as arrays should work well with a mapping function.

Regarding cursors, it seems that databases return a reference to a cursor that then needs to be fetched. |
Alright, thanks for the explanations. I will try to think about this stuff again next week to see what a client like jOOQ would expect from the SPI to be able to read all sorts of OUT parameters. For most cases, consuming them all in one go doesn't seem unreasonable, so a dedicated API might not even be needed. PostgreSQL, for example, returns OUT parameters as a result set:

create or replace function f (a out int, b out int)
as
$$
begin
  a := 1;
  b := 2;
end;
$$
language plpgsql;

select * from f()

Resulting in:

 a | b
---+---
 1 | 2
It's both weird and elegant 😀. Other RDBMS are not like that, but they could be. So, perhaps, the existing map() method could already be used to consume OUT parameters as a row. |
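If OUT parameters do surface as a single row, the existing row-mapping shape would suffice. A stdlib-only mock (Row and RowMetadata here are simplified stand-ins, not the SPI types) sketches what that consumption could look like:

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class OutParamsAsRow {

    // Minimal stand-ins for Row/RowMetadata: a single row carrying the OUT values.
    record Row(Map<String, Integer> values) {
        Integer get(String name) { return values.get(name); }
    }
    record RowMetadata(List<String> columnNames) {}

    // Analogue of Result.map(...): apply the mapping function to the one OUT row.
    static <T> T map(Row row, RowMetadata meta, BiFunction<Row, RowMetadata, T> f) {
        return f.apply(row, meta);
    }

    public static void main(String[] args) {
        // The PostgreSQL function f(a out int, b out int) above yields a = 1, b = 2.
        Row outRow = new Row(Map.of("a", 1, "b", 2));
        RowMetadata meta = new RowMetadata(List.of("a", "b"));

        String s = map(outRow, meta, (row, m) -> "a=" + row.get("a") + ", b=" + row.get("b"));
        System.out.println(s);
    }
}
```

The appeal of this model is that no new SPI surface is needed; the open question from the thread is whether other databases can be made to expose OUT parameters this way.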
I'm assuming that if the statement's returning stream is finished, the Publisher returned by Statement#execute() will simply stop producing results. But if there is a result, in order to distinguish whether it is an update count or a result set, should we simply try calling Result#getRowsUpdated(), and check if the resulting Publisher is empty (which would mean that there must be a result set)?

Maybe it would be a bit nicer if there was a third method Result#getResultType() returning an enum that describes the result. Or, at least, specify the behaviour in the Javadoc.
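For illustration, a getResultType()-style design could look like the sketch below (plain Java; all names are hypothetical and not part of the R2DBC SPI — and as discussed later in the thread, such an indicator would force drivers to consume the response up front, losing the streaming nature):

```java
import java.util.List;

public class ResultTypeSketch {

    // Hypothetical discriminator, as proposed in the issue.
    enum ResultType { UPDATE_COUNT, ROWS }

    interface Result {
        ResultType getResultType();
        int rowsUpdated();   // valid only for UPDATE_COUNT
        List<String> rows(); // valid only for ROWS
    }

    record UpdateCountResult(int count) implements Result {
        public ResultType getResultType() { return ResultType.UPDATE_COUNT; }
        public int rowsUpdated() { return count; }
        public List<String> rows() { throw new IllegalStateException("not a row result"); }
    }

    record RowsResult(List<String> data) implements Result {
        public ResultType getResultType() { return ResultType.ROWS; }
        public int rowsUpdated() { throw new IllegalStateException("not an update count"); }
        public List<String> rows() { return data; }
    }

    // Clients would branch on the discriminator before consuming.
    static String describe(Result r) {
        return switch (r.getResultType()) {
            case UPDATE_COUNT -> "updated " + r.rowsUpdated();
            case ROWS -> "rows " + r.rows();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new UpdateCountResult(3)));
        System.out.println(describe(new RowsResult(List.of("a", "b"))));
    }
}
```

Note the trade-off this makes concrete: the methods throw when called on the wrong kind of result, which is exactly the stateful, order-sensitive style the thread wants to avoid; the Mapping-based proposal sidesteps it by accepting all handlers at once.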