
Consider exposing Postgres and MySQL drivers through R2DBC #98

Closed
mp911de opened this issue Mar 4, 2019 · 22 comments

@mp911de (Contributor) commented Mar 4, 2019

R2DBC is an initiative to establish a common SPI for relational database drivers embracing reactive programming properties: event-oriented, non-blocking, and ideally stream-oriented access to databases. R2DBC is built entirely on Reactive Streams and defines a minimal API surface to be implemented by drivers, without the need to re-implement common client functionality in every driver.

What is the benefit of doing so?

Benefits:

  • Following an API that is well-suited for client developers, without the need to entirely change how the client is supposed to work
  • Participating in a growing client ecosystem (MyBatis, draft, JDBI, Spring) with clients providing a humane API. Application developers get a choice of which client they want to use for specific use-cases.
  • Idiomatic usage with Kotlin Coroutines for Reactive Streams.
  • With the choice of clients, jasync-sql isn't required to provide additional functionality such as object mapping or SQL generation. That's up to the client.
  • Plug-and-play choice from a growing ecosystem of drivers beyond Postgres, MySQL, H2, and Microsoft SQL Server.

Drawbacks:

  • https://xkcd.com/927/
  • Dependency on Reactive Streams (could be optional, still)
  • Additional complexity by maintaining two driver frontends

@oshai (Contributor) commented Mar 4, 2019

To understand what exactly is required, it would be great to get some more details on how to implement the SPI.
In addition, I guess there are some feature differences (I know one library but not the other, so I'm not 100% sure). Some that I have noticed:

  • Batching is not supported in jasync.
  • I couldn't find a connection pool in r2dbc. What is the concept over there?
  • There is no prepared statement support in r2dbc?

Also, to estimate the effort, can you please tell me whether you (or someone else from Pivotal) can assist with making those APIs work together?

@mp911de (Contributor, Author) commented Mar 4, 2019

Thanks for having a look. The core concept behind R2DBC is providing a minimal SPI that abstracts the database protocol without building abstractions for user-space features such as abstracting bind parameters into a common form (e.g. JDBC's ? is a good example of something each driver has to implement).

Batching is not supported in jasync

Batching means grouping multiple statements together, issuing them in as few TCP packets as possible, and then consuming one or multiple result sets. If you don't provide batching, then throwing UnsupportedOperationException is how you would implement Connection.createBatch().
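To make that concrete, here is a minimal sketch in Java. The `Batch` and `DriverConnection` interfaces below are stripped-down stand-ins for illustration, not the actual `io.r2dbc.spi` types:

```java
// Stripped-down stand-ins for the R2DBC SPI types (the real ones are
// io.r2dbc.spi.Batch and io.r2dbc.spi.Connection; simplified here).
interface Batch {
    Batch add(String sql);
}

interface DriverConnection {
    Batch createBatch();
}

// A driver without batch support can simply refuse to create batches.
class NoBatchConnection implements DriverConnection {
    @Override
    public Batch createBatch() {
        throw new UnsupportedOperationException("Batching is not supported by this driver");
    }
}
```

Clients that rely on batching then fail fast with a clear signal instead of silently degrading.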

I couldn't find connection pool in r2dbc. What is the concept over there?

We feel that with R2DBC we provide a minimal API surface that can be filled by drivers in the most useful way. There is no pooling in the SPI because there is no need (from a client perspective) to distinguish between new connections and pooled connections. The entry point for obtaining a Connection is ConnectionFactory. It can produce either a new connection each time you ask for a connection or return a pooled instance. That being said, a ConnectionFactory can be wrapped by another ConnectionFactory that returns cached Connection instances.
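The wrapping idea can be sketched like this; `Connection`, `ConnectionFactory`, and the pooling class below are simplified stand-ins for the `io.r2dbc.spi` shapes, not the real interfaces:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-ins: a ConnectionFactory hands out connections,
// and a pool is just another ConnectionFactory wrapping it.
interface Connection {}

interface ConnectionFactory {
    Connection create();
}

class PlainConnectionFactory implements ConnectionFactory {
    @Override
    public Connection create() {
        return new Connection() {};  // open a fresh connection every time
    }
}

// Callers cannot tell the difference: both are just ConnectionFactory.
class PoolingConnectionFactory implements ConnectionFactory {
    private final ConnectionFactory delegate;
    private final Deque<Connection> idle = new ArrayDeque<>();

    PoolingConnectionFactory(ConnectionFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public Connection create() {
        Connection cached = idle.pollFirst();
        return cached != null ? cached : delegate.create();  // reuse if available
    }

    void release(Connection connection) {
        idle.addLast(connection);  // hand the connection back for reuse
    }
}
```

From the client's perspective both factories are interchangeable, which is exactly the point being made above.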

There is no prepared statement support in r2dbc?

This does not need to be expressed at the SPI level. Simple statements do not require preparation. If one wants to use prepared statements even for simple statements, then a driver can provide an option to prepare all simple statements.

Statements with parameters (or statements that should be cursored) can be identified by the driver and handled accordingly.
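A tiny sketch of how a driver might do that identification. The bind-marker syntax checked here (`?` and `$1`-style placeholders) is an assumption for illustration; real drivers follow their database's native placeholder rules:

```java
import java.util.regex.Pattern;

// Sketch: route SQL to the simple or extended (prepared) protocol flow
// by checking for bind markers. Marker syntax is an assumption.
class StatementClassifier {
    private static final Pattern BIND_MARKER = Pattern.compile("\\$\\d+|\\?");

    static boolean needsPrepare(String sql) {
        return BIND_MARKER.matcher(sql).find();
    }
}
```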

I did an exercise to create a wrapper for a callback-based driver, and it turned out to be a quite straightforward approach once you're familiar with your reactive library of choice.

We're happy to assist you during review.

One final note: R2DBC builds on backpressure and stream-oriented data access (e.g. emitting items one by one and not reading more data from a cursor than requested by the application). The APIs used in jasync are non-blocking and Future-oriented, so an R2DBC wrapper can benefit from these. Happy to discuss extension options for e.g. streaming-oriented consumption of a ResultSet.

@oshai (Contributor) commented Mar 4, 2019

I think maybe the best approach here is to create a wrapper that implements the SPI (similar to the example you provided). We did a similar thing with Micronaut and Vert.x. The main difference that I see is the ability to provide a callback that handles rows; in jasync you get the rows back from the result. I guess that is doable.

not reading more data from a cursor than requested by the application

I am not 100% sure what that means, because the buffers of the client (netty in our case) still fill with the data returned from the server whether the client reads it or not.

@mp911de (Contributor, Author) commented Mar 4, 2019

I am not 100% sure what that means, because the buffers of the client (netty in our case) still fill with the data returned from the server whether the client reads it or not.

There are two things you could do:

  • Read data in chunks (e.g. issue multiple EXECUTE calls for a portal/cursor with a proper fetch size)
  • Disable auto-read (this might be a bit too dramatic)

Netty allows disabling auto-read, which prevents netty from reading further data from the socket once the client has consumed what it asked for. This is a powerful switch and might not be applicable to clients that do not expect this kind of behavior.
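The auto-read behavior can be modeled with a toy channel (this is not netty code; in real netty you would toggle `ChannelConfig.setAutoRead(false)` and call `Channel.read()` for explicit demand):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of netty's auto-read switch: with auto-read on, every readable
// event drains the socket buffer; with it off, frames are only pulled when
// the consumer explicitly calls read(), mirroring Channel.read().
class ToyChannel {
    private final Deque<String> socketBuffer;
    final List<String> delivered = new ArrayList<>();
    boolean autoRead = true;

    ToyChannel(Deque<String> socketBuffer) {
        this.socketBuffer = socketBuffer;
    }

    // Called by the event loop whenever the socket becomes readable.
    void onReadable() {
        if (autoRead) {
            while (!socketBuffer.isEmpty()) {
                delivered.add(socketBuffer.pollFirst());
            }
        }
    }

    // Explicit demand from the consumer.
    void read() {
        if (!socketBuffer.isEmpty()) {
            delivered.add(socketBuffer.pollFirst());
        }
    }
}
```

With auto-read off, nothing arrives until the consumer asks, which is how read suspension keeps client buffers from filling beyond demand.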

@oshai (Contributor) commented Mar 4, 2019

@mp911de Do you know what other drivers do in that case?

@mp911de (Contributor, Author) commented Mar 4, 2019

The SQL Server driver does both things: multiple cursor fetches and read suspension. Suspending reads is something it basically gets for free, as the I/O layer is built on top of Reactive Streams (reactor-netty).

Other drivers, such as the MongoDB and Apache Cassandra ones, only make use of cursor reads along with a configurable fetch size.

@oshai (Contributor) commented Mar 4, 2019

I think I have a better understanding of the scope now.
Let's say we start without additional features to the driver and just develop a wrapper for it.
Do you think the better place to put that is here, or as a module under https://github.com/r2dbc ?
In my opinion, the latter might be better, similar to the way it was done with both Vert.x and Micronaut.

@mp911de (Contributor, Author) commented Mar 4, 2019

The plan sounds good for a first step.

Given that a wrapper is very closely related to the driver and might want to use internal functionality that is not exposed through the public API, I strongly encourage putting the code as close as possible to the original project. This gives you full control over the code and its lifecycle. Once the repo is in place, we will add links to community-driven drivers on our r2dbc.io page.

With R2DBC we aim for drivers to evolve and for driver vendors to have code ownership from the earliest point in time. Our goal is facilitating a common API between drivers and clients, not so much the maintenance of drivers.

One of the areas we need to improve is creating a good ecosystem integration so people interested in using a driver can quickly find the one they want. Maybe a BOM (bill of materials) or a driver registry can help.

@oshai (Contributor) commented Mar 4, 2019

Although I don't plan on using internal APIs, I am OK with keeping it here and referencing it from the r2dbc pages.
I will start work on it and link the PR from here, so we can discuss any questions that arise there.
As I mentioned before, any assistance will be appreciated! Thanks for the suggestion!

@mp911de (Contributor, Author) commented Mar 4, 2019

Awesome news! Let me know if you have any questions. Excited to look at your PR.

@oshai (Contributor) commented Mar 4, 2019

See this PR: #99

@mirromutth (Contributor) commented Mar 19, 2019

Should the R2DBC Connection wrap the pooled jasync connection?

I see code like this:

import com.github.jasync.sql.db.Connection as JasyncConnection

class JasyncClientConnection(private val jasyncConnection: JasyncConnection) : Connection, Supplier<JasyncConnection> {
    // ...
}

So if this jasyncConnection is a ConnectionPool, the transaction state cannot be maintained on a single R2DBC Connection, as in:

val r2dbcConnection = xxxFactory.getOrCreateOrXxxx()
r2dbcConnection.beginTransaction().then(Mono.defer {
    val someLargeTask = r2dbcConnection.createStatement()
    // ...
    someLargeTask.execute().then(r2dbcConnection.commitTransaction())
})

A user can even begin a transaction on thread A and commit it on thread B, which may have no subscriber context, so transaction state should be tracked on a single ConcreteConnection.

Maybe we should use ConcreteConnection instead of Connection:

import com.github.jasync.sql.db.ConcreteConnection as JasyncConnection

Alternatively, a method isPooling could be appended to Connection, but that is a bad idea since it changes the jasync API, so I am not proposing it.

@mp911de (Contributor, Author) commented Mar 19, 2019

There are a couple of states on a connection:

  • Auto-Commit
  • Isolation Level
  • Savepoints

Typically, any transactional participant should be a good citizen in the sense that it considers the initial state and then leaves the connection in that same state afterward.

I think having a pool implementation is of great value because your driver then does not require an external pool. I also think pooling vs. non-pooling should be entirely transparent; what the pool could do is provide defaults for auto-commit and isolation level and make sure that newly obtained connections are associated with that state. This is totally independent of R2DBC, though.

To your initial question of whether an R2DBC Connection should wrap a pooled or a non-pooled one: I think both. You should be able to handle that preference through the ConnectionFactory. It should be possible to configure the R2DBC ConnectionFactory with an unpooled implementation as well as with a pooled one.

Going further, R2DBC has a ConnectionFactory discovery mechanism that allows URL-based configuration. URLs are parsed in the SPI, and ConnectionFactoryProviders are asked whether they can provide a ConnectionFactory (here's the MS SQL Server implementation).

The idea behind this is that a lot of configuration frameworks already use URLs for JDBC, MongoDB, Redis, and other databases, which makes configuration very convenient for users.
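The discovery mechanism described above can be sketched as follows. The real mechanism is `io.r2dbc.spi.ConnectionFactories` plus ServiceLoader-discovered provider implementations; the interface names and the URL parsing below are simplified assumptions:

```java
import java.util.List;

// Sketch of R2DBC's discovery idea: the URL's driver segment selects a provider.
interface Factory {}

interface FactoryProvider {
    boolean supports(String driver);
    Factory create(String url);
}

class MySqlFactory implements Factory {
    final String url;
    MySqlFactory(String url) { this.url = url; }
}

class MySqlProvider implements FactoryProvider {
    @Override public boolean supports(String driver) { return driver.equals("mysql"); }
    @Override public Factory create(String url) { return new MySqlFactory(url); }
}

class Discovery {
    // URLs look like r2dbc:driver://host:port/database
    static Factory get(String url, List<FactoryProvider> providers) {
        if (!url.startsWith("r2dbc:")) {
            throw new IllegalArgumentException("Not an R2DBC URL: " + url);
        }
        String driver = url.substring("r2dbc:".length()).split("://", 2)[0];
        return providers.stream()
                .filter(p -> p.supports(driver))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("No provider for " + driver))
                .create(url);
    }
}
```

The first provider that recognizes the driver segment wins, so adding a new driver to the classpath is enough to make its URLs resolvable.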

@mirromutth (Contributor) commented Mar 19, 2019

Sorry, I may have to correct my words; I understand that pooled connections are necessary.

I mean: use different implementations to handle different kinds of connections, for example:

// first kt
class JasyncClientPooledConnection(private val jasyncConnection: JasyncPooledConnection) {
    // ...
}

// second kt
class JasyncClientConnection(private val jasyncConnection: JasyncConcreteConnection) {
    // ...
}

@oshai (Contributor) commented Mar 19, 2019

In my opinion, pooling is important for a complete solution, and to provide that, jasync added a pool. However, when using the r2dbc SPI, I think that either the SPI should provide a pooling interface (and probably one implementation for all drivers), or pooling should be a layer on top of it. Keeping it as an underlying pool might make transaction and connection state (prepared statement caches, etc.) confusing to users.

@mp911de (Contributor, Author) commented Mar 19, 2019

Why do you think things get confusing if the API does not express whether a connection gets reused from a pool vs. freshly created connections?

In a typical pooling setup, prepared statement caches aren't emptied; drivers typically handle this opaquely.

@mp911de (Contributor, Author) commented Mar 19, 2019

@mirromutth I also think different connection implementations are required for pooled vs. non-pooled connections, because close() needs to do different things. In any case, we can make use of inheritance and just override close().
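The inheritance idea can be sketched like this; the class names are illustrative, not jasync's actual API, and the pool callback is a stand-in for whatever return mechanism the pool uses:

```java
import java.util.function.Consumer;

// A pooled connection shares everything with the plain one except close(),
// which returns it to the pool instead of terminating the session.
class ClientConnection {
    private boolean open = true;

    boolean isOpen() { return open; }

    void close() {
        open = false;  // really terminate the underlying session
    }
}

class PooledClientConnection extends ClientConnection {
    private final Consumer<PooledClientConnection> returnToPool;

    PooledClientConnection(Consumer<PooledClientConnection> returnToPool) {
        this.returnToPool = returnToPool;
    }

    @Override
    void close() {
        returnToPool.accept(this);  // keep the session alive, just hand it back
    }
}
```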

@oshai (Contributor) commented Mar 19, 2019

I think for a transaction, for example, you might end up starting it on one connection and trying to commit it from another one. Consider the following example:

fun <A> inTransaction(f: (Connection) -> CompletableFuture<A>): CompletableFuture<A> {
    return this.sendQuery("BEGIN").flatMap { _ ->
        val p = CompletableFuture<A>()
        f(this).onComplete { ty1 ->
            sendQuery(if (ty1.isFailure) "ROLLBACK" else "COMMIT").onComplete { ty2 ->
                if (ty2.isFailure && ty1.isSuccess)
                    p.failed((ty2 as Failure).exception)
                else
                    p.complete(ty1)
            }
        }
        p
    }
}

If the connection is pooled, the above might happen.

@mp911de (Contributor, Author) commented Mar 19, 2019

This isn't a pooling issue; it's a state-management and expectation issue. Your source of connections must give you a consistent initial state, no matter what the source is.

If a pooled connection is left in a dirty state, then it's the responsibility of the pool to return a connection in a fresh state, at least for auto-commit and isolation level. This is independent of R2DBC, and you don't want to require the application to take care of something that belongs in an infrastructure component.
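A minimal sketch of that responsibility: the pool, not the application, restores a known initial state before handing a connection out again. Field names and defaults below are illustrative assumptions:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Connection state a borrower might dirty between acquire and release.
class PooledConn {
    boolean autoCommit = true;
    String isolationLevel = "READ COMMITTED";
}

class StateResettingPool {
    private final Deque<PooledConn> idle = new ArrayDeque<>();

    PooledConn acquire() {
        PooledConn c = idle.pollFirst();
        if (c == null) {
            c = new PooledConn();
        }
        // Reset whatever the previous borrower may have changed.
        c.autoCommit = true;
        c.isolationLevel = "READ COMMITTED";
        return c;
    }

    void release(PooledConn c) {
        idle.addLast(c);
    }
}
```

With this in place, a transaction left half-open by one borrower cannot leak its auto-commit or isolation settings into the next borrower's session.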

@oshai (Contributor) commented Mar 19, 2019

Released in version 0.9.31.

@oshai closed this Mar 19, 2019

@mp911de (Contributor, Author) commented Mar 19, 2019

Awesome news. Excited to see how this progresses now. I suppose that, for a release, building against R2DBC 1.0M7 instead of snapshots would give the release more stability, but there's no need to rush here.

@oshai (Contributor) commented Mar 19, 2019

OK, I will update it and it will be in the next release. For now, I created a sample based on the one you did in the PR: https://github.com/jasync-sql/jasync-sql/tree/master/samples/mysql-r2dbc
