Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hangs after several iterations of a SELECT query that retrieves about 200 items in R2DBC. #16669

Open
k163377 opened this issue May 10, 2024 · 1 comment

Comments

@k163377
Copy link

k163377 commented May 10, 2024

Expected behavior

Hang does not happen.

Actual behavior

After sending several requests (less than 10) with curl to a Spring WebFlux endpoint that performs SELECT by jOOQ, the query hangs and the response cannot be retrieved.
Requests were always sent after one previous request had finished (i.e. no parallel requests were made).

If normal, the ReactorNettyClient will output a log READ COMPLETE and, almost without pause, the following.

...  [actor-tcp-nio-1] i.r.p.client.ReactorNettyClient          : [id: 0xbac1c713, L:/127.0.0.1:54855 - R:localhost/127.0.0.1:15432] READ: 32768B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
...

On the other hand, when it hangs, the log after READ COMPLETE is not output.
Although not strictly verified, there appears to be no difference in the logs output before READ COMPLETE, such as FLUSH and Response: ReadyForQuery{transactionStatus=IDLE}.

Steps to reproduce the problem

I apologise for the fact that I have only been able to test this in a product environment, which is not the smallest configuration, and even lacks sample code.

This hang has been observed with two different queries (each querying a different DB), both with a limit(200) limit on the number of fetches.
When executed directly, one of the queries completes within 0.5 seconds and the other in about 2 seconds.
The hang was definitely reproduced when 200 records were retrieved.

Also, when the SQL generated by jOOQ was issued as a string directly to a connection using createStatement, it did not hang.

The versions of Spring and other libraries are as follows.

  • kotlin -> 1.9.23
  • kotlinx-coroutines -> 1.7.3
  • org.springframework.boot:spring-boot-starter-webflux -> 3.2.4
  • io.r2dbc:r2dbc-spi -> 1.0.0.RELEASE
  • io.r2dbc:r2dbc-pool -> 1.0.1.RELEASE

I don't know if this is relevant, but this product uses jOOQ in an unusual way for RLS.

First, a context setting query is issued with postAllocate in the ConnectionPool.
The TenantIdGetter is reading the string bound to the Reactor Context (the problem was reproduced when this was replaced by Mono.just).

val defaultConnectionFactory = ConnectionFactories.get(r2dbcProperties.url)

val config = ConnectionPoolConfiguration.builder(defaultConnectionFactory)
    .postAllocate { connection ->
        // tenantIdGetter.getTenantIdMono() returns Mono<String>
        tenantIdGetter.getTenantIdMono().flatMap {
            val query = DSL.set(DSL.name("app.current_tenant"), DSL.value(it)).toString()
            connection.createStatement(query).execute().toMono()
        }.then()
    }
    .build()

return ConnectionPool(config)

Next, connection issuing is done outside of jOOQ.
Here is an excerpt of the part where the query is actually issued (replacing Flux.from with Flux.usingWhen also reproduced the problem).

private val cfi: ConnectionFactory // -> ConnectionPool is injected

fun Connection.toDslContext() = DSL.using(this, SQLDialect.POSTGRES, JOOQ_SETTINGS).dsl()

override fun <R : Record> read(query: DSLContext.() -> Select<R>): Flow<R> = cfi.create().toMono().flatMapMany { connection ->
    Flux.from(query(connection.toDslContext()))
        .doFinally {
            connection.close().toMono().subscribe()
        }
}.asFlow()

jOOQ Version

3.17.23

Database product and version

Aurora PostgreSQL 14

Java Version

17

JDBC / R2DBC driver name and version (include name if unofficial driver)

org.postgresql:r2dbc-postgresql -> 1.0.4.RELEASE

@lukaseder
Copy link
Member

Thanks for your report.

It is very hard to track down race conditions without very clear and complete reproducers. We have a template here to help with Minimal, Complete, Verifiable Examples: https://github.com/jOOQ/jOOQ-mcve

I understand that it is also hard to minimise the problem to a simple reproducer, but without a very obvious path to reproduction, from experience, I will be unlikely to reproduce the exact problem. For example, while you listed the dependencies, how can I be sure I'll be setting up the exact environment you have? What about this bit: This hang has been observed with two different queries (each querying a different DB). How are things set up exactly? Does the issue reproduce outside of Aurora, with a vanilla PostgreSQL instance? Etc.

Additionally, you're working with an older jOOQ version. Perhaps this problem has been fixed in 3.19.8?

I will be happy to look at a minimal reproducer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants