Cancel signal does not interrupt cursored query fetching #536

frdeboffles · 2022-08-03T00:15:23Z

When going through the simple query protocol the Flux operator discardOnCancel (https://github.com/pgjdbc/r2dbc-postgresql/blob/main/src/main/java/io/r2dbc/postgresql/PostgresqlStatement.java#L248) does not seem to be propagated to the ReactorNettyClient.Conversation sink.

When a query is canceled, the Operators#discardOnCancel operator is supposed to discard the cancel event and send a very large Subscription#request to the subscriptions (see https://github.com/pgjdbc/r2dbc-postgresql/blob/main/src/main/java/io/r2dbc/postgresql/util/FluxDiscardOnCancel.java#L125). But the FluxSink#onRequest is never called (see https://github.com/pgjdbc/r2dbc-postgresql/blob/main/src/main/java/io/r2dbc/postgresql/client/ReactorNettyClient.java#L710) when FluxDiscardOnCancel.FluxDiscardOnCancelSubscriber#cancel is called. This leads to a dead ReactorNettyClient.Conversation since it is not canceled and its demand is exhausted.

Since this ReactorNettyClient.Conversation is able to handle the cancel state, I removed the Flux operator discardOnCancel from the PostgresqlStatement. Then everything worked as the ReactorNettyClient.Conversation was able to go through until the end even if the sink was canceled.

Do we still need this discardOnCancel operator here? If so, why is the large request not propagated to the ReactorNettyClient.Conversation?

The text was updated successfully, but these errors were encountered:

mp911de · 2022-08-04T08:11:17Z

When a query is canceled, the Operators#discardOnCancel operator is supposed to discard the cancel event and send a very large Subscription#request

Not necessarily. When canceling the frontend publisher (the one used by the code that calls the driver), we must first drain the protocol state to avoid lingering responses on the inbound channel. We do not know what's next when canceling the frontend publisher. We can only reason about certain arrangements such as cursor consumption.

The driver is able to pipeline requests and canceling a frontend publisher can easily cancel the "next" command that was already sent to the server.

But the FluxSink#onRequest is never called

I'm actually wondering why this is. The mentioned discardOnCancel operator protects the Publisher<Result> against cancellation (i.e. when applying flatMap or concatMap operators). Do you have a bit of code that reproduces the problem?

frdeboffles · 2022-08-09T14:48:02Z

I'll try to come up with a small example. Thank you for your response.

frdeboffles · 2022-08-19T17:49:34Z

Sorry for the delay in responding.
While trying to reproduce, I found out that the issue of FluxSink#onRequest not being called with a large value was only happening when I was not using parameter bindings.
Here is a unit test based on the AbstractIntegrationTests that reproduces the issue.

package io.r2dbc.postgresql;

import java.time.Duration;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;
import org.springframework.jdbc.core.JdbcOperations;
import reactor.test.StepVerifier;

final class CancelIntegrationTests extends AbstractIntegrationTests {

  @Override
  protected void customize(PostgresqlConnectionConfiguration.Builder builder) {
    builder.fetchSize(100);
  }

  @BeforeEach
  void setUp() {
    super.setUp();
    JdbcOperations jdbcOperations = SERVER.getJdbcOperations();
    jdbcOperations.execute("DROP TABLE IF EXISTS foo;");
    jdbcOperations.execute("CREATE TABLE foo AS \n"
        + "  SELECT i FROM generate_series(1,200000) as i;");
  }

  void cancelExchange(boolean withBinding) {
    io.r2dbc.postgresql.api.PostgresqlStatement statement;
    if (withBinding) {
      statement = this.connection.createStatement("SELECT * FROM foo WHERE $1 = $1")
        .bind(0, 1);
    } else {
      statement = this.connection.createStatement("SELECT * FROM foo");
    }
    statement
      .execute()
      .flatMap(r -> r.map((row, meta) -> row.get(0, Integer.class)))
      .as(StepVerifier::create)
      .expectNextCount(5)
      .thenCancel()
      .verify(Duration.ofSeconds(2));
  }

  @Test
  @Timeout(10)
  void cancel_withBinding() {
    for (int i = 0; i < 10; i++) {
      try {
        cancelExchange(true);
      } catch (AssertionError e) {
        System.out.println("Ignore error " + e.getMessage());
      }
    }
  }

  @Test
  @Timeout(10)
  void cancel_withoutBinding() {
    for (int i = 0; i < 10; i++) {
      try {
        cancelExchange(false);
      } catch (AssertionError e) {
        System.out.println("Ignore error " + e.getMessage());
      }
    }
  }
}

In the above tests, I create a table with 200 000 rows and then request everything with a fetch size of 100.
When running with binding, the test cancel_withBinding() runs in about 1.5s on my machine without errors. Also setting up debug conditional logs on Conversation#incrementDemand shows that a demand of Long.MAX_VALUE is done upon each cancellation.
Now, when running without the binding, the test cancel_withoutBinding() fails on the 10 seconds timeout. The first call is successful but the second call is blocked until the first call has processed all the 200 000 rows (they are not sent to the consumer, therefore on the first iteration of the loop the behavior is identical on both tests). On this one, I was unable to see any call to Conversation#incrementDemand with Long.MAX_VALUE.

I hope this helps. Please note that our workaround is to add a fake binding to our requests.

mp911de · 2023-06-19T07:52:28Z

Thanks a lot for providing more detail on this. The issue is that there is quite some stream mapping going on. There are outer and inner streams that are guarded by discardOnCancel. The outer operator did not propagate the cancellation to the inner one and therefore, the fetching of the large result keeps going.

I fixed the issue now and the inner loop no longer keeps fetching its results.

[resolves #536] Signed-off-by: Mark Paluch <mpaluch@vmware.com>

mp911de changed the title ~~Cancel signal not fully propagated to the ReactorNettyClient.Conversation~~ Cancel signal does not interrupt cursored query fetching Jun 19, 2023

mp911de added the type: bug A general bug label Jun 19, 2023

mp911de closed this as completed in fd7d24c Jun 19, 2023

mp911de added a commit that referenced this issue Jun 19, 2023

Cancel cursor fetching if the outer stream gets canceled.

5688111

[resolves #536] Signed-off-by: Mark Paluch <mpaluch@vmware.com>

mp911de added this to the 1.0.2.RELEASE milestone Jun 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cancel signal does not interrupt cursored query fetching #536

Cancel signal does not interrupt cursored query fetching #536

frdeboffles commented Aug 3, 2022

mp911de commented Aug 4, 2022

frdeboffles commented Aug 9, 2022

frdeboffles commented Aug 19, 2022

mp911de commented Jun 19, 2023

Cancel signal does not interrupt cursored query fetching #536

Cancel signal does not interrupt cursored query fetching #536

Comments

frdeboffles commented Aug 3, 2022

mp911de commented Aug 4, 2022

frdeboffles commented Aug 9, 2022

frdeboffles commented Aug 19, 2022

mp911de commented Jun 19, 2023