Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel signal does not interrupt cursored query fetching #536

Closed
frdeboffles opened this issue Aug 3, 2022 · 4 comments
Closed

Cancel signal does not interrupt cursored query fetching #536

frdeboffles opened this issue Aug 3, 2022 · 4 comments
Labels
type: bug A general bug
Milestone

Comments

@frdeboffles
Copy link

When going through the simple query protocol the Flux operator discardOnCancel (https://github.com/pgjdbc/r2dbc-postgresql/blob/main/src/main/java/io/r2dbc/postgresql/PostgresqlStatement.java#L248) does not seem to be propagated to the ReactorNettyClient.Conversation sink.

When a query is canceled, the Operators#discardOnCancel operator is supposed to discard the cancel event and send a very large Subscription#request to the subscriptions (see https://github.com/pgjdbc/r2dbc-postgresql/blob/main/src/main/java/io/r2dbc/postgresql/util/FluxDiscardOnCancel.java#L125). But the FluxSink#onRequest is never called (see https://github.com/pgjdbc/r2dbc-postgresql/blob/main/src/main/java/io/r2dbc/postgresql/client/ReactorNettyClient.java#L710) when FluxDiscardOnCancel.FluxDiscardOnCancelSubscriber#cancel is called. This leads to a dead ReactorNettyClient.Conversation since it is not canceled and its demand is exhausted.

Since this ReactorNettyClient.Conversation is able to handle the cancel state, I removed the Flux operator discardOnCancel from the PostgresqlStatement. Then everything worked as the ReactorNettyClient.Conversation was able to go through until the end even if the sink was canceled.

Do we still need this discardOnCancel operator here? If so, why is the large request not propagated to the ReactorNettyClient.Conversation?

@mp911de
Copy link
Collaborator

mp911de commented Aug 4, 2022

When a query is canceled, the Operators#discardOnCancel operator is supposed to discard the cancel event and send a very large Subscription#request

Not necessarily. When canceling the frontend publisher (the one used by the code that calls the driver), we must first drain the protocol state to avoid lingering responses on the inbound channel. We do not know what's next when canceling the frontend publisher. We can only reason about certain arrangements such as cursor consumption.

The driver is able to pipeline requests and canceling a frontend publisher can easily cancel the "next" command that was already sent to the server.

But the FluxSink#onRequest is never called

I'm actually wondering why this is. The mentioned discardOnCancel operator protects the Publisher<Result> against cancellation (i.e. when applying flatMap or concatMap operators). Do you have a bit of code that reproduces the problem?

@frdeboffles
Copy link
Author

I'll try to come up with a small example. Thank you for your response.

@frdeboffles
Copy link
Author

Sorry for the delay in responding.
While trying to reproduce, I found out that the issue of FluxSink#onRequest not being called with a large value was only happening when I was not using parameter bindings.
Here is a unit test based on the AbstractIntegrationTests that reproduces the issue.

package io.r2dbc.postgresql;

import java.time.Duration;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;
import org.springframework.jdbc.core.JdbcOperations;
import reactor.test.StepVerifier;

final class CancelIntegrationTests extends AbstractIntegrationTests {

  @Override
  protected void customize(PostgresqlConnectionConfiguration.Builder builder) {
    builder.fetchSize(100);
  }

  @BeforeEach
  void setUp() {
    super.setUp();
    JdbcOperations jdbcOperations = SERVER.getJdbcOperations();
    jdbcOperations.execute("DROP TABLE IF EXISTS foo;");
    jdbcOperations.execute("CREATE TABLE foo AS \n"
        + "  SELECT i FROM generate_series(1,200000) as i;");
  }

  void cancelExchange(boolean withBinding) {
    io.r2dbc.postgresql.api.PostgresqlStatement statement;
    if (withBinding) {
      statement = this.connection.createStatement("SELECT * FROM foo WHERE $1 = $1")
        .bind(0, 1);
    } else {
      statement = this.connection.createStatement("SELECT * FROM foo");
    }
    statement
      .execute()
      .flatMap(r -> r.map((row, meta) -> row.get(0, Integer.class)))
      .as(StepVerifier::create)
      .expectNextCount(5)
      .thenCancel()
      .verify(Duration.ofSeconds(2));
  }

  @Test
  @Timeout(10)
  void cancel_withBinding() {
    for (int i = 0; i < 10; i++) {
      try {
        cancelExchange(true);
      } catch (AssertionError e) {
        System.out.println("Ignore error " + e.getMessage());
      }
    }
  }

  @Test
  @Timeout(10)
  void cancel_withoutBinding() {
    for (int i = 0; i < 10; i++) {
      try {
        cancelExchange(false);
      } catch (AssertionError e) {
        System.out.println("Ignore error " + e.getMessage());
      }
    }
  }
}

In the above tests, I create a table with 200 000 rows and then request everything with a fetch size of 100.
When running with binding, the test cancel_withBinding() runs in about 1.5s on my machine without errors. Also setting up debug conditional logs on Conversation#incrementDemand shows that a demand of Long.MAX_VALUE is done upon each cancellation.
Now, when running without the binding, the test cancel_withoutBinding() fails on the 10 seconds timeout. The first call is successful but the second call is blocked until the first call has processed all the 200 000 rows (they are not sent to the consumer, therefore on the first iteration of the loop the behavior is identical on both tests). On this one, I was unable to see any call to Conversation#incrementDemand with Long.MAX_VALUE.

I hope this helps. Please note that our workaround is to add a fake binding to our requests.

@mp911de mp911de changed the title Cancel signal not fully propagated to the ReactorNettyClient.Conversation Cancel signal does not interrupt cursored query fetching Jun 19, 2023
@mp911de mp911de added the type: bug A general bug label Jun 19, 2023
@mp911de
Copy link
Collaborator

mp911de commented Jun 19, 2023

Thanks a lot for providing more detail on this. The issue is that there is quite some stream mapping going on. There are outer and inner streams that are guarded by discardOnCancel. The outer operator did not propagate the cancellation to the inner one and therefore, the fetching of the large result keeps going.

I fixed the issue now and the inner loop no longer keeps fetching its results.

mp911de added a commit that referenced this issue Jun 19, 2023
[resolves #536]

Signed-off-by: Mark Paluch <mpaluch@vmware.com>
@mp911de mp911de added this to the 1.0.2.RELEASE milestone Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

No branches or pull requests

2 participants