Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: inline begin transaction #325

Merged
merged 23 commits into from Oct 23, 2020
Merged

feat: inline begin transaction #325

merged 23 commits into from Oct 23, 2020

Conversation

olavloite
Copy link
Contributor

@olavloite olavloite commented Jul 8, 2020

Description

The Spanner client library will normally prepare a fraction of the sessions in the session pool with a read/write transaction. These sessions are used when an application requests a read/write transaction. This eliminates the need for a BeginTransaction RPC to be executed as part of a transaction, as the call has already been executed in advance by the session pool.

This can however be less efficient in some specific cases:

  • When the number of read/write transactions that are needed is higher than the number of transactions the session pool is able to prepare in advance.
  • When the fraction of read/write transactions is higher than what has been configured by the user (the default is 0.2).

Cloud Spanner also supports starting a new transaction as part of a query or DML statement. This eliminates the need for a separate round-trip to the server for starting a transaction. This PR adds the possibility to instruct the Spanner client to start read/write transactions by including this option in the first query/update statement of a read/write transaction, instead of preparing the read/write transactions in the session pool. This can improve the overall read/write transaction execution performance.

Including the BeginTransaction option with the first query or update statement of a transaction does require the client to block any other statement on the same transaction until the first statement has finished. This means that this strategy could perform less well in case of a read/write transaction that includes multiple parallel reads and/or writes.

Technical Implementation

The change in this PR is implemented along the following lines:

  1. Read/write transactions that use inline BeginTransaction will no longer take a write-prepared session from the pool. That is no longer needed, as the first statement on the transaction will include a BeginTransaction option.
  2. All statements that are executed on a transaction will call the method getTransactionSelector(). This method will always return the selector to use for the statement. That selector will either be a BeginTransaction option if no transaction has yet been started, or a TransactionId if a transaction has been started.
  3. If the method determines that a new transaction must be started, it creates a future that will hold the transaction id that will be returned.
  4. If the method determines that there is no transaction id available yet, but another statement has already sent a BeginTransaction option, it will wait for the transaction id to be returned by calling get() on the transaction id future. (The statement cannot proceed, as there is no transaction id available yet, and only one statement should include a BeginTransaction option.)
  5. When the statement that included the BeginTransaction option finishes, it will call onTransactionMetadata(..). This will set the transactionId and transactionIdFuture and allow other statements to proceed.
  6. If the statement that included the BeginTransaction option causes an error, it will call onError(..) and propagate the error to the calling method. The onError(..) method will set an Aborted error on the transactionIdFuture to propagate an Aborted error to any subsequent statement that might either be waiting for a transaction id to come available, or will request one later. This will ensure that if the first error is caught and handled by the transaction, the following statement will cause an Aborted error and cause the entire transaction to retry. See this test case for an example.

@olavloite olavloite added the api: spanner label Jul 8, 2020
@googlebot googlebot added the cla: yes label Jul 8, 2020
@olavloite olavloite changed the title Inline begin tx feat: inline begin transaction Jul 8, 2020
@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Jul 16, 2020

Benchmarking the current implementation gives the following results:

Benchmark                               (inlineBegin)  (writeFraction)  Mode  Cnt     Score   Error  Units
InlineBeginBenchmark.burstRead                  false              0.0  avgt        983.651          ms/op
InlineBeginBenchmark.burstRead                  false              0.2  avgt       1069.472          ms/op
InlineBeginBenchmark.burstRead                   true              0.0  avgt       1024.409          ms/op
InlineBeginBenchmark.burstRead                   true              0.2  avgt        918.951          ms/op
InlineBeginBenchmark.burstReadAndWrite          false              0.0  avgt       1141.456          ms/op
InlineBeginBenchmark.burstReadAndWrite          false              0.2  avgt       1243.078          ms/op
InlineBeginBenchmark.burstReadAndWrite           true              0.0  avgt        886.104          ms/op
InlineBeginBenchmark.burstReadAndWrite           true              0.2  avgt        901.669          ms/op
InlineBeginBenchmark.burstWrite                 false              0.0  avgt       1082.526          ms/op
InlineBeginBenchmark.burstWrite                 false              0.2  avgt       1077.063          ms/op
InlineBeginBenchmark.burstWrite                  true              0.0  avgt        699.527          ms/op
InlineBeginBenchmark.burstWrite                  true              0.2  avgt        726.350          ms/op

This indicates that especially bursty write situations would profit from including the BeginTransaction option with the first statement of a transaction, instead of using prepared transactions from the session pool.

@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Jul 17, 2020

Same benchmarks against a real Cloud Spanner database. The total execution times are higher, as the benchmark is executed on my local laptop against a Cloud Spanner instance, making the network latency higher than would have been the case when the benchmark would also be executed from a VM instance in the same Google cloud project and region as the Cloud Spanner instance. The general picture is the same as when benchmarking against a mock server.

Benchmark                               (inlineBegin)  (writeFraction)  Mode  Cnt     Score   Error  Units
InlineBeginBenchmark.burstRead                  false              0.0  avgt       2022.514          ms/op
InlineBeginBenchmark.burstRead                  false              0.2  avgt       2115.891          ms/op
InlineBeginBenchmark.burstRead                   true              0.0  avgt       2124.326          ms/op
InlineBeginBenchmark.burstRead                   true              0.2  avgt       2052.456          ms/op
InlineBeginBenchmark.burstReadAndWrite          false              0.0  avgt       3325.453          ms/op
InlineBeginBenchmark.burstReadAndWrite          false              0.2  avgt       3278.692          ms/op
InlineBeginBenchmark.burstReadAndWrite           true              0.0  avgt       2625.325          ms/op
InlineBeginBenchmark.burstReadAndWrite           true              0.2  avgt       2764.468          ms/op
InlineBeginBenchmark.burstWrite                 false              0.0  avgt       3653.713          ms/op
InlineBeginBenchmark.burstWrite                 false              0.2  avgt       3546.462          ms/op
InlineBeginBenchmark.burstWrite                  true              0.0  avgt       3175.214          ms/op
InlineBeginBenchmark.burstWrite                  true              0.2  avgt       3058.890          ms/op

@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Jul 18, 2020

Benchmark for a one transaction per second scenario, 20% writes and 80% reads.

Using prepared sessions

Method Prepare Count Time #/sec Min Max Avg
oneQps prep=Y 65 16180 4.0/s 118ms 1109ms 248.9
oneQps prep=Y 67 16192 4.1/s 120ms 1112ms 241.7
oneQps prep=Y 72 16992 4.2/s 122ms 682ms 236.0
oneQps prep=Y 66 15647 4.2/s 120ms 688ms 237.1
oneQps prep=Y 76 18680 4.1/s 121ms 1429ms 245.8
oneQps prep=Y 67 16550 4.0/s 119ms 605ms 247.0
oneQps prep=Y 61 14805 4.1/s 120ms 826ms 242.7
oneQps prep=Y 71 19090 3.7/s 120ms 1011ms 268.9
oneQps prep=Y 73 18431 4.0/s 121ms 609ms 252.5
oneQps prep=Y 67 14332 4.7/s 121ms 566ms 213.9
              243.4

Using inlined BeginTransaction

Method Prepare Count Time #/sec Min Max Avg
oneQps prep=N 74 17709 4.2/s 125ms 1897ms 239.3
oneQps prep=N 68 14057 4.8/s 126ms 758ms 206.7
oneQps prep=N 66 13551 4.9/s 120ms 529ms 205.3
oneQps prep=N 61 14169 4.3/s 127ms 495ms 232.3
oneQps prep=N 71 16553 4.3/s 125ms 788ms 233.1
oneQps prep=N 70 15774 4.4/s 121ms 595ms 225.3
oneQps prep=N 75 17562 4.3/s 122ms 569ms 234.2
oneQps prep=N 72 14216 5.1/s 122ms 610ms 197.4
oneQps prep=N 67 14875 4.5/s 120ms 594ms 222.0
oneQps prep=N 71 16011 4.4/s 119ms 557ms 225.5
              222.1

@olavloite olavloite marked this pull request as ready for review Jul 18, 2020
@olavloite olavloite requested review from skuruppu and thiagotnunes Jul 18, 2020
@olavloite olavloite added the kokoro:force-run label Jul 18, 2020
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run label Jul 18, 2020
@olavloite olavloite added the do not merge label Jul 20, 2020
@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Jul 20, 2020

I temporarily added 'Do not merge' as there is one scenario that I just thought of that hasn't been tested yet: If the first statement of a transaction is a streaming SQL statement, and that statement triggers a retry of the stream halfway (i.e. with a resume token), then the retry should not include a BeginTransaction option, but reuse the current transaction.

@skuruppu
Copy link
Contributor

@skuruppu skuruppu commented Jul 21, 2020

@olavloite I haven't looked at the implementation yet but there has been a lot of discussion recently about this because the C++ client was implemented this way. But there are cases where inlining the begin transaction call can lead to transactions being committed that have failed. googleapis/google-cloud-cpp#4516 is the C++ issue.

They're in the process of confirming the idea for a fix so I think we should wait for that before proceeding with this change.

@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Jul 30, 2020

@olavloite I haven't looked at the implementation yet but there has been a lot of discussion recently about this because the C++ client was implemented this way. But there are cases where inlining the begin transaction call can lead to transactions being committed that have failed. googleapis/google-cloud-cpp#4516 is the C++ issue.

They're in the process of confirming the idea for a fix so I think we should wait for that before proceeding with this change.

I've looked into this issue and the problem is very subtle. Cloud Spanner treats constraint and non-constraint errors that occur during a read/write transaction differently. Assume the following initial state:

CREATE TABLE T (K STRING(MAX), V INT64) PRIMARY KEY (K);

BEGIN TRANSACTION;
INSERT INTO T (K, V) VALUES ('One', 1);
COMMIT;

Then we try to execute the following transaction in the Java client using the normal prepared transactions from the session pool:

client
  .readWriteTransaction()
  .run(
    new TransactionCallable<Long>() {
      @Override
      public Long run(TransactionContext transaction) throws Exception {
        try {
          // This fails with a unique key constraint violation exception.
          transaction.executeUpdate(Statement.of("INSERT INTO T (K, V) VALUES ('One', 1)"));
        } catch (SpannerException e) {
          assertThat(e.getErrorCode()).isEqualTo(ErrorCode.ALREADY_EXISTS);
        }
        // This statement will also fail because the previous statement failed with a constraint violation.
        transaction.executeUpdate(Statement.of("INSERT INTO T (K, V) VALUES ('Two', 2)"));
      }
    });

The above transaction will fail because a constraint violation during a read/write transaction will invalidate that entire transaction, even though the error is caught and ignored by the client.

The following transaction will however succeed:

client
  .readWriteTransaction()
  .run(
    new TransactionCallable<Long>() {
      @Override
      public Long run(TransactionContext transaction) throws Exception {
        try {
          // This fails with an invalid argument exception (VALUES keyword missing).
          transaction.executeUpdate(Statement.of("INSERT INTO T (K, V) ('One', 1)"));
        } catch (SpannerException e) {
          assertThat(e.getErrorCode()).isEqualTo(ErrorCode.INVALID_ARGUMENT);
        }
        // This statement will succeed as an INVALID_ARGUMENT error does not fail the entire transaction.
        transaction.executeUpdate(Statement.of("INSERT INTO T (K, V) VALUES ('Two', 2)"));
      }
    });

If we include the BeginTransaction option with the first statement of a transaction, both the above transactions would succeed. The reason for that is that in both transactions the first statement will fail, which means that it will also not return a transaction id, and the BeginTransaction option will then also be included with the second statement. This statement succeeds and the transactions can commit.

The reason that a constraint violation exception (and other data exceptions) should not be allowed to commit in this case is that it technically contains two transactions, which again could cause some very subtle problems. Consider the following (not very efficient) example:

client
  .readWriteTransaction()
  .run(
    new TransactionCallable<Long>() {
      @Override
      public Long run(TransactionContext transaction) throws Exception {
        try {
          transaction.executeUpdate(Statement.of("INSERT INTO T (K, V) VALUES ('One', 2)"));
        } catch (SpannerException e) {
          if (e.getErrorCode() == ErrorCode.ALREADY_EXISTS) {
            // The row already exists, update it to ensure it has V=2.
            transaction.executeUpdate(Statement.of("UPDATE T SET V=2 WHERE K='One'"));
          }
        }
      }
    });

This transaction will always return an error when working with a prepared transaction. That is good.

It will always commit and return successfully when including BeginTransaction with the first statement of the transaction (or the second statement if the first fails, as is the case here). That is not good.

The problem with letting this transaction succeed is that the two statements are actually in two different transactions (each include a BeginTransaction option), which again means that a third and unrelated transaction could in theory have deleted the row with K='One' exactly in the time between the two statements, which again means that the UPDATE T SET V=2 WHERE K='One' statement does not update anything.

The proposed solution in googleapis/google-cloud-cpp#4516 is to do the following when the first statement of a transaction returns with an error:

  1. The first statement of the transaction includes a BeginTransaction option, but the statement fails and does not return a transaction id.
  2. The client should then call the BeginTransaction RPC explicitly to start a transaction.
  3. The statement from 1 is retried on the new transaction. The result of that retried statement is returned to the client application. This will most probably be an error, but it could be that the statement succeeds during the retry as it is executed in a different transaction.
  4. Execute the rest of the statements in the transaction normally and let the backend decide whether the transaction is allowed to proceed or not.

I've added an integration test that shows the subtle difference here:

public void testTxWithConstraintError() {
assumeFalse(
"Emulator does not recover from an error within a transaction",
env.getTestHelper().isEmulator());
// First insert a single row.
client.writeAtLeastOnce(
ImmutableList.of(
Mutation.newInsertOrUpdateBuilder("T").set("K").to("One").set("V").to(1L).build()));
long updateCount =
client
.readWriteTransaction()
.run(
new TransactionCallable<Long>() {
@Override
public Long run(TransactionContext transaction) throws Exception {
try {
// Try to insert a duplicate row. This statement will fail. When the statement
// is executed against an already existing transaction (i.e.
// inlineBegin=false), the entire transaction will remain invalid and cannot
// be committed. When it is executed as the first statement of a transaction
// that also tries to start a transaction, then no transaction will be started
// and the next statement will start the transaction. This will cause the
// transaction to succeed.
transaction.executeUpdate(
Statement.of("INSERT INTO T (K, V) VALUES ('One', 1)"));
fail("missing expected exception");
} catch (SpannerException e) {
assertThat(e.getErrorCode()).isEqualTo(ErrorCode.ALREADY_EXISTS);
}
return transaction.executeUpdate(
Statement.of("INSERT INTO T (K, V) VALUES ('Two', 2)"));
}
});
assertThat(updateCount).isEqualTo(1L);
}

  • This test case will fail when using a prepared transaction (this is actually the good behavior).
  • This test case will succeed when including BeginTransaction with the first statement.

@skuruppu
Copy link
Contributor

@skuruppu skuruppu commented Aug 6, 2020

Thanks heaps @olavloite for the investigation and for the detailed explanation. From the last couple of commits, I think you have also added a fix.

The last decision we need to make is whether we should remove the option and make this the default implementation. I want to follow up with the internal conversation to make sure that this idea is reasonable. I will then review it.

Sorry about the time it has taken me to get to this. But hopefully we can merge this soon and further simplify the session pool.

@skuruppu
Copy link
Contributor

@skuruppu skuruppu commented Aug 6, 2020

@olavloite based on a question the C++ team was asking, what do you return if the BeginTransaction() call fails? Do you return the error from BeginTransaction() or the error from the first transaction attempt?

@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Aug 6, 2020

@skuruppu

The last decision we need to make is whether we should remove the option and make this the default implementation. I want to follow up with the internal conversation to make sure that this idea is reasonable. I will then review it.

What should also be taken into consideration in that discussion is the following: There is one scenario where this implementation is less efficient than the current implementation that uses prepared sessions from the pool, and that is when a read/write transaction only does a blind write of one or more Mutations. In those transactions there is no DML or query statement that can be used to include the BeginTransaction option with, which means that it will require an additional BeginTransaction RPC in the transaction itself.

@olavloite
Copy link
Contributor Author

@olavloite olavloite commented Aug 6, 2020

@olavloite based on a question the C++ team was asking, what do you return if the BeginTransaction() call fails? Do you return the error from BeginTransaction() or the error from the first transaction attempt?

The current implementation returns the error from BeginTransaction.

@codecov
Copy link

@codecov codecov bot commented Sep 16, 2020

Codecov Report

Merging #325 into master will decrease coverage by 0.04%.
The diff coverage is 84.21%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #325      +/-   ##
============================================
- Coverage     84.04%   84.00%   -0.05%     
+ Complexity     2540     2500      -40     
============================================
  Files           140      140              
  Lines         13891    13726     -165     
  Branches       1329     1314      -15     
============================================
- Hits          11675    11530     -145     
+ Misses         1672     1652      -20     
  Partials        544      544              
Impacted Files Coverage Δ Complexity Δ
.../google/cloud/spanner/MetricRegistryConstants.java 95.23% <ø> (ø) 1.00 <0.00> (ø)
...om/google/cloud/spanner/TransactionRunnerImpl.java 84.03% <78.52%> (-2.35%) 9.00 <0.00> (ø)
...ain/java/com/google/cloud/spanner/SessionPool.java 86.82% <90.69%> (+0.11%) 71.00 <0.00> (-39.00) ⬆️
.../com/google/cloud/spanner/AbstractReadContext.java 83.12% <100.00%> (+0.10%) 51.00 <1.00> (+4.00)
...va/com/google/cloud/spanner/AbstractResultSet.java 83.39% <100.00%> (+0.21%) 28.00 <0.00> (ø)
...gle/cloud/spanner/AsyncTransactionManagerImpl.java 77.14% <100.00%> (+6.83%) 13.00 <0.00> (+1.00)
...a/com/google/cloud/spanner/DatabaseClientImpl.java 84.15% <100.00%> (-1.17%) 19.00 <2.00> (-6.00)
...a/com/google/cloud/spanner/SessionPoolOptions.java 89.65% <100.00%> (-0.12%) 16.00 <0.00> (-1.00)
...m/google/cloud/spanner/TransactionManagerImpl.java 88.67% <100.00%> (+0.21%) 20.00 <0.00> (+2.00)
.../cloud/spanner/spi/v1/SpannerErrorInterceptor.java 60.00% <0.00%> (-5.00%) 3.00% <0.00%> (ø%)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 768c19d...28277ff. Read the comment docs.

@olavloite olavloite requested a review from as a code owner Sep 28, 2020
@olavloite olavloite requested a review from as a code owner Oct 5, 2020
@@ -190,36 +198,7 @@ void ensureTxn() {
ApiFuture<Void> ensureTxnAsync() {
final SettableApiFuture<Void> res = SettableApiFuture.create();
if (transactionId == null || isAborted()) {
span.addAnnotation("Creating Transaction");
Copy link
Contributor Author

@olavloite olavloite Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to the createTxnAsync method to be re-usable.

}
latch.addListener(
Copy link
Contributor Author

@olavloite olavloite Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of code is moved to CommitRunnable

Copy link
Contributor

@thiagotnunes thiagotnunes left a comment

After a conversation IRL with @olavloite and reviewing the technical details, this looks good to me. We won't release this as is though, on the next PR we should completely remove prepared sessions in the pool. This way, the implementation can be simplified.

olavloite added 2 commits Oct 21, 2020
* feat: remove session pool preparing

* fix: fix integration tests

* test: fix malformed retry loop in test case

* fix: review comments
private volatile SettableApiFuture<ByteString> transactionIdFuture = null;

volatile ByteString transactionId;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

// The first PartialResultSet will be returned successfully, and then a DATA_LOSS exception will be returned.
mockSpanner.setExecuteStreamingSqlExecutionTime(
SimulatedExecutionTime.ofStreamException(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The first PartialResultSet will be returned successfully, and then a DATA_LOSS exception will be returned.
mockSpanner.setExecuteStreamingSqlExecutionTime(
SimulatedExecutionTime.ofStreamException(
// The first PartialResultSet will be returned successfully, and then a DATA_LOSS exception will
// be returned.
SimulatedExecutionTime.ofStreamException(Status.DATA_LOSS.asRuntimeException(), 1));

public List<ByteString> getTransactionsStarted() {
return new ArrayList<>(transactionsStarted);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@thiagotnunes thiagotnunes merged commit d08d3de into master Oct 23, 2020
20 of 21 checks passed
@thiagotnunes thiagotnunes deleted the inline-begin-tx branch Oct 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner cla: yes do not merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants