Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanner: Client library sending a high rate of BeginTransaction requests when database is deleted #16

Closed
skuruppu opened this issue Nov 14, 2019 · 6 comments · Fixed by googleapis/google-cloud-java#6789 or #34
Assignees
Labels
api: spanner priority: p2 type: bug

Comments

@skuruppu
Copy link
Contributor

@skuruppu skuruppu commented Nov 14, 2019

Once a database is deleted and if the Java Spanner client is still left open, it starts sending a large number of BeginTransaction requests. The additional requests fail with status NOT_FOUND.

The following error just keeps repeating in the logs:

Nov 07, 2019 10:20:16 PM com.google.cloud.spanner.spi.v1.LoggingInterceptor$CallLogger log FINER: google.spanner.v1.Spanner/BeginTransaction[74f0a03d]: Start
Nov 07, 2019 10:20:16 PM com.google.cloud.spanner.spi.v1.LoggingInterceptor$CallLogger log FINER: google.spanner.v1.Spanner/BeginTransaction[74f0a03d]: Send: session: "projects/<project>/instances/<instance>/databases/<database>/sessions/<session>" options { read_write { } }
Nov 07, 2019 10:20:16 PM com.google.cloud.spanner.spi.v1.LoggingInterceptor$CallLogger log FINER: google.spanner.v1.Spanner/BeginTransaction[74f0a03d]: Closed with status Status{code=NOT_FOUND, description=Database not found: projects/<project>/instances/<instance>/databases/<database> resource_type: "type.googleapis.com/google.spanner.admin.database.v1.Database" resource_name: "projects/<project>/instances/<instance>/databases/<database>" description: "Database does not exist." , cause=null} and trailers Metadata(...)

I'm guessing this is happening in an attempt to replenish the pool.

@skuruppu
Copy link
Contributor Author

@skuruppu skuruppu commented Nov 14, 2019

@olavloite
Copy link
Contributor

@olavloite olavloite commented Nov 14, 2019

This is caused by an infinite loop that can occur during preparing sessions in the session pool. The session pool will try to prepare a session with a read/write transaction when a session is released into the pool. This process follows this logic:

  1. A session is released into the pool (either as part of the session pool initialization or because a session is returned to the pool by a client).
  2. If the current fraction of write-prepared sessions in the pool is less than the configured writeFraction, a BeginTransaction RPC will be scheduled.
  3. The BeginTransact RPC will fail with a 'Database does not exist.' error. This will cause the session to be released into the pool without a prepared read/write transaction.
  4. The release of the session into the pool will start the process at step 1 again. As no BeginTransaction call will succeed, the process will continue until the Spanner instance is closed.

@sebright
Copy link
Contributor

@sebright sebright commented Nov 14, 2019

@olavloite Thanks for fixing this! I cherry-picked googleapis/google-cloud-java@7c66ee4, and it seems to have fixed the issue for me.

@snehashah16
Copy link
Contributor

@snehashah16 snehashah16 commented Nov 20, 2019

  1. The BeginTransact RPC will fail with a 'Database does not exist.' error. This will cause the session to be released into the pool without a prepared read/write transaction.

@skuruppu
An orthogonal question related to this:
What should the client do when a database not found error is returned ?

options:

  • execute backend rpcs all customer requests and return NOT_FOUND (for database admin APIs for this specific database that got deleted)
  • for read / write APIs - should this invalidate the session pool (since sessions are no longer valid for a deleted db) and handle all subsequent requests with NOT_FOUND errors ?
    • this forces the customer to create a new db client after first NOT_FOUND error even though they recreate the database with the same name (but the backend database has obviously changed).
      DatabaseId db = DatabaseId.of(project, instance, database);
      DatabaseClient dbClient = spanner.getDatabaseClient(db);

@snehashah16 snehashah16 reopened this Nov 20, 2019
@chingor13 chingor13 transferred this issue from googleapis/google-cloud-java Jan 7, 2020
@chingor13 chingor13 added priority: p2 type: bug labels Jan 7, 2020
@olavloite
Copy link
Contributor

@olavloite olavloite commented Jan 9, 2020

@skuruppu @snehashah16
Regarding what to do when a Database not found error is returned: We could also allow the read/write APIs handle this in the same way as the database admin APIs do, i.e. keep sending RPCs and let these return NOT_FOUND errors. If the user re-creates the database with the same name, the session pool will automatically heal as Session not found errors are automatically retried by the client by discarding the old session and creating a new session.

@skuruppu
Copy link
Contributor Author

@skuruppu skuruppu commented Jan 10, 2020

My take on this is that since the Java client connection is at a DB level, if the DB is deleted, then this invalidates the current connection. Essentially option 2 that @snehashah16 mentioned so that the customer is expected to create a new client if they recreate a DB with the same name.

I'm a bit weary of sending RPCs continuously until NOT_FOUND errors are no longer returned. In most cases, I suspect once a DB is deleted, users don't recreate the same DB shortly after in a production app. They probably do such things in tests but for that we would expect them to rerun the tests with a new connection.

Let me know if I misunderstood something here.

olavloite added a commit that referenced this issue Jan 14, 2020
DatabaseClients should not continue to try to send RPCs to a database that has
been deleted. Instead, the session pool will keep track of whether a database
not found error has been returned for a database, and if so, will invalidate
itself. All subsequent calls for this database will return a DatabaseNotFoundException
without calling a RPC.

If a database is re-created, the user must create a new DatabaseClient with a new
session pool in order to resume usage of the database.

Fixes #16
olavloite added a commit that referenced this issue Jan 14, 2020
DatabaseClients should not continue to try to send RPCs to a database that has
been deleted. Instead, the session pool will keep track of whether a database
not found error has been returned for a database, and if so, will invalidate
itself. All subsequent calls for this database will return a DatabaseNotFoundException
without calling a RPC.

If a database is re-created, the user must create a new DatabaseClient with a new
session pool in order to resume usage of the database.

Fixes #16
olavloite added a commit that referenced this issue Jan 22, 2020
DatabaseClients should not continue to try to send RPCs to a database that has
been deleted. Instead, the session pool will keep track of whether a database
not found error has been returned for a database, and if so, will invalidate
itself. All subsequent calls for this database will return a DatabaseNotFoundException
without calling a RPC.

If a database is re-created, the user must create a new DatabaseClient with a new
session pool in order to resume usage of the database.

Fixes #16
olavloite added a commit that referenced this issue Jan 22, 2020
* fix: stop sending rpcs on deleted db

* fix: client should stop sending rpcs after database dropped

DatabaseClients should not continue to try to send RPCs to a database that has
been deleted. Instead, the session pool will keep track of whether a database
not found error has been returned for a database, and if so, will invalidate
itself. All subsequent calls for this database will return a DatabaseNotFoundException
without calling a RPC.

If a database is re-created, the user must create a new DatabaseClient with a new
session pool in order to resume usage of the database.

Fixes #16

* fix: remove double check on isValid

* fix: add wait to deleted db integration test

* fix: process review comments

* fix: update copyright year
@google-cloud-label-sync google-cloud-label-sync bot added the api: spanner label Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner priority: p2 type: bug
Projects
None yet
5 participants