Fix query cancellation collateralizing future queries using the same connection #1000

Merged: 1 commit into lib:master on Nov 24, 2020

Conversation

@kahuang (Contributor) commented on Oct 2, 2020

This change solves the following issue we were seeing in production:

  1. A context would get cancelled due to a timeout.
  2. watchCancel would get triggered and start to send the cancellation.
  3. Because the cancellation happens concurrently with the normal query path, and doesn't block the finish() path until it's done cancelling the query on the Postgres side, this connection could be returned to the pool before getting cancelled.
  4. This connection then gets reused, and gets collateralized by the previous query's cancel request (see the sketch after this list).
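
To make the race easier to see, here is a rough, simplified sketch of the kind of watchCancel/finish shape described above. All names are hypothetical stand-ins, not the actual lib/pq source:

```go
package sketch

import "context"

// conn stands in for the driver connection (hypothetical).
type conn struct{}

// cancel dials a separate connection and sends a cancel request for the
// query currently running on cn (stubbed out here).
func (cn *conn) cancel() error { return nil }

// watchCancel sketches the pre-fix shape: the cancellation runs
// concurrently with the normal query path.
func (cn *conn) watchCancel(ctx context.Context) func() {
	done := ctx.Done()
	if done == nil {
		return nil
	}
	finished := make(chan struct{})
	go func() {
		select {
		case <-done:
			// Steps 2-3: send the cancel request. Even once cancel()
			// returns, the server may process the cancellation later,
			// after the original query has already returned an error
			// to the caller.
			_ = cn.cancel()
			finished <- struct{}{}
		case <-finished:
			// The query finished before the context was cancelled.
		}
	}()
	// finish() is called when the query returns. It only synchronizes with
	// the watcher goroutine; nothing waits for the server to actually act
	// on the cancellation, so the connection can go back into the pool
	// (step 4) while the cancel is still in flight.
	return func() {
		select {
		case <-finished:
		case finished <- struct{}{}:
		}
	}
}
```

Because the PostgreSQL cancel protocol targets the backend connection (by process ID and secret key) rather than a specific statement, a cancel request that lands late cancels whatever that connection happens to be running next.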

The changes I've made to ensure we don't run into this collateral damage:

  1. When the context is cancelled, we immediately try to send the finished signal to the finished channel to prevent a race. Either we successfully send this, or the query we tried to cancel finished first. In the latter case, we just return, and let a future query (if one happens) trigger the cancellation.
  2. In the former case we set the connection state to bad so it doesn't get reused, and cancel the query.
  3. When the finish() function gets called, the querier goroutine closes the connection (see the sketch below).

Note: We probably don't need to change bad to an atomic variable, but I did just to be safe.
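
A rough sketch of the fixed flow, under the same hypothetical names as above (the shape of the idea, not the exact diff):

```go
package sketch

import (
	"context"
	"sync/atomic"
)

// conn stands in for the driver connection (hypothetical).
type conn struct {
	bad int32 // set atomically; non-zero means "do not reuse this connection"
}

func (cn *conn) cancel() error { return nil } // sends the cancel request (stub)
func (cn *conn) close()        {}             // tears down the connection (stub)

func (cn *conn) watchCancel(ctx context.Context) func() {
	done := ctx.Done()
	if done == nil {
		return nil
	}
	finished := make(chan struct{}, 1)
	go func() {
		select {
		case <-done:
			// (1) Immediately try to claim the finished slot. If the query
			// path claimed it first, the query already completed: just
			// return and let a future query handle the cancelled context.
			select {
			case finished <- struct{}{}:
			default:
				return
			}
			// (2) We won the race: mark the connection bad so it is never
			// returned to the pool, then send the cancel request.
			atomic.StoreInt32(&cn.bad, 1)
			_ = cn.cancel()
		case <-finished:
			// The query finished normally before the context was cancelled.
		}
	}()
	// finish() is called by the querier goroutine when the query returns.
	return func() {
		select {
		case <-finished:
			// (3) The watcher claimed the slot, so a cancel is in flight:
			// close the connection instead of letting it be reused.
			cn.close()
		case finished <- struct{}{}:
			// Normal completion: tell the watcher to stand down.
		}
	}
}
```

The one-slot buffer is what lets the watcher claim the race without blocking; whichever side gets there first decides whether the connection is closed or reused.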

@maddyblue (Collaborator) left a comment

Please squash your commits and fix the test flake and I think I can merge this.

@kahuang (Contributor, Author) commented on Nov 23, 2020

I've updated the tests and squashed the commits

@maddyblue (Collaborator) commented:

Need to remove Go 1.13, which doesn't support some of the sync features used here. Doing that in #1014.

@maddyblue merged commit aecc811 into lib:master on Nov 24, 2020
@jwatte commented on Dec 4, 2020

This change causes a public API change that breaks our app.
We start a transaction in a WithCancel() context, run a select, cancel the context, and run another select.
The error returned when transacting on a canceled context used to be:
"sql: transaction has already been committed or rolled back"
It now instead returns "driver: bad connection", which is not just a breaking change but arguably the wrong error.
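
For reference, a minimal reproduction of that sequence might look roughly like this (hypothetical DSN; the before/after error strings are the ones reported in this thread):

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Hypothetical DSN; adjust for your environment.
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		log.Fatal(err)
	}

	// First select on the transaction succeeds.
	rows, err := tx.QueryContext(ctx, "SELECT 1")
	if err != nil {
		log.Fatal(err)
	}
	rows.Close()

	// Cancel the context, then run another select on the same transaction.
	cancel()
	_, err = tx.QueryContext(ctx, "SELECT 1")

	// Reported before this PR: "sql: transaction has already been committed or rolled back"
	// Reported after this PR:  "driver: bad connection"
	// (The exact error can depend on timing.)
	fmt.Println(err)
}
```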

@maddyblue (Collaborator) commented:

If you submit a PR that reverts this change and adds a test to prevent this regression in the future I'll merge it.

@NOMORECOFFEE commented on Jan 16, 2021

Hello.
It's possible that connections remain in the pool in the bad (ErrBadConn) state,
and a connection then drops out on the next request.
Should we implement SessionResetter in this case?
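
For context, database/sql has had the driver.SessionResetter interface since Go 1.10; a driver connection that knows it has been poisoned can veto reuse there. A minimal sketch of the idea, with a hypothetical conn type rather than lib/pq's actual one:

```go
package sketch

import (
	"context"
	"database/sql/driver"
	"sync/atomic"
)

// conn stands in for a driver connection (hypothetical).
type conn struct {
	bad int32 // non-zero once the connection has been poisoned by a cancel
}

// Compile-time check that conn implements driver.SessionResetter.
var _ driver.SessionResetter = (*conn)(nil)

// ResetSession is called by database/sql before a pooled connection is
// reused. Returning driver.ErrBadConn tells the pool to discard this
// connection rather than hand it to the next caller.
func (cn *conn) ResetSession(ctx context.Context) error {
	if atomic.LoadInt32(&cn.bad) != 0 {
		return driver.ErrBadConn
	}
	return nil
}
```

Because database/sql retries most top-level operations on driver.ErrBadConn, a connection rejected here is dropped and the caller's query runs on a fresh one instead of failing.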

craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Aug 10, 2021
68577: protectedts,kvserver: add UpdateTimestamp method to the storage interface r=ajwerner a=adityamaru

This change introduces an UpdateTimestamp method that can be used to
update the timestamp protected by a protected timestamp record.

A subsequent commit will add some logic to the verification code to ensure
that an update is picked up correctly during the Verify() call.

Release note: None

kvserver: modify pts verification to check Timestamp as well

Previously, when verifying a pts record we would find the record
we were attempting to verify in the cache on the basis of the
record ID. This was correct until we added a way to update a record's
Timestamp in the previous commit.

If a record is updated with a new ts, and a verification request
is sent for this updated record, then we want to ensure that the
request is not serviced on the basis of the old record. This
is possible if the cache is too stale.

To prevent this we now match both the record ID and the timestamp
to protect after, when finding the cache entry corresponding to
the request. In this way, if we do see the older record then the
verification will fail the first time around, the cache will refresh,
and we will see the new record the second time around.

Release note: None

68665: workload: ignore ErrBadConn if context has been canceled r=otan a=rafiss

fixes: #68574
fixes: #68585
refs: lib/pq#1000

lib/pq can return this error if the context has been canceled.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
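
The workaround in the workload commit above amounts to treating ErrBadConn as benign when the caller's own context is already done. A rough illustration, with a hypothetical helper rather than the actual cockroachdb code:

```go
package sketch

import (
	"context"
	"database/sql/driver"
	"errors"
)

// ignorableAfterCancel reports whether err can be ignored on the grounds
// that the caller's context was cancelled: with lib/pq#1000, a query whose
// context is cancelled can surface driver.ErrBadConn.
func ignorableAfterCancel(ctx context.Context, err error) bool {
	return ctx.Err() != nil && errors.Is(err, driver.ErrBadConn)
}
```
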
@knz commented on Feb 11, 2022

> This change causes a public API change that breaks our app.

We've just run into the same issue inside CockroachDB. A query cancellation should not drop the connection.

@rafiss (Collaborator) commented on Aug 29, 2022

I think #1079 will address the issues with this PR

@evanj (Contributor) commented on Jan 29, 2023

I have updated #1079 with the code review comments in case it helps.
