spanner: timeout / context canceled during getting session #7527
Comments
Hello @ericwenn |
We are using the default sessionConfig (ie |
Thanks for quick answer, do you have QPS numbers with 1 client |
Avg 1 QPS |
We've got some reports (mentioned in GitHub above) from folks using v1.42.0 |
@rahul2393 any ideas on this? |
Hello @ericwenn We are trying to replicate the issue at our end with session config having min/max=1 session, will update here if we find anything. Feel free to share any code you have to help replicate quickly. |
Thanks for the update. We have unfortunately not been able to replicate this issue consistently. |
Hello @rahul2393, we were opening a ReadOnlyTransaction without closing it after we were done. We are deploying that fix right now, and will keep an eye out. Off-topic: Do you have any ideas on how to systematically prevent these types of issues, for example by failing tests if the Spanner client still has open sessions when it is shut down (or similar)? |
Nice @ericwenn, I think that's why I was not able to replicate the issue before: I was closing them in my replication code. Currently we don't have a way to prevent this; we need to check whether the clients for other languages handle this scenario. |
@ericwenn Closing this ticket since you already found the issue. I will create another ticket for preventing the client from having open sessions when it is shut down. |
Client
Spanner
Environment
Managed Cloud Run
Description
Since upgrading spanner library to v1.43.0 we have started seeing intermittent timeout/context canceled issues when getting sessions, with error message:
After this happens once, the Spanner client is unable to get any sessions, which means all requests to our service time out after 10s (our configured request deadline) until the instance is restarted.
^Traces from when issue happens, until instance is manually restarted (~4:20 - 4:30).
Looking at the traces does not add more information for debugging. The only culprit is
cloud.google.com/go/spanner.Query (9974.995 ms)
which seems to block everything. We have not been able to reproduce this issue consistently; it seems to happen randomly every 1-2 days.
When we first noticed this, we rolled back to v1.42.0 and have not seen the issue on that version (running in production for ~1 week).
Looking at recent issues in this repository we tried to bump to the unreleased version based on the fix for this issue, but still saw the issues on that commit.
Judging from the changes between v1.42.0 and v1.43.0, these changes seem to be the culprit, but I'm not sure of that.