[Microsoft][ODBC Driver 17 for SQL Server]The connection is broken and recovery is not possible. The connection is marked by the client driver as unrecoverable. No attempt was made to restore the connection. #9227
Comments
Thanks for bringing this up. My guess here is that we're not doing proper connection pool validation (examples: https://confluence.atlassian.com/conf76/surviving-database-connection-closures-1018769693.html) and that behind the scenes GCP is doing maintenance / moving DBs around (as is their operational model). Could you share what you have set for idle timeout and max connections for the DB in question in the Hasura Console --> Data tab? |
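For readers unfamiliar with the term, "connection pool validation" usually means testing a pooled connection before handing it to a caller, and replacing it if the test fails. Below is a minimal sketch of that idea in Python. It is not Hasura's implementation; `ValidatingPool`, `connect`, and `is_alive` are hypothetical names, and in a real setup `is_alive` would typically run a cheap query such as `SELECT 1`.

```python
import queue
from typing import Callable, Generic, TypeVar

C = TypeVar("C")


class ValidatingPool(Generic[C]):
    """Hypothetical pool that tests each idle connection before lending it out."""

    def __init__(
        self,
        connect: Callable[[], C],         # opens a new connection (assumed helper)
        is_alive: Callable[[C], bool],    # validation check (assumed helper)
        max_size: int = 5,
    ) -> None:
        self._connect = connect
        self._is_alive = is_alive
        self._idle = queue.LifoQueue(maxsize=max_size)

    def acquire(self) -> C:
        # Work through idle connections until one passes validation.
        while True:
            try:
                conn = self._idle.get_nowait()
            except queue.Empty:
                # Nothing usable left in the pool: open a fresh connection.
                return self._connect()
            if self._is_alive(conn):
                return conn
            # Validation failed: drop this connection and try the next one.
            # A real pool would also close it here.

    def release(self, conn: C) -> None:
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            pass  # Pool is already full, so the connection is simply discarded.
```

With validation on borrow, a connection that the server or the network dropped while it sat idle is replaced transparently instead of surfacing an error to the caller.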
@ajohnson1200 Thanks for the response.
|
Hi @julianomcl. I've been digging into this issue and unfortunately can't reproduce it. The idle timeout of 5 seconds that you have set should mean that any connections to the database are either dropped within 5 seconds, or used once and then discarded, as we drop any connections where we receive an error (for exactly this reason). As far as I can tell this has been the case since versions of Hasura pre-v2.10. In addition, I would have thought that Google Cloud Run would be periodically restarting your containers anyway (the documentation suggests that there's a maximum container lifetime of 1 hour, but I might be wrong here). Are you using a mechanism which makes the Hasura container persist longer than this? Can you tell me if you're still seeing these issues? And have you tried upgrading Hasura to see if it makes the problem go away? |
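The two behaviours described above (dropping connections that have been idle longer than the timeout, and never returning a connection to the pool after an error) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Hasura's actual pool; `IdleTimeoutPool`, `connect`, and the injectable `clock` are hypothetical names.

```python
import time
from contextlib import contextmanager


class IdleTimeoutPool:
    """Hypothetical pool with an idle timeout that discards connections on error."""

    def __init__(self, connect, idle_timeout=5.0, clock=time.monotonic):
        self._connect = connect            # opens a new connection (assumed helper)
        self._idle_timeout = idle_timeout  # seconds a connection may sit unused
        self._clock = clock                # injectable so the timeout can be tested
        self._idle = []                    # (connection, time it was returned)

    def _pop_fresh(self):
        # Skip over connections that have been idle for too long.
        while self._idle:
            conn, returned_at = self._idle.pop()
            if self._clock() - returned_at < self._idle_timeout:
                return conn
            # Expired: a real pool would close the connection here.
        return None

    @contextmanager
    def connection(self):
        conn = self._pop_fresh()
        if conn is None:
            conn = self._connect()
        try:
            yield conn
        except Exception:
            # An error occurred while the connection was in use, so it is
            # not returned to the pool; the next caller gets a new one.
            raise
        else:
            self._idle.append((conn, self._clock()))
```

Under this scheme a broken connection should be seen by at most one request, which is why a pool that stays broken until the container restarts is surprising.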
Hello, I work with @julianomcl. Thanks for all the work you've already done debugging this issue, but we are still having it. Hasura is connected to two DBs, one MSSQL and one Postgres. These problems only happen on the MSSQL database and lock all queries. After a container restart on Cloud Run, the error is resolved and everything goes back to normal until we get hit by the error again. Could you please provide us with some guidance on how to further debug this? Here is some more information that could help: Connection String Hasura Settings Cloud Run settings |
I'm connecting to a MS SQL Server database, running in Azure and encountering the same issue :/ I'm running |
I'm trying to understand better how vital this bug is to address. I'm preparing to put my Hasura Cloud instance into its first production environment and expect these connection errors to attempt to self-heal. I have this happen at least once every other week, leading to my entire site being down. My only way around this issue is to go into the Hasura Cloud dashboard and update an env var, which triggers a restart of the server. I've hooked Hasura up to numerous Postgres services without any issues. Is anyone running SQL Server instances on top of Hasura Cloud in a production capacity? Surely there are others with this problem. I know these issues are hard to reproduce. Fwiw -- it happens when I'm actively developing, e.g., adding new tables, changing permissions, etc. |
Hey folks, after a lot of prodding, I have come no closer to reproducing this. Could you please give me as much information as you can about your database setup? For example, the last test I ran was on a Microsoft Azure SQL Database, with the pricing tier "General Purpose - Serverless: Gen5, 1 vCore". It'd also be helpful to get exact version information:
|
Hi, @SamirTalwar, thank you for your assistance with this matter. I have executed the query and obtained the following result:
|
Version info as requested:
I'm using Azure's fully-managed SQL Server:
Let me know if I can provide any additional information to help you debug. If I discover any sort of pattern to these outages, I'll be sure to post here as well. Given that it doesn't happen very often, it might be a while 😮‍💨 |
Hi folks, we're having trouble getting to the bottom of this. To find out if it's our own connection pooling mechanism or not, we have added a new toggle in v2.30.1. You can set your pool settings to When you have access to v2.30.1, please give this a shot and let us know if it resolves the issue or not. Either way, it will be valuable information for tracking down the issue. Thanks. |
Happy to do so, Samir. I haven't used the CLI much, but should be able to get this done by EOW. I'll let you know when I'm able to flip the bit and hopefully I can reproduce the issue in the coming weeks 🤞🏻 |
An update from us to help you debug this: we migrated our Hasura deployment from Cloud Run to GKE. Both were using GCP VPN to connect to the database, and all Hasura configuration stayed the same. We have now gone 22 days without a problem. |
@SamirTalwar, perhaps I'm being dense, but I don't see an obvious way to update my pool settings via hasura CLI. I've configured my CLI to point to the Hasura cloud instance via HASURA_GRAPHQL_ENDPOINT & HASURA_GRAPHQL_ADMIN_SECRET env vars. I've looked at the help documentation but I need help finding the right command. Hasura version: |
@lucasnad27: Using the CLI, you'll need to export your metadata with
Then re-apply them with I highly recommend upgrading your CLI to match the server version first. |
@SamirTalwar thanks for the guidance. Very helpful. I've updated the setting -- found in I'm hoping my staging hasura instance has enough traffic to replicate the issue I'm seeing in production (happened again a few days ago). If I don't see any issues on the staging database over the coming days, I'll make the same change to my production server. I'll keep this thread updated with my progress & findings. |
Great, thanks! Assuming it makes it to your production instance and you're monitoring machine statistics, it would be great to know if there are major changes to CPU load, memory usage, or network traffic anywhere. If you still have issues, please let me know. |
Promoted metadata update to production a few minutes ago. I don't have a ton of traffic on that instance yet, but it should be enough to give us some data points. I'll circle back early next week on this thread. |
Version Information
Server Version: v2.10.0
CLI Version (for CLI related issue): v2.10.0
What is the current behaviour?
We have Hasura hosted on Cloud Run on Google Cloud Platform, connected to a Postgres database and a SQL Server database.
From time to time, Hasura loses its connection to the SQL Server database and is unable to reconnect.
We are receiving this exception message:
What is the expected behaviour?
We would like Hasura to reconnect to SQL Server automatically, so we don't have to release a new revision on Cloud SQL to recover the connection.
Please provide any traces or logs that could help here.
This is the log we have from Cloud Run:
Keywords
sql server connection broken