Hydra write to database: broken pipe #1599
Using these settings is the appropriate way of handling this. You did not provide the version string, but we're using best effort defaults to work around a broken connection pool.
Unless the driver gives us an indicator that a connection should be retried, this would not make sense, because errors such as 404, 409, or 400 should not be retried with (exponential) backoff, or else all of these errors would take forever to surface. On top of that, the sql connection pool (from Go) is actually capable of handling retries in connection-failure scenarios:
So this really looks like a driver issue to me, and not an issue we want to solve with "code on top" to fix something, especially if I would be open to set |
Seems reasonable; from the upstream issue it looks like the driver isn't properly marking those connections, so the sql.DB client can't retry them. |
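For context, the hook database/sql gives drivers for this is driver.ErrBadConn: when a driver returns it before anything was written to the server, the pool discards that connection and retries the statement on another one. A minimal sketch of that contract -- a hypothetical helper for illustration, not actual lib/pq code:

    package pqwrap // hypothetical package, for illustration only

    import (
        "database/sql/driver"
        "errors"
        "syscall"
    )

    // maybeBadConn maps transport-level failures (broken pipe, connection
    // reset) on a pooled connection to driver.ErrBadConn. database/sql then
    // throws the connection away and transparently retries the statement on
    // another connection instead of returning the error to the caller.
    func maybeBadConn(err error) error {
        if errors.Is(err, syscall.EPIPE) || errors.Is(err, syscall.ECONNRESET) {
            // Only safe if the request provably never reached the server;
            // otherwise a retry could execute the statement twice.
            return driver.ErrBadConn
        }
        return err
    }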
I'm not sure what a reasonable default would be. In my case I just have a Postgres box at a cloud provider and have run into this issue, but I haven't found the source of the connection closing yet. The value would need to be lower than whatever is closing the connections; in my case I went to the extreme (10s) since I need it to always work (and my load is pretty low right now). So again, not sure we want to set a default timeout (maybe just have some docs explaining the situation?). It seems that pgx (another Postgres driver for Go -- https://github.com/jackc/pgx) has this retry logic implemented -- jackc/pgx@4868929 So maybe a simpler solution is swapping drivers :/ |
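For reference, the knob that a connection-lifetime DSN option ultimately maps onto is the standard pool setting on *sql.DB. A minimal sketch (the DSN and the 10s value are placeholders mirroring the workaround above, not suggested defaults):

    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq" // the Postgres driver discussed in this thread
    )

    func main() {
        // Placeholder DSN -- substitute real credentials/host.
        db, err := sql.Open("postgres", "postgres://hydra:secret@db:5432/hydra?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        // Recycle connections before whatever sits between us and Postgres
        // (load balancer, proxy, server idle timeout) silently drops them.
        db.SetConnMaxLifetime(10 * time.Second)
        db.SetMaxIdleConns(5)

        if err := db.Ping(); err != nil {
            log.Fatal(err)
        }
    }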
Interesting, I didn't know this driver existed. Swapping the driver is definitely a bit of work because we have to handle error codes correctly etc. But if lib/pq fails to deliver it's a viable solution. |
It seems like |
I checked around a bit, and it doesn't seem that what you're trying to do (retrying on e.g. broken pipe errors) is something most sql driver implementations support, as it's not possible to distinguish whether the server accepted the request or not. From the official Go MySQL driver:
It does appear that there's a PR for checking MySQL pruned connections ( go-sql-driver/mysql#934 ) but I'm not sure if that's even an issue with PostgreSQL. If it was, and this was unsolved, I'm sure we could convince the maintainers to accept a similar PR. |
pgx also has a stdlib interface (https://github.com/jackc/pgx/tree/master/stdlib) exposing the same Go database/sql interface. TBH I haven't used it with CockroachDB, but I would expect it to work (since it speaks the PG API); as for mysql
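Roughly what the swap looks like, assuming pgx's stdlib package (the import path shown is the v4 module; earlier versions live at github.com/jackc/pgx/stdlib) and a placeholder DSN:

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/jackc/pgx/v4/stdlib" // registers the "pgx" database/sql driver
    )

    func main() {
        // Same database/sql interface as lib/pq; only the import and the
        // driver name change. The DSN is a placeholder.
        db, err := sql.Open("pgx", "postgres://hydra:secret@db:5432/hydra?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        if err := db.Ping(); err != nil {
            log.Fatal(err)
        }
    }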
IIRC there's a way to check at the syscall level -- but I'm not sure if that's easily accessible on the Go side of things (I assume it could be). This is really only an issue for idle connections that turn out to be broken when used. Now that I've set my idle timeout very short, I just don't see this issue anymore. It seems like we'd want |
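For what it's worth, the syscall-level check being referred to is roughly what go-sql-driver/mysql#934 does: a non-blocking one-byte read on the idle socket before reusing it. A Unix-only sketch, not tied to any particular driver:

    package connprobe

    import (
        "errors"
        "io"
        "net"
        "syscall"
    )

    var errUnexpectedRead = errors.New("unexpected read from socket")

    // connCheck reports whether an idle TCP connection still looks usable.
    // It performs a non-blocking one-byte read: EAGAIN/EWOULDBLOCK means the
    // socket is idle and healthy; EOF or unexpected data means it is not.
    func connCheck(conn net.Conn) error {
        sysConn, ok := conn.(syscall.Conn)
        if !ok {
            return nil // not a raw socket; nothing we can probe
        }
        rawConn, err := sysConn.SyscallConn()
        if err != nil {
            return err
        }

        var checkErr error
        err = rawConn.Read(func(fd uintptr) bool {
            var buf [1]byte
            n, rerr := syscall.Read(int(fd), buf[:])
            switch {
            case n == 0 && rerr == nil:
                checkErr = io.EOF // peer closed the connection
            case n > 0:
                checkErr = errUnexpectedRead // data we were not expecting
            case rerr == syscall.EAGAIN || rerr == syscall.EWOULDBLOCK:
                checkErr = nil // nothing to read: connection looks fine
            default:
                checkErr = rerr
            }
            return true // one probe only; do not wait for readiness
        })
        if err != nil {
            return err
        }
        return checkErr
    }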
Do you know if |
Anyways, I'm closing this here. The proposed solution to retry on failure is not an acceptable solution, as I've laid out in previous comments. This needs to be fixed in the driver itself. We could switch to |
If switching to pgx/stdlib is a one-liner, I'd be up for that |
Closes #1599 Co-authored-by: Gorka Lerchundi Osa <glertxundi@gmail.com>
Describe the bug
Requests to Hydra (at least when running with Postgres) can sometimes hit an "idle" (but actually closed) connection to the DB. In this scenario the caller gets back:
I've run into similar errors in other apps, and the issue is upstream in lib/pq as well (lib/pq#870).
I was able to work around this issue by setting
max_conn_lifetime=10s
in the DSN, as it avoids connections being held open that have actually been closed on the remote side.
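For example, a full DSN with the parameter appended (credentials and host are placeholders; the max_conn_lifetime query parameter at the end is the relevant part):

    postgres://hydra:secret@postgres:5432/hydra?sslmode=require&max_conn_lifetime=10s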
Reproducing the bug
This has been annoying to reproduce on-demand. I've noticed it when tokens get refreshed from the IDP -- specifically after a long time (8hrs in this case), long enough for the connection to have been closed out.
Expected behavior
I would expect that L4 (transport-level) errors such as this would be transparently retried instead of bubbling all the way up to the client.