At time DB reconnect fail with read tcp [::1]:49528->[::1]:5432: read: connection reset by peer #835
Comments
Having the same issue here. I believe this is a regression introduced about a month ago; I never had TCP connection issues stopping pq from opening a transaction or starting a query before. In my case the error occurs quite often when opening a new transaction. The underlying connection has timed out on the postgres side, and then there's the "read: connection reset by peer" error. |
@GeertJohan See #871 (comment) for workarounds. |
I have since moved to pgx, as the readme also advertises. V4 of pgx also correctly handles connection issues. Pq has been awesome; it helped me a lot and I'm super grateful to everyone who contributed to pq. Thanks!! That said, it seems pgx has better maintenance and updates now, and is more future-proof. |
I believe that the
As pointed out by @gnuletik above, this issue is made worse by #702 (comment). Instead of allowing an operation to be retried immediately by
In other words: if a network error occurs, one query will result in an error, but the next one should succeed. If I am not mistaken, pgx suffers from the same issue (jackc/pgx#74). However, with the pgxpool connection pool, connections have a maximum lifetime (one hour per https://github.com/jackc/pgx/blob/v4.10.0/pgxpool/pool.go#L17), so the problem is somewhat alleviated. (Don't mistake the pgx "health checks" for actively probing the TCP socket of a connection; they only check the lifetime, see https://github.com/jackc/pgx/blob/v4.10.0/pgxpool/pool.go#L318-L356)
An analysis of lib/pq conn.go:
func (cn *conn) send(m *writeBuf) {
	n, err := cn.c.Write(m.wrap())
	if err != nil {
		if n == 0 { // this case was added in lib/pq 1.9.0
			err = &safeRetryError{Err: err}
		}
		panic(err)
	}
}
func (cn *conn) prepareTo(q, stmtName string) *stmt {
	st := &stmt{cn: cn, name: stmtName}
	b := cn.writeBuf('P')
	b.string(st.name)
	b.string(q)
	b.int16(0)
	b.next('D')
	b.byte('S')
	b.string(st.name)
	b.next('S')
	// If the local socket is closed, this panics with a safeRetryError{Err: "write: connection reset by peer"}.
	// If the local socket is open, this one will always succeed even if the socket is remotely closed.
	cn.send(b)
	// If the remote socket is closed, this will panic with "read: connection reset by peer",
	// see conn.readParseResponse -> conn.recv1 -> conn.recv1Buf ->
	// conn.recvMessage (which panics if conn.recvMessage -> io.ReadFull(cn.buf, x) fails).
	cn.readParseResponse()
	st.paramTyps, st.colNames, st.colTyps = cn.readStatementDescribeResponse()
	st.colFmts, st.colFmtData = decideColumnFormats(st.colTyps, cn.disablePreparedBinaryResult)
	cn.readReadyForQuery()
	return st
}
Lines 481 to 518 in 072e83d
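Since the analysis above suggests that one operation fails with a network error while the next one should succeed, an application-level retry is one possible way to cope. The following is only a minimal sketch under assumptions: `retryOnNetError` and `simulatedReset` are hypothetical helpers (not part of lib/pq or database/sql), and the sketch assumes the driver surfaces the failure as a `*net.OpError`. Retrying is only safe for idempotent operations, because the server may already have received the statement before the connection broke.

```go
package main

import (
	"errors"
	"net"
)

// retryOnNetError runs op up to attempts times and retries only when the
// returned error is (or wraps) a *net.OpError, such as
// "read: connection reset by peer".
// Hypothetical helper: it is not part of lib/pq or database/sql.
func retryOnNetError(attempts int, op func() error) error {
	if attempts < 1 {
		attempts = 1 // always run op at least once
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil {
			return nil
		}
		var netErr *net.OpError
		if !errors.As(err, &netErr) {
			return err // not a network error: do not retry
		}
	}
	return err // still failing after the last attempt
}

// simulatedReset builds an error shaped like the one in this issue, so the
// helper can be tried out without a database. Illustration only.
func simulatedReset() error {
	return &net.OpError{Op: "read", Net: "tcp", Err: errors.New("connection reset by peer")}
}
```

With database/sql it could be used roughly as `retryOnNetError(2, func() error { return db.PingContext(ctx) })`, where `db` and `ctx` are whatever the application already has.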
There is however one problem, some errors (such as those implementing the |
The postgres driver had a regression where it wouldn't drop dead connections, resulting in all queries failing with tcp errors. The driver never recovers, and restarting your process is the only workaround. The issue was resolved in 1.9.0 of the driver. Related discussion: lib/pq#835. Signed-off-by: Taylor Silva <tsilva@pivotal.io>
@Lekensteyn thanks for this analysis. So, if I understand properly, lib/pq still suffers from this issue, in which the first query will fail? If so, do you have a proposed fix by any chance? Isn't EDIT: Can this be related to go-sql-driver/mysql/issues/257 as well? |
I've seen the same problem when connecting to a psql DB in docker. Somehow I can't reproduce the problem locally but it happens on CI occasionally. The code is like
Occasionally on CI, I've seen this "Could not select current database: read tcp 127.0.0.1:46768->127.0.0.1:41973: read: connection reset by peer" error. Many PRs referencing this issue just do an upgrade of the |
I wish this solved the issue, but it doesn't help if you use a service mesh which monitors the connections. Let's assume you have a low-traffic service: sometimes your pool can be using a connection that has been shut by the mesh without the pool being aware of it. So you'll have to know what the mesh's timeout value is and adjust your configuration to match.
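For anyone using database/sql, adjusting the pool to the mesh value could look roughly like the sketch below. It assumes the mesh idle timeout is known; `lifetimeBelow` and `configurePool` are hypothetical helper names, and the 10% safety margin is an arbitrary choice. `SetConnMaxIdleTime` and `SetConnMaxLifetime` are existing `*sql.DB` methods (the former needs Go 1.15 or later).

```go
package main

import (
	"database/sql"
	"time"
)

// lifetimeBelow returns a duration 10% shorter than an external timeout
// (for example the idle timeout enforced by a service mesh).
// The 10% margin is an assumption, not a recommendation from any library.
func lifetimeBelow(externalTimeout time.Duration) time.Duration {
	return externalTimeout - externalTimeout/10
}

// configurePool makes the pool discard connections before the mesh is
// expected to close them. Hypothetical helper for illustration.
func configurePool(db *sql.DB, meshIdleTimeout time.Duration) {
	d := lifetimeBelow(meshIdleTimeout)
	db.SetConnMaxIdleTime(d) // close connections that sat idle for d
	db.SetConnMaxLifetime(d) // close connections older than d
}
```

This only narrows the window in which the pool can hand out a dead connection; it does not actively probe the socket, so an occasional failure is still possible.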
|
The workaround that worked for me:
var db *pgx.Conn
// Retry the connection for up to about 10 seconds (20 attempts, 500 ms apart).
for i := 0; i < 20; i++ {
	db, err = pgx.Connect(ctx, dsn.String())
	if err == nil {
		break
	}
	time.Sleep(500 * time.Millisecond)
}
Here is the code that gave me the error originally:
db, err = pgx.Connect(ctx, dsn.String())
if err != nil {
	t.Fatalf("Couldn't connect to database: %s", err)
} |
Well, I have a sample script which I have been using to understand whether and how lib/pq handles the reconnect.
I see the reconnect working sometimes, but there are times when I see the following error
Now I'm not sure why it works sometimes and does not work at other times. Is this something that needs to be handled by the individual client (i.e. reconnect) here?
NOTE: I'm not entirely sure whether this issue is in database/sql or lib/pq. Please help guide me through this.