Recently, the pool of working aarch64 builders has been shrinking over time until the next redeploy brings them back to life. After a bit of debugging, we noticed that the connection to the AMQP host is somehow lost, but the builder process doesn't exit. Heartbeats should have caught this situation but didn't. When looking at the connections using ss, we see the following on a busted builder:
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
u_str ESTAB 0 0 * 44148 * 24754 users:(("builder",pid=25341,fd=2),("builder",pid=25341,fd=1),("grahamcofborg-b",pid=25335,fd=2),("grahamcofborg-b",pid=25335,fd=1))
and the following on a working builder:
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
u_str ESTAB 0 0 * 80906 * 33026 users:(("builder",pid=24827,fd=2),("builder",pid=24827,fd=1),("grahamcofborg-b",pid=24816,fd=2),("grahamcofborg-b",pid=24816,fd=1))
tcp ESTAB 0 0 xxx.xxx.xxx.xxx:47128 xxx.xxx.xxx.xxx:5671 users:(("builder",pid=24827,fd=3))
As you can see, the busted builder has dropped its connection to the AMQP host, while the working builder still has an established connection.
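To illustrate the behaviour we would expect from heartbeats here (exit once the connection goes quiet), below is a minimal sketch of a liveness watchdog using only the standard library. It is not the builder's actual code or the AMQP client's API: `Liveness`, `touch`, `is_stale` and `spawn_watchdog` are hypothetical names, and it assumes the code that reads from the AMQP connection can call `touch()` whenever a heartbeat or frame arrives.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: records when the connection last showed signs of life.
pub struct Liveness {
    start: Instant,
    /// Milliseconds since `start` at which the last heartbeat/frame was seen.
    last_seen_ms: AtomicU64,
}

impl Liveness {
    pub fn new() -> Arc<Self> {
        Arc::new(Liveness {
            start: Instant::now(),
            last_seen_ms: AtomicU64::new(0),
        })
    }

    /// Assumption: called by the connection's reader whenever anything arrives.
    pub fn touch(&self) {
        let now = self.start.elapsed().as_millis() as u64;
        self.last_seen_ms.store(now, Ordering::Relaxed);
    }

    /// True if nothing has been seen for longer than `timeout`.
    pub fn is_stale(&self, timeout: Duration) -> bool {
        let now = self.start.elapsed().as_millis() as u64;
        let last = self.last_seen_ms.load(Ordering::Relaxed);
        now.saturating_sub(last) > timeout.as_millis() as u64
    }
}

/// Spawns a thread that polls `liveness` every `poll` interval and runs
/// `on_dead` once the connection has been silent for longer than `timeout`.
pub fn spawn_watchdog<F>(
    liveness: Arc<Liveness>,
    timeout: Duration,
    poll: Duration,
    on_dead: F,
) -> thread::JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        while !liveness.is_stale(timeout) {
            thread::sleep(poll);
        }
        on_dead();
    })
}
```

With something like this in place, passing `|| std::process::exit(1)` as `on_dead` would make a builder that has silently lost its connection terminate, so the service manager can restart it instead of leaving it idle until the next redeploy.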
Potentially unrelated, but in the stack of one of the busted builders, we also noticed the following thread:
Maybe this is related to something panicking, and thus preventing a clean exit (or exit altogether) somehow? Though it looks to me like that's just related to the std::thread::Builder::spawn_unchecked function (probably to panic if anything goes wrong rather than making the caller handle any errors).
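For context on the panic hypothesis: in Rust, a panic in a thread spawned via `std::thread` unwinds only that thread (with the default unwinding panic strategy) and does not by itself end the process; it only surfaces when someone calls `join()` on the handle. The sketch below shows that general behaviour. It is not taken from the builder's code, and `run_and_check` is a hypothetical helper name.

```rust
use std::thread;

/// Hypothetical helper: runs `work` on a named thread and reports whether it
/// panicked, returning the panic message if one could be extracted.
pub fn run_and_check<F>(work: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    let handle = thread::Builder::new()
        .name("worker".to_string())
        .spawn(work)
        .map_err(|e| e.to_string())?;

    // `join` returns Err with the panic payload if the thread panicked;
    // the calling thread keeps running either way.
    match handle.join() {
        Ok(()) => Ok(()),
        Err(payload) => {
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic payload".to_string());
            Err(msg)
        }
    }
}
```

If a worker thread handling the AMQP connection panicked and nothing joined it or otherwise observed the failure, the remaining threads could keep the process alive without a connection, which would match the symptoms above. Whether that is what happens here is unconfirmed.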