The /logs endpoint supports long polling via the waitsizenot parameter. To avoid being flagged as a timeout, it tries to serve a 304 in just under a minute (55 seconds specifically; source: x/build/maintner/maintnerd/gcslog/gcslog.go#243-253):
// Return a 304 if there's no activity in just under a minute.
// This keeps some occasional activity on the TCP connection
// so we (and any proxies) know it's alive, and can fit
// within reason read/write deadlines on either side.
ctx, cancel := context.WithTimeout(r.Context(), 55*time.Second)
defer cancel()
changed := gl.waitSizeNot(ctx, oldSize)
if !changed {
	w.WriteHeader(http.StatusNotModified)
	return
}
I think this worked fine before, but in practice it seems not to work at all now. Instead it always serves a 502 after 60 seconds:
$ time curl -i 'https://maintner.golang.org/logs?waitsizenot=2270415713'
HTTP/2 502
content-type: text/html; charset=UTF-8
referrer-policy: no-referrer
content-length: 332
date: Mon, 27 Jun 2022 02:56:31 GMT
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
curl -i 'https://maintner.golang.org/logs?waitsizenot=2270415713' 0.02s user 0.02s system 0% cpu 1:00.14 total
This can be reproduced during quiet times (i.e., it needs 60+ seconds of inactivity) by manually computing the current corpus size, or by running maintwatch.
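For illustration, here is a minimal client-side sketch of that reproduction; the waitsizenot value is copied from the curl example above and would need to be replaced with the current corpus size:

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Placeholder size taken from the curl example above; substitute the
	// current corpus size so the server actually waits for new activity.
	const size = "2270415713"
	start := time.Now()
	resp, err := http.Get("https://maintner.golang.org/logs?waitsizenot=" + size)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	// During a quiet period, a healthy long poll should end with a 304 after
	// ~55 seconds; the bug shows up as a 502 after ~60 seconds instead.
	fmt.Printf("status %d after %v\n", resp.StatusCode, time.Since(start).Round(time.Second))
}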
Maybe 55 seconds has become too short because of new internal timeouts on the load balancing side. It would be trivial to try something like 45-50 or 20-25 seconds.
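As a sketch, assuming the handler shown above stays otherwise unchanged, that experiment only touches the timeout constant (45 seconds here is an arbitrary pick from the suggested range):

// Shorten the long-poll window so the 304 is written comfortably before
// any ~60-second timeout on the load balancing side can fire.
ctx, cancel := context.WithTimeout(r.Context(), 45*time.Second)
defer cancel()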
This isn't a big deal for functionality, because most callers retry a 502 no differently than a correctly served 304, but it adds some noise and inefficiency.
dmitshur added the Builders and NeedsInvestigation labels on Jun 27, 2022.
A few test queries with the /debug304 endpoint (from CL 415254) show that waiting under 30 seconds results in a 304 being returned, but waiting 30 or more seconds results in a 502 after a minute.
Change https://go.dev/cl/415354 mentions this issue: maintner/maintnerd: set ingress backend timeout to 60 seconds
dmitshur added the NeedsFix label and removed the NeedsInvestigation label on Jun 30, 2022.