During investigation of #43312 we observed that some TryBot runs would reach 0 builds remaining, yet wouldn't complete. We should try to understand the root cause and fix it.
Edit: The root cause is well understood now; see the comment from Mar 3.
CC @golang/release.
dmitshur added the Builders and NeedsInvestigation labels on Dec 22, 2020.
Observed this again just now. There were 6 TryBot runs at https://farmer.golang.org/#trybots, and they were stuck at "Builds remaining: 0" for at least 5-10 minutes before most of them completed.
I was tailing the logs during the time and did not see anything that visibly stood out as a problem.
TryBot completion happens as part of a loop inside findTryWork. This is why TryBot runs take at least a few seconds to complete after reaching 0 builds remaining, and why TryBot runs weren't completing at all while findTryWork was broken in #43312.
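To illustrate why completion lags behind the last build, here is a minimal sketch of the pattern, assuming a greatly simplified model: `trySet` and `pollOnce` are hypothetical names standing in for the coordinator's real types and for one iteration of the findTryWork loop. A run whose last build finishes between polls is only marked complete the next time the loop runs.

```go
package main

import (
	"fmt"
	"time"
)

// trySet is a hypothetical, simplified stand-in for a TryBot run tracked by
// the coordinator; it is not the real x/build type.
type trySet struct {
	remaining int  // builds not yet finished
	done      bool // set once the run has been marked complete
}

// pollOnce is a hypothetical model of one iteration of the polling loop.
// Besides looking for new work, it completes every run that has no builds
// left, so a run can only be completed when this function actually runs.
func pollOnce(sets map[string]*trySet) {
	for name, ts := range sets {
		if ts.remaining <= 0 {
			ts.done = true
			delete(sets, name) // deleting during range is safe in Go
			fmt.Printf("completed %s\n", name)
		}
	}
}

func main() {
	sets := map[string]*trySet{"cl-1": {remaining: 1}}
	ts := sets["cl-1"]

	// The last build finishes between two polls...
	ts.remaining = 0
	fmt.Println("done before poll:", ts.done)

	// ...so the run is only completed on the next tick of the loop.
	// The 10ms interval is arbitrary; it is not the coordinator's real interval.
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	<-ticker.C
	pollOnce(sets)
	fmt.Println("done after poll:", ts.done)
}
```

Running it prints `done before poll: false`, then `completed cl-1` and `done after poll: true`: reaching 0 builds remaining and being completed are two separate events, separated by up to one polling interval.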
For the run that got stuck on Mar 2, the sequence of events was:

1. The TryBot run posted a TryBot-Result vote, but the coordinator had not yet removed it as an active TryBot run.
2. I removed the TryBot-Result vote; the coordinator still had not removed the run as active.
3. By the time the findTryWork loop would have removed it, the run no longer met the condition for a "finished run", because the TryBot-Result vote was missing.
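The race in the steps above can be reproduced with a small model. This is a sketch under stated assumptions: `tryRun`, `finished`, and `poll` are hypothetical names, and the "finished run" condition is assumed to require both zero remaining builds and a visible TryBot-Result vote, as step 3 describes. Once the vote disappears before the next poll, no later poll ever removes the run.

```go
package main

import "fmt"

// tryRun is a hypothetical, simplified model of an active TryBot run;
// the field names are illustrative, not x/build's.
type tryRun struct {
	remaining    int  // builds not yet finished
	resultPosted bool // whether a TryBot-Result vote is currently on the CL
	active       bool // whether the coordinator still tracks the run
}

// finished mirrors the fragile condition: a run counts as finished only if
// it has no builds left AND its result vote is still visible.
func finished(r *tryRun) bool {
	return r.remaining <= 0 && r.resultPosted
}

// poll models one loop iteration that removes finished runs from the active set.
func poll(r *tryRun) {
	if r.active && finished(r) {
		r.active = false
	}
}

func main() {
	r := &tryRun{remaining: 0, active: true}
	r.resultPosted = true  // 1. the run posts its TryBot-Result vote
	r.resultPosted = false // 2. the vote is removed before the next poll
	for i := 0; i < 3; i++ {
		poll(r) // 3. every later poll sees no vote, so the run is never removed
	}
	fmt.Println("still active:", r.active) // still active: true
}
```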
The problem observed on Mar 2 was caused by my removing the TryBot-Result vote "too quickly", before the coordinator had a chance to mark the TryBot run as complete. A workaround is to also remove the Run-TryBot vote, wait a minute for the run to be cancelled, and then restart it.
The fix is to make the TryBot completion logic more robust, either by factoring it out of the findTryWork loop or, at the very least, by adding a check that cancels a run whose number of remaining builds is non-positive, even when ts.wantedAsOf == now.
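A minimal sketch of the second option, assuming a simplified model in which a run with `wantedAsOf == now` is still requested in the current loop iteration and only runs with a different `wantedAsOf` were previously cancelled. The `trySet` type and the `cleanup` function are hypothetical, not the coordinator's actual code. The added check also cleans up any run with no builds left, regardless of whether it is still wanted or whether its result vote is still visible.

```go
package main

import (
	"fmt"
	"time"
)

// trySet is a hypothetical, simplified model of an active TryBot run.
type trySet struct {
	remaining  int       // builds not yet finished
	wantedAsOf time.Time // assumed: last iteration in which the run was still requested
	active     bool
}

// cleanup models the end of one loop iteration. Previously only runs that were
// no longer wanted (wantedAsOf != now) were cancelled; the extra check also
// cleans up runs with no builds remaining, even if they are still wanted.
func cleanup(sets map[string]*trySet, now time.Time) {
	for key, ts := range sets {
		noLongerWanted := !ts.wantedAsOf.Equal(now)
		nothingLeft := ts.remaining <= 0
		if noLongerWanted || nothingLeft {
			ts.active = false
			delete(sets, key)
		}
	}
}

func main() {
	now := time.Now()
	sets := map[string]*trySet{
		// All builds done, but still wanted this iteration: the stuck case.
		"stuck": {remaining: 0, wantedAsOf: now, active: true},
		// Still wanted and still building: must be left alone.
		"busy": {remaining: 3, wantedAsOf: now, active: true},
	}
	cleanup(sets, now)
	_, busyLeft := sets["busy"]
	fmt.Println(len(sets), busyLeft) // 1 true
}
```

Keying cleanup on the remaining-build count alone, rather than on the vote state, means that a human editing votes between polls can no longer leave a run stuck at "Builds remaining: 0".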
dmitshur changed the title from "x/build/cmd/coordinator: understand why TryBot runs sometimes fail to complete after reaching 0 builds remaining" to "x/build/cmd/coordinator: make logic for cleaning up TryBot runs that reach "0 builds remaining" more robust" on Mar 3, 2021.
dmitshur added the NeedsFix label and removed the NeedsInvestigation label on Mar 3, 2021.