x/build/cmd/coordinator: make logic for cleaning up TryBot runs that reach "0 builds remaining" more robust #43323
I just observed this again. There were 6 trybot runs at https://farmer.golang.org/#trybots, and they were stuck at "Builds remaining: 0" for at least 5-10 minutes. Most of them completed afterwards.
I was tailing the logs during that time and did not see anything that stood out as a problem.
The trybot run at https://farmer.golang.org/try?commit=f1347265 is still active at this moment despite having "Builds remaining: 0" for well over 10 minutes.
I think I understand this now.
TryBot completion happens as part of a loop inside
In the case of https://farmer.golang.org/try?commit=f1347265, it was an unfortunate race condition involving these steps:
Relevant code is coordinator.go#L1093-L1098.
The fix is to improve the trybot completion logic by either factoring it out of the
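As a rough illustration of completion logic that does not rely on any single per-build loop noticing the final build, here is a minimal Go sketch. Every name in it (`tryRun`, `buildDone`, `maybeFinishLocked`, `finished`) is hypothetical and not taken from the coordinator, and it assumes a run can be finalized idempotently while holding a mutex. It is only meant to show the shape of the idea, not the actual fix.

```go
package main

import (
	"fmt"
	"sync"
)

// tryRun is a hypothetical stand-in for one TryBot run.
// It is not the coordinator's real type.
type tryRun struct {
	mu        sync.Mutex
	remaining int  // builds that have not finished yet
	done      bool // set once the run has been finalized
}

func newTryRun(builds int) *tryRun {
	return &tryRun{remaining: builds}
}

// buildDone records that one build finished and then re-checks
// completion, so whichever goroutine finishes last finalizes the run.
func (r *tryRun) buildDone() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.remaining > 0 {
		r.remaining--
	}
	r.maybeFinishLocked()
}

// maybeFinishLocked finalizes the run at most once, and only when no
// builds remain. The caller must hold r.mu.
func (r *tryRun) maybeFinishLocked() {
	if r.done || r.remaining > 0 {
		return
	}
	r.done = true // a real system would publish results here
}

// finished reports whether the run has been finalized.
func (r *tryRun) finished() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.done
}

func main() {
	r := newTryRun(3)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.buildDone()
		}()
	}
	wg.Wait()
	fmt.Println("finished:", r.finished())
}
```

Because the completion check runs under the same lock as the counter update, the order in which builds finish should not matter in this sketch: the last caller of `buildDone` always sees zero remaining and finalizes the run.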