x/build: iOS builder took 19h to obtain machine #58435
Comments
OK, I stand corrected. It didn't deadlock, I guess, just ran really slowly, because the bot just passed. I guess most of that 19h was waiting? |
Unfortunately, AFAIK, the iOS builders are either limited by the Corellium infra (killed after hitting resource limits) or stuck on some build task without releasing the machine. (I now even keep a recurring calendar event to remind me to reboot those builders manually.) |
hi again, looks like the |
@changkun, it looks like the builders are wedged again. Could you restart them? And is there some way we could set up an @mknyszek, @heschi: is there any possibility that we could get the new LUCI infrastructure working for |
A while back I checked in CL http://go.dev/cl/480675 with a change to the clang wrapper used on the arm corellium builders. For this fix to take effect, someone needs to rebuild the wrapper from source and install it on the builders. Instructions for this appear to be here: @changkun if you are going to restart the builders, could you please also update the wrappers? I don't think I have the right tools/machines to do the update. Thanks. |
@bcmills LUCI supports iOS, but figuring out where the resources are coming from is a bit of a different question. I hope to start answering that soon, but I suspect it'll be a slowish process :( |
@bcmills I reconfigured three builders, and now they are back again. @thanm The clangwrap is now rebuilt based on the tip of the build repository.
I would love to configure a physical device but don't have the budget to buy more. |
Thank you! |
I just rebooted the builders again. |
Thanks |
@changkun, could you reboot the builders again? Looks like two are missing and one got stuck during a TryBot run. |
@bcmills done |
Thanks. They finished one set of runs but then disappeared again. 😕 |
Hmm, this is happening significantly more often. What has changed lately? I hope to find time to look into Corellium's new API set to automate reboots based on the status shown at farmer.golang.org.
Looks like some runs will always lead to this:
And then all builders get killed, and those tests are never released. Is there any way to time them out or free them from the waiting list? |
@changkun are you seeing the error above now, or when you try to add more builder machines?
Could you explain this more? What is not released? I can take a deeper look once I understand more. Thanks. |
@cherrymui I was judging based on the reboot behavior and the number of tests waiting for builders displayed in https://farmer.golang.org/. In the waiting tests, I can see messages like:
Among the waiting cases, I found that at some point the number of waiters stops decreasing (e.g. it stays at 43 waiters); after that, all 3 existing iOS builders fail every incoming test, and the buildlet loses its connection. From the build log, e.g. from
If a build test is disconnected, the test run is not released from the buildlet and is retried once the builders are back online. Hence I wonder whether we can clear those waiting tests to prevent the builders from disconnecting and needing a reboot. |
We stopped skipping tests that didn't have a good reason to be skipped. 🙃 |
There's a trybot run where presumably a test deadlocked, but the run never timed out. EDIT: It actually passed as I was writing this. I guess it was waiting all that time to grab a builder? Yeah, it's right there in the log.
https://farmer.golang.org/try?commit=e95f1f1c
Screenshot for posterity:

Builder output at the time of writing: