QEMU issue (when building for ARM) #87
On my build farm, I'm only seeing this now that I've turned on armhf builds; I wasn't seeing it when building for armel.
This is a race condition. Actually, it (almost?) never happened on one of my two machines, but it would happen regularly on the other. It depends on a lot of things, including the processor, the kernel, the direction of the wind...
Confirmed; I'm seeing it on armel builds now as well.
There is a pattern visible in the output of 'ps auxf': a dead (zombie) child takes up 0 bytes of memory, so it can easily be identified by a script running periodically.
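Here is a minimal sketch of such a periodic check. It assumes the standard procps `ps aux`/`ps auxf` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), and the function name `zombie_rows` is hypothetical. Kernel threads such as `[kthreadd]` also report 0 VSZ/RSS, so the sketch additionally requires a `Z` in the STAT column, which is how ps marks zombie (defunct) processes.

```python
def zombie_rows(ps_output):
    """Return (pid, command) for zombie rows in `ps aux` / `ps auxf` output.

    Hypothetical helper: a row counts as a zombie when RSS is 0 *and*
    STAT contains 'Z' (RSS 0 alone would also match kernel threads).
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    pid_col = header.index("PID")
    rss_col = header.index("RSS")
    stat_col = header.index("STAT")
    ncols = len(header)
    zombies = []
    for line in lines[1:]:
        # COMMAND is last and may contain spaces (and tree art with `f`),
        # so split only ncols - 1 times.
        fields = line.split(None, ncols - 1)
        if len(fields) < ncols:
            continue
        if fields[rss_col] == "0" and "Z" in fields[stat_col]:
            zombies.append((int(fields[pid_col]), fields[-1]))
    return zombies
```

In a cron job or loop you would feed it the output of `subprocess.run(["ps", "auxf"], capture_output=True, text=True).stdout`. Note that `ps aux` does not show the parent PID, so acting on a zombie (signalling its parent) needs another source such as `/proc`.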
@po1 Is this in your aggregated patches? |
There is a workaround: a script written by Austin that checks for zombie cmake processes and sends a SIGCHLD to their parent process. It should be included in the files that I put together, and it is documented in the howto.
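Austin's script itself is not reproduced in this thread, so the following is only a hypothetical sketch of the same idea, assuming a Linux `/proc` filesystem: scan `/proc/<pid>/stat` for zombie processes named `cmake` and send SIGCHLD to each one's parent, the programmatic equivalent of `kill -SIGCHLD <parent pid>`. The names `parse_stat`, `stuck_parents`, `nudge` and the 30-second interval are illustrative assumptions. Matching on the name `cmake` follows the description above; change `child_comm` if a different process turns out to be the zombie.

```python
import os
import signal
import time


def parse_stat(text):
    """Parse a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    lpar = text.index("(")
    rpar = text.rindex(")")  # comm may itself contain spaces or ')'
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state, ppid = text[rpar + 2:].split()[:2]
    return pid, comm, state, int(ppid)


def stuck_parents(proc_root="/proc", child_comm="cmake"):
    """Return sorted PIDs of parents that have a zombie child named child_comm."""
    parents = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                _, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process vanished or stat unreadable
        if state == "Z" and comm == child_comm:
            parents.add(ppid)
    return sorted(parents)


def nudge(parents):
    """Send SIGCHLD to each parent so it notices its finished child."""
    for ppid in parents:
        try:
            os.kill(ppid, signal.SIGCHLD)
        except ProcessLookupError:
            pass  # parent already gone


if __name__ == "__main__":
    while True:
        nudge(stuck_parents())
        time.sleep(30)  # hypothetical polling interval
```

The script needs permission to signal the build processes (same user or root). A zombie also exists briefly between a child's exit and its parent's wait(), so a more careful version would only nudge parents whose zombie persists across two scans; an extra SIGCHLD is usually harmless, though.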
The existing buildfarm will not be modified anymore. If this is a problem on the new farm, please consider filing a new ticket there (related to ros-infrastructure/ros_buildfarm#21).
As stated in this bug report:
https://bugs.launchpad.net/qemu/+bug/955379
there is a big problem with all current versions of QEMU which can be triggered by, among other things, cmake.
What happens is that sometimes, during a configuration check (for CXX ABI info, among others), the whole build process hangs forever. It hangs because of a bug in QEMU's handling of the select() call: the SIGCHLD sent when the child finishes is lost, so the parent (cmake) waits for it forever.
Quick workaround: when that happens, manually running 'kill -SIGCHLD $pid' on the hung cmake process unblocks it and lets it resume execution.
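To make the failure mode concrete, here is a hypothetical Python sketch of a parent that waits for its child in the way described above: it sleeps in select() and relies on SIGCHLD to wake it up, using `signal.set_wakeup_fd` to turn the signal into a byte on a pipe. It is not cmake's actual code. With no timeout, a lost SIGCHLD would leave such a parent in select() forever, which is exactly the hang. The `poll_interval` timeout plus a non-blocking `waitpid()` is an added safeguard (an assumption for illustration only), and `wait_for_child` is a made-up name. Unix only.

```python
import os
import select
import signal
import sys


def wait_for_child(argv, poll_interval=1.0):
    """Spawn argv and wait for it, woken by SIGCHLD through select().

    Hypothetical illustration. The SIGCHLD handler plus set_wakeup_fd()
    writes a byte to a pipe that select() watches. The poll_interval
    timeout is an assumed safeguard: if SIGCHLD is lost (as with the QEMU
    bug), waitpid(WNOHANG) still notices the exit on the next timeout.
    """
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    os.set_blocking(wfd, False)  # set_wakeup_fd requires a non-blocking fd
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_wakeup = signal.set_wakeup_fd(wfd)
    try:
        pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
        while True:
            ready, _, _ = select.select([rfd], [], [], poll_interval)
            if ready:
                try:
                    os.read(rfd, 512)  # drain the wakeup bytes
                except BlockingIOError:
                    pass
            # Re-check the child every iteration, whether or not a signal came.
            done, status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                return os.waitstatus_to_exitcode(status)
    finally:
        signal.set_wakeup_fd(old_wakeup)
        if old_handler is not None:
            signal.signal(signal.SIGCHLD, old_handler)
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    print(wait_for_child([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

Dropping the timeout (passing `None` to select()) reproduces the original design: correct as long as every SIGCHLD is delivered, stuck forever otherwise. `signal.signal` must be called from the main thread.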
As I see it, there are 2 things we can do:
No. 2 does not sound very realistic...