Compiler hangs occasionally on many-core CPUs on Windows #73532
Comments
Increase the initial size of __CFReadSocketsFds to reduce the chance of resizing it which appears to lead to compiler hangs.
This has been seen only in rare cases (a few random processes) in our internal large Swift application build. The symptom is that the compiler driver thread is stuck waiting forever in the while loop in Based on inspections with the debugger, the Fortunately, adding a small amount of logging doesn't change the reproducibility, but unfortunately adding too much logging makes it go away. Because it has so far been reproducible only in a large Swift build that involves many, many invocations of swiftc/swift-frontend processes, it is hard to know in which process it occurs and to attach a debugger in real time. Most of the debugging so far has relied on a limited amount of logging.

Further investigation shows that the reason the callback never fires seems to be that some arbitrary socket file descriptors occasionally get dropped (some bits in the bit vector are cleared/unset) for unknown reasons after they are put into the I also checked that access to the bit vector is properly synchronized and found no issues. I also instrumented the other points in the code where bits in the bit vector could potentially be cleared but didn't find anything suspicious.

This looks like data corruption of some kind, and my current theory is some sort of racy data/heap corruption in lower-level code, such as a race-condition bug in the underlying memory allocators (CFDataAllocator, etc.) or the lock implementation (CFLock, etc.), unless it's a broken CPU/hardware or something like that.
swiftlang/swift-corelibs-foundation#4951 is a suggested workaround that reliably avoids this hang by allocating a larger initial size for the bit vector, which reduces the chance of it being resized. Ideally we'd fix the root cause, but given the cost/benefit tradeoff and that this code is deprecated and is going to be replaced by
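To illustrate why a larger initial size sidesteps the resize path, here is a minimal sketch of a growable descriptor bit vector. It is not CoreFoundation's code: `fd_bits`, its fields, and the doubling growth policy are all hypothetical names and choices made for this example. It only shows that a vector created with enough initial capacity never reallocates, so whatever goes wrong during resizing is never exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bit vector keyed by descriptor number.
   Illustrative only; this is not how CoreFoundation lays out its data. */
typedef struct {
    uint8_t *bits;     /* one bit per descriptor */
    size_t   nbytes;   /* current capacity in bytes */
    unsigned resizes;  /* how many times the buffer had to grow */
} fd_bits;

/* Allocate a zeroed vector with room for at least initial_fds descriptors. */
static bool fd_bits_init(fd_bits *v, size_t initial_fds) {
    assert(v != NULL);
    v->nbytes = (initial_fds + 7) / 8;
    if (v->nbytes == 0) v->nbytes = 1;
    v->bits = calloc(v->nbytes, 1);
    v->resizes = 0;
    return v->bits != NULL;
}

/* Set the bit for fd, doubling the buffer first if fd does not fit. */
static bool fd_bits_set(fd_bits *v, size_t fd) {
    assert(v != NULL && v->bits != NULL);
    size_t need = fd / 8 + 1;
    if (need > v->nbytes) {
        size_t new_n = v->nbytes;
        while (new_n < need) new_n *= 2;
        uint8_t *p = realloc(v->bits, new_n);
        if (p == NULL) return false;
        /* The newly added tail must be zeroed; old bits are preserved. */
        memset(p + v->nbytes, 0, new_n - v->nbytes);
        v->bits = p;
        v->nbytes = new_n;
        v->resizes++;
    }
    v->bits[fd / 8] |= (uint8_t)(1u << (fd % 8));
    return true;
}

/* Report whether the bit for fd is currently set. */
static bool fd_bits_test(const fd_bits *v, size_t fd) {
    assert(v != NULL && v->bits != NULL);
    return fd / 8 < v->nbytes && ((v->bits[fd / 8] >> (fd % 8)) & 1u) != 0;
}

static void fd_bits_free(fd_bits *v) {
    assert(v != NULL);
    free(v->bits);
    v->bits = NULL;
    v->nbytes = 0;
}
```

With `fd_bits_init(&v, 8)`, setting descriptor 700 forces one reallocation; with `fd_bits_init(&v, 4096)`, the same call touches only the original buffer. The workaround relies on the second situation being the common one.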
I am afraid that on Windows it is even more complicated. I did some research on this a while ago, because we noticed that creating too many CFSockets makes a test app hang. The reason of such weird behavior is the I stopped working on this because the only issue I noticed was one synthetic test. It is unfortunate that this issue affects the compiler in such a drastic way 😞 Here is my WIP commit with the initial fix I made. Just for reference. tbh I don't even remember all the "how and why"s, but I hope it describes the idea at least. And it fixes the fd_set growth problem in vitro.
@lxbndr Oh my... thanks for posting and the patch :) I'm intrigued by the fact that it works to this extent despite this issue 🤯 I confirmed that your WIP commit reliably fixes the hang in our internal build, as is. Would you be willing to put up a PR out of it? That would definitely unblock us. It'd be great if we can merge it.
@hjyamauchi I guess we can do that, even if it is not perfect. If it makes sense and fixes real issues, it is worth a try.
Fixes swiftlang/swift#73532. On Windows, socket handles in an `fd_set` are not represented as bit flags as in Berkeley sockets. While we have no `fd_set` dynamic growth in this implementation, the `FD_SETSIZE` defined as 1024 in `CoreFoundation_Prefix.h` should be enough for the majority of tasks.
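The difference in representation can be sketched with a portable model. Winsock's `fd_set` is a count plus an array of socket handles rather than a bitmask, so membership is a linear search and capacity is a fixed number of entries, not a maximum handle value. The types and functions below (`model_fd_set`, `model_socket`, `model_fd_*`) are hypothetical stand-ins written for this example so that it compiles without `winsock2.h`; they are not the PR's code. Unlike the real `FD_SET` macro, the add function here reports a full set instead of ignoring the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FD_SETSIZE 1024      /* mirrors the FD_SETSIZE value named above */

typedef uintptr_t model_socket;    /* stand-in for Winsock's SOCKET handle type */

/* Same shape as Winsock's fd_set: a count plus an array of handles. */
typedef struct {
    unsigned     fd_count;
    model_socket fd_array[MODEL_FD_SETSIZE];
} model_fd_set;

static void model_fd_zero(model_fd_set *s) {
    assert(s != NULL);
    s->fd_count = 0;
}

/* Membership is a linear scan over the stored handles. */
static bool model_fd_isset(const model_fd_set *s, model_socket h) {
    assert(s != NULL);
    for (unsigned i = 0; i < s->fd_count; i++) {
        if (s->fd_array[i] == h) return true;
    }
    return false;
}

/* Append a handle; returns false when the set is full (no dynamic growth). */
static bool model_fd_add(model_fd_set *s, model_socket h) {
    assert(s != NULL);
    if (model_fd_isset(s, h)) return true;
    if (s->fd_count >= MODEL_FD_SETSIZE) return false;
    s->fd_array[s->fd_count++] = h;
    return true;
}

/* Remove a handle by moving the last entry into its slot
   (a simplification chosen for this sketch). */
static void model_fd_clr(model_fd_set *s, model_socket h) {
    assert(s != NULL);
    for (unsigned i = 0; i < s->fd_count; i++) {
        if (s->fd_array[i] == h) {
            s->fd_array[i] = s->fd_array[--s->fd_count];
            return;
        }
    }
}
```

Under this model a handle value well above 1024 is stored without trouble, because the value is never used as a bit index. What matters is how many sockets are in the set at once, which is why a fixed `FD_SETSIZE` of 1024 entries can be sufficient without growth.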
@lxbndr thanks for the fix!
Description
The Swift compiler occasionally hangs during a build. This is seen more frequently on many-core (> 16 cores) machines, in particular AMD Threadripper CPUs with 32 cores / 64 threads. When it happens, several `swiftc` processes are left running and making no progress, with no (child) `swift-frontend` processes.
Reproduction
This happens in a large internal app build on Windows. On some particular AMD Threadripper machines, it happens 100% of the time. We have seen hangs on other machines much less frequently, which may be the same issue.
Expected behavior
The build finishes instead of hanging forever.
Environment
Windows
Additional information
No response