New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows Socket IO causes VM crash #495
Comments
Socket IO on Windows can crash the VM with an access violation due to a race condition on memory freeing in aioWin.c. The sequence of events that can lead to the crash is: allHandles is malloc'd in aioPoll() and passed to sliceWaitForMultipleObjects(). sliceWaitForMultipleObjects() stores a pointer to allHandles in sliceData->handles. waitHandlesThreadFunction() is then called from one or more threads and copies the data from allHandles. aioPoll() then waits for an event using WaitForMultipleObjectsEx(). Once WaitForMultipleObjectsEx() returns it checks the results and frees allHandles. However this assumes that every thread gets a chance to run prior to WaitForMultipleObjectsEx() returning. But WaitForMultipleObjectsEx() will return either after a timeout or after the first thread indicates an event - since WaitForMultipleObjectsEx() is called with bWaitAll = FALSE. On a machine with only a couple of cores and a large number of sockets open allHandles can be freed prior to all threads getting enough CPU time, causing an access violation. The solution is to allocate a buffer per thread and copy the relevant portion of allHandles in to it prior to spawning the thread. As not all threads get a chance to complete prior to WaitForMultipleObjectsEx() returning, it also means that socket IO may not be recognised, and the socket not read (or written). Polling all sockets even on timeout ensures that all IO is recognised. The time for checkEventsInHandles() is less than 1mS, so the overhead is minimal, with basically no work being done if no handles are registered for asynchronous IO. Fixes: pharo-project#495
This is the bug you talked to me at Esug? Super good catch! |
In case someone wants to reproduce these kind of crashes locally, we were using the setup here: https://github.com/feenkcom/pharo-socket-stability-experiment/tree/main/rust-worker. The idea there was to same thousand of small workers in Rust that connect through sockets to a runner in Pharo.
|
Hi @guillep, Yes, this is the issue I mentioned. :-) Prior to the fix if we had about 5 incoming sockets there was a reasonable chance the VM would crash. By the time that got up to 25 sockets it was almost certain to crash. With the fix we have 960 open incoming connections regularly and so far no crash. |
Windows Socket IO causes VM crash (Issue #495)
Socket IO on Windows can crash the VM with an access violation due to a race condition on memory freeing in aioWin.c.
The sequence of events that can lead to the crash is:
However this assumes that every thread gets a chance to run prior to WaitForMultipleObjectsEx() returning. But WaitForMultipleObjectsEx() will return either after a timeout or after the first thread indicates an event - since WaitForMultipleObjectsEx() is called with bWaitAll = FALSE.
On a machine with only a couple of cores and a large number of sockets open allHandles can be freed prior to all threads getting enough CPU time, causing an access violation.
The solution is to allocate a buffer per thread and copy the relevant portion of allHandles in to it prior to spawning the thread.
As not all threads get a chance to complete prior to WaitForMultipleObjectsEx() returning, it also means that socket IO may not be recognised, and the socket not read (or written).
Polling all sockets even on timeout ensures that all IO is recognised. The time for checkEventsInHandles() is less than 1mS, so the overhead is minimal, with basically no work being done if no handles are registered for asynchronous IO.
A PR is on the way.
The text was updated successfully, but these errors were encountered: