Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows Socket IO causes VM crash (Issue #495) #497

Merged
merged 1 commit into from
Dec 14, 2022

Conversation

akgrant43
Copy link
Contributor

Socket IO on Windows can crash the VM with an access violation due to a race condition on memory freeing in aioWin.c.

The sequence of events that can lead to the crash is:

allHandles is malloc'd in aioPoll() and passed to sliceWaitForMultipleObjects().
sliceWaitForMultipleObjects() stores a pointer to allHandles in sliceData->handles.
waitHandlesThreadFunction() is then called from one or more threads and copies the data from allHandles.
aioPoll() then waits for an event using WaitForMultipleObjectsEx().
Once WaitForMultipleObjectsEx() returns it checks the results and frees allHandles.

However this assumes that every thread gets a chance to run prior to WaitForMultipleObjectsEx() returning. But WaitForMultipleObjectsEx() will return either after a timeout or after the first thread indicates an event - since WaitForMultipleObjectsEx() is called with bWaitAll = FALSE.

On a machine with only a couple of cores and a large number of sockets open allHandles can be freed prior to all threads getting enough CPU time, causing an access violation.

The solution is to allocate a buffer per thread and copy the relevant portion of allHandles in to it prior to spawning the thread.

As not all threads get a chance to complete prior to WaitForMultipleObjectsEx() returning, it also means that socket IO may not be recognised, and the socket not read (or written).

Polling all sockets even on timeout ensures that all IO is recognised. The time for checkEventsInHandles() is less than 1mS, so the overhead is minimal, with basically no work being done if no handles are registered for asynchronous IO.

Fixes: #495

Socket IO on Windows can crash the VM with an access violation due to a race condition on memory freeing in aioWin.c.

The sequence of events that can lead to the crash is:

    allHandles is malloc'd in aioPoll() and passed to sliceWaitForMultipleObjects().
    sliceWaitForMultipleObjects() stores a pointer to allHandles in sliceData->handles.
    waitHandlesThreadFunction() is then called from one or more threads and copies the data from allHandles.
    aioPoll() then waits for an event using WaitForMultipleObjectsEx().
    Once WaitForMultipleObjectsEx() returns it checks the results and frees allHandles.

However this assumes that every thread gets a chance to run prior to WaitForMultipleObjectsEx() returning. But WaitForMultipleObjectsEx() will return either after a timeout or after the first thread indicates an event - since WaitForMultipleObjectsEx() is called with bWaitAll = FALSE.

On a machine with only a couple of cores and a large number of sockets open allHandles can be freed prior to all threads getting enough CPU time, causing an access violation.

The solution is to allocate a buffer per thread and copy the relevant portion of allHandles in to it prior to spawning the thread.

As not all threads get a chance to complete prior to WaitForMultipleObjectsEx() returning, it also means that socket IO may not be recognised, and the socket not read (or written).

Polling all sockets even on timeout ensures that all IO is recognised. The time for checkEventsInHandles() is less than 1mS, so the overhead is minimal, with basically no work being done if no handles are registered for asynchronous IO.

Fixes: pharo-project#495
@akgrant43
Copy link
Contributor Author

[ERROR] 2022-10-26 14:23:13.000 ioInitHeartbeat (extracted/vm/src/common/heartbeat.c:428):pthread_getschedparam failed: Operation not permitted

The error seems to be unrelated to the patch.
@tesonep @guillep Can we restart the CI?

@guillep
Copy link
Member

guillep commented Dec 5, 2022

CI is as green as we have right now
imagen

The failing part of the job is a VM virtualizing an ARMv7 that does not consistently work...

@@ -506,16 +497,6 @@ EXPORT(long) aioPoll(long microSeconds){
CloseHandle(waitingHandles[i]);
}


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is the reason to not having the quick return when the timeout has arrived? Because we are testing the handlers when we have already testing them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Pablo,

Because not all the threads may have had a chance to execute before the timeout arrives. If the quick return is used there is a good chance that sockets will incorrectly not be signaled as having IO complete - we saw this happen quite regularly.

So the WaitForMultipleObjectsEx() call will return early if there is an event, and may return early if there is IO, allowing the event or IO to be processed, but if it times out it isn't really giving any information about IO completion. And checking whether IO has occurred is very quick, 1000 sockets are checked in less than a millisecond (from memory).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, perfect, thanks for the explanation

@tesonep tesonep merged commit 2af5c8f into pharo-project:pharoX Dec 14, 2022
@akgrant43
Copy link
Contributor Author

@guillep @tesonep
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows Socket IO causes VM crash
3 participants