Race Condition Crash in FSIOLambdaRunnable #60
This was a fun one to track down. It only happens in builds and it was very sporadic. I'd add a space and recompile and it'd crash every time Connect was called. I'd remove the space and compile again and it'd go away.
I finally tracked it down to FSIOLambdaRunnable (one of which is created upon calling the Connect function). The issue is that FSIOLambdaRunnable calls "delete this" in the
The issue is that calling
I came up with a temporary solution so we could ship, but I can't submit it as a PR without some additional discussion as I think it leads to an additional edge case. The idea is that you remove "delete this" from the exit function (so that if the lambda finishes before the main thread completes, we don't delete the FSIOLambdaRunnable).
This means we are now responsible for the lifecycle of FSIOLambdaRunnable. It is inadvisable for FSIOLambdaRunnable to control the lifecycle of both itself and its FRunnableThread. Then, when you use the FSIOLambdaRunnable, you need to store the pointer to the runnable and delete it in the destructor, and before you use it again. (Source)
This technically lets the Runnable exist in memory for longer than strictly needed but will be cleaned up when FSocketIONative is removed. However, this leads to a second issue - you need to delete ConnectionThread before you re-assign it otherwise you leak memory, thus the second delete call.
This however leads to another issue - if you delete ConnectionThread from the main thread while the FSIOLambdaRunnable is still executing something you crash again! I think the fix for this would be effectively:
This has the disadvantage of stalling the main thread until your background thread completes which kind of defeats the purpose of a background thread, but it'd only happen if you called the function twice in a row before the existing one completed. The only other way I can think of is finding a way to safely delete a thread while it's in the middle of executing a function, but I imagine that can still leave things in a bad state.
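As a rough illustration of the wait-before-delete idea and its stall, here is a minimal sketch in standard C++. It is not the plugin's code: `LambdaRunnable` and `Connection` are hypothetical stand-ins for FSIOLambdaRunnable and its owner, and `std::thread` stands in for FRunnableThread. The owner holds the runnable, and destroying the runnable blocks until the background work has finished.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical stand-in for FSIOLambdaRunnable: runs a callable on its own
// thread and never deletes itself.
class LambdaRunnable {
public:
    explicit LambdaRunnable(std::function<void()> work)
        : thread_(std::move(work)) {}

    // Blocks the destroying thread until the background work has finished.
    ~LambdaRunnable() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    LambdaRunnable(const LambdaRunnable&) = delete;
    LambdaRunnable& operator=(const LambdaRunnable&) = delete;

private:
    std::thread thread_;
};

// Hypothetical stand-in for the owning object (FSocketIONative in the issue).
// Connect is assumed to be called from one thread only.
class Connection {
public:
    void Connect(std::function<void()> work) {
        assert(work && "work must be callable");
        // Destroy the previous runnable first. This waits for it to finish,
        // which is the main-thread stall described above.
        connection_thread_.reset();
        connection_thread_ = std::make_unique<LambdaRunnable>(std::move(work));
    }

    // The implicit destructor destroys connection_thread_, which also joins.

private:
    std::unique_ptr<LambdaRunnable> connection_thread_;
};
```

With this shape, a second `Connect` call only blocks if the first call's work is still running, and nothing is ever deleted while its thread is executing.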
The other solution I could think of would be to safeguard inside the code that uses FSIOLambdaRunnable, i.e. it sets an "IsConnecting" bool to true, and the next time you try to call Connect while it's already connecting, it just early outs and warns. A callback on the exit of the Runnable then sets IsConnecting to false once the function actually completes the first time. Then it'd be safe to just delete ConnectionThread before assigning it, since it'd never get to that code if the current ConnectionThread was still in use.
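The guard described above might look roughly like this in standard C++. Again this is only a sketch under assumptions: `GuardedConnection` is a hypothetical name, `std::thread` stands in for the runnable and its thread, and `Connect` is assumed to be called from a single thread. The flag is flipped with an atomic compare-and-swap so the "already connecting" check and the "now connecting" update cannot interleave.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical owner that refuses to start a second connection attempt while
// the first one is still running.
class GuardedConnection {
public:
    // Returns false (early out) if a previous Connect is still in progress.
    bool Connect(std::function<void()> work) {
        assert(work && "work must be callable");
        bool expected = false;
        if (!is_connecting_.compare_exchange_strong(expected, true)) {
            return false;  // the caller can log a warning here
        }
        // The previous work has completed, so this join only waits for the
        // old thread to return after clearing the flag; it is the analogue
        // of deleting the old ConnectionThread before re-assigning it.
        if (thread_.joinable()) {
            thread_.join();
        }
        thread_ = std::thread([this, work = std::move(work)] {
            work();
            is_connecting_.store(false);  // the "exit callback"
        });
        return true;
    }

    bool IsConnecting() const { return is_connecting_.load(); }

    ~GuardedConnection() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }

private:
    std::atomic<bool> is_connecting_{false};
    std::thread thread_;
};
```

One detail the sketch makes visible: the flag is cleared slightly before the thread actually exits, so some form of wait (here a very short `join`) is still needed before the old thread object is released. The destructor also still has to wait if the owner is destroyed mid-connect.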
I'm opening this as an issue and not a PR to start a discussion about the best way to fix it, because I'm honestly not sure. ¯\_(ツ)_/¯
I'm still convinced we can encapsulate the life cycle inside the lambda, but I need to be able to replicate this issue on my end to explore ways to achieve this. The usual way is to add a lock or two somewhere ensuring things happen according to the desired order.
Do you have a minimal fresh project you can make that will cause this condition?