Skip to content
This repository has been archived by the owner on Nov 11, 2023. It is now read-only.

Address ABA vulnerability in LocklessTaskQueue. #32

Closed
wants to merge 1 commit into from
Closed

Address ABA vulnerability in LocklessTaskQueue. #32

wants to merge 1 commit into from

Conversation

tristanjl
Copy link
Contributor

This is a solution to address the ABA limitation of the LocklessTaskQueue implementation. Problem is outlined here https://domino.research.ibm.com/library/cyberdig.nsf/papers/58319A2ED2B1078985257003004617EF/$File/rj5118.pdf on page 19 in figure 10.

@tristanjl
Copy link
Contributor Author

I also wonder if it might be worth having the some of steering committee (particularly Audiokinetic/Epic/Unity) have a look. I would not be surprised if they have their own solutions to this. Might lead us to having the best solution pushed to the resonance repository.

@anokta
Copy link
Contributor

anokta commented May 5, 2020

Thanks for bringing this up! +1 on having additional reviews (in addition to @jkammerl from Google), to keep everyone in the loop.

I personally wasn't aware of any specific issues caused by LocklessTaskQueue, as long as there is one consumer thread for executing the tasks (i.e., the audio thread). Do you mind providing more context where/how you encountered this issue on FMOD side?

Cheers

@tristanjl
Copy link
Contributor Author

Sure. So even with the change, one consumer thread is the maximum allowed. FMOD uses only one consumer thread.

We encountered this issue with our users reporting crashes when using resonance audio with FMOD after an extended period of time. Here is one such user case - https://qa.fmod.com/t/fmod-unity-integration-crash-with-resonanceaudio/15710 - and we were provided with crash dumps with callstacks crashed inside LocklessTaskQueue. We were unable to reproduce it ourselves, so looked to solve the problem based on available information on Treiber stack c++ implementations (as this problem does not exist, for example, in Java implementations due to language guarantees).

It's easiest to conceptualise how problems can arise looking at a three producer (P1 and P2&3) one consumer (C) scenario. P1 enters PopNodeFromList(&free_list_head_) via Post(). It gets list_head_ptr = list_head.load(); (P1H) and list_head_next_ptr = list_head_ptr->next.load(); (P1N). P1 gets interrupted. P2&3 and C perform a number of operations such that free_list_head_ == P1H and free_list_head_->next != P1N. P1 resumes, and the compare and swap in PopNodeFromList succeeds as *list_head == list_head_ptr (P1H == free_list_head_). But as P1N no longer == list_head_next_ptr (which could theoretically could be anywhere at this point), this swap should be invalid.

Hopefully my explanation is clear enough - it takes me a little while to get my head around these types of problems.

@tristanjl
Copy link
Contributor Author

Oh, I probably should have asked before - is there a best way to reach the other steering committee members?

@jkammerl
Copy link
Member

jkammerl commented May 9, 2020

Thanks for the PR! I'll take a close look on Monday and try to reproduce the issue.

@jkammerl
Copy link
Member

jkammerl commented May 9, 2020

Good catch! I agree the described case could be an issue. I refactored your patch a bit and will send it out for an internal review on Monday. Once I get approval, I'll update this github repo accordingly. Thanks a lot for reaching out!

@jkammerl
Copy link
Member

Hi, I refactored your approach and just submitted another pull request (I'll add some feedback on your PR in a second which motivated me to rewrite some parts).
#33
Could you give it a try? Also feel free to comment on code.
Thanks! Julius


// Union for swap operations
union NodeAndTag {
std::atomic<uint64_t> both_atomic;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks dangerous to me given that the size of the std::atomic is not guaranteed to be the exact size of its templated type, see also discussion here: https://stackoverflow.com/questions/18278611/can-two-stdatomics-be-part-of-one-union

@tristanjl tristanjl closed this May 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants