Hi,
I believe there is a bug in the handling of the orphaned heaps linked list. It suffers from a race when adding/removing heaps.
Imagine the list is initially A->B and Thread 1 wants to take heap A. It loads the head (A) and reads A's next_heap, which is B. Immediately before the CAS, the thread is interrupted.
Meanwhile, other threads add and remove heaps so that the list becomes A->N->B (for example, A is removed, N is added, and A is added back).
Now Thread 1 resumes. Its CAS succeeds because A is still the head, but it installs the stale next pointer, so the list's head is now B and N is lost.
Unfortunately, the same race could also leave the list head pointing at a heap that is already in use by another thread.
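To make the interleaving concrete, here is a minimal single-threaded sketch that replays the steps above in a fixed order. The names `heap_t`, `next_orphan`, `orphan_heaps`, `orphan_push`, `orphan_pop` and `simulate_aba` are hypothetical stand-ins for illustration, not rpmalloc's actual code, and the "other threads" are simply executed inline.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not rpmalloc's definitions. */
typedef struct heap_t {
    struct heap_t *next_orphan;
} heap_t;

static _Atomic(heap_t *) orphan_heaps;

/* Naive lock-free push: link the heap in front of the current head. */
static void orphan_push(heap_t *heap) {
    heap_t *head = atomic_load(&orphan_heaps);
    do {
        heap->next_orphan = head;
    } while (!atomic_compare_exchange_weak(&orphan_heaps, &head, heap));
}

/* Naive lock-free pop: replace the head with head->next_orphan. */
static heap_t *orphan_pop(void) {
    heap_t *head = atomic_load(&orphan_heaps);
    while (head && !atomic_compare_exchange_weak(&orphan_heaps, &head,
                                                 head->next_orphan)) {
    }
    return head;
}

/* Replays the race in a fixed order. Returns 1 if N was lost. */
static int simulate_aba(void) {
    static heap_t a, b, n;
    atomic_store(&orphan_heaps, (heap_t *)NULL);
    orphan_push(&b);
    orphan_push(&a); /* list: A -> B */

    /* Thread 1: first half of a pop, then it is "interrupted". */
    heap_t *seen_head = atomic_load(&orphan_heaps);
    heap_t *seen_next = seen_head->next_orphan;
    assert(seen_head == &a && seen_next == &b);

    /* Other threads: remove A, add N, add A back. List: A -> N -> B. */
    heap_t *popped = orphan_pop();
    orphan_push(&n);
    orphan_push(popped);

    /* Thread 1 resumes: the head is A again, so the CAS succeeds
       and installs the stale next pointer B. */
    int swapped = atomic_compare_exchange_strong(&orphan_heaps, &seen_head,
                                                 seen_next);

    /* The head is now B and B has no successor, so N is unreachable. */
    return swapped && atomic_load(&orphan_heaps) == &b &&
           b.next_orphan == NULL;
}
```

Calling `simulate_aba()` returns 1: the compare only checks that the head pointer is A, not that the list behind A is unchanged.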
Unfortunately, the ABA problem in linked lists is not trivial to fix. In your global span cache you use a lock (the SPAN_LIST_LOCK_TOKEN). I guess using a lock for the orphaned heaps would be OK as well, since they are handled relatively infrequently: only when threads are initialized or finalized.
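As a rough sketch of the lock-based option, the list can be guarded by a small spinlock so that reading the head and its next pointer happens as one step. This uses a separate `atomic_flag` rather than a token stored in the pointer, so it is not how SPAN_LIST_LOCK_TOKEN works; `heap_t`, `locked_orphans`, `locked_push` and `locked_pop` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not rpmalloc's definitions. */
typedef struct heap_t {
    struct heap_t *next_orphan;
} heap_t;

static atomic_flag orphan_lock = ATOMIC_FLAG_INIT;
static heap_t *locked_orphans; /* only touched while holding orphan_lock */

static void orphan_list_lock(void) {
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&orphan_lock,
                                             memory_order_acquire)) {
    }
}

static void orphan_list_unlock(void) {
    atomic_flag_clear_explicit(&orphan_lock, memory_order_release);
}

static void locked_push(heap_t *heap) {
    assert(heap != NULL);
    orphan_list_lock();
    heap->next_orphan = locked_orphans;
    locked_orphans = heap;
    orphan_list_unlock();
}

static heap_t *locked_pop(void) {
    orphan_list_lock();
    heap_t *heap = locked_orphans;
    if (heap)
        locked_orphans = heap->next_orphan; /* head and next read under the lock */
    orphan_list_unlock();
    return heap;
}
```

Because no other thread can modify the list between reading the head and unlinking it, the stale next pointer from the scenario above cannot be installed. The cost is a short spin on thread init/fini, which should be rare.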
Please correct me if I'm wrong somewhere.
Since heaps in rpmalloc are always 64k aligned, I think a running counter in the low 16 bits would be sufficient to avoid ABA issues. For it to fail, 64k threads would have to be initialized/finalized while the one thread accessing the orphan pointer is suspended.
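A minimal sketch of that counter idea, assuming 64k-aligned heaps so that the low 16 bits of the head word are free to hold a tag that changes on every update. All names here (`heap_t`, `tagged_head`, `tagged_push`, `tagged_pop`, `tagged_aba_detected`) are hypothetical, and the demo allocates its heaps with `aligned_alloc` only to get the alignment. It replays the same A->B to A->N->B interleaving in a fixed order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_ALIGNMENT ((uintptr_t)65536) /* assumed 64k heap alignment */
#define TAG_MASK (HEAP_ALIGNMENT - 1)     /* low 16 bits hold the counter */

/* Hypothetical stand-ins for illustration; not rpmalloc's definitions. */
typedef struct heap_t {
    struct heap_t *next_orphan;
} heap_t;

static _Atomic uintptr_t tagged_head; /* heap pointer | 16-bit counter */

static heap_t *tagged_ptr(uintptr_t raw) {
    return (heap_t *)(raw & ~TAG_MASK);
}

/* Combine a heap pointer with the previous counter plus one. */
static uintptr_t tagged_make(heap_t *heap, uintptr_t previous_raw) {
    assert(((uintptr_t)heap & TAG_MASK) == 0);
    return (uintptr_t)heap | ((previous_raw + 1) & TAG_MASK);
}

static void tagged_push(heap_t *heap) {
    uintptr_t old_raw = atomic_load(&tagged_head);
    do {
        heap->next_orphan = tagged_ptr(old_raw);
    } while (!atomic_compare_exchange_weak(&tagged_head, &old_raw,
                                           tagged_make(heap, old_raw)));
}

static heap_t *tagged_pop(void) {
    uintptr_t old_raw = atomic_load(&tagged_head);
    heap_t *heap;
    do {
        heap = tagged_ptr(old_raw);
        if (!heap)
            return NULL;
        /* Assumes heap memory is never unmapped while it can be on the list. */
    } while (!atomic_compare_exchange_weak(
        &tagged_head, &old_raw, tagged_make(heap->next_orphan, old_raw)));
    return heap;
}

/* Returns 1 if the stale CAS is rejected and the list stays A -> N -> B,
   or -1 if allocation fails. */
static int tagged_aba_detected(void) {
    heap_t *a = aligned_alloc(HEAP_ALIGNMENT, HEAP_ALIGNMENT);
    heap_t *b = aligned_alloc(HEAP_ALIGNMENT, HEAP_ALIGNMENT);
    heap_t *n = aligned_alloc(HEAP_ALIGNMENT, HEAP_ALIGNMENT);
    if (!a || !b || !n) {
        free(a); free(b); free(n);
        return -1;
    }
    atomic_store(&tagged_head, 0);
    tagged_push(b);
    tagged_push(a); /* list: A -> B */

    /* Thread 1: first half of a pop, then it is "interrupted". */
    uintptr_t seen = atomic_load(&tagged_head);
    uintptr_t desired = tagged_make(tagged_ptr(seen)->next_orphan, seen);

    /* Other threads: remove A, add N, add A back. List: A -> N -> B. */
    heap_t *popped = tagged_pop();
    tagged_push(n);
    tagged_push(popped);

    /* Thread 1 resumes: same pointer, different counter, so the CAS fails. */
    int swapped = atomic_compare_exchange_strong(&tagged_head, &seen, desired);
    int intact = tagged_ptr(atomic_load(&tagged_head)) == a &&
                 a->next_orphan == n && n->next_orphan == b;
    free(a); free(b); free(n);
    return !swapped && intact;
}
```

In this replay the head pointer is A both before and after the interruption, but the counter has advanced by three, so the stale compare-and-swap is rejected and Thread 1 would retry with fresh values. The scheme only breaks if the counter wraps around to the same value while a thread is suspended, which is the 64k init/fini case mentioned above.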