Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency bugs that cause hanging #890

Closed
wangjwchn opened this issue May 7, 2024 · 2 comments
Closed

Concurrency bugs that cause hanging #890

wangjwchn opened this issue May 7, 2024 · 2 comments

Comments

@wangjwchn
Copy link

I believe that there are concurrency bugs in function mi_free_block_delayed_mt that cause hanging.

  mi_thread_free_t tfree = mi_atomic_load_relaxed(&page->xthread_free);
  do {
    use_delayed = (mi_tf_delayed(tfree) == MI_USE_DELAYED_FREE);
    if mi_unlikely(use_delayed) {
      // unlikely: this only happens on the first concurrent free in a page that is in the full list
      tfreex = mi_tf_set_delayed(tfree,MI_DELAYED_FREEING);
    }
    else {
      // usual: directly add to page thread_free list
      mi_block_set_next(page, block, mi_tf_block(tfree));
      tfreex = mi_tf_set_block(tfree,block);
    }
  } while (!mi_atomic_cas_weak_release(&page->xthread_free, &tfree, tfreex));

The above code segment attempts to push a node, tfreex, to the beginning of the concurrent linked-list-based stack, referred to as xthread_free.

If xthread_free is modified by other threads during the do-while loop, the program will hang because xthread_free has been changed ( != tfree), causing mi_atomic_cas_weak_release to always fail.

Moving the first line into the do-while loop can fix this issue.

The same issue occurred in another code segment of this function:

    tfree = mi_atomic_load_relaxed(&page->xthread_free);
    do {
      tfreex = tfree;
      mi_assert_internal(mi_tf_delayed(tfree) == MI_DELAYED_FREEING);
      tfreex = mi_tf_set_delayed(tfree,MI_NO_DELAYED_FREE);
    } while (!mi_atomic_cas_weak_release(&page->xthread_free, &tfree, tfreex));

The solution is similar to the one above.

@mjp41
Copy link
Member

mjp41 commented May 7, 2024

Doesn't mi_atomic_cas_weak_release update tfree in the failure case.

E.g. it is based on:
https://en.cppreference.com/w/c/atomic/atomic_compare_exchange

Have you actually observed this execution hanging execution? On what platform as it could be the atomics are not implemented correctly in that case?

@wangjwchn
Copy link
Author

Just double-check the result. I believe you are correct. I mapped the mi_atomic_xxx instructions to a different atomic library, which doesn't have the failure update feature that causes the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants