
Fix data race on global pools arrays of pool_freelist #12755

Merged (4 commits), Nov 27, 2023

Conversation

@fabbing (Contributor) commented Nov 17, 2023

This PR, joint work with @OlivierNicole, addresses data races on the global_avail_pools and global_full_pools members of struct pool_freelist in shared_heap.c.

Domains can access the global_(avail|full)_pools arrays in parallel while trying to adopt a pool with pool_global_adopt, so these arrays must be made of atomic pointers.
Since the pool_freelist struct is not exposed outside the runtime, we chose a proper _Atomic qualifier rather than a volatile one. The downside is that reads and writes must go through the atomic_(load|store)_relaxed functions so as not to cause more synchronisation than intended, which makes the code a bit more verbose.

Co-authored-by: Olivier Nicole <olivier@chnik.fr>
@OlivierNicole (Contributor) commented:

To address some possible confusion: manipulations of this global free list are generally protected by a mutex, with exactly one exception, pool_global_adopt, as @fabbing said, where a first read is made outside of the lock:

ocaml/runtime/shared_heap.c, lines 301 to 309 at 8ec2b3d:

```c
  /* probably no available pools out there to be had */
  if( !pool_freelist.global_avail_pools[sz] &&
      !pool_freelist.global_full_pools[sz] )
    return NULL;

  /* Haven't managed to find a pool locally, try the global ones */
  caml_plat_lock(&pool_freelist.lock);
  if( pool_freelist.global_avail_pools[sz] ) {
```

This read can consequently race with any write to the same arrays, which is technically undefined behaviour.

@gasche (Member) left a comment:

I am not convinced by your choice of relaxed. Except for the one short-cut check that was outside the lock, these accesses only occur (in debug mode, or) in code that runs on the first pool allocation following the termination of another domain, which is very rare. I think the performance gain of relaxed over a plain atomic access is negligible.

```diff
-  if( !pool_freelist.global_avail_pools[sz] &&
-      !pool_freelist.global_full_pools[sz] )
+  if( !atomic_load_acquire(&pool_freelist.global_avail_pools[sz]) &&
+      !atomic_load_acquire(&pool_freelist.global_full_pools[sz]) )
```
A Member replied:

I would expect a relaxed here if the intention is to have a fast, approximate check that corresponds to the previous version. acquire is probably fine though.

@fabbing (Contributor, Author) replied:

My intention was to avoid taking the lock when not necessary, but now that I have to argue for it, I'm no longer sure it achieved that. A relaxed operation will be fine.

@OlivierNicole (Contributor) commented:

> I am not convinced by your choice of relaxed. Except for the one short-cut check that was outside the lock, these accesses only occur (in debug mode, or) in code that runs on the first pool allocation following the termination of another domain, which is very rare.

In principle, I'm not at all opposed to using SC atomics instead of relaxed if the performance cost is negligible: it makes the code much more readable. I see one other place where such SC atomic operations on global pools would occur, however: move_all_pools, which is called by each domain from caml_cycle_heap when completing a major heap cycle. (Although I don't really understand this code, e.g. I don't understand why move_all_pools is called repeatedly on a global pool list that I expect to be empty after the first call.)

@gasche (Member) commented Nov 20, 2023:

caml_cycle_heap is a rare operation; adding a couple of SC atomics is negligible.

@fabbing (Contributor, Author) commented Nov 20, 2023:

> I am not convinced by your choice of relaxed. Except for the one short-cut check that was outside the lock, these accesses only occur (in debug mode, or) in code that runs on the first pool allocation following the termination of another domain, which is very rare. I think the performance gain of relaxed over a plain atomic access is negligible.

These accesses are protected by the pool_freelist.lock mutex, which propagates changes to other domains when the lock is taken and released, so relaxed operations are fine.
The code is more verbose because we have to use atomic_load_relaxed, but this also makes it clear that these operations don't cause synchronisation.

@gasche (Member) commented Nov 20, 2023:

I would have preferred the less verbose code with normal C reads/writes, but if you care about using relaxed operations explicitly, those who do the work decide.

@gasche (Member) commented Nov 20, 2023:

There is a failure on the CI:

```
runtime/shared_heap.c:82:5: error: incompatible integer to pointer conversion initializing '_Atomic(pool *)' with an expression of type 'int' [-Werror,-Wint-conversion]
  { 0, },
    ^
runtime/shared_heap.c:83:5: error: incompatible integer to pointer conversion initializing '_Atomic(pool *)' with an expression of type 'int' [-Werror,-Wint-conversion]
  { 0, },
```
@fabbing (Contributor, Author) commented Nov 20, 2023:

It's nice to have Clang spot these issues. Sorry I didn't notice it before!

@sadiqj (Contributor) commented Nov 25, 2023:

Is this only waiting on a Changes entry?

@gasche (Member) commented Nov 25, 2023:

I guess it is.

@fabbing (Contributor, Author) commented Nov 27, 2023:

> I guess it is.

Changes updated accordingly, and thanks for the review!

@gasche gasche merged commit 8cfd9c5 into ocaml:trunk Nov 27, 2023
9 checks passed
@fabbing mentioned this pull request Dec 5, 2023.