Data races on fields of gc_stats structure
#12590
The code says:

/* The "sampled stats" of a domain are a recent copy of its
   domain-local stats, accessed without synchronization and only
   updated ("sampled") during stop-the-world events -- each minor
   collection, and on domain termination. */
static struct gc_stats sampled_gc_stats[Max_domains];

/* Update the sampled stats for the given domain. */
void caml_collect_gc_stats_sample(caml_domain_state* domain);

The intent is that I guess that @eutro is pointing at a bug where |
If, say, a thread gets pre-empted just after arriving at the barrier, other threads are free to leave the STW, and then the GC stats can easily be subject to a data race. The bug may also be easier to think of in terms of memory ordering rather than time. There is no
The existing barrier is arrived at here: Lines 644 to 650 in 8e30385
It should be enough to move the
It might also, for clarity, make more sense to have the barrier departure within the same procedure, rather than its caller.
Let me rephrase this in terms that are more familiar to me (a non-expert):
The error with the current gc stats sampling is that it is done before leaving the barrier, while it should happen before entering the barrier.
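To make the ordering concrete, here is a minimal, self-contained sketch in C11 with pthreads. It is not the OCaml runtime code: `sampled`, `arrived`, `domain_body` and `run_sample_before_arrival` are hypothetical names invented for illustration, and the barrier is reduced to a single atomic arrival counter. Each "domain" writes its sample before arriving, so the last arriver, which has synchronized with every earlier arrival, can read all samples safely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_DOMAINS 4  /* hypothetical domain count for this sketch */

/* Hypothetical stand-ins; none of these names come from the OCaml runtime. */
static long sampled[N_DOMAINS];   /* plain, unsynchronized samples */
static atomic_int arrived;        /* barrier arrival counter */
static long total_seen_by_last;   /* written by the last arriver only */

static void *domain_body(void *arg) {
  int id = (int)(intptr_t)arg;
  assert(id >= 0 && id < N_DOMAINS);
  long local_stat = id + 1;       /* pretend domain-local statistic */

  /* 1. Sample BEFORE arriving: this plain store is ordered before the
        release increment below, so whoever observes our arrival also
        observes the sample. */
  sampled[id] = local_stat;

  /* 2. Arrive at the barrier. */
  int before = atomic_fetch_add_explicit(&arrived, 1, memory_order_acq_rel);

  /* 3. The last arriver has synchronized with every earlier arrival and
        may read all samples without a data race. */
  if (before == N_DOMAINS - 1) {
    long sum = 0;
    for (int i = 0; i < N_DOMAINS; i++) sum += sampled[i];
    total_seen_by_last = sum;
  }
  return NULL;
}

/* Runs one simulated stop-the-world section; returns the sum read by the
   last arriver, or -1 if a thread could not be created. */
long run_sample_before_arrival(void) {
  pthread_t th[N_DOMAINS];
  int created = 0;
  atomic_store(&arrived, 0);
  total_seen_by_last = 0;
  for (; created < N_DOMAINS; created++) {
    if (pthread_create(&th[created], NULL, domain_body,
                       (void *)(intptr_t)created) != 0)
      break;
  }
  for (int i = 0; i < created; i++) pthread_join(th[i], NULL);
  return created == N_DOMAINS ? total_seen_by_last : -1;
}
```

If step 1 were moved after step 2, the last arriver could read `sampled[id]` while another domain is still writing it, which is the kind of race discussed in this thread.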
Indeed. (The
Indeed, it would be clearer if the whole barrier code was in
(would you also like to submit a PR?)
I'll happily submit a PR, yes.
To our current knowledge there are two races involving gc stats, one on |
#12597 is merged and, to my knowledge, all gc_stats related data races are now fixed. Closing this. #12597 removed the |
As reported by @eutro in #12579:
The fix suggested by @eutro is to protect caml_compute_gc_stats by STW barriers, which makes sense to me.