What do you think about instead lifting some of the setup from writebarrierptr_prewrite1 (like switching to the system stack) into bulkBarrierPreWrite and just calling gcmarkwb_m directly from bulkBarrierPreWrite? Then, as a possible second step, unrolling the bulkBarrierPreWrite loop that calls gcmarkwb_m?
That was my first instinct, but I wasn't sure whether, for a very large bulk pre-write, staying on the system stack for the whole loop might exceed the latency budget in a way that hopping on and off the system stack per pointer would not.
bulkBarrierPreWrite is already non-preemptible, so this wouldn't be making it any worse. In fact, it would be kind of nice for it to be obviously non-preemptible, rather than subtly non-preemptible like it is now. :)
This isn't great, obviously. But if we wanted to fix this (which we might have to), I think we would need to do it at the level of typedmemmove and friends, by breaking the work up into smaller bulkBarrierPreWrite and memmove segments with a preemption point after each segment.
There are several TODOs here:
Improve the layering.
Figure out whether typedmemmove and friends need to break their work up into chunks to avoid long pauses.
Check whether loop unrolling in bulkBarrierPreWrite improves performance.
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime
and the compiler support for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.
Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).
Run-TryBot: Austin Clements <firstname.lastname@example.org>
Reviewed-by: Rick Hudson <email@example.com>
TryBot-Result: Gobot Gobot <firstname.lastname@example.org>