
NULL and movement check in process_edge #1032

Merged: wks merged 11 commits into mmtk:master from feature/early-null-check on Dec 20, 2023
Conversation

@wks (Collaborator) commented Nov 30, 2023

This PR does two things:

  1. ProcessEdgesWork::process_edge will skip slots that are not holding references (i.e. if Edge::load() returns ObjectReference::NULL).
  2. ProcessEdgesWork::process_edge will skip the Edge::store() if the object is not moved.

Doing (1) removes unnecessary invocations of trace_object() as well as the subsequent Edge::store(). It also allows slots to hold non-reference tagged values. In that case, the VM binding can return ObjectReference::NULL in Edge::load() so that mmtk-core will simply skip the slot, fixing #1031.

Doing (2) removes unnecessary Edge::store() operations in the case where the objects are not moved during trace_object. It reduces the STW time in most cases, fixing #574.

Fixes: #1031
Fixes: #574
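The two checks can be sketched as follows. This is a simplified model, not the actual mmtk-core code: the real `process_edge` is generic over the VM binding and `trace_object` is a plan method, while here a slot is just a `usize` word and object movement is simulated by a flag.

```rust
// Simplified model of the two checks added to ProcessEdgesWork::process_edge.
// A slot holds a raw word; 0 models ObjectReference::NULL (e.g. a slot that
// actually holds a tagged non-reference value).

// Stand-in for the plan's trace_object: returns a new address iff the
// (simulated) GC moved the object.
fn trace_object(object: usize, moved: bool) -> usize {
    if moved { object + 0x100 } else { object }
}

/// Returns true if the slot was written back.
fn process_edge(slot: &mut usize, moved: bool) -> bool {
    let object = *slot;
    // Check (1): skip slots that do not hold a reference at all.
    if object == 0 {
        return false;
    }
    let new_object = trace_object(object, moved);
    // Check (2): only store back if trace_object actually moved the object.
    if new_object != object {
        *slot = new_object;
        return true;
    }
    false
}

fn main() {
    let mut null_slot = 0usize;
    assert!(!process_edge(&mut null_slot, true)); // (1): skipped entirely

    let mut unmoved = 0x1000usize;
    assert!(!process_edge(&mut unmoved, false)); // (2): no write-back
    assert_eq!(unmoved, 0x1000);

    let mut moved = 0x1000usize;
    assert!(process_edge(&mut moved, true)); // moved: slot updated
    assert_eq!(moved, 0x1100);
    println!("ok");
}
```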

@wks mentioned this pull request Dec 7, 2023

Commit message: This allows the VM binding to report slots that hold tagged non-reference values as `ObjectReference::NULL` so that mmtk-core will not try to trace them. Also update the doc comments of `Edge::load` and `Edge::store` to mention tagged references.
@wks (Collaborator, Author) commented Dec 8, 2023

@k-sareen I took your advice and added a check after trace_object to skip the store() if the object is not moved. It fixes #574

I ran some experiments.

| Build | build1 | build2 | build3 |
| --- | --- | --- | --- |
| Check NULL before `trace_object`? | no | yes | yes |
| Check if object is moved after `trace_object`? | no | no | yes |
  • Build1 is the master branch.
  • Build2 is currently this PR. It checks if the return value of slot.load() is NULL and, if it is, skips the slot completely.
  • Build3 is here. It further adds, on top of build2, a check of whether the object is moved and, if it is not moved, skips the store operation.

I tested with OpenJDK (unmodified). I ran lusearch of DaCapo Chopin on bobcat.moma, 40 invocations, 2.5x min heap size, no PGO.

Results (normalized to build1)
(histogram omitted; see the Plotty link below)

Plotty: https://squirrel.anu.edu.au/plotty/wks/noproject/#0|bobcat-2023-12-08-Fri-072543&benchmark^build^invocation^iteration^mmtk_gc&GC^time^time.stw&|10&iteration^1^4|20&1^invocation|30&1&mmtk_gc&build;build1|41&Histogram%20(with%20CI)^build^mmtk_gc&

The effect of this PR (build1 vs build2) is uncertain. It slows down GenCopy, Immix and StickyImmix, but speeds up GenImmix and SemiSpace. The difference in Immix (0.7%) is likely the result of more GCs being triggered for some reason. The differences for the other plans are less than 0.3% and within the confidence intervals.

The effect of checking for object movement after trace_object is significant. It speeds up all plans except SemiSpace, which moves every object in every GC. One possible fix is adding a predicate `always_move_objects()` in addition to `may_move_objects()` so that SemiSpace can return true and omit the check.
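A minimal sketch of what such a predicate pair could look like, using associated constants so the comparison compiles away. The trait and names here are hypothetical, not the actual mmtk-core API:

```rust
// Hypothetical plan constraints (names assumed, not real mmtk-core items).
trait PlanConstraints {
    /// Whether trace_object may return a different address at all.
    const MAY_MOVE_OBJECTS: bool;
    /// Whether trace_object *always* returns a different address for live
    /// objects (true for a plan like SemiSpace that copies everything).
    const ALWAYS_MOVE_OBJECTS: bool = false;
}

struct SemiSpace;
impl PlanConstraints for SemiSpace {
    const MAY_MOVE_OBJECTS: bool = true;
    const ALWAYS_MOVE_OBJECTS: bool = true;
}

struct Immix;
impl PlanConstraints for Immix {
    const MAY_MOVE_OBJECTS: bool = true; // default: ALWAYS_MOVE_OBJECTS = false
}

// Only compare `new_object != object` when movement is uncertain; since
// both operands are constants per monomorphized instance, the compiler
// resolves this at compile time.
fn needs_movement_check<P: PlanConstraints>() -> bool {
    P::MAY_MOVE_OBJECTS && !P::ALWAYS_MOVE_OBJECTS
}

fn main() {
    assert!(!needs_movement_check::<SemiSpace>()); // always store, no check
    assert!(needs_movement_check::<Immix>()); // check, then maybe store
    println!("ok");
}
```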

@k-sareen (Collaborator) commented Dec 11, 2023

Hmm. That's a bit odd, to be honest. Also, even with SemiSpace we may not actually forward an object if the object is in the LOS or a non-moving space. I get that it's an extra check, but I would have thought that in principle it would reduce unnecessary memory traffic and so be faster.

@k-sareen (Collaborator) commented

Oh sorry, I misread the graphs (could I suggest giving them more semantic names instead of "build{1,2,3}"?). Yes, skipping the object write for unmoved objects is a win, which is good. I'll note that the percentage differences are minuscule (< 1% in the graphs) and the error bars are large, so either way this PR does not hurt performance. I would recommend running a bigger experiment with all the benchmarks to see the impact across the entire suite.

@wks (Collaborator, Author) commented Dec 11, 2023

This is my favorite test: tracing over a static binary search tree. Each run of the test program triggered 100 GCs (plus 10 warm-ups) using System.gc() (which I am sure always triggers a GC), and recorded the time of each GC. All GCs ran with only 1 GC worker thread to exaggerate the tracing time. The three builds were the exact same binaries as in my last post.

In the following figure, each sub-figure is one Plan*Build combination. Each vertical violin is one execution of the test program, each dot is one GC, and the y-axis is the time of that GC. The violin shape shows the clustering; the colored horizontal bar in the middle is the median, and the black horizontal bar is the mean.

(violin plots omitted)

Obviously, SemiSpace manifests a bi-modal distribution because the cost of dispatching the object to copyspace0 and copyspace1 differs. (We discussed the bi-modal distribution before; see #952 (comment).) The `new_object != object` test systematically adds a cost, as the entire "violin" shifts upwards. But its effect is even smaller than that of testing which space an object is in (i.e. `if copyspace0.in_space(object) { ... } else if copyspace1.in_space(object) { ... } else ...`). I will not worry about the SemiSpace slowdown due to this PR, because if we refactor the dispatching we will get a more significant improvement in the future.

And the effect on Immix is obvious. The speed-up is consistent and significant, probably because of the reduced memory traffic.

The effect on StickyImmix is a bit mixed. It reduced the variation in GC time and reduced the maximum GC time. Although the median (colored bar in the middle) is higher, the GCs below the median are significantly faster, so the mean (black bar) barely changes. Overall, the effect on StickyImmix is not significant.

I think the conclusion is that this PR does not have significant impact on performance. I'll test it with more benchmarks.

@k-sareen (Collaborator) commented

Could you remove the couple of outliers we have for Gen{Copy,Immix}? I'm just interested to see the actual trend and it's currently being dominated by the outliers. Also just to confirm my understanding, each violin plot (1-5) for each subfigure is the same configuration but a different invocation?

@wks (Collaborator, Author) commented Dec 11, 2023

Could you remove the couple of outliers we have for Gen{Copy,Immix}? I'm just interested to see the actual trend and it's currently being dominated by the outliers.

The following plots have the data points with z-score > 2 removed (removing only z-score > 3 would still keep many outliers).

(figure omitted)

Also just to confirm my understanding, each violin plot (1-5) for each subfigure is the same configuration but a different invocation?

Yes. The five violins in each subfigure have exactly the same configuration. They are just five different invocations.

@k-sareen (Collaborator) commented Dec 11, 2023

So avoiding the write is a performance win for all GCs except SS, which is juuust slightly worse?

@wks (Collaborator, Author) commented Dec 11, 2023

So avoiding the write is a performance win for all GCs except SS which is juuust slightly worse?

Yes. That's expected. The added check itself has overhead, but if it avoids more overhead from the write, it is still a net win. SemiSpace should be the worst case because it copies every single object, and it is only slightly worse. Immix profits from this check the most because Immix is deliberately designed to avoid copying. I guess that if we enabled the "stress GC" options for ImmixSpace it would be slower too, because it would then also move every single object.

@k-sareen (Collaborator) commented

Okay, I just ran into a correctness bug because we overwrote a forwarding pointer in a slot. In ART, a slot can be the start of an object, since the class pointer is stored at the start of the object (as opposed to OpenJDK, where the start of the object is the mark word). In this case the object was already forwarded, but slot.load() returned the object with the forwarding status bits set due to a previous concurrency issue. We then overwrote the forwarding pointer with the address of the to-space object, corrupting the class pointer of the from-space object. I'd like to fast-track this PR; I'm happy to review it.

@wks (Collaborator, Author) commented Dec 17, 2023

This time I ran the three builds (the same binaries as in the previous post) against more benchmarks: 20 invocations, 3x min heap size w.r.t. "vanilla" OpenJDK with G1. (Note that the min heap size of some MMTk plans may sometimes be higher than G1's, so the effective factor may be less than 3x for some benchmarks.)

I omitted batik, eclipse, h2, jme and zxing because the min heap sizes of those benchmarks increase as the number of iterations increases, which could indicate a memory leak, possibly due to the lack of (soft/weak/phantom) reference processing. Tradebeans, tradesoap and tomcat were omitted, too, due to unexpected crashes and hangs during execution. I am not sure of the reason, but it may be some anomaly in the communication between the benchmark threads and the network server running during the benchmark.

(figure omitted)

Detailed results: https://squirrel.anu.edu.au/plotty/wks/noproject/#0|vole-2023-12-13-Wed-173235&benchmark^build^invocation^iteration^mmtk_gc&GC^time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark^mmtk_gc&build;build1|40&Histogram%20(with%20CI)^build^benchmark&

From the data, we see that

  1. (build1 -> build2) Skipping slots that contain NULL references has no visible effect on STW time.
  2. (build2 -> build3) Checking whether the object has moved before storing reduces the STW time significantly for most plans. SemiSpace is an exception: the check has no obvious effect there, for the reasons we discussed before.

Some benchmarks, including biojava, graphchi, jython, luindex and lusearch, benefit from the object-movement check when using generational plans. Others are not as sensitive.

From the data, I think the check for object movement after trace_object is profitable. I will add it to this PR, too, and it will fix #574. The check for NULL before trace_object is necessary for correctness, so I'll combine both checks into this PR.

@wks wks changed the title Skip slots where Edge::load() returns NULL NULL and movement check in process_edge Dec 18, 2023
@wks wks marked this pull request as ready for review December 18, 2023 06:54
```diff
 let new_object = self.trace_object(object);
-if Self::OVERWRITE_REFERENCE {
 debug_assert!(!new_object.is_null());
+if Self::OVERWRITE_REFERENCE && new_object != object {
```
Collaborator (review comment):

We don't need OVERWRITE_REFERENCE anymore. It's always true anyway.

Collaborator, Author:

SanityGCProcessEdges sets that constant to false.

Member:

There is no harm in checking it, and we should check it unless we remove OVERWRITE_REFERENCE. Otherwise, if a new implementation of ProcessEdgesWork sets OVERWRITE_REFERENCE to false, we will have a bug here.

Collaborator:

OVERWRITE_REFERENCE was a hack to implement sanity GC. Sanity GC does not move objects anyway, so the check could be obviated now. I'm guessing the compiler is smart enough to figure out that the constant is always true for real GC work packets and always false for sanity GC, so there is no performance overhead from the check. So at the end of the day, it's fine to keep it; I just think it's a bit redundant.
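The discussion above hinges on `OVERWRITE_REFERENCE` being an associated constant. A simplified sketch (not the real mmtk-core trait definitions) of why the branch is resolved per monomorphized instance and so costs nothing at runtime:

```rust
// Simplified stand-in for the ProcessEdgesWork trait (assumed shape).
trait ProcessEdgesWork {
    const OVERWRITE_REFERENCE: bool = true;
}

struct PlanProcessEdges;
impl ProcessEdgesWork for PlanProcessEdges {} // real GC work: keeps the default

struct SanityGCProcessEdges;
impl ProcessEdgesWork for SanityGCProcessEdges {
    // Sanity GC only traverses the heap; it never writes slots back.
    const OVERWRITE_REFERENCE: bool = false;
}

fn store_needed<W: ProcessEdgesWork>(object: usize, new_object: usize) -> bool {
    // The constant is known per monomorphized instance, so this whole
    // condition folds to `new_object != object` or `false` at compile time.
    W::OVERWRITE_REFERENCE && new_object != object
}

fn main() {
    assert!(store_needed::<PlanProcessEdges>(0x1000, 0x1100));
    assert!(!store_needed::<PlanProcessEdges>(0x1000, 0x1000));
    assert!(!store_needed::<SanityGCProcessEdges>(0x1000, 0x1100));
    println!("ok");
}
```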

@wks wks added the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Dec 18, 2023
@k-sareen (Collaborator) commented

@wks MarkCompact is failing the null object debug check in trace_object. Could you look into it?

@wks (Collaborator, Author) commented Dec 18, 2023

The OpenJDK binding test failed; the error is in the reference processor. trace_object now always asserts that its argument is not NULL. We need to update the documentation, too, wherever trace_object is exposed to the VM binding (such as in Scanning::process_weak_refs), and make it clear that the caller of trace_object is responsible for ensuring the object reference is not NULL.

@k-sareen (Collaborator) commented

You need to add the null checks here as well (in the entire file):

```rust
) -> ObjectReference {
```

@wks (Collaborator, Author) commented Dec 18, 2023

You need to add the null checks here as well (in the entire file):

) -> ObjectReference {

Yes. And I think this is one more reason why eliminating ObjectReference::NULL would improve the clarity of the API. See #1043.

@k-sareen (Collaborator) commented

I don't quite understand the relevance of the bug to that issue. With this change, we need to do a null check before we do trace_object (i.e. we've moved the check up one level).

@wks (Collaborator, Author) commented Dec 18, 2023

I don't quite understand the relevance of the bug to that issue.

The code:

```rust
let old_referent = <E::VM as VMBinding>::VMReferenceGlue::get_referent(reference);
let new_referent = ReferenceProcessor::get_forwarded_referent(trace, old_referent);
```

The reference processor called get_forwarded_referent(..., old_referent) without checking whether old_referent is NULL. If we make ObjectReference non-nullable, get_referent will have to return None if the referent has already been cleared, and Some(referent) if not. Since trace_object(object) always takes a non-NULL ObjectReference as a parameter, get_forwarded_referent, which is a wrapper of trace_object, will also have to require its old_referent argument to be a non-NULL ObjectReference. That will force the programmer to check whether old_referent is None before calling get_forwarded_referent, and the bug in the failed OpenJDK test would not appear in the first place.
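Under the proposed non-null design, the compiler itself would enforce that check. A minimal sketch with assumed, simplified signatures (the real API in #1043 may differ); `Option<NonZeroUsize>` has the same size as `usize`, so the null case costs no extra space:

```rust
use std::num::NonZeroUsize;

// A non-nullable object reference, as argued for in #1043 (sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ObjectReference(NonZeroUsize);

// Returns None once the referent is cleared (the raw slot word is 0),
// so the type system forces the check the buggy code omitted.
fn get_referent(raw_slot: usize) -> Option<ObjectReference> {
    NonZeroUsize::new(raw_slot).map(ObjectReference)
}

// Wrapper around trace_object: by construction it can only be called
// with a non-null reference (identity stand-in here).
fn get_forwarded_referent(referent: ObjectReference) -> ObjectReference {
    referent
}

fn main() {
    // Cleared referent: cannot even be passed to get_forwarded_referent.
    assert!(get_referent(0).is_none());

    // Live referent: the null check has necessarily happened already.
    if let Some(r) = get_referent(0x1000) {
        let _new = get_forwarded_referent(r);
    }
    println!("ok");
}
```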

With this change, we need to do a null check before we do trace_object (i.e. we've moved the check up one level).

Yes. With #1043 implemented, we will need to do null check before calling trace_object. And yes. It is moved one level above.

For the Ruby binding, this check always exists. Because a Ruby slot can hold a tagged non-reference value, the idiom is:

```c
if (SPECIAL_CONST_P(*field)) { /* Check if it holds a "special value". */
    continue;
}
rb_gc_mark_movable(*field); /* This calls `trace_object` underneath. */
```

`true`, `false`, `nil` and small integers are all "special values". If a slot holds a special value, the binding skips calling trace_object.

For bindings that do not have NULL references, no check is needed because there is no NULL. So I would argue that the check of whether a usize value represents NULL is VM-specific, and should be pushed up to the VM binding.

@wks (Collaborator, Author) commented Dec 19, 2023

... With this change, we need to do a null check before we do trace_object (i.e. we've moved the check up one level).

The commit 9648aed introduced the method ReferenceGlue::is_referent_cleared. The intention was that not all VMs clear referents to ObjectReference::NULL. Specifically, the Julia binding replaces the reference with a pointer to jl_nothing, a special Julia object representing the value Nothing. This means the reference processor must check whether the referent is cleared using ReferenceGlue::is_referent_cleared instead of comparing against ObjectReference::NULL. We can see the following idiom in the ReferenceProcessor:

```rust
let referent = <E::VM as VMBinding>::VMReferenceGlue::get_referent(*reference);
if !<E::VM as VMBinding>::VMReferenceGlue::is_referent_cleared(referent) { // cleared = NULL or special values
    Self::keep_referent_alive(trace, referent); // calls trace_object
}
```

This also reflects the fact that ObjectReference::NULL is not generally applicable to all VMs (not even most VMs, because Julia, Ruby and Python all represent Nothing, nil and None using values that are not ObjectReference::NULL).
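A hedged sketch of such a binding-defined cleared-referent test, modeled on the Julia case; the sentinel address and the raw-`usize` representation are made up for illustration:

```rust
// Hypothetical sentinel: the address of a VM's jl_nothing-like object.
// In a real binding this would come from the VM, not a constant.
const JL_NOTHING: usize = 0x1000;

// Binding-defined "is this referent cleared?" test: the binding, not
// mmtk-core, decides which raw values mean "cleared".
fn is_referent_cleared(referent: usize) -> bool {
    referent == 0 || referent == JL_NOTHING // NULL or the VM's sentinel
}

fn main() {
    assert!(is_referent_cleared(0)); // NULL-clearing VMs (e.g. OpenJDK)
    assert!(is_referent_cleared(JL_NOTHING)); // sentinel-clearing VMs (e.g. Julia)
    assert!(!is_referent_cleared(0x2000)); // a live referent
    println!("ok");
}
```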

@k-sareen (Collaborator) commented

My understanding is that ReferenceGlue is deprecated and slated for removal. The new API doesn't have this issue since everything is done by the VM/binding.

Re: #1043, yes I see. It would be a better API if ObjectReference was non-null.

@k-sareen (Collaborator) reviewed and approved:

LGTM. Thanks!

@k-sareen k-sareen added this pull request to the merge queue Dec 20, 2023
@wks wks removed this pull request from the merge queue due to a manual request Dec 20, 2023
@qinsoon (Member) reviewed and approved:

LGTM

@wks wks added this pull request to the merge queue Dec 20, 2023
Merged via the queue into mmtk:master with commit e2951ad Dec 20, 2023
20 checks passed
@wks wks deleted the feature/early-null-check branch December 20, 2023 07:16
k-sareen pushed a commit to k-sareen/mmtk-core that referenced this pull request Apr 11, 2024