
Extract set_collection_kind and set_gc_status #957

Merged
merged 4 commits into mmtk:master from extract-set-collection-kind on Oct 16, 2023

Conversation

@qinsoon (Member) commented Sep 19, 2023

This PR extracts set_collection_kind and set_gc_status from each plan's schedule_collection, and calls them directly in the ScheduleCollection work packet. This removes duplicated code, and also makes our implementation consistent with Java MMTk.

For generational plans, each plan's schedule_collection currently calls requires_full_heap_collection first, and then calls set_collection_kind. With the changes in this PR, requires_full_heap_collection is called after set_collection_kind, which matches Java MMTk (super.collectionPhase(phaseId);). The implications are:

  1. When set_collection_kind forces a full heap GC and is called before requires_full_heap_collection, we do a full heap GC in the current GC (which is the expected behavior in Java MMTk).
  2. set_collection_kind may increase cur_collection_attempt, which makes it more likely to trigger an emergency GC, and in turn more likely to trigger a full heap GC.

In summary, the current code misses some opportunities where we should do a full heap GC. With the changes, we properly trigger full heap GCs when we should.
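
To make the ordering effect concrete, here is a minimal standalone sketch (the type and the logic inside set_collection_kind are simplified models for illustration, not MMTk's actual code):

    // A simplified model of a generational plan. The real BasePlan fields and
    // the real condition for forcing a full heap GC are more involved.
    struct ModelGenPlan {
        cur_collection_attempt: u32,
        force_full_heap: bool,
        full_heap_this_gc: bool,
    }

    impl ModelGenPlan {
        // Models set_collection_kind: bumps the attempt counter and may force
        // a full heap GC for this collection.
        fn set_collection_kind(&mut self) {
            self.cur_collection_attempt += 1;
            if self.force_full_heap || self.cur_collection_attempt > 1 {
                self.full_heap_this_gc = true;
            }
        }

        // Models requires_full_heap_collection.
        fn requires_full_heap_collection(&self) -> bool {
            self.full_heap_this_gc
        }
    }

    fn main() {
        let mut plan = ModelGenPlan {
            cur_collection_attempt: 0,
            force_full_heap: true, // e.g. a user-triggered full heap GC
            full_heap_this_gc: false,
        };

        // Old order: the plan queried requires_full_heap_collection before
        // set_collection_kind ran, so the forced full heap GC was missed.
        let old = plan.requires_full_heap_collection(); // false
        plan.set_collection_kind();

        // New order (this PR): set_collection_kind runs first in the
        // ScheduleCollection work packet, so the decision sees its effects.
        let new = plan.requires_full_heap_collection(); // true
        println!("old order: {old}, new order: {new}");
    }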

@qinsoon (Member, Author) commented Sep 21, 2023

I got some strange performance results for this PR when running SemiSpace. For jython, the number of GCs increases with this PR, and thus the STW time increases. The result is reproducible.

[figure: ss-jython]

The change in this PR should be a no-op for SemiSpace: it just moves the calls to set_collection_kind and set_gc_status from the plan's schedule_collection to the ScheduleCollection work packet. I logged the page usage after each GC and got the plot below. It shows that with this PR we have more live pages at the end of each GC, which is why we need to do more GCs. No emergency GCs were triggered in either build (emergency GCs should not affect SemiSpace anyway, but they may affect reference processing; reference processing is not enabled in these runs).
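
For reference, the page-usage logging was an ad-hoc hook roughly along these lines (a sketch only: it assumes the Plan::get_reserved_pages accessor mentioned later in this thread, and where the binding calls the hook at the end of a GC is left open):

    use mmtk::plan::Plan;
    use mmtk::vm::VMBinding;
    use mmtk::MMTK;

    // Hypothetical hook, called by the binding once a GC has finished,
    // logging the live heap size in pages.
    fn log_heap_after_gc<VM: VMBinding>(mmtk: &MMTK<VM>) {
        let reserved_pages = mmtk.get_plan().get_reserved_pages();
        eprintln!("GC end: reserved pages = {}", reserved_pages);
    }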

[figure: combined_plot]

I have no idea why this would happen. Any ideas would be helpful.

@caizixian (Member) commented:

This might be a simple but interesting example for relational debugging. cc @steveblackburn

@k-sareen (Collaborator) commented:

Have you been able to reproduce it on a different microarchitecture?

@qinsoon (Member, Author) commented Oct 6, 2023

I am tracking down this bug. It turns out that the number of objects scanned in each GC is the same for the two builds, but the total bytes of the scanned objects are different. The total bytes are calculated with ObjectModel::get_current_size(). I also collected the min/max object size in each GC, and they are unchanged. So it looks like, for some reason, ObjectModel::get_current_size() returns a different size for a subset of the objects with this PR. I am trying to figure it out.
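
The per-GC numbers came from ad-hoc instrumentation roughly like the following (a sketch: the counters and the hook point in the tracing code are hypothetical; it assumes the ObjectModel::get_current_size API mentioned above):

    use std::sync::atomic::{AtomicUsize, Ordering};

    use mmtk::util::ObjectReference;
    use mmtk::vm::{ObjectModel, VMBinding};

    // Hypothetical per-GC counters, reset at the start of each GC and dumped
    // at the end of each GC.
    static SCANNED_OBJECTS: AtomicUsize = AtomicUsize::new(0);
    static SCANNED_BYTES: AtomicUsize = AtomicUsize::new(0);

    // Hypothetical hook, called from the tracing code for every scanned object.
    fn record_scanned_object<VM: VMBinding>(object: ObjectReference) {
        SCANNED_OBJECTS.fetch_add(1, Ordering::Relaxed);
        SCANNED_BYTES.fetch_add(
            <VM::VMObjectModel as ObjectModel<VM>>::get_current_size(object),
            Ordering::Relaxed,
        );
    }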

@steveblackburn (Contributor) commented:

The idea that Jython might behave differently with different build names is not implausible. Jython is a Python implementation, and it makes a lot of sense that it sees and uses the user's "environment", and thus would be affected by the path length of the working directory in which it is run. It's alarming that it would affect the GC to this extent, but it's not implausible IMO.

@qinsoon (Member, Author) commented Oct 11, 2023

The path length increases object sizes in jython, so the longer the path, the larger the min heap size. This causes more GCs and more STW time.

The following shows the number of GCs and the STW time for SemiSpace, Immix, and MarkSweep with different path lengths. I used folder names of length 1 to 255. The X axis is the base folder /home/yilin plus the variable folder name. Where the data for length=255 is missing, the run hit OOM and we have no results for it.

SemiSpace

[figure: path-length-ss-120m]

Immix

[figure: path-length-ix-120m]

Mark Sweep

[figure: path-length-ms-480m]

I will reevaluate this PR.

@steveblackburn Should I open an issue for DaCapo?

@steveblackburn (Contributor) commented:

Yes, I think we need to fix this in DaCapo

@steveblackburn (Contributor) commented:

@qinsoon If you have any info on the type that is affected that would help me a great deal.

@qinsoon (Member, Author) commented Oct 11, 2023

@qinsoon If you have any info on the type that is affected that would help me a great deal.

I will see if I can narrow it down to some types. It may not be that easy:

  1. It is possible that some string types have increased sizes, but what we are really interested in are the types that reference the strings, and those probably won't have a size difference.
  2. We can only tell which types have increased sizes by comparing the two builds; if we just look at one build, we cannot tell much.

The only info I have is that the affected types are probably larger than 100 bytes. I will probably start by finding out which types are larger than 100 bytes, and see what I can find out.

@qinsoon (Member, Author) commented Oct 12, 2023

@caizixian and I looked at this together. It turned out to be an issue with me using 'exploded builds' for the measurement. In an exploded build, OpenJDK uses absolute paths to refer to resources like modules, classes, etc. jython further makes use of those absolute paths and exaggerates the problem. In an image build, OpenJDK uses relative paths for the resources, and the path length doesn't matter any more. This also explains why @k-sareen observed a larger min heap size for exploded builds -- it is at least one of the reasons contributing to the larger min heap.

Also, in jython, the min heap size for mark compact is 40-50M. We observed ~50k byte arrays in a GC that include the path, so 100 more characters in the path is a 5M increase in the live heap size.

So there is no issue with jython, and the lesson is: do not use exploded builds for performance measurement.

@qinsoon (Member, Author) commented Oct 16, 2023

The following are the performance results for 4 plans (SS, GenCopy, Immix, GenImmix). There is no observable performance difference for this PR. plotty link

SemiSpace

[figures: ss-stw, ss-total]

GenCopy

[figures: gencopy-stw, gencopy-total]

Immix

[figures: ix-stw, ix-total]

GenImmix

[figures: genix-stw, genix-total]

@qinsoon qinsoon marked this pull request as ready for review October 16, 2023 00:44
@qinsoon qinsoon requested a review from wks October 16, 2023 00:44
@wks (Collaborator) left a comment

In ScheduleCollection::do_work, right after the line plan.schedule_collection(worker.scheduler());, there is an invocation of GCTriggerPolicy::on_gc_start. See:

    fn do_work(&mut self, worker: &mut GCWorker<VM>, mmtk: &'static MMTK<VM>) {
        let plan = mmtk.get_plan();
        plan.base().set_collection_kind(plan);
        plan.base().set_gc_status(GcStatus::GcPrepare);

        plan.schedule_collection(worker.scheduler());

        // Tell GC trigger that GC started.
        // We now know what kind of GC this is (e.g. nursery vs mature in gen copy, defrag vs fast in Immix)
        // TODO: Depending on the OS scheduling, other workers can run so fast that they can finish
        // everything in the `Unconstrained` and the `Prepare` buckets before we execute the next
        // statement. Consider if there is a better place to call `on_gc_start`.
        mmtk.get_plan().base().gc_trigger.policy.on_gc_start(mmtk);   // HERE!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    }
}

That line used to be in GCController::do_gc_until_completion, after the controller finished executing ScheduleCollection. I moved it here when I removed "coordinator work". on_gc_start should be executed as soon as the kind of GC (nursery or not) is determined, but I had to execute it after plan.schedule_collection because the GC kind was set inside plan.schedule_collection. That was inaccurate because, as the comment says, other workers may run so fast that they can finish many jobs before we call on_gc_start.

With this PR, we can move on_gc_start between set_gc_status and plan.schedule_collection.

@qinsoon (Member, Author) commented Oct 16, 2023

In ScheduleCollection::do_work, right after the line plan.schedule_collection(worker.scheduler());, there is an invocation of GCTriggerPolicy::on_gc_start. [...] With this PR, we can move on_gc_start between set_gc_status and plan.schedule_collection.

I thought about that, but I don't think we can move on_gc_start() to an earlier position. on_gc_start() will use information like whether the current GC is a full heap GC or not, and plan.schedule_collection() may decide that. So we should not call on_gc_start before plan.schedule_collection().

@wks (Collaborator) commented Oct 16, 2023

... on_gc_start() will use information like whether the current GC is a full heap GC or not...

Not actually. FixedHeapSizeTrigger::on_gc_start is a NO-OP. In MemBalancerTrigger::on_gc_start, it only queries whether the GC is a generational GC (not whether the current GC is a nursery GC or full-heap GC). It will also query mmtk.get_plan().get_reserved_pages() which is already available at this time.

p.s. In on_gc_release and on_gc_end, the GC trigger does need to know whether the current GC is nursery, and the information is available at those points.

@qinsoon (Member, Author) commented Oct 16, 2023

... on_gc_start() will use information like whether the current GC is a full heap GC or not...

Not actually. FixedHeapSizeTrigger::on_gc_start is a NO-OP. In MemBalancerTrigger::on_gc_start, it only queries whether the GC is a generational GC (not whether the current GC is a nursery GC or full-heap GC). It will also query mmtk.get_plan().get_reserved_pages() which is already available at this time.

p.s. In on_gc_release and on_gc_end, the GC trigger does need to know whether the current GC is nursery, and the information is available at those points.

I see, that's true. In this case, we don't even need to know what kind of GC it is. We should be able to call on_gc_start at the very beginning of a GC. Is that right?

@wks (Collaborator) commented Oct 16, 2023

I see, that's true. In this case, we don't even need to know what kind of GC it is. We should be able to call on_gc_start at the very beginning of a GC. Is that right?

Indeed. It can be moved to the beginning of schedule_collection. It could be in GCController::do_gc_until_completion, too, but since we intend to remove the "controller" in the future, it's preferable to leave it in the ScheduleCollection work packet.

@qinsoon (Member, Author) commented Oct 16, 2023

I see, that's true. In this case, we don't even need to know what kind of GC it is. We should be able to call on_gc_start at the very beginning of a GC. Is that right?

Indeed. It can be moved to the beginning of schedule_collection. It could be in GCController::do_gc_until_completion, too, but since we intend to remove the "controller" in the future, it's preferable to leave it in the ScheduleCollection work packet.

I moved the line for on_gc_start to the beginning of ScheduleCollection::do_work.
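
With that change, ScheduleCollection::do_work looks roughly like this (a sketch based on the snippet quoted above, not the merged diff verbatim):

    fn do_work(&mut self, worker: &mut GCWorker<VM>, mmtk: &'static MMTK<VM>) {
        // Tell the GC trigger that GC started. It does not need to know what
        // kind of GC this is, so it can run at the very beginning.
        mmtk.get_plan().base().gc_trigger.policy.on_gc_start(mmtk);

        let plan = mmtk.get_plan();
        plan.base().set_collection_kind(plan);
        plan.base().set_gc_status(GcStatus::GcPrepare);

        plan.schedule_collection(worker.scheduler());
    }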

@wks (Collaborator) left a comment

LGTM

@qinsoon qinsoon added this pull request to the merge queue Oct 16, 2023
Merged via the queue into mmtk:master with commit 29d482c Oct 16, 2023
19 checks passed
@qinsoon qinsoon deleted the extract-set-collection-kind branch October 16, 2023 07:10