
Extract set_collection_kind and set_gc_status #957

Merged
merged 4 commits into mmtk:master from extract-set-collection-kind on Oct 16, 2023

Conversation

@qinsoon (Member) commented Sep 19, 2023

This PR extracts set_collection_kind and set_gc_status from each plan's schedule_collection, and calls them directly in the ScheduleCollection work packet. This removes duplicated code, and also makes our implementation consistent with Java MMTk.

For generational plans, each plan's schedule_collection currently calls requires_full_heap_collection first, and then calls set_collection_kind. With the changes in this PR, requires_full_heap_collection is called after set_collection_kind, which matches Java MMTk (super.collectionPhase(phaseId);). The implications are:

  1. When set_collection_kind forces a full heap GC and is called before requires_full_heap_collection, we do a full heap GC in the current GC (which is the expected behavior in Java MMTk).
  2. set_collection_kind may increase cur_collection_attempt, which makes it more likely to trigger an emergency GC, and in turn more likely to trigger a full heap GC.

In summary, the current code misses some opportunities where we should do a full heap GC. With the changes, we properly trigger full heap GCs when we should.
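
To make the ordering effect concrete, here is a minimal standalone sketch (the type and the logic inside set_collection_kind are simplified models for illustration, not MMTk's actual code):

    // A simplified model of a generational plan. The real BasePlan fields and
    // the real condition for forcing a full heap GC are more involved.
    struct ModelGenPlan {
        cur_collection_attempt: u32,
        force_full_heap: bool,
        full_heap_this_gc: bool,
    }

    impl ModelGenPlan {
        // Models set_collection_kind: bumps the attempt counter and may force
        // a full heap GC for this collection.
        fn set_collection_kind(&mut self) {
            self.cur_collection_attempt += 1;
            if self.force_full_heap || self.cur_collection_attempt > 1 {
                self.full_heap_this_gc = true;
            }
        }

        // Models requires_full_heap_collection.
        fn requires_full_heap_collection(&self) -> bool {
            self.full_heap_this_gc
        }
    }

    fn main() {
        let mut plan = ModelGenPlan {
            cur_collection_attempt: 0,
            force_full_heap: true, // e.g. a user-triggered full heap GC
            full_heap_this_gc: false,
        };

        // Old order: the plan queried requires_full_heap_collection before
        // set_collection_kind ran, so the forced full heap GC was missed.
        let old = plan.requires_full_heap_collection(); // false
        plan.set_collection_kind();

        // New order (this PR): set_collection_kind runs first in the
        // ScheduleCollection work packet, so the decision sees its effects.
        let new = plan.requires_full_heap_collection(); // true
        println!("old order: {old}, new order: {new}");
    }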

@qinsoon (Member, Author) commented Sep 21, 2023

I got some strange performance results for this PR when running SemiSpace. For jython, the number of GCs increases with this PR, and thus the STW time increases. The result is reproducible.

[figure: ss-jython]

The change in this PR should be a no-op for SemiSpace: it just moves the calls to set_collection_kind and set_gc_status from the plan's schedule_collection to the ScheduleCollection work packet. I logged the page usage after each GC and got the plot below. It shows that with this PR we have more live pages at the end of each GC, which is why we need to do more GCs. No emergency GCs were triggered in either build (emergency GCs should not affect SemiSpace anyway, but they may affect reference processing; reference processing is not enabled in these runs).
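
For reference, the page-usage logging was an ad-hoc hook roughly along these lines (a sketch only: it assumes the Plan::get_reserved_pages accessor mentioned later in this thread, and where the binding calls the hook at the end of a GC is left open):

    use mmtk::plan::Plan;
    use mmtk::vm::VMBinding;
    use mmtk::MMTK;

    // Hypothetical hook, called by the binding once a GC has finished,
    // logging the live heap size in pages.
    fn log_heap_after_gc<VM: VMBinding>(mmtk: &MMTK<VM>) {
        let reserved_pages = mmtk.get_plan().get_reserved_pages();
        eprintln!("GC end: reserved pages = {}", reserved_pages);
    }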

[figure: combined_plot]

I have no idea why this would happen. Any ideas would be helpful.

@caizixian (Member) commented:

This might be a simple but interesting example for relational debugging. cc @steveblackburn

@k-sareen (Collaborator) commented:

Have you been able to reproduce it on a different microarchitecture?

@qinsoon (Member, Author) commented Oct 6, 2023

I am tracking down this bug. It turns out that the number of objects scanned in each GC is the same for the two builds, but the total bytes of the scanned objects are different. The total bytes are calculated with ObjectModel::get_current_size(). I also collected the min/max object size in each GC, and they are unchanged. So it looks like, for some reason, ObjectModel::get_current_size() returns a different size for a subset of the objects with this PR. I am trying to figure it out.
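
The per-GC numbers came from ad-hoc instrumentation roughly like the following (a sketch: the counters and the hook point in the tracing code are hypothetical; it assumes the ObjectModel::get_current_size API mentioned above):

    use std::sync::atomic::{AtomicUsize, Ordering};

    use mmtk::util::ObjectReference;
    use mmtk::vm::{ObjectModel, VMBinding};

    // Hypothetical per-GC counters, reset at the start of each GC and dumped
    // at the end of each GC.
    static SCANNED_OBJECTS: AtomicUsize = AtomicUsize::new(0);
    static SCANNED_BYTES: AtomicUsize = AtomicUsize::new(0);

    // Hypothetical hook, called from the tracing code for every scanned object.
    fn record_scanned_object<VM: VMBinding>(object: ObjectReference) {
        SCANNED_OBJECTS.fetch_add(1, Ordering::Relaxed);
        SCANNED_BYTES.fetch_add(
            <VM::VMObjectModel as ObjectModel<VM>>::get_current_size(object),
            Ordering::Relaxed,
        );
    }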

@steveblackburn (Contributor) commented:

The idea that Jython might behave differently with different build names is not implausible. Jython is a Python implementation, and it makes a lot of sense that it sees and uses the user's "environment", and thus would be affected by the path length of the working directory in which it is run. It's alarming that it would affect the GC to this extent, but it's not implausible IMO.

@qinsoon (Member, Author) commented Oct 11, 2023

The path length increases object sizes in jython, so the longer the path, the larger the min heap size. This causes more GCs and more STW time.

The following shows the number of GCs and the STW time for SemiSpace, Immix, and MarkSweep with different path lengths. I used folder names of length 1 to 255. The X axis is the base folder /home/yilin plus the variable folder name. Where the data for length=255 is missing, the run hit OOM and we have no results for it.

SemiSpace

[figure: path-length-ss-120m]

Immix

[figure: path-length-ix-120m]

Mark Sweep

[figure: path-length-ms-480m]

I will reevaluate this PR.

@steveblackburn Should I open an issue for DaCapo?

@steveblackburn (Contributor) commented:

Yes, I think we need to fix this in DaCapo

@steveblackburn (Contributor) commented:

@qinsoon If you have any info on the type that is affected that would help me a great deal.

@qinsoon (Member, Author) commented Oct 11, 2023

@qinsoon If you have any info on the type that is affected that would help me a great deal.

I will see if I can narrow it down to some types. It may not be that easy:

  1. It is possible that some string types have increased sizes, but what we are really interested in are the types that reference the strings, and those probably won't have a size difference.
  2. We can only tell which types have increased sizes by comparing the two builds; if we just look at one build, we cannot tell much.

The only info I have is that the affected types are probably larger than 100 bytes. I will probably start by finding out which types are larger than 100 bytes, and see what I can find out.

@qinsoon (Member, Author) commented Oct 12, 2023

@caizixian and I looked at this together. It turned out to be an issue with me using 'exploded builds' for the measurement. In an exploded build, OpenJDK uses absolute paths to refer to resources like modules, classes, etc. jython further makes use of those absolute paths and exaggerates the problem. In an image build, OpenJDK uses relative paths for the resources, and the path length doesn't matter any more. This also explains why @k-sareen observed a larger min heap size for exploded builds -- it is at least one of the reasons contributing to the larger min heap.

Also, in jython, the min heap size for mark compact is 40-50M. We observed ~50k byte arrays in a GC that include the path, so 100 more characters in the path is a 5M increase in the live heap size.

So there is no issue with jython, and the lesson is: do not use exploded builds for performance measurement.

@qinsoon (Member, Author) commented Oct 16, 2023

The following are the performance results for 4 plans (SS, GenCopy, Immix, GenImmix). There is no observable performance difference for this PR. plotty link

SemiSpace

[figures: ss-stw, ss-total]

GenCopy

[figures: gencopy-stw, gencopy-total]

Immix

[figures: ix-stw, ix-total]

GenImmix

[figures: genix-stw, genix-total]

@qinsoon qinsoon marked this pull request as ready for review October 16, 2023 00:44
@qinsoon qinsoon requested a review from wks October 16, 2023 00:44
@wks (Collaborator) left a comment

In ScheduleCollection::do_work, right after the line plan.schedule_collection(worker.scheduler());, there is an invocation of GCTriggerPolicy::on_gc_start. See:

    fn do_work(&mut self, worker: &mut GCWorker<VM>, mmtk: &'static MMTK<VM>) {
        let plan = mmtk.get_plan();
        plan.base().set_collection_kind(plan);
        plan.base().set_gc_status(GcStatus::GcPrepare);

        plan.schedule_collection(worker.scheduler());

        // Tell GC trigger that GC started.
        // We now know what kind of GC this is (e.g. nursery vs mature in gen copy, defrag vs fast in Immix)
        // TODO: Depending on the OS scheduling, other workers can run so fast that they can finish
        // everything in the `Unconstrained` and the `Prepare` buckets before we execute the next
        // statement. Consider if there is a better place to call `on_gc_start`.
        mmtk.get_plan().base().gc_trigger.policy.on_gc_start(mmtk);   // HERE!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    }
}

That line used to be in GCController::do_gc_until_completion, after the controller finished executing ScheduleCollection. I moved it here when I removed "coordinator work". on_gc_start should be executed as soon as the kind of GC (nursery or not) is determined, but I had to execute it after plan.schedule_collection because the GC kind was set inside plan.schedule_collection. That was inaccurate because, as the comment says, other workers may run so fast that they can finish many jobs before we call on_gc_start.

With this PR, we can move on_gc_start between set_gc_status and plan.schedule_collection.

@qinsoon (Member, Author) commented Oct 16, 2023

In ScheduleCollection::do_work, right after the line plan.schedule_collection(worker.scheduler());, there is an invocation of GCTriggerPolicy::on_gc_start. [...] With this PR, we can move on_gc_start between set_gc_status and plan.schedule_collection.

I thought about that, but I don't think we can move on_gc_start() to an earlier position. on_gc_start() will use information like whether the current GC is a full heap GC or not, and plan.schedule_collection() may decide that. So we should not call on_gc_start before plan.schedule_collection().

@wks (Collaborator) commented Oct 16, 2023

... on_gc_start() will use information like whether the current GC is a full heap GC or not...

Not actually. FixedHeapSizeTrigger::on_gc_start is a NO-OP. In MemBalancerTrigger::on_gc_start, it only queries whether the GC is a generational GC (not whether the current GC is a nursery GC or full-heap GC). It will also query mmtk.get_plan().get_reserved_pages() which is already available at this time.

p.s. In on_gc_release and on_gc_end, the GC trigger does need to know whether the current GC is nursery, and the information is available at those points.

@qinsoon (Member, Author) commented Oct 16, 2023

... on_gc_start() will use information like whether the current GC is a full heap GC or not...

Not actually. FixedHeapSizeTrigger::on_gc_start is a NO-OP. In MemBalancerTrigger::on_gc_start, it only queries whether the GC is a generational GC (not whether the current GC is a nursery GC or full-heap GC). It will also query mmtk.get_plan().get_reserved_pages() which is already available at this time.

p.s. In on_gc_release and on_gc_end, the GC trigger does need to know whether the current GC is nursery, and the information is available at those points.

I see, that's true. In this case, we don't even need to know what kind of GC it is. We should be able to call on_gc_start at the very beginning of a GC. Is that right?

@wks (Collaborator) commented Oct 16, 2023

I see, that's true. In this case, we don't even need to know what kind of GC it is. We should be able to call on_gc_start at the very beginning of a GC. Is that right?

Indeed. It can be moved to the beginning of schedule_collection. It could be in GCController::do_gc_until_completion, too, but since we intend to remove the "controller" in the future, it's preferable to leave it in the ScheduleCollection work packet.

@qinsoon (Member, Author) commented Oct 16, 2023

I see, that's true. In this case, we don't even need to know what kind of GC it is. We should be able to call on_gc_start at the very beginning of a GC. Is that right?

Indeed. It can be moved to the beginning of schedule_collection. It could be in GCController::do_gc_until_completion, too, but since we intend to remove the "controller" in the future, it's preferable to leave it in the ScheduleCollection work packet.

I moved the line for on_gc_start to the beginning of ScheduleCollection::do_work.
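
With that change, ScheduleCollection::do_work looks roughly like this (a sketch based on the snippet quoted above, not the merged diff verbatim):

    fn do_work(&mut self, worker: &mut GCWorker<VM>, mmtk: &'static MMTK<VM>) {
        // Tell the GC trigger that GC started. It does not need to know what
        // kind of GC this is, so it can run at the very beginning.
        mmtk.get_plan().base().gc_trigger.policy.on_gc_start(mmtk);

        let plan = mmtk.get_plan();
        plan.base().set_collection_kind(plan);
        plan.base().set_gc_status(GcStatus::GcPrepare);

        plan.schedule_collection(worker.scheduler());
    }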

@wks (Collaborator) left a comment

LGTM

@qinsoon qinsoon added this pull request to the merge queue Oct 16, 2023
Merged via the queue into mmtk:master with commit 29d482c Oct 16, 2023
19 checks passed
@qinsoon qinsoon deleted the extract-set-collection-kind branch October 16, 2023 07:10