-
Notifications
You must be signed in to change notification settings - Fork 78
Description
Note: I still don't have concrete plans to refactor the code. This issue is open for discussion.
"poll" is not polling
The method GCTrigger::poll(space_full, space)
is used in two ways.
- (
space_full == false
) It is called from time to time (i.e. polling) to determine whether it is a good time to do GC, and it will trigger a GC when the trigger and/or the plan decides it is needed. This is done in multiple places:Space::acquire
: Before acquiring new pages from the page resource, it will poll.MallocSpace::alloc
: BecauseMallocSpace
does not use page resource, it polls on every allocation.memory_manager::gc_poll
: This allows the VM binding to poll, too.
- (
space_full == true
) Force a GC. This is only called in one place:Space::acquire
: It will force a GC if it failed to acquire page from the page resource.
When space_full
is true, it will always trigger a GC. I don't think "polling" is the right word. In English, "polling" means asking asking a large number of people about their opinions about a particular subject. In computer science, "polling" refers to the practice of "asking" devices for their status, usually in a periodic manner. In MMTk, it means asking the GC trigger whether it is time to do GC.
If it always triggers GC, it is no longer polling. I feel it confusing when refactoring Space::acquire
and allocation options because it includes two poll
calls, and we debated about whether to enable/disable the first poll
or the second poll
. In my opinion (and to VM binding developers as well), the second poll
is not really a "poll" but forcing a GC. It is hard to describe why there are two "pollings" which are done at different times.
"collection_required" is not really a question
The call hierarchy goes from GCTrigger::poll
to Plan::collection_required
and, for most plans, directly to BasePlan::collection_required
. BasePlan::collection_required
returns true
if space_full
is true. Generational plans also checks the nursery size so that they return true
if the nursery is full. ConcurrentImmix
also triggers concurrent GC if it is half way to the heap size since the last GC. But in any case, space_full == true
will guarantee a GC.
But the real problem with the naming is that collection_required
is not a question. It is an operation with many side effects. Particularly, it will set many fields of GlobalState
or Plan
to determine the nature of the next GC triggered. Here is a list of side effects collection_required
can have:
BasePlan::collection_required
:- Clears
GlobalState::allocation_bytes
to0
if it exceedsOptions::stress_factor
, and triggers GC.
- Clears
Gen::collection_required
:- Sets
CommonGenPlan::next_gc_full_heap
totrue
if any space other than the nursery is full.- By "full", it means
Space::acquire
tried to get pages from the page resources but failed.
- By "full", it means
- Note: If the nursery is full, it bypasses
BasePlan::collection_required
, and will not clear stress factor. I think it is a bug and the developer forgot thatBasePlan::collection_required
has side effects.
- Sets
StickyImmix::collection_required
:- Sets
StickyImmix::next_gc_full_heap
if any space other thanimmix_space
is full. - Note: Unlike
Gen
,StickyImmix::collection_required
does not skipBasePlan::collection_required
.
- Sets
ConcurrentImmix::collection_required
:- Sets
self.should_do_full_gc
totrue
ifBasePlan::collection_required
returnstrue
.
- Sets
From this observation, it is incorrect to skip BasePlan::collection_required
, and we should give all involved parties a chance to set global or plan-specific states before returning true
or false
. The "involved parties" include
- The total heap size: That is the
heap_full
variable inBasePlan::collection_required
, which is ultimately obtained fromGCTrigger::is_heap_full()
. - Stress GC: That is the
stress_force_gc
variable inBasePlan::collection_required
. It works independently from the heap size. - Page resource: That is the
space_full
parameter. - Concrete plans: That include generational GCs and ConcurrentImmix
If there is one place to enumerate all "involved parties", why not just let GCTrigger::poll
ask each of them, instead of letting each Plan
implement collection_required
and call the super method?
What about MMTK::handle_user_collection_request
?
It bypasses GCTrigger::poll
and calls gc_requester.request();
directly. This will guarantee to trigger a GC, but will not ask plans for things like whether to do full-heap GC or concurrent GC. Instead, MMTK::handle_user_collection_request
has parameters:
pub fn handle_user_collection_request(
&self,
tls: VMMutatorThread,
force: bool,
exhaustive: bool,
) -> bool { ... }
force
will bypassOptions::ignore_system_gc
andCollection::is_collection_enabled()
.exhaustive
will trigger a full-heap GC. It setsCommonGenPlan::next_gc_full_heap
orStickyImmix::next_gc_full_heap
to true if the plan is generational.
It bypasses the total heap size, which is expected because it it intended to let the user trigger GC before the heap is full.
It bypasses the stress GC because it is intended to let the user trigger GC before reaching the next stress GC threshold.
It bypasses the page resource because it is not on allocation.
But it does have behaviors specific to concrete plans. The exhaustive
parameter is only relevant to generational plans. If we want MMTK::handle_user_collection_request
to be really general, we should add a force_stop_the_world: bool
parameter, too, so that if the plan is concurrent, it can trigger a fall-back STW GC if allowed.
Difference between handle_user_collection_request
and poll(space_full=true)
GCTrigger::poll(space_full=true)
will still ask the concrete plan to determine the nature of the next GC. To do this, GCTrigger::poll
will still call Plan::collection_required
.
I think it is still necessary to call a method in Plan
so that the plan can participate. But the method name Plan::collection_required
should be changed because it is not really a question. I can think of some possible names:
Plan::check_if_collection_is_required_and_determine_gc_kind
: That's what it does.Plan::on_polling
: Tell it that handles polling. But ifspace_full=true
, it's not really polling. Perhaps we need a better name.Plan::on_polling_or_force_gc
: err...
How to refactor
How do we split poll(space_full, space)
?
Remove the space_full
argument (implies false
). Make it just fn poll(space: Option<&Space>) -> bool
.
Add another method GCTrigger::on_space_full(space: &Space)
. The space
argument is compulsory, and it is guaranteed to trigger GC (so no return value).
Both may call Plan::collection_required
(or Plan::check_if_collection_is_required_and_determine_gc_kind
if we prefer).
Refactoring Plan::collection_required
Rename it, as mentioned above.
Can we offload stress GC and the full heap size out of Plan
and move them to the GCTrigger
instead? All current plans check the full heap size. What about reference counting? I don't know.
There are other things, such as whether the collection should be a defrag GC, that are not done in Plan::collection_required
.
- Can we postpone some of the decisions (e.g. whether it is nursery or full-heap GC, or whether it is concurrent or STW GC) to a later point, such as
Plan::schedule_collection
? - Can we do the opposite, i.e. move some decisions (e.g. whether to do defrag or non-moving GC) to
Plan::collection_required
?