Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8265842: G1: Introduce API to run multiple separate tasks in a single gangtask #3653

Conversation

@tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Apr 23, 2021

JDK-8214237 is going to join different somewhat independent (sub-)tasks of the post evacuation phase into basically two gangtasks to improve performance (reduce startup and shutdown of work gangs).

To facilitate doing that, introduce an API that manages subtasks of a given gangtask and provide some helpers to do common work.

This CR is only about adding the API and testing to somewhat simplify the actual move of the post evacuate tasks into parallel tasks. There is no new functionality here, only to split up JDK-8214237 a bit.

Here is the full change that uses this API to implement JDK-8214237 (which I'll try to split up a bit before) to see how it will be used (in G1YoungGCPostEvacuateTasks.hpp ).

Testing: new gtest, tier1-5 with mentioned patch for JDK-8214237.

Thanks,
Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8265842: G1: Introduce API to run multiple separate tasks in a single gangtask

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3653/head:pull/3653
$ git checkout pull/3653

Update a local copy of the PR:
$ git checkout pull/3653
$ git pull https://git.openjdk.java.net/jdk pull/3653/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3653

View PR using the GUI difftool:
$ git pr show -t 3653

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3653.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Apr 23, 2021

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@tschatzl tschatzl changed the title G1: Introduce API to run multiple separate tasks in a single gangtask 8265842: G1: Introduce API to run multiple separate tasks in a single gangtask Apr 23, 2021
@openjdk openjdk bot added the rfr label Apr 23, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 23, 2021

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Loading

@openjdk openjdk bot added the hotspot-gc label Apr 23, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 23, 2021

Loading

@tschatzl
Copy link
Contributor Author

@tschatzl tschatzl commented Apr 23, 2021

There is one question I would like to have discussed: to make the code more compact the API uses constructor and destructor for setup/teardown-like methods.

This makes use a bit more clumsy that expected, see the G1CollectedHeap::post_evacuate_cleanup_1/2 methods:

void G1CollectedHeap::post_evacuate_cleanup_1(G1ParScanThreadStateSet* per_thread_states,
                                          G1RedirtyCardsQueueSet* rdcqs) {
  Ticks start = Ticks::now();
  {
    G1PostEvacuateCollectionSetCleanupTask1 cl(per_thread_states, rdcqs);
    uint num_threads = MAX2(1u, MIN2(cl.num_busy_workers(), workers()->active_workers()));
    cl.set_max_workers(num_threads);
    workers()->run_task(&cl, num_threads);
   }
   [...]
 }

i.e. since the constructor is used for setup, there is some small code duplication necessary to time execution of such a G1BatchedGangTask. @walulyai suggested (similar to other hotspot code) to introduce a setup method, but no teardown.

There is no reason for me to not to, any concerns or other ideas?

Loading

src/hotspot/share/gc/g1/g1BatchedGangTask.cpp Outdated Show resolved Hide resolved
Loading
test/hotspot/gtest/gc/g1/test_g1BatchedGangTask.cpp Outdated Show resolved Hide resolved
Loading
@openjdk openjdk bot removed the rfr label Apr 27, 2021
@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Apr 27, 2021

This makes use a bit more clumsy that expected, see the G1CollectedHeap::post_evacuate_cleanup_1/2 methods:

void G1CollectedHeap::post_evacuate_cleanup_1(G1ParScanThreadStateSet* per_thread_states,
                                          G1RedirtyCardsQueueSet* rdcqs) {
  Ticks start = Ticks::now();
  {
    G1PostEvacuateCollectionSetCleanupTask1 cl(per_thread_states, rdcqs);
    uint num_threads = MAX2(1u, MIN2(cl.num_busy_workers(), workers()->active_workers()));
    cl.set_max_workers(num_threads);
    workers()->run_task(&cl, num_threads);
   }
   [...]
 }

i.e. since the constructor is used for setup, there is some small code duplication necessary to time execution of such a G1BatchedGangTask. @walulyai suggested (similar to other hotspot code) to introduce a setup method, but no teardown.

There is no reason for me to not to, any concerns or other ideas?

I'm not sure I exactly understand how setup() would be used, but a run_batched_task(...) might help some of the duplication:

void G1CollectedHeap::run_batched_task(G1BatchedGangTask* task) {
  uint num_threads = MAX2(1u, MIN2(task.num_busy_workers(), workers()->active_workers()));
  task.set_max_workers(num_threads);
  workers()->run_task(task, num_threads);
}

I assume the work in the constructor and destructor should be measured, since it is in the example, otherwise the timing could be added to the helper as well.

Basically the only other question/concern I have after looking at the patch is if we can some how get rid or change num_busy_workers(). I guess we might need some way to estimate the needed number of workers but does it have to return a double. To me it is strange that something can require a fraction of a worker. I understand that certain task are very small and for those we could consider returning 0. Or would that cause other problems?

Loading

@openjdk openjdk bot added the rfr label Apr 27, 2021
Copy link
Contributor

@lkorinth lkorinth left a comment

Looks good to me.

Loading

// How many workers (threads) would this task be able to keep busy for at least
// as long as to amortize worker startup costs.
// Called by G1BatchedGangTask to determine total number of workers.
virtual double num_busy_workers() const { return 1.0; }
Copy link
Contributor

@lkorinth lkorinth Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make pure virtual?

Loading

template <typename E, MEMFLAGS F>
class GrowableArrayCHeap;

// G1AbstractSubTask represents a task to be performed either within an
Copy link
Contributor

@lkorinth lkorinth Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

within an -> within a

Loading

// Subclasses of this class add their G1AbstractSubTasks into either the list
// of "serial" or the list of "parallel" tasks.
// During execution in the work gang, this class will make sure that the "serial"
// tasks are executed by a single worker only exactly once, but different "serial"
Copy link
Contributor

@lkorinth lkorinth Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker only exactly once -> worker exactly once

Loading

// tasks are executed by a single worker only exactly once, but different "serial"
// tasks may be executed in parallel using different workers. "Parallel" tasks'
// do_work() method may be called by different workers passing a different
// worker_id at the same time, but at most once per given worker id.
Copy link
Contributor

@lkorinth lkorinth Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker id -> worker_id (I hope to one day rename worker_id to work_id, work_unit, or something else, but if we are not referring to the variable, we should not describe it as a worker id, as it does not describe a worker but a unit of work.

Loading

@tschatzl
Copy link
Contributor Author

@tschatzl tschatzl commented Apr 28, 2021

[...]
I'm not sure I exactly understand how setup() would be used, but a run_batched_task(...) might help some of the duplication:
[...]

Exactly that has been introduced in the follow-up change.

[...]

Basically the only other question/concern I have after looking at the patch is if we can some how get rid or change num_busy_workers(). I guess we might need some way to estimate the needed number of workers but does it have to return a double. To me it is strange that something can require a fraction of a worker. I understand that certain task are very small and for those we could consider returning 0. Or would that cause other problems?

Generally there are two interesting cases when running a gang task: using one (or maybe two) threads, and using all of them as indicated by WorkGang::active_workers(). Particularly with more sub tasks in a batch the latter will probably more common than now, so the "in-between" ones will be exceedingly rare.

The case where we use just one worker is particularly interesting though: we could avoid spinning up a worker thread in this case completely, using the VM thread to execute the batch (all gang tasks at least in G1 support that at this time, we just do not do it). With an integer num_busy_workers() value it is very hard to achieve that (i.e. sum(num_busy_workers()) <= 1.0 given such a batch task always contains at least few serial tasks. And if everyone returns 1 at minimum, there is no way this will work reasonably.

Note that currently (and in that follow-up CR) particularly for these serial tasks the num_busy_workers() return value isn't particularly well set (and I just recently "fixed" one for a parallel task...) so that there is a good chance it will not work out of the box, but we haven't enabled using the VM thread anyway yet.
And there is the question whether we can somehow either completely drop or move to concurrent some of that work; however in many cases pre/post evacuation work is very small compared to actual evacuation. Work for another day...

Loading

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Apr 28, 2021

[...]
I'm not sure I exactly understand how setup() would be used, but a run_batched_task(...) might help some of the duplication:
[...]

Exactly that has been introduced in the follow-up change.

Nice!

[...]
Basically the only other question/concern I have after looking at the patch is if we can some how get rid or change num_busy_workers(). I guess we might need some way to estimate the needed number of workers but does it have to return a double. To me it is strange that something can require a fraction of a worker. I understand that certain task are very small and for those we could consider returning 0. Or would that cause other problems?

Generally there are two interesting cases when running a gang task: using one (or maybe two) threads, and using all of them as indicated by WorkGang::active_workers(). Particularly with more sub tasks in a batch the latter will probably more common than now, so the "in-between" ones will be exceedingly rare.

The case where we use just one worker is particularly interesting though: we could avoid spinning up a worker thread in this case completely, using the VM thread to execute the batch (all gang tasks at least in G1 support that at this time, we just do not do it). With an integer num_busy_workers() value it is very hard to achieve that (i.e. sum(num_busy_workers()) <= 1.0 given such a batch task always contains at least few serial tasks. And if everyone returns 1 at minimum, there is no way this will work reasonably.

Note that currently (and in that follow-up CR) particularly for these serial tasks the num_busy_workers() return value isn't particularly well set (and I just recently "fixed" one for a parallel task...) so that there is a good chance it will not work out of the box, but we haven't enabled using the VM thread anyway yet.
And there is the question whether we can somehow either completely drop or move to concurrent some of that work; however in many cases pre/post evacuation work is very small compared to actual evacuation. Work for another day...

I see your points and agree there is a value in being able to have a way to define the cost of a sub-task. What I have the biggest problem with is that num_busy_workers() isn't a uint (which num_workers usually is). So maybe we could call it double G1AbstractSubTask::cost()/worker_cost() instead. It would be the same thing, the estimated amount of workers needed. To me it feels more natural that the cost can be a fraction. Maybe also rename uint G1BatchedGangTask::num_busy_workers() to num_workers() or estimated_workers(). Would this be ok with you?

Loading

Copy link
Contributor

@kstefanj kstefanj left a comment

Thanks Thomas for updating the names. This looks good.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Apr 29, 2021

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265842: G1: Introduce API to run multiple separate tasks in a single gangtask

Reviewed-by: lkorinth, ayang, sjohanss, iwalulya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 129 new commits pushed to the master branch:

  • df7f0b4: 8198317: Enhance JavacTool.getTask for flexibility
  • 115a413: 8265123: Add static factory methods to com.sun.net.httpserver.Filter
  • 39abac9: 8266176: -Wmaybe-uninitialized happens in libArrayIndexOutOfBoundsExceptionTest.c
  • 155da25: 8265005: Introduce the new client property for mac: apple.awt.windowTitleVisible
  • 91226fa: 8265940: Enable C2's optimization for Math.pow(x, 0.5) on all platforms
  • 56cde70: 8266265: mark hotspot compiler/vectorization tests which ignore VM flags
  • 4937214: 8266174: -Wmisleading-indentation happens in libmlib_image sources
  • b305eff: 8266238: mark hotspot compiler/inlining tests which ignore VM flags
  • df3b2d0: 8266264: mark hotspot compiler/eliminateAutobox tests which ignore VM flags
  • 6b263e6: 8266256: compiler.vectorization.TestBufferVectorization does testing twice
  • ... and 119 more: https://git.openjdk.java.net/jdk/compare/191f1fc46c30f7e92ba09d04bc000256442e64ed...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Loading

@openjdk openjdk bot added the ready label Apr 29, 2021
Copy link
Member

@walulyai walulyai left a comment

lgtm!

Loading

@tschatzl
Copy link
Contributor Author

@tschatzl tschatzl commented Apr 29, 2021

Thanks @walulyai @kstefanj @albertnetymk @lkorinth for your reviews!

Loading

@tschatzl
Copy link
Contributor Author

@tschatzl tschatzl commented Apr 29, 2021

/integrate

Loading

@openjdk openjdk bot closed this Apr 29, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Apr 29, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 29, 2021

@tschatzl Since your change was applied there have been 133 commits pushed to the master branch:

  • 294347b: 8265918: java/io/Console/CharsetTest.java failed with "expect: spawn id exp6 not open"
  • 84b52db: 8265666: Enable AIX build platform to make external debug symbols
  • dd8286e: 8198616: java/awt/Focus/6378278/InputVerifierTest.java fails on mac
  • 5574922: 8266284: ProblemList java/awt/Graphics2D/DrawString/DrawRotatedStringUsingRotatedFont.java
  • df7f0b4: 8198317: Enhance JavacTool.getTask for flexibility
  • 115a413: 8265123: Add static factory methods to com.sun.net.httpserver.Filter
  • 39abac9: 8266176: -Wmaybe-uninitialized happens in libArrayIndexOutOfBoundsExceptionTest.c
  • 155da25: 8265005: Introduce the new client property for mac: apple.awt.windowTitleVisible
  • 91226fa: 8265940: Enable C2's optimization for Math.pow(x, 0.5) on all platforms
  • 56cde70: 8266265: mark hotspot compiler/vectorization tests which ignore VM flags
  • ... and 123 more: https://git.openjdk.java.net/jdk/compare/191f1fc46c30f7e92ba09d04bc000256442e64ed...master

Your commit was automatically rebased without conflicts.

Pushed as commit c76ce28.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

@tschatzl tschatzl deleted the submit/8265842-api-for-batched-gang-tasks branch Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants