Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8254758: Change G1ServiceThread to be task based #734

Closed
wants to merge 14 commits into from

Conversation

kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Oct 19, 2020

Please review this enhancement to make G1ServiceThread task based. This thread has evolved from having a single responsibility to do more things. And there are plans to add more to it. To make this easier and to allow different tasks to run at different intervals this change adds a simple task mechanism to the service thread.

The idea is that the service thread keeps a list of G1ServiceTask-objects that are ordered by when they should execute. When no tasks are ready to be run the service thread check how far into the future the next task is scheduled and sleeps/waits for that duration. If a new task is added while the thread is waiting it will be woken up to check for ready tasks.

There are currently two tasks in the list. They have been extracted from the old service thread into the tasks: G1PeriodicGCTask and G1RemSetSamplingTask. These are more or less identical to the old code, but in the form of tasks. The only difference is that G1PeriodicGCTask no longer need to keep track of the last attempt, since it will be rescheduled with the correct interval.

To reschedule a task is to update the time when it should be executed next and adding it back to the list. This insertion is O(n) and it should be fine since the expected number of tasks handled by the service thread is low. If a task don't want to be rescheduled it can set a negative timeout and that will cause the task to be dropped from the list. There are currently no such tasks and adding such tasks come with the responsibility of making sure resources are taken care of when the task is no longer active.

I've added a gtest to do some basic VM-testing and apart from a lot of local testing I've run mach5 tier 1-4 a few times with good results.

I have created a few follow up tasks to look at moving the current tasks to more appropriate locations and also updated a few old service thread related tasks with the label: gc-g1-service-thread


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Linux x86 Windows x64 macOS x64
Build ✔️ (5/5 passed) ✔️ (2/2 passed) (2/2 running) (1/2 running)
Test (tier1) (7/9 running) (9/9 running)

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/734/head:pull/734
$ git checkout pull/734

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 19, 2020

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Oct 19, 2020

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc label Oct 19, 2020
@kstefanj kstefanj marked this pull request as ready for review Oct 19, 2020
@openjdk openjdk bot added the rfr label Oct 19, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 19, 2020

Webrevs

@albertnetymk
Copy link
Member

@albertnetymk albertnetymk commented Oct 19, 2020

  1. There's a sentinel node in the task list. I wonder if it's possible to get rid of it. This way, G1ServiceTask can be made abstract, and G1ServiceTask::execute and G1ServiceTask::timeout can be pure and unreachable statically.
  2. Currently, the task timeout is a double, which says very little on what precision is actually supported. I think an integral type would be better.
  3. "If a task don't want to be rescheduled it can set a negative timeout and that will cause the task to be dropped from the list." I wonder if timeout=0 is allowed, and what that means.

@kstefanj
Copy link
Contributor Author

@kstefanj kstefanj commented Oct 19, 2020

1. There's a sentinel node in the task list. I wonder if it's possible to get rid of it. This way, `G1ServiceTask` can be made abstract, and `G1ServiceTask::execute` and `G1ServiceTask::timeout` can be pure and unreachable statically.

I tried this out first, but I like not having to check for NULL when adding. Another idea I had to was create a specific G1SentinelTask, to allow execute() and timeout() to be pure virtual. If you like this approach I'm happy to go in that direction.

2. Currently, the task timeout is a `double`, which says very little on what precision is actually supported. I think an integral type would be better.

I agree, I would prefer to avoid double all together but it is a quite nice to use os::elapsedTime() for the timings since the logging-framework prints time this way. And since you then have a double as the base value having the timeout using the same is quite nice, but I'll take a look an see if I can make it look nice with integral type. But the scheduled time will still be using a double.

3. "If a task don't want to be rescheduled it can set a negative timeout and that will cause the task to be dropped from the list." I wonder if `timeout=0` is allowed, and what that means.

Timeout 0 basically means run me again instantly unless someone else is up. For example looking forward the uncommit task will reschedule itself with a 0 timeout avoiding putting the service thread to sleep but still making it possible for another task to run between two invocations.

@kstefanj
Copy link
Contributor Author

@kstefanj kstefanj commented Oct 20, 2020

Updated after Alberts input:

  • Sentinel task is now a separate class, allowing G1ServiceTask to be pure virtual.
  • Changed double G1ServiceTask::timeout() to int64_t G1ServiceTask::timeout_ms()

@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 20, 2020

Webrevs

Copy link
Member

@albertnetymk albertnetymk left a comment

Thank you for the update regarding 1 and 2. As for timeout=0, it's not obvious to me how it will be resolved if there are multiple timeout=0 tasks. Since there are no such tasks yet, I am fine with the current status. Ideally, please provide some doc on the behavior of timeout=0 next to virtual int64_t timeout_ms() = 0;; sth like what's said for "negative" values.

src/hotspot/share/gc/g1/g1ServiceThread.hpp Outdated Show resolved Hide resolved
src/hotspot/share/gc/g1/g1ServiceThread.hpp Outdated Show resolved Hide resolved
@kstefanj
Copy link
Contributor Author

@kstefanj kstefanj commented Oct 20, 2020

Thank you for the update regarding 1 and 2. As for timeout=0, it's not obvious to me how it will be resolved if there are multiple timeout=0 tasks. Since there are no such tasks yet, I am fine with the current status. Ideally, please provide some doc on the behavior of timeout=0 next to virtual int64_t timeout_ms() = 0;; sth like what's said for "negative" values.

I will update the comment. The way it will work is that each task is rescheduled using os::elapsedTime() and the timeout. So a task with timeout=0 would still be scheduled after a task with a time value smaller than what's returned by os::elapsedTime(). Unless I'm overseeing something, this should be fine with multiple tasks as well.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Thanks for your comments Ivan.

src/hotspot/share/gc/g1/g1ServiceThread.hpp Outdated Show resolved Hide resolved
src/hotspot/share/gc/g1/g1ServiceThread.hpp Outdated Show resolved Hide resolved
@kstefanj
Copy link
Contributor Author

@kstefanj kstefanj commented Oct 20, 2020

Latest commit includes some additional renaming apart from what's been discussed here. Had a chat with Ivan and Albert to come up with a bit better names and also moved the logic to decide if a task should be rescheduled to a separate function on the task, rather than using a special value for the delay_ms() (former timeout_ms()).

void add_ordered(G1ServiceTask* task);
bool is_empty();
};

// The G1ServiceThread is used to periodically do a number of different tasks:
// - re-assess the validity of the prediction for the
// remembered set lengths of the young generation.
// - check if a periodic GC should be scheduled.
class G1ServiceThread: public ConcurrentGCThread {
private:
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary "private"

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// The G1ServiceThread is used to periodically do a number of different tasks:
// - re-assess the validity of the prediction for the
// remembered set lengths of the young generation.
// - check if a periodic GC should be scheduled.
class G1ServiceThread: public ConcurrentGCThread {
private:
// The monitor is used to ensure thread saftey for the task queue
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/saftey/safety

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Test that a task that is added while the service thread is
// waiting gets run in a timely manner.
TEST_VM(G1ServiceThread, test_add_while_waiting) {
// Make sure default tasks use long intervals.
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... so that they do not interfere with the test as the ServiceTask automatically registers them.

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to the comment, the reason we use the long interval is to make sure that the service thread is waiting when the new task is added and is woken up correctly.

}

// Now fake a run-loop, that reschedules the tasks using a
// random multiplyer.
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/multiplyer/multiplier

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Now fake a run-loop, that reschedules the tasks using a
// random multiplyer.
for (double now = 0; now < 1000; now++) {
// Random multiplyier is at least 1 to ensure progress.
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/multiplyer/multiplier

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, how many ways can one spell multipler without getting it correct :)

log_debug(gc, task, start)("G1 Service Thread (%s) (run)", task->name());
task->execute();

double duration = os::elapsedVTime() - start;
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of elapsedVTime() seems wrong here and above as I would only care about wall time between executions.

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For walltime between executions you will have to enable gc+task+start to get the start time of tasks and look in the log how long it was between two executions (or have the specific task log this). This is measuring the time it took to execute a task, the reason I use vTime is because a task can join the suspendible thread set, and because of this be waiting for a big part of the execution. Using vTime will then show how much the task really used.

If it should be allowed that a task wait for a GC to complete could of course be changed, and then using wall clock time would work.

Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is confusing wrt to all other similar messages. All other concurrent tasks print wall time from start to end. Also for periodicity control (i.e. did it start and end on time) wall time is typically more interesting.

While it is interesting to know how much cpu resources a task took (and for noticing when your machine is overloaded), it is much more interesting to know when it started and ended in absolute terms to see that it was on time or not. I understand that you can derive this yourselves, but this seems tiresome.

Maybe print both similar to "time" output showing both real time and cpu time? I.e. something like

"G1 Service Thread (%s) %1.3fms (cpu: %1.3fms)", task->name(), ...."

(fwiw I think the existing use of elapsedVTime particularly for limiting marking step length in g1 is wrong)

While we are at logging, JFR logging for the tasks would be interesting for these tasks too. Could you file a new CR for that?

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the double logging and filed JDK-8255740 for the JFR events.

if (os::supports_vtime()) {
_vtime_accum = (os::elapsedVTime() - vtime_start);
} else {
_vtime_accum = 0.0;
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this use of elapsedVTime() guarded by os::supports_vtime() and not the others (if they are valid)?

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it might make sense to not report anything if we don't get a "real" VTime. For the duration of the task I think returning normal elapsedTime is fine in the event of supports_vtime() returning false.
If we want to change this use I think we can do that when moving the measurement into the G1RemSetSamplingTask, see 8252645.

assert(_monitor.owned_by_self(), "Must be owner of lock");
assert(!_task_queue.is_empty(), "Should not be called for empty list");

double time_diff = _task_queue.peek()->time() - os::elapsedTime();
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the use of os::elapsedTime() wherever I see it :) I would prefer if an integer time were used.

Also, this code implicitly assumes that start time is start of the VM, not start of the thread. So if startup takes 100ms (random number), all items registered in the queue are immediately executed (assuming they have an interval < 100ms).

The os class actually provides a os::elapsed_counter()/javaTimeNanos/Millis for that which I think is better here.

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree, I started looking at using an integer time but switched after looking a bit at how the timing is done for the concurrent mark tasks and also that the logging is using elapsedTime.

Since both you and Albert (and me too) prefer using an integer I will take a look at using elapsedCounter, this should be easily converted to the same timestamps as used by the logging.

}

virtual uint64_t delay_ms() {
return G1ConcRefinementServiceIntervalMillis;
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a (pre-existing) issue: we do not really want to re-schedule the G1RemSetSamplingTask every 100ms (i.e. time stamp 100/200/300/...), but 100ms after the last GC. Eg. consider that the task will be scheduled at 300 for 400 (G1ConcRefinementServiceIntervalMillis), then there is a 200ms GC making the G1RemSetSamplingTask due right after GC.

That's why a "schedule(delay)" call on the task (or the queue, or the servicethread) makes a lot of sense to me (see also below for rationale).

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change when the tasks are run in this change and I think could be addressed as part of JDK-8254999.

But I totally agree with your comment, it would probably make sense to check when the last GC occured and schedule the task with that in mind.

// G1PeriodicGCInterval is a manageable flag and can be updated
// during runtime. If no value is set, wait a second and run it
// again to see if the value has been updated. Otherwise use the
// real value provided.
return G1PeriodicGCInterval == 0 ? 1000 : G1PeriodicGCInterval;
Copy link
Contributor

@tschatzl tschatzl Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-existing: I think this delay has the same issue as the young rem set sampling thread below: G1PeriodicGCInterval specifies that there should be at least one GC after G1PeriodicGCInterval ms after the last gc. Not every G1PeriodicGCInterval ms.

We probably need a task that polls the flag which then schedules the actual periodic gc or something like that.

Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that with this new approach we can do better than before and schedule the task to check at a more appropriate time. I think this should be addressed as part of JDK-8255001, so I added a comment to the issue.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Thanks for all input Thomas.

assert(_monitor.owned_by_self(), "Must be owner of lock");
assert(!_task_queue.is_empty(), "Should not be called for empty list");

double time_diff = _task_queue.peek()->time() - os::elapsedTime();
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree, I started looking at using an integer time but switched after looking a bit at how the timing is done for the concurrent mark tasks and also that the logging is using elapsedTime.

Since both you and Albert (and me too) prefer using an integer I will take a look at using elapsedCounter, this should be easily converted to the same timestamps as used by the logging.

log_debug(gc, task, start)("G1 Service Thread (%s) (run)", task->name());
task->execute();

double duration = os::elapsedVTime() - start;
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For walltime between executions you will have to enable gc+task+start to get the start time of tasks and look in the log how long it was between two executions (or have the specific task log this). This is measuring the time it took to execute a task, the reason I use vTime is because a task can join the suspendible thread set, and because of this be waiting for a big part of the execution. Using vTime will then show how much the task really used.

If it should be allowed that a task wait for a GC to complete could of course be changed, and then using wall clock time would work.

if (os::supports_vtime()) {
_vtime_accum = (os::elapsedVTime() - vtime_start);
} else {
_vtime_accum = 0.0;
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it might make sense to not report anything if we don't get a "real" VTime. For the duration of the task I think returning normal elapsedTime is fine in the event of supports_vtime() returning false.
If we want to change this use I think we can do that when moving the measurement into the G1RemSetSamplingTask, see 8252645.

@@ -28,43 +28,87 @@
#include "gc/shared/concurrentGCThread.hpp"
#include "runtime/mutex.hpp"

class G1ServiceTask : public CHeapObj<mtGC> {
// The next time this task should be executed.
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

void add_ordered(G1ServiceTask* task);
bool is_empty();
};

// The G1ServiceThread is used to periodically do a number of different tasks:
// - re-assess the validity of the prediction for the
// remembered set lengths of the young generation.
// - check if a periodic GC should be scheduled.
class G1ServiceThread: public ConcurrentGCThread {
private:
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Now fake a run-loop, that reschedules the tasks using a
// random multiplyer.
for (double now = 0; now < 1000; now++) {
// Random multiplyier is at least 1 to ensure progress.
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, how many ways can one spell multipler without getting it correct :)

}

// Now fake a run-loop, that reschedules the tasks using a
// random multiplyer.
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

virtual uint64_t delay_ms();
virtual bool should_reschedule();
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that a more explicit approach has advantages. What do you think about this approach:

  • Changed G1ServiceThread::reschedule_task(task) G1ServiceThread::schedule_task(task, delay)
  • Add protected G1ServiceThread::reschedule(delay) that calls G1ServiceThread::schedule_task()
  • Removed call to reschedule_task() from main loop.

}

virtual uint64_t delay_ms() {
return G1ConcRefinementServiceIntervalMillis;
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change when the tasks are run in this change and I think could be addressed as part of JDK-8254999.

But I totally agree with your comment, it would probably make sense to check when the last GC occured and schedule the task with that in mind.

// G1PeriodicGCInterval is a manageable flag and can be updated
// during runtime. If no value is set, wait a second and run it
// again to see if the value has been updated. Otherwise use the
// real value provided.
return G1PeriodicGCInterval == 0 ? 1000 : G1PeriodicGCInterval;
Copy link
Contributor Author

@kstefanj kstefanj Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that with this new approach we can do better than before and schedule the task to check at a more appropriate time. I think this should be addressed as part of JDK-8255001, so I added a comment to the issue.

@albertnetymk
Copy link
Member

@albertnetymk albertnetymk commented Oct 30, 2020

Now that the task deadline is an integer, DBL_MAX should not be used in this PR.

class G1ServiceTaskQueue {
// The sentinel task is the entry point of this priority queue holding the
// service tasks. The queue is ordered by the time the tasks are scheduled
// to run and the sentinel task has the time set to DBL_MAX. This guarantees
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/DBL_MAX/max_jlong. Maybe just something like: the sentinel task is put at the end of the queue to simplify list management.

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

public:
G1ServiceTask(const char* name);
const char* name();
void set_service_thread(G1ServiceThread* thread);
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just make the ServiceThread a friend of ServiceTask(Queue) to avoid the setters (set_service_thread and set_time and probably others)?
In the end you can't use one without the others anyway, i.e. they are already very tightly coupled.
These management methods seem to only clutter the public interface (i.e. you can still use the methods internally)

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made both G1ServiceThread and G1ServiceTaskQueue friends with G1ServiceTask, so now all setters are either private or protected. The ones that are protected are used by either tests and/or the sentinel task. These could also be made private and list all tasks needing them as friends, but I went with this solution. What do you think?

// Returns the time in milliseconds until the next task is due.
// Used both to determine if there are tasks ready to run and
// how long to sleep when nothing is ready.
int64_t time_to_next_task_ms();
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend returning a positive number here because it only raises questions what this means. The implementation of the method also never returns negative values.

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for using int64_t is that Monitor::wait() takes an int64_t. Albert filed an issue to change that: JDK-8255119, but until that is fixed I prefer using the same type here.


public:
G1ServiceThread();
double vtime_accum() { return _vtime_accum; }
void register_task(G1ServiceTask* task, jlong delay = 0);
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment what the delay means, i.e. something like: automatically schedules the task with the given delay.

From what is written here I am actually a bit confused about the difference between register_task() and schedule_task(). One seems to be to use by other threads, and the other one should only be called by a ServiceTask (in context of the ServiceThread).

Do we actually need to make a difference between these two and/or do both need to be public? At first glance I would not be sure who is allowed to call which and what the difference actually is.
It also seems that if either is called by the wrong thread bad things happen. Maybe it is possible to add some asserts to warn about most unintentional misuse?

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree these are very similar and as you say it's not good that both are public, I changed this by making G1ServiceTask a friend of G1ServiceThread, so that schedule_task() can be private. Also added short comments to help when only reading the interface.

src/hotspot/share/gc/g1/g1ServiceThread.cpp Show resolved Hide resolved
log_debug(gc, task, start)("G1 Service Thread (%s) (run)", task->name());
task->execute();

double duration = os::elapsedVTime() - start;
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is confusing wrt to all other similar messages. All other concurrent tasks print wall time from start to end. Also for periodicity control (i.e. did it start and end on time) wall time is typically more interesting.

While it is interesting to know how much cpu resources a task took (and for noticing when your machine is overloaded), it is much more interesting to know when it started and ended in absolute terms to see that it was on time or not. I understand that you can derive this yourselves, but this seems tiresome.

Maybe print both similar to "time" output showing both real time and cpu time? I.e. something like

"G1 Service Thread (%s) %1.3fms (cpu: %1.3fms)", task->name(), ...."

(fwiw I think the existing use of elapsedVTime particularly for limiting marking step length in g1 is wrong)

While we are at logging, JFR logging for the tasks would be interesting for these tasks too. Could you file a new CR for that?

void G1ServiceTaskQueue::add_ordered(G1ServiceTask* task) {
assert(task != NULL, "not a valid task");
assert(task->next() == NULL, "invariant");
assert(task->time() != DBL_MAX, "invalid time for task");
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/DBL_MAX/jlong_max ?

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, fixed.

virtual void execute();
};

class G1ServiceTaskQueue {
Copy link
Contributor

@tschatzl tschatzl Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add:
// Time ordered list of tasks to execute.

Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this covered by the comment just below.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Thanks for your comments Thomas and Albert.

log_debug(gc, task, start)("G1 Service Thread (%s) (run)", task->name());
task->execute();

double duration = os::elapsedVTime() - start;
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the double logging and filed JDK-8255740 for the JFR events.

void G1ServiceTaskQueue::add_ordered(G1ServiceTask* task) {
assert(task != NULL, "not a valid task");
assert(task->next() == NULL, "invariant");
assert(task->time() != DBL_MAX, "invalid time for task");
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, fixed.

// Returns the time in milliseconds until the next task is due.
// Used both to determine if there are tasks ready to run and
// how long to sleep when nothing is ready.
int64_t time_to_next_task_ms();
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for using int64_t is that Monitor::wait() takes an int64_t. Albert filed an issue to change that: JDK-8255119, but until that is fixed I prefer using the same type here.


public:
G1ServiceThread();
double vtime_accum() { return _vtime_accum; }
void register_task(G1ServiceTask* task, jlong delay = 0);
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree these are very similar and as you say it's not good that both are public, I changed this by making G1ServiceTask a friend of G1ServiceThread, so that schedule_task() can be private. Also added short comments to help when only reading the interface.

virtual void execute();
};

class G1ServiceTaskQueue {
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this covered by the comment just below.

public:
G1ServiceTask(const char* name);
const char* name();
void set_service_thread(G1ServiceThread* thread);
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made both G1ServiceThread and G1ServiceTaskQueue friends with G1ServiceTask, so now all setters are either private or protected. The ones that are protected are used by either tests and/or the sentinel task. These could also be made private and list all tasks needing them as friends, but I went with this solution. What do you think?

class G1ServiceTaskQueue {
// The sentinel task is the entry point of this priority queue holding the
// service tasks. The queue is ordered by the time the tasks are scheduled
// to run and the sentinel task has the time set to DBL_MAX. This guarantees
Copy link
Contributor Author

@kstefanj kstefanj Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@tschatzl tschatzl left a comment

Thanks for your changes. Lgtm.

@openjdk
Copy link

@openjdk openjdk bot commented Nov 3, 2020

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254758: Change G1ServiceThread to be task based

Reviewed-by: ayang, iwalulya, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 38 new commits pushed to the master branch:

  • 9a0cf58: 8255615: Zero: demote ZeroStack::abi_stack_available guarantee to assert
  • 904561e: 8255719: Zero: on return path, check for pending exception before attempting to clear it
  • 9bd836e: 8255744: Zero: handle JVM_CONSTANT_DynamicInError
  • 36998b0: 8255716: AArch64: Regression: JVM crashes if manually offline a core
  • 4107670: 8233562: [TESTBUG] Swing StyledEditorKit test bug4506788.java fails on MacOS
  • 9a36747: 8255780: Remove unused overloads of VMError::report_and_die()
  • c96a914: 8255662: ZGC: Unify nmethod closures in the heap iterator
  • aa2862a: 8255741: Zero: print signal name in unhandled signal handler
  • 1580574: 8255672: Replace PhaseTransform::eqv by pointer equality check
  • e7a2d5c: 8252533: Signal handlers should run with synchronous error signals unblocked
  • ... and 28 more: https://git.openjdk.java.net/jdk/compare/54c881325422f646ebc6d1d1873c038ff648b99a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Nov 3, 2020
@kstefanj
Copy link
Contributor Author

@kstefanj kstefanj commented Nov 3, 2020

Thanks @albertnetymk, @tschatzl and @walulyai for the reviews.

@kstefanj
Copy link
Contributor Author

@kstefanj kstefanj commented Nov 3, 2020

/integrate

@openjdk openjdk bot closed this Nov 3, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 3, 2020
@kstefanj kstefanj deleted the 8254758 branch Nov 3, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Nov 3, 2020

@kstefanj Since your change was applied there have been 38 commits pushed to the master branch:

  • 9a0cf58: 8255615: Zero: demote ZeroStack::abi_stack_available guarantee to assert
  • 904561e: 8255719: Zero: on return path, check for pending exception before attempting to clear it
  • 9bd836e: 8255744: Zero: handle JVM_CONSTANT_DynamicInError
  • 36998b0: 8255716: AArch64: Regression: JVM crashes if manually offline a core
  • 4107670: 8233562: [TESTBUG] Swing StyledEditorKit test bug4506788.java fails on MacOS
  • 9a36747: 8255780: Remove unused overloads of VMError::report_and_die()
  • c96a914: 8255662: ZGC: Unify nmethod closures in the heap iterator
  • aa2862a: 8255741: Zero: print signal name in unhandled signal handler
  • 1580574: 8255672: Replace PhaseTransform::eqv by pointer equality check
  • e7a2d5c: 8252533: Signal handlers should run with synchronous error signals unblocked
  • ... and 28 more: https://git.openjdk.java.net/jdk/compare/54c881325422f646ebc6d1d1873c038ff648b99a...master

Your commit was automatically rebased without conflicts.

Pushed as commit 1d0bd50.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this issue Nov 3, 2020
Reviewed-by: ayang, iwalulya, tschatzl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc integrated
4 participants