Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIXED JENKINS-31627] canceled TimerTasks should be purged. #47

Merged
merged 1 commit into from Nov 21, 2015

Conversation

jtnord
Copy link
Member

@jtnord jtnord commented Nov 18, 2015

The TimerTask could be way off into the future as it is user input that
decides when this is run (and or jobs running multiple days the inactivity
could be 1 day). Without purging the Timer the TimerTasks stay into the
queue until such time that they are the next task to run - at which time
they are removed. But with all other triggers in a system (polling every
minute etc) added timers could stay for their maximum duration.

For arguments sake say this is 1 day and you have a build that produces
10000 lines of output every 10 hours and uses the NoActivity strategy.
In this case for every call to write you will have a new TimerTask created
that should trigger in 24hours time - it is easy to see these 10000
TimerTasks are referenced from the Timer and as such will not be garbage
collected for a long while.
If you have lots of these styles of jobs - the required heap to run this
becomes massive.

@reviewbybees
JENKINS-31627

The TimerTask could be way off into the future as it is user input that
decides when this is run (and or jobs running multiple days the inactivity
could be 1 day).  Without purging the Timer the TimerTasks stay into the
queue until such time that they are the next task to run - at which time
they are removed.  But with all other triggers in a system (polling every
minute etc) added timers could stay for their maximum duration.

For arguments sake say this is 1 day and you have a build that produces
10000 lines of output every 10 hours and uses the NoActivity strategy.
In this case for every call to write you will have a new TimerTask created
that should trigger in 24hours time - it is easy to see these 10000
TimerTasks are referenced from the Timer and as such will not be garbaged
collected for a long while.
If you have lots of these styles of jobs - the required heap to run this
becomes massive.
@ghost
Copy link

ghost commented Nov 18, 2015

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@jtnord
Copy link
Member Author

jtnord commented Nov 18, 2015

The Timer internals are all private (who would have expected anything else) so ran out of good ideas to write a unit test for this - the thought of using reflection entered my head but the insides are likely to change between releases so seemed overly fragile. But the code is trivial - and the fix has been verified by hand and heap dumps.

@rsandell
Copy link
Member

🐝

@@ -193,6 +195,8 @@ public synchronized void rescheduleIfScheduled() {
public synchronized boolean tearDown(AbstractBuild build, BuildListener listener) throws IOException, InterruptedException {
if (task != null) {
task.cancel();
// avoid memory leaks for the case where this timer is in the future (JENKINS-31627).
Trigger.timer.purge();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the connection between Trigger.timer.purge() and the task being canceled. Is Trigger.timer a ThreadLocal that has some relevant context that allows it to purge the correct TimerTask?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it pruges all canceled tasks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfennelly
Task is a java.util.TimerTask that is in a java.util.Timer

You cancel the task so that when the timer comes to run it it won't run it - but the timer still holds a reference to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfennelly calling purge tells the timer to walk through the list of Tasks it has and remove any that are cancelled.

Its not as cheap as inserting new tasks - or letting them natrually drop of when they come to run - but the instance that showed this issue had * over 3 million* of these objects - and there should be only a few hundred or so. So whilst calling purge() can be expensive - it works out cheaper than letting them pile up and running out of memory.

@tfennelly
Copy link
Member

🐝

@jtnord
Copy link
Member Author

jtnord commented Nov 18, 2015

@reviewbybees done

@ghost
Copy link

ghost commented Nov 18, 2015

This pull request has completed our internal processes and we now respectfully request the maintainers of this repository to consider our proposal contained within this pull request for merging.

@stephenc
Copy link
Member

🐝

@jglick
Copy link
Member

jglick commented Nov 18, 2015

🐝 but note that Trigger.timer has been deprecated in favor of Timer for a while now. Perhaps the modern java.util.concurrent APIs do not suffer from this awful flaw to begin with?

@jtnord
Copy link
Member Author

jtnord commented Nov 18, 2015

@jglick ScheduledThreadPoolExecutor does not appear to have a similar flaw - but the Jenkins version that introduced this (1.541) is way later than the very old (1.466) version of Jenkins that this plugin is targeting.

@ikedam
Copy link
Member

ikedam commented Nov 21, 2015

Thanks for the important fix.

@ikedam
Copy link
Member

ikedam commented Nov 21, 2015

Tested with Execute shell

#!bash

for i in $(seq 300); do
  for j in $(seq 65535); do
    echo ${i} ${j}
  done
done
Execution Time # of TimeoutTimerTask (after execution) # of TimeoutTimerTask (after GC)
without the fix 5 min 38 sec 11,310,919 7,552,750
with the fix 5 min 23 sec 1,124,989 0

(Counted the # of objects with jmap tool)
I saw that the problem was fixed.
This change doesn't cause a performance issue.

ikedam added a commit that referenced this pull request Nov 21, 2015
[FIXED JENKINS-31627] canceled TimerTasks should be purged.
@ikedam ikedam merged commit 5db020d into jenkinsci:master Nov 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants