Cancel old jobs when a new patch is uploaded #5

Merged
merged 1 commit into from Feb 14, 2012

Projects

None yet

6 participants

bnovc commented Sep 15, 2011

When a new patch set is posted, building the previous patch set
is no longer valuable. This causes an enormous extra load on the
build servers. If someone rapidly posts several revisions of a
patch, it could cause enormous delays and wasted resources in
Jenkins.

If others feel a need to make this an option, I can, but for the
environments I have seen, this is always desired behavior.

Owner
rtyler commented Sep 15, 2011

This maps to our desired use case as well.

Stupid GitHub doesn't let me just "watch" a pull request without making an inane comment :)

Member

Great! I've been wanting this feature as well for a long time.
I will take a closer look at it as soon as I'm able.

Member

Looks good, I actually have a user or two who would like to be able to turn this off.

Could you add a unit test or two to verify that we don't break the it in the future.

bnovc commented Oct 10, 2011

I made this optional, so I think this is acceptable to go in. I still get errors trying to run the unit tests. I'll email you my current ones.

@VestniK VestniK commented on the diff Dec 20, 2011
...ugins/gerrit/trigger/hudsontrigger/GerritTrigger.java
@@ -249,12 +267,31 @@ protected void schedule(GerritCause cause, PatchsetCreated event, AbstractProjec
new RetriggerAllAction(cause.getContext()),
createParameters(event, project));
+ // check if we're running any jobs for this event
+ if (runningJobs.containsKey(event.getChange())) {
+ // if we were, let's cancel them
+ Future oldBuild = runningJobs.remove(event.getChange());
+ if (PluginImpl.getInstance().getConfig().isGerritBuildCurrentPatchesOnly()) {
VestniK
VestniK Dec 20, 2011

Is it really necessary to remove item from runningJobs if this condition is false? Would it be better to rewrite this block in the following way:
if (runningJobs.containsKey(event.getChange()) && PluginImpl.getInstance().getConfig().isGerritBuildCurrentPatchesOnly()) {
Future oldBuild = runningJobs.remove(event.getChange());
oldBuild.cancel(true)
}
?

bnovc
bnovc Dec 20, 2011

I wrote it this way so the option could be off and then turned on and work for existing jobs (see that it adds them unconditionally also).

@VestniK VestniK commented on the diff Dec 20, 2011
...ugins/gerrit/trigger/hudsontrigger/GerritTrigger.java
@@ -241,7 +259,7 @@ protected void schedule(GerritCause cause, PatchsetCreated event) {
*/
protected void schedule(GerritCause cause, PatchsetCreated event, AbstractProject project) {
//during low traffic we still don't want to spam Gerrit, 3 is a nice number, isn't it?
- boolean ok = project.scheduleBuild(
+ Future build = project.scheduleBuild2(
VestniK
VestniK Dec 20, 2011

You should modify the tests which expects scheduleBuild to be called as well (there are 17 in the GerritTriggerTest class which fails now)

bnovc
bnovc Dec 20, 2011

I couldn't ever get the tests to run. That's why I've left this patch dormant. I get a ton of connection errors on one machine I tried it on, and I can't even get mvn package to run on my other machine (it hangs downloading dependencies every time). Hadn't spent the time looking into either of those, yet.

Do you just run mvn package, and you get errors about this?

I'll take your change (89d1c9) and incorporate it into this.

VestniK
VestniK Dec 21, 2011

Yes both "mvn package" and "mvn install" runs tests. I'm using fresh Ubuntu installation (witn open-jdk-6 and maven2 packages from officiall repository) and Arch Linux (again open-jdk-6 and maven2 from off. repos.). Everything works fine on both systems.

Tried your commit. All tests are passed.

VestniK commented Dec 20, 2011

In order to pass all of the tests I had to make the following changes: VestniK@89d1c97

VestniK commented Dec 20, 2011

I'm unsure about my changes in the first diff (ToGerritRunListener.java file). I've tried to fix NullPointerException at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ToGerritRunListenerTest.testOnCompleted (ToGerritRunListenerTest.java:132) Can somebody take a look on it?

I'm ready to test merge of this pull request with my open pull request (#4 and #8) in our working environment.

@VestniK VestniK commented on an outdated diff Dec 27, 2011
...ugins/gerrit/trigger/hudsontrigger/GerritTrigger.java
event.getChange().getNumber() + "/" + event.getPatchSet().getNumber(), });
}
/**
+ * Used to inform the plugin that the builds for a job have ended
+ * This allows us to clean up our list of what jobs we're running
+ */
+ public void notifyBuildEnded(PatchsetCreated patchset) {
+ runningJobs.remove(patchset);
VestniK
VestniK Dec 27, 2011

Should this line be runningJobs.remove(patchset.getChange()); ?

Anthony Newnam Cancel jobs when a new patch set is created
When a new patch set is posted, building the previous patch set
is no longer valuable. This causes an enormous extra load on the
build servers. If someone rapidly posts several revisions of a
patch, it could cause enormous delays and wasted resources in
Jenkins.

This only does part of the work, because calling cancel() on a
Future that is provided by scheduleBuild2() does not cancel jobs
that are in the queue. A ticket has been created to address this,
available at: https://issues.jenkins-ci.org/browse/JENKINS-12152
81d484c
bnovc commented Jan 25, 2012

Ping

Member

Sorry, I didn't get notified when new commits got added, and I was waiting for how the discussions around VestniK's comments turned out :)
I'll take a look soon.

bnovc commented Feb 1, 2012

This is causing a NPE on our Jenkins instance. I haven't had a chance/environment to debug it, yet. It seems like this may be a symptom of another problem within the plugin creating invalid patch sets, but I may be mis-using them.

INFO: JenkinsAdmin-Gerrit-trigger-verification #180 main build action completed: SUCCESS
Feb 1, 2012 6:50:49 AM hudson.model.Executor run
SEVERE: Executor threw an exception
java.lang.NullPointerException
at com.sonyericsson.hudson.plugins.gerrit.trigger.hudsontrigger.GerritTrigger.notifyBuildEnded(GerritTrigger.java:291)
at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ToGerritRunListener.onCompleted(ToGerritRunListener.java:90)
at com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ToGerritRunListener.onCompleted(ToGerritRunListener.java:49)
at hudson.model.listeners.RunListener.fireCompleted(RunListener.java:178)
at hudson.model.Run.run(Run.java:1454)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)

Member
rsandell commented Feb 1, 2012

Do you think it is something caused by this change?

bnovc commented Feb 1, 2012

I don't know if it is caused by it; it seems like it may just be exposing a problem. It was working fine for a while, also.

@rsandell rsandell merged commit a9342a4 into jenkinsci:master Feb 14, 2012
Member

Hmm, We've done some testing on it. And we get some strange behaviour.
The test cases that I wrote pass like a charm so it seems to work.
But when we test it with a running server, both via mvn hpi:run and on a Testing server running Jenkins 1.424.1 in a tomcat instance, the build don't get cancelled.

I wrote a groovy script that fetches the runningJobs var and runs the following command on it:
println("\tCanceled: " + trigger.runningJobs.values().iterator().next().cancel(true));
So in effect it is running the same method that the code is doing. The cancel call returns true but the build keeps on building and its only a shell doing sleep 300.
Clicking cancel on the build page works.

any ideas?

bnovc commented Feb 15, 2012

Odd, that isn't the behavior that I have. What version of Jenkins are you running? I'm on 1.441. The master and slaves are Windows Server 2008. I think they're on a 64bit JVM.

Is the "sleep 300" in an "Execute shell" box?

Member

Tested on 1.424.1 and when you run with maven it is 1.400 since that is what the pom is configured for.
Ubuntu 10.04 64bit.

Yes the sleep is in an execute shell box, but cancelling the build from the UI works, so it's not like the build isn't possible to kill, its just future.cancel that doesn't aparently.

Member

Now we have tested on 1.441 on Windows XP to rule out all differences.
What we see on both linux/windows and 1.400/1.424/1.441 is the same:

Submitting two patch sets on the same change works fine, the first one is cancelled.
Adding the third patch set doesn't cause the nr 2 to cancel. Any further patch sets added
while nr 3 is waiting on nr 2 won't trigger at all.

Manually cancelling nr 2 and nr 3 can get it working
again ( for the duration of one new patch set, then it bugs out again ) but it can also end up in a state where nothing is triggered, even though no builds are in queue or building.

We also noticed a nullpointer being thrown when an old project is deserialized from a server restart, this was what caused what Bobby described above, when we couldn't get it to work at all. We have a fix for the Nullpointer, but after quite a bit of debugging, we still can't explain the weird behavior I described above.

bnovc, could you test adding a lot of patch sets after each other to the same change and see if you can reproduce this problem?

bnovc commented Feb 17, 2012

I was able to reproduce this. The problem looks like it is because we're doing .remove() on the generic change, but it needs to be sure to only remove the change if it is also the same patch set. I'll put up a patch for this soon (maybe Monday as I'm heading home).

I also experienced a weird problem after pulling in the changes that went on top of this. runningJobs was always null for me. It seems that the explicit constructor that should have created it was not being created. I changed it to a singleton function to create and return it which avoided the problem. Seems kind of like a Java bug, but I haven't looked anymore into that.

Member

I fixed the null problem with commit 8d976b4
I also added some thread safety in that as well.

bnovc commented Feb 20, 2012

Sorry I wasn't clear. I meant a different NPE problem. I've posted a fix to my NPE problem and the canceling issue. See #11

Is this feature stable yet?

Member

Yes as of version 2.7.0 we consider it stable.
From the changelog:
"Build current patchsets only is no longer experimental"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment