Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-59793] Avoid hanging jobs with faulty SubTasks #4346

Merged
merged 6 commits into from Nov 13, 2019
Merged
43 changes: 26 additions & 17 deletions core/src/main/java/hudson/model/queue/WorkUnitContext.java
Expand Up @@ -108,12 +108,17 @@ public WorkUnit getPrimaryWorkUnit() {
* All the {@link Executor}s that jointly execute a {@link Task} call this method to synchronize on the start.
*/
public void synchronizeStart() throws InterruptedException {
stephenc marked this conversation as resolved.
Show resolved Hide resolved
startLatch.synchronize();
// the main thread will send a notification
Executor e = Executor.currentExecutor();
WorkUnit wu = e.getCurrentWorkUnit();
if (wu.isMainWork()) {
future.start.set(e.getCurrentExecutable());
// Let code waiting for the start (future.start.get()) finishes when there is a faulty SubTask by setting the
// future always.
try {
startLatch.synchronize();
} finally {
// the main thread will send a notification
Executor e = Executor.currentExecutor();
WorkUnit wu = e.getCurrentWorkUnit();
if (wu.isMainWork()) {
future.start.set(e.getCurrentExecutable());
MRamonLeon marked this conversation as resolved.
Show resolved Hide resolved
}
}
}

stephenc marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -130,17 +135,21 @@ public void synchronizeEnd(Queue.Executable executable, Throwable problems, long
* the member threads will report {@link InterruptedException}.
*/
public void synchronizeEnd(Executor e, Queue.Executable executable, Throwable problems, long duration) throws InterruptedException {
stephenc marked this conversation as resolved.
Show resolved Hide resolved
endLatch.synchronize();

// the main thread will send a notification
WorkUnit wu = e.getCurrentWorkUnit();
if (wu.isMainWork()) {
if (problems == null) {
future.set(executable);
e.getOwner().taskCompleted(e, task, duration);
} else {
future.set(problems);
e.getOwner().taskCompletedWithProblems(e, task, duration, problems);
// Let code waiting for the build to finish (future.get()) finishes when there is a faulty SubTask by setting
// the future always.
try {
endLatch.synchronize();
} finally {
// the main thread will send a notification
WorkUnit wu = e.getCurrentWorkUnit();
if (wu.isMainWork()) {
if (problems == null) {
future.set(executable);
e.getOwner().taskCompleted(e, task, duration);
} else {
future.set(problems);
e.getOwner().taskCompletedWithProblems(e, task, duration, problems);
}
}
}
}
stephenc marked this conversation as resolved.
Show resolved Hide resolved
Expand Down
@@ -0,0 +1,105 @@
package hudson.model.queue;

import hudson.model.AbstractProject;
import hudson.model.Executor;
import hudson.model.FreeStyleBuild;
import hudson.model.FreeStyleProject;
import hudson.model.Label;
import hudson.model.Node;
import hudson.model.Queue;
import hudson.model.ResourceList;
import org.junit.Rule;
import org.junit.Test;
import org.jvnet.hudson.test.Issue;
import org.jvnet.hudson.test.JenkinsRule;
import org.jvnet.hudson.test.LoggerRule;
import org.jvnet.hudson.test.TestExtension;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;

import static org.hamcrest.Matchers.notNullValue;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.assertTrue;

public class BuildKeepsRunningWhenFaultySubTasksTest {
@Rule
public JenkinsRule j = new JenkinsRule();

@Rule
public LoggerRule logs = new LoggerRule().record(Executor.class.getName(), Level.SEVERE).capture(100); //just in case future versions add more logs

public static final String ERROR_MESSAGE = "My unexpected exception";

// When using SubTaskContributor (FailingSubTaskContributor) the build never ends
@Test
@Issue("JENKINS-59793")
public void buildDoesntFinishWhenSubTaskFails() throws Exception {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void buildDoesntFinishWhenSubTaskFails() throws Exception {
public void buildFinishesWhenSubTaskFails() throws Exception {

I think the name is appropriate before the fix but after the fix it should say what's actually asserting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed with the other change

FreeStyleProject p = j.createProject(FreeStyleProject.class);
QueueTaskFuture<FreeStyleBuild> future = p.scheduleBuild2(0);
assertThat("Build should be actually scheduled by Jenkins", future, notNullValue());

// We get stalled waiting the finalization of the job
FreeStyleBuild build = future.get(5, TimeUnit.SECONDS);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also check the build result is a failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to assert that, the purpose of the test is to guarantee we pass this point, so let's keep the minimum needed as agreed above.

assertTrue("The message is printed in the log", logs.getRecords().stream().anyMatch(r -> r.getThrown().getMessage().equals(ERROR_MESSAGE)));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is really necessary (reaching this point would be enough to assert the fix?) and you could save all the logger rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, not needed if the code passes the get. It's a reminiscence from a lot of tests I've been doing. I can remove it.

}

// A SubTask failing with an exception
@TestExtension
public static class FailingSubTaskContributor extends SubTaskContributor {
@Override
public Collection<? extends SubTask> forProject(final AbstractProject<?, ?> p) {
return Collections.singleton(new SubTask() {
private final SubTask outer = this;

public Queue.Executable createExecutable() throws IOException {
return new Queue.Executable() {
public SubTask getParent() {
return outer;
}

public void run() {
throw new ArrayIndexOutOfBoundsException(ERROR_MESSAGE);
}

public long getEstimatedDuration() {
return 0;
}
};
}

@Override
public Label getAssignedLabel() {
return null;
}
@Override
public Node getLastBuiltOn() {
return null;
}
@Override
public long getEstimatedDuration() {
return 0;
}
@Override
public Queue.Task getOwnerTask() {
return p;
}
@Override
public Object getSameNodeConstraint() {
return null;
}
@Override
public ResourceList getResourceList() {
return ResourceList.EMPTY;
}
@Override
public String getDisplayName() {
return "Subtask of " + p.getDisplayName();
}
});
}
}
}