Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-50199] Fix cases where builds would "resume" even though the execution is complete #104

Conversation

svanoort
Copy link
Member

@svanoort svanoort commented Aug 28, 2018

JENKINS-50199.

Should finally put the last variant of this problem to rest. Root cause: execution completed (and this is saved), but something prevented saving the build itself with completed status. Unfortunately we didn't do anything to detect this case and mark the build itself as completed if its execution was.

Solving the save itself is a separate problem - since that can potentially be a result of I/O failures or abrupt shutdowns, we have to handle the possibility anyway.

Also adds in the suite of persistence problem tests from the CPS plugin, since both the CPS and workflow plugins need to have persistence logic correct internally (and we need to verify this since either can break it).

@jglick
Copy link
Member

jglick commented Aug 29, 2018

adds in the suite of persistence problem tests from the CPS plugin

You are copying all these tests? This seems like the start of a lot of technical debt.

I understand that you want to catch regressions arising from various sources, as long noted in JENKINS-45047, but this is not a good resolution. Work on fixing our CI setup rather than duplicating hundreds of lines of test code just to make sure it gets run more often.

@@ -132,6 +132,12 @@
<version>2.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jenkins-ci.plugins</groupId>
<artifactId>pipeline-input-step</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use SemaphoreStep instead.

FlowExecution exec = listener.get();
while(exec.getCurrentHeads().isEmpty() || (exec.getCurrentHeads().get(0) instanceof FlowStartNode)) { // Wait until input step starts
System.out.println("Waiting for input step to begin");
Thread.sleep(50);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More easily and reliably done with SemaphoreStep.

// Defer the normal listener to ensure onLoad can complete before finish() is called since that may
// need the build to be loaded and can result in loading loops otherwise.
fetchedExecution.removeListener(finishListener);
fetchedExecution.addListener(new GraphL());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: minor merge conflict with #27.

@jglick
Copy link
Member

jglick commented Aug 29, 2018

Code change looks OK I suppose. As far as the tests go: if there is a particular test in workflow-cps-plugin which reproduces the stated problem if you were to update the workflow-job test dep, then simply revert all but the src/main/java/ portion of this PR, wait for Incrementals deployment, and then bump the test dep in workflow-cps-plugin.

(And if there is no such test yet, then why are you copying all these tests here? Write a new test, here, to reproduce the problem, if you can.)

story.then( j->{
WorkflowJob r = j.jenkins.getItemByFullName(DEFAULT_JOBNAME, WorkflowJob.class);
WorkflowRun run = r.getBuildByNumber(build[0]);
assertCompletedCleanly(run);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add Assert.assertEquals(Result.SUCCESS, run.getResult()); to show that copying the result from the FlowEndNode works correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(addressed by 35b435d)

@Test
@Issue("JENKINS-50199")
/** Replicates case where builds resume when the should not due to build's completion not being saved. */
public void completedExecutionButRunIncomplete() throws Exception {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jglick @dwnusbaum This is the net-new testcase.

@svanoort
Copy link
Member Author

@jglick We have long needed to have the tests in both places to catch regressions introduced in both Workflow-Job and Workflow-CPS persistence calls. In the past we did indeed use a SNAPSHOT/incremental build of workflow-job and then did a bump to workflow-CPS -- but that's a slow and inefficient process where it's easy to slip in regressions by failing to run something against the latest build.

Adapting existing tests from CPS (plus a new one) isn't perfect solution but when we need to be focused on velocity, refactoring tests should not be a high priority -- and we really do need that coverage.

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks fine to me based on the investigation with Sam yesterday. I don't have a strong opinion either way on the tests. It's nice to have them run automatically here, but it also means the two copies of the tests can get out of sync with each other. Improving the testing infrastructure would be great, but it doesn't seem trivial.

Copy link

@kshultzCB kshultzCB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test case is good. Thanks for walking me through it. 👍

@svanoort svanoort merged commit 3510070 into jenkinsci:master Aug 30, 2018
@svanoort svanoort deleted the jenkins-50199-fix-completed-execution-onload-loophole branch August 30, 2018 21:55
@jglick
Copy link
Member

jglick commented Sep 4, 2018

when we need to be focused on velocity, refactoring tests should not be a high priority

I would rather argue that improving how/where tests are run is exactly what we need to do in order to increase velocity. Duplicating code is introducing technical debt which will be a drag on further work.

CpsFlowExecution cpsExec = (CpsFlowExecution)(run.getExecution());
InputStepExecution exec = getInputStepExecution(run, "pause");
exec.doProceedEmpty();
j.waitForCompletion(run);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants