[JENKINS-50199] Fix cases where builds would "resume" even though the execution is complete #104

svanoort · 2018-08-28T22:42:42Z

Should finally put the last variant of this problem to rest. Root cause: execution completed (and this is saved), but something prevented saving the build itself with completed status. Unfortunately we didn't do anything to detect this case and mark the build itself as completed if its execution was.

Solving the save itself is a separate problem - since that can potentially be a result of I/O failures or abrupt shutdowns, we have to handle the possibility anyway.
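As a rough illustration of the check described above (not the actual `WorkflowRun` code), the sketch below models the on-load reconciliation with hypothetical stand-in types: if the persisted execution is complete but the run was never marked completed, the run is marked completed and the execution's final result is copied over when one is present. `OnLoadReconciliation`, `Execution`, `Run`, and `reconcile` are assumed names for illustration only.

```java
/** Hypothetical stand-ins for the real execution and run types; not the plugin's API. */
final class OnLoadReconciliation {

    enum Result { SUCCESS, FAILURE }

    /** Minimal model of a persisted execution: whether it finished and, if known, its result. */
    static final class Execution {
        final boolean complete;
        final Result finalResult; // null when no final result was recorded

        Execution(boolean complete, Result finalResult) {
            this.complete = complete;
            this.finalResult = finalResult;
        }
    }

    /** Minimal model of a build record whose completed status may not have been saved. */
    static final class Run {
        boolean completed;
        Result result; // null until a result is set
    }

    /**
     * Intended to be called when a run is loaded from disk.
     * Returns true if the run's status had to be corrected.
     */
    static boolean reconcile(Run run, Execution execution) {
        if (run.completed || execution == null || !execution.complete) {
            return false; // nothing to fix: already completed, or execution still in progress
        }
        run.completed = true;
        // Copy the final result from the execution only if the run has none and one is present.
        if (run.result == null && execution.finalResult != null) {
            run.result = execution.finalResult;
        }
        return true;
    }
}
```

Returning a boolean lets the caller log the cases where a correction was made, without the sketch committing to any particular logging API.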

Also adds in the suite of persistence problem tests from the CPS plugin, since both the CPS and workflow plugins need to have persistence logic correct internally (and we need to verify this since either can break it).


jglick · 2018-08-29T14:26:44Z

> adds in the suite of persistence problem tests from the CPS plugin

You are copying all these tests? This seems like the start of a lot of technical debt.

I understand that you want to catch regressions arising from various sources, as long noted in JENKINS-45047, but this is not a good resolution. Work on fixing our CI setup rather than duplicating hundreds of lines of test code just to make sure it gets run more often.

jglick · 2018-08-29T14:18:19Z

pom.xml

@@ -132,6 +132,12 @@
            <version>2.11</version>
            <scope>test</scope>
        </dependency>
+        <dependency>
+            <groupId>org.jenkins-ci.plugins</groupId>
+            <artifactId>pipeline-input-step</artifactId>


Use SemaphoreStep instead.

jglick · 2018-08-29T14:18:56Z

src/test/java/org/jenkinsci/plugins/workflow/job/CpsPersistenceTest.java

+        FlowExecution exec = listener.get();
+        while(exec.getCurrentHeads().isEmpty() || (exec.getCurrentHeads().get(0) instanceof FlowStartNode)) {  // Wait until input step starts
+            System.out.println("Waiting for input step to begin");
+            Thread.sleep(50);


More easily and reliably done with SemaphoreStep.
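To illustrate the general point outside of Jenkins: the sketch below uses only `java.util.concurrent` (it does not call `SemaphoreStep` or any Jenkins API) to show a test blocking on an explicit signal from the code under test instead of polling with `Thread.sleep`. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical example: wait on an explicit signal rather than polling in a sleep loop. */
final class SignalInsteadOfPolling {

    /**
     * Starts a worker that announces when it reaches its pause point and then blocks
     * until released. Returns true if the signal arrived and the worker finished in time.
     */
    static boolean awaitWorkerStart(long timeoutMillis) {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            started.countDown(); // tell the test the pause point has been reached
            try {
                release.await(); // stay paused until the test lets the worker continue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            // Blocks until the worker signals or the timeout expires; no polling loop needed.
            boolean reached = started.await(timeoutMillis, TimeUnit.MILLISECONDS);
            release.countDown();
            worker.join(timeoutMillis);
            return reached && !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            return false;
        }
    }
}
```

The test thread wakes as soon as the signal arrives, so it neither wastes time sleeping nor depends on inspecting intermediate state to guess whether the pause point has been reached.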

jglick · 2018-08-29T14:22:30Z

src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java

-                        // Defer the normal listener to ensure onLoad can complete before finish() is called since that may
-                        // need the build to be loaded and can result in loading loops otherwise.
-                        fetchedExecution.removeListener(finishListener);
-                        fetchedExecution.addListener(new GraphL());


Note to self: minor merge conflict with #27.

jglick · 2018-08-29T14:30:41Z

Code change looks OK I suppose. As far as the tests go: if there is a particular test in workflow-cps-plugin which reproduces the stated problem if you were to update the workflow-job test dep, then simply revert all but the src/main/java/ portion of this PR, wait for Incrementals deployment, and then bump the test dep in workflow-cps-plugin.

(And if there is no such test yet, then why are you copying all these tests here? Write a new test, here, to reproduce the problem, if you can.)

dwnusbaum · 2018-08-29T14:34:18Z

src/test/java/org/jenkinsci/plugins/workflow/job/CpsPersistenceTest.java

+        story.then( j->{
+            WorkflowJob r = j.jenkins.getItemByFullName(DEFAULT_JOBNAME, WorkflowJob.class);
+            WorkflowRun run = r.getBuildByNumber(build[0]);
+            assertCompletedCleanly(run);


Maybe add Assert.assertEquals(Result.SUCCESS, run.getResult()); to show that copying the result from the FlowEndNode works correctly?

(addressed by 35b435d)

svanoort · 2018-08-29T17:00:06Z

src/test/java/org/jenkinsci/plugins/workflow/job/CpsPersistenceTest.java

+    @Test
+    @Issue("JENKINS-50199")
+    /** Replicates case where builds resume when the should not due to build's completion not being saved. */
+    public void completedExecutionButRunIncomplete() throws Exception {


@jglick @dwnusbaum This is the net-new testcase.

svanoort · 2018-08-29T17:13:16Z

@jglick We have long needed to have the tests in both places to catch regressions introduced in both Workflow-Job and Workflow-CPS persistence calls. In the past we did indeed use a SNAPSHOT/incremental build of workflow-job and then did a bump to workflow-CPS -- but that's a slow and inefficient process where it's easy to slip in regressions by failing to run something against the latest build.

Adapting existing tests from CPS (plus a new one) isn't a perfect solution, but when we need to be focused on velocity, refactoring tests should not be a high priority -- and we really do need that coverage.

dwnusbaum

Fix looks fine to me based on the investigation with Sam yesterday. I don't have a strong opinion either way on the tests. It's nice to have them run automatically here, but it also means the two copies of the tests can get out of sync with each other. Improving the testing infrastructure would be great, but it doesn't seem trivial.

kshultzCB

New test case is good. Thanks for walking me through it. 👍

jglick · 2018-09-04T21:42:48Z

> when we need to be focused on velocity, refactoring tests should not be a high priority

I would rather argue that improving how/where tests are run is exactly what we need to do in order to increase velocity. Duplicating code is introducing technical debt which will be a drag on further work.

jglick · 2023-05-30T17:10:33Z

src/test/java/org/jenkinsci/plugins/workflow/job/CpsPersistenceTest.java

+            CpsFlowExecution cpsExec = (CpsFlowExecution)(run.getExecution());
+            InputStepExecution exec = getInputStepExecution(run, "pause");
+            exec.doProceedEmpty();
+            j.waitForCompletion(run);


see jenkinsci/jenkins-test-harness#596

svanoort added 4 commits August 28, 2018 18:06

Fix for issues with builds that resume even when the execution is completed

a0af64f

Add tests for a bunch of persistence edgecases that require correct handling by both the execution and the run

2060308

Propagate the final build result from the execution if present

bb1c683

Log cases where we correct build completion status

ff2d27c

svanoort requested review from jglick, dwnusbaum, kshultzCB and rsandell August 28, 2018 22:42

jglick reviewed Aug 29, 2018


dwnusbaum reviewed Aug 29, 2018


svanoort commented Aug 29, 2018


Address review feedback

35b435d

svanoort requested review from jglick and dwnusbaum August 29, 2018 17:13

dwnusbaum approved these changes Aug 29, 2018


kshultzCB approved these changes Aug 30, 2018


svanoort merged commit 3510070 into jenkinsci:master Aug 30, 2018

svanoort deleted the jenkins-50199-fix-completed-execution-onload-loophole branch August 30, 2018 21:55

dwnusbaum mentioned this pull request Aug 31, 2018

[JENKINS-53358] Remove bogus persistence calls due to notifyListeners jenkinsci/workflow-cps-plugin#234

Draft

jglick mentioned this pull request Sep 4, 2018

[JEP-210] Log handling rewrite #27

Merged

11 tasks

jglick mentioned this pull request Nov 26, 2018

Pull request comments not full width in Files tab mdo/github-wide#54

Closed

dwnusbaum mentioned this pull request Mar 30, 2020

[JENKINS-55287] Improve shutdown-related logging jenkinsci/workflow-cps-plugin#354

Merged

jglick reviewed May 30, 2023


jglick mentioned this pull request May 30, 2023

Removing parts of CpsPersistenceTest which duplicate content in workflow-cps #361

Merged
