-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-50199] Fix cases where builds would "resume" even though the execution is complete #104
[JENKINS-50199] Fix cases where builds would "resume" even though the execution is complete #104
Conversation
…andling by both the execution and the run
You are copying all these tests? This seems like the start of a lot of technical debt. I understand that you want to catch regressions arising from various sources, as long noted in JENKINS-45047, but this is not a good resolution. Work on fixing our CI setup rather than duplicating hundreds of lines of test code just to make sure it gets run more often. |
@@ -132,6 +132,12 @@ | |||
<version>2.11</version> | |||
<scope>test</scope> | |||
</dependency> | |||
<dependency> | |||
<groupId>org.jenkins-ci.plugins</groupId> | |||
<artifactId>pipeline-input-step</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use SemaphoreStep
instead.
FlowExecution exec = listener.get(); | ||
while(exec.getCurrentHeads().isEmpty() || (exec.getCurrentHeads().get(0) instanceof FlowStartNode)) { // Wait until input step starts | ||
System.out.println("Waiting for input step to begin"); | ||
Thread.sleep(50); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More easily and reliably done with SemaphoreStep
.
// Defer the normal listener to ensure onLoad can complete before finish() is called since that may | ||
// need the build to be loaded and can result in loading loops otherwise. | ||
fetchedExecution.removeListener(finishListener); | ||
fetchedExecution.addListener(new GraphL()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: minor merge conflict with #27.
Code change looks OK I suppose. As far as the tests go: if there is a particular test in (And if there is no such test yet, then why are you copying all these tests here? Write a new test, here, to reproduce the problem, if you can.) |
story.then( j->{ | ||
WorkflowJob r = j.jenkins.getItemByFullName(DEFAULT_JOBNAME, WorkflowJob.class); | ||
WorkflowRun run = r.getBuildByNumber(build[0]); | ||
assertCompletedCleanly(run); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add Assert.assertEquals(Result.SUCCESS, run.getResult());
to show that copying the result from the FlowEndNode works correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(addressed by 35b435d)
@Test | ||
@Issue("JENKINS-50199") | ||
/** Replicates case where builds resume when the should not due to build's completion not being saved. */ | ||
public void completedExecutionButRunIncomplete() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jglick @dwnusbaum This is the net-new testcase.
@jglick We have long needed to have the tests in both places to catch regressions introduced in both Workflow-Job and Workflow-CPS persistence calls. In the past we did indeed use a SNAPSHOT/incremental build of workflow-job and then did a bump to workflow-CPS -- but that's a slow and inefficient process where it's easy to slip in regressions by failing to run something against the latest build. Adapting existing tests from CPS (plus a new one) isn't perfect solution but when we need to be focused on velocity, refactoring tests should not be a high priority -- and we really do need that coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix looks fine to me based on the investigation with Sam yesterday. I don't have a strong opinion either way on the tests. It's nice to have them run automatically here, but it also means the two copies of the tests can get out of sync with each other. Improving the testing infrastructure would be great, but it doesn't seem trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New test case is good. Thanks for walking me through it. 👍
I would rather argue that improving how/where tests are run is exactly what we need to do in order to increase velocity. Duplicating code is introducing technical debt which will be a drag on further work. |
CpsFlowExecution cpsExec = (CpsFlowExecution)(run.getExecution()); | ||
InputStepExecution exec = getInputStepExecution(run, "pause"); | ||
exec.doProceedEmpty(); | ||
j.waitForCompletion(run); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JENKINS-50199.
Should finally put the last variant of this problem to rest. Root cause: execution completed (and this is saved), but something prevented saving the build itself with completed status. Unfortunately we didn't do anything to detect this case and mark the build itself as completed if its execution was.
Solving the save itself is a separate problem - since that can potentially be a result of I/O failures or abrupt shutdowns, we have to handle the possibility anyway.
Also adds in the suite of persistence problem tests from the CPS plugin, since both the CPS and workflow plugins need to have persistence logic correct internally (and we need to verify this since either can break it).