Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix more subtle durability issues #223

Merged
merged 35 commits into from May 2, 2018
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
2338f88
Ensure FlowHead always has persisted FlowNodes before mutating it, pe…
svanoort Apr 17, 2018
ee259e9
Extended fuzzing test coverage
svanoort Apr 18, 2018
7c4faa9
Code comments to explain
svanoort Apr 18, 2018
cc61b6f
Save notes on persistence
svanoort Apr 20, 2018
40dd2c4
Force persistence of the Pipeline build when the heads are mutated, p…
svanoort Apr 20, 2018
e173bca
Refactor and harden the fuzz tests
svanoort Apr 20, 2018
6069196
Merge branch 'master' into fix-more-durability-issues-2
svanoort Apr 20, 2018
dde404b
Ensure we always initialize heads for FlowNodes and handle additional…
svanoort Apr 20, 2018
7b9ebe0
Add logging for some odd circumstances with deserializing and intiali…
svanoort Apr 20, 2018
e0f0025
Fix one cause of builds showing up as incomplete when can't be loaded
svanoort Apr 20, 2018
646b1d2
Add a bunch of persistence test cases that will fail because our pers…
svanoort Apr 24, 2018
06e56f2
Additional persistence problems testcases
svanoort Apr 24, 2018
8d1b7f5
Refine testcases a bit more
svanoort Apr 24, 2018
3cbc60c
Obligatory save when ending the execution, error out when stored flow…
svanoort Apr 25, 2018
3fa7b8e
Simply error out loading an execution with unpersisted flownodes and …
svanoort Apr 25, 2018
6ffcf33
Pull in wf-job fix for not setting build result
svanoort Apr 25, 2018
a1083fc
Fix test wait-for-input step and remove test that doesn't do what it …
svanoort Apr 25, 2018
2d3a219
Fix test for bogus done status: set done flag on execution if FlowEnd…
svanoort Apr 25, 2018
12d6c8e
Simplify the flownodeStorage initialization and note that we need to …
svanoort Apr 25, 2018
85e99f1
Mostly-working way to build placeholder flownodes for inconsistently …
svanoort Apr 27, 2018
2ec83cc
Pull in wf-job fixes to notifications and ensure done is set if we ad…
svanoort Apr 27, 2018
d4eb956
Fix result handling for build
svanoort Apr 27, 2018
9c9e958
Comment out unusable test
svanoort Apr 27, 2018
a87587a
Handle fuzzers for durability killing run before it can be saved
svanoort Apr 27, 2018
adc109f
Try to save other FlowExecutions even if one fails
svanoort Apr 30, 2018
fc93844
Make findbugs hush
svanoort Apr 30, 2018
c7f44a7
Update docs
svanoort Apr 30, 2018
7567a0a
Use locking to avoid ConcurrentModificationException on FlowGraph whi…
svanoort May 1, 2018
297d820
Fix typo
svanoort May 1, 2018
b51d1ba
Cleanup
svanoort May 1, 2018
2516111
Review changes
svanoort May 1, 2018
474f325
Update SNAPSHOT
svanoort May 1, 2018
46cd988
Fix findbugs kvetching about synchronization
svanoort May 2, 2018
5e16362
Reduce logging verbosity a bit
svanoort May 2, 2018
d824047
Pick up released workflow-job version
svanoort May 2, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion pom.xml
Expand Up @@ -142,7 +142,7 @@
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-job</artifactId>
<!-- Snapshot for better handling of onLoad errors -->
<version>2.21-20180427.223321-5</version>
<version>2.21-20180501.222458-6</version>
<scope>test</scope>
</dependency>
<dependency>
Expand Down
Expand Up @@ -620,10 +620,11 @@ private synchronized String getHeadsAsString() {
* Bypasses {@link #croak(Throwable)} and {@link #onProgramEnd(Outcome)} to guarantee a clean path.
*/
@GuardedBy("this")
void createPlaceholderNodes(Throwable failureReason) throws Exception {
synchronized void createPlaceholderNodes(Throwable failureReason) throws Exception {
this.done = true;

if (this.owner != null) {
// Ensure that the Run is marked as completed (failed) if it isn't already so it won't show as running
Queue.Executable ex = owner.getExecutable();
if (ex instanceof Run) {
Result res = ((Run)ex).getResult();
Expand Down Expand Up @@ -729,10 +730,10 @@ public void onLoad(FlowExecutionOwner owner) throws IOException {
try {
if (isComplete()) {
if (done == Boolean.TRUE && !super.isComplete()) {
LOGGER.log(Level.WARNING, "Completed flow without FlowEndNode: "+this+" heads:"+getHeadsAsString());
LOGGER.log(Level.INFO, "Completed flow without FlowEndNode: "+this+" heads:"+getHeadsAsString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly sounds like it should be a warning to me, but maybe I am missing something?

}
if (super.isComplete() && done != Boolean.TRUE) {
LOGGER.log(Level.WARNING, "Flow has FlowEndNode, but is not marked as done, fixing this for"+this);
LOGGER.log(Level.FINE, "Flow has FlowEndNode, but is not marked as done, fixing this for"+this);
done = true;
saveOwner();
}
Expand All @@ -746,7 +747,7 @@ public void onLoad(FlowExecutionOwner owner) throws IOException {
throw new IOException("Cannot resume build -- was not cleanly saved when Jenkins shut down.");
}
}
} catch (Exception e) { // Multicatch ensures that failure to load does not nuke the master
} catch (Exception e) { // Broad catch ensures that failure to load do NOT nuke the master
SettableFuture<CpsThreadGroup> p = SettableFuture.create();
programPromise = p;
loadProgramFailed(e, p);
Expand Down Expand Up @@ -930,7 +931,7 @@ public void onFailure(Throwable t) {
try {
g = programPromise.get();
} catch (Exception x) {
// FIXME check this won't cause issues due to depickling delays etc
// TODO Check this won't cause issues due to depickling delays etc
LOGGER.log(Level.FINE, "Not blocking restart due to exception in ProgramPromise: "+this, x);
return false;
}
Expand Down Expand Up @@ -1228,7 +1229,6 @@ synchronized void onProgramEnd(Outcome outcome) {
// shrink everything into a single new head
try {
if (heads != null) {
// Below does not look correct to me
FlowHead first = getFirstHead();
first.setNewHead(head);
done = true; // After setting the final head
Expand Down
Expand Up @@ -578,7 +578,7 @@ private void invokeBody(CpsThread cur) {
// we want to do this first before starting body so that the order of heads preserve
// natural ordering.

// FIXME give this javadocs worth a darn, because this is how we create parallel branches and the docs are crypic.
// TODO give this javadocs worth a darn, because this is how we create parallel branches and the docs are cryptic as can be!
// Also we need to double-check this logic because this might cause a failure of persistence
FlowHead[] heads = new FlowHead[context.bodyInvokers.size()];
for (int i = 0; i < heads.length; i++) {
Expand Down
Expand Up @@ -794,11 +794,7 @@ public void evaluate() throws Throwable {

private static void assertBuildNotHung(@Nonnull RestartableJenkinsRule story, @Nonnull WorkflowRun run, int timeOutMillis) throws Exception {
if (run.isBuilding()) {
try {
story.j.waitUntilNoActivityUpTo(timeOutMillis);
} catch (AssertionError ase) { // Allows attaching a debugger here
throw new AssertionError("Build hung: " + run, ase);
}
story.j.waitUntilNoActivityUpTo(timeOutMillis);
}
}

Expand Down Expand Up @@ -877,7 +873,7 @@ public void evaluate() throws Throwable {
@Override
public void evaluate() throws Throwable {
WorkflowRun run = story.j.jenkins.getItemByFullName(jobName, WorkflowJob.class).getLastBuild();
if (run == null) {
if (run == null) { // Build killed so early the Run did not get to persist
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happened in testing -- if the fuzzer kills things too early the Run may not even have had time to be created yet (rare but happens).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably deserves a comment to that effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(resolved)

}
if (run.getExecution() != null) {
Expand Down Expand Up @@ -925,7 +921,7 @@ public void evaluate() throws Throwable {
@Override
public void evaluate() throws Throwable {
WorkflowRun run = story.j.jenkins.getItemByFullName(jobName, WorkflowJob.class).getLastBuild();
if (run == null) {
if (run == null) { // Build killed so early the Run did not get to persist
return;
}
if (run.getExecution() != null) {
Expand Down Expand Up @@ -983,7 +979,7 @@ public void evaluate() throws Throwable {
@Override
public void evaluate() throws Throwable {
WorkflowRun run = story.j.jenkins.getItemByFullName(jobName, WorkflowJob.class).getLastBuild();
if (run == null) {
if (run == null) { // Build killed so early the Run did not get to persist
return;
}
if (run.getExecution() != null) {
Expand Down