

Merge pull request #223 from svanoort/fix-more-durability-issues-2
Fix more subtle durability issues
svanoort committed May 2, 2018
2 parents 5af8a3e + d824047 commit ca4f539
Showing 9 changed files with 706 additions and 131 deletions.
40 changes: 40 additions & 0 deletions doc/persistence.md
@@ -0,0 +1,40 @@
# The Pipeline Persistence Model

## Data Model
Running Pipelines are persisted in three pieces:

1. The `FlowNode`s, stored by a `FlowNodeStorage`: the FlowNodes created for each `Step` and, for block-scoped Steps, the nodes marking the start and end of each block
2. The `CpsFlowExecution`: this is currently stored in the `WorkflowRun`, and the primary pieces of interest are:
   * heads: the current "tips" of the flow graph, i.e. the FlowNodes that represent running steps and are appended to
     - A head maps to a `CpsThread` in the Pipeline program, within the `CpsThreadGroup`
   * starts: the `BlockStartNode`s marking the start(s) of the currently executing blocks
   * scripts: the loaded Pipeline script files (text)
   * persistedClean
     - If true, the Pipeline saved its execution cleanly to disk and we *might* be able to resume it
     - If false, something went wrong saving the execution (or saving was deliberately skipped, as in the low-durability modes), so we cannot resume even if we otherwise could
     - If null, the build probably predates this field, so we check whether it is running in a highly persistent durability mode (generally `MAX_SURVIVABILITY`)
   * done: if true, this execution has completed; if false or unset, the Pipeline is a candidate to resume unless its only head is a `FlowEndNode`
     - Treating false like unset is for legacy reasons, since this field was only recently made persistent.
   * various other boolean flags and settings for the execution (durability setting, the user who started the build, whether it is sandboxed, etc.)
3. The Program: the current execution state of the Pipeline
   * This holds the Groovy state (the `CpsThreadGroup`), with runtime calls transformed by CPS so they can be persisted
   * The `CpsThread`s map to the running branches of the Pipeline
   * The program depends on the FlowNodes from the FlowNodeStorage, since it reads them by ID rather than storing them in the program
   * The program also depends on the heads in the CpsFlowExecution, because its FlowHeads are loaded from the heads of the CpsFlowExecution
   * It also holds the `CpsStepContext`, i.e. values such as the `EnvVars`, `Executor`, and workspace in use (the `Executor` and workspace are stored as Pickles)
     - Pickles are specially restored when the Pipeline resumes, since the objects they represent do not serialize and deserialize normally
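The resume rules for `persistedClean` and `done` can be summarized as a small decision function. This is a minimal sketch, not the plugin's actual code: `ResumeCheck`, `Head`, `Durability`, and `canResume` are hypothetical names, and the `Durability` values only mirror the names of Jenkins' durability hints. It assumes Java 16+ for records.

```java
import java.util.List;

public class ResumeCheck {
    // Hypothetical durability setting; values mirror Jenkins' durability hint names.
    enum Durability { MAX_SURVIVABILITY, SURVIVABLE_NONATOMIC, PERFORMANCE_OPTIMIZED }

    // Hypothetical minimal head: we only track whether it is a FlowEndNode.
    record Head(String id, boolean isFlowEndNode) {}

    // Returns true if the execution is a candidate for resumption (loading it may still fail).
    static boolean canResume(Boolean persistedClean, Boolean done, List<Head> heads, Durability durability) {
        // done == true: the execution completed, so there is nothing to resume.
        if (Boolean.TRUE.equals(done)) {
            return false;
        }
        // A lone FlowEndNode head also means the Pipeline already finished.
        if (heads.size() == 1 && heads.get(0).isFlowEndNode()) {
            return false;
        }
        // null: legacy build, so only trust the on-disk state in a highly persistent mode.
        if (persistedClean == null) {
            return durability == Durability.MAX_SURVIVABILITY;
        }
        // false: saving failed or was deliberately skipped, so never resume.
        return persistedClean;
    }

    public static void main(String[] args) {
        List<Head> running = List.of(new Head("5", false));
        List<Head> finished = List.of(new Head("9", true));
        System.out.println(canResume(true, null, running, Durability.PERFORMANCE_OPTIMIZED)); // true
        System.out.println(canResume(false, false, running, Durability.MAX_SURVIVABILITY));   // false
        System.out.println(canResume(null, null, running, Durability.MAX_SURVIVABILITY));     // true
        System.out.println(canResume(true, false, finished, Durability.MAX_SURVIVABILITY));   // false
    }
}
```

Note that `done == false` is deliberately treated the same as unset, matching the legacy handling described for the `done` field.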

## Persistence Issues And Logic

Some basic rules:

1. If the FlowNodeStorage is corrupt, incomplete, or unpersisted, all manner of heck will break loose
   - In terms of Pipeline execution, the impact is like the Resonance Cascade from the Half-Life games
   - The Pipeline can never be resumed (the key piece is missing)
   - Usually we fake up placeholder FlowNodes to cover this situation and save them
2. Whenever persisting data, the Pipeline *must* have the FlowNodes persisted on disk (generally via `storage.flush()`)
   so that it can load the heads and restore the program.
3. Once we have set persistedClean to false and saved the FlowExecution, nothing else we do matters: the Pipeline will assume
   it has incomplete persistence data (as with 1) when trying to resume. This is how we handle the low-durability modes, to
   avoid resuming a stale state of the Pipeline simply because old data is persisted and no longer being updated.
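The ordering in rules 1 to 3 can be sketched as follows. `NodeStorage`, `ExecutionRecord`, and `persist` are hypothetical stand-ins for `FlowNodeStorage` and the saved `CpsFlowExecution`, and making false "sticky" is a simplifying assumption drawn from rule 3 rather than the plugin's real logic.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PersistSketch {
    // Hypothetical stand-in for FlowNodeStorage; only flush() matters here.
    interface NodeStorage {
        void flush() throws IOException;
    }

    // Hypothetical stand-in for the saved execution; save() just records what would be written.
    static class ExecutionRecord {
        Boolean persistedClean;
        final List<String> log;

        ExecutionRecord(List<String> log) {
            this.log = log;
        }

        void save() {
            log.add("save execution persistedClean=" + persistedClean);
        }
    }

    static void persist(NodeStorage storage, ExecutionRecord exec, boolean lowDurability) {
        // Rule 3 (simplified): once false has been saved, later writes cannot make the run resumable.
        if (Boolean.FALSE.equals(exec.persistedClean)) {
            return;
        }
        if (lowDurability) {
            // Low-durability mode: mark the on-disk state untrustworthy once and stop updating it.
            exec.persistedClean = false;
            exec.save();
            return;
        }
        try {
            // Rule 2: FlowNodes must reach disk before the heads that reference them by ID.
            storage.flush();
            exec.persistedClean = true;
        } catch (IOException e) {
            // Rule 1: incomplete node storage means the Pipeline can never be resumed.
            exec.persistedClean = false;
        }
        exec.save();
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        persist(() -> log.add("flush nodes"), new ExecutionRecord(log), false);
        System.out.println(log); // [flush nodes, save execution persistedClean=true]

        List<String> fast = new ArrayList<>();
        ExecutionRecord fastExec = new ExecutionRecord(fast);
        persist(() -> fast.add("flush nodes"), fastExec, true);
        persist(() -> fast.add("flush nodes"), fastExec, true);
        System.out.println(fast); // [save execution persistedClean=false]
    }
}
```

In the low-durability run, the second call writes nothing: the stale snapshot on disk already says it must not be resumed.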
2 changes: 1 addition & 1 deletion pom.xml
@@ -141,7 +141,7 @@
<dependency>
<groupId>org.jenkins-ci.plugins.workflow</groupId>
<artifactId>workflow-job</artifactId>
-<version>2.20</version>
+<version>2.21</version>
<scope>test</scope>
</dependency>
<dependency>
@@ -120,7 +120,6 @@ class CpsBodyExecution extends BodyExecution {
}

head.setNewHead(sn);
CpsFlowExecution.maybeAutoPersistNode(sn);

StepContext sc = new CpsBodySubContext(context, sn);
for (BodyExecutionCallback c : callbacks) {
@@ -337,7 +336,6 @@ public Next receive(Object o) {
FlowHead h = CpsThread.current().head;
StepStartNode ssn = addBodyStartFlowNode(h);
h.setNewHead(ssn);
CpsFlowExecution.maybeAutoPersistNode(ssn);
}

StepEndNode en = addBodyEndFlowNode();
@@ -367,7 +365,6 @@ public Next receive(Object o) {
for (BodyExecutionCallback c : callbacks) {
c.onSuccess(sc, o);
}

return Next.terminate(null);
}

