Fix more subtle durability issues #223
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# The Pipeline Persistence Model | ||
|
||
## Data Model | ||
Running pipelines persist in 3 pieces: | ||
|
||
1. The `FlowNode`s - stored by a `FlowNodeStorage` - this holds the FlowNodes created to map to `Step`s and, for block-scoped Steps, to the start and end of blocks | ||
2. The `CpsFlowExecution` - this is currently stored in the WorkflowRun, and the primary pieces of interest are: | ||
* heads - the current "tips" of the Flow Graph, i.e. the FlowNodes that represent running steps and are appended to | ||
- A head maps to a `CpsThread` in the Pipeline program, within the `CpsThreadGroup` | ||
* starts - the `BlockStartNode`s marking the start(s) of the currently executing blocks | ||
* scripts - the loaded Pipeline script files (text) | ||
* persistedClean | ||
- If true, Pipeline saved its execution cleanly to disk and we *might* be able to resume it | ||
- If false, something went wrong saving the execution, so we cannot resume even if we'd otherwise be able to | ||
      - If null, the build probably dates back to before this field was added - we check whether it is running in a highly persistent durability mode (generally `MAX_SURVIVABILITY`) | ||
    * done - if true, this execution completed; if false or un-set, the pipeline is a candidate to resume unless its only head is a FlowEndNode | ||
      - False is treated the same as un-set for legacy reasons, since this field was only recently made persistent. | ||
* various other boolean flags & settings for the execution (durability setting, user that started the build, is it sandboxed, etc) | ||
3. The Program -- this is the current execution state of the Pipeline | ||
* This holds the Groovy state - the `CpsThreadGroup` - with runtime calls transformed by CPS so they can persist | ||
* The `CpsThread`s map to the running branches of the Pipeline | ||
* The program depends on the FlowNodes from the FlowNodeStorage, since it reads them by ID rather than storing them in the program | ||
* This also depends on the heads in the CpsFlowExecution, because its FlowHeads are loaded from the heads of the CpsFlowExecution | ||
    * Also holds the CpsStepContext, i.e. variables such as the EnvVars, Executor, and Workspace that steps use (the latter stored as Pickles) | ||
- The pickles will be specially restored when the Pipeline resumes since they don't serialize/deserialize normally | ||
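
One way to read the `persistedClean` and `done` semantics above is as a single resume-candidacy check. The sketch below is only an illustration under assumptions: `ResumeCheck`, `PersistedState`, and `isResumeCandidate` are hypothetical names, heads are reduced to type-name strings, and none of this is the actual `CpsFlowExecution` code.

```java
import java.util.Arrays;
import java.util.List;

final class ResumeCheck {
    /** Hypothetical stand-in for the persisted fields described above; not the real CpsFlowExecution. */
    static final class PersistedState {
        final Boolean persistedClean;   // null = field absent, e.g. an older build
        final Boolean done;             // null = un-set
        final List<String> headTypes;   // simple type names of the current head FlowNodes
        final boolean maxSurvivability; // true if the build used the most durable mode

        PersistedState(Boolean persistedClean, Boolean done,
                       List<String> headTypes, boolean maxSurvivability) {
            this.persistedClean = persistedClean;
            this.done = done;
            this.headTypes = headTypes;
            this.maxSurvivability = maxSurvivability;
        }
    }

    /** Returns true if, per the rules as read here, a resume may be attempted. */
    static boolean isResumeCandidate(PersistedState s) {
        // done == true: the execution completed, nothing to resume.
        if (Boolean.TRUE.equals(s.done)) {
            return false;
        }
        // done false or un-set: still finished if the only head is a FlowEndNode.
        if (s.headTypes.size() == 1 && "FlowEndNode".equals(s.headTypes.get(0))) {
            return false;
        }
        // persistedClean absent: fall back to the durability mode.
        if (s.persistedClean == null) {
            return s.maxSurvivability;
        }
        // persistedClean false: never resume; true: resuming *might* work.
        return s.persistedClean;
    }

    public static void main(String[] args) {
        PersistedState running =
                new PersistedState(true, null, Arrays.asList("StepAtomNode"), false);
        PersistedState unclean =
                new PersistedState(false, false, Arrays.asList("StepAtomNode"), true);
        System.out.println("running: " + isResumeCandidate(running));
        System.out.println("unclean: " + isResumeCandidate(unclean));
    }
}
```

A `true` result only means a resume can be tried; loading the FlowNodes and the program can still fail afterwards.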
|
||
## Persistence Issues And Logic | ||
|
||
Some basic rules: | ||
|
||
1. If the FlowNodeStorage is corrupt, incomplete, or un-persisted, all manner of heck will break loose | ||
- In terms of Pipeline execution, the impact is like the Resonance Cascade from the Half-Life games | ||
- The pipeline can never be resumed (the key piece is missing) | ||
- Usually we fake up some placeholder FlowNodes to cover this situation and save them | ||
2. Whenever persisting data, the Pipeline *must* have the FlowNodes persisted on disk (via `storage.flush()` generally) | ||
in order to be able to load the heads and restore the program. | ||
3. Once we've set persistedClean to false and saved the FlowExecution, it doesn't matter what we do -- the Pipeline will assume | ||
it already has incomplete persistence data (as with rule 1) when trying to resume. This is how we handle the low-durability modes, to | ||
avoid resuming a stale state of the Pipeline simply because we have old data persisted and are not updating it. |
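
Rules 2 and 3 can be illustrated with a toy model of the write ordering. This is a sketch under assumptions, not the plugin's implementation: `PersistOrderSketch` and all of its methods are hypothetical, "disk" is an in-memory list, and the null (legacy) case of `persistedClean` is simplified to "do not resume".

```java
import java.util.ArrayList;
import java.util.List;

final class PersistOrderSketch {
    /** Records simulated disk writes in order, so the ordering is visible. */
    final List<String> writes = new ArrayList<>();

    /** Value of persistedClean as last saved; null until the first save. */
    private Boolean persistedCleanOnDisk = null;

    /** Hypothetical stand-in for storage.flush(): the FlowNodes reach disk. */
    void flushNodes() {
        writes.add("flushNodes");
    }

    /** Hypothetical stand-in for saving the execution record with the given flag. */
    void saveExecution(boolean persistedClean) {
        persistedCleanOnDisk = persistedClean;
        writes.add("saveExecution(persistedClean=" + persistedClean + ")");
    }

    /** Rule 2: flush the FlowNodes before saving the execution whose heads refer to them. */
    void persistAll() {
        flushNodes();
        saveExecution(true);
    }

    /** Rule 3: in a low-durability mode, mark the record unclean once, then stop writing. */
    void enterLowDurability() {
        if (!Boolean.FALSE.equals(persistedCleanOnDisk)) {
            saveExecution(false);
        }
    }

    /** Simplification: only an explicit persistedClean=true allows a resume attempt. */
    boolean wouldAttemptResume() {
        return Boolean.TRUE.equals(persistedCleanOnDisk);
    }

    public static void main(String[] args) {
        PersistOrderSketch p = new PersistOrderSketch();
        p.persistAll();
        p.enterLowDurability();
        System.out.println(p.writes + " resume=" + p.wouldAttemptResume());
    }
}
```

The point of the single unclean write is that older, cleanly persisted data left on disk can no longer be mistaken for the current state.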
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -141,7 +141,8 @@ | |
<dependency> | ||
<groupId>org.jenkins-ci.plugins.workflow</groupId> | ||
<artifactId>workflow-job</artifactId> | ||
<version>2.20</version> | ||
<!-- Snapshot for better handling of onLoad errors --> | ||
<version>2.21-20180427.223321-5</version> | ||
@jglick Since you only added that a few weeks ago and thus it's not "obvious" maybe you meant something besides 'pro tip'? |
||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -120,7 +120,6 @@ class CpsBodyExecution extends BodyExecution { | |
} | ||
|
||
head.setNewHead(sn); | ||
CpsFlowExecution.maybeAutoPersistNode(sn); | ||
|
||
StepContext sc = new CpsBodySubContext(context, sn); | ||
for (BodyExecutionCallback c : callbacks) { | ||
|
@@ -337,7 +336,6 @@ public Next receive(Object o) { | |
FlowHead h = CpsThread.current().head; | ||
StepStartNode ssn = addBodyStartFlowNode(h); | ||
h.setNewHead(ssn); | ||
CpsFlowExecution.maybeAutoPersistNode(ssn); | ||
} | ||
|
||
StepEndNode en = addBodyEndFlowNode(); | ||
|
@@ -367,7 +365,6 @@ public Next receive(Object o) { | |
for (BodyExecutionCallback c : callbacks) { | ||
c.onSuccess(sc, o); | ||
} | ||
|
||
could be reverted |
||
return Next.terminate(null); | ||
} | ||
|
||
|
Documentation, WUT?
I know @jglick, pure absurdity, what can I say?