New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix more subtle durability issues #223
Merged
svanoort
merged 35 commits into
jenkinsci:master
from
svanoort:fix-more-durability-issues-2
May 2, 2018
Merged
Changes from all commits
Commits
Show all changes
35 commits
Select commit
Hold shift + click to select a range
2338f88
Ensure FlowHead always has persisted FlowNodes before mutating it, pe…
svanoort ee259e9
Extended fuzzing test coverage
svanoort 7c4faa9
Code comments to explain
svanoort cc61b6f
Save notes on persistence
svanoort 40dd2c4
Force persistence of the Pipeline build when the heads are mutated, p…
svanoort e173bca
Refactor and harden the fuzz tests
svanoort 6069196
Merge branch 'master' into fix-more-durability-issues-2
svanoort dde404b
Ensure we always initialize heads for FlowNodes and handle additional…
svanoort 7b9ebe0
Add logging for some odd circumstances with deserializing and intiali…
svanoort e0f0025
Fix one cause of builds showing up as incomplete when can't be loaded
svanoort 646b1d2
Add a bunch of persistence test cases that will fail because our pers…
svanoort 06e56f2
Additional persistence problems testcases
svanoort 8d1b7f5
Refine testcases a bit more
svanoort 3cbc60c
Obligatory save when ending the execution, error out when stored flow…
svanoort 3fa7b8e
Simply error out loading an execution with unpersisted flownodes and …
svanoort 6ffcf33
Pull in wf-job fix for not setting build result
svanoort a1083fc
Fix test wait-for-input step and remove test that doesn't do what it …
svanoort 2d3a219
Fix test for bogus done status: set done flag on execution if FlowEnd…
svanoort 12d6c8e
Simplify the flownodeStorage initialization and note that we need to …
svanoort 85e99f1
Mostly-working way to build placeholder flownodes for inconsistently …
svanoort 2ec83cc
Pull in wf-job fixes to notifications and ensure done is set if we ad…
svanoort d4eb956
Fix result handling for build
svanoort 9c9e958
Comment out unusable test
svanoort a87587a
Handle fuzzers for durability killing run before it can be saved
svanoort adc109f
Try to save other FlowExecutions even if one fails
svanoort fc93844
Make findbugs hush
svanoort c7f44a7
Update docs
svanoort 7567a0a
Use locking to avoid ConcurrentModificationException on FlowGraph whi…
svanoort 297d820
Fix typo
svanoort b51d1ba
Cleanup
svanoort 2516111
Review changes
svanoort 474f325
Update SNAPSHOT
svanoort 46cd988
Fix findbugs kvetching about synchronization
svanoort 5e16362
Reduce logging verbosity a bit
svanoort d824047
Pick up released workflow-job version
svanoort File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
# The Pipeline Persistence Model | ||
|
||
# Data Model | ||
Running pipelines persist in 3 pieces: | ||
|
||
1. The `FlowNode`s - stored by a `FlowNodeStorage` - this holds the FlowNodes created to map to `Step`s, and for block scoped Steps, start and end of blocks | ||
2. The `CpsFlowExecution` - this is currently stored in the WorkflowRun, and the primary pieces of interest are: | ||
* heads - the current "tips" of the Flow Graph, i.e. the FlowNodes that represent running steps and are appended to | ||
- A head maps to a `CpsThread` in the Pipeline program, within the `CpsThreadGroup` | ||
* starts - the `BlockStartNode`s marking the start(s) of the currently executing blocks | ||
* scripts - the loaded Pipeline script files (text) | ||
* persistedClean | ||
- If true, Pipeline saved its execution cleanly to disk and we *might* be able to resume it | ||
- If false, something went wrong saving the execution, so we cannot resume even if we'd otherwise be able to | ||
- If null, probably the build dates back to before this field was added - we check to see if this is running in a highly persistent DurabilityMode (Max_survivability generally) | ||
* done - if true, this execution completed, if false or un-set, the pipeline is a candidate to resume unless its only head is a FlowEndNode | ||
- The handling of false is for legacy reasons, since it was only recently made persistent. | ||
* | ||
* various other boolean flags & settings for the execution (durability setting, user that started the build, is it sandboxed, etc) | ||
3. The Program -- this is the current execution state of the Pipeline | ||
* This holds the Groovy state - the `CpsThreadGroup` - with runtime calls transformed by CPS so they can persist | ||
* The `CpsThread`s map to the running branches of the Pipeline | ||
* The program depends on the FlowNodes from the FlowNodeStorage, since it reads them by ID rather than storing them in the program | ||
* This also depends on the heads in the CpsFlowExecution, because its FlowHeads are loaded from the heads of the CpsFlowExecution | ||
* Also holds the CpsStepContext, i.e. the variables such as EnvVars, Executor and Workspace uses (the latter stored as Pickles) | ||
- The pickles will be specially restored when the Pipeline resumes since they don't serialize/deserialize normally | ||
|
||
## Persistence Issues And Logic | ||
|
||
Some basic rules: | ||
|
||
1. If the FlowNodeStorage is corrupt, incomplete, or un-persisted, all manner of heck will break loose | ||
- In terms of Pipeline execution, the impact is like the Resonance Cascade from the Half-Life games | ||
- The pipeline can never be resumed (the key piece is missing) | ||
- Usually we fake up some placeholder FlowNodes to cover this situation and save them | ||
2. Whenever persisting data, the Pipeline *must* have the FlowNodes persisted on disk (via `storage.flush()` generally) | ||
in order to be able to load the heads and restore the program. | ||
3. Once we've set persistedClean as false and saved the FlowExecution, then it doesn't matter what we do -- the Pipeline will assume | ||
it already has incomplete persistence data (as with 1) when trying to resume. This is how we handle the low-durability modes, to | ||
avoid resuming a stale state of the Pipeline simply because we have old data persisted and are not updating it. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -120,7 +120,6 @@ class CpsBodyExecution extends BodyExecution { | |
} | ||
|
||
head.setNewHead(sn); | ||
CpsFlowExecution.maybeAutoPersistNode(sn); | ||
|
||
StepContext sc = new CpsBodySubContext(context, sn); | ||
for (BodyExecutionCallback c : callbacks) { | ||
|
@@ -337,7 +336,6 @@ public Next receive(Object o) { | |
FlowHead h = CpsThread.current().head; | ||
StepStartNode ssn = addBodyStartFlowNode(h); | ||
h.setNewHead(ssn); | ||
CpsFlowExecution.maybeAutoPersistNode(ssn); | ||
} | ||
|
||
StepEndNode en = addBodyEndFlowNode(); | ||
|
@@ -367,7 +365,6 @@ public Next receive(Object o) { | |
for (BodyExecutionCallback c : callbacks) { | ||
c.onSuccess(sc, o); | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could be reverted |
||
return Next.terminate(null); | ||
} | ||
|
||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation, WUT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know @jglick, pure absurdity, what can I say?