[JENKINS-62545] Attempt to detect cycles when iterating through siblings in FlowGraphTable #108
See JENKINS-62545.
I saw a report recently of a thread stuck inside of the `while` loop in `FlowGraphTable.Row.addTreeSibling`. I am not sure how this could be happening, and I do not have any reproduction steps, but the only thing that comes to mind is an infinite loop caused by a cycle in the graph (probably not the actual flow graph, just in the representation in `FlowGraphTable`).
Here is a draft PR that tries to detect cycles so that we can understand what the real bug is. I have not tested it yet, and I am not sure if this code is covered by unit tests or not, so it needs to be verified manually.
CC @res0nance
EDIT: I got more info, so now I know the proximate cause of the issue, although I am still not sure about the root cause. It turns out that the actual flow graph itself was corrupted. Here is a simplified form of its structure; the `sh` step is the problematic node:
The problem is that the `sh` step is the parent of the end of a parallel step, but somehow also the parent of additional steps. `FlowGraphTable` doesn't expect the graph to have a structure like this, so things break. The build was manually aborted while the parallel branches were executing, yet somehow branch 3 continued executing despite the end node for the parallel step being linked to one of the nodes in that branch. I don't have access to Jenkins system logs from when the build was executed to know if there were any warnings. The Pipeline in question looks relatively normal (there aren't nested parallels or anything like that).
I am not sure how this could happen, and I wasn't able to reproduce the issue from scratch.
I think it's reasonable to make `FlowGraphTable` throw an exception for a corrupted graph like this to avoid infinite loops.
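For illustration only, here is a minimal sketch of the kind of guard described above: walking a sibling chain while remembering which rows have already been visited, and throwing instead of looping forever. The `Row` class, its `nextTreeSibling` field, and the exception message are hypothetical stand-ins and are not the actual `FlowGraphTable.Row` code or the change in this PR.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class SiblingCycleSketch {
    /** Hypothetical stand-in for a table row; NOT the real FlowGraphTable.Row. */
    static final class Row {
        final String name;
        Row nextTreeSibling; // next row at the same tree depth, or null at the end

        Row(String name) {
            this.name = name;
        }

        /** Appends s to the end of this row's sibling chain, failing fast on a cycle. */
        void addTreeSibling(Row s) {
            // Identity-based set: rows are compared by reference, not equals().
            Set<Row> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            Row last = this;
            seen.add(last);
            while (last.nextTreeSibling != null) {
                last = last.nextTreeSibling;
                if (!seen.add(last)) {
                    // We have been here before, so the chain loops back on itself.
                    throw new IllegalStateException(
                            "Cycle detected in sibling chain at row " + last.name);
                }
            }
            if (seen.contains(s)) {
                // Linking s here would create a cycle, so refuse to do it.
                throw new IllegalStateException(
                        "Row " + s.name + " is already in the sibling chain");
            }
            last.nextTreeSibling = s;
        }
    }

    public static void main(String[] args) {
        Row a = new Row("a");
        Row b = new Row("b");
        Row c = new Row("c");
        a.addTreeSibling(b);
        a.addTreeSibling(c); // chain is now a -> b -> c

        c.nextTreeSibling = a; // simulate a corrupted chain: a -> b -> c -> a
        try {
            a.addTreeSibling(new Row("d"));
        } catch (IllegalStateException e) {
            System.out.println("Rejected corrupted chain: " + e.getMessage());
        }
    }
}
```

The visited set costs memory proportional to the chain length; a tortoise-and-hare traversal would avoid that, but the set also makes it possible to report which row closed the loop, which seems more useful when the goal is diagnosing the underlying corruption.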