
scheduler: always edge merge in one direction #4559

Merged
merged 5 commits from fix-edge-merge-progress-deadlock into moby:master on Jan 18, 2024

Conversation

@jedevc (Member) commented on Jan 16, 2024:

Resolves #4347 (comment).

Previously, the scheduler could create cycles in the edges it merged:

Essentially, what appears to happen is that we have two vertexes, A and B. Both of these vertexes seem to have at least two edges, A1/A2 and B1/B2. Somehow, we perform a merge that merges A1 into B1, but then merges B2 into A2.

Prior to #4347 this could (at worst) lead to a slight unnecessary inefficiency, where we would execute both A and B (even though both of their edges are merged, so we should never need to perform both of them). However, after that PR, we can accidentally deadlock, due to attempting to nest progress writers.
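To make the failure mode concrete, here is a toy Go model of the merge cycle (hypothetical types, not the solver's real ones): merging one edge of A into B and another edge of B back into A leaves each vertex owning an edge of the other.

```go
package main

import "fmt"

// Toy model (not the solver's real types): each vertex records which other
// vertexes now own one of its edges after a merge.
type vtx struct {
	name   string
	owners map[*vtx]bool
}

func newVtx(name string) *vtx { return &vtx{name: name, owners: map[*vtx]bool{}} }

// merge records that an edge of src was merged into an edge of dst,
// making dst an owner of src.
func merge(src, dst *vtx) { src.owners[dst] = true }

func main() {
	A, B := newVtx("A"), newVtx("B")

	merge(A, B) // A1 merged into B1: B owns an edge of A
	merge(B, A) // B2 merged into A2: A owns an edge of B

	// Each vertex now owns an edge of the other: a cycle between the two
	// states, which is what ends up nesting their progress writers.
	fmt.Println(A.owners[B] && B.owners[A]) // true
}
```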

@jedevc marked this pull request as ready for review on January 16, 2024 13:47
@jedevc force-pushed the fix-edge-merge-progress-deadlock branch 2 times, most recently from 41fd3e9 to 9290b84 on January 16, 2024 17:28
@jedevc (Member, Author) commented on Jan 16, 2024:

Whoops, the initial pass wasn't right - this was still vulnerable to deadlocks if:

  • A0 merges to B0
  • A1 merges to C1
  • B2 merges to A2 (causing a cycle - and the previous merge would have set the "owner" to C, so we wouldn't find this)

The correct answer is to track every owner we merged into.
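As a toy sketch of that fix (hypothetical types, not the solver's real bookkeeping): a single remembered owner gets overwritten by the second merge, while a record of every owner still exposes the cycle.

```go
package main

import "fmt"

// Toy model (not the solver's real bookkeeping). Remembering only the *last*
// owner misses the indirect cycle above; remembering every owner catches it.
type vtx struct {
	name      string
	lastOwner *vtx
	allOwners map[*vtx]bool
}

func newVtx(name string) *vtx { return &vtx{name: name, allOwners: map[*vtx]bool{}} }

// merge records that an edge of src was merged into an edge of dst.
func merge(src, dst *vtx) {
	src.lastOwner = dst
	src.allOwners[dst] = true
}

func main() {
	A, B, C := newVtx("A"), newVtx("B"), newVtx("C")

	merge(A, B) // A0 merges to B0
	merge(A, C) // A1 merges to C1: the single "owner" field now points at C

	// Proposed merge: B2 into A2. Does B already own an edge of A?
	fmt.Println("last owner only:", A.lastOwner == B) // false: cycle missed
	fmt.Println("every owner:", A.allOwners[B])       // true: flip the merge
}
```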

solver/jobs.go Outdated
@jedevc marked this pull request as draft on January 16, 2024 18:05
@tonistiigi (Member) commented:

> Essentially, what appears to happen is that we have two vertexes, A and B. Both of these vertexes seem to have at least two edges, A1/A2 and B1/B2. Somehow, we perform a merge that merges A1 into B1, but then merges B2 into A2.

Some stupid questions: are these edges for different indexes? Then why would it matter that E1=A1->B1 and E2=B2->A2? As the edges are different, there isn't a loop between the edges but between the vertexes, and the vertexes are not being merged.

@jedevc (Member, Author) commented on Jan 16, 2024:

> Essentially, what appears to happen is that we have two vertexes, A and B. Both of these vertexes seem to have at least two edges, A1/A2 and B1/B2. Somehow, we perform a merge that merges A1 into B1, but then merges B2 into A2.

> Some stupid questions: are these edges for different indexes? Then why would it matter that E1=A1->B1 and E2=B2->A2? As the edges are different, there isn't a loop between the edges but between the vertexes, and the vertexes are not being merged.

Yeah sorry 😁

The edges are unique, so they are for different indexes.

The vertices aren't merged, true - they're still entirely separate. The issue is that:

  1. With the progress writer change, there's now a dependency between states (which map to vertices). That dependency is directed, so cycles in the vertices mean cycles in the progress writer - and cycles in the progress multi-writer mean deadlocks (see the sketch below).
  2. If we keep the cycle in the graph and just don't allow it in the progress writer, we get suboptimal behavior (the "easy" fix) - we have to execute both of the vertices (A and B) to compute both edges, whereas if we could eliminate the cycle (so that both edges went from A to B, or vice versa), we'd only need to execute the "owner" (which is super useful if it's an expensive operation that's run lots of times).
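On point 1, a generic illustration of the deadlock (this is not BuildKit's progress package, just the shape of the problem): if each writer locks itself and then forwards into the writers it was merged into, a cycle means a write eventually re-enters a lock that is already held.

```go
package main

import "sync"

// Generic illustration (not BuildKit's progress package) of why a cycle in a
// "writer forwards to writer" relation deadlocks: each writer locks itself
// before fanning a write out to the writers it forwards to, so a cycle makes
// a write re-enter a writer whose lock is already held.
type fanoutWriter struct {
	mu   sync.Mutex
	outs []*fanoutWriter
}

func (w *fanoutWriter) Write(p []byte) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for _, o := range w.outs {
		o.Write(p)
	}
}

func main() {
	a, b := &fanoutWriter{}, &fanoutWriter{}
	a.outs = append(a.outs, b)
	b.outs = append(b.outs, a) // cycle: a -> b -> a

	// Calling a.Write here would recurse a -> b -> a and block forever on
	// a.mu (sync.Mutex is not reentrant); concurrent writes from both sides
	// would likewise each hold one lock while waiting for the other.
	_ = a
}
```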

solver/jobs.go Outdated
@@ -178,6 +180,7 @@ func (s *state) setEdge(index Index, targetEdge *edge, targetState *state) {
targetState.mpw.Add(s.mpw)
targetState.allPw[s.mpw] = struct{}{}
}
targetState.owners = append(targetState.owners, s.vtx)
Member:

Do we need this extra property? Can't we just cycle the .owner chains of all sibling edges?

Member (Author):

Yup, looks like that works as well 🎉

solver/jobs.go Outdated
@@ -280,6 +283,39 @@ func NewSolver(opts SolverOpt) *Solver {
return jl
}

func (jl *Solver) findMergeDirection(dest *edge, src *edge) (*edge, *edge) {
Member:

Would it be clearer if instead we used language like "find if any sibling edge is already owned by the vertex of the target edge"?

Member (Author):

I've refactored this into a hasOwner helper; the caller then determines whether to switch direction.
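For reference, a minimal sketch of the idea behind such a helper, with hypothetical stand-in types (the real solver's edge and state structures differ): walk the owner chains of the target vertex's sibling edges and check whether the source's vertex appears in any of them.

```go
package main

import "fmt"

// Sketch only: hypothetical stand-ins for the solver's edge/vertex types.
type vertex struct {
	name  string
	edges []*edgeT // one edge per output index
}

type edgeT struct {
	vtx   *vertex
	owner *edgeT // edge this one was merged into, if any
}

// hasOwner reports whether src's vertex already owns (directly or through a
// chain of earlier merges) some edge of target's vertex. If it does, merging
// src into target would point the two vertexes at each other, so the caller
// should flip the merge direction instead.
func hasOwner(target, src *edgeT) bool {
	for _, sibling := range target.vtx.edges {
		seen := map[*edgeT]bool{}
		for e := sibling.owner; e != nil && !seen[e]; e = e.owner {
			seen[e] = true
			if e.vtx == src.vtx {
				return true
			}
		}
	}
	return false
}

func main() {
	A := &vertex{name: "A"}
	B := &vertex{name: "B"}
	A.edges = []*edgeT{{vtx: A}, {vtx: A}, {vtx: A}} // A0, A1, A2
	B.edges = []*edgeT{{vtx: B}, {vtx: B}, {vtx: B}} // B0, B1, B2

	A.edges[0].owner = B.edges[0] // A0 was previously merged into B0

	// Proposed merge: B2 into A2. B already owns A0, so the caller should
	// flip and merge A2 into B2 instead.
	fmt.Println(hasOwner(A.edges[2], B.edges[2])) // true
}
```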

@@ -351,6 +352,7 @@ func (s *scheduler) mergeTo(target, src *edge) bool {
type edgeFactory interface {
getEdge(Edge) *edge
setEdge(Edge, *edge)
findMergeDirection(*edge, *edge) (*edge, *edge)
Member:

Can't this logic be inside setEdge instead of being exposed in this interface?

Member (Author):

Sadly not (though that would be nice).

We need to make sure we call s.ef.setEdge in the right direction, but we also need to call s.mergeTo in that same direction.
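A rough sketch of that constraint as a hypothetical generic helper (none of these names are the real API): the direction is decided once, up front, and then both calls are made with the agreed orientation.

```go
package sketch

// Sketch only: a hypothetical, generic helper to show why the flip cannot be
// hidden inside setEdge. The direction has to be decided once, up front,
// because both the edge bookkeeping (setEdge) and the scheduler-side merge
// (mergeTo) must be called with the same orientation.
func mergeInOneDirection[E any](
	target, src E,
	hasOwner func(target, src E) bool,
	setEdge func(src, target E),
	mergeTo func(target, src E),
) {
	if hasOwner(target, src) {
		target, src = src, target // flip so merges between these vertexes always go one way
	}
	setEdge(src, target)
	mergeTo(target, src)
}
```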

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
Previously, if this case occurred, we would deadlock; to make this case more easily identifiable and debuggable, we should panic instead. Potentially, we could also just *skip* this case, but given that this is an internal logic error that should not occur, it is better to surface it.

Signed-off-by: Justin Chadwell <me@jedevc.com>
When we perform a vertex merge, we should explicitly track the vertex
that it was merged into. This way, we can avoid the case where we merge
an index 0 edge from A->B and, later, an index 1 edge from B->A.

With this patch, this scenario instead flips the direction of the merge
to merge from A->B for index 1.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc force-pushed the fix-edge-merge-progress-deadlock branch from 9290b84 to b7a0282 on January 17, 2024 15:15
@jedevc marked this pull request as ready for review on January 17, 2024 15:16
@jedevc merged commit 9d84cdc into moby:master on Jan 18, 2024
60 checks passed
@jedevc deleted the fix-edge-merge-progress-deadlock branch on January 18, 2024 11:32