
e2e: add settling checks between liveness cases to fix intermittent flake #3270

Merged
juan-malbeclabs merged 2 commits into main from jo/3269
Mar 16, 2026

Conversation

@juan-malbeclabs
Contributor

Summary of Changes

  • After unblocking a client in the route liveness matrix, the test now waits for all expected-stable routes to converge before starting the next block/unblock cycle
  • Specifically: after Case B unblocks client2, the test explicitly verifies c1->c3 and c3->c1 are stable before Case C begins blocking client3
  • Without this, BGP propagation from the previous cycle could still be in-flight when Case C starts, causing the c1->c3 restored assertion to race against the 60s timeout
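The settling behavior described above amounts to polling a route check until it passes or a deadline expires. The following is a minimal sketch of that pattern, not the repository's actual helper: `eventually` and `simulateConvergence` are hypothetical names, the real `requireEventuallyRoute` signature is not shown in this PR, and the short durations stand in for the 60s timeout.

```go
package main

import (
	"fmt"
	"time"
)

// eventually polls check every interval until it returns true or the
// timeout elapses, and reports whether check succeeded in time.
// Hypothetical stand-in for the test's requireEventuallyRoute helper.
func eventually(check func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// simulateConvergence models a route that becomes visible only after
// pollsUntilReady checks, standing in for in-flight BGP propagation.
// The 200ms/5ms values are illustrative, not the test's real settings.
func simulateConvergence(pollsUntilReady int) bool {
	polls := 0
	routeInstalled := func() bool {
		polls++
		return polls >= pollsUntilReady
	}
	return eventually(routeInstalled, 200*time.Millisecond, 5*time.Millisecond)
}

func main() {
	fmt.Println("converges within a few polls:", simulateConvergence(3))
	fmt.Println("converges when propagation outlasts the timeout:", simulateConvergence(1000))
}
```

Running a check like this on the expected-stable routes before the next block step means the following case starts its timeout from a converged state rather than inheriting leftover propagation delay.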

Diff Breakdown

| Category | Files | Lines (+/-) | Net |
|----------|-------|-------------|-----|
| Tests    | 1     | +6 / -2     | +4  |

Single-file change — pure test stabilization, no production code touched.

Key files
  • e2e/multi_client_ibrl_liveness_test.go — adds requireEventuallyRoute settling checks for c1->c3 and c3->c1 after Case B's unblock, and adds a comment explaining why c1->c4 is intentionally omitted (client4 has liveness disabled and never responds to probes)
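One way to picture why c1->c4 is left out of the settling checks is as a filter over candidate routes. The sketch below is illustrative only: the PR hardcodes the two checks rather than computing them, the `route` type and the `settleTargets` and `caseBSettleTargets` functions are hypothetical, and the liveness map simply assumes client1 and client3 have liveness enabled while client4 does not.

```go
package main

import "fmt"

// route is a hypothetical (source client, destination client) pair.
type route struct{ from, to string }

func (r route) String() string { return r.from + "->" + r.to }

// settleTargets keeps only the routes whose endpoints both participate in
// liveness probing. A route to a liveness-disabled client is never restored
// through probe responses, so waiting on it would only time out.
func settleTargets(candidates []route, livenessEnabled map[string]bool) []route {
	var out []route
	for _, r := range candidates {
		if livenessEnabled[r.from] && livenessEnabled[r.to] {
			out = append(out, r)
		}
	}
	return out
}

// caseBSettleTargets lists the routes to wait on after Case B unblocks
// client2. The liveness settings here are assumptions for illustration.
func caseBSettleTargets() []string {
	enabled := map[string]bool{"c1": true, "c3": true, "c4": false}
	candidates := []route{{"c1", "c3"}, {"c3", "c1"}, {"c1", "c4"}}
	var names []string
	for _, r := range settleTargets(candidates, enabled) {
		names = append(names, r.String())
	}
	return names
}

func main() {
	fmt.Println(caseBSettleTargets())
}
```

Under these assumptions only c1->c3 and c3->c1 survive the filter, which matches the two checks this change adds.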

Testing Verification

  • Ran TestE2E_MultiClientIBRL_RouteLiveness with -count=3 to exercise three consecutive passes; all passed cleanly (~366–407s per run)
  • Confirmed that adding a c1->c4 restored check (the naive approach) fails consistently, which shows that client4's liveness-disabled config prevents route restoration via probe responses

…lake

After unblocking a client, wait for all expected-stable routes to
converge before starting the next block/unblock cycle. Without this,
Case C's c1->c3 restoration could race against BGP propagation from
the prior cycle and fail the 60s timeout.
@juan-malbeclabs juan-malbeclabs enabled auto-merge (squash) March 16, 2026 15:08
@juan-malbeclabs juan-malbeclabs merged commit 34b5ecf into main Mar 16, 2026
30 checks passed
@juan-malbeclabs juan-malbeclabs deleted the jo/3269 branch March 16, 2026 15:09

Development

Successfully merging this pull request may close these issues.

e2e: TestE2E_MultiClientIBRL_RouteLiveness flaky — c1->c3 route not restored within 60s after third block/unblock cycle

2 participants