fix(network): preserve pre-existing root qdisc on tc-based attacks #207

Draft
joshiste wants to merge 1 commit into main from fix/preserve-root-qdisc

joshiste (Member) commented May 28, 2026

Summary

  • Bump action_kit_commons to pick up the new `tc qdisc replace`-based apply path. Network attacks (delay/loss/corruption/bandwidth) now succeed on hosts where the kernel already has a root qdisc attached (e.g. `mq` on GKE COS / EKS / AKS / RHCOS); previously they failed there with `NLM_F_REPLACE needed to override`.
  • Propagate netfault.Apply preflight warnings to the action Start result as Warn-level messages. Users see a warning when an interface has a user-installed root qdisc (htb, cake, …) that the kernel won't auto-restore on revert.
  • Add an e2e test (network delay preserves pre-existing root qdisc) that creates dummy interfaces with each of the kernel-auto-restored qdisc kinds (mq, fq_codel, pfifo_fast, noqueue, fq) plus a user-installed htb, and verifies the apply/revert cycle restores state.
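
To make the first bullet concrete, here is a minimal sketch of the difference between the two tc verbs. It only builds the command strings and does not run tc. The helper name `rootQdiscCommand`, the interface `eth0`, and the handle `1:` are assumptions for illustration; the actual apply path lives in action_kit_commons and may look different.

```go
package main

import "fmt"

// rootQdiscCommand is a hypothetical helper that renders the tc command
// used to install a root qdisc of the given kind on dev.
// verb is expected to be "add" or "replace".
func rootQdiscCommand(verb, dev, kind string) string {
	return "tc qdisc " + verb + " dev " + dev + " root handle 1: " + kind
}

func main() {
	// Per this PR, the "add" form failed with
	// "NLM_F_REPLACE needed to override" on hosts with a root mq qdisc.
	fmt.Println(rootQdiscCommand("add", "eth0", "prio"))
	// The "replace" form is what the new apply path is described as using.
	fmt.Println(rootQdiscCommand("replace", "eth0", "prio"))
}
```

The only change between the two lines is the verb, which is why the apply path can stay agnostic of the qdisc kind that was attached before.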

Related ticket: https://steadybitgmbh.kanbanize.com/ctrl_board/9/cards/18920/details/

Depends on steadybit/action-kit#442: the go.mod currently pins a pseudo-version from that branch and will be re-pinned to v1.8.0 before merge (see checklist).

Pre-merge checklist

  • action-kit#442 merged and go/action_kit_commons/v1.8.0 tag pushed
  • Re-pin: go get github.com/steadybit/action-kit/go/action_kit_commons@v1.8.0 && go mod tidy
  • Verify CI green after re-pin

Test plan

  • go build ./... clean
  • go vet ./... clean
  • Unit tests in changed packages pass
  • E2E test network delay preserves pre-existing root qdisc runs green across containerd/cri-o/docker (CI)
  • Manual smoke on a GKE COS node (post-merge)

Notes

  • Do not enable auto-merge.
  • Do not merge before action-kit#442 is merged and a v1.8.0 of action_kit_commons is tagged.

Network attacks (delay, loss, corruption, bandwidth) on hosts where
the kernel had already attached a root qdisc to the target interface
(e.g. `mq` on GKE COS / EKS / AKS / RHCOS) previously failed with
`NLM_F_REPLACE needed to override`. Bump action_kit_commons to pick
up the `tc qdisc replace`-based apply path.

Propagate the preflight warnings returned by `netfault.Apply` to the
action Start result as Warn-level messages. The user sees a warning
when an interface has a user-installed root qdisc (htb, cake, ...)
that the kernel will not auto-restore on revert.
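
The propagation step could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `Message` type stands in for the action result message type, the level string `"warn"` and the helper `warningsToMessages` are hypothetical, and the warning text is made up.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for the message type carried by
// the action Start result.
type Message struct {
	Level   string
	Message string
}

// warningsToMessages wraps each preflight warning in a Warn-level message,
// preserving order. An empty input yields an empty slice.
func warningsToMessages(warnings []string) []Message {
	msgs := make([]Message, 0, len(warnings))
	for _, w := range warnings {
		msgs = append(msgs, Message{Level: "warn", Message: w})
	}
	return msgs
}

func main() {
	// Illustrative warning text only; the real wording comes from netfault.
	warnings := []string{"dummy0: user-installed root qdisc (htb) is not restored on revert"}
	for _, m := range warningsToMessages(warnings) {
		fmt.Printf("[%s] %s\n", m.Level, m.Message)
	}
}
```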

Add an e2e test (`network delay preserves pre-existing root qdisc`)
covering the two preflight branches: a veth interface with the
kernel-default `noqueue` (no warning expected) and a dummy with a
user-installed `htb` (warning expected). The apply path is kind-
agnostic so a single case per branch is enough; parser coverage
across qdisc kinds lives in netfault/preflight_test.go fixtures.

Note: the test deliberately does not assert which kind the kernel
attaches after `qdisc del root` — that's a kernel property dependent
on device flags (IFF_NO_QUEUE) and net.core.default_qdisc, not this
extension's behavior. We only assert that our injected `prio` is
gone.
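
A rough sketch of that final assertion, assuming the test inspects the plain-text output of `tc qdisc show dev <if>`. The helper `rootQdiscKind` and the sample output lines are hypothetical; the real e2e test may check this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// rootQdiscKind returns the kind of the root qdisc found in the text output
// of `tc qdisc show dev <if>`, or "" when no root line is present.
func rootQdiscKind(output string) string {
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// Expected line shape: qdisc <kind> <handle> root ...
		if len(fields) >= 4 && fields[0] == "qdisc" && fields[3] == "root" {
			return fields[1]
		}
	}
	return ""
}

func main() {
	// Illustrative outputs: during the attack and after revert.
	during := "qdisc prio 1: root refcnt 2 bands 3\nqdisc netem 30: parent 1:3 limit 1000 delay 100ms"
	after := "qdisc noqueue 0: root refcnt 2"

	fmt.Println("during attack:", rootQdiscKind(during))
	// Only assert that prio is gone, not which kind replaced it.
	fmt.Println("prio removed:", rootQdiscKind(after) != "prio")
}
```

Asserting on the absence of `prio` rather than on a specific restored kind keeps the check independent of device flags and `net.core.default_qdisc`.
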
joshiste force-pushed the fix/preserve-root-qdisc branch from d8f766d to f63e458 on May 29, 2026 21:09