fix: kill orphaned irqbalance process before waiting for result #4339

Merged
LiliDeng merged 1 commit into main from johgeorg/irqbalance_fix on Apr 2, 2026

@johnsongeorge-w
Collaborator

When verify_irqbalance runs `irqbalance --debug` via execute_async with sudo=True, spur internally wraps the command as `sudo sh -c 'irqbalance --debug'`. The PID tracked by LISA belongs to the sudo/sh wrapper, not to irqbalance itself, so kill() can terminate the wrapper while irqbalance survives as an orphan and keeps the PTY channel open.

Fix by explicitly killing irqbalance by name after kill(), ensuring the PTY channel closes before wait_result() is called. Pass raise_on_timeout=False to handle any residual channel-close latency without raising a spurious exception.
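
The orphaning effect described above can be reproduced outside LISA. The following is a minimal, POSIX-only sketch and not the LISA or spur code: a Python wrapper process stands in for `sudo sh -c`, a sleeping child stands in for `irqbalance --debug`, and a plain pipe stands in for the PTY channel. All names (`CHILD`, `WRAPPER`, `reaches_eof`, `demo_orphan`) are hypothetical, and the child is killed by PID here rather than by name.

```python
import os
import select
import signal
import subprocess
import sys

# Stand-in for the long-running `irqbalance --debug` process (assumption).
CHILD = "import time; time.sleep(60)"

# Stand-in for the `sudo sh -c ...` wrapper: it starts the child, reports the
# child's PID on stdout, then waits. The child inherits the wrapper's stdout.
WRAPPER = (
    "import subprocess, sys\n"
    f"p = subprocess.Popen([sys.executable, '-c', {CHILD!r}])\n"
    "print(p.pid, flush=True)\n"
    "p.wait()\n"
)


def reaches_eof(stream, timeout):
    """Return True if `stream` hits end-of-file within `timeout` seconds."""
    ready, _, _ = select.select([stream], [], [], timeout)
    return bool(ready) and os.read(stream.fileno(), 1) == b""


def demo_orphan():
    # bufsize=0 keeps the pipe unbuffered so select() sees its true state.
    wrapper = subprocess.Popen(
        [sys.executable, "-c", WRAPPER], stdout=subprocess.PIPE, bufsize=0
    )
    child_pid = int(wrapper.stdout.readline())

    # Kill only the tracked PID (the wrapper). The child still holds the
    # write end of the pipe, so the reader does not see EOF.
    wrapper.kill()
    wrapper.wait()
    closed_after_wrapper_kill = reaches_eof(wrapper.stdout, 1.0)

    # Kill the real workload as well; now the last writer is gone.
    os.kill(child_pid, signal.SIGKILL)
    closed_after_child_kill = reaches_eof(wrapper.stdout, 5.0)

    wrapper.stdout.close()
    return closed_after_wrapper_kill, closed_after_child_kill


if __name__ == "__main__":
    print(demo_orphan())
```

In this sketch the first value reports whether the pipe closed after killing only the wrapper, and the second whether it closed once the child was killed too. The pipe is only an analogue for the channel behaviour the PR describes; the real fix targets irqbalance by name because LISA never learns its PID.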

Copilot AI review requested due to automatic review settings March 14, 2026 16:25
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a race condition in the verify_irqbalance test where irqbalance --debug launched via execute_async with sudo=True could survive as an orphan process after the spur-tracked wrapper process (sudo/sh) was killed. The fix explicitly kills irqbalance by name before calling wait_result(), and uses raise_on_timeout=False to gracefully handle any residual channel-close latency.

Changes:

  • Added Kill tool import and used Kill.by_name("irqbalance") after irqbalance.kill() to ensure the orphaned irqbalance process is terminated
  • Changed wait_result() to pass raise_on_timeout=False so that a timeout caused by channel-close latency doesn't raise a spurious exception
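
The idea of a non-raising timeout can be illustrated with the standard library alone. This is a hedged sketch, not LISA's wait_result(): the helper `wait_allowing_timeout`, the `demo_tolerant_wait` driver, and the convention of returning None on a tolerated timeout are all assumptions made for illustration, built on `subprocess.Popen.wait`.

```python
import subprocess
import sys


def wait_allowing_timeout(proc, timeout, raise_on_timeout=True):
    """Hypothetical helper: wait for `proc`, optionally tolerating a timeout.

    Returns the exit code, or None if the wait timed out and
    raise_on_timeout is False.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        if raise_on_timeout:
            raise
        return None


def demo_tolerant_wait():
    # A process that outlives the timeout stands in for a slow channel close.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        return wait_allowing_timeout(proc, timeout=0.2, raise_on_timeout=False)
    finally:
        # Clean up the stand-in process regardless of the outcome.
        proc.kill()
        proc.wait()


if __name__ == "__main__":
    print(demo_tolerant_wait())
```

With `raise_on_timeout=True` the same call would propagate `subprocess.TimeoutExpired`, which is the kind of spurious failure the PR aims to avoid once the process has already been killed deliberately.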

Comment thread lisa/microsoft/testsuites/network/sriov.py
@LiliDeng LiliDeng merged commit 7134740 into main Apr 2, 2026
62 checks passed
@LiliDeng LiliDeng deleted the johgeorg/irqbalance_fix branch April 2, 2026 04:37