Skip to content

Conversation

@dblnz
Copy link
Contributor

@dblnz dblnz commented Dec 2, 2025

Partially addresses #1032.
This fixes an issue where the guest hangs because of a deadlock produced when an exception occurs in the tracing code and the GuestState lock is acquired. When handling the exception, the call to outb function, tries to acquire the same lock another time, which leads to a deadlock.

@dblnz dblnz requested a review from danbugs as a code owner December 2, 2025 09:46
@dblnz dblnz added the kind/bugfix For PRs that fix bugs label Dec 2, 2025
ludfjig
ludfjig previously approved these changes Dec 2, 2025
Copy link
Contributor

@ludfjig ludfjig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Note that the linked issue will be closed when this merges which may or may not be your intention

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a deadlock issue in guest tracing that occurs when exceptions happen during tracing data serialization. The fix converts direct lock() calls to try_lock() throughout the guest tracing subsystem to detect re-entrant access patterns, and removes the #[instrument] macro from out32() to prevent span creation during exception handling.

Key changes:

  • Replaced lock() with try_lock().expect() in subscriber methods and lib.rs functions to detect and panic on re-entrant access
  • Removed #[instrument] from out32() which can be called from exception contexts
  • Simplified string creation in visitor.rs (refactoring cleanup)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
src/hyperlight_guest_tracing/src/subscriber.rs Converted all Subscriber trait methods to use try_lock() with panic on failure to detect re-entrant access
src/hyperlight_guest_tracing/src/lib.rs Updated set_start_tsc, end_trace, clean_trace_state, and guest_trace_info to use try_lock(); fixed "guset" typo
src/hyperlight_guest/src/exit.rs Removed #[instrument] from out32() and added documentation explaining why tracing is avoided in exception contexts
src/hyperlight_guest_tracing/src/visitor.rs Refactored string creation to use String::from() directly instead of creating empty String and pushing

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 7 changed files in this pull request and generated 1 comment.

ludfjig
ludfjig previously approved these changes Dec 3, 2025
Copy link
Contributor

@ludfjig ludfjig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

dblnz added 2 commits December 3, 2025 22:03
…t to report trace data

When guest tracing is enabled and an exception occurs while the guest
trace state lock is acquired (e.g. no more memory left to allocate a
trace event on the heap), the guest tries to report to the host what the
error was by calling the `out32` function.
This function reports the guest trace data by serializing it on the spot,
which needs to acquire the lock again, this leads to a deadlock.

Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Signed-off-by: Doru Blânzeanu <dblnz@pm.me>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 7 changed files in this pull request and generated 2 comments.

@dblnz dblnz merged commit 2c69974 into hyperlight-dev:main Dec 3, 2025
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bugfix For PRs that fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants