Labels
area/security: Involves security-related changes or fixes
Description
What happened?
After a relatively large number of fuzz test iterations, the call to the guest function hangs and never returns.
What did you expect to happen?
The fuzz test should never cause a guest function call to hang or crash.
Steps to reproduce the behavior
This issue was observed while running fuzz tests that call a recursive guest function that creates spans and log events.
The test calls a guest function that makes up to 255 recursive calls. When the `halt` function is called to signal completion of the function call, it leaks the serialized trace data.
Hyperlight Version
0.11.0
OS version
On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
$ uname -a
Linux laptop 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 x86_64 GNU/Linux
On Windows:
C:\> cmd /c ver
Not tested
Additional Information
After further debugging, I found there were two issues producing this outcome:
- When the guest calls the `halt` function, it wants to report the tracing data to the host, so it calls the `hyperlight-guest-tracing::guest_trace_info` function to serialize the events captured until then. This function returns a `Vec<u8>`, and after the `hlt` instruction is executed, the hypervisor never yields control back to the guest to `drop` the allocated memory, which causes a leak.
- The hang is caused by a deadlock on the lock that protects the trace data: at some point a heap allocation fails, which generates an exception, which calls the `outb` function to send the same trace data to the host, which in turn tries to acquire the same lock.
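
The re-entrant locking pattern behind the second issue can be sketched as follows. This is a minimal illustration only, not the Hyperlight code: it assumes `std::sync::Mutex` (the guest itself is likely `no_std` and may use a different lock), and `TRACE_DATA`, `exception_handler_can_lock`, and `record_event_then_fault` are hypothetical names. It uses `try_lock()` in the handler so the example terminates; a blocking `lock()` at that point would never return.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical stand-in for the guest's global trace state.
static TRACE_DATA: Mutex<Vec<u8>> = Mutex::new(Vec::new());

/// Stand-in for the exception path: it wants to report the trace data,
/// so it needs the same lock. Returns true only if the lock was free.
fn exception_handler_can_lock() -> bool {
    match TRACE_DATA.try_lock() {
        Ok(_guard) => true,
        // The lock is already held, here by the very code that faulted.
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => false,
    }
}

/// Stand-in for the tracing path: it takes the lock, then hits a fault
/// (the failed heap allocation in the issue) while the guard is alive.
fn record_event_then_fault() -> bool {
    let mut guard = TRACE_DATA.lock().unwrap();
    guard.push(1);
    // The "exception" fires here, before `guard` is dropped.
    exception_handler_can_lock()
}
```

In this sketch `record_event_then_fault()` reports that the handler could not take the lock, which is exactly the situation in which a blocking `lock()` would hang.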
Solution:
- Change the logic to serialize the events as they are created, and keep only a buffer containing a chunk of serialized events. This removes the need to serialize or allocate when we want to send to the host; we can then send the buffer directly.
- Never `lock()` the trace data `Mutex`; use `try_lock()` instead. If it fails in a place where we don't know what to do, something terrible has happened and we can panic; otherwise, just skip reporting the tracing data if needed.
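
A rough sketch of how the two proposed changes could fit together is shown below. It is an illustration under stated assumptions, not the actual fix: `TraceBuffer`, `BUF_CAPACITY`, `record_event`, `flush_to_host`, and the one-byte length-prefix wire format are all hypothetical, and `std::sync::Mutex` stands in for whatever lock the guest uses.

```rust
use std::sync::Mutex;

/// Hypothetical chunk size for the pre-serialized events.
const BUF_CAPACITY: usize = 64;

/// Fixed-size buffer holding already-serialized events (no heap use).
struct TraceBuffer {
    bytes: [u8; BUF_CAPACITY],
    len: usize,
}

impl TraceBuffer {
    const fn new() -> Self {
        Self { bytes: [0; BUF_CAPACITY], len: 0 }
    }

    /// Serialize one event directly into the buffer at creation time.
    /// Assumed format: 1-byte length prefix followed by the payload.
    /// Returns false if the event does not fit.
    fn push_event(&mut self, payload: &[u8]) -> bool {
        let needed = 1 + payload.len();
        if payload.len() > u8::MAX as usize || self.len + needed > BUF_CAPACITY {
            return false;
        }
        self.bytes[self.len] = payload.len() as u8;
        self.bytes[self.len + 1..self.len + needed].copy_from_slice(payload);
        self.len += needed;
        true
    }
}

/// Record an event without ever blocking: if the lock is busy, skip it.
fn record_event(trace: &Mutex<TraceBuffer>, payload: &[u8]) -> bool {
    match trace.try_lock() {
        Ok(mut buf) => buf.push_event(payload),
        Err(_) => false,
    }
}

/// Hand the serialized chunk to the host via `send`, then reset the buffer.
/// Nothing is allocated here, so nothing is left to drop after a halt.
fn flush_to_host(trace: &Mutex<TraceBuffer>, mut send: impl FnMut(&[u8])) -> bool {
    match trace.try_lock() {
        Ok(mut buf) => {
            send(&buf.bytes[..buf.len]);
            buf.len = 0;
            true
        }
        // Lock busy (e.g. we are inside an exception): skip reporting.
        Err(_) => false,
    }
}
```

Here both paths skip silently when the lock is unavailable; a real implementation would decide per call site whether skipping or panicking is the right response, as the second bullet above suggests.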