Problem
When a host function callback (registered via registerHostFunction or setHostPrintFn) tries to call back into the same sandbox (e.g. callHandler, snapshot, restore, unload), it deadlocks.
This happens because call_handler holds the LoadedJSSandbox mutex for the entire duration of guest execution. Host functions are dispatched via a thread-safe function (TSFN) to the Node.js main thread while that lock is held. If the callback then calls any method that needs the same lock, it waits forever.
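The lock pattern described above can be modeled with the standard library alone. This is a minimal sketch, not the hyperlight-js code: `Sandbox`, `call_handler_holding_lock`, and `reentrant_callback` are hypothetical names, `std::sync::Mutex` stands in for the `tokio::sync::Mutex`, and `try_lock` is used so that the sketch reports the contention instead of hanging.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the state guarded by the sandbox lock.
struct Sandbox {
    calls: u32,
}

/// Models `call_handler`: takes the lock, then runs the host callback
/// while the guard is still alive.
fn call_handler_holding_lock(
    sandbox: &Mutex<Sandbox>,
    host_callback: impl FnOnce(&Mutex<Sandbox>) -> bool,
) -> bool {
    let mut guard = sandbox.lock().unwrap();
    guard.calls += 1;
    // `guard` is only dropped at the end of this function, so the
    // callback runs with the lock held.
    host_callback(sandbox)
}

/// A callback that tries to re-enter the sandbox. A blocking `lock()` here
/// would never return; `try_lock` makes the failure observable.
fn reentrant_callback(sandbox: &Mutex<Sandbox>) -> bool {
    sandbox.try_lock().is_ok()
}

/// Returns whether the nested acquisition succeeded.
fn nested_lock_succeeds() -> bool {
    let sandbox = Mutex::new(Sandbox { calls: 0 });
    let acquired = call_handler_holding_lock(&sandbox, reentrant_callback);
    acquired
}
```

Calling `nested_lock_succeeds()` returns `false`: the callback cannot take the lock because the caller that dispatched it still holds it.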
Why this doesn't happen in core hyperlight
In hyperlight-dev/hyperlight, the host function registry (Arc<Mutex<FunctionRegistry>>) uses a separate lock from the sandbox. Host functions are dispatched synchronously while the VM is paused — they don't need the sandbox lock at all. See src/hyperlight_host/src/sandbox/outb.rs.
In hyperlight-js, the QuickJS runtime invokes host function closures inside handle_event, which requires &mut self on the sandbox. The NAPI layer wraps this in a single tokio::sync::Mutex, so host function dispatch and sandbox lifecycle share the same lock.
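The difference between the two designs can be sketched as follows. This is a simplified model under stated assumptions, not hyperlight's actual types: the `FunctionRegistry` fields, `HostFn`, `SandboxState`, and `dispatch` are all hypothetical. It only illustrates that when the registry has its own lock, dispatching a host function never touches the sandbox lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical host function signature, for illustration only.
type HostFn = Box<dyn Fn(i64) -> i64 + Send>;

/// Hypothetical registry layout; the real FunctionRegistry differs.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, HostFn>,
}

/// Hypothetical sandbox state guarded by a separate lock.
struct SandboxState {
    running: bool,
}

/// Dispatch locks only the registry, never the sandbox.
fn dispatch(registry: &Arc<Mutex<FunctionRegistry>>, name: &str, arg: i64) -> Option<i64> {
    let guard = registry.lock().unwrap();
    let result = guard.functions.get(name).map(|f| f(arg));
    result
}

/// Holds the sandbox lock for the whole call and still dispatches successfully.
fn dispatch_while_sandbox_locked() -> Option<i64> {
    let registry = Arc::new(Mutex::new(FunctionRegistry::default()));
    registry
        .lock()
        .unwrap()
        .functions
        .insert("double".to_string(), Box::new(|x: i64| x * 2));

    let sandbox = Mutex::new(SandboxState { running: false });
    let mut sandbox_guard = sandbox.lock().unwrap();
    sandbox_guard.running = true;
    // The sandbox lock is held here, but dispatch needs a different lock.
    let result = dispatch(&registry, "double", 21);
    sandbox_guard.running = false;
    result
}
```

In this model the host function runs to completion while the sandbox lock is held, because the two locks are independent.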
Current workaround
PR #55 adds an executing_flag (AtomicBool) that detects reentrancy at runtime. If a callback tries to acquire the lock while guest code is executing, it returns ERR_REENTRANT instead of deadlocking. This prevents hangs but doesn't allow the operation to succeed.
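The flag-based detection can be sketched roughly as below. This is not the code from PR #55: `GuardedSandbox`, `CallError::Reentrant` (standing in for ERR_REENTRANT), and the method shape are assumptions, and `std::sync::Mutex` replaces the tokio mutex.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical error type; `Reentrant` stands in for ERR_REENTRANT.
#[derive(Debug, PartialEq)]
enum CallError {
    Reentrant,
}

/// Hypothetical sandbox wrapper with a reentrancy flag next to the lock.
struct GuardedSandbox {
    executing: AtomicBool,
    calls: Mutex<u32>,
}

impl GuardedSandbox {
    fn new() -> Self {
        GuardedSandbox {
            executing: AtomicBool::new(false),
            calls: Mutex::new(0),
        }
    }

    /// Checks the flag before touching the lock, so a nested call fails
    /// fast instead of waiting on a lock its own caller holds.
    fn call_handler(
        &self,
        host_callback: impl FnOnce(&Self) -> Result<u32, CallError>,
    ) -> Result<u32, CallError> {
        // `swap` sets the flag and returns the previous value in one step.
        if self.executing.swap(true, Ordering::SeqCst) {
            return Err(CallError::Reentrant);
        }
        let mut calls = self.calls.lock().unwrap();
        *calls += 1;
        let result = host_callback(self);
        // Clear the flag once guest execution is over. A production version
        // would do this in a drop guard so a panic cannot leave it set.
        self.executing.store(false, Ordering::SeqCst);
        result
    }
}

/// A callback that calls back into the same sandbox.
fn reentrant_call_result() -> Result<u32, CallError> {
    let sandbox = GuardedSandbox::new();
    let result = sandbox.call_handler(|sb| sb.call_handler(|_| Ok(0)));
    result
}
```

One consequence of this simplified version is that the flag cannot tell a nested call from an unrelated concurrent one: any caller that arrives while guest code is executing gets the error rather than waiting.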
Suggested fix
Separate host function dispatch from the sandbox lock, similar to how core hyperlight does it. Options:
- Move host function state out of the &mut self borrow so callbacks don't need the sandbox lock
- Temporarily release the sandbox lock before dispatching to host functions, reacquire after
- Provide a shared FFI/binding helper crate that handles this pattern correctly for any language binding
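As an illustration of the second option, here is a minimal sketch of releasing the lock around dispatch, using only the standard library. The names (`Sandbox`, `call_handler_releasing_lock`) are hypothetical, and the sketch ignores the harder part of the real problem, namely that guest execution in hyperlight-js needs &mut self while host functions are being invoked.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the state guarded by the sandbox lock.
struct Sandbox {
    calls: u32,
}

/// Takes the lock, releases it before dispatching to the host callback,
/// and reacquires it afterwards. Returns the call count seen after dispatch.
fn call_handler_releasing_lock(
    sandbox: &Mutex<Sandbox>,
    host_callback: impl FnOnce(&Mutex<Sandbox>),
) -> u32 {
    {
        let mut guard = sandbox.lock().unwrap();
        guard.calls += 1;
    } // guard dropped here: the lock is free during dispatch

    host_callback(sandbox);

    // Reacquire. State may have changed while the lock was released,
    // so nothing read before the dispatch can be assumed still valid.
    let guard = sandbox.lock().unwrap();
    let calls = guard.calls;
    calls
}

/// A callback that re-enters the sandbox now succeeds.
fn nested_call_total() -> u32 {
    let sandbox = Mutex::new(Sandbox { calls: 0 });
    let total = call_handler_releasing_lock(&sandbox, |sb| {
        call_handler_releasing_lock(sb, |_| {});
    });
    total
}
```

The nested call completes and the outer call observes both invocations. The trade-off is that the callback can mutate sandbox state mid-call, so the outer call has to revalidate whatever it depends on after reacquiring the lock.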
Reproduction
const loaded = await sandbox.getLoadedSandbox();
proto.registerHostModule('mymod', (mod) => {
  mod.registerHostFunction('callback', async () => {
    // This deadlocks (or returns ERR_REENTRANT with the fix)
    await loaded.callHandler('other_handler', {});
    return 'result';
  });
});