-
Notifications
You must be signed in to change notification settings - Fork 181
bug: Issues with subprocess management in Windows #2036
Description
Cortex version
dev
Describe the issue and expected behaviour
2 main issues
1. pid_t on windows is DWORD (unsigned long), but we return -1 to signal failure
cortex.cpp/engine/utils/process/utils.h
Lines 3 to 6 in ca2327a
| #if defined(_WIN32) | |
| #include <process.h> | |
| #include <windows.h> | |
| using pid_t = DWORD; |
cortex.cpp/engine/utils/process/utils.cc
Lines 102 to 104 in ca2327a
| } catch (const std::exception& e) { | |
| LOG_ERROR << "Process spawning error: " << e.what(); | |
| return -1; |
Perhaps we can use another value for failure e.g. 0, or use cpp::result like many other parts of the codebase.
2. TerminateProcess() does not kill children processes
cortex.cpp/engine/utils/process/utils.cc
Line 159 in ca2327a
| bool is_success = TerminateProcess(hProcess, 0) == TRUE; |
The solution is to use Job object https://devblogs.microsoft.com/oldnewthing/20131209-00/?p=2433
Edit: Issue 2 is more complex than I previously thought. Initially I thought Mac and Linux kill children process, but turns out it was because I was testing with uv, and we kill uv process by sending SIGTERM, which uv will propagate the signal to the children processes. If we kill uv process with SIGKILL, actually the children process will also be left hanging. Need to think of a better way for this...
Assigning this to me since this is required for my Python engine PR.
Steps to Reproduce
No response
Screenshots / Logs
No response
What is your OS?
- Windows
- Mac Silicon
- Mac Intel
- Linux / Ubuntu
What engine are you running?
- cortex.llamacpp (default)
- cortex.tensorrt-llm (Nvidia GPUs)
- cortex.onnx (NPUs, DirectML)
Hardware Specs eg OS version, GPU
No response