Fix batch of bugs found in deep code review#40224
Conversation
Bugs fixed (not covered by PR #40197): - unittests.c: Fix get_addr_info test entry pointing to wrong handler (GetSetIdTestEntry -> GetAddrInfoTestEntry) - SocketChannel.h: Fix %s format specifier in fmt-style LOG_ERROR on Linux; channel name was silently dropped from protocol error logs - p9file.cpp: Fix readlinkat return type (int -> ssize_t) to match POSIX specification - configfile.cpp: Guard ungetwc() call against WEOF to avoid undefined behavior on some implementations - init.cpp: Fix SIGCHLD race by blocking the signal before setting the handler, preventing a window where child exit could be lost - util.cpp: Extract duplicated signal skip list into SkipSignal() helper to ensure consistency between save and set handlers Test improvements: - NetworkTests.cpp: Add SO_RCVTIMEO timeout on accept() to prevent indefinite test hangs (resolves TODO) - DrvFsTests.cpp: Add LOG_IF_WIN32_BOOL_FALSE to cleanup operations to surface silent file/directory deletion failures Script hardening: - copy_and_build_tests.ps1: Replace Invoke-Expression with call operator to prevent command injection via interpolated variables - test-setup.ps1: Pass PostInstallCommand through bash -c for proper shell argument handling - deploy-to-vm.ps1: Prompt for password via Read-Host -AsSecureString when not provided, instead of creating empty SecureString Resource management: - WslConfigService.cs: Dispose FileSystemWatcher in destructor to prevent resource leak Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Full deep review covering 7 subsystems with 44 findings. 12 fixes on this branch, 10+ already in PR #40197, 4 deferred. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR applies a batch of targeted bug fixes and hardening changes across WSL’s Linux init/plan9 layers, shared infrastructure, test suites, and supporting PowerShell tooling.
Changes:
- Fix correctness issues in Linux/shared code paths (protocol logging, config parsing edge case, POSIX return type, SIGCHLD race, duplicated signal skip logic).
- Improve test reliability and debuggability (fix wrong Linux unit test entry, add host-side timeout attempt for a Windows networking test, surface cleanup failures in DrvFs tests).
- Harden developer scripts and resource management (remove PowerShell
Invoke-Expression, improve VM deploy password prompting, attempt to closeFileSystemWatcherresources).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/test/test-setup.ps1 | Run post-install commands via bash -c under wsl.exe --exec for better command handling. |
| tools/test/copy_and_build_tests.ps1 | Replace Invoke-Expression with direct script invocation for safer execution. |
| tools/deploy/deploy-to-vm.ps1 | Prompt for a password securely when none is provided. |
| test/windows/NetworkTests.cpp | Add a timeout-setting attempt intended to avoid indefinite accept() hangs. |
| test/windows/DrvFsTests.cpp | Log cleanup failures for reparse test artifacts to aid diagnosis. |
| test/linux/unit_tests/unittests.c | Fix get_addr_info test dispatch to the correct handler. |
| src/windows/wslsettings/Services/WslConfigService.cs | Dispose the FileSystemWatcher in the finalizer. |
| src/shared/inc/SocketChannel.h | Fix Linux fmt-style logging placeholder so the channel name is included. |
| src/shared/configfile/configfile.cpp | Guard against ungetwc(WEOF) undefined behavior. |
| src/linux/plan9/p9file.cpp | Use ssize_t for readlinkat result per POSIX. |
| src/linux/init/util.cpp | Deduplicate the signal skip list into a helper. |
| src/linux/init/init.cpp | Block SIGCHLD before resetting handler to close a race window. |
| code-review.md | Add a deep-review summary and tracking document for identified issues/fixes. |
| // Signals that cannot or should not be overridden by save/set handlers. | ||
| static bool SkipSignal(unsigned int Signal) | ||
| { | ||
| switch (Signal) | ||
| { | ||
| case SIGKILL: | ||
| case SIGSTOP: | ||
| case SIGCONT: | ||
| case SIGHUP: | ||
| case 32: | ||
| case 33: | ||
| case 34: | ||
| return true; |
There was a problem hiding this comment.
SkipSignal() hard-codes signal numbers 32/33/34, which are ABI-/libc-dependent (e.g., realtime signal ranges vary). Please replace these magic numbers with named constants/macros (such as SIGRTMIN/SIGRTMAX-derived values) or add a clear rationale tied to the specific environment assumptions.
| ~WslConfigService() | ||
| { | ||
| _wslConfigFileSystemWatcher?.Dispose(); | ||
| WslCoreConfigInterface.FreeWslConfig(_wslConfig); | ||
| WslCoreConfigInterface.FreeWslConfig(_wslConfigDefaults); |
There was a problem hiding this comment.
Disposing the FileSystemWatcher only in the finalizer is non-deterministic and likely won’t run during normal app lifetime (this service is registered as a singleton). If the intent is to avoid handle/resource leaks, implement IDisposable and dispose the watcher (and native configs) when the app/host shuts down.
| A comprehensive deep-dive review of the entire WSL codebase identified **44 issues** across | ||
| 7 major subsystems. **13 confirmed bugs** have been fixed on the `copilot/code-review-fixes` | ||
| branch (plus 8 more are already addressed by open PR #40197). The remaining findings are | ||
| documented below for future triage. |
There was a problem hiding this comment.
The executive summary says "13 confirmed bugs" fixed on this branch, but the PR description states 12. Please reconcile the counts (and/or clarify what the extra item is) so the documentation matches the PR scope.
| $copyScriptCommand = $PSScriptRoot + "\copy_tests.ps1 -WslTestDirPath $WslTestDirPath -DistroName $DistroName" | ||
|
|
||
| $cleanTestCommand = "rm -rf /data/test" | ||
| $buildTestCommand = "cd /data/test; ./build_tests.sh; less /data/test/log/build_output" | ||
|
|
||
| # clean test directory on linux side | ||
| Write-Output "Cleaning unit tests at $DistroPath\rootfs\data\test" | ||
| Run { wsl.exe --distribution $DistroName --user root --exec bash -c "$cleanTestCommand" } | ||
|
|
||
| # call the logic in copy_tests.ps1 | ||
| Invoke-Expression $copyScriptCommand | ||
| & (Join-Path $PSScriptRoot "copy_tests.ps1") -WslTestDirPath $WslTestDirPath -DistroName $DistroName |
There was a problem hiding this comment.
The new direct invocation of copy_tests.ps1 makes the earlier $copyScriptCommand construction unused. Consider removing the unused variable (and related string-building) to avoid confusion and keep StrictMode scripts easy to maintain.
| // Set a timeout on the listen socket to avoid an indefinite wait if the client never connects. | ||
| DWORD timeout = 10000; | ||
| VERIFY_ARE_NOT_EQUAL(setsockopt(listenSocket.get(), SOL_SOCKET, SO_RCVTIMEO, (char*)&timeout, sizeof(timeout)), SOCKET_ERROR); |
There was a problem hiding this comment.
Setting SO_RCVTIMEO does not apply to accept() on Winsock; accept can still block indefinitely here. To enforce a timeout, consider using select()/WSAPoll on the listening socket, or make it non-blocking and poll for FD_ACCEPT/readiness before calling accept.
| // Set a timeout on the listen socket to avoid an indefinite wait if the client never connects. | |
| DWORD timeout = 10000; | |
| VERIFY_ARE_NOT_EQUAL(setsockopt(listenSocket.get(), SOL_SOCKET, SO_RCVTIMEO, (char*)&timeout, sizeof(timeout)), SOCKET_ERROR); | |
| // Wait for the listening socket to become ready before calling accept(), since | |
| // SO_RCVTIMEO does not apply to accept() on Winsock. | |
| fd_set readSet; | |
| FD_ZERO(&readSet); | |
| FD_SET(listenSocket.get(), &readSet); | |
| timeval timeout{}; | |
| timeout.tv_sec = 10; | |
| timeout.tv_usec = 0; | |
| const int selectResult = select(0, &readSet, nullptr, nullptr, &timeout); | |
| VERIFY_ARE_EQUAL(selectResult, 1); | |
| VERIFY_IS_TRUE(FD_ISSET(listenSocket.get(), &readSet)); |
Summary of the Pull Request
This PR fixes 12 confirmed bugs found during a comprehensive deep review of the entire WSL codebase. Issues already covered by PR #40197 are intentionally excluded.
PR Checklist
Detailed Description of the Pull Request / Additional comments
Bug fixes (not covered by #40197)
get_addr_infotest entry pointed toGetSetIdTestEntryinstead ofGetAddrInfoTestEntry(copy-paste error — wrong test was running silently)%sformat specifier in fmt-styleLOG_ERRORon Linux; channel name was silently dropped from protocol error logsreadlinkatreturn stored asintinstead ofssize_tper POSIX specungetwc(WEOF)is undefined on some implementations; added guardsignal(SIG_DFL)andsigprocmask(SIG_BLOCK); fixed by blocking firstSkipSignal()helperTest improvements
SO_RCVTIMEOtimeout onaccept()to prevent indefinite test hangs (resolves inline TODO)LOG_IF_WIN32_BOOL_FALSEto cleanup operations to surface silent deletion failuresScript hardening
Invoke-Expressionwith call operator&to prevent command injectionPostInstallCommandthroughbash -cfor proper argument handlingRead-Host -AsSecureStringwhen not providedResource management
FileSystemWatcherin destructor to prevent resource leakValidation Steps Performed
FormatSource.ps1passed on all C/C++ changeswslservice.exe,common.lib,configfile.liball build clean)wsltests(lambda capture in UnitTests.cpp) andwslserviceproxystubare unrelated