Strip basedirs from cmdline args and direct-mode hash key#2711
Strip basedirs from cmdline args and direct-mode hash key#2711solarispika wants to merge 3 commits into
Conversation
|
Given if it is clearly written by AI, could you please refactor to only keep the relevant comments ? Also a long comment #0 isn't helpful |
| /// [`HashKeyParams::compute`] for the precise reason this is Windows-only | ||
| /// at runtime; we compile it on every platform so its unit test can run | ||
| /// in any CI matrix. | ||
| #[cfg_attr(not(target_os = "windows"), allow(dead_code))] |
There was a problem hiding this comment.
please don't add allow dead code
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2711 +/- ##
==========================================
+ Coverage 74.34% 74.48% +0.14%
==========================================
Files 70 70
Lines 39373 39631 +258
==========================================
+ Hits 29271 29520 +249
- Misses 10102 10111 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| /// in any CI matrix. | ||
| #[cfg_attr(not(target_os = "windows"), allow(dead_code))] | ||
| fn collapse_escaped_backslashes(input: &[u8]) -> Vec<u8> { | ||
| let mut out = Vec::with_capacity(input.len()); |
There was a problem hiding this comment.
please refactor to avoid the mut if possible
| m.update(&[self.plusplus as u8]); | ||
| m.update(CACHE_VERSION); | ||
| m.update(self.language.as_str().as_bytes()); | ||
| // Strip basedirs from each cmdline arg before hashing. Path-bearing |
There was a problem hiding this comment.
why adding these comments?
| // `strip_basedirs` turns each `\` into `/` byte-for-byte, so the | ||
| // source-level `\\path` becomes `//path` and never matches a basedir | ||
| // stored as `/path`. | ||
| #[cfg(target_os = "windows")] |
There was a problem hiding this comment.
move that into a function
you added too many cfg here
it makes the code too hard to read
Path-bearing flags that end up in `common_args` (e.g. `-external:I <path>` on MSVC, or any other flag classified as `PassThroughPath`) embed the absolute build dir into the cache key. On CI runners whose workdir changes between job invocations (act_runner, ephemeral workers), this defeats the cache. Mirror what ccache does via `make_relative_path` for any `TAKES_PATH` arg: run each arg through `strip_basedirs` before feeding it into the digest. An explicit 0-byte separator between args is added so adjacent args don't blur into each other (`["-Ia", "b"]` must hash differently from `["-I", "ab"]`). Boundary semantics in `strip_basedirs` mean this currently only catches Separated-disposition args where the path occupies its own OsString (positions 0 in the arg byte-slice). Concatenated forms like `-IC:\path` remain a future enhancement. Test: HashKeyParams::compute with two workspaces' `-external:I <path>` args hashes equal under basedirs, divergent without; an inter-arg adjacency check catches a missing separator regression.
1c2f595 to
69d76e7
Compare
|
Revised, please check. |
| pub const CACHE_VERSION: &[u8] = b"12"; | ||
|
|
||
| /// Collapse `\\` byte pairs into a single `\`. | ||
| #[cfg(any(target_os = "windows", test))] |
There was a problem hiding this comment.
The function is pure byte manipulation, so I kept the test cross-platform to get cheap coverage from the Linux/macOS CI matrix too. Since the function only runs on Windows in production, gating both the helper and the test under #[cfg(target_os = "windows")] is also defensible — the trade-off is the byte logic then only runs in the Windows CI legs.
Happy to either way. Which do you prefer?
There was a problem hiding this comment.
just windows is probably enough :)
MSVC's preprocessor expands `__FILE__` (e.g. inside `_wassert(L"...", L"<file>", line)`) into a *C source-form* string literal where every backslash is doubled. `normalize_win_path` inside `strip_basedirs` walks the output byte-by-byte and maps each `\` to `/`, so a doubled `\\path` becomes `//path` — which never matches a basedir stored as `/path`. The result on Windows MSVC + basedirs is: cache key includes the absolute workspace path verbatim, defeating sharing across CI jobs with ephemeral workdirs. Collapse `\\` byte pairs into a single `\` before handing the preprocessor output to `strip_basedirs`. Gate both the helper and its unit test on `target_os = "windows"` since non-Windows preprocessor output already uses forward slashes natively. Tests (Windows-only): - `test_collapse_escaped_backslashes`: trivial cases, the headline case, and odd-length runs. - `test_hash_key_basedirs_windows_escaped_backslash_paths`: full `HashKeyParams::compute` round-trip showing two MSVC-shaped preprocessor outputs from two distinct workspaces hash equal under basedirs.
Direct mode (`preprocessor_cache_entry_hash_key`) hashed each arg through `OsStr::hash`, leaving absolute paths in the key. Symmetric to the preprocessor-mode fix in `HashKeyParams::compute`: run each arg through `strip_basedirs`, then a 0-byte separator. Until this lands, direct mode has to be disabled (`SCCACHE_DIRECT=false`) on any CI runner whose workdir changes between invocations, which forces every TU through the preprocessor. Notes: - Same boundary-check caveat as the preprocessor-mode fix: only Separated-disposition args (path at position 0 inside the arg byte slice) are caught. - Include paths stored *inside* a cached `PreprocessorCacheEntry` remain absolute. When workspace B looks up an entry written by workspace A, `result_matches` will fail on the project-local include paths and fall back to the preprocessor-mode key — which is fine, because the preprocessor-mode key is now also basedirs- stripped. So this change cannot regress correctness; the worst it does is fail to skip the preprocessor for a TU whose includes are all under basedirs but happen to be project-local. Test: `test_preprocessor_cache_entry_hash_key_args_basedirs` constructs two workspaces with distinct tempdir roots, verifies that args carrying each workspace's absolute path hash equal under basedirs and divergent without, and checks that the inter-arg separator catches an adjacency regression.
69d76e7 to
456bef3
Compare
Follow-up to #2521. The basedirs strip ran only on preprocessor output, leaving two leaks:
common_argsisOsStr::hashed verbatim, so path-bearing flags (e.g.-external:I <path>) embed absolute paths in the key.__FILE__strings in C source form (\\escapes);normalize_win_paththen maps each\\to//and never matches a basedir.preprocessor_cache_entry_hash_key) hashes args the same un-stripped way.Fix in 3 commits, one per gap. With
SCCACHE_BASEDIRSunset the new code is byte-identical to before. NoCACHE_VERSIONbump — existing entries were per-workdir and unreachable anyway.Tests
Unit test per commit (args strip preprocessor mode, args strip direct mode, byte-level backslash collapse, Windows full round-trip).
cargo test --lib --releasepasses on macOS 15.2 / apple-clang 17, Debian 12 / gcc-12, Windows 11 / MSVC 14.50. End-to-end: 100% cache hit rate across our 10-cell matrix (was 0% before).Not in this PR
-IC:\path,/imsvc<path>) —strip_basedirs's boundary check rejects mid-arg matches. On MSVC these flow throughpreprocessor_args(notcommon_args), so they're stripped via the preprocessor-output path already.PreprocessorCacheEntrystill stores include paths absolute, so direct-mode lookups across workspaces fall back to the preprocessor-mode key (which now hashes correctly). Cross-workspace direct-mode hits are a separate PR.