Skip to content

Strip basedirs from cmdline args and direct-mode hash key#2711

Open
solarispika wants to merge 3 commits into
mozilla:mainfrom
solarispika:basedirs-cover-args-and-direct-mode
Open

Strip basedirs from cmdline args and direct-mode hash key#2711
solarispika wants to merge 3 commits into
mozilla:mainfrom
solarispika:basedirs-cover-args-and-direct-mode

Conversation

@solarispika
Copy link
Copy Markdown

@solarispika solarispika commented May 15, 2026

Follow-up to #2521. The basedirs strip ran only on preprocessor output, leaving two leaks:

  1. common_args is OsStr::hashed verbatim, so path-bearing flags (e.g. -external:I <path>) embed absolute paths in the key.
  2. On Windows, MSVC's preprocessor emits __FILE__ strings in C source form (\\ escapes); normalize_win_path then maps each \\ to // and never matches a basedir.
  3. The direct-mode key (preprocessor_cache_entry_hash_key) hashes args the same un-stripped way.

Fix in 3 commits, one per gap. With SCCACHE_BASEDIRS unset the new code is byte-identical to before. No CACHE_VERSION bump — existing entries were per-workdir and unreachable anyway.

Tests

Unit test per commit (args strip preprocessor mode, args strip direct mode, byte-level backslash collapse, Windows full round-trip). cargo test --lib --release passes on macOS 15.2 / apple-clang 17, Debian 12 / gcc-12, Windows 11 / MSVC 14.50. End-to-end: 100% cache hit rate across our 10-cell matrix (was 0% before).

Not in this PR

  • Concatenated path args (-IC:\path, /imsvc<path>) — strip_basedirs's boundary check rejects mid-arg matches. On MSVC these flow through preprocessor_args (not common_args), so they're stripped via the preprocessor-output path already.
  • PreprocessorCacheEntry still stores include paths absolute, so direct-mode lookups across workspaces fall back to the preprocessor-mode key (which now hashes correctly). Cross-workspace direct-mode hits are a separate PR.

@sylvestre
Copy link
Copy Markdown
Collaborator

Given if it is clearly written by AI, could you please refactor to only keep the relevant comments ?

Also a long comment #0 isn't helpful

Comment thread src/compiler/c.rs Outdated
/// [`HashKeyParams::compute`] for the precise reason this is Windows-only
/// at runtime; we compile it on every platform so its unit test can run
/// in any CI matrix.
#[cfg_attr(not(target_os = "windows"), allow(dead_code))]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't add allow dead code

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 15, 2026

Codecov Report

❌ Patch coverage is 98.08429% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.48%. Comparing base (c0db872) to head (69d76e7).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/compiler/preprocessor_cache.rs 98.02% 3 Missing ⚠️
src/compiler/c.rs 98.16% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2711      +/-   ##
==========================================
+ Coverage   74.34%   74.48%   +0.14%     
==========================================
  Files          70       70              
  Lines       39373    39631     +258     
==========================================
+ Hits        29271    29520     +249     
- Misses      10102    10111       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread src/compiler/c.rs Outdated
/// in any CI matrix.
#[cfg_attr(not(target_os = "windows"), allow(dead_code))]
fn collapse_escaped_backslashes(input: &[u8]) -> Vec<u8> {
let mut out = Vec::with_capacity(input.len());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please refactor to avoid the mut if possible

Comment thread src/compiler/c.rs Outdated
m.update(&[self.plusplus as u8]);
m.update(CACHE_VERSION);
m.update(self.language.as_str().as_bytes());
// Strip basedirs from each cmdline arg before hashing. Path-bearing
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why adding these comments?

Comment thread src/compiler/c.rs Outdated
// `strip_basedirs` turns each `\` into `/` byte-for-byte, so the
// source-level `\\path` becomes `//path` and never matches a basedir
// stored as `/path`.
#[cfg(target_os = "windows")]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move that into a function
you added too many cfg here
it makes the code too hard to read

Path-bearing flags that end up in `common_args` (e.g. `-external:I <path>`
on MSVC, or any other flag classified as `PassThroughPath`) embed the
absolute build dir into the cache key. On CI runners whose workdir
changes between job invocations (act_runner, ephemeral workers), this
defeats the cache.

Mirror what ccache does via `make_relative_path` for any `TAKES_PATH`
arg: run each arg through `strip_basedirs` before feeding it into the
digest. An explicit 0-byte separator between args is added so adjacent
args don't blur into each other (`["-Ia", "b"]` must hash differently
from `["-I", "ab"]`).

Boundary semantics in `strip_basedirs` mean this currently only catches
Separated-disposition args where the path occupies its own OsString
(positions 0 in the arg byte-slice). Concatenated forms like `-IC:\path`
remain a future enhancement.

Test: HashKeyParams::compute with two workspaces' `-external:I <path>`
args hashes equal under basedirs, divergent without; an inter-arg
adjacency check catches a missing separator regression.
@solarispika solarispika force-pushed the basedirs-cover-args-and-direct-mode branch from 1c2f595 to 69d76e7 Compare May 15, 2026 11:12
@solarispika
Copy link
Copy Markdown
Author

Revised, please check.

Comment thread src/compiler/c.rs Outdated
pub const CACHE_VERSION: &[u8] = b"12";

/// Collapse `\\` byte pairs into a single `\`.
#[cfg(any(target_os = "windows", test))]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ", test" ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is pure byte manipulation, so I kept the test cross-platform to get cheap coverage from the Linux/macOS CI matrix too. Since the function only runs on Windows in production, gating both the helper and the test under #[cfg(target_os = "windows")] is also defensible — the trade-off is the byte logic then only runs in the Windows CI legs.

Happy to either way. Which do you prefer?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just windows is probably enough :)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, Windows-only now.

MSVC's preprocessor expands `__FILE__` (e.g. inside `_wassert(L"...",
L"<file>", line)`) into a *C source-form* string literal where every
backslash is doubled. `normalize_win_path` inside `strip_basedirs`
walks the output byte-by-byte and maps each `\` to `/`, so a doubled
`\\path` becomes `//path` — which never matches a basedir stored as
`/path`. The result on Windows MSVC + basedirs is: cache key includes
the absolute workspace path verbatim, defeating sharing across CI
jobs with ephemeral workdirs.

Collapse `\\` byte pairs into a single `\` before handing the
preprocessor output to `strip_basedirs`. Gate both the helper and its
unit test on `target_os = "windows"` since non-Windows preprocessor
output already uses forward slashes natively.

Tests (Windows-only):
- `test_collapse_escaped_backslashes`: trivial cases, the headline
  case, and odd-length runs.
- `test_hash_key_basedirs_windows_escaped_backslash_paths`: full
  `HashKeyParams::compute` round-trip showing two MSVC-shaped
  preprocessor outputs from two distinct workspaces hash equal
  under basedirs.
Direct mode (`preprocessor_cache_entry_hash_key`) hashed each arg
through `OsStr::hash`, leaving absolute paths in the key. Symmetric to
the preprocessor-mode fix in `HashKeyParams::compute`: run each arg
through `strip_basedirs`, then a 0-byte separator. Until this lands,
direct mode has to be disabled (`SCCACHE_DIRECT=false`) on any CI
runner whose workdir changes between invocations, which forces every
TU through the preprocessor.

Notes:
- Same boundary-check caveat as the preprocessor-mode fix: only
  Separated-disposition args (path at position 0 inside the arg byte
  slice) are caught.
- Include paths stored *inside* a cached `PreprocessorCacheEntry`
  remain absolute. When workspace B looks up an entry written by
  workspace A, `result_matches` will fail on the project-local
  include paths and fall back to the preprocessor-mode key — which
  is fine, because the preprocessor-mode key is now also basedirs-
  stripped. So this change cannot regress correctness; the worst it
  does is fail to skip the preprocessor for a TU whose includes are
  all under basedirs but happen to be project-local.

Test: `test_preprocessor_cache_entry_hash_key_args_basedirs` constructs
two workspaces with distinct tempdir roots, verifies that args carrying
each workspace's absolute path hash equal under basedirs and divergent
without, and checks that the inter-arg separator catches an adjacency
regression.
@solarispika solarispika force-pushed the basedirs-cover-args-and-direct-mode branch from 69d76e7 to 456bef3 Compare May 18, 2026 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants