20.20.2: V8 accepts pre-CVE-2026-21717 code caches, SIGSEGV on property access #62781

@smaeda-ks

Description

Version

20.20.2

Platform

only tested on `linux/amd64` so far

Subsystem

No response

What steps will reproduce the bug?

Authorship note. This write-up was primarily drafted by Anthropic's Claude (Opus 4.7) working with the human reporter during the investigation. All empirical claims — reproducer output, the bisect matrix, cache-header dumps, fuzz counts, and version compatibility tests — come from runs actually performed on the machines described below; the reproduction scripts in this directory (run.sh, verify-matched.sh, both Dockerfiles, repro.js) are the exact scripts that produced them. Interpretive claims about V8 internals are inference from the cited commits / source files and should be treated as such unless they point directly at a linked V8 source file.

Reproduction code

https://github.com/smaeda-ks/node-20.20.2-bug-report-20260417

TL;DR

vm.Script({ cachedData }) in Node.js v20.20.2 accepts a code cache that was produced by v20.20.0 or v20.20.1 (cachedDataRejected === false, v8.cachedDataVersionTag() is identical at 0x12098dbb) despite the CVE-2026-21717 HashDoS fix (commit af5c144ebc) having changed the on-disk encoding of Name hash fields. Running the cached bytecode SIGSEGVs as soon as a Proxy (or any object with a non-trivial [[Get]] accessor, e.g. URL) is accessed using one of a specific set of names. At least these names reproduce:

all  big  blink  href  join  small  status  strike  sub  sup  toLocaleString  tt

(These are the crashing names I have observed so far, not an exhaustive list; a 15 000-random-string fuzz found zero triggers, so the set appears biased toward well-known / pre-interned names. See the "Which property names reproduce" section.) The equivalent backports to 22.22.2 and 24.14.1 do not exhibit this in my tests; there V8 correctly rejects the old cache (cachedDataRejected === true).

Tested on linux/amd64 only. I haven't verified arm64 / Graviton behavior.

Minimal source that crashes when cross-version cached:

new Proxy({}, { get() { return 1; } }).href;

Reproduce

./run.sh

Expected tail output:

[make] node=v20.20.0 v8=11.3.244.8-node.33 cachedDataVersionTag=0x12098dbb bytes=432

[run]  node=v20.20.2 v8=11.3.244.8-node.38 cachedDataVersionTag=0x12098dbb cachedDataRejected=false bytes=432
*** REPRODUCED: exit=139 (SIGSEGV) ***
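
The two halves of this flow can be approximated in a few lines. This is a minimal sketch, not a copy of run.sh or repro.js: the function names makeCache and runWithCache are hypothetical, and both halves run in one process here, whereas the real reproduction writes the cache with the older binary and loads it with the newer one.

```javascript
// Sketch only: makeCache / runWithCache are hypothetical names, not taken from run.sh.
const vm = require('node:vm');
const v8 = require('node:v8');

const SOURCE = 'new Proxy({}, { get() { return 1; } }).href;';

// Producer half: compile the source and serialize V8's code cache to a Buffer.
function makeCache(source) {
  const script = new vm.Script(source);
  return script.createCachedData();
}

// Consumer half: hand the cache back to V8, record whether it was accepted,
// then run the script (the cross-version SIGSEGV happens during this run).
function runWithCache(source, cachedData) {
  const script = new vm.Script(source, { cachedData });
  const rejected = script.cachedDataRejected;
  const result = script.runInThisContext();
  return { rejected, result };
}

const cache = makeCache(SOURCE);
const tag = '0x' + v8.cachedDataVersionTag().toString(16);
const outcome = runWithCache(SOURCE, cache);
console.log(
  `node=${process.version} v8=${process.versions.v8} ` +
  `cachedDataVersionTag=${tag} cachedDataRejected=${outcome.rejected} bytes=${cache.length}`
);
```

Run on a single binary this should complete normally, which corresponds to the same-version rows of the test matrix; the crash needs the Buffer from makeCache to come from a pre-fix binary.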

Root cause (bisect-confirmed)

Commit af5c144ebc ("deps,build,test: fix array index hash collision") lands the Node.js side of the March-2026 HashDoS fix (CVE-2026-21717). It turns on v8_enable_seeded_array_index_hash in tools/v8_gypfiles/features.gypi and vendors rapidhash-v8, which together change how V8 stores array-index values inside a Name's 32-bit hash field. Per the blog post, the layout is:

 bit:  31          26 25                                  2 1 0
       +-------------+-----------------------------------+---+
 old:  |  length  (6)|      raw integer value       (24) | flags |
       +-------------+-----------------------------------+---+
 new:  |  length  (6)|  SeedArrayIndexValue(value) (24)  | flags |
       +-------------+-----------------------------------+---+

SeedArrayIndexValue is a three-round xorshift-multiply bijection whose three multipliers (m1, m2, m3) are derived from rapidhash secrets that live in V8's read-only roots (baked into the V8 binary at build time). UnseedArrayIndexValue is the inverse, using precomputed modular inverses.

Cached bytecode produced by v20.20.0 or v20.20.1 encodes integer-index Name::hash_field values using the old layout (raw integer, no seeding). When v20.20.2 deserializes that cache and later runs a keyed property lookup, V8 interprets the same 24 bits as SeedArrayIndexValue(value) and calls UnseedArrayIndexValue on them — the inverse of a permutation that was never applied in the first place. For a specific set of property names (see below), this encoding mismatch causes V8 to SIGSEGV during property lookup. I did not trace the exact failing V8 code path beyond localizing it to the keyed-load slow path; the blog post's SeedArrayIndexValue / UnseedArrayIndexValue description is what I'm basing the encoding model on.
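
To make the mismatch concrete, here is a toy model of a three-round xorshift-multiply bijection on 24-bit values together with its inverse. The shift amounts and multipliers are made up for illustration; they are not V8's constants (which, per the description above, derive from the rapidhash secrets), and the round structure is an assumption rather than a transcription of V8's code.

```javascript
// Illustrative only: SHIFTS and MULTIPLIERS are hypothetical, NOT V8's values.
const MASK24 = 0xffffff;
const SHIFTS = [12, 9, 13];
const MULTIPLIERS = [0x9e3779, 0x85ebcb, 0xc2b2af]; // must be odd to be invertible

// Multiply modulo 2^24 (Math.imul keeps the low 32 bits; the mask keeps 24).
const mul24 = (a, b) => Math.imul(a, b) & MASK24;

// Inverse of an odd multiplier modulo 2^24 via Newton iteration.
function inverse24(m) {
  let inv = m; // correct to 3 bits, since m * m ≡ 1 (mod 8) for odd m
  for (let i = 0; i < 5; i++) inv = Math.imul(inv, 2 - Math.imul(m, inv));
  return inv & MASK24;
}

// Inverse of x ^= x >>> s on a 24-bit value: XOR in every multiple of the shift.
function unxorshift(y, s) {
  let x = y;
  for (let sh = s; sh < 24; sh += s) x ^= y >>> sh;
  return x;
}

// Toy stand-in for SeedArrayIndexValue.
function seed(value) {
  let x = value & MASK24;
  for (let i = 0; i < 3; i++) {
    x ^= x >>> SHIFTS[i];
    x = mul24(x, MULTIPLIERS[i]);
  }
  return x;
}

// Toy stand-in for UnseedArrayIndexValue: undo the rounds in reverse order.
function unseed(stored) {
  let x = stored & MASK24;
  for (let i = 2; i >= 0; i--) {
    x = mul24(x, inverse24(MULTIPLIERS[i]));
    x = unxorshift(x, SHIFTS[i]);
  }
  return x;
}

const raw = 42;
console.log(unseed(seed(raw)) === raw); // new writer, new reader: round-trips
console.log(unseed(raw));               // old writer, new reader: raw bits misread
```

The second line is the cross-version situation in miniature: the reader applies the inverse permutation to bits that were never permuted, so the decoded value generally has nothing to do with what the writer stored.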

I confirmed the commit is responsible by building two v20.20.2 Node binaries from the same debian:bookworm-slim base with the same ./configure flags — one stock, one with git revert af5c144ebc applied — and running the minimal reproducer as producer/consumer in all four matched pairs:

  A. stock    -> stock    (baseline) exit=0
  B. reverted -> reverted (baseline) exit=0
  C. reverted -> stock               exit=139  <-- SIGSEGV
  D. stock    -> reverted            exit=139  <-- SIGSEGV

Same-binary pairs run cleanly; both cross-binary pairs crash. Since the only difference between stock and reverted is whether af5c144ebc is applied, the commit introduces a binary-incompatible on-disk encoding of Name hash fields without bumping any cache-header sanity field that would cause V8 to reject caches produced by the other half. (The Dockerfile.revert, Dockerfile.stock, and verify-matched.sh used for this bisect are in the repository root alongside run.sh.)

The upstream HashDoS blog post discusses performance and statistical-quality tradeoffs of the hash change in detail, but does not explicitly mention code-cache invalidation implications.

(Side note, not the primary bug: Node 20.20.2's tools/make-v8.sh also passes v8_use_default_hasher_secret=false — which, if applied, would make each V8 build generate its own rapidhash secrets instead of using the hardcoded RAPIDHASH_DEFAULT_SECRET constants from deps/v8/third_party/rapidhash-v8/secret.h. That would make any two independently-produced Node 20.20.2 builds have mutually-incompatible caches with no header field detecting it. I did not see this in practice between node:20.20.2-slim and public.ecr.aws/lambda/nodejs:20 (their first-32-byte cache headers are bit-identical), so either the release pipeline produces identical binaries or the default ./configure / make path doesn't actually forward this flag to V8. I didn't trace it further — flagging it only because it's a tail-risk worth keeping in mind for embedders that ship Node binaries.)

Which property names reproduce

"href" is not unique. I ran three sweeps against the minimal template new Proxy({}, {get(){return 1}}).NAME;, building the bytecode cache on 20.20.0 and loading it on 20.20.2 for each candidate:

  1. Exhaustive: all 2- and 3-letter lowercase ASCII combinations (676 + 17 576 = 18 252 names). Crashing names: tt, all, big, sub, sup (5 / 18 252, ~0.03%).
  2. Curated: ~250 common built-in / DOM / URL / collection method names. Crashing names: blink, small, strike, href, join, status, toLocaleString.
  3. Random: 15 000 random 4–8-char strings. Zero crashes from the random sample.

All 12 triggers I found are names that V8 is plausibly pre-interning in its read-only snapshot — the Annex-B legacy String.prototype HTML-wrapper methods (big, blink, small, strike, sub, sup), the built-ins toLocaleString / join, and common web/DOM names (href, status, all, tt). The random 15 000-string sweep produced zero crashes, which rules out a structural cache-format mismatch (which would crash on every name) and is consistent with the bug firing on strings whose Name::hash_field is pre-computed and baked into the V8 binary's RO snapshot rather than on strings hashed fresh at parse time. I did not verify V8's exact RO-snapshot string list, so "pre-interned" here is an inference from the observed pattern, not something I read out of V8's root table.
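
The candidate list for the exhaustive sweep is easy to regenerate. A minimal sketch, assuming a hypothetical generator lowercaseNames and a helper toSource that substitutes each name into the template; the actual sweep harness is not shown in this issue.

```javascript
// Sketch: enumerate every lowercase ASCII name of a given length.
function* lowercaseNames(length, prefix = '') {
  if (length === 0) {
    yield prefix;
    return;
  }
  for (let c = 97; c <= 122; c++) { // 'a'..'z'
    yield* lowercaseNames(length - 1, prefix + String.fromCharCode(c));
  }
}

// Substitute a candidate for NAME in the minimal template.
const toSource = (name) => `new Proxy({}, {get(){return 1}}).${name};`;

const candidates = [...lowercaseNames(2), ...lowercaseNames(3)];
console.log(candidates.length); // 676 + 17576 = 18252
```

Reserved words such as `if` or `new` appear in this list, but they are valid as property names after a dot, so every candidate yields a parseable source.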

Why the syntactic form doesn't matter

All four of these crash identically:

new Proxy({}, {get(){return 1}}).href              // member expression
new Proxy({}, {get(){return 1}})["href"]           // bracket form
Reflect.get(new Proxy({}, {get(){return 1}}), "href")
new URL("http://a").href                           // real URL accessor

They all compile to V8's generic keyed-load path (something in the KeyedLoadGeneric / LoadIC_NoFeedback family for non-monomorphic receivers), which is the code path that consults the Name's hash field and branches on its HashFieldType. A plain ({}).href does not crash — the fast hidden-class path short-circuits to undefined before it ever consults the cached hash field.

Why the cache is still accepted

V8's cache header (SerializedCodeData) carries several sanity-check words, but the set of fields differs between the 20.x line (V8 11.3) and the 22.x/24.x lines (V8 12.4 / 13.x). I confirmed the layouts directly against the bundled V8 source for each release:

  • 20.x: SerializedCodeData has 6 uint32_t fields — magic, version_hash, source_hash, flag_hash, payload_length, checksum (v20.20.2 code-serializer.h).
  • 22.x / 24.x: SerializedCodeData has 7 uint32_t fields — magic, version_hash, source_hash, flag_hash, read_only_snapshot_checksum, payload_length, checksum (v22.22.2 code-serializer.h).

Dumping the first 32 bytes of createCachedData() for the same source new Proxy({}, { get(){ return 1; } }).href; on every version and lining the fields up by offset:

| Offset | Field name (20.x / 22.x & 24.x) | 20.20.1 | 20.20.2 | 22.22.1 | 22.22.2 | 24.13.1 | 24.14.1 |
|---|---|---|---|---|---|---|---|
| +00 | magic | c0de05cc | c0de05cc | c0de0628 | c0de0628 | c0de0688 | c0de0688 |
| +04 | version_hash | 00e4c20b | 00e4c20b | 79dafe74 | 79dafe74 | dc338cfa | dc338cfa |
| +08 | source_hash | 0000002b | 0000002b | 0000002b | 0000002b | 0000002b | 0000002b |
| +12 | flag_hash | 5f1581a7 | 5f1581a7 | 77e5cd4d | 77e5cd4d | 3aa21325 | 3aa21325 |
| +16 | payload_length / read_only_snapshot_checksum | 00000198 | 00000198 | f23585ad | 48d35134 | d31c4342 | a3a96c2a |
| +20 | checksum / payload_length | 00000000 | 00000000 | 000001d0 | 000001d0 | 000001e0 | 000001e0 |

  • Every one of the 20.20.1 header fields is bit-identical to 20.20.2. cachedDataVersionTag() (0x12098dbb) didn't move despite the V8 embedder string bumping -node.33 → -node.34 → -node.38; neither did the flag_hash at +12. Since V8 only compares these fields in SerializedCodeData::SanityCheck, there's no signal to reject a pre-fix cache on 20.20.2.
  • On 22.22.1 → 22.22.2 and 24.13.1 → 24.14.1, the read_only_snapshot_checksum at +16 moves (f23585ad → 48d35134 and d31c4342 → a3a96c2a). Enabling v8_enable_seeded_array_index_hash changes the RO snapshot contents, which changes this CRC, so the sanity check rejects pre-fix caches.
  • 20.x's V8 11.3 simply doesn't have that field — its +16 slot is payload_length, which is 408 bytes for our test source no matter which patch release produced it. That's why there's no analogous rejection on the 20.x line.
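
A header dump like the one tabulated here can be produced with a small helper. This is a sketch under two assumptions: the words are little-endian (true on linux/amd64, the only platform tested), and the field orders are the ones listed for each release line. The name dumpHeader is hypothetical.

```javascript
// Sketch: decode the leading uint32 words of a V8 code cache Buffer.
// Assumes little-endian words and the field orders described in this issue.
const FIELDS_V20 = ['magic', 'version_hash', 'source_hash', 'flag_hash',
                    'payload_length', 'checksum'];
const FIELDS_V22 = ['magic', 'version_hash', 'source_hash', 'flag_hash',
                    'read_only_snapshot_checksum', 'payload_length', 'checksum'];

function dumpHeader(buf, fields) {
  const out = {};
  fields.forEach((name, i) => {
    out[name] = buf.readUInt32LE(i * 4).toString(16).padStart(8, '0');
  });
  return out;
}

// Example input: the 20.20.2 column of the table, rebuilt into a Buffer.
const words = [0xc0de05cc, 0x00e4c20b, 0x0000002b, 0x5f1581a7, 0x00000198, 0x00000000];
const sample = Buffer.alloc(words.length * 4);
words.forEach((w, i) => sample.writeUInt32LE(w, i * 4));

const header = dumpHeader(sample, FIELDS_V20);
console.log(header.payload_length, parseInt(header.payload_length, 16)); // 00000198 408
```

For a live cache, pass the Buffer returned by new vm.Script(source).createCachedData() instead of the sample. The decoded payload_length of 408 plus the 24-byte header equals the bytes=432 printed by run.sh.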

Versions tested

| Producer → Consumer | cachedDataVersionTag | cachedDataRejected | Outcome |
|---|---|---|---|
| v20.20.0 → v20.20.2 | 0x12098dbb (both) | false | SIGSEGV |
| v20.20.1 → v20.20.2 | 0x12098dbb (both) | false | SIGSEGV |
| v20.20.2 → v20.20.2 | 0x12098dbb | false | OK |
| v20.20.0 → v20.20.1 | 0x12098dbb (both) | false | OK |
| v22.22.1 → v22.22.2 | 0x4977cf6d (both) | true | OK (rejected) |
| v24.13.1 → v24.14.1 | 0x56c155c0 (both) | true | OK (rejected) |

Platform: linux/amd64. public.ecr.aws/lambda/nodejs:20 is the AWS Lambda runtime base image (v20.20.2 today). Stock node:20.20.0-slim / node:20.20.1-slim from Docker Hub reproduce identically.

Suggested fix

The 20.x V8 (11.3) cache header has no per-snapshot-CRC field, so the fix has to come from one of the fields it does have. Some concrete options for the Node.js / V8 maintainers to consider:

  • Make cachedDataVersionTag() / V8's Version::Hash() differ between pre-fix and post-fix builds. On 20.20.2 the V8 embedder string did bump (-node.33 → -node.34 → -node.38), but cachedDataVersionTag() stayed at 0x12098dbb across all three, so whatever goes into Version::Hash() here apparently doesn't include the embedder string. A tweak that does feed through (e.g. a V8 patch-level bump in deps/v8/src/version.cc, or folding the embedder string into Version::Hash()) would force cache invalidation.
  • Force FlagList::Hash() to include a term that depends on v8_enable_seeded_array_index_hash being on, so the flag_hash at +12 moves when the feature is enabled.
  • Backport V8's read_only_snapshot_checksum header field to the 20.x V8 fork (the mechanism that makes 22.x / 24.x safe today). Larger change, but this is the safeguard designed into newer V8 for exactly this class of RO-snapshot-affecting patches.

Until one of these is in place, embedders that persist V8 code caches across Node.js patch upgrades (build-once-run-later tools, serverless platforms shipping warm caches, and so on) should treat a 20.x patch bump as a hard cache-invalidation event and regenerate caches on first boot of the new runtime.
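
One way an embedder might implement that workaround is to key persisted caches on the full runtime identity rather than relying on V8's header checks. A minimal sketch; cacheKey is a hypothetical helper, and which fields to include is a judgment call rather than anything Node.js prescribes.

```javascript
const crypto = require('node:crypto');
const v8 = require('node:v8');

// Hypothetical helper: derive a cache key that changes on every Node.js release,
// even when cachedDataVersionTag() does not.
function cacheKey(source, runtime = {
  node: process.version,
  v8: process.versions.v8,
  tag: v8.cachedDataVersionTag(),
}) {
  return crypto.createHash('sha256')
    .update(JSON.stringify([runtime.node, runtime.v8, runtime.tag, process.arch]))
    .update('\0')
    .update(source)
    .digest('hex');
}

const src = 'new Proxy({}, { get() { return 1; } }).href;';
const oldKey = cacheKey(src, { node: 'v20.20.0', v8: '11.3.244.8-node.33', tag: 0x12098dbb });
const newKey = cacheKey(src, { node: 'v20.20.2', v8: '11.3.244.8-node.38', tag: 0x12098dbb });
console.log(oldKey !== newKey); // true: the key moves although the tag is identical
```

Storing each cache under its key means a 20.20.0 cache is never offered to a 20.20.2 process; it is regenerated on first boot of the new runtime.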

How often does it reproduce? Is there a required condition?

https://github.com/smaeda-ks/node-20.20.2-bug-report-20260417

What is the expected behavior? Why is that the expected behavior?

Reproduction code should not crash

What do you see instead?

Reproduction code crashes with SIGSEGV

Additional information

No response
