Skip to content

fix(vector-index): respect Buffer slice metadata in base64 round-trip#585

Closed
kek-Sec wants to merge 1 commit into
rohitg00:mainfrom
kek-Sec:fix/584-base64-float32-buffer-pool-slice
Closed

fix(vector-index): respect Buffer slice metadata in base64 round-trip#585
kek-Sec wants to merge 1 commit into
rohitg00:mainfrom
kek-Sec:fix/584-base64-float32-buffer-pool-slice

Conversation

@kek-Sec
Copy link
Copy Markdown

@kek-Sec kek-Sec commented May 21, 2026

Buffer.from(b64, "base64") may return a slice of Node's internal pool (poolSize=8192). Calling new Float32Array(buf.buffer) ignores byteOffset and byteLength, producing a 2048-element view over the whole pool instead of the actual decoded slice — the root cause of the phantom "dimensions seen on disk: 2048" crashes reported in #455 and #469.

Pass byteOffset/byteLength to both helpers so the Float32Array covers only the decoded bytes. Add a regression test that serializes and deserialises 384-dim vectors (1536 bytes, within pool threshold) and asserts correct dimension and nearest-neighbour identity after reload.

Fixes #584

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed vector serialization to ensure data integrity when saving and restoring vector indices.
  • Tests

    • Added test coverage validating vector index serialization and restoration consistency.

Review Change Stack

Buffer.from(b64, "base64") may return a slice of Node's internal pool
(poolSize=8192). Calling new Float32Array(buf.buffer) ignores byteOffset
and byteLength, producing a 2048-element view over the whole pool instead
of the actual decoded slice — the root cause of the phantom
"dimensions seen on disk: 2048" crashes reported in rohitg00#455 and rohitg00#469.

Pass byteOffset/byteLength to both helpers so the Float32Array covers
only the decoded bytes. Add a regression test that serializes and
deserialises 384-dim vectors (1536 bytes, within pool threshold) and
asserts correct dimension and nearest-neighbour identity after reload.

Fixes rohitg00#584

Signed-off-by: George Petrakis <g.petrakis@natechbanking.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

@kek-Sec is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88dbb656-089b-4363-91d1-3a0853932c8a

📥 Commits

Reviewing files that changed from the base of the PR and between c14bdc5 and 60e7d6d.

📒 Files selected for processing (2)
  • src/state/vector-index.ts
  • test/vector-index.test.ts

📝 Walkthrough

Walkthrough

This PR fixes a bug in vector serialization where Float32Array views into pooled Node buffers were incorrectly encoded/decoded using the entire underlying ArrayBuffer instead of their actual slice, causing phantom dimensions (2048 instead of the true 384) and data corruption on reload. The implementation now respects byte offsets and lengths; a new test validates that vectors survive serialization cycles with correct dimensions and search identity.

Changes

Vector serialization buffer offset fix

Layer / File(s) Summary
Vector serialization buffer offset fix
src/state/vector-index.ts, test/vector-index.test.ts
float32ToBase64 and base64ToFloat32 now use byteOffset and byteLength to encode/decode only the typed array's actual slice, preventing issues when vectors are views into pooled buffers. Test validates that dimensions are preserved and post-deserialization search returns each vector as its own top match.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A phantom 2048 haunted our vectors true,
Pool buffers masked the slices they once knew!
Now byteOffset whispers the correct tale—
Each slice stands proud, no longer destined to fail. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(vector-index): respect Buffer slice metadata in base64 round-trip' accurately summarizes the primary fix: ensuring that base64 serialization/deserialization respects Buffer slice metadata (byteOffset and byteLength).
Linked Issues check ✅ Passed The PR fully addresses all coding requirements from issue #584: respects slice metadata in both float32ToBase64 and base64ToFloat32 helpers, and includes regression test verifying correct dimensions and nearest-neighbor identity for pooled-size vectors.
Out of Scope Changes check ✅ Passed All changes are strictly scoped to fixing the identified bug: only the two helper functions and a targeted regression test are modified, with no unrelated alterations to other code paths or functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cuongdevv
Copy link
Copy Markdown

Confirming this fix works end-to-end. I applied the exact same byteOffset/byteLength patch to both dist/src-D5arboxc.mjs and dist/index.mjs on my install (0.9.21, Node v24.12.0, EMBEDDING_PROVIDER=local, 2051-vector index that was previously phantom-2048 crashing every boot) and it loads cleanly:

[agentmemory] Embedding provider: local (384 dims)
[agentmemory] Loaded persisted vector index (2051 vectors)
[agentmemory] Ready. Triple-stream (BM25+Vector+Graph) search active.

The regression test in test/vector-index.test.ts is exactly the right shape — 384-dim (1536 bytes) is the only size that lands in Node's Buffer.poolSize=8192 pool, so it's the only case that reproduces. Larger dims (768/1024/1536) allocate standalone and were silently fine, which is why this bug only ever surfaced for local users.

Thanks @kek-Sec for turning this into a proper PR. Hope @rohitg00 can get this merged + a 0.9.22 cut soon — every day this sits unmerged is another dimensions seen on disk: 2048 false alarm someone "recovers" from by nuking their vector index via AGENTMEMORY_DROP_STALE_INDEX=true.

@kek-Sec
Copy link
Copy Markdown
Author

kek-Sec commented May 21, 2026

also closes #587

@rohitg00
Copy link
Copy Markdown
Owner

Verified the same fix lands in #653 with Float32Array.BYTES_PER_ELEMENT instead of the magic / 4 constant. Both PRs solve the same root cause (Buffer.from(b64, 'base64') returns a slice of Node's pool; new Float32Array(buf.buffer) ignores offset/length).

Recommending #653 as the merge target for the named-constant clarity. Your 5-vector round-trip + validateDimensions test is more thorough than the one in #653 though — would be great as a follow-up PR after #653 lands. Thanks @kek-Sec.

@rohitg00
Copy link
Copy Markdown
Owner

Thanks @kek-Sec — during bug-fix triage we landed an equivalent fix in #683. Closing this as superseded; really appreciate the report and the regression test that helped pin down the root cause.

@rohitg00 rohitg00 closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

base64ToFloat32 produces a phantom 2048-element view over Node's shared Buffer pool (likely root cause of #455 / #469)

3 participants