Skip to content

fix(vector-index): preserve byteOffset/byteLength in base64 round-trip (closes #587, #584)#683

Merged
rohitg00 merged 1 commit into
mainfrom
fix/587-vector-buffer-slice
May 27, 2026
Merged

fix(vector-index): preserve byteOffset/byteLength in base64 round-trip (closes #587, #584)#683
rohitg00 merged 1 commit into
mainfrom
fix/587-vector-buffer-slice

Conversation

@rohitg00
Copy link
Copy Markdown
Owner

@rohitg00 rohitg00 commented May 27, 2026

Closes #587. Also resolves #584 (same root cause), and unblocks #455 / #469 which reported the same 2048-element phantom view.

What was happening

Buffer.from(b64, "base64") returns a slice of Node's internal Buffer pool (8KB). new Float32Array(buf.buffer) ignores buf.byteOffset and buf.byteLength — it mints a Float32 view over the entire pool, not the slice. For embeddings whose serialised form is small enough to fit in the pool (≤8KB raw bytes, i.e. ≤2048 floats), restart-time deserialisation produces a 2048-dim vector. The live index then refuses to start with dimensions seen on disk: 2048.

Same risk on the encode side: Buffer.from(arr.buffer) drops slice metadata when arr is a sub-view (e.g. .subarray()).

Fix

Both helpers in src/state/vector-index.ts now pass byteOffset + byteLength explicitly. Decode scales length by Float32Array.BYTES_PER_ELEMENT for a self-documenting constant.

Tests

Two new regression cases:

  • 384-dim × 5 vectors round-trip (within pool threshold, hits the decode bug — fails without the fix as 2048-dim mismatch)
  • subarray sub-view encode (hits the encode bug)

Full vector-index.test.ts suite: 11/11 pass.

Summary by CodeRabbit

  • Bug Fixes

    • Improved vector index serialization to correctly handle pooled and sliced Float32Array instances, preventing data corruption during round-trip operations.
  • Tests

    • Added regression tests verifying proper vector index serialization and deserialization behavior with various array source types.

Review Change Stack

Buffer.from(b64, 'base64') returns a slice of Node's shared 8KB pool, and new Float32Array(buf.buffer) ignores byteOffset/byteLength — minting a 2048-element view over the entire pool. Same hazard on the encode side when the source Float32Array is itself a sub-view (e.g. .subarray() or a typed-array set into a larger buffer).

The encode path now passes byteOffset/byteLength explicitly; decode mints the view at the correct offset with length scaled by Float32Array.BYTES_PER_ELEMENT.

Reported as 'dimensions seen on disk: 2048' index-startup crashes in #455 / #469 / #584 / #587.

Two regression tests added:
- 384-dim x 5 vectors round-trip (within pool threshold, hits the decode bug)
- subarray sub-view encode (hits the encode bug)
@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agentmemory Ready Ready Preview, Comment May 27, 2026 4:53pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

Walkthrough

This PR fixes a bug in Float32Array serialization where encoding/decoding the entire underlying buffer could create incorrectly sized "phantom" views from shared Buffer pool slices. The fix explicitly preserves byteOffset and byteLength boundaries during base64 round-trips. Two new regression tests validate the fix for pooled and sliced Float32Array scenarios.

Changes

Float32Array Serialization Fix

Layer / File(s) Summary
Float32Array base64 encoding with slice preservation
src/state/vector-index.ts
float32ToBase64 encodes using explicit byteOffset and byteLength; base64ToFloat32 decodes into a Float32Array view sized from the decoded buffer's byteOffset and element count, replacing the prior buffer-wide approach.
Serialization round-trip regression tests
test/vector-index.test.ts
Two tests verify that serialize→deserialize round-trips preserve vector identity and sliced byte content, validating searchability, dimension validation, and cosine similarity for pooled and sliced Float32Array cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

  • #587: Directly implements the fix to float32ToBase64/base64ToFloat32 to preserve Float32Array byteOffset/byteLength and avoid Buffer pool–sourced phantom views.
  • #584: Addresses the phantom pooled-Buffer/Float32Array view bug with the exact buffer-slice fixes applied in this PR.

Poem

🐰 A Float32 slice, once phantom and lost,
Is anchored at last by byteOffset crossed.
The vector dreams true through serialization,
No more Buffer ghosts from shared pool creation!
Round-trip tests dance with near-perfect precision.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: fixing base64 round-trip encoding/decoding to preserve byteOffset and byteLength in Float32Array serialization, which directly addresses the core bug described in the PR objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/587-vector-buffer-slice

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/state/vector-index.ts (1)

1-7: ⚡ Quick win

Trim WHAT-level implementation narration in src code.

Please reduce this to a minimal WHY note (or remove it) to align with the project’s src comment style.

As per coding guidelines "Avoid code comments explaining WHAT — use clear naming instead".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/state/vector-index.ts` around lines 1 - 7, The comment block in
src/state/vector-index.ts contains WHAT-level implementation details about
Node's Buffer pool and Float32Array slicing; replace it with a minimal WHY-only
note (e.g., "Pass byteOffset and byteLength to preserve correct slice bounds and
avoid Buffer pool/Float32Array view issues") or remove the comment entirely so
it follows src comment style; locate the comment near the Buffer.from(...) / new
Float32Array(...) usage and update that comment accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/state/vector-index.ts`:
- Around line 1-7: The comment block in src/state/vector-index.ts contains
WHAT-level implementation details about Node's Buffer pool and Float32Array
slicing; replace it with a minimal WHY-only note (e.g., "Pass byteOffset and
byteLength to preserve correct slice bounds and avoid Buffer pool/Float32Array
view issues") or remove the comment entirely so it follows src comment style;
locate the comment near the Buffer.from(...) / new Float32Array(...) usage and
update that comment accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc010e5b-082a-4805-b47d-3bc99e83135a

📥 Commits

Reviewing files that changed from the base of the PR and between 4dd1dde and a395f92.

📒 Files selected for processing (2)
  • src/state/vector-index.ts
  • test/vector-index.test.ts

@rohitg00 rohitg00 merged commit e68f069 into main May 27, 2026
7 checks passed
@rohitg00 rohitg00 deleted the fix/587-vector-buffer-slice branch May 27, 2026 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant