
Fix misc memory bugs [main] #1752

Merged
badrishc merged 1 commit into main from badrishc/memory-fixes-main
Apr 30, 2026

@badrishc
Collaborator

Summary

Two callsites that invoke ZADD internally to populate a destination sorted set were leaking the IMemoryOwner<byte> that the ZADD backend rents from MemoryPool<byte>.Shared.

Root cause

The ZADD backend (SortedSetAdd) writes its integer reply through RespMemoryWriter. When the caller passes a default GarnetObjectStoreOutput (whose SpanByte length is 0), the very first WriteInt32 triggers ReallocateOutput, which rents a MemoryPool<byte> buffer (minimum ~512 bytes) and assigns it to
zAddOutput.SpanByteAndMemory.Memory.

Both callsites had:

var zAddOutput = new GarnetObjectStoreOutput();
RMWObjectStoreOperationWithOutput(destinationKey, ref zAddInput, ..., ref zAddOutput);
// zAddOutput never disposed

Neither disposed the resulting Memory. Under heavy GEO*STORE/ZUNIONSTORE/ZINTERSTORE traffic this caused real MemoryPool churn and GC pressure.
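To make the root cause concrete, here is a minimal, hypothetical model of the "write inline, else rent" behaviour described above. It is not Garnet's RespMemoryWriter: `WriteIntegerReply` and its parameters are invented for illustration, and the 512-byte floor only mirrors the minimum mentioned above. It shows why a zero-length destination (as with a default output) always yields a pooled buffer that the caller owns and must dispose.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Text;

// Hypothetical model, NOT Garnet's RespMemoryWriter.
// Writes a RESP integer reply (":<value>\r\n") into `inlineSpace` if it fits.
// Otherwise it rents a pooled buffer and transfers ownership to the caller
// through `rented`; the caller must Dispose() it to return it to the pool.
void WriteIntegerReply(Span<byte> inlineSpace, int value,
                       out IMemoryOwner<byte>? rented, out int length)
{
    byte[] reply = Encoding.ASCII.GetBytes(":" + value + "\r\n");
    length = reply.Length;
    if (reply.Length <= inlineSpace.Length)
    {
        reply.AsSpan().CopyTo(inlineSpace); // fits inline: nothing to dispose
        rented = null;
        return;
    }
    // Does not fit (always the case for a zero-length destination):
    // rent at least 512 bytes, mirroring the minimum described above.
    rented = MemoryPool<byte>.Shared.Rent(Math.Max(512, reply.Length));
    reply.AsSpan().CopyTo(rented.Memory.Span);
}

// Usage: an empty destination, like a default output, forces a rental.
WriteIntegerReply(Span<byte>.Empty, 3, out IMemoryOwner<byte>? reply, out int replyLength);
Console.WriteLine(reply is null ? "inline" : $"rented {reply.Memory.Length} bytes for {replyLength}");
reply?.Dispose(); // omitting this line is exactly the leak described above
```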

Fix

Wrap each call in a try/finally block that disposes the Memory when !IsSpanByte:

var zAddOutput = new GarnetObjectStoreOutput();
try
{
    RMWObjectStoreOperationWithOutput(destinationKey, ref zAddInput, ..., ref zAddOutput);
    // ... use zAddOutput as needed ...
}
finally
{
    if (!zAddOutput.SpanByteAndMemory.IsSpanByte)
        zAddOutput.SpanByteAndMemory.Memory?.Dispose();
}
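The try/finally pattern can be exercised in isolation. This is a hedged, self-contained sketch: `FakeInternalZAdd` and `CallInternalZAddAndRelease` are hypothetical stand-ins, and a pair of locals (`isSpanByte`, `memory`) models SpanByteAndMemory instead of using Garnet's real type.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical stand-in for the internal ZADD call: like the backend described
// above, it has no inline space, so it rents a pooled buffer for its reply and
// hands ownership to the caller (the output stops being span-backed).
void FakeInternalZAdd(ref bool isSpanByte, ref IMemoryOwner<byte>? memory)
{
    memory = MemoryPool<byte>.Shared.Rent(512);
    isSpanByte = false;
}

// The fix pattern: the caller owns whatever the callee rented and releases it
// in `finally`, which runs even if the call or later processing throws.
IMemoryOwner<byte>? CallInternalZAddAndRelease()
{
    bool isSpanByte = true;            // a default output starts span-backed
    IMemoryOwner<byte>? memory = null;
    try
    {
        FakeInternalZAdd(ref isSpanByte, ref memory);
        // ... inspect the reply here if needed ...
    }
    finally
    {
        if (!isSpanByte)
            memory?.Dispose();         // return the buffer to the pool
    }
    return memory; // returned only so a caller can observe that it was disposed
}
```

Because the disposal sits in `finally`, the buffer goes back to the pool on both the normal and the exceptional path.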

Files changed

  • libs/server/Storage/Session/ObjectStore/SortedSetGeoOps.cs (GEO*STORE)
  • libs/server/Storage/Session/ObjectStore/SortedSetOps.cs (ZUNIONSTORE, ZINTERSTORE)

Validation

  • All 312 RespSortedSet + RespSortedSetGeo tests pass
  • dotnet format --verify-no-changes clean

Note on related fixes on dev

This is a subset of the fixes on the companion dev branch (badrishc/memory-fixes). The other fixes from that branch were intentionally not ported here because they do not apply to main:

  • The BITOP overflow-pointer fix relies on the ISourceLogRecord / LogRecord / OverflowByteArray model that exists only on dev. On main the BITOP backend operates on SpanByte values that are always pinned in log memory, so the use-after-fixed bug fixed on dev (a pointer used after its fixed block has exited) does not exist on main.
  • The PFCOUNT/PFMERGE bounds-check tightening from dev is unnecessary on main because the main backend already validates value.Length <= dst.Length before the Buffer.MemoryCopy, so the call with the wrong capacity argument is only reached when the value fits in dst.

Copilot AI review requested due to automatic review settings April 29, 2026 20:43
Contributor

Copilot AI left a comment


Pull request overview

Fixes pooled-buffer leaks when internal ZADD operations are invoked to populate destination sorted sets, by ensuring heap-backed outputs are properly disposed.

Changes:

  • Wrap internal RMWObjectStoreOperationWithOutput(... ZADD ...) calls in try/finally blocks.
  • Dispose GarnetObjectStoreOutput.SpanByteAndMemory.Memory (when heap-backed) to return rented MemoryPool<byte> buffers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
libs/server/Storage/Session/ObjectStore/SortedSetOps.cs | Disposes internal ZADD output for ZUNIONSTORE/ZINTERSTORE-style range-store flow to prevent MemoryPool buffer leaks.
libs/server/Storage/Session/ObjectStore/SortedSetGeoOps.cs | Disposes internal ZADD output for GEO*STORE flow to prevent MemoryPool buffer leaks.


Comment thread libs/server/Storage/Session/ObjectStore/SortedSetOps.cs
Comment thread libs/server/Storage/Session/ObjectStore/SortedSetGeoOps.cs
@badrishc badrishc force-pushed the badrishc/memory-fixes-main branch 2 times, most recently from a6561e6 to ef0d985 April 30, 2026 02:00
## Summary

Fixes two correctness bugs: pooled-buffer leaks in the sorted-set object
store, and broken pending-completion epoch tracking in the BITOP read loop.

## 1. Sorted-set `IMemoryOwner` leaks in `GEO*STORE` / `ZUNIONSTORE` / `ZINTERSTORE`

Pooled-buffer leaks in the outputs of three backends (`GeoSearch`,
`SortedSetRange`, and the internal `ZADD`'s `SortedSetAdd`). Each writes
its reply via `RespMemoryWriter`, which, when handed a default
(zero-length) `SpanByte`, rents a `MemoryPool<byte>` buffer (≥512 bytes)
and assigns it to `.SpanByteAndMemory.Memory`:

- **`SortedSetGeoOps.cs` (`GEO*STORE`)**: the `searchOutMem.Memory`
  from `GeoSearch` was leaked (only `searchOutHandler`, the
  `MemoryHandle` from `Pin()`, was disposed), and the output of the
  internal `ZADD` invocation (`zAddOutput`) was never disposed at all.
- **`SortedSetOps.cs` (`ZUNIONSTORE` / `ZINTERSTORE`)**: the same
  pattern. `rangeOutputMem.Memory` from `SortedSetRange` was leaked, and
  so was the internal `ZADD`'s `zAddOutput.SpanByteAndMemory.Memory`.

Under heavy `GEO*STORE` / `ZUNIONSTORE` / `ZINTERSTORE` traffic this
caused real `MemoryPool` churn and GC pressure.

### Fix

- For each `*STORE`-style internal `ZADD`, wrap the
  `RMWObjectStoreOperationWithOutput` call in a `try` / `finally` that
  disposes `zAddOutput.SpanByteAndMemory.Memory` if `!IsSpanByte`.
- Extend the existing `finally` blocks that dispose `*Handler` (the
  `MemoryHandle`) to also dispose the underlying `*Mem.Memory`
  (the `IMemoryOwner<byte>`).
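The second bullet hinges on the difference between a `MemoryHandle` and the `IMemoryOwner<byte>` it was pinned from. A minimal sketch using only the .NET `System.Buffers` types; `ReadPinnedReplyAndRelease` is a hypothetical name, and the rented buffer stands in for `searchOutMem.Memory` / `rangeOutputMem.Memory`:

```csharp
using System;
using System.Buffers;

// Sketch under assumptions, not Garnet's code: a reply lands in a rented
// IMemoryOwner<byte> (the "*Mem.Memory"), and the caller pins it (the
// "*Handler") so it can be read through a pointer.
IMemoryOwner<byte> ReadPinnedReplyAndRelease()
{
    IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(512);
    MemoryHandle handle = default;
    try
    {
        handle = owner.Memory.Pin();
        // ... parse the reply through the pinned memory here ...
    }
    finally
    {
        // Disposing the MemoryHandle only releases the pin;
        // it does NOT return the buffer to the pool.
        handle.Dispose();
        // Disposing the owner is what returns the array to the pool.
        // This is the step the original finally blocks were missing.
        owner.Dispose();
    }
    return owner; // returned only so a caller can observe that it was disposed
}
```

Unpinning before disposing the owner means the buffer is never back in the pool while a pointer into it is still pinned.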

## 2. BITOP: pending-completion epoch tracking was broken

`StorageSession.HeadAddress` was a `readonly long` field captured at
session-construction time and never updated.
`MainStoreOps.ReadWithUnsafeContext` compared it against a local copy
of the same frozen value (`HeadAddress == localHeadAddress`) to decide
whether to set `epochChanged = true` after pending completion. Two bugs:

1. The field is frozen, so the check was meaningless — the live store
   `HeadAddress` was never consulted.
2. The condition was also **inverted**: it set `epochChanged = true`
   when the addresses were equal (i.e., head did NOT move), the
   opposite of what the comment said.
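The inversion in point 2 is easiest to see side by side. An illustrative sketch only: `OriginalCheck` and `IntendedCheck` are hypothetical names, and the real code compares a field and a local inside `ReadWithUnsafeContext` rather than calling helpers.

```csharp
using System;

// `capturedHead` is the value saved before the read; the second argument is
// the value consulted after pending completion.

// Original logic as described above: flags a change when the values are EQUAL.
// With a frozen readonly field, both values were always equal, so this
// always evaluated to true.
bool OriginalCheck(long capturedHead, long headNow) => headNow == capturedHead;

// Intended logic: pointers captured before the wait are stale only if the
// live head address has moved (it only ever moves forward).
bool IntendedCheck(long capturedHead, long liveHeadNow) => liveHeadNow != capturedHead;
```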

In addition, `Read` can return synchronously with a pointer into the
**read cache** (a separate log with its own `HeadAddress` that can be
evicted independently of the main log). The original check would not
detect read-cache eviction.

### Fix

- Removed the stale `StorageSession.HeadAddress` field.
- Added `ClientSession.HeadAddress` and `ClientSession.ReadCacheHeadAddress`
  accessors that read the live values from `store.Log.HeadAddress` /
  `store.ReadCache?.HeadAddress`.
- `ReadWithUnsafeContext` now captures both addresses at the start of
  the BITOP loop and, after pending completion, sets `epochChanged = true`
  if **either** has advanced — correctly invalidating any pointers
  captured into either log.
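The corrected flow can be modelled as "capture both heads, wait, re-read both live values". This is an assumed, simplified model rather than the actual `ReadWithUnsafeContext` code: `PointersInvalidated` is hypothetical, the `Func<>` delegates stand in for the new live `ClientSession.HeadAddress` / `ClientSession.ReadCacheHeadAddress` accessors, and a `null` read-cache value is assumed to mean that no read cache is configured.

```csharp
using System;

// Hypothetical model of the corrected check.
bool PointersInvalidated(
    long capturedHead, long? capturedReadCacheHead,
    Func<long> liveHead, Func<long?> liveReadCacheHead)
{
    // Re-read the live values after pending completion. A value captured once
    // into a field (the old bug) would never change here.
    bool mainMoved = liveHead() != capturedHead;
    bool readCacheMoved = liveReadCacheHead() != capturedReadCacheHead;
    // Either log may have evicted the record a captured pointer refers to.
    return mainMoved || readCacheMoved;
}

// Usage: simulate read-cache eviction while a read is pending.
long head = 1000, readCacheHead = 2000;   // simulated live values
long savedHead = head;                    // captured at the start of the loop
long? savedReadCacheHead = readCacheHead;
readCacheHead = 2064;                     // read cache evicts during the wait
bool epochChanged = PointersInvalidated(savedHead, savedReadCacheHead,
                                        () => head, () => readCacheHead);
Console.WriteLine($"epochChanged = {epochChanged}");
```

Checking only the main log here would miss the eviction, which is the read-cache gap described above.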

## Files changed

- `libs/storage/Tsavorite/cs/src/core/ClientSession/ClientSession.cs` —
  new `HeadAddress` and `ReadCacheHeadAddress` live accessors
- `libs/server/Storage/Session/StorageSession.cs` — removed stale
  `HeadAddress` field
- `libs/server/Storage/Session/MainStore/MainStoreOps.cs` —
  `ReadWithUnsafeContext` uses live addresses with the corrected
  comparison; signature now also takes `localReadCacheHeadAddress`
- `libs/server/Storage/Session/MainStore/BitmapOps.cs` — captures both
  live head addresses; passes them to `ReadWithUnsafeContext`
- `libs/server/Storage/Session/ObjectStore/SortedSetGeoOps.cs` — leak
  fixes (`searchOutMem` + `zAddOutput`)
- `libs/server/Storage/Session/ObjectStore/SortedSetOps.cs` — leak
  fixes (`rangeOutputMem` + `zAddOutput`)

## Validation

- All 660 sorted-set + geo + bitmap tests pass on `main`
- `dotnet format --verify-no-changes` clean

## Note on related fixes on `dev`

This is a subset of the fixes on the companion `dev` branch
(`badrishc/memory-fixes`). Other fixes from that branch were
intentionally **not** ported here because they do not apply to `main`:

- The BITOP overflow-pointer fix relies on the `ISourceLogRecord` /
  `LogRecord` / `OverflowByteArray` model that exists only on `dev`. On
  `main` the BITOP backend operates on `SpanByte` values that are always
  pinned in log memory, so the use-after-`fixed` bug fixed on `dev`
  (a pointer used after its `fixed` block has exited) does not exist on
  `main`.
- The PFCOUNT/PFMERGE bounds-check tightening from `dev` is unnecessary
  on `main` because the `main` backend already validates
  `value.Length <= dst.Length` *before* the `Buffer.MemoryCopy`, so the
  call with the wrong capacity argument is only reached when the value
  fits in `dst`.
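The gating argument can be illustrated with a small sketch. It is not the `main` PFCOUNT/PFMERGE code: `TryCopyValue` is hypothetical, and it swaps the pointer-based `Buffer.MemoryCopy` for a safe `Span<byte>` copy so it runs without `unsafe`. What matters is the order: the length check runs first, so the copy is reached only when the value fits.

```csharp
using System;

// Hypothetical illustration of "validate, then copy".
bool TryCopyValue(ReadOnlySpan<byte> value, Span<byte> dst)
{
    // The gate: nothing is copied unless the whole value fits in dst. With
    // this check in front, an overstated capacity argument on the copy call
    // cannot lead to writing past the end of dst.
    if (value.Length > dst.Length)
        return false;
    value.CopyTo(dst);
    return true;
}
```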

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@badrishc badrishc force-pushed the badrishc/memory-fixes-main branch from ef0d985 to b5c8b57 April 30, 2026 03:23
@badrishc badrishc merged commit 4ddd785 into main Apr 30, 2026
34 checks passed
@badrishc badrishc deleted the badrishc/memory-fixes-main branch April 30, 2026 04:33

3 participants