Skip to content

ConcatenateMemory rents from CharArrayPool but allocates a fresh char[] anyway, defeating the pool #119

@stevehansen

Description

@stevehansen

Problem

ConcatenateMemory in Csv/CsvReader.cs (lines 354–374) is used to splice multiline CSV continuation lines on the ReadFromMemoryOptimized path. The intent is clearly "use the CsvMemoryOptions.CharArrayPool to avoid GC pressure during multiline concat" — but the implementation actually does both:

var buffer = memoryOptions.CharArrayPool.Rent(totalLength);
try
{
    var span = buffer.AsSpan();
    first.Span.CopyTo(span);
    separator.Span.CopyTo(span.Slice(first.Length));
    second.Span.CopyTo(span.Slice(first.Length + separator.Length));

    var result = new char[totalLength];          // <-- fresh heap allocation
    span.Slice(0, totalLength).CopyTo(result);   // <-- second copy
    return result.AsMemory();
}
finally
{
    memoryOptions.CharArrayPool.Return(buffer);
}

The pooled buffer is rented, written to, then copied into a brand-new char[] which is returned to the caller. The rented buffer is returned to the pool, having served only as scratch space that the surrounding code could have computed without renting at all.

Net effect:

  • One Rent + one Return per multiline continuation step (real cost in ArrayPool<char>.Shared overhead).
  • One fresh new char[totalLength] per step regardless — the same allocation pooling was meant to avoid.
  • An extra memory copy compared to a straightforward new char[totalLength] + three CopyTo calls.

So the pooled path is currently slower and uses more allocation pressure than a simple non-pooled implementation would. The string-based paths use string +=, which has its own allocation pattern but at least doesn't pretend to pool.

Why this wasn't caught

The intent matches the surrounding CsvMemoryOptions design (CharArrayPool, EnableZeroCopy, etc.). The code reads as if pooling, but the final new char[] step undoes the work. No benchmarks compare ReadFromMemoryOptimized allocations against ReadFromMemoryOptimized with this method bypassed, so the regression has been invisible.

This was flagged during the #118 refactor (issue #118 deliberately preserved the existing behavior verbatim to keep that diff scoped to consolidation, not optimization) but deserves its own issue.

Proposed fix

Two options, pick based on benchmark results:

Option A — drop the pool, allocate directly. If multiline continuation is rare (the common case is single-line records), the pool overhead exceeds its benefit:

var result = new char[totalLength];
var resultSpan = result.AsSpan();
first.Span.CopyTo(resultSpan);
separator.Span.CopyTo(resultSpan.Slice(first.Length));
second.Span.CopyTo(resultSpan.Slice(first.Length + separator.Length));
return result.AsMemory();

Simpler, one allocation, no pool churn.

Option B — actually pool the returned buffer. Return a ReadOnlyMemory<char> backed by the rented buffer, and have the caller (or a containing scope) Return it when done with the row. This requires lifetime tracking — the rented buffer must outlive the row record that views into it, and must be returned exactly once. This is the more invasive change and probably only worth doing if Option A's allocation profile still shows up in profiles.

Where this lives after #118

In Csv/CsvReader.Engine.cs, inside MemorySliceLineSource.Concat (and possibly MemoryReaderLineSource.Concat if it inherits the same pattern). The #118 refactor consolidated five copies of multiline-concat logic into these struct adapters, so the fix is now contained.

Acceptance

  • BenchmarkDotNet [MemoryDiagnoser] on ReadFromMemoryOptimized with a CSV containing multiline quoted fields shows fewer Gen 0 allocations after the fix than before.
  • No ArrayPool<char>.Shared.Rent call survives in the multiline-concat path unless Option B is chosen and the lifetime is correctly threaded.

Out of scope

  • Removing or redesigning CsvMemoryOptions. The other settings (StreamingThreshold, EnableZeroCopy, DirectAllocationThreshold, UseVectorization) may have similar "named-but-unused" issues but are not addressed here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions