Problem
ConcatenateMemory in Csv/CsvReader.cs (lines 354–374) is used to splice multiline CSV continuation lines on the ReadFromMemoryOptimized path. The intent is clearly "use the CsvMemoryOptions.CharArrayPool to avoid GC pressure during multiline concat" — but the implementation actually does both:
var buffer = memoryOptions.CharArrayPool.Rent(totalLength);
try
{
var span = buffer.AsSpan();
first.Span.CopyTo(span);
separator.Span.CopyTo(span.Slice(first.Length));
second.Span.CopyTo(span.Slice(first.Length + separator.Length));
var result = new char[totalLength]; // <-- fresh heap allocation
span.Slice(0, totalLength).CopyTo(result); // <-- second copy
return result.AsMemory();
}
finally
{
memoryOptions.CharArrayPool.Return(buffer);
}
The pooled buffer is rented, written to, then copied into a brand-new char[] which is returned to the caller. The rented buffer is returned to the pool, having served only as scratch space that the surrounding code could have computed without renting at all.
Net effect:
- One
Rent + one Return per multiline continuation step (real cost in ArrayPool<char>.Shared overhead).
- One fresh
new char[totalLength] per step regardless — the same allocation pooling was meant to avoid.
- An extra memory copy compared to a straightforward
new char[totalLength] + three CopyTo calls.
So the pooled path is currently slower and uses more allocation pressure than a simple non-pooled implementation would. The string-based paths use string +=, which has its own allocation pattern but at least doesn't pretend to pool.
Why this wasn't caught
The intent matches the surrounding CsvMemoryOptions design (CharArrayPool, EnableZeroCopy, etc.). The code reads as if pooling, but the final new char[] step undoes the work. No benchmarks compare ReadFromMemoryOptimized allocations against ReadFromMemoryOptimized with this method bypassed, so the regression has been invisible.
This was flagged during the #118 refactor (issue #118 deliberately preserved the existing behavior verbatim to keep that diff scoped to consolidation, not optimization) but deserves its own issue.
Proposed fix
Two options, pick based on benchmark results:
Option A — drop the pool, allocate directly. If multiline continuation is rare (the common case is single-line records), the pool overhead exceeds its benefit:
var result = new char[totalLength];
var resultSpan = result.AsSpan();
first.Span.CopyTo(resultSpan);
separator.Span.CopyTo(resultSpan.Slice(first.Length));
second.Span.CopyTo(resultSpan.Slice(first.Length + separator.Length));
return result.AsMemory();
Simpler, one allocation, no pool churn.
Option B — actually pool the returned buffer. Return a ReadOnlyMemory<char> backed by the rented buffer, and have the caller (or a containing scope) Return it when done with the row. This requires lifetime tracking — the rented buffer must outlive the row record that views into it, and must be returned exactly once. This is the more invasive change and probably only worth doing if Option A's allocation profile still shows up in profiles.
Where this lives after #118
In Csv/CsvReader.Engine.cs, inside MemorySliceLineSource.Concat (and possibly MemoryReaderLineSource.Concat if it inherits the same pattern). The #118 refactor consolidated five copies of multiline-concat logic into these struct adapters, so the fix is now contained.
Acceptance
- BenchmarkDotNet
[MemoryDiagnoser] on ReadFromMemoryOptimized with a CSV containing multiline quoted fields shows fewer Gen 0 allocations after the fix than before.
- No
ArrayPool<char>.Shared.Rent call survives in the multiline-concat path unless Option B is chosen and the lifetime is correctly threaded.
Out of scope
- Removing or redesigning
CsvMemoryOptions. The other settings (StreamingThreshold, EnableZeroCopy, DirectAllocationThreshold, UseVectorization) may have similar "named-but-unused" issues but are not addressed here.
Problem
ConcatenateMemoryinCsv/CsvReader.cs(lines 354–374) is used to splice multiline CSV continuation lines on theReadFromMemoryOptimizedpath. The intent is clearly "use theCsvMemoryOptions.CharArrayPoolto avoid GC pressure during multiline concat" — but the implementation actually does both:The pooled buffer is rented, written to, then copied into a brand-new
char[]which is returned to the caller. The rented buffer is returned to the pool, having served only as scratch space that the surrounding code could have computed without renting at all.Net effect:
Rent+ oneReturnper multiline continuation step (real cost inArrayPool<char>.Sharedoverhead).new char[totalLength]per step regardless — the same allocation pooling was meant to avoid.new char[totalLength]+ threeCopyTocalls.So the pooled path is currently slower and uses more allocation pressure than a simple non-pooled implementation would. The string-based paths use
string +=, which has its own allocation pattern but at least doesn't pretend to pool.Why this wasn't caught
The intent matches the surrounding
CsvMemoryOptionsdesign (CharArrayPool,EnableZeroCopy, etc.). The code reads as if pooling, but the finalnew char[]step undoes the work. No benchmarks compareReadFromMemoryOptimizedallocations againstReadFromMemoryOptimizedwith this method bypassed, so the regression has been invisible.This was flagged during the #118 refactor (issue #118 deliberately preserved the existing behavior verbatim to keep that diff scoped to consolidation, not optimization) but deserves its own issue.
Proposed fix
Two options, pick based on benchmark results:
Option A — drop the pool, allocate directly. If multiline continuation is rare (the common case is single-line records), the pool overhead exceeds its benefit:
Simpler, one allocation, no pool churn.
Option B — actually pool the returned buffer. Return a
ReadOnlyMemory<char>backed by the rented buffer, and have the caller (or a containing scope)Returnit when done with the row. This requires lifetime tracking — the rented buffer must outlive the row record that views into it, and must be returned exactly once. This is the more invasive change and probably only worth doing if Option A's allocation profile still shows up in profiles.Where this lives after #118
In
Csv/CsvReader.Engine.cs, insideMemorySliceLineSource.Concat(and possiblyMemoryReaderLineSource.Concatif it inherits the same pattern). The #118 refactor consolidated five copies of multiline-concat logic into these struct adapters, so the fix is now contained.Acceptance
[MemoryDiagnoser]onReadFromMemoryOptimizedwith a CSV containing multiline quoted fields shows fewerGen 0allocations after the fix than before.ArrayPool<char>.Shared.Rentcall survives in the multiline-concat path unless Option B is chosen and the lifetime is correctly threaded.Out of scope
CsvMemoryOptions. The other settings (StreamingThreshold,EnableZeroCopy,DirectAllocationThreshold,UseVectorization) may have similar "named-but-unused" issues but are not addressed here.