refactor: align internal terminology with ubiquitous language by stevehansen · Pull Request #127 · stevehansen/csv

stevehansen · 2026-05-30T11:28:15Z

What

Aligns the codebase to a single documented vocabulary (record / physical line / field / value / quoting), captured in a new UBIQUITOUS_LANGUAGE.md. This came out of a glossary pass that surfaced six pervasive terminology conflicts; this PR fixes the ones that are safe to fix now.

Why

Csv is a public package with millions of downloads, so renaming a public member is a SemVer-major break. The changes here are scoped by blast radius so nothing in the public contract moves:

| Bucket | Action |
| --- | --- |
| 🟢 Internal / private identifiers | Renamed (visible only to `Csv.Tests` via `InternalsVisibleTo` → zero ecosystem impact) |
| 🟡 Public XML-doc text | Reworded to canonical terms (not part of the binary/source contract) |
| 🔴 Public member names | Untouched — deferred to a future vNext behind `[Obsolete]` forwarders |

Changes

Internal renames

Reader record classes: rawSplitLine→rawFields, RawSplitLine→RawFields, parsedLine→parsedValues, and the private property literally named Line (which returned the parsed field array) → ParsedValues.
Writer escape-vs-quote fix: FixedEscapeChars→QuoteTriggerChars, escapeChars→quoteTriggerChars, needsGeneralEscape→needsQuoting, and the wrap-the-field escape flag → mustQuote. needsQuoteEscape is kept (it genuinely means quote-doubling). cell/WriteCell/WriteRow/WriteLine are kept for consistency with the public writer API.
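
To illustrate the quoting-vs-escaping distinction behind these renames, here is a minimal sketch. `QuotingSketch`, `FormatField`, and the simplified trigger set are hypothetical names chosen for illustration and are not the library's actual code; only the local names `mustQuote` and `needsQuoteEscape` are borrowed from the renames above.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Csv package.
public static class QuotingSketch
{
    // Simplified, assumed trigger set: characters that force the field to be wrapped in quotes.
    private static readonly char[] QuoteTriggerChars = { '\n', '\r' };

    public static string FormatField(string value, char separator)
    {
        // "Escaping" in the glossary sense: an embedded quote has to be doubled.
        var needsQuoteEscape = value.IndexOf('"') >= 0;

        // "Quoting": the whole field has to be wrapped in quotes.
        var mustQuote = needsQuoteEscape
            || value.IndexOf(separator) >= 0
            || value.IndexOfAny(QuoteTriggerChars) >= 0;

        if (needsQuoteEscape)
            value = value.Replace("\"", "\"\"");

        return mustQuote ? "\"" + value + "\"" : value;
    }
}
```

In this sketch a field containing only the separator is quoted but not escaped, while a field containing a quote character is both escaped (the quote is doubled) and quoted.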

Doc rewording (non-breaking)

ColumnCount now documents "number of fields in this record"; ValidateColumnCount matches "field count per row"; Read* summaries say "Reads the records"; int indexers and ICsvLineSpan.GetSpan/GetMemory/TryGet* document a "field index"; CsvBufferWriter.WriteCell documents "quoting and escaping".

New file

UBIQUITOUS_LANGUAGE.md — the glossary, the blast-radius analysis, what changed in this pass, and a vNext rename-target table for the frozen public names (ColumnCount→FieldCount, ValidateColumnCount→ValidateFieldCount, LineHasColumn→RecordHasValue, ICsvLine→ICsvRecord).
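
For readers unfamiliar with the `[Obsolete]` forwarder pattern mentioned for vNext, a rough sketch follows. `SketchRecord` is a hypothetical stand-in type, not a class from this package, and the obsolete message text is an assumption; only the `ColumnCount`→`FieldCount` pairing comes from the rename-target table.

```csharp
using System;

// Hypothetical stand-in for a record type; the real vNext shape is not decided in this PR.
public sealed class SketchRecord
{
    private readonly string[] values;

    public SketchRecord(string[] values) => this.values = values;

    // New canonical name taken from the rename-target table.
    public int FieldCount => values.Length;

    // Old name kept as a forwarder: existing callers still compile, but get a warning.
    [Obsolete("Use FieldCount instead.")]
    public int ColumnCount => FieldCount;
}
```

The old member stays in place and simply delegates to the new one, so callers get a compiler warning and time to migrate before the old name is removed.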

Verification

Builds on netstandard2.0 / net8.0 / net9.0, 0 errors.
All 179 tests pass.
No public API surface changed (renames are internal/private; only XML-doc text and internal identifiers were touched).

🤖 Generated with Claude Code

Internal/private identifiers and XML-doc comments now follow a single documented vocabulary (record / physical line / field / value / quoting). No public API changes — every rename is internal or doc-only.

- Reader record classes: rawSplitLine->rawFields, parsedLine->parsedValues, and the private `Line` property (which returned the parsed field array) ->ParsedValues.
- Writer: fix the escape-vs-quote naming (FixedEscapeChars->QuoteTriggerChars, needsGeneralEscape->needsQuoting, the wrap-the-field `escape` flag->mustQuote). Kept cell/WriteCell/WriteRow for consistency with the public writer API.
- Reword misleading public XML docs (ColumnCount counts fields, Read* yields records, int indexers take a field index, WriteCell does quoting and escaping).
- Add UBIQUITOUS_LANGUAGE.md: the glossary, blast-radius analysis, and a vNext rename-target list for the frozen public names.

Builds on netstandard2.0/net8.0/net9.0; all 179 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request aligns the codebase and XML documentation with a newly defined 'Ubiquitous Language' glossary, renaming internal fields and updating comments to consistently use terms like 'record', 'field', and 'quoting' instead of 'line', 'cell', and 'escaping'. Feedback points out a critical inconsistency where \r is omitted from the quote-trigger characters in CsvWriter (unlike CsvBufferWriter), which could result in malformed CSVs. Additionally, the reviewer recommends caching the quote-trigger characters in a static array to avoid unnecessary per-row allocations in the writer's serialization methods.

gemini-code-assist · 2026-05-30T11:29:31Z

 #if NET8_0_OR_GREATER
        // The separator is per-call so it can't be baked into a single cached SearchValues.
-        // Keep the fixed escape chars cached and check the separator with a separate Contains.
+        // Keep the fixed quote-trigger chars cached and check the separator with a separate Contains.
        // Without this caching, MemoryExtensions.IndexOfAny(ReadOnlySpan, ReadOnlySpan)/char[]
        // builds a fresh SearchValues<char> on the heap every call.
-        private static readonly SearchValues<char> FixedEscapeChars = SearchValues.Create("'\n");
+        private static readonly SearchValues<char> QuoteTriggerChars = SearchValues.Create("'\n");
 #endif


There is an inconsistency in the quote-trigger characters between CsvWriter and CsvBufferWriter. CsvBufferWriter correctly includes \r (carriage return) in its QuoteTriggerChars ("'\n\r"), but CsvWriter only includes '\n in its QuoteTriggerChars and quoteTriggerChars arrays.

Omitting \r can cause fields containing carriage returns to be written without quotes, resulting in malformed CSV files when read by standard parsers.

Additionally, we can define a static FixedQuoteTriggerChars array here to avoid allocating a new char[] array on every row write in WriteLine and WriteLineAsync.

#if NET8_0_OR_GREATER
        // The separator is per-call so it can't be baked into a single cached SearchValues.
        // Keep the fixed quote-trigger chars cached and check the separator with a separate Contains.
        // Without this caching, MemoryExtensions.IndexOfAny(ReadOnlySpan, ReadOnlySpan)/char[]
        // builds a fresh SearchValues<char> on the heap every call.
        private static readonly SearchValues<char> QuoteTriggerChars = SearchValues.Create("'\n\r");
#endif
        private static readonly char[] FixedQuoteTriggerChars = { '\'', '\n', '\r' };
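
To make the `\r` point concrete, here is a small self-contained sketch of the quoting decision with and without the carriage return in the trigger set. `QuoteTriggerSketch`, `NeedsQuoting`, `WithoutCr`, and `WithCr` are hypothetical names for illustration; the two arrays only mirror the fixed trigger characters described in this comment and are not copied from the library.

```csharp
using System;

// Hypothetical illustration of the quoting decision; not the library's implementation.
public static class QuoteTriggerSketch
{
    // Fixed triggers as described for CsvWriter in this review: no '\r'.
    public static readonly char[] WithoutCr = { '\'', '\n' };

    // Fixed triggers as described for CsvBufferWriter: includes '\r'.
    public static readonly char[] WithCr = { '\'', '\n', '\r' };

    // The separator is checked separately because it varies per call.
    public static bool NeedsQuoting(string field, char separator, char[] fixedTriggers) =>
        field.IndexOf('"') >= 0
        || field.IndexOf(separator) >= 0
        || field.IndexOfAny(fixedTriggers) >= 0;
}
```

With these definitions, `NeedsQuoting("a\rb", ',', QuoteTriggerSketch.WithoutCr)` is `false`, so the field would be emitted unquoted, while the same call with `WithCr` is `true`.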

gemini-code-assist · 2026-05-30T11:29:31Z

+            var quoteTriggerChars = new[] { separator, '\'', '\n' };
            for (var i = 0; i < columnCount; i++)
            {
                if (i > 0)
                    writer.Write(separator);

                if (i < data.Length)
                {
-                    var escape = false;
+                    var mustQuote = false;
                    var cell = data[i] ?? string.Empty;
 #if NET8_0_OR_GREATER
                    if (cell.Contains('"'))
 #else
                    if (cell.Contains("\""))
 #endif
                    {
-                        escape = true;
+                        mustQuote = true;
                        cell = cell.Replace("\"", "\"\"");
                    }
-                    else if (cell.IndexOfAny(escapeChars) >= 0)
-                        escape = true;
+                    else if (cell.IndexOfAny(quoteTriggerChars) >= 0)
+                        mustQuote = true;


Allocating a new char[] array (quoteTriggerChars) on every single call to WriteLine introduces unnecessary garbage collection pressure, especially when writing large datasets.

By using the static FixedQuoteTriggerChars array and checking the dynamic separator character separately via IndexOf, we can completely eliminate this per-row allocation.

            for (var i = 0; i < columnCount; i++)
            {
                if (i > 0)
                    writer.Write(separator);

                if (i < data.Length)
                {
                    var mustQuote = false;
                    var cell = data[i] ?? string.Empty;
#if NET8_0_OR_GREATER
                    if (cell.Contains('"'))
#else
                    if (cell.Contains("\""))
#endif
                    {
                        mustQuote = true;
                        cell = cell.Replace("\"", "\"\"");
                    }
                    else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedQuoteTriggerChars) >= 0)
                        mustQuote = true;

gemini-code-assist · 2026-05-30T11:29:31Z

+            var quoteTriggerChars = new[] { separator, '\'', '\n' };
            for (var i = 0; i < columnCount; i++)
            {


Similar to WriteLine, we can avoid allocating the quoteTriggerChars array on every call to WriteLineAsync by removing it and using FixedQuoteTriggerChars instead.

            for (var i = 0; i < columnCount; i++)
            {

gemini-code-assist · 2026-05-30T11:29:31Z

                        await writer.WriteAsync('"').ConfigureAwait(false);
                    }
-                    else if (cell.IndexOfAny(escapeChars) >= 0)
+                    else if (cell.IndexOfAny(quoteTriggerChars) >= 0)


Update the check in WriteLineAsync to use FixedQuoteTriggerChars and check the separator separately, avoiding the allocated array.

else if (cell.IndexOf(separator) >= 0 || cell.IndexOfAny(FixedQuoteTriggerChars) >= 0)

stevehansen · 2026-05-30T12:09:02Z

Thanks @gemini-code-assist — both points are valid, and the \r one is a genuine pre-existing bug:

\r missing from CsvWriter's quote triggers (HIGH): confirmed. CsvWriter triggered on ", the separator, ', and \n but not \r, while CsvBufferWriter already included it — so a field like a\rb was written unquoted and is malformed for strict RFC-4180 parsers.
Per-row char[] allocation (MEDIUM): agreed, worth hoisting to a static array + a separate separator check.

This PR is a deliberately behavior-neutral terminology/rename pass, so to keep it clean I've addressed both in a focused follow-up rather than here: #128 (adds \r across all CsvWriter paths — sync, async, and the ReadOnlyMemory<char> paths — removes the allocation, and adds regression tests + a CHANGELOG entry). This PR stays terminology-only.

gemini-code-assist · 2026-05-30T12:09:15Z

Thanks for the update, @stevehansen. That sounds like a solid plan. Keeping the terminology refactor focused while addressing the \r bug and allocation improvements in a separate, dedicated PR (#128) is definitely the right approach to ensure both are handled cleanly and effectively. I look forward to reviewing the follow-up.

stevehansen mentioned this pull request May 30, 2026

fix: quote fields containing a carriage return in CsvWriter #128

Open

stevehansen wants to merge 1 commit into master from refactor/align-internal-terminology
