Fix `ParseFormatControls` to handle nested parenthesized groups #4
Merged
`ParseFormatControls` now recursively expands nested parenthesized sub-groups in ISO 8211 format control strings. For example, `(b11,(3b24))` now correctly produces `[b11, b24, b24, b24]` instead of only `[b11]`.

When a format part is itself parenthesized (e.g. `(3b24)` or `3(b24)`), the method strips the outer parentheses, parses the leading repeat count, and recursively calls `ParseFormatControls` on the inner content. This fixes parsing of S-101 fields such as `C3IL`, whose format controls contain nested groups like `(b11,(3b24))`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
`ParseFormatControls` in `Iso8211DataDescriptiveRecordReader.cs` doesn't handle nested parenthesized groups in ISO 8211 format control strings.

A real-world S-101 dataset (Canadian Hydrographic Service) has the `C3IL` field with format controls `(b11,(3b24))`. The inner `(3b24)` is a parenthesized sub-group meaning "3 repetitions of `b24`". However, the parser treats it as a single format part and passes `(3b24)` to `ParseSingleFormat`, which sees the leading `(` as an unrecognized format character and returns `null`.

As a result, only 1 format entry (`b11`) is produced instead of 4 (`b11`, `b24`, `b24`, `b24`).

Fix
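When `ParseFormatControls` encounters a format part that is itself parenthesized (e.g. `(3b24)` or `3(b24)`), it now:

- Strips the outer parentheses
- Parses the leading repeat count
- Recursively calls `ParseFormatControls` on the inner content

This handles nested groups in each of these forms:

- `(3b24)` → 3 × `b24`
- `3(b24)` → 3 × `b24`
- `2(I(10),b14)` → 2 × [`I(10)`, `b14`]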
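The recursive expansion described above can be sketched as a small self-contained C# program (C# 9 or later, for top-level statements and static local functions). This is not the code from this PR: `ExpandFormatControls`, `IsWrapped` and `SplitTopLevel` are hypothetical names, and the sketch returns plain format strings rather than the objects `ParseSingleFormat` produces. It assumes that commas inside parentheses never separate top-level parts, and that a `(` directly after a format letter (as in `I(10)`) is a field width rather than a group.

```csharp
using System;
using System.Collections.Generic;

// Demo: expand the inputs of the three new test cases.
foreach (var controls in new[] { "(b11,(3b24))", "(b11,3(b24))", "(A,2(I(10),b14))" })
{
    Console.WriteLine($"{controls} -> [{string.Join(", ", ExpandFormatControls(controls))}]");
}

// Hypothetical stand-in for ParseFormatControls: yields one string per expanded
// format instead of the reader's real format objects.
static List<string> ExpandFormatControls(string controls)
{
    var result = new List<string>();
    string body = controls.Trim();

    // Strip one pair of parentheses enclosing the whole string, e.g. "(b11,(3b24))".
    if (IsWrapped(body))
    {
        body = body.Substring(1, body.Length - 2);
    }

    foreach (string raw in SplitTopLevel(body))
    {
        string part = raw.Trim();

        // Leading ASCII digits are a repeat count: "3b24", "3(b24)", "2(I(10),b14)".
        int digits = 0;
        while (digits < part.Length && part[digits] >= '0' && part[digits] <= '9')
        {
            digits++;
        }
        int count = digits > 0 ? int.Parse(part.Substring(0, digits)) : 1;
        string rest = part.Substring(digits);
        if (rest.Length == 0)
        {
            continue; // empty part: nothing to expand
        }

        // After the count, a group is fully wrapped in parentheses; a width such as
        // "I(10)" is not, because its "(" follows a format letter.
        List<string> unit = IsWrapped(rest)
            ? ExpandFormatControls(rest)   // recursive call strips the parens and expands the inner list
            : new List<string> { rest };   // single format, e.g. "b11" or "I(10)"

        for (int i = 0; i < count; i++)
        {
            result.AddRange(unit);
        }
    }
    return result;
}

// True if s starts with "(" and that parenthesis is closed by the last character.
static bool IsWrapped(string s)
{
    if (s.Length < 2 || s[0] != '(' || s[s.Length - 1] != ')')
    {
        return false;
    }
    int depth = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '(') depth++;
        else if (s[i] == ')') depth--;
        // The first "(" closed early, as in "(A)(B)": not a single enclosing group.
        if (depth == 0 && i < s.Length - 1) return false;
    }
    return depth == 0;
}

// Split on commas outside parentheses: "A,2(I(10),b14)" -> ["A", "2(I(10),b14)"].
static List<string> SplitTopLevel(string s)
{
    var parts = new List<string>();
    int depth = 0, start = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '(') depth++;
        else if (s[i] == ')') depth--;
        else if (s[i] == ',' && depth == 0)
        {
            parts.Add(s.Substring(start, i - start));
            start = i + 1;
        }
    }
    parts.Add(s.Substring(start));
    return parts;
}
```

For the three test inputs this prints `[b11, b24, b24, b24]` twice, then `[A, I(10), b14, I(10), b14]`. Deciding "group versus width" by checking only whether the part after its repeat count is wrapped in parentheses keeps `I(10)` intact without the splitter needing to know the set of format letters.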
Added 3 new test cases:
- `(b11,(3b24))` → `[b11, b24, b24, b24]`
- `(b11,3(b24))` → `[b11, b24, b24, b24]`
- `(A,2(I(10),b14))` → `[A, I(10), b14, I(10), b14]`

All 424 existing tests continue to pass.