Fix attribute parsing edge cases for curly braces, colons, and escapes #110
Merged
dereuromark merged 1 commit into master on Mar 23, 2026
This PR fixes three issues in attribute parsing to match the reference
JS implementation behavior:
1. Curly braces in quoted values: Use `findAttributeEnd()` instead of a
naive `strpos()` to find attribute block boundaries. This correctly
handles quoted strings containing `{` and `}` characters, such as
`{code="{foo}"}`.
2. Colon in attribute keys: Allow colons in attribute key names to
support namespaced attributes like `xml:lang=en`. Also made key
patterns more permissive to allow underscore/hyphen prefixed keys
after whitespace, matching JS reference.
3. Backslash escape handling: Process escapes correctly for ASCII
punctuation (`\\`, `\"`, `\*`, etc.) while keeping a backslash before
alphanumerics literal (`\n`, `\t`, `\U` stay as-is).
Adds 17 new tests covering curly braces, colon keys, and escape handling.
Codecov Report❌ Patch coverage is
Coverage diff, master vs. #110:

| Metric | master | #110 | +/- |
|---|---:|---:|---:|
| Coverage | 93.61% | 93.64% | +0.03% |
| Complexity | 2328 | 2332 | +4 |
| Files | 79 | 79 | |
| Lines | 6168 | 6182 | +14 |
| Hits | 5774 | 5789 | +15 |
| Misses | 394 | 393 | -1 |
Summary
This PR fixes three issues in attribute parsing to match the reference JS implementation behavior:
1. Curly braces in quoted values
Use `findAttributeEnd()` instead of a naive `strpos()` to find attribute block boundaries. This correctly handles quoted values that contain `{` and `}` characters, such as `{code="{foo}"}`.
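The boundary problem can be illustrated with a minimal PHP sketch. The function `findClosingBrace()` below is hypothetical and is not the library's `findAttributeEnd()`, whose signature this PR does not show. It assumes attribute values are quoted only with double quotes and that a backslash inside a quoted value escapes the next character. It returns the offset of the first `}` outside quotes, whereas a plain `strpos()` stops at the brace inside the value.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: return the offset of the `}` that closes the attribute
 * block whose `{` is at $open, skipping braces inside double-quoted values.
 * Returns null if the block is never closed.
 */
function findClosingBrace(string $text, int $open): ?int
{
    $inQuotes = false;
    $len = strlen($text);
    for ($i = $open + 1; $i < $len; $i++) {
        $ch = $text[$i];
        if ($inQuotes) {
            if ($ch === '\\') {
                $i++; // skip the escaped character, e.g. \" inside a value
            } elseif ($ch === '"') {
                $inQuotes = false; // end of the quoted value
            }
        } elseif ($ch === '"') {
            $inQuotes = true; // start of a quoted value
        } elseif ($ch === '}') {
            return $i; // first closing brace outside quotes
        }
    }
    return null;
}

$text = '{code="{foo}"}';
var_dump(strpos($text, '}', 1));      // int(11): the brace inside the quotes
var_dump(findClosingBrace($text, 0)); // int(13): the real end of the block
```

Tracking quote state is what lets the scan ignore `{` and `}` inside values; an unterminated quote makes the whole block unclosed rather than ending it early.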
2. Colon in attribute keys
Allow colons in attribute key names to support namespaced attributes such as `xml:lang=en`. Key patterns are also more permissive, accepting keys prefixed with an underscore or hyphen after whitespace.
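As an illustration only, the following PHP sketch tokenizes `key=value` pairs with a hypothetical regular expression in which keys may contain letters, digits, `_`, `-`, and `:`, and each pair starts at the beginning of the input or after whitespace. The pattern and the `parseKeyValues()` helper are assumptions, not the parser's actual code; escape processing inside quoted values is deliberately left out.

```php
<?php
declare(strict_types=1);

// Hypothetical pattern: a key of [A-Za-z0-9_:-] characters, at the start of the
// input or after whitespace, then `=`, then a double-quoted or a bare value.
const ATTR_PAIR = '/(?:^|\s)([\w:-]+)=("(?:[^"\\\\]|\\\\.)*"|[^\s"}]+)/';

/** Hypothetical helper: map each key to its raw value (quotes stripped). */
function parseKeyValues(string $attrs): array
{
    $result = [];
    preg_match_all(ATTR_PAIR, $attrs, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $key = $match[1];
        $value = $match[2];
        if ($value[0] === '"') {
            $value = substr($value, 1, -1); // strip the surrounding quotes only
        }
        $result[$key] = $value;
    }
    return $result;
}

print_r(parseKeyValues('xml:lang=en _private=1 -x=y title="a b"'));
```

Putting `:` and `-` in the key character class is what admits `xml:lang` and `-x`; `\w` already covers `_`.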
3. Backslash escape handling
Process escapes correctly for ASCII punctuation (`\\`, `\"`, `\*`, etc.) while keeping a backslash before alphanumerics literal (`\n`, `\t`, and `\U` stay as-is).
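The escape rule can be sketched in PHP as a single left-to-right pass. `unescapeValue()` is a hypothetical stand-in for `processEscapes()`, whose real signature is not shown in this PR; it drops a backslash only when the next character is ASCII punctuation. As an assumption of this sketch, a backslash before any other character (letters, digits, whitespace) and a trailing lone backslash are both kept literally.

```php
<?php
declare(strict_types=1);

// The 32 ASCII punctuation characters: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
const ASCII_PUNCTUATION = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~';

/** Hypothetical sketch: resolve backslash escapes before ASCII punctuation only. */
function unescapeValue(string $value): string
{
    $out = '';
    $len = strlen($value);
    for ($i = 0; $i < $len; $i++) {
        $ch = $value[$i];
        if ($ch === '\\' && $i + 1 < $len
            && strpos(ASCII_PUNCTUATION, $value[$i + 1]) !== false) {
            $out .= $value[++$i]; // drop the backslash, keep the punctuation
        } else {
            $out .= $ch; // literal: `\n`, `\t`, `\U` keep their backslash
        }
    }
    return $out;
}

echo unescapeValue('say \"hi\" \*now\*'), "\n"; // say "hi" *now*
echo unescapeValue('C:\Users\new'), "\n";       // C:\Users\new
```

Checking against an explicit punctuation list, rather than `ctype_punct()`, keeps the result independent of the current locale.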
Changes
- `src/Parser/InlineParser.php`: replace `strpos($text, '}', ...)` with `findAttributeEnd()` in 4 places
- `src/Parser/Utility/AttributeParser.php`: change `processEscapes()` so it only resolves escapes before ASCII punctuation
- `tests/TestCase/Parser/AttributeParserTest.php`: add 17 new tests

All changes were verified against the reference JS implementation (`@djot/djot`).